This document provides an overview of indexes in MongoDB, including index types and creation options. For operational guidelines and procedures, see the Indexing Operations document. For strategies and practical approaches, see the Indexing Strategies document.
An index is a data structure that allows you to quickly locate documents based on the values stored in certain specified fields. Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB supports indexes on any field or sub-field contained in documents within a MongoDB collection.
MongoDB indexes have the following core features:
MongoDB defines indexes on a per-collection level.
You can create indexes on a single field or on multiple fields using a compound index.
Indexes enhance query performance, often dramatically. However, each index also incurs some overhead for every write operation. Consider the queries, the frequency of these queries, the size of your working set, the insert load, and your application’s requirements as you create indexes in your MongoDB environment.
All MongoDB indexes use a B-tree data structure. MongoDB can use this representation of the data to optimize query responses.
Every query, including update operations, uses one and only one index. The query optimizer selects the index empirically by occasionally running alternate query plans and by selecting the plan with the best response time for each query type. You can override the query optimizer using the cursor.hint() method.
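An index “covers” a query if all the fields in the query are part of that index, and all the fields returned in the matching documents are in the same index.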
When an index covers a query, the server can both match the query conditions and return the results using only the index; MongoDB does not need to look at the documents, only the index, to fulfill the query. Querying the index can be faster than querying the documents outside of the index.
See Create Indexes that Support Covered Queries for more information.
Using queries with good index coverage reduces the number of full documents that MongoDB needs to store in memory, thus maximizing database performance and throughput.
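The covering condition can be sketched in plain JavaScript. This is only an illustration: isCoveredBy is a hypothetical helper, not a MongoDB API, and it assumes that a query is covered exactly when every queried field and every returned field appears in the index key specification.

```javascript
// Hypothetical helper (not part of MongoDB): returns true when every queried
// field and every returned field appears in the index key specification.
function isCoveredBy(indexSpec, queryFields, returnedFields) {
  const indexed = new Set(Object.keys(indexSpec));
  return [...queryFields, ...returnedFields].every((f) => indexed.has(f));
}

const index = { item: 1, stock: 1 };

// Query on item, returning item and stock: both are in the index.
const coveredCase = isCoveredBy(index, ["item"], ["item", "stock"]);

// Returning location forces a look at the document itself.
const uncoveredCase = isCoveredBy(index, ["item"], ["item", "location"]);

console.log(coveredCase, uncoveredCase);
```

Because queries return the _id field by default, a projection generally has to exclude _id for a query to be covered by an index that does not contain it.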
If an update does not change the size of a document or cause the document to outgrow its allocated area, then MongoDB will update an index only if the indexed fields have changed. This improves performance. Note that if the document has grown and must move, all index keys must then update.
This section enumerates the types of indexes available in MongoDB. For all collections, MongoDB creates the default _id index. You can create additional indexes with the ensureIndex() method on any single field or sequence of fields within any document or sub-document. MongoDB also supports indexes of arrays, called multi-key indexes.
The _id index is a unique index [1] on the _id field, and MongoDB creates this index by default on all collections. [2] You cannot delete the index on _id.
The _id field is the primary key for the collection, and every document must have a unique _id field. You may store any unique value in the _id field. The default value of _id is an ObjectId on every insert() operation. An ObjectId is a 12-byte unique identifier suitable for use as the value of an _id field.
Note
In sharded clusters, if you do not use the _id field as the shard key, then your application must ensure the uniqueness of the values in the _id field to prevent errors. This is most often done by using a standard auto-generated ObjectId.
| [1] | Although the index on _id is unique, the getIndexes() method will not print unique: true in the mongo shell. |
| [2] | Before version 2.2 capped collections did not have an _id field. In 2.2, all capped collections have an _id field, except those in the local database. See the release notes for more information. |
All indexes in MongoDB are secondary indexes. You can create indexes on any field within any document or sub-document. Additionally, you can create compound indexes with multiple fields, so that a single query can match multiple components using the index while scanning fewer whole documents.
In general, you should create indexes that support your primary, common, and user-facing queries. Doing so requires MongoDB to scan the fewest number of documents possible.
In the mongo shell, you can create an index by calling the ensureIndex() method. Arguments to ensureIndex() resemble the following:
{ "field": 1 }
{ "product.quantity": 1 }
{ "product": 1, "quantity": 1 }
For each field in the index specify either 1 for an ascending order or -1 for a descending order, which represents the order of the keys in the index. For indexes with more than one key (i.e. compound indexes) the sequence of fields is important.
You can create indexes on fields that hold sub-documents as in the following example:
Example
Given the following document in the factories collection:
{ "_id": ObjectId(...), metro: { city: "New York", state: "NY" } }
You can create an index on the metro key. The following queries would then use that index, and both would return the above document:
db.factories.find( { metro: { city: "New York", state: "NY" } } );
db.factories.find( { metro: { $gte : { city: "New York" } } } );
The second query returns the document because { city: "New York" } is less than { city: "New York", state: "NY" }. The comparison proceeds in ascending key order, following the order in which the keys occur in the BSON document.
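A minimal sketch of this comparison follows. It is a simplification, not MongoDB's implementation: compareSubdocs is a hypothetical helper that handles only flat sub-documents with string values, and it ignores the ordering by value type that a full BSON comparison would also apply.

```javascript
// Simplified sketch: compares two flat sub-documents whose values are strings.
// Fields are compared pairwise in the order they occur in each document.
function compareSubdocs(a, b) {
  const ea = Object.entries(a);
  const eb = Object.entries(b);
  const shared = Math.min(ea.length, eb.length);
  for (let i = 0; i < shared; i++) {
    const [ka, va] = ea[i];
    const [kb, vb] = eb[i];
    if (ka !== kb) return ka < kb ? -1 : 1; // field names differ
    if (va !== vb) return va < vb ? -1 : 1; // values differ
  }
  // All shared positions are equal: the document with fewer fields sorts first.
  return ea.length - eb.length;
}

const cmp = compareSubdocs(
  { city: "New York" },
  { city: "New York", state: "NY" }
);
console.log(cmp < 0); // the shorter sub-document is "less than" the longer one
```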
You can create indexes on fields in sub-documents, just as you can index top-level fields in documents. [3] These indexes allow you to use a “dot notation,” to introspect into sub-documents.
Consider a collection named people that holds documents that resemble the following example document:
{ "_id": ObjectId(...),
  "name": "John Doe",
  "address": {
    "street": "Main",
    "zipcode": 53511,
    "state": "WI"
  }
}
You can create an index on the address.zipcode field, using the following specification:
db.people.ensureIndex( { "address.zipcode": 1 } )
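To illustrate how a dotted field name reaches into a sub-document, here is a small sketch in plain JavaScript. The resolvePath function is a hypothetical helper written for this example, and the sample document omits _id for brevity.

```javascript
// Hypothetical helper: follows a dotted path such as "address.zipcode"
// through nested sub-documents and returns the value found there.
function resolvePath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value == null ? undefined : value[key]),
    doc
  );
}

const person = {
  name: "John Doe",
  address: { street: "Main", zipcode: 53511, state: "WI" },
};

const zipcodeKey = resolvePath(person, "address.zipcode");
const missingKey = resolvePath(person, "address.country");
console.log(zipcodeKey, missingKey);
```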
| [3] | Indexes on Sub-documents, by contrast, allow you to index fields that hold documents, including the full content, up to the maximum Index Size of the sub-document in the index. |
MongoDB supports “compound indexes,” where a single index structure holds references to multiple fields within a collection’s documents. Consider a collection named products that holds documents that resemble the following document:
{
  "_id": ObjectId(...),
  "item": "Banana",
  "category": ["food", "produce", "grocery"],
  "location": "4th Street Store",
  "stock": 4,
  "type": "cases",
  "arrival": Date(...)
}
If most of the application's queries include the item field and a significant number of queries will also check the stock field, you can specify a single compound index to support both of these queries:
db.products.ensureIndex( { "item": 1, "location": 1, "stock": 1 } )
Compound indexes support queries on any prefix of the fields in the index. [4] For example, MongoDB can use the above index to support queries that select the item field and to support queries that select the item field and the location field. The index, however, would not support queries that select only fields that do not form a prefix, such as only the location field, only the stock field, or only the location and stock fields.
Important
You may not create compound indexes that have hashed index fields. You will receive an error if you attempt to create a compound index that includes a hashed index.
When creating an index, the number associated with a key specifies the direction of the index. The options are 1 (ascending) and -1 (descending). Direction doesn’t matter for single key indexes or for random access retrieval but is important if you are doing sort queries on compound indexes.
The order of fields in a compound index is very important. In the previous example, the index will contain references to documents sorted first by the values of the item field and, within each value of the item field, sorted by the values of location, and then sorted by values of the stock field.
| [4] | Index prefixes are the beginning subset of fields. For example, given the index { a: 1, b: 1, c: 1 } both { a: 1 } and { a: 1, b: 1 } are prefixes of the index. |
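The prefix rule described in footnote [4] can be sketched as a small check in plain JavaScript. The function matchesIndexPrefix is a hypothetical helper, not part of MongoDB, and it tests only whether the queried fields are exactly the leading fields of the index; it says nothing about how the query optimizer treats partial matches.

```javascript
// Hypothetical helper: true when the set of queried fields equals the first
// N fields of the index specification, for some N >= 1.
function matchesIndexPrefix(indexSpec, queryFields) {
  const wanted = new Set(queryFields);
  if (wanted.size === 0) return false;
  const prefix = Object.keys(indexSpec).slice(0, wanted.size);
  return prefix.length === wanted.size && prefix.every((f) => wanted.has(f));
}

const productIndex = { item: 1, location: 1, stock: 1 };

const itemOnly = matchesIndexPrefix(productIndex, ["item"]);
const itemAndLocation = matchesIndexPrefix(productIndex, ["location", "item"]);
const stockOnly = matchesIndexPrefix(productIndex, ["stock"]);
const locationAndStock = matchesIndexPrefix(productIndex, ["location", "stock"]);
console.log(itemOnly, itemAndLocation, stockOnly, locationAndStock);
```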
Indexes store references to fields in either ascending or descending order. For single-field indexes, the order of keys doesn’t matter, because MongoDB can traverse the index in either direction. However, for compound indexes, if you need to order results against two fields, sometimes you need the index fields running in opposite order relative to each other.
To specify an index with a descending order, use the following form:
db.products.ensureIndex( { "field": -1 } )
More typically in the context of a compound index, the specification would resemble the following prototype:
db.products.ensureIndex( { "fieldA": 1, "fieldB": -1 } )
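Consider a collection of event data that includes both usernames and timestamps. Suppose you want to return a list of events sorted by username, with the most recent events listed first. To create this index, use the following command: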
db.events.ensureIndex( { "username" : 1, "timestamp" : -1 } )
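The ordering that such an index represents can be sketched with an ordinary comparator. The sample events and the compareKeys helper below are hypothetical and exist only to show how a 1 or -1 direction per field determines the overall order.

```javascript
// Hypothetical sample data; timestamps are plain numbers for simplicity.
const events = [
  { username: "bob", timestamp: 3 },
  { username: "alice", timestamp: 1 },
  { username: "alice", timestamp: 5 },
  { username: "bob", timestamp: 9 },
];

// Builds a comparator from an index-style specification, where 1 means
// ascending and -1 means descending for each field, in order.
function compareKeys(spec) {
  return (a, b) => {
    for (const [field, direction] of Object.entries(spec)) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0;
  };
}

const ordered = [...events].sort(compareKeys({ username: 1, timestamp: -1 }));
console.log(ordered.map((e) => e.username + ":" + e.timestamp).join(","));
```

Within each username the larger (more recent) timestamp comes first, which is the order the query in this example needs.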
If you index a field that contains an array, MongoDB indexes each value in the array separately, in a “multikey index.”
Example
Given the following document:
{ "_id" : ObjectId("..."),
"name" : "Warm Weather",
"author" : "Steve",
"tags" : [ "weather", "hot", "record", "april" ] }
Then an index on the tags field would be a multikey index and would include these separate entries:
{ tags: "weather" }
{ tags: "hot" }
{ tags: "record" }
{ tags: "april" }
Queries could use the multikey index to return documents that match any of the above values.
Note
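As a rough sketch of how one document produces several index entries, the following plain JavaScript expands an array field into one entry per element. The indexEntries function is a hypothetical helper for illustration, and the sample document omits _id.

```javascript
// Hypothetical helper: returns one index entry per array element, or a single
// entry when the field holds a scalar value.
function indexEntries(doc, field) {
  const value = doc[field];
  const values = Array.isArray(value) ? value : [value];
  return values.map((v) => ({ [field]: v }));
}

const post = {
  name: "Warm Weather",
  author: "Steve",
  tags: ["weather", "hot", "record", "april"],
};

const entries = indexEntries(post, "tags");
console.log(entries);
```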
For hashed indexes, MongoDB collapses sub-documents and computes the hash for the entire value, but does not support multi-key (i.e. arrays) indexes. For fields that hold sub-documents, you cannot use the index to support queries that introspect the sub-document.
You can use multikey indexes to index fields within objects embedded in arrays, as in the following example:
Example
Consider a feedback collection with documents in the following form:
{
  "_id": ObjectId(...),
  "title": "Grocery Quality",
  "comments": [
    { author_id: ObjectId(...),
      date: Date(...),
      text: "Please expand the cheddar selection." },
    { author_id: ObjectId(...),
      date: Date(...),
      text: "Please expand the mustard selection." },
    { author_id: ObjectId(...),
      date: Date(...),
      text: "Please expand the olive selection." }
  ]
}
An index on the comments.text field would be a multikey index and would add items to the index for all of the sub-documents in the array.
With an index, such as { "comments.text": 1 }, consider the following query:
db.feedback.find( { "comments.text": "Please expand the olive selection." } )
This would select the document that contains the following sub-document in the comments array:
{ author_id: ObjectId(...),
  date: Date(...),
  text: "Please expand the olive selection." }
Compound Multikey Indexes May Only Include One Array Field
While you can create multikey compound indexes, at most one field in a compound index may hold an array. For example, given an index on { a: 1, b: 1 }, the following documents are permissible:
{a: [1, 2], b: 1}
{a: 1, b: [1, 2]}
However, the following document is impermissible, and MongoDB cannot insert such a document into a collection with the {a: 1, b: 1 } index:
{a: [1, 2], b: [1, 2]}
If you attempt to insert such a document, MongoDB will reject the insertion and produce an error that says cannot index parallel arrays. MongoDB does not index parallel arrays because they require the index to include each value in the Cartesian product of the compound keys, which could quickly result in incredibly large and difficult to maintain indexes.
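The Cartesian product concern can be made concrete with a short sketch. The compoundKeys function below is a hypothetical model, not MongoDB code: it expands a document into compound index keys and refuses documents where more than one indexed field holds an array.

```javascript
// Hypothetical model: expands a document into compound index keys.
// At most one of the indexed fields may hold an array.
function compoundKeys(doc, fields) {
  const arrayFields = fields.filter((f) => Array.isArray(doc[f]));
  if (arrayFields.length > 1) {
    throw new Error("cannot index parallel arrays");
  }
  let keys = [[]];
  for (const f of fields) {
    const values = Array.isArray(doc[f]) ? doc[f] : [doc[f]];
    // Each existing partial key is extended with every value of this field.
    keys = keys.flatMap((k) => values.map((v) => [...k, v]));
  }
  return keys;
}

const okKeys = compoundKeys({ a: [1, 2], b: 1 }, ["a", "b"]);

let rejected = false;
try {
  compoundKeys({ a: [1, 2], b: [1, 2] }, ["a", "b"]);
} catch (err) {
  rejected = true;
}
console.log(okKeys, rejected);
```

Without the check, two arrays of lengths m and n would produce m × n keys for a single document, which is the growth the restriction avoids.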
A unique index causes MongoDB to reject all documents that contain a duplicate value for the indexed field. To create a unique index on the user_id field of the members collection, use the following operation in the mongo shell:
db.members.ensureIndex( { "user_id": 1 }, { unique: true } )
By default, unique is false on MongoDB indexes.
If you use the unique constraint on a compound index then MongoDB will enforce uniqueness on the combination of values, rather than the individual value for any or all values of the key.
If a document does not have a value for the indexed field in a unique index, the index will store a null value for this document. Because of the unique constraint, MongoDB will only permit one document that lacks the indexed field. You can combine the unique constraint with the sparse index option to filter these null values from the unique index.
You may not specify a unique constraint on a hashed index.
Sparse indexes only contain entries for documents that have the indexed field. [5] Any document that is missing the field is not indexed. The index is “sparse” because it omits the documents that lack the indexed field.
By contrast, non-sparse indexes contain all documents in a collection, and store null values for documents that do not contain the indexed field. Create a sparse index on the xmpp_id field of the members collection, using the following operation in the mongo shell:
db.members.ensureIndex( { "xmpp_id": 1 }, { sparse: true } )
By default, sparse is false on MongoDB indexes.
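Warning
Using these indexes will sometimes produce incomplete results when filtering or sorting, because sparse indexes are not complete indexes of all documents in the collection.
Note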
Do not confuse sparse indexes in MongoDB with block-level indexes in other databases. Think of them as dense indexes with a specific filter.
You can combine the sparse index option with the unique index option so that mongod will reject documents that have duplicate values for a field, but will ignore documents that do not have the key.
| [5] | All documents that have the indexed field are indexed in a sparse index, even if that field stores a null value in some documents. |
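The interaction between the unique and sparse options can be sketched with a small in-memory model. UniqueIndexModel is a hypothetical class written for this illustration; it models a single-field index over primitive values only and is not how mongod implements indexes.

```javascript
// Hypothetical in-memory model of a single-field unique index.
class UniqueIndexModel {
  constructor(field, { sparse = false } = {}) {
    this.field = field;
    this.sparse = sparse;
    this.keys = new Set();
  }

  // Returns true if the document is accepted, false if it is a duplicate.
  insert(doc) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, this.field);
    // A sparse index skips documents that lack the field entirely.
    if (!hasField && this.sparse) return true;
    // A non-sparse index stores null for a missing field.
    const key = hasField ? doc[this.field] : null;
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }
}

const dense = new UniqueIndexModel("xmpp_id");
const denseResults = [dense.insert({ name: "a" }), dense.insert({ name: "b" })];

const sparse = new UniqueIndexModel("xmpp_id", { sparse: true });
const sparseResults = [sparse.insert({ name: "a" }), sparse.insert({ name: "b" })];

console.log(denseResults, sparseResults);
```

In this model the non-sparse unique index accepts only the first document without xmpp_id, while the sparse unique index accepts both, because neither is indexed.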
New in version 2.4.
Hashed indexes maintain entries with hashes of the values of the indexed field. [6] The hashing function collapses sub-documents and computes the hash for the entire value but does not support multi-key (i.e. arrays) indexes.
MongoDB can use the hashed index to support equality queries, but hashed indexes do not support range queries.
You may not create compound indexes that have hashed index fields or specify a unique constraint on a hashed index; however, you can create both a hashed index and an ascending/descending (i.e. non-hashed) index on the same field: MongoDB will use the scalar index for range queries.
Warning
MongoDB hashed indexes truncate floating point numbers to 64-bit integers before hashing. For example, a hashed index would store the same value for a field that held a value of 2.3, 2.2, and 2.9. To prevent collisions, do not use a hashed index for floating point numbers that cannot be consistently converted to 64-bit integers (and then back to floating point). MongoDB hashed indexes do not support floating point values larger than 2^53.
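The truncation described in this warning can be illustrated in plain JavaScript. This sketch shows only the integer conversion step and assumes truncation toward zero; integerPartForHashing is a hypothetical name, and the hash function itself is not modeled.

```javascript
// Hypothetical stand-in for the conversion applied before hashing:
// drop the fractional part of a floating point value.
function integerPartForHashing(value) {
  return Math.trunc(value);
}

// All three values collapse to the same integer, so they would hash alike.
const collapsed = [2.3, 2.2, 2.9].map(integerPartForHashing);

// Above 2^53 a double can no longer represent every integer, so distinct
// integers become indistinguishable even before any truncation.
const limit = Math.pow(2, 53);
const indistinguishable = limit + 1 === limit;

console.log(collapsed, indistinguishable);
```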
Create a hashed index using an operation that resembles the following:
db.active.ensureIndex( { a: "hashed" } )
This operation creates a hashed index for the active collection on the a field.
| [6] | The hash stored in the hashed index is 64 bits of the 128 bit md5 hash. |
The default name for an index is the concatenation of the indexed keys and each key’s direction in the index (1 or -1).
Example
Issue the following command to create an index on item and quantity:
db.products.ensureIndex( { item: 1, quantity: -1 } )
The resulting index is named: item_1_quantity_-1.
Optionally, you can specify a name for an index instead of using the default name.
Example
Issue the following command to create an index on item and quantity and specify inventory as the index name:
db.products.ensureIndex( { item: 1, quantity: -1 } , {name: "inventory"} )
The resulting index is named: inventory.
To view the name of an index, use the getIndexes() method.
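The default naming rule can be reproduced with a few lines of plain JavaScript. The defaultIndexName function is a hypothetical helper that follows the rule stated above (each key followed by its direction, joined with underscores); it is not a MongoDB API.

```javascript
// Hypothetical helper: builds the default index name from a key specification
// by joining each field and its direction with underscores.
function defaultIndexName(spec) {
  return Object.entries(spec)
    .map(([field, direction]) => `${field}_${direction}`)
    .join("_");
}

const generatedName = defaultIndexName({ item: 1, quantity: -1 });
console.log(generatedName);
```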
You specify index creation options in the second argument in ensureIndex().
The options sparse, unique, and TTL affect the kind of index that MongoDB creates. This section addresses background construction and duplicate dropping, which affect how MongoDB builds the indexes.
By default, creating an index is a blocking operation. Building an index on a large collection of data can take a long time to complete. To resolve this issue, the background option can allow you to continue to use your mongod instance during the index build.
For example, to create an index in the background of the zipcode field of the people collection you would issue the following:
db.people.ensureIndex( { zipcode: 1}, {background: true} )
By default, background is false for building MongoDB indexes.
You can combine the background option with other options, as in the following:
db.people.ensureIndex( { zipcode: 1}, {background: true, sparse: true } )
Be aware of the following behaviors with background index construction:
A mongod instance can build more than one index in the background concurrently.
Changed in version 2.4: Before 2.4, a mongod instance could only build one background index per database at a time.
Changed in version 2.2: Before 2.2, a single mongod instance could only build one index at a time.
The indexing operation runs in the background so that other database operations can run while creating the index. However, the mongo shell session or connection where you are creating the index will block until the index build is complete. Open another connection or mongo instance to continue using commands to the database.
The background index operation uses an incremental approach that is slower than the normal “foreground” index build. If the index is larger than the available RAM, then the incremental process can take much longer than the foreground build.
If your application includes ensureIndex() operations, and an index doesn’t exist for other operational concerns, building the index can have a severe impact on the performance of the database.
Make sure that your application checks for the indexes at start up using the getIndexes() method or the equivalent method for your driver and terminates if the proper indexes do not exist. Always build indexes in production instances using separate application code, during designated maintenance windows.
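A start-up check of this kind might look like the following sketch. It assumes the application has already obtained the array of index documents that getIndexes() returns (each with a key field); missingIndexes and the sample data are hypothetical and written only for this example.

```javascript
// Hypothetical helper: compares the key specifications reported for a
// collection against the ones the application requires.
function missingIndexes(existingIndexes, requiredKeys) {
  const present = new Set(existingIndexes.map((ix) => JSON.stringify(ix.key)));
  return requiredKeys.filter((key) => !present.has(JSON.stringify(key)));
}

// Sample data shaped like getIndexes() output (other fields omitted).
const reported = [
  { key: { _id: 1 }, name: "_id_" },
  { key: { zipcode: 1 }, name: "zipcode_1" },
];

const missing = missingIndexes(reported, [
  { zipcode: 1 },
  { "address.state": 1 },
]);

// An application could log this list and terminate if it is not empty,
// rather than building the index itself.
console.log(missing);
```

Comparing serialized key specifications keeps field order significant, which matters because compound indexes with the same fields in a different order are different indexes.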
Building Indexes on Secondaries
Background index operations on a replica set primary become foreground indexing operations on secondary members of the set. All indexing operations on secondaries block replication.
To build large indexes on secondaries the best approach is to restart one secondary at a time in standalone mode and build the index. After building the index, restart as a member of the replica set, allow it to catch up with the other members of the set, and then build the index on the next secondary. When all the secondaries have the new index, step down the primary, restart it as a standalone, and build the index on the former primary.
Remember, the amount of time required to build the index on a secondary node must be within the window of the oplog, so that the secondary can catch up with the primary.
See Build Indexes on Replica Sets for more information on this process.
Indexes on secondary members in “recovering” mode are always built in the foreground so that they can catch up as soon as possible.
See Build Indexes on Replica Sets for a complete procedure for rebuilding indexes on secondaries.
Note
If MongoDB is building an index in the background, you cannot perform other administrative operations involving that collection, including repairDatabase, dropping the collection (i.e. db.collection.drop()), and compact. These operations will return an error during background index builds.
Queries will not use these indexes until the index build is complete.
MongoDB cannot create a unique index on a field that has duplicate values. To force the creation of a unique index, you can specify the dropDups option, which will only index the first occurrence of a value for the key, and delete all subsequent values.
Warning
As in all unique indexes, if a document does not have the indexed field, MongoDB will include it in the index with a “null” value.
If subsequent documents do not have the indexed field, and you have set {dropDups: true}, MongoDB will remove these documents from the collection when creating the index. If you combine dropDups with the sparse option, the index will only include documents that have the indexed field, and the documents without the field will remain in the database.
To create a unique index that drops duplicates on the username field of the accounts collection, use a command in the following form:
db.accounts.ensureIndex( { username: 1 }, { unique: true, dropDups: true } )
Warning
Specifying { dropDups: true } will delete data from your database. Use with extreme caution.
By default, dropDups is false.
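The effect of dropDups on existing data can be sketched with an in-memory model. The applyDropDups function and the sample documents are hypothetical; the sketch follows the behavior described above (keep the first occurrence, treat a missing field as null, and leave field-less documents alone when sparse is set) and is not mongod's implementation.

```javascript
// Hypothetical model: returns the documents that would remain in the
// collection after building a unique index with dropDups on `field`.
function applyDropDups(docs, field, { sparse = false } = {}) {
  const seen = new Set();
  const kept = [];
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (!hasField && sparse) {
      kept.push(doc); // not indexed, so never treated as a duplicate
      continue;
    }
    const key = hasField ? doc[field] : null; // missing field counts as null
    if (seen.has(key)) continue; // later duplicate: removed
    seen.add(key);
    kept.push(doc);
  }
  return kept;
}

const accounts = [
  { _id: 1, username: "ann" },
  { _id: 2, username: "ann" },
  { _id: 3 },
  { _id: 4 },
];

const keptDense = applyDropDups(accounts, "username").map((d) => d._id);
const keptSparse = applyDropDups(accounts, "username", { sparse: true }).map((d) => d._id);
console.log(keptDense, keptSparse);
```

In this model the non-sparse build removes documents 2 and 4, while the sparse build removes only document 2.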
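TTL indexes are special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time. This is ideal for some types of information, such as machine-generated event data, logs, and session information, that only need to persist in a database for a limited amount of time.
These indexes have the following limitations:
Note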
TTL indexes expire data by removing documents in a background task that runs every 60 seconds. As a result, the TTL index provides no guarantees that expired documents will not exist in the collection. Consider that:
In all other respects, TTL indexes are normal indexes, and if appropriate, MongoDB can use these indexes to fulfill arbitrary queries.
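The expiry decision can be sketched as a simple date comparison. This is an illustration under stated assumptions: isExpired is a hypothetical helper, expireAfterSeconds stands for the lifetime configured on the TTL index, documents whose indexed field is not a date are assumed never to expire, and the exact boundary condition used by the server is not specified here.

```javascript
// Hypothetical helper: decides whether a document is past its lifetime.
// `expireAfterSeconds` is the lifetime configured for the TTL index.
function isExpired(doc, field, expireAfterSeconds, now) {
  const value = doc[field];
  if (!(value instanceof Date)) return false; // assumed: non-dates never expire
  return value.getTime() + expireAfterSeconds * 1000 <= now.getTime();
}

const session = { createdAt: new Date("2013-01-01T00:00:00Z") };

const afterHalfHour = isExpired(session, "createdAt", 3600, new Date("2013-01-01T00:30:00Z"));
const afterTwoHours = isExpired(session, "createdAt", 3600, new Date("2013-01-01T02:00:00Z"));
console.log(afterHalfHour, afterTwoHours);
```

Because the removal task runs periodically, a document for which this check is true may still be present in the collection until the next pass.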
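MongoDB provides “geospatial indexes” to support location-based and other similar queries in two-dimensional coordinate systems. For example, use geospatial indexes when you need to take a collection of documents that have coordinates and return a number of options that are “near” a given coordinate pair.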
To create a geospatial index, your documents must have a coordinate pair. For maximum compatibility, these coordinate pairs should be in the form of a two element array, such as [ x , y ]. Given the field of loc, that held a coordinate pair, in the collection places, you would create a geospatial index as follows:
db.places.ensureIndex( { loc : "2d" } )
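MongoDB will reject documents that have values in the loc field beyond the minimum and maximum values.
Note
MongoDB permits only one geospatial index per collection. Although MongoDB will allow clients to create multiple geospatial indexes, a single query can use only one index.
See the $near operator and the geoNear database command for more information on accessing geospatial data.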
In addition to conventional geospatial indexes, MongoDB also provides a bucket-based geospatial index, called “geospatial haystack indexes.” These indexes support high performance queries for locations within a small area, when the query must filter along another dimension.
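Example
If you need to return all documents that have coordinates within 25 miles of a given point and have a type field value of “museum,” a haystack index would provide the best support for these queries.
Haystack indexes allow you to tune the bucket size to the distribution of your data, so that in general you search only very small regions of 2d space to find documents quickly. These indexes are not suited for finding the documents closest to a particular location when the closest documents are far away compared to the bucket size.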
New in version 2.4.
MongoDB provides text indexes to support the search of string content in documents of a collection. text indexes are case-insensitive and can include any field that contains string data. text indexes drop language-specific stop words (e.g. in English, “the,” “an,” “a,” “and,” etc.) and use simple language-specific suffix stemming. See Text Search Languages for the supported languages.
You can only access the text index with the text command.
See Text Search for more information.
A collection may have no more than 64 indexes.
Index keys can be no larger than 1024 bytes.
Documents with fields that have values greater than this size cannot be indexed.
To query for documents that were too large to index, you can use a command similar to the following:
db.records.find({<key>: <value too large to index>}).hint({$natural: 1})
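The name of an index, including the namespace, must be shorter than 128 characters.
Indexes have storage requirements, and they affect the performance of insert and update operations to some degree.
Create indexes to support queries and other operations, but do not maintain indexes that your MongoDB instance cannot or will not use.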
For queries with the $or operator, each clause of an $or query executes in parallel, and can each use a different index.
For queries that use the sort() method and use the $or operator, the query cannot use the indexes on the $or fields.
2d geospatial queries do not support queries that use the $or operator.
If your application is write-heavy, then be careful when creating new indexes, since each additional index will impose a write-performance penalty. In general, don’t be careless about adding indexes. Add indexes to complement your queries. Always have a good reason for adding a new index, and be sure to benchmark alternative strategies.
MongoDB must update all indexes associated with a collection after every insert, update, or delete operation. For update operations, if the updated document does not move to a new location, then MongoDB only modifies the updated fields in the index. Therefore, every index on a collection adds some amount of overhead to these write operations. In almost every case, the performance gains that indexes realize for read operations are worth the insertion penalty. However, in some cases: