Write Operations

All operations that create or modify data in the MongoDB instance are write operations. MongoDB represents data as BSON documents stored in collections. Write operations target one collection and are atomic on the level of a single document: no single write operation can atomically affect more than one document or more than one collection.

This document introduces the write operators available in MongoDB and presents strategies to increase the efficiency of writes in applications.

Write Operators

For information on write operators and how to write data to a MongoDB database, see the following pages:

For information on specific methods used to perform write operations in the mongo shell, see the following:

For information on how to perform write operations from within an application, see the MongoDB Drivers and Client Libraries documentation or the documentation for your client library.

Write Concern

Write concern is a quality of every write operation issued to a MongoDB deployment, and describes the amount of concern the application has for the outcome of the write operation. With weak or disabled write concern, the application can send a write operation to MongoDB and then continue without waiting for a response from the database. With stronger write concerns, write operations wait until MongoDB acknowledges or confirms a successful write operation. MongoDB provides different levels of write concern to better address the specific needs of applications.

Note

The driver write concern change created a new connection class, MongoClient, in all of the MongoDB drivers, with a different default write concern. See the release notes for this change, and the release notes for the driver you’re using, for more information about your driver’s release.

Bulk Inserts

In some situations you may need to insert or ingest a large amount of data into a MongoDB database. These bulk inserts have some special considerations that are different from other write operations.

The insert() method, when passed an array of documents, performs a bulk insert and inserts each document atomically. Drivers provide their own interface for this kind of operation.

New in version 2.2: insert() in the mongo shell gained support for bulk inserts.

Bulk inserts can significantly increase performance by amortizing write concern costs. In the drivers, you can configure write concern for batches rather than on a per-document level.
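The batching idea above can be sketched as follows. This is a minimal illustration, not a driver API: toBatches and bulkInsert are hypothetical helper names, and the collection argument is only assumed to expose an insert() method that accepts an array of documents, as the mongo shell does.

```javascript
// Hypothetical helper: split `docs` into arrays of at most `batchSize` documents.
function toBatches(docs, batchSize) {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error("batchSize must be a positive integer");
  }
  var batches = [];
  for (var i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}

// Hypothetical helper: issue one insert() call per batch.
// `collection` is assumed to expose insert(arrayOfDocuments).
// Returns the number of insert() calls made.
function bulkInsert(collection, docs, batchSize) {
  var batches = toBatches(docs, batchSize);
  for (var i = 0; i < batches.length; i++) {
    collection.insert(batches[i]);
  }
  return batches.length;
}
```

In the mongo shell this might be called as bulkInsert(db.myCollection, docs, 1000), where the batch size of 1000 is an arbitrary choice for illustration and docs is an array of documents you have already built.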

Drivers also have a ContinueOnError option in their insert operation, so that the bulk operation will continue to insert remaining documents in a batch even if an insert fails.

Note

New in version 2.0: Support for ContinueOnError depends on version 2.0 of the core mongod and mongos components.

If the bulk insert process generates more than one error in a batch job, the client will only receive the most recent error. All bulk operations to a sharded collection run with ContinueOnError, which applications cannot disable. See the Strategies for Bulk Inserts in Sharded Clusters section for more information on considerations for bulk inserts in sharded clusters.

For details on performing bulk inserts in your application, see your driver documentation. Also consider the following resources: Sharded Clusters, Strategies for Bulk Inserts in Sharded Clusters, and Import and Export MongoDB Data.

Indexing

After every insert, update, or delete operation, MongoDB must update every index associated with the collection in addition to the data itself. Therefore, every index on a collection adds some amount of overhead for the performance of write operations. [1]

In general, the performance gains that indexes provide for read operations are worth the insertion penalty; however, when optimizing write performance, be careful when creating new indexes, evaluate the existing indexes on the collection, and ensure that your queries actually use them.

For more information on indexes in MongoDB, see Indexes and Indexing Strategies.

[1] For inserts and updates to un-indexed fields, the overhead for sparse indexes is less than for non-sparse indexes. Also, for non-sparse indexes, updates that don’t change the record size have less indexing overhead.

Isolation

When a single write operation modifies multiple documents, the operation as a whole is not atomic, and other operations may interleave. The modification of a single document, or record, is always atomic, even if the write operation modifies multiple sub-documents within the single record.

No other operations are atomic; however, you can attempt to isolate a write operation that affects multiple documents using the isolation operator.
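As a rough illustration, the sketch below assumes that the isolation operator is the $isolated query flag; this section does not name the operator, so treat that as an assumption and check the operator reference for your version. withIsolation is a hypothetical helper that copies a query document and adds the flag, leaving the original query untouched.

```javascript
// Hypothetical helper: return a copy of `query` with the $isolated flag set.
// Assumption: the isolation operator is spelled $isolated in your MongoDB version.
function withIsolation(query) {
  var isolated = {};
  for (var key in query) {
    if (Object.prototype.hasOwnProperty.call(query, key)) {
      isolated[key] = query[key];
    }
  }
  isolated.$isolated = 1;
  return isolated;
}

// Possible use in the mongo shell (collection and field names are made up):
//   db.myCollection.update(
//     withIsolation({ status: "pending" }),
//     { $set: { status: "queued" } },
//     { multi: true }
//   )
```

Even with the flag, the multi-document update is isolated from interleaving operations rather than made all-or-nothing, which is why the text above says you can only "attempt to isolate" such a write.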

To isolate a sequence of write operations from other read and write operations, see Perform Two Phase Commits.

Updates

Each document in a MongoDB collection has an allocated record space that includes the entire document and a small amount of padding. This padding makes it possible for update operations to increase the size of a document slightly without causing the document to outgrow the allocated record size.

Documents in MongoDB can grow up to the full maximum BSON document size. However, when documents outgrow their allocated record size, MongoDB must allocate a new record and move the document to the new record. Update operations that do not cause a document to grow (i.e. in-place updates) are significantly more efficient than those updates that cause document growth. Use data models that minimize the need for document growth when possible.

For complete examples of update operations, see Update.

Padding Factor

If an update operation does not cause the document to increase in size, MongoDB can apply the update in-place. Some updates change the size of the document; for example, using the $push operator to append a sub-document to an array can cause the top-level document to grow beyond its allocated space.

When documents grow, MongoDB relocates the document to a location on disk with enough contiguous space to hold it. These relocations take longer than in-place updates, particularly if the collection has indexes, because MongoDB must update all index entries for the moved document. If a collection has many indexes, the move will impact write throughput.

To minimize document movements, MongoDB employs padding. MongoDB adaptively learns if documents in a collection tend to grow, and if they do, adds a paddingFactor so that the documents have room to grow on subsequent writes. The paddingFactor indicates the padding for new inserts and moves.

New in version 2.2: You can use the collMod command with the usePowerOf2Sizes flag so that MongoDB allocates document space in sizes that are powers of 2. This helps ensure that MongoDB can efficiently reuse the space freed as a result of deletions or document relocations. As with all padding, using document space allocations with power of 2 sizes minimizes, but does not eliminate, document movements.
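To see why power-of-2 allocations make freed space easier to reuse, consider the simplified model below. powerOf2Allocation is a hypothetical function, not MongoDB's allocator: it ignores record headers and any upper limit on allocation size, and only rounds a size up to the next power of 2.

```javascript
// Simplified model (an assumption, not MongoDB's actual allocator):
// round a record size in bytes up to the smallest power of 2 that holds it.
function powerOf2Allocation(size) {
  var allocation = 1;
  while (allocation < size) {
    allocation *= 2;
  }
  return allocation;
}

// The flag itself is set with the collMod command, for example:
//   db.runCommand({ collMod: "myCollection", usePowerOf2Sizes: true })
```

Under this model a 100-byte document and a 120-byte document both occupy a 128-byte slot, so the space freed by deleting or moving one of them fits the other exactly.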

To check the current paddingFactor on a collection, you can run the db.collection.stats() operation in the mongo shell, as in the following example:

db.myCollection.stats()

Since MongoDB writes each document at a different point in time, the padding for each document will not be the same. You can calculate the padding size by subtracting 1 from the paddingFactor and multiplying the result by the document size, for example:

padding size = (paddingFactor - 1) * <document size>.

For example, a paddingFactor of 1.0 specifies no padding whereas a paddingFactor of 1.5 specifies a padding size of 0.5 or 50 percent (50%) of the document size.
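The formula above can be written as a small function. paddingSize is a hypothetical helper name, and the 1000-byte document size used in the comments is an arbitrary value chosen for illustration.

```javascript
// padding size = (paddingFactor - 1) * <document size>
// Both arguments are plain numbers; documentSize is in bytes.
function paddingSize(paddingFactor, documentSize) {
  return (paddingFactor - 1) * documentSize;
}

// In the mongo shell, the current factor could be read from the stats output:
//   var factor = db.myCollection.stats().paddingFactor
// For a 1000-byte document:
//   paddingSize(1.0, 1000) is 0 bytes (no padding)
//   paddingSize(1.5, 1000) is 500 bytes (50% of the document size)
```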

Because the paddingFactor is relative to the size of each document, you cannot calculate the exact amount of padding for a collection based on the average document size and padding factor.

If an update operation causes the document to decrease in size, for instance if you perform an $unset or a $pop update, the document remains in place and effectively has more padding. If the document remains this size, the space is not reclaimed until you perform a compact or a repairDatabase operation.

Note

The following operations remove padding: compact, repairDatabase, and initial replica set sync operations.

However, with the compact command, you can run the command with a paddingFactor or a paddingBytes parameter.

Padding is also removed if you use mongoexport from a collection. If you use mongoimport into a new collection, mongoimport will not add padding. If you use mongoimport with an existing collection with padding, mongoimport will not affect the existing padding.

When a database operation removes padding, subsequent updates that require changes in record sizes will have reduced throughput until the collection’s padding factor grows. Padding does not affect in-place updates, and after compact, repairDatabase, and replica set initial sync, the collection will require less storage.

Architecture

Replica Sets

In replica sets, all write operations go to the set’s primary, which applies the write operation then records the operations on the primary’s operation log or oplog. The oplog is a reproducible sequence of operations to the data set. Secondary members of the set are continuously replicating the oplog and applying the operations to themselves in an asynchronous process.

Large volumes of write operations, particularly bulk operations, may create situations where the secondary members have difficulty applying the replicated operations from the primary at a sufficient rate; this can cause the secondary’s state to fall behind that of the primary. Secondaries that are significantly behind the primary present problems for normal operation of the replica set, particularly for failover, in the form of rollbacks, and for general read consistency.

To help avoid this issue, you can customize the write concern to request confirmation that a write operation has reached another member [2] of the replica set every 100 or 1,000 operations. This provides an opportunity for secondaries to catch up with the primary. Write concern can slow the overall progress of write operations but ensures that the secondaries can maintain a largely current state with respect to the primary.
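One way to sketch this pattern is shown below. insertWithPeriodicAck is a hypothetical helper, and its db argument is only assumed to behave like the mongo shell database handle, with getCollection(name).insert(doc) and runCommand(commandDocument). The getLastError command with a w value of "majority" is the call described in footnote [2]; the interval is whatever count suits your workload.

```javascript
// Hypothetical helper: insert documents one at a time and, after every
// `interval` inserts, block until a majority of replica set members have
// the preceding write. Returns the number of confirmations requested.
function insertWithPeriodicAck(db, collectionName, docs, interval) {
  var collection = db.getCollection(collectionName);
  var acknowledgements = 0;
  for (var i = 0; i < docs.length; i++) {
    collection.insert(docs[i]);
    if ((i + 1) % interval === 0) {
      // Waits for replication of the last write issued on this connection.
      db.runCommand({ getLastError: 1, w: "majority" });
      acknowledgements++;
    }
  }
  return acknowledgements;
}
```

Because secondaries apply the oplog in order, confirming the most recent write also implies that the writes before it have replicated, so one confirmation per interval is enough to let secondaries catch up.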

For more information on replica sets and write operations, see Replica Acknowledged, Oplog, Oplog Internals, and Change the Size of the Oplog.

[2] Calling getLastError intermittently with a w value of 2 or majority will slow the throughput of write traffic; however, this practice will allow the secondaries to remain current with the state of the primary.

Sharded Clusters

In a sharded cluster, MongoDB directs a given write operation to a shard and then performs the write on a particular chunk on that shard. Shards and chunks are range-based. Shard keys affect how MongoDB distributes documents among shards. Choosing the correct shard key can have a great impact on the performance, capability, and functioning of your database and cluster.
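The range-based routing described above can be pictured with the toy model below. It is a simplification under stated assumptions: findChunk, exampleChunks, and the shard names are all hypothetical, the shard key is a single number, and each chunk covers the half-open range from min (inclusive) to max (exclusive). In a real cluster this lookup is performed for you using the cluster's metadata.

```javascript
// Toy chunk table (made-up ranges and shard names).
var exampleChunks = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: 200, shard: "shardB" },
  { min: 200, max: Infinity, shard: "shardA" }
];

// Return the chunk whose range [min, max) contains the shard key value,
// or null if no chunk covers it.
function findChunk(chunks, shardKeyValue) {
  for (var i = 0; i < chunks.length; i++) {
    if (shardKeyValue >= chunks[i].min && shardKeyValue < chunks[i].max) {
      return chunks[i];
    }
  }
  return null;
}
```

With this table, a write whose shard key is 150 goes to shardB. If every new document had a shard key above 200, all inserts would land in the last chunk on shardA, which is one reason the choice of shard key matters for write performance.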

For more information, see Sharded Cluster Administration and Bulk Inserts.