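Monitoring is a critical part of all database administration. A firm grasp of MongoDB's reporting will allow you to assess the state of your database and maintain your deployment without crisis. Additionally, a sense of MongoDB's normal operational parameters will allow you to diagnose problems as you encounter them, rather than waiting for a crisis.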
This document provides an overview of the available tools and data provided by MongoDB as well as an introduction to diagnostic strategies, and suggestions for monitoring instances in MongoDB’s replica sets and sharded clusters.
Note
10gen provides a hosted monitoring service which collects and aggregates these data to provide insight into the performance and operation of MongoDB deployments. See the MongoDB Monitoring Service (MMS) and the MMS documentation for more information.
There are two primary methods for collecting data regarding the state of a running MongoDB instance. First, a set of tools distributed with MongoDB provides real-time reporting of activity on the database. Second, several database commands return statistics regarding the current database state with greater fidelity. Each method collects data that answers a different set of questions, and each is useful in different contexts.
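This section provides an overview of these utilities and statistics, along with examples of the kinds of questions that each method is best suited to help you address.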
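The MongoDB distribution includes a number of utilities that quickly return statistics about an instance's performance and activity. Typically, these are most useful for diagnosing issues and assessing normal operation.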
mongotop tracks and reports the current read and write activity of a MongoDB instance. mongotop provides per-collection visibility into use. Use mongotop to verify that activity and use match expectations. See the mongotop manual for details.
mongostat captures and returns counters of database operations. mongostat reports operations on a per-type (e.g. insert, query, update, delete, etc.) basis. This format makes it easy to understand the distribution of load on the server. Use mongostat to understand the distribution of operation types and to inform capacity planning. See the mongostat manual for details.
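The per-type counts that mongostat displays can be approximated from the cumulative opcounters section of the serverStatus output. The following sketch, plain JavaScript that runs under Node.js or a modern mongo shell, turns two such samples into per-second rates. The helper name opsPerSecond and the sample values are hypothetical, and this is not a description of how mongostat itself is implemented.

```javascript
// Hypothetical helper: approximate mongostat-style per-second counts from two
// serverStatus() samples taken `intervalSeconds` apart.
function opsPerSecond(before, after, intervalSeconds) {
  const types = ["insert", "query", "update", "delete", "getmore", "command"];
  const rates = {};
  for (const type of types) {
    // opcounters are cumulative since startup, so the difference is the
    // number of operations of this type during the interval.
    rates[type] = (after.opcounters[type] - before.opcounters[type]) / intervalSeconds;
  }
  return rates;
}

// Made-up counter values, not real measurements.
const first = { opcounters: { insert: 100, query: 500, update: 40, delete: 0, getmore: 10, command: 200 } };
const second = { opcounters: { insert: 150, query: 800, update: 60, delete: 5, getmore: 12, command: 260 } };
console.log(opsPerSecond(first, second, 5));
```

In the shell you might take the two samples with db.serverStatus(), calling sleep(5000) between them, and pass 5 as the interval.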
MongoDB provides a REST interface that exposes diagnostic and monitoring information in a simple web page. Enable this by setting rest to true, and access this page via the local host interface using the port numbered 1000 more than the database port. In default configurations, the REST interface is accessible on port 28017. For example, to access the REST interface on a locally running mongod instance: http://localhost:28017
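The port arithmetic is simple enough to capture in a helper for scripts that poll several instances. This sketch assumes REST has already been enabled (rest = true in the configuration file, or --rest on the command line); restInterfaceUrl is a hypothetical name.

```javascript
// Hypothetical helper: the REST interface listens on the database port + 1000.
function restInterfaceUrl(databasePort, host = "localhost") {
  return "http://" + host + ":" + (databasePort + 1000);
}

console.log(restInterfaceUrl(27017)); // http://localhost:28017
```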
MongoDB provides a number of commands that return statistics about the state of the MongoDB instance. These data may provide finer granularity regarding the state of the MongoDB instance than the tools above. Consider using their output in scripts and programs to develop custom alerts, or to modify the behavior of your application in response to the activity of your instance.
Access serverStatus data by way of the serverStatus command (db.serverStatus() from the shell). The returned document contains a general overview of the state of the database, including disk usage, memory use, connections, journaling, and index access. The command returns quickly and does not impact MongoDB performance.
While this output contains a (nearly) complete account of the state of a MongoDB instance, in most cases you will not run this command directly. Nevertheless, all administrators should be familiar with the data provided by serverStatus.
See also
View the replSetGetStatus data with the replSetGetStatus command (rs.status() from the shell). The document returned by this command reflects the state and configuration of the replica set. Use this data to ensure that replication is properly configured, and to check the connections between the current host and the members of the replica set.
The dbStats data is accessible by way of the dbStats command (db.stats() from the shell). This command returns a document that contains data that reflects the amount of storage used and data contained in the database, as well as object, collection, and index counters. Use this data to check and track the state and storage of a specific database. This output also allows you to compare utilization between databases and to determine average document size in a database.
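To compare databases, you can reduce each dbStats document to a few derived figures. This sketch assumes the dataSize, indexSize, and objects fields of the dbStats output, with sizes in bytes (the default scale); summarizeDbStats and the sample numbers are hypothetical. dbStats also reports its own avgObjSize field, so the division here mainly shows where that figure comes from.

```javascript
// Hypothetical helper: derive comparable figures from a dbStats() document.
function summarizeDbStats(stats) {
  return {
    db: stats.db,
    // Average document size; guard against empty databases.
    avgDocBytes: stats.objects > 0 ? stats.dataSize / stats.objects : 0,
    // Storage attributable to data plus indexes.
    dataPlusIndexBytes: stats.dataSize + stats.indexSize,
  };
}

// Made-up dbStats() excerpts for two databases.
const dbSamples = [
  { db: "logs", objects: 0, dataSize: 0, indexSize: 8192 },
  { db: "app", objects: 2000, dataSize: 1024000, indexSize: 256000 },
];

// Largest database first.
const summaries = dbSamples
  .map(summarizeDbStats)
  .sort((a, b) => b.dataPlusIndexBytes - a.dataPlusIndexBytes);
console.log(summaries);
```

In the shell, the input for each database could come from db.getSiblingDB("app").stats(), where "app" stands in for your own database name.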
The collStats data is accessible using the collStats command (db.collection.stats() from the shell; db.printCollectionStats() prints these statistics for every collection in the current database). It provides statistics that resemble dbStats on the collection level: this includes a count of the objects in the collection, the size of the collection, the amount of disk space used by the collection, and information about the indexes.
In addition to status reporting, MongoDB provides a number of introspection tools that you can use to diagnose and analyze performance and operational conditions. Consider the following documentation:
A number of third-party monitoring tools support MongoDB, either directly or through their own plugins.
These are monitoring tools, usually open source, that you must install, configure, and maintain on your own servers.
| Tool | Plugin | Description |
|---|---|---|
| Ganglia | mongodb-ganglia | Python script to report operations per second, memory usage, btree statistics, master/slave status and current connections. |
| Ganglia | gmond_python_modules | Parses output from the serverStatus and replSetGetStatus commands. |
| Motop | None | Realtime monitoring tool for several MongoDB servers. Shows current operations ordered by duration, refreshed every second. |
| mtop | None | A top-like tool. |
| Munin | mongo-munin | Retrieves server statistics. |
| Munin | mongomon | Retrieves collection statistics (sizes, index sizes, and each (configured) collection count for one DB). |
| Munin | munin-plugins Ubuntu PPA | Some additional munin plugins not in the main distribution. |
| Nagios | nagios-plugin-mongodb | A simple Nagios check script, written in Python. |
| Zabbix | mikoomi-mongodb | Monitors availability, resource utilization, health, performance and other important metrics. |
Also consider dex, an index and query analyzing tool for MongoDB that compares MongoDB log files and indexes to make indexing recommendations.
These are monitoring tools provided as a hosted service, usually on a subscription billing basis.
| Name | Notes |
|---|---|
| Scout | Several plugins including: MongoDB Monitoring, MongoDB Slow Queries and MongoDB Replica Set Monitoring. |
| Server Density | Dashboard for MongoDB, MongoDB specific alerts, replication failover timeline and iPhone, iPad and Android mobile apps. |
During normal operation, mongod and mongos instances report information that reflects current operations to standard output or to a log file. The following runtime settings control these options.
quiet. Limits the amount of information written to the log or output.
verbose. Increases the amount of information written to the log or output.
You can also specify this as v (as in -v). For higher levels of verbosity, set multiple v, as in vvvv = True. You can also change the verbosity of a running mongod or mongos instance with the setParameter command.
logpath. Enables logging to a file, rather than standard output. Specify the full path to the log file to this setting.
logappend. Adds information to a log file instead of overwriting the file.
Additionally, the following database commands affect logging:
Degraded performance in MongoDB can be the result of an array of causes, and is typically a function of the relationship among the quantity of data stored in the database, the amount of system RAM, the number of connections to the database, and the amount of time the database spends in a lock state.
In some cases performance issues may be transient and related to traffic load, data access patterns, or the availability of hardware on the host system for virtualized environments. Some users also experience performance limitations as a result of inadequate or inappropriate indexing strategies, or as a consequence of poor schema design patterns. In other situations, performance issues may indicate that the database may be operating at capacity and that it is time to add additional capacity to the database.
MongoDB uses a locking system to ensure consistency. However, if certain operations are long-running, or a queue forms, performance slows as requests and operations wait for the lock. Because lock-related slowdowns can be intermittent, look to the data in the globalLock section of the serverStatus response to assess whether the lock has been a challenge to your performance. If globalLock.currentQueue.total is consistently high, then there is a chance that a large number of requests are waiting for a lock. This indicates a possible concurrency issue that might affect performance.
If globalLock.totalTime is high in the context of uptime, then the database has existed in a lock state for a significant amount of time. If globalLock.ratio is also high, MongoDB has likely been processing a large number of long running queries. Long queries are often the result of a number of factors: ineffective use of indexes, non-optimal schema design, poor query structure, system architecture issues, or insufficient RAM resulting in page faults and disk reads.
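Because lock-related slowdowns can be intermittent, a single reading of globalLock.currentQueue.total says little on its own. The sketch below flags a problem only when every sample in a series exceeds a threshold. The threshold, the helper name queueConsistentlyHigh, and the sample values are assumptions for illustration; the text does not define what counts as “high” for your workload.

```javascript
// Hypothetical helper: report a sustained queue only when every sample's
// globalLock.currentQueue.total exceeds the threshold; isolated spikes are ignored.
function queueConsistentlyHigh(samples, threshold) {
  if (samples.length === 0) return false;
  return samples.every((s) => s.globalLock.currentQueue.total > threshold);
}

// Build minimal serverStatus()-shaped samples from made-up queue totals.
const asSamples = (totals) => totals.map((t) => ({ globalLock: { currentQueue: { total: t } } }));

console.log(queueConsistentlyHigh(asSamples([5, 40, 3]), 20));   // false
console.log(queueConsistentlyHigh(asSamples([35, 40, 52]), 20)); // true
```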
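Because MongoDB uses memory-mapped files to store data, given a data set of sufficient size, the MongoDB process will allocate all available memory on the system for its use. Because of the way operating systems function, the amount of allocated RAM is not a useful reflection of MongoDB's state.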
While this is part of the design, and affords MongoDB superior performance, the memory-mapped files make it difficult to determine whether the amount of RAM is sufficient for the data set. Consider the memory usage statistics in the serverStatus output to better understand MongoDB's memory utilization. Check the resident memory use (i.e. mem.resident). If this exceeds the amount of system memory and there is a significant amount of data on disk that isn't in RAM, you may have exceeded the capacity of your system.
Also check the amount of mapped memory (i.e. mem.mapped). If this value is greater than the amount of system memory, some operations will require disk access (page faults) to read data from virtual memory, which degrades performance.
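The two checks above can be expressed against the mem section of the serverStatus output, whose resident and mapped values are reported in megabytes. Physical RAM is not part of that section, so this sketch takes it as a parameter you supply; memoryWarnings and the figures are hypothetical.

```javascript
// Hypothetical helper: apply the resident and mapped checks to the
// serverStatus() "mem" section (values in megabytes).
// systemMemoryMB is the host's physical RAM, supplied by the caller.
function memoryWarnings(serverStatus, systemMemoryMB) {
  const warnings = [];
  if (serverStatus.mem.resident > systemMemoryMB) {
    warnings.push("resident memory exceeds system memory");
  }
  if (serverStatus.mem.mapped > systemMemoryMB) {
    warnings.push("mapped memory exceeds system memory; expect page faults");
  }
  return warnings;
}

// Made-up figures: an 8 GB host with 20 GB of mapped data files.
console.log(memoryWarnings({ mem: { resident: 7900, mapped: 20480 } }, 8192));
```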
Page faults represent the number of times that MongoDB requires data not located in physical memory, and must read from virtual memory. To check for page faults, see the extra_info.page_faults value in the serverStatus command. This data is only available on Linux systems.
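Page faults on their own are minor and complete quickly; however, in aggregate, large numbers of page faults typically indicate that MongoDB is reading too much data from disk, which can point to a number of underlying causes and remedies. In many situations, MongoDB's read locks will “yield” after a page fault to allow other processes to read and to avoid blocking while waiting for the next page to be read into memory. This approach improves concurrency, and in high-volume systems also improves overall throughput.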
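Since isolated page faults are harmless, a rate is more informative than the cumulative counter. This sketch derives page faults per second from two serverStatus samples and returns null when extra_info.page_faults is absent, as it is on non-Linux systems. The helper name and sample numbers are hypothetical, and what rate counts as excessive depends on your hardware and workload.

```javascript
// Hypothetical helper: page faults per second between two serverStatus()
// samples. Returns null when extra_info.page_faults is missing (non-Linux).
function pageFaultsPerSecond(before, after, intervalSeconds) {
  const a = before.extra_info && before.extra_info.page_faults;
  const b = after.extra_info && after.extra_info.page_faults;
  if (typeof a !== "number" || typeof b !== "number") return null;
  return (b - a) / intervalSeconds;
}

// Made-up samples taken 60 seconds apart.
console.log(pageFaultsPerSecond(
  { extra_info: { page_faults: 1000 } },
  { extra_info: { page_faults: 1600 } },
  60
)); // 10
```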
If possible, increasing the amount of RAM accessible to MongoDB may help reduce the number of page faults. If this is not possible, you may want to consider deploying a sharded cluster and/or adding one or more shards to your deployment to distribute load among mongod instances.
In some cases, the number of connections between the application layer (i.e. clients) and the database can overwhelm the server's ability to handle requests, which can produce performance irregularities. Check the following fields in the serverStatus document:
Note
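Unless constrained by system-wide limits, MongoDB has a hard connection limit of 20,000 connections. You can modify system limits using the ulimit command, or by editing your system's /etc/sysctl file.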
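The connections section of the serverStatus output reports current (open connections) and available (connections still permitted). This sketch assumes their sum approximates the effective connection limit, whether that is MongoDB's own cap or a lower system limit, and reports how much of it is in use; connectionUsage and the numbers are hypothetical.

```javascript
// Hypothetical helper: connection usage from the serverStatus() "connections"
// section, assuming current + available approximates the effective limit.
function connectionUsage(serverStatus) {
  const { current, available } = serverStatus.connections;
  const limit = current + available;
  return { current, limit, fraction: current / limit };
}

// Made-up figures.
const usage = connectionUsage({ connections: { current: 750, available: 19250 } });
console.log(usage.limit, usage.fraction); // 20000 0.0375
```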
If requests are high because there are many concurrent application requests, the database may have trouble keeping up with demand. If this is the case, you will need to increase the capacity of your deployment. For read-heavy applications, increase the size of your replica set and distribute read operations to secondary members. For write-heavy applications, deploy sharding and add one or more shards to a sharded cluster to distribute load among mongod instances.
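Spikes in the number of connections can also be the result of application or driver errors. All of the MongoDB drivers supported by 10gen implement connection pooling, which allows clients to use and reuse connections more efficiently. An extremely high number of connections, particularly without a corresponding workload, is often indicative of a driver or other configuration error.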
MongoDB contains a database profiling system that can help identify inefficient queries and operations. Enable the profiler by setting the profile value using the following command in the mongo shell:
db.setProfilingLevel(1)
Note
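Because the database profiler can affect performance, only enable profiling for strategic intervals and as minimally as possible on production systems.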
You may enable profiling on a per-mongod basis. This setting will not propagate across a replica set or sharded cluster.
The following profiling levels are available:
| Level | Setting |
|---|---|
| 0 | Off. No profiling. |
| 1 | On. Only includes slow operations. |
| 2 | On. Includes all operations. |
See the output of the profiler in the system.profile collection of your database. You can specify the slowms setting to set a threshold above which the profiler considers operations “slow” and thus includes them in the level 1 profiling data. You may configure slowms at runtime, as an argument to the db.setProfilingLevel() operation.
Additionally, mongod records all “slow” queries to its log, as defined by slowms. The data in system.profile does not persist between mongod restarts.
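db.setProfilingLevel() is a shell helper around the profile database command. The sketch below builds that command document and rejects levels outside the table above, so a script can set both the level and slowms in one call. profileCommand is a hypothetical helper, and the exact document the shell helper sends may differ between versions; in the shell you would pass the result to db.runCommand().

```javascript
// Hypothetical helper: build a "profile" command document, roughly what
// db.setProfilingLevel(level, slowms) sends.
function profileCommand(level, slowms) {
  // Only the three documented levels are accepted.
  if (![0, 1, 2].includes(level)) {
    throw new RangeError("profiling level must be 0, 1, or 2");
  }
  const cmd = { profile: level };
  // slowms is optional and is only included when given.
  if (slowms !== undefined) cmd.slowms = slowms;
  return cmd;
}

console.log(profileCommand(1, 200)); // { profile: 1, slowms: 200 }
```

In the mongo shell, db.runCommand(profileCommand(1, 200)) would enable level 1 profiling with a 200 millisecond threshold on the current mongod only, since the setting does not propagate across a replica set or sharded cluster.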
You can view the profiler's output by issuing the show profile command in the mongo shell, or with the following operation:
db.system.profile.find( { millis : { $gt : 100 } } )
This returns all operations that lasted longer than 100 milliseconds. Ensure that the value specified here (i.e. 100) is above the slowms threshold.
See also
Optimization Strategies for MongoDB addresses strategies that may improve the performance of your database queries and operations.
The primary administrative concern that requires monitoring with replica sets, beyond the requirements for any MongoDB instance, is “replication lag.” This refers to the amount of time that it takes a write operation on the primary to replicate to a secondary. Some very small delay period may be acceptable; however, as replication lag grows, two significant problems emerge:
For causes of replication lag, see Replication Lag.
Replication issues are most often the result of network connectivity issues between members, or of a primary that does not have the resources to support application and replication traffic. To check the status of a replica set, use the replSetGetStatus command or the following helper in the shell:
rs.status()
See the replSetGetStatus document for a more in-depth overview of this output. In general, watch the value of optimeDate, and pay particular attention to the difference in time between the primary and the secondary members.
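The comparison of optimeDate values can be scripted. This sketch reads an rs.status() document, treats the member whose stateStr is PRIMARY as the reference, and reports each secondary's lag in seconds. The helper name, host names, and timestamps are hypothetical.

```javascript
// Hypothetical helper: seconds each SECONDARY trails the PRIMARY, based on the
// optimeDate fields of an rs.status() document.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) throw new Error("no PRIMARY member in status document");
  const lag = {};
  for (const m of status.members) {
    if (m.stateStr === "SECONDARY") {
      lag[m.name] = (primary.optimeDate.getTime() - m.optimeDate.getTime()) / 1000;
    }
  }
  return lag;
}

// Made-up rs.status() excerpt; host names are placeholders.
const rsSample = {
  members: [
    { name: "db0.example.net:27017", stateStr: "PRIMARY", optimeDate: new Date("2013-01-01T00:00:30Z") },
    { name: "db1.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2013-01-01T00:00:28Z") },
    { name: "db2.example.net:27017", stateStr: "SECONDARY", optimeDate: new Date("2013-01-01T00:00:00Z") },
  ],
};
console.log(replicationLagSeconds(rsSample));
```

In the shell, replicationLagSeconds(rs.status()) would report the live values.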
The size of the operation log is only configurable during the first run, using the --oplogSize argument to the mongod command or, preferably, the oplogSize setting in the MongoDB configuration file. If you do not specify this on the command line before running with the --replSet option, mongod will create a default-sized oplog.
By default, the oplog is 5% of total available disk space on 64-bit systems.
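As a rough worked example of the 5% default, this sketch converts available disk space in megabytes (the unit oplogSize uses) into the corresponding oplog size. It models only the rule stated above; actual defaults differ by platform and version, so treat defaultOplogSizeMB as a hypothetical estimate rather than mongod's exact calculation.

```javascript
// Hypothetical estimate: 5% of available disk space, rounded down to whole
// megabytes (the unit of the oplogSize setting).
function defaultOplogSizeMB(availableDiskMB) {
  return Math.floor((availableDiskMB * 5) / 100);
}

console.log(defaultOplogSizeMB(200 * 1024)); // 10240 (200 GB available, about 10 GB of oplog)
```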
See also
In most cases the components of sharded clusters benefit from the same monitoring and analysis as all other MongoDB instances. Additionally, clusters require monitoring to ensure that data is effectively distributed among nodes and that sharding operations are functioning appropriately.
See also
See the Sharding page for more information.
The config database provides a map of documents to shards. The cluster updates this map as chunks move between shards. When a configuration server becomes inaccessible, some sharding operations like moving chunks and starting mongos instances become unavailable. However, clusters remain accessible from already-running mongos instances.
Because inaccessible configuration servers can have a serious impact on the availability of a sharded cluster, you should monitor the configuration servers to ensure that the cluster remains well balanced and that mongos instances can restart.
The most effective sharded cluster deployments require that chunks are evenly balanced among the shards. MongoDB has a background balancer process that distributes data such that chunks are always optimally distributed among the shards. Issue the db.printShardingStatus() or sh.status() command to the mongos by way of the mongo shell. This returns an overview of the entire cluster including the database name, and a list of the chunks.
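sh.status() gives a readable overview; for scripted checks, you can count chunks per shard from the config database's chunks collection, whose documents record the namespace (ns) and the owning shard (shard). The sketch below computes the spread between the busiest and the least busy shard for one collection. chunkSpread and the sample namespace app.users are hypothetical, and shards that hold no chunks for the namespace do not appear in the counts.

```javascript
// Hypothetical helper: count chunks per shard for one namespace from
// config.chunks documents, and report the gap between the busiest and the
// least busy shard. Shards with no chunks for `ns` do not appear.
function chunkSpread(chunkDocs, ns) {
  const counts = {};
  for (const chunk of chunkDocs) {
    if (chunk.ns === ns) counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
  }
  const values = Object.values(counts);
  const spread = values.length > 0 ? Math.max(...values) - Math.min(...values) : 0;
  return { counts, spread };
}

// Made-up chunk documents (only the ns and shard fields are used).
const sampleChunks = [
  ...Array(5).fill({ ns: "app.users", shard: "shard0000" }),
  ...Array(2).fill({ ns: "app.users", shard: "shard0001" }),
  { ns: "app.events", shard: "shard0001" },
];
console.log(chunkSpread(sampleChunks, "app.users"));
```

From a mongo shell connected to a mongos, the input could be db.getSiblingDB("config").chunks.find().toArray().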
In nearly every case, all locks used by the balancer are automatically released when they become stale. However, because any long-lasting lock can block future balancing, it's important to ensure that all locks are legitimate. To check the lock status of the database, connect to a mongos instance using the mongo shell. Issue the following command sequence to switch to the config database and display all outstanding locks on the shard database:
use config
db.locks.find()
For active deployments, the above query might return a useful result set. The balancing process, which originates on a randomly selected mongos, takes a special “balancer” lock that prevents other balancing activity from transpiring. Use the following command, also to the config database, to check the status of the “balancer” lock.
db.locks.find( { _id : "balancer" } )
If this lock exists, make sure that the balancer process is actively using it.
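A lock that has been held for a long time is the one worth investigating. This sketch inspects a config.locks document and reports whether it appears to have been held longer than a threshold you choose. It assumes that a state of 0 means unlocked and that the when field records when the lock was taken; verify both against your MongoDB version. lockLooksStale and the 15-minute threshold are hypothetical.

```javascript
// Hypothetical helper: does a config.locks document look like a lock that has
// been held longer than maxAgeMs? Assumes state 0 means "unlocked" and that
// `when` is the time the lock was last taken.
function lockLooksStale(lockDoc, now, maxAgeMs) {
  if (!lockDoc || lockDoc.state === 0) return false;
  return now.getTime() - lockDoc.when.getTime() > maxAgeMs;
}

// Made-up balancer lock taken at midnight, checked one hour later.
const balancerLock = { _id: "balancer", state: 2, when: new Date("2013-01-01T00:00:00Z") };
const oneHourLater = new Date("2013-01-01T01:00:00Z");
console.log(lockLooksStale(balancerLock, oneHourLater, 15 * 60 * 1000)); // true
```

In the shell, the document could come from db.getSiblingDB("config").locks.findOne({ _id: "balancer" }). A result of true is a prompt to confirm that the process holding the lock is still active, not proof that the lock is illegitimate.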