MongoDB Sharding

Sharding is a method for distributing data across multiple machines. MongoDB uses sharding to support deployments with very large data sets and high throughput operations.

Database systems with large data sets or high throughput applications can challenge the capacity of a single server. For example, high query rates can exhaust the CPU capacity of the server. Working set sizes larger than the system’s RAM stress the I/O capacity of disk drives.

There are two methods for addressing system growth: vertical and horizontal scaling.

Vertical Scaling involves increasing the capacity of a single server, such as using a more powerful CPU, adding more RAM, or increasing the amount of storage space. Limitations in available technology may restrict a single machine from being sufficiently powerful for a given workload. Additionally, cloud-based providers have hard ceilings based on available hardware configurations. As a result, there is a practical maximum for vertical scaling.

Horizontal Scaling involves dividing the system dataset and load over multiple servers, adding servers to increase capacity as required. While the overall speed or capacity of a single machine may not be high, each machine handles a subset of the overall workload, potentially providing better efficiency than a single high-speed, high-capacity server. Expanding the capacity of the deployment only requires adding servers as needed, which can be a lower overall cost than high-end hardware for a single machine. The trade-off is increased complexity in infrastructure and maintenance for the deployment.

MongoDB supports horizontal scaling through sharding.

Sharded Cluster

A MongoDB sharded cluster consists of the following components:

  • shard: Each shard contains a subset of the sharded data. Each shard can be deployed as a replica set.

  • mongos: The mongos acts as a query router, providing an interface between client applications and the sharded cluster. Starting in MongoDB 4.4, mongos can support hedged reads to minimize latencies.

  • config servers: Config servers store metadata and configuration settings for the cluster.

The following graphic describes the interaction of components within a sharded cluster:

Diagram of a sample sharded cluster for production purposes. Contains exactly 3 config servers, 2 or more mongos query routers, and at least 2 shards. The shards are replica sets.

MongoDB shards data at the collection level, distributing the collection data across the shards in the cluster.

Shard Keys

MongoDB uses the shard key to distribute the collection’s documents across shards. The shard key consists of a field or multiple fields in the documents.

  • Starting in version 4.4, documents in sharded collections can be missing the shard key fields. Missing shard key fields are treated as having null values when distributing the documents across shards but not when routing queries. For more information, see Missing Shard Key.

  • In version 4.2 and earlier, shard key fields must exist in every document for a sharded collection.

You select the shard key when sharding a collection.

  • Starting in MongoDB 4.4, you can refine a collection’s shard key by adding a suffix field or fields to the existing key. See refineCollectionShardKey for details.

  • In MongoDB 4.2 and earlier, the choice of shard key cannot be changed after sharding.
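As a rough illustration of the refinement rule above, the following sketch checks that a proposed key only adds suffix fields to the existing key and then builds a refineCollectionShardKey command document. The namespace test.orders and the field names customer_id and order_id are hypothetical, and the helper functions are not part of any MongoDB driver; the sketch only constructs the document and does not send it to a cluster.

```python
def is_valid_refinement(current_key: dict, new_key: dict) -> bool:
    """Return True if new_key keeps current_key as a strict prefix,
    i.e. it only appends one or more suffix fields."""
    current = list(current_key.items())
    new = list(new_key.items())
    return len(new) > len(current) and new[: len(current)] == current


def refine_command(namespace: str, current_key: dict, new_key: dict) -> dict:
    """Build the command document; the command name must be the first field."""
    if not is_valid_refinement(current_key, new_key):
        raise ValueError("new key must extend the existing key with suffix fields")
    return {"refineCollectionShardKey": namespace, "key": new_key}


# Hypothetical collection and fields, for illustration only.
current_key = {"customer_id": 1}
cmd = refine_command("test.orders", current_key, {"customer_id": 1, "order_id": 1})
print(cmd)
```

With a driver, a document like this would typically be run against the admin database; that step is left out here because it needs a live sharded cluster.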

A document’s shard key value determines its distribution across the shards.

  • Starting in MongoDB 4.2, you can update a document’s shard key value unless your shard key field is the immutable _id field. See Change a Document’s Shard Key Value for more information.

  • In MongoDB 4.0 and earlier, a document’s shard key field value is immutable.

Shard Key Index

To shard a populated collection, the collection must have an index that starts with the shard key. When sharding an empty collection, MongoDB creates the supporting index if the collection does not already have an appropriate index for the specified shard key. See Shard Key Indexes.
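The "index that starts with the shard key" requirement can be pictured as a prefix check on key patterns. The sketch below is a simplification under stated assumptions: it compares field names and their values (1 or "hashed") exactly, the function name is hypothetical, and it does not cover every rule the server applies to supporting indexes.

```python
def index_supports_shard_key(index_key: dict, shard_key: dict) -> bool:
    """Return True if the index key pattern begins with the shard key pattern."""
    index_fields = list(index_key.items())
    shard_fields = list(shard_key.items())
    return index_fields[: len(shard_fields)] == shard_fields


# Hypothetical key patterns, for illustration only.
print(index_supports_shard_key({"region": 1, "user_id": 1}, {"region": 1}))   # prefix matches
print(index_supports_shard_key({"user_id": 1, "region": 1}, {"region": 1}))   # wrong leading field
```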

Shard Key Strategy

The choice of shard key affects the performance, efficiency, and scalability of a sharded cluster. A cluster with the best possible hardware and infrastructure can be bottlenecked by the choice of shard key. The choice of shard key and its backing index can also affect the sharding strategy that your cluster can use.

See the Choosing a Shard Key documentation for more information.

Chunks

MongoDB partitions sharded data into chunks. Each chunk has an inclusive lower and exclusive upper range based on the shard key.
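To make the inclusive-lower, exclusive-upper convention concrete, here is a minimal sketch that maps a shard key value to a chunk. The integer shard key and the boundary values are assumptions chosen for illustration; real chunk metadata lives on the config servers.

```python
from bisect import bisect_right

# Hypothetical chunk boundaries on an integer shard key.
# Chunk i covers [bounds[i], bounds[i + 1]): lower inclusive, upper exclusive.
bounds = [float("-inf"), 0, 100, 200, float("inf")]


def chunk_index(key_value) -> int:
    """Return the index of the chunk whose range contains key_value."""
    return bisect_right(bounds, key_value) - 1


def chunk_range(key_value) -> tuple:
    """Return the (lower, upper) bounds of the chunk that holds key_value."""
    i = chunk_index(key_value)
    return bounds[i], bounds[i + 1]


print(chunk_range(99))   # falls in the chunk that ends just before 100
print(chunk_range(100))  # the boundary value belongs to the next chunk
```

A value equal to a boundary belongs to the chunk that starts at that boundary, never to the one that ends there.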

Balancer and Even Chunk Distribution

In an attempt to achieve an even distribution of chunks across all shards in the cluster, a balancer runs in the background to migrate chunks across the shards.

See Data Partitioning with Chunks for more information.
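The idea of evening out chunk counts can be sketched with a toy loop that moves one chunk at a time from the most loaded shard to the least loaded one. This is not MongoDB's actual balancing algorithm: the threshold parameter, the shard names, and the one-chunk-per-step policy are all assumptions made for illustration.

```python
def balance(chunk_counts: dict, threshold: int = 1):
    """Simulate migrations until no two shards differ by more than threshold chunks.

    Returns the final counts and the list of (donor, recipient) migrations.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    counts = dict(chunk_counts)
    migrations = []
    while True:
        donor = max(counts, key=counts.get)
        recipient = min(counts, key=counts.get)
        if counts[donor] - counts[recipient] <= threshold:
            return counts, migrations
        counts[donor] -= 1
        counts[recipient] += 1
        migrations.append((donor, recipient))


# Hypothetical starting distribution.
final_counts, moves = balance({"shardA": 8, "shardB": 2, "shardC": 2})
print(final_counts, len(moves))
```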

Advantages of Sharding

Reads / Writes

MongoDB distributes the read and write workload across the shards in the sharded cluster, allowing each shard to process a subset of cluster operations. Both read and write workloads can be scaled horizontally across the cluster by adding more shards.

For queries that include the shard key or the prefix of a compound shard key, mongos can target the query at a specific shard or set of shards. These targeted operations are generally more efficient than broadcasting to every shard in the cluster.
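The targeting rule can be approximated as follows: a query is targetable when it constrains the leading field or fields of the shard key. The sketch below is a simplification that only looks at which fields appear in the query; the field names country and user_id are hypothetical, and a real mongos also considers the query operators and the chunk metadata.

```python
def usable_prefix(query: dict, shard_key: list) -> list:
    """Return the longest leading run of shard key fields present in the query."""
    prefix = []
    for field in shard_key:
        if field not in query:
            break
        prefix.append(field)
    return prefix


def routing(query: dict, shard_key: list) -> str:
    """Classify a query as 'targeted' or 'broadcast' under this simplified rule."""
    return "targeted" if usable_prefix(query, shard_key) else "broadcast"


shard_key = ["country", "user_id"]  # hypothetical compound shard key
print(routing({"country": "NZ"}, shard_key))               # uses the key prefix
print(routing({"country": "NZ", "user_id": 7}, shard_key))  # uses the full key
print(routing({"user_id": 7}, shard_key))                   # skips the leading field
```

The last query names a shard key field but not the leading one, so it cannot be narrowed to particular shards.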

Starting in MongoDB 4.4, mongos can support hedged reads to minimize latencies.

Storage Capacity

Sharding distributes data across the shards in the cluster, allowing each shard to contain a subset of the total cluster data. As the data set grows, additional shards increase the storage capacity of the cluster.

High Availability

Deploying config servers and shards as replica sets provides increased availability.

Even if one or more shard replica sets become completely unavailable, the sharded cluster can continue to perform partial reads and writes. That is, while data on the unavailable shard(s) cannot be accessed, reads or writes directed at the available shards can still succeed.

Considerations Before Sharding

Sharded cluster infrastructure requirements and complexity require careful planning, execution, and maintenance.

Once a collection has been sharded, MongoDB provides no method to unshard a sharded collection.

Careful consideration in choosing the shard key is necessary for ensuring cluster performance and efficiency. See Choosing a Shard Key.

Sharding has certain operational requirements and restrictions. See Operational Restrictions in Sharded Clusters for more information.

If queries do not include the shard key or the prefix of a compound shard key, mongos performs a broadcast operation, querying all shards in the sharded cluster. These scatter/gather queries can be long-running operations.

NOTE

If you have an active support contract with MongoDB, consider contacting your account representative for assistance with sharded cluster planning and deployment.

Sharded and Non-Sharded Collections

A database can have a mixture of sharded and unsharded collections. Sharded collections are partitioned and distributed across the shards in the cluster. Unsharded collections are stored on a primary shard. Each database has its own primary shard.

Diagram of a primary shard. A primary shard contains non-sharded collections as well as chunks of documents from sharded collections. Shard A is the primary shard.

Connecting to a Sharded Cluster

You must connect to a mongos router to interact with any collection in the sharded cluster. This includes sharded and unsharded collections. Clients should never connect to a single shard in order to perform read or write operations.

Diagram of applications/drivers issuing queries to mongos for unsharded collection as well as sharded collection. Config servers not shown.

You can connect to a mongos the same way you connect to a mongod, such as via the mongo shell or a MongoDB driver.

Sharding Strategy

MongoDB supports two sharding strategies for distributing data across sharded clusters.

Hashed Sharding

Hashed Sharding involves computing a hash of the shard key field’s value. Each chunk is then assigned a range based on the hashed shard key values.

TIP

MongoDB automatically computes the hashes when resolving queries using hashed indexes. Applications do not need to compute hashes.

Diagram of the hash-based segmentation.

While a range of shard keys may be “close”, their hashed values are unlikely to be on the same chunk. Data distribution based on hashed values facilitates more even data distribution, especially in data sets where the shard key changes monotonically.
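The effect on monotonically increasing keys can be seen in a small simulation. The sketch below uses the first 8 bytes of an MD5 digest as a stand-in hash and splits the 64-bit hash space into four equal chunks; the hash function, the chunk count, and the key range are assumptions for illustration and are not meant to reproduce MongoDB's internal hashing.

```python
import hashlib
from collections import Counter

NUM_CHUNKS = 4          # hypothetical number of equally sized hash ranges
HASH_SPACE = 2 ** 64    # size of the unsigned 64-bit hash space used here


def hashed_value(key) -> int:
    """Stand-in hash: first 8 bytes of MD5 as an unsigned 64-bit integer."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def chunk_for(key) -> int:
    """Map a key to one of NUM_CHUNKS equal slices of the hash space."""
    return hashed_value(key) * NUM_CHUNKS // HASH_SPACE


# 1000 consecutive, monotonically increasing key values.
keys = range(1000, 2000)
distribution = Counter(chunk_for(k) for k in keys)
print(sorted(distribution.items()))
```

Under ranged sharding with the same four chunks, consecutive values like these would tend to fall into a single chunk, whereas the hashed values scatter across all of them.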

However, hashed distribution means that range-based queries on the shard key are less likely to target a single shard, resulting in more cluster-wide broadcast operations.

See Hashed Sharding for more information.

Ranged Sharding

Ranged sharding involves dividing data into ranges based on the shard key values. Each chunk is then assigned a range based on the shard key values.

Diagram of the shard key value space segmented into smaller ranges or chunks.

A range of shard keys whose values are “close” are more likely to reside on the same chunk. This allows for targeted operations as a mongos can route the operations to only the shards that contain the required data.
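A sketch of how a range query might be narrowed to the shards that hold the matching chunks is shown below. The chunk boundaries, the chunk-to-shard assignment, and the function name are hypothetical; the query is assumed to be of the form lo <= key < hi on an integer shard key.

```python
from bisect import bisect_left, bisect_right

# Hypothetical ranged chunks: chunk i covers [chunk_bounds[i], chunk_bounds[i + 1]).
chunk_bounds = [float("-inf"), 0, 100, 200, float("inf")]
chunk_owner = ["shardA", "shardB", "shardA", "shardC"]


def shards_for_range(lo, hi) -> list:
    """Return the shards owning chunks that overlap the half-open range [lo, hi)."""
    if lo >= hi:
        return []
    first = bisect_right(chunk_bounds, lo) - 1  # chunk containing lo
    last = bisect_left(chunk_bounds, hi) - 1    # last chunk starting before hi
    return sorted({chunk_owner[i] for i in range(first, last + 1)})


print(shards_for_range(110, 150))  # stays within one chunk
print(shards_for_range(50, 101))   # crosses the boundary at 100
```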

The efficiency of ranged sharding depends on the shard key chosen. Poorly considered shard keys can result in uneven distribution of data, which can negate some benefits of sharding or can cause performance bottlenecks. See shard key selection for range-based sharding.

See Ranged Sharding for more information.

Zones in Sharded Clusters

Zones can help improve the locality of data for sharded clusters that span multiple data centers.

In sharded clusters, you can create zones of sharded data based on the shard key. You can associate each zone with one or more shards in the cluster. A shard can associate with any number of zones. In a balanced cluster, MongoDB migrates chunks covered by a zone only to those shards associated with the zone.

Each zone covers one or more ranges of shard key values. Each range a zone covers is always inclusive of its lower boundary and exclusive of its upper boundary.
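The relationship between zone ranges and shards can be sketched as a lookup. In the example below, the zone names, ranges, and shard names are hypothetical, and the fallback for keys outside every zone (any shard is eligible) is stated as an assumption of this sketch.

```python
# Hypothetical zone ranges on an integer shard key: (lower inclusive, upper exclusive, zone).
zone_ranges = [(0, 100, "east"), (100, 200, "west")]

# Hypothetical association of zones with shards.
zone_shards = {"east": ["shardA"], "west": ["shardB", "shardC"]}


def zone_for(key):
    """Return the zone whose range contains key, or None if no zone covers it."""
    for lower, upper, zone in zone_ranges:
        if lower <= key < upper:
            return zone
    return None


def eligible_shards(key, all_shards) -> list:
    """Return the shards allowed to hold the chunk containing key."""
    zone = zone_for(key)
    # Assumption: data outside every zone may live on any shard.
    return list(zone_shards[zone]) if zone is not None else list(all_shards)


all_shards = ["shardA", "shardB", "shardC"]
print(zone_for(99), zone_for(100))
print(eligible_shards(150, all_shards))
```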

Diagram of data distribution based on zones in a sharded cluster

You must use fields contained in the shard key when defining a new range for a zone to cover. If using a compound shard key, the range must include the prefix of the shard key. See shard keys in zones for more information.

The possible use of zones in the future should be taken into consideration when choosing a shard key.

TIP

Starting in MongoDB 4.0.3, setting up zones and zone ranges before you shard an empty or a non-existing collection allows for a faster setup of zoned sharding.

See zones for more information.

Collations in Sharding

Use the shardCollection command with the collation: { locale: "simple" } option to shard a collection which has a default collation. Successful sharding requires that:

  • The collection must have an index whose prefix is the shard key

  • The index must have the collation { locale: "simple" }

When creating new collections with a collation, ensure these conditions are met prior to sharding the collection.
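The two conditions above, together with the command option, can be sketched as follows. The helper names, the namespace test.people, and the field name last_name are hypothetical; the sketch only builds the command document and checks an index description, and it does not contact a cluster.

```python
def shard_collection_command(namespace: str, shard_key: dict) -> dict:
    """Build a shardCollection command document that requests the simple collation."""
    return {
        "shardCollection": namespace,  # the command name must be the first field
        "key": shard_key,
        "collation": {"locale": "simple"},
    }


def index_qualifies(index_key: dict, index_collation: dict, shard_key: dict) -> bool:
    """Check both conditions: shard key is an index prefix, and the index uses simple collation."""
    has_prefix = list(index_key)[: len(shard_key)] == list(shard_key)
    return has_prefix and index_collation.get("locale") == "simple"


# Hypothetical collection whose default collation is not "simple".
command = shard_collection_command("test.people", {"last_name": 1})
print(command)
print(index_qualifies({"last_name": 1}, {"locale": "simple"}, {"last_name": 1}))
print(index_qualifies({"last_name": 1}, {"locale": "fr"}, {"last_name": 1}))
```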

NOTE

Queries on the sharded collection continue to use the default collation configured for the collection. To use the shard key index’s simple collation, specify { locale: "simple" } in the query’s collation document.

See shardCollection for more information about sharding and collation.

Change Streams

Starting in MongoDB 3.6, change streams are available for replica sets and sharded clusters. Change streams allow applications to access real-time data changes without the complexity and risk of tailing the oplog. Applications can use change streams to subscribe to all data changes on a collection or collections.

Transactions

Starting in MongoDB 4.2, with the introduction of distributed transactions, multi-document transactions are available on sharded clusters.

Until a transaction commits, the data changes made in the transaction are not visible outside the transaction.

However, when a transaction writes to multiple shards, not all outside read operations need to wait for the result of the committed transaction to be visible across the shards. For example, if a transaction is committed and write 1 is visible on shard A but write 2 is not yet visible on shard B, an outside read at read concern ["local"](https://docs.mongodb.com/manual/reference/read-concern-local/#readconcern."local") can read the results of write 1 without seeing write 2.

For details, see:

  • Replica Set Member States

  • Sharded Cluster Components

Original link: https://docs.mongodb.com/manual/sharding/
