Which scenario justifies proposing sharding?


Multiple Choice

Which scenario justifies proposing sharding?

Explanation:
Sharding is the horizontal partitioning of data across multiple databases or nodes to spread load and storage, so a system can scale its writes and sustain performance as data grows. It’s justified when a single node becomes a bottleneck for writes, when the dataset grows too large for one node to store or manage efficiently, or when you genuinely need data locality across regions.
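To make horizontal partitioning concrete, here is a minimal sketch of hash-based routing, which is one common way to assign records to shards (range-based and directory-based schemes are alternatives). The shard count and the names `shard_for` and `partition` are assumptions made up for this illustration, not part of any particular database.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical shard count, chosen only for illustration


def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer, then reduce modulo the shard count.
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(keys, num_shards: int = NUM_SHARDS):
    """Group keys by the shard that would own them."""
    shards = {i: [] for i in range(num_shards)}
    for key in keys:
        shards[shard_for(key, num_shards)].append(key)
    return shards


if __name__ == "__main__":
    for shard, owned in partition([f"user:{n}" for n in range(10)]).items():
        print(shard, owned)
```

Each write for a given key goes to exactly one shard, which is how write load and storage end up spread across nodes.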

The scenario that justifies sharding cites concrete signals: write throughput above 10k TPS, storage around 50 TiB, or a genuine need for geographic distribution. These are clear triggers. They indicate that a single-node setup would struggle to handle the workload or meet latency requirements, so distributing data across multiple shards makes sense to maintain performance and availability.
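The triggers above can be sketched as a simple check. The thresholds come from the scenario in this question; real limits depend on hardware and the database engine, so treat them as illustrative. The `Workload` type and `sharding_triggers` function are hypothetical names introduced for this example.

```python
from dataclasses import dataclass

# Thresholds taken from the scenario in the text; illustrative, not universal.
WRITE_TPS_THRESHOLD = 10_000
STORAGE_TIB_THRESHOLD = 50.0


@dataclass
class Workload:
    write_tps: float
    storage_tib: float
    needs_geo_distribution: bool = False


def sharding_triggers(w: Workload) -> list[str]:
    """Return the signals that would justify proposing sharding."""
    reasons = []
    if w.write_tps > WRITE_TPS_THRESHOLD:
        reasons.append("write throughput")
    if w.storage_tib >= STORAGE_TIB_THRESHOLD:
        reasons.append("storage size")
    if w.needs_geo_distribution:
        reasons.append("geographic distribution")
    return reasons


if __name__ == "__main__":
    print(sharding_triggers(Workload(write_tps=12_000, storage_tib=5)))
    print(sharding_triggers(Workload(write_tps=500, storage_tib=2)))
```

An empty result means none of the signals is present, in which case a single node (possibly with read replicas) is likely the simpler choice.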

Read replicas help with reads but don’t solve write bottlenecks or data distribution issues, so on their own they don’t justify sharding. Sharding also isn’t guaranteed to reduce latency: cross-shard coordination adds overhead, and if the workload fits on a single node, sharding can add unnecessary latency.
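A rough way to see the overhead is a simplified latency model, sketched below. It assumes a cross-shard query fans out in parallel, waits for the slowest shard, and then pays a fixed coordination cost for routing and merging results. The function names and all millisecond figures are hypothetical, and real systems have more moving parts.

```python
def single_node_latency_ms(query_ms: float) -> float:
    """Latency when the whole query runs on one node."""
    return query_ms


def scatter_gather_latency_ms(per_shard_ms: list[float], coordination_ms: float) -> float:
    """Latency of a parallel fan-out: slowest shard plus coordination cost."""
    return max(per_shard_ms) + coordination_ms


if __name__ == "__main__":
    # Hypothetical numbers: the query takes 5 ms on a single node, while the
    # sharded version waits on a 6 ms straggler and adds 1.5 ms of merging.
    print(single_node_latency_ms(5.0))
    print(scatter_gather_latency_ms([4.0, 6.0, 5.0], coordination_ms=1.5))
```

Under these assumed numbers the sharded query is slower, which matches the point that sharding a workload that fits on one node can add latency rather than remove it.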
