System Design

Sharding

0
(0)


Source: Scalability, availability, stability, patterns

Sharding distributes data across different databases such that each database can only manage a subset of the data. Taking a users database as an example, as the number of users increases, more shards are added to the cluster.

Similar to the advantages of federation, sharding results in less read and write traffic, less replication, and more cache hits. Index size is also reduced, which generally improves performance with faster queries. If one shard goes down, the other shards are still operational, although you’ll want to add some form of replication to avoid data loss. Like federation, there is no single central master serializing writes, allowing you to write in parallel with increased throughput.

Common ways to shard a table of users is either through the user’s last name initial or the user’s geographic location.

Disadvantage(s): sharding
  • You’ll need to update your application logic to work with shards, which could result in complex SQL queries.
  • Data distribution can become lobsided in a shard. For example, a set of power users on a shard could result in increased load to that shard compared to others.
    • Rebalancing adds additional complexity. A sharding function based on consistent hashing can reduce the amount of transferred data.
  • Joining data from multiple shards is more complex.
  • Sharding adds more hardware and additional complexity.
Source(s) and further reading: sharding

How useful was this post?

Click on a star to rate it!

Average rating 0 / 5. Vote count: 0

No votes so far! Be the first to rate this post.

As you found this post useful...

Follow us on social media!

We are sorry that this post was not useful for you!

Let us improve this post!

Tell us how we can improve this post?

0 0 votes
Article Rating
Subscribe
Notify of
guest
0 Comments
Inline Feedbacks
View all comments