Sharding is an architectural approach that distributes a single logical database across a cluster of machines.
Sharding is a horizontal partitioning scheme. The rows of a database table are split into groups that are stored separately, rather than the table being split by columns (as in normalization and vertical partitioning). Each partition is called a shard, and each shard can live on a separate database server or in a separate physical location.
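To make the row-versus-column distinction concrete, here is a minimal sketch in Python. The `rows` data, the column names, and both helper functions are hypothetical and exist only for illustration. The modulo placement rule is just the simplest choice for the demo, not a recommendation.

```python
# Hypothetical sample table: each dict is one row.
rows = [
    {"user_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"user_id": 2, "name": "Bob", "email": "bob@example.com"},
    {"user_id": 3, "name": "Cy", "email": "cy@example.com"},
    {"user_id": 4, "name": "Di", "email": "di@example.com"},
]


def horizontal_partition(rows, num_shards):
    """Split by ROWS: every shard keeps all columns but only some rows."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[row["user_id"] % num_shards].append(row)
    return shards


def vertical_partition(rows, key, column_groups):
    """Split by COLUMNS: every part keeps all rows but only some columns
    (plus the key column, so the parts can be joined back together)."""
    return [
        [{col: row[col] for col in (key, *cols)} for row in rows]
        for cols in column_groups
    ]


if __name__ == "__main__":
    for i, shard in enumerate(horizontal_partition(rows, 2)):
        print("shard", i, [r["user_id"] for r in shard])
    for part in vertical_partition(rows, "user_id", [("name",), ("email",)]):
        print("columns", sorted(part[0]))
```

With horizontal partitioning each shard is a complete, smaller copy of the table's schema. With vertical partitioning every part still has one entry per row.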
Sharding helps a database system scale horizontally. Because each table is divided and distributed across multiple servers, every database holds fewer rows per table. This reduces index size, which generally improves search performance.
A common approach for creating shards is to hash a unique ID in the application (e.g. user ID), often with consistent hashing, and use the result to pick the shard.
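As a rough illustration of consistent hashing, the following Python sketch places nodes on a hash ring and routes each key to the first node found clockwise from the key's hash. The class name `HashRing`, the node names such as `db0`, the choice of SHA-256, and the figure of 100 virtual nodes per server are all assumptions made for this example. A production system would also need replication and data migration, which are not shown.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a stable 64-bit ring position (not used for security)."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Illustrative consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=100):
        self._vnodes = vnodes
        self._points = []  # sorted ring positions
        self._owner = {}   # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        # Each node gets several positions so load spreads more evenly.
        for i in range(self._vnodes):
            point = _hash(f"{node}#{i}")
            if point in self._owner:  # extremely unlikely collision: skip it
                continue
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove_node(self, node):
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def get_node(self, key):
        """Return the node owning `key`: first ring position after its hash."""
        if not self._points:
            raise ValueError("ring has no nodes")
        idx = bisect.bisect_right(self._points, _hash(str(key)))
        return self._owner[self._points[idx % len(self._points)]]  # wrap around


if __name__ == "__main__":
    ring = HashRing(["db0", "db1", "db2"])
    print(ring.get_node("user:42"))
```

The property that matters here is that when a node is added, the only keys that change owner are the ones that move to the new node. Keys already assigned to the other nodes stay where they are.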
The downsides of sharding are:
- The application must be aware of where the data is located.
- Adding or removing nodes requires rebalancing data across the system.
- If you need many cross-node join queries, performance will suffer badly, because each join has to gather data from several servers. Knowing how the data will be queried is therefore important.
- A poorly chosen sharding key or scheme can make performance worse rather than better. Shard based on how the application actually accesses its data.
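
The rebalancing cost mentioned in the list above can be estimated with a small experiment. The sketch below assumes the naive placement rule `user_id % num_shards` (a hypothetical choice for illustration, with made-up function names) and counts how many IDs land on a different shard when the cluster grows from 4 to 5 shards.

```python
def shard_for(user_id: int, num_shards: int) -> int:
    """Naive placement: shard index is the ID modulo the shard count."""
    return user_id % num_shards


def moved_fraction(user_ids, old_shards, new_shards):
    """Fraction of IDs whose shard changes when the shard count changes."""
    ids = list(user_ids)
    moved = sum(
        1 for uid in ids
        if shard_for(uid, old_shards) != shard_for(uid, new_shards)
    )
    return moved / len(ids)


if __name__ == "__main__":
    # Growing from 4 to 5 shards relocates most of the data.
    print(moved_fraction(range(10_000), 4, 5))  # 0.8
```

Under this rule about 80% of the rows have to move for a single added node. Schemes such as consistent hashing aim to bring that closer to the share of data the new node should own.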