A Primer on Understanding Database Sharding

Nov 5, 2022

Creating a website is the first step to establishing a presence on the Internet. To thrive long-term, your site needs to scale as it grows, and one of the first steps is setting up a database that can scale along with it. Otherwise, you may run into slow queries or databases that stop working altogether.

This article explains how you can use database sharding to improve the scalability and availability of your data. It also covers the drawbacks of sharding and the sharding architectures you can choose from.

What Is Database Sharding?

Sharding is an optimization technique that distributes tables across multiple databases. It is similar to partitioning in that both break data into smaller subsets. The difference is that sharding distributes those subsets across multiple servers, while partitioning keeps them in one database. The shards use the same database engine and the same type of hardware so that performance is consistent across all of them.

Sharding aims to create a shared-nothing architecture that eliminates processing bottlenecks and single points of failure.

Figure: An illustration of database sharding. (Image Source: Analytics Vidhya)

Sharding can be horizontal (splitting a table by rows) or vertical (splitting it by columns). In this regard, it is like partitioning, which divides large tables into smaller ones.

Horizontal sharding works well when most queries return a subset of rows, such as a customer database that returns whole customer records (name, address, email, and so on) at once.

Vertical sharding works well when queries tend to return only a subset of columns. For example, if the customer database usually returned customers' names and emails separately, you could split the name and email columns into different shards.
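
The difference between the two splits can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer rows, the field names, and the helper functions `shard_horizontally` and `shard_vertically` are hypothetical, and each shard is modelled as an in-memory list rather than a real database server.

```python
# Hypothetical customer rows; the field names are made up for illustration.
customers = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Grace", "email": "grace@example.com"},
    {"id": 3, "name": "Linus", "email": "linus@example.com"},
    {"id": 4, "name": "Margaret", "email": "margaret@example.com"},
]


def shard_horizontally(rows, num_shards):
    """Split by row: every shard keeps all columns for a subset of rows."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[row["id"] % num_shards].append(row)
    return shards


def shard_vertically(rows, column_groups):
    """Split by column: every shard keeps some columns for all rows.

    The primary key ("id") is copied into each shard so rows can be matched up.
    """
    return [
        [{"id": row["id"], **{col: row[col] for col in group}} for row in rows]
        for group in column_groups
    ]


horizontal = shard_horizontally(customers, 2)
vertical = shard_vertically(customers, [["name"], ["email"]])
```

In this sketch, each horizontal shard holds complete records for half of the customers, while each vertical shard holds one column (plus the ID) for all customers.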

Benefits of Database Sharding

Here are a few benefits of database sharding.

Improved Horizontal Scaling

Your database can be scaled vertically or horizontally. Vertical scaling means adding more central processing unit (CPU) power and random access memory (RAM) to the server to improve performance. It is a helpful solution for small to medium databases. As your database grows, however, vertical scaling becomes infeasible, because there is only so much power you can add to a single server.

Horizontal scaling is more flexible. It lets you expand your database as needed by adding servers to your system, with each server hosting different database shards. This distributes the load and increases the system's capacity to handle requests.

Speedier Query Response Time

Improved Reliability During Outages

Database outages happen for various reasons, including accidental data deletion, connection problems, and cyberattacks. Sharding helps limit the impact of an outage. Because each shard is independent, only the affected shard experiences downtime. For example, if you have four shards and one of them goes down, only about 25 percent of operations are affected.

Drawbacks of Sharding

Although sharding improves a database's reliability and availability, implementing it is difficult. Choosing the wrong sharding architecture can slow down your system and even lead to data loss.

Choose a sharding architecture that distributes data evenly across all shards. Without that balance, you risk creating database hotspots. A hotspot occurs when one shard holds most of the data while the other shards stay nearly empty, which limits write performance to what that single shard can handle.

You can fix this later by splitting the unbalanced shard, but doing so is difficult and can slow down your database while the data is being moved.

Another drawback of sharding is that SQL joins between tables on different shards can be slow and hurt performance. With the right sharding architecture, however, you can work around this issue.

Sharding Architectures

Sharding is possible using three architectures:

  • Key-based sharding
  • Range-based sharding
  • Directory-based sharding

The architecture you choose depends on your use case.

Key-Based Sharding

In key-based (or hash-based) sharding, the database application uses a shard key to locate a particular shard. A hash function hashes the shard key and outputs the shard that the data belongs to. The simplest hash function is the shard key modulo the number of shards.
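
The modulo routing described above can be sketched as follows. This is a minimal illustration, not a production router: the shard count of 4, the function name `shard_for`, and the choice of SHA-256 as the hash are all assumptions made for the example.

```python
import hashlib

NUM_SHARDS = 4  # assumed shard count for this sketch


def shard_for(key, num_shards=NUM_SHARDS):
    """Map a shard key to a shard index using hash(key) mod num_shards."""
    # hashlib gives a digest that is stable across runs; Python's built-in
    # hash() is randomized per process for strings, so it is avoided here.
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Count how many of 1,000 hypothetical user keys land on each shard.
counts = [0] * NUM_SHARDS
for user_id in range(1000):
    counts[shard_for(f"user-{user_id}")] += 1
```

The same key always maps to the same shard, and a well-mixed hash spreads keys roughly evenly. Note that changing `num_shards` remaps most keys, which is one reason resharding later is costly.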

The hash function can take more than one shard key as input, so key-based sharding works well for records that share common keys. Because the data is distributed algorithmically, it is less likely to create database hotspots in which one shard holds far more data than the others.

Because distribution depends solely on the hash function, key-based sharding cannot group logically related data together. As a result, database operations that need data from multiple shards can be inefficient, since they may have to read from every shard.
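
The cost of a query that does not use the shard key can be illustrated with a small sketch. Everything here is an assumption for demonstration purposes: shards are plain dictionaries, the shard key is an integer user ID routed with a simple modulo, and the `country` field and function names are hypothetical.

```python
NUM_SHARDS = 4  # assumed shard count

# Each shard is modelled as a dict mapping user_id -> record.
shards = [dict() for _ in range(NUM_SHARDS)]


def shard_index(user_id):
    """Route by shard key: user_id modulo the number of shards."""
    return user_id % NUM_SHARDS


def insert_user(user_id, country):
    shards[shard_index(user_id)][user_id] = {"id": user_id, "country": country}


def get_user(user_id):
    """Lookup by shard key: exactly one shard is read."""
    return shards[shard_index(user_id)].get(user_id)


def users_in_country(country):
    """Query on a non-key column: every shard must be scanned."""
    matches = []
    shards_read = 0
    for shard in shards:
        shards_read += 1
        matches.extend(u for u in shard.values() if u["country"] == country)
    return matches, shards_read


for uid, country in enumerate(["DE", "US", "DE", "FR", "US", "DE"], start=1):
    insert_user(uid, country)
```

Here `get_user(3)` touches a single shard, while `users_in_country("DE")` has to read all four, because users from the same country are scattered by the hash.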

Range-Based Sharding

Range-based sharding splits a database according to ranges of a chosen value.

It uses a shard key to decide which shard a value belongs to. The database application looks up the shard key in a lookup table to find the matching shard, then writes the data there. This makes range-based sharding simple to design and implement.

For example, you can use the user ID in a user database as the shard key. You could store users with IDs from 0 to 2,000 on one shard, those with IDs from 2,001 to 4,000 on another shard, and so on.

Range-based sharding can lead to hotspots. Consider a user database where most user IDs fall between 2,001 and 4,000. All of those users are assigned to one shard, which creates an imbalance over time. Range-based sharding is best suited to evenly distributed data.
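
A range lookup and the resulting hotspot can be sketched with Python's `bisect` module. The range boundaries below (three shards covering IDs 0 to 6,000) and the sample workload are assumptions chosen to mirror the user ID example, not values from a real system.

```python
import bisect
from collections import Counter

# Inclusive upper bound of each shard's ID range (assumed boundaries):
# shard 0: 0-2000, shard 1: 2001-4000, shard 2: 4001-6000
UPPER_BOUNDS = [2000, 4000, 6000]


def shard_for(user_id):
    """Return the index of the shard whose range contains user_id."""
    if user_id < 0 or user_id > UPPER_BOUNDS[-1]:
        raise ValueError(f"user_id {user_id} is outside every shard range")
    # bisect_left finds the first upper bound that is >= user_id.
    return bisect.bisect_left(UPPER_BOUNDS, user_id)


# A skewed workload: most IDs fall between 2,001 and 4,000.
skewed_ids = list(range(2001, 4001)) + [10, 500, 4500, 5999]
load = Counter(shard_for(uid) for uid in skewed_ids)
```

With this workload, shard 1 receives 2,000 rows while shards 0 and 2 receive 2 each, which is the kind of imbalance described above.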

Directory-Based Sharding

Directory-based sharding groups logically related data on a single shard. It uses a lookup table that holds a mapping for every entity in the database, and each mapping points to a database shard.

Directory-based sharding is more flexible than key-based or range-based sharding because you can assign data to shards dynamically. There is no hash function to follow and no range of values to stay within. This makes it easier to improve the database's performance: you can store all related data on the same shard, so common queries take less time to run.

For instance, if you used directory-based sharding to group users by geographical location and then retrieved the users from a certain location, you would only need to query a single shard.
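
A directory lookup of this kind might look like the following sketch. The region names, shard names, and helper functions are hypothetical, and the directory and shards are plain in-memory structures standing in for a lookup table and separate database servers.

```python
# Lookup table (the "directory"): maps a region to a shard name.
# The regions and shard names are hypothetical.
directory = {
    "europe": "shard_a",
    "north_america": "shard_b",
}

shards = {"shard_a": [], "shard_b": [], "shard_c": []}


def shard_for(region):
    """Look up the shard for a region in the directory."""
    try:
        return directory[region]
    except KeyError:
        raise KeyError(f"no shard mapping for region {region!r}") from None


def insert_user(name, region):
    shards[shard_for(region)].append({"name": name, "region": region})


def users_in(region):
    """All users of a region live on one shard, so only that shard is read."""
    return [u for u in shards[shard_for(region)] if u["region"] == region]


insert_user("Ada", "europe")
insert_user("Grace", "north_america")
insert_user("Linus", "europe")

# New mappings can be added at run time without changing any function.
directory["asia"] = "shard_c"
insert_user("Yuki", "asia")
```

The trade-off in this design is that every read and write goes through the directory first, so the lookup table itself has to be fast and highly available.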

Database Sharding using

Most modern database engines support sharding. One of them is MariaDB, a commercially supported fork of MySQL. MariaDB is a high-performance open-source database system used by corporations like IBM, GitHub, and Wikimedia. It's also an element of the high-performance server stack at .

MariaDB offers built-in sharding through the Spider storage engine. Spider is a clustering engine that supports partitioning and eXtended Architecture (XA) transactions. It lets you treat tables on remote instances as if they were on the local instance. When you create a table with the Spider storage engine, that table links to a table on a remote MariaDB server. Once the connection is established, the storage engine shares it with all tables that are part of the same transaction.

Summary

Database sharding is a technique that splits tables into smaller subsets and distributes them across multiple servers, called shards. Sharding can be implemented using different architectures, including key-based, range-based, and directory-based sharding.

While sharding improves a database's capacity, reliability, and availability, it is complex to put into place. Additionally, once you have sharded a database, it is hard to revert it to its unsharded state. Therefore, you should use sharding only when other scaling options won't be effective.
