This article assumes that you are familiar with blockchain technology. Nonetheless, we’ll briefly review this relatively young technology.

Blockchain is a network of interconnected nodes that validates and records digital currency transactions, supply-chain records, and other data requiring validity, integrity, and distributed access. The three core traits of blockchain are decentralization, scalability, and security.

Blockchain is decentralized because it keeps a record of everything without the need for a central authority. All participating nodes are entitled to verify network transactions, which makes it extremely difficult to alter or manipulate the data. An attacker who wants to change data on the network would need to take over a large share of its nodes, which is impractical. This is where blockchain’s security comes from.

Well, it’s understood that blockchain is decentralized and secure. What about scalability?

As blockchain adoption increases, the network must scale to process transactions with low latency. Ethereum, a general-purpose blockchain, has a throughput of only about 15–20 transactions per second. Decentralization and security come at the cost of network scalability, so transaction throughput must be scaled in some way to make the most of this technology.

Sharding to scale the network

Have you ever considered how large the Bitcoin blockchain is in terms of disk space? At the time of writing, the Bitcoin blockchain was approximately 380 GB in size, and if the current trend continues, it is expected to cross 1 TB by 2030. In the case of the general-purpose blockchain Ethereum, the chain size has already crossed 1 TB. Chain sizes will keep growing as more transactions are recorded, which raises concerns about the scalability of the network.

Suppose a new node wants to participate in validating network transactions. It needs to download the entire transaction history, from the genesis block up to the most recent transactions. This imposes a storage overhead on nodes that run on commodity hardware. It also slows down the network overall, as every transaction needs to be updated on every single node. The more nodes there are, the slower the network becomes and the less throughput it can deliver.

This problem is expected to be solved using sharding.

Sharding is a database splitting technique where the data is split and stored on multiple devices when the original data is too large to be held on a single machine.

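Sharding can be done horizontally or vertically. Horizontal sharding splits a table by rows, so each shard holds complete records for a subset of the data. Vertical sharding splits it by columns, so each shard holds some of the fields for every record. For example, see the figure below: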

Here in the above figure, vertical sharding and horizontal sharding are demonstrated.
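The two schemes can also be shown in a few lines of code. The following is a minimal sketch in Python, not tied to any particular database: the table, its field names, and the two helper functions are all made up for illustration.

```python
# Toy table of account records; the field names are illustrative only.
ROWS = [
    {"id": 1, "owner": "alice", "balance": 50},
    {"id": 2, "owner": "bob", "balance": 20},
    {"id": 3, "owner": "carol", "balance": 75},
    {"id": 4, "owner": "dave", "balance": 10},
]


def shard_horizontally(rows, num_shards):
    """Split by rows: each shard keeps whole records for a subset of ids."""
    shards = [[] for _ in range(num_shards)]
    for row in rows:
        shards[row["id"] % num_shards].append(row)
    return shards


def shard_vertically(rows, column_groups):
    """Split by columns: each shard keeps some fields for every record.

    The key column "id" is repeated in every shard so that the pieces of a
    record can be joined back together later.
    """
    return [
        [{"id": row["id"], **{col: row[col] for col in group}} for row in rows]
        for group in column_groups
    ]


horizontal = shard_horizontally(ROWS, 2)
vertical = shard_vertically(ROWS, [["owner"], ["balance"]])

print(horizontal[0])  # records with even ids, all fields
print(vertical[0])    # every record, but only "id" and "owner"
```

With horizontal sharding, a query for one record touches a single shard. With vertical sharding, rebuilding a full record means reading from every shard that holds one of its fields.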

In the case of blockchain, what do you think will be the sharding scheme used? You probably guessed it right. Blockchain is sharded horizontally.

Each shard (part of the blockchain split by sharding) is stored on multiple nodes in the network. Transactions no longer need to be updated on all nodes but simply on a subset of the network, lowering latency and boosting transaction speed.
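As a rough illustration of how transactions might be routed to a subset of the network, here is a minimal sketch that assigns each transaction to a shard by hashing the sender's address. The hashing rule, the shard count of 4, and the transaction fields are assumptions made for this example; real sharded blockchains define their own assignment rules.

```python
import hashlib

NUM_SHARDS = 4  # assumed value for this example


def shard_for(address: str, num_shards: int = NUM_SHARDS) -> int:
    """Map an account address to a shard using a stable hash."""
    digest = hashlib.sha256(address.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(transactions, num_shards: int = NUM_SHARDS):
    """Group transactions by the shard that owns the sender's address."""
    shards = {i: [] for i in range(num_shards)}
    for tx in transactions:
        shards[shard_for(tx["sender"], num_shards)].append(tx)
    return shards


# Hypothetical transactions; the addresses are placeholders.
txs = [
    {"sender": "0xabc", "receiver": "0xdef", "amount": 5},
    {"sender": "0x123", "receiver": "0xabc", "amount": 2},
    {"sender": "0xabc", "receiver": "0x456", "amount": 1},
]
shards = route(txs)

for shard_id, shard_txs in shards.items():
    print(shard_id, len(shard_txs))
```

Because the hash is deterministic, every transaction from the same sender lands on the same shard, so only the nodes that store that shard need to process it.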

Ethereum has been a strong advocate of sharding. Following the debut of its Beacon Chain, which uses a Proof-of-Stake consensus mechanism, Ethereum's roadmap at the time of writing called for splitting its blockchain into 64 public shard chains.

Some security concerns

The strength of a blockchain lies in the number of validators securing it. Sharding reduces the number of validators responsible for any given transaction, and so it reduces the security of the chain.
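A back-of-the-envelope calculation shows how much smaller the target becomes. The sketch below assumes that validators are spread evenly across shards and that controlling a simple majority of a shard is enough to take it over. Both are simplifying assumptions, and real protocols use different thresholds. The figure of 6,400 validators is made up, while 64 is the shard count mentioned for Ethereum above.

```python
def nodes_to_control(total_nodes: int, num_shards: int = 1) -> int:
    """Smallest number of nodes that forms a strict majority of one shard.

    Assumes nodes are split evenly across shards and that a simple
    majority is enough to control a shard (a simplification).
    """
    if num_shards < 1 or total_nodes < num_shards:
        raise ValueError("need at least one node per shard")
    shard_size = total_nodes // num_shards
    return shard_size // 2 + 1


print(nodes_to_control(6400))      # unsharded network: 3201
print(nodes_to_control(6400, 64))  # one of 64 shards: 51
```

Under these assumptions, an attacker needs 51 nodes instead of 3,201 to dominate the part of the network that validates a given transaction.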

Instead of every node authenticating every transaction, sharding spreads the workload across shards, which makes a single-shard attack possible.

This opens up many more opportunities for an attacker, who now only has to deal with one shard. Without sharding, an attacker would have to compromise nodes across the entire network, which is far harder. If we consider each shard to be its own blockchain network with its own verified users and data, an attacker who takes control of a shard could insert fake transactions or a malicious application into the system, and a breached shard may in turn make it easier to attack other shards. Corrupting the nodes in a specific shard can also result in the permanent loss of the corresponding portion of data.

One mitigation is random assignment. For example, in the Ethereum network model, nodes are randomly assigned to a shard and periodically reassigned, again at random. The goal is to make it difficult for an attacker to predict, or force, the shard to which their (malicious) node is assigned. This makes a Byzantine takeover of a single shard more difficult.
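The idea of random assignment and reassignment can be sketched as follows. This is an illustration only, not Ethereum's actual shuffling algorithm: the function name, the validator names, and the per-epoch seed are assumptions. Python's `random` module is also not cryptographically secure, whereas a real protocol would derive the seed from a source of randomness that no participant can predict or bias.

```python
import random


def assign_validators(validators, num_shards, epoch_seed):
    """Shuffle validators with a per-epoch seed and deal them out to shards.

    Illustrative only: a real network would need an unpredictable,
    tamper-resistant seed rather than Python's pseudo-random generator.
    """
    rng = random.Random(epoch_seed)
    shuffled = list(validators)
    rng.shuffle(shuffled)
    # Deal the shuffled list out round-robin so shard sizes differ by at most one.
    return {s: shuffled[s::num_shards] for s in range(num_shards)}


# Hypothetical validator names.
validators = [f"validator-{i}" for i in range(12)]

epoch_1 = assign_validators(validators, num_shards=3, epoch_seed=1)
epoch_2 = assign_validators(validators, num_shards=3, epoch_seed=2)

for shard_id in epoch_1:
    print(shard_id, epoch_1[shard_id], epoch_2[shard_id])
```

Because the assignment is redrawn with a fresh seed each epoch, an attacker cannot rely on their nodes staying together in the same shard for long.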

Sharding adds a lot of weight to an already complicated software stack. It would be like attempting to stack skyscrapers on top of one another. The structure would most likely tip over or collapse under its own weight at some point.

Hyperledger blockchains, unlike cryptocurrencies, do not typically deal with ‘addresses’. They are concerned with maintaining a global state (think of a database): the consensus mechanism governs updates to that state, while the blockchain securely stores the state updates.

Sharding also poses a challenge for “thin” clients, also known as SPV (Simplified Payment Verification) wallets, which need a complete picture of the blockchain state even though that state is partitioned among shards. To address this visibility issue, thin clients share information via separate networks and keep local state copies for each shard.

Because a Hyperledger network does not split up an address space, it can, unlike Ethereum, be sharded vertically, and it is free to experiment with different sharding techniques.

Future of Sharding

Sharding has its own set of challenges that slow down its effective implementation, but it still represents an opportunity to address the more significant scalability issue that blockchain technology faces.

With Facebook’s interest in blockchain sharding, it is expected that new complementary technologies will be developed to solve some of the problems, such as cross-shard communication.

Final thoughts

One of the impediments to blockchain’s widespread adoption is scalability. With sharding, the technology has a better chance of eventually replacing traditional data infrastructures. However, blockchain sharding still faces a few obstacles that must be resolved before this can occur. With large technology companies like Facebook expressing interest in the technology, we can expect solutions to these challenges soon.