Cassandra Replication Factor Strategy

What is Cassandra Replication Factor?

The Cassandra replication factor strategy governs how data is replicated across multiple nodes. Cassandra is a distributed NoSQL database, so high availability is one of its main features.

Apache Cassandra's architecture is designed to handle large amounts of data. Cassandra stores data on multiple machines (nodes) and has no single point of failure.

Replication was designed to cope with hardware failure. Because multiple nodes take part in Cassandra operations, when a node goes down its data is already available on other nodes. Read and write requests are served by the remaining replica nodes for the duration of the failure.

Data is stored on different machines using a peer-to-peer distributed architecture.

Cassandra nodes exchange state information about themselves and about other nodes using the Gossip protocol. Gossip also lets Cassandra detect faulty nodes in the cluster.

The picture below depicts how the data replication works in Cassandra.

Cassandra Replication Factor

Data Replication in Cassandra

Hardware can fail at any time, and the network between Cassandra nodes can go down in the middle of data operations. To avoid a single point of failure, Cassandra keeps redundant copies: it replicates data across nodes to ensure high availability.

Cassandra replicates data on different nodes based on two settings.

  • Replication strategy: determines where the replicas are placed.
  • Replication factor: determines how many replicas of each piece of data are kept.

If you set the replication factor to 1, Cassandra maintains only a single copy of each piece of data. If you set the replication factor to 3, Cassandra maintains three copies of each piece of data on three different nodes.

In a production environment, the replication factor should be 3 or more to avoid a single point of failure and ensure high availability.
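To make the cost of a given replication factor concrete, the arithmetic can be sketched in a few lines of Python. This is only a rough model: it assumes data is spread perfectly evenly across nodes, which real clusters only approximate, and the function name and sample figures are illustrative rather than anything Cassandra provides.

```python
def replicated_storage(raw_gb, replication_factor, node_count):
    """Estimate storage needs, assuming perfectly even data distribution."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    if replication_factor > node_count:
        # Simplification for this model: every copy needs its own node.
        raise ValueError("this model needs at least one node per replica")
    total_gb = raw_gb * replication_factor
    return {"total_gb": total_gb, "per_node_gb": total_gb / node_count}


# 600 GB of data with a replication factor of 3 on a 6-node cluster.
print(replicated_storage(600, 3, 6))
# → {'total_gb': 1800, 'per_node_gb': 300.0}
```

In other words, tripling the number of copies triples the disk space the cluster needs, which is the price paid for being able to lose nodes without losing data.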

Data Replication strategies in Cassandra

There are two kinds of replication strategies in Apache Cassandra.

 SimpleStrategy

SimpleStrategy is used when there is only one data center. It places the first replica on the node determined by the partitioner, then places the remaining replicas on the next nodes clockwise around the ring. This strategy does not consider rack or data center location when placing replicas.
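The clockwise placement described above can be sketched in Python. This is a simplified model rather than Cassandra's actual code: it assumes each node owns exactly one token (no virtual nodes), and the ring contents, token values, and function name are made up for illustration.

```python
from bisect import bisect_left


def simple_strategy_replicas(ring, key_token, replication_factor):
    """ring is a list of (token, node_name) pairs, one token per node."""
    ordered = sorted(ring)
    tokens = [token for token, _ in ordered]
    # First replica: the first node whose token is >= the key's token,
    # wrapping to the start of the ring when the key is past the last token.
    start = bisect_left(tokens, key_token) % len(ordered)
    count = min(replication_factor, len(ordered))
    # Remaining replicas: the next nodes clockwise, ignoring racks and data centers.
    return [ordered[(start + i) % len(ordered)][1] for i in range(count)]


ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
print(simple_strategy_replicas(ring, 30, 3))  # → ['C', 'D', 'A']
```

A key whose token is 30 lands first on node C (the first token at or after 30), and the two extra replicas go to D and then wrap around to A.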

The picture depicts how SimpleStrategy works in a single data center environment.

Cassandra Replication factor Strategy: SimpleStrategy

NetworkTopologyStrategy

NetworkTopologyStrategy is mostly used in production environments where the Cassandra cluster is deployed across more than one data center.

With this strategy, Cassandra places replicas for each data center separately. Within a data center, NetworkTopologyStrategy walks the ring clockwise and places each replica on the first node it reaches that is in a different rack.

The strategy tries to place replicas on machines in different racks of the same data center, because a power failure or other problem can take out a whole rack. When that happens, Cassandra directs read and write operations to the replica nodes in the other racks.
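A rough Python model of this rack-aware placement is shown below. It is a sketch under several assumptions, not Cassandra's real implementation: each node owns one token, the node, rack, and data center names are hypothetical, and the fallback for data centers with fewer racks than replicas is simplified.

```python
from bisect import bisect_left


def network_topology_replicas(ring, key_token, replicas_per_dc):
    """ring: list of (token, node, datacenter, rack) tuples.
    replicas_per_dc: mapping such as {"dc1": 3, "dc2": 2}."""
    ordered = sorted(ring)
    tokens = [entry[0] for entry in ordered]
    start = bisect_left(tokens, key_token) % len(ordered)
    # Nodes in clockwise order, starting with the one that owns the key's token.
    clockwise = ordered[start:] + ordered[:start]

    placement = {}
    for dc, wanted in replicas_per_dc.items():
        chosen, racks_used, skipped = [], set(), []
        for _, node, node_dc, rack in clockwise:
            if node_dc != dc or len(chosen) == wanted:
                continue
            if rack in racks_used:
                skipped.append(node)  # same rack as an earlier replica
            else:
                chosen.append(node)
                racks_used.add(rack)
        # Fewer racks than replicas: fall back to the nodes that were skipped.
        chosen.extend(skipped[: wanted - len(chosen)])
        placement[dc] = chosen
    return placement


ring = [
    (0, "n1", "dc1", "r1"), (10, "n2", "dc1", "r1"), (20, "n3", "dc1", "r2"),
    (30, "n4", "dc2", "r1"), (40, "n5", "dc1", "r3"), (50, "n6", "dc2", "r2"),
]
print(network_topology_replicas(ring, 45, {"dc1": 2}))  # → {'dc1': ['n1', 'n3']}
```

In the example, the walk starts at n6 and reaches n1 first in dc1. The next dc1 node, n2, shares rack r1 with n1, so it is passed over in favor of n3 on rack r2.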

The picture below depicts how NetworkTopologyStrategy works in practice.

Cassandra replication factor strategy: NetworkTopologyStrategy

Updating the replication factor in Cassandra

You can update the replication factor even after the keyspace has been created. Increasing it raises the number of copies of the keyspace data stored in the Cassandra cluster.

Changing the replication factor of a keyspace affects every node that holds a copy of the keyspace data. The complete procedure for updating the replication factor of a keyspace is given below.

Syntax for updating the replication factor

Update a keyspace in the cluster and change its replication strategy options.

cqlsh> ALTER KEYSPACE University WITH REPLICATION =
{'class' : 'NetworkTopologyStrategy', 'dc1' : 3, 'dc2' : 2};

 Or if you want to switch to SimpleStrategy:

cqlsh> ALTER KEYSPACE University WITH REPLICATION =
{ 'class' : 'SimpleStrategy', 'replication_factor' : 3 };

 

In Cassandra, data center names are case sensitive, so verify the data center names before you execute the query.

After changing the replication factor:

  1. On each node that holds the keyspace, run the nodetool repair command with the -full option.
  2. Wait for the repair to complete on that node, then follow the same procedure on the remaining nodes.
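The whole procedure can be laid out as an ordered checklist with a small Python helper. The sketch only builds the command strings and does not run anything; the function name and the host addresses are hypothetical, and the keyspace is written in lowercase on the assumption that it was created without quotes.

```python
def replication_change_plan(keyspace, replicas_per_dc, hosts):
    """Return the CQL statement followed by one repair command per node."""
    options = ", ".join(f"'{dc}' : {rf}" for dc, rf in replicas_per_dc.items())
    alter = (
        f"ALTER KEYSPACE {keyspace} WITH REPLICATION = "
        f"{{'class' : 'NetworkTopologyStrategy', {options}}};"
    )
    # One full repair per node, to be run one node at a time.
    repairs = [f"nodetool -h {host} repair -full {keyspace}" for host in hosts]
    return [alter] + repairs


# Hypothetical node addresses; replace them with the nodes of your own cluster.
for step in replication_change_plan("university", {"dc1": 3, "dc2": 2},
                                    ["10.0.0.1", "10.0.0.2"]):
    print(step)
```

Each repair command should be allowed to finish before the next one is started, which matches the node-by-node order of the steps above.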

Best Practices for Cassandra Data Replication Factor

There are two main considerations when deciding how many replicas to configure in each data center.

  1. Whether the cluster can serve read requests locally, without incurring the latency of going to another data center.
  2. How the cluster behaves in failure scenarios.

The best practices for handling these considerations when designing a Cassandra cluster across multiple data centers are given below.

  1. Place two replicas in each data center. When a single node goes down, read and write requests can still be handled locally within the same data center at consistency level ONE, because the data center still holds a second copy of the data.
  2. Place three replicas in each data center. This configuration tolerates either the failure of one node per replication group at the strong consistency level LOCAL_QUORUM, or multiple node failures per data center at consistency level ONE.

A common practice is to use asymmetrical replication groups. For example, place three replicas in one data center to handle read and write requests and a single replica in another data center for analytics purposes.
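The failure tolerance behind these recommendations comes down to simple arithmetic, sketched here in Python. The function names are illustrative. The sketch assumes that a quorum is a majority of the replicas in the data center, and that a read at consistency level ONE can be served locally as long as one local replica is still up.

```python
def quorum(replicas):
    """A majority of the replicas."""
    return replicas // 2 + 1


def tolerated_failures(replicas_in_dc):
    """Replica nodes a data center can lose and still answer locally."""
    return {
        # ONE needs a single live replica in this data center.
        "ONE": replicas_in_dc - 1,
        # LOCAL_QUORUM needs a majority of this data center's replicas.
        "LOCAL_QUORUM": replicas_in_dc - quorum(replicas_in_dc),
    }


for rf in (1, 2, 3):
    print(rf, tolerated_failures(rf))
# → 1 {'ONE': 0, 'LOCAL_QUORUM': 0}
# → 2 {'ONE': 1, 'LOCAL_QUORUM': 0}
# → 3 {'ONE': 2, 'LOCAL_QUORUM': 1}
```

The numbers line up with the guidance: two replicas survive one failure only at ONE, three replicas survive one failure at LOCAL_QUORUM or two at ONE, and a single analytics replica has no tolerance at all.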

Summary:

This article explained how data replication works in Cassandra: what the replication factor is, how to update the replication factor of an existing keyspace, and the best practices for configuring the Cassandra replication factor strategy.