Cassandra Architecture

This tutorial covers the Cassandra architecture and explains how the different operations are carried out when Cassandra runs in a production environment. The Apache Cassandra architecture is not very complex. It is designed mainly to store large amounts of structured as well as semi-structured data. (Semi-structured data here means data in which column values are allowed to be null.) These properties are achieved through Cassandra's data distribution strategy: it distributes data across multiple nodes (machines) and can also retrieve data from multiple nodes at the same time. In this way, work is shared among multiple machines operating together.

Cassandra leaves no single point of failure. Because a cluster consists of multiple interconnected machines, there is always a chance that one or more of them goes down at some point. To handle this, Cassandra replicates its data on multiple machines. If a node goes down, queries are handled by another node that holds a replica of the data.

Data is distributed across the nodes using a peer-to-peer architecture.

All nodes in a Cassandra cluster have the same status and play the same role. The nodes are independent of each other but interconnected.

Every node (server or machine) can serve read as well as write requests coming from clients. It does not matter whether the requested data is present on the node that receives the request or not.

If any node goes offline (down), read and write requests from clients can still be served by other nodes that hold replicas of the data. That is the main benefit of data replication.
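
The replication and failover behaviour described above can be illustrated with a small sketch. It is a toy model, not Cassandra's implementation: the node names, the helper functions replicas_for and read, the MD5-based hash and the replication factor of 3 are all assumptions made for illustration.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster
REPLICATION_FACTOR = 3                             # assumed setting


def replicas_for(key, nodes=NODES, rf=REPLICATION_FACTOR):
    """Pick rf consecutive nodes on the ring, starting at the key's hash position."""
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(rf)]


def read(key, down=()):
    """Return the first replica of the key that is still up, or None."""
    for node in replicas_for(key):
        if node not in down:
            return node
    return None


owners = replicas_for("user:42")
# Even with the first replica offline, another replica can serve the read.
survivor = read("user:42", down={owners[0]})
```

Here survivor is the second replica in the list, because the first one is marked as down. Only when every replica of a key is down does read return None.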

Gossip Protocol In Cassandra:

All nodes use the Gossip protocol to communicate with each other. Gossip is a peer-to-peer protocol through which Cassandra nodes periodically exchange state information about themselves and about the other nodes they know of.
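
As a rough illustration of how gossip spreads information, the sketch below lets each node exchange its view with one randomly chosen peer per round until all nodes agree. It is a simplified model under stated assumptions: the heartbeat counters, the function names and the five node names are hypothetical, and the real protocol carries much richer state.

```python
import random


def merge(view_a, view_b):
    """Combine two views, keeping the highest heartbeat seen for every node."""
    merged = dict(view_a)
    for node, heartbeat in view_b.items():
        merged[node] = max(merged.get(node, 0), heartbeat)
    return merged


def gossip_round(views, rng):
    """Each node exchanges its view with one randomly chosen peer."""
    names = list(views)
    for name in names:
        peer = rng.choice([n for n in names if n != name])
        combined = merge(views[name], views[peer])
        views[name] = dict(combined)
        views[peer] = dict(combined)


def gossip_until_converged(views, rng, max_rounds=1000):
    """Run rounds until every node holds the same view; return the round count."""
    for round_no in range(1, max_rounds + 1):
        gossip_round(views, rng)
        first = views[next(iter(views))]
        if all(v == first for v in views.values()):
            return round_no
    return None


# Initially each node only knows its own heartbeat.
views = {name: {name: 1} for name in ["n1", "n2", "n3", "n4", "n5"]}
rounds = gossip_until_converged(views, random.Random(0))
```

After the loop finishes, every node's view contains an entry for all five nodes, even though no node ever contacted all of the others directly.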

Components of Cassandra architecture:

[Figure: Cassandra architecture]

Cassandra Data-center:

A collection of multiple Cassandra nodes (servers) makes up a data center. The concept refers to multiple Cassandra nodes placed in the same physical location. In the figure above, one circle of nodes represents a data center.

Cassandra Node:

A Cassandra node is the basic unit of a Cassandra cluster. A node represents a machine or server that is connected to the other nodes, and data is stored on these nodes.

Cassandra Cluster: 

Multiple Cassandra data centers combined together make up a Cassandra cluster.

Commit Log in Cassandra:

Every write operation writes its data to the commit log first. The commit log is mainly used to recover writes that had not yet reached permanent storage if a node crashes.

Mem-Table in Cassandra:

After the data is written to the commit log, it is written to the Mem-table in the next phase. Data in the Mem-table is held temporarily; it works like a buffer.

SSTable in Cassandra:

As stated above, the Mem-table works like a buffer. When the Mem-table reaches a certain threshold, its data is flushed to an SSTable for permanent storage, and the Mem-table is freed for new entries.
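
The three storage components above can be sketched together as a toy write path. This is a minimal model, not Cassandra's storage engine: the StorageEngine class, its memtable_limit threshold (counted in entries) and the clearing of the commit log on flush are assumptions chosen to keep the example small.

```python
class StorageEngine:
    """Toy write path: commit log -> memtable -> SSTable flush."""

    def __init__(self, memtable_limit=3):
        self.memtable_limit = memtable_limit  # hypothetical flush threshold (entries)
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # in-memory buffer
        self.sstables = []     # immutable, sorted tables standing in for disk files

    def write(self, key, value):
        self.commit_log.append((key, value))   # 1. log the write first
        self.memtable[key] = value             # 2. then update the in-memory buffer
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                       # 3. flush when the threshold is hit

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}    # the memtable is free for new entries
        self.commit_log = []  # flushed data no longer needs to be replayed


engine = StorageEngine(memtable_limit=3)
for k, v in [("b", 2), ("a", 1), ("c", 3), ("d", 4)]:
    engine.write(k, v)
```

After these four writes, the first three entries sit in one sorted SSTable, while the fourth is still only in the commit log and the Mem-table.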

Concept of Seeds in Cassandra Cluster: 

When setting up a Cassandra cluster, we may select one or more nodes as seeds. This means we enter the IP addresses of the seed nodes when installing Cassandra on each individual machine. A seed is like the pivot of a Cassandra cluster: the seeds value in the Cassandra configuration file is how an individual node knows which cluster it has to join.

CQL (Cassandra Query Language):

End users can access the Cassandra database through any of its nodes (servers or machines) by using the Cassandra Query Language (CQL). CQL treats a keyspace (database) as a container of multiple tables. Working with the database from the cqlsh prompt is also a well-known approach among developers.

A user can connect with CQL to any node in the Cassandra cluster. If the data is available on that node, the request is fulfilled by that node; otherwise the node works as a proxy between the client application and the node that holds the actual data.
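
The proxy behaviour can be sketched as follows. This is only an illustration: the DATA placement table and the handle_request function are hypothetical, and a real cluster locates the owning node from the key's position on the ring rather than by searching every node.

```python
# Hypothetical placement of data: which node holds which key.
DATA = {
    "node-a": {"k1": "apple"},
    "node-b": {"k2": "banana"},
}


def handle_request(contact_node, key):
    """Serve locally if possible, otherwise proxy to the node holding the key.

    Returns (value, node_that_held_the_data).
    """
    if key in DATA[contact_node]:
        return DATA[contact_node][key], contact_node
    for node, rows in DATA.items():
        if key in rows:
            # contact_node forwards the request and relays the answer.
            return rows[key], node
    return None, None


local = handle_request("node-a", "k1")
proxied = handle_request("node-a", "k2")
```

In both cases the client talks only to node-a; for k2 the value comes from node-b behind the scenes.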

Write Operations in Cassandra: 

When a client requests a write operation, the write request is sent to all replicas that are healthy, regardless of the consistency level.

The consistency level determines how many replicas must respond with a success acknowledgment for each write request.

A node acknowledges with a success message once the data has been written successfully to the commit log and the mem-table.
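
To make the role of the consistency level concrete, the sketch below computes how many acknowledgments a write needs for a few common levels. The function names are hypothetical and only a subset of levels is covered; QUORUM is taken as a majority of the replicas.

```python
def required_acks(consistency_level, replication_factor):
    """Number of replica acknowledgments needed for a write to succeed."""
    levels = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,  # majority of the replicas
        "ALL": replication_factor,
    }
    return levels[consistency_level]


def write_succeeds(acks_received, consistency_level, replication_factor=3):
    """True if enough replicas acknowledged the write for this level."""
    return acks_received >= required_acks(consistency_level, replication_factor)
```

For example, with a replication factor of 3, a write at QUORUM succeeds once two replicas have acknowledged it, while the same write at ALL would still be waiting for the third.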

Each write to a Cassandra node is first appended to the commit log on that node. In the second phase the data is written to the mem-table, which works like a buffer. When the mem-table is full, the data is flushed to an SSTable, the data file of the node. Data partitioning is done by Cassandra, and data replication is done automatically according to the configured replication factor. Cassandra periodically merges the SSTables, discarding unnecessary data.
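
The periodic merging of SSTables can be pictured with the following sketch. It assumes each SSTable maps a key to a (timestamp, value) pair and uses None as a stand-in for a deletion marker; the compact function is hypothetical, and the real process applies additional rules before it discards deleted data.

```python
def compact(sstables):
    """Merge SSTables, keeping only the newest version of each key.

    Each SSTable maps key -> (timestamp, value); a value of None stands in
    for a deletion marker.
    """
    newest = {}
    for table in sstables:
        for key, (timestamp, value) in table.items():
            if key not in newest or timestamp > newest[key][0]:
                newest[key] = (timestamp, value)
    # Drop deleted rows and return a single table sorted by key.
    return {k: newest[k] for k in sorted(newest) if newest[k][1] is not None}


old = {"a": (1, "v1"), "b": (1, "x")}
new = {"a": (2, "v2"), "b": (3, None), "c": (2, "y")}
merged = compact([old, new])
```

In merged, the outdated value of a and the deleted key b are gone, which is the kind of unnecessary data the merge discards.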

Read Operations in Cassandra:

In Cassandra there are three types of read requests that can be sent to a specific node (replica).

  1. Direct request
  2. Read repair request
  3. Digest request

In the first phase, the coordinator (the node that received the client's request) sends a direct read request to one of the replicas. In the second phase, it sends digest requests to as many replicas as the consistency level specifies and checks whether the data returned by the direct request is the latest version of the requested data.

In the third phase, digest requests are sent to all remaining replicas. If any replica (node) returns an older value, an independent background read repair request updates the data on that node. This process is the read repair mechanism.
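
The phases above can be condensed into a small sketch. It is a simplification under stated assumptions: every replica is assumed to hold the key as a (timestamp, value) record, the repair runs synchronously instead of in the background, and the digest and coordinated_read functions are hypothetical names.

```python
import hashlib


def digest(record):
    """Hash of a (timestamp, value) record, standing in for a digest response."""
    return hashlib.md5(repr(record).encode()).hexdigest()


def coordinated_read(replicas, key):
    """Read from the first replica, compare digests, repair stale replicas."""
    names = list(replicas)
    data = replicas[names[0]][key]  # direct request: full data from one replica
    # Digest requests: the other replicas return only a hash of their record.
    digests = {n: digest(replicas[n][key]) for n in names[1:]}
    if any(d != digest(data) for d in digests.values()):
        # Mismatch: compare full records and keep the newest timestamp.
        data = max((replicas[n][key] for n in names), key=lambda rec: rec[0])
        for n in names:  # read repair: overwrite stale copies
            if replicas[n][key] != data:
                replicas[n][key] = data
    return data[1]


replicas = {
    "n1": {"k": (1, "old")},
    "n2": {"k": (2, "new")},
    "n3": {"k": (2, "new")},
}
value = coordinated_read(replicas, "k")
```

Although the direct request went to the stale replica n1, the digest mismatch causes the newer record to be returned and written back to n1.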

On every read operation, Cassandra first checks the mem-table. It then consults the Bloom filters, which indicate which SSTables may hold the requested data.
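
A Bloom filter answers "might this key be here?" quickly, with possible false positives but no false negatives. The sketch below assumes one small filter per SSTable; the BloomFilter class, its size and hash count, and the helper functions are illustrative choices, not Cassandra's implementation.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: may report false positives, never false negatives."""

    def __init__(self, size=64, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, key):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def make_sstable(rows):
    """Pair a dict of rows with a Bloom filter built from its keys."""
    bloom = BloomFilter()
    for key in rows:
        bloom.add(key)
    return bloom, rows


def read(key, memtable, sstables):
    """Check the memtable, then only the SSTables whose filter may hold the key."""
    if key in memtable:
        return memtable[key]
    for bloom, rows in reversed(sstables):  # newest SSTable first
        if bloom.might_contain(key) and key in rows:
            return rows[key]
    return None


sstables = [make_sstable({"a": 1, "b": 2}), make_sstable({"c": 3})]
```

A call such as read("c", {}, sstables) skips any SSTable whose filter rules the key out, and the final key in rows check guards against the occasional false positive.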

Summary of the Above Cassandra Tutorial:

This tutorial explained the Cassandra architecture and how Cassandra write and read requests are handled at different stages. It also explained how the architecture works to maintain the consistency level throughout.
