Cassandra 2.0 Architecture

In this article, I will cover the various key structures that make up Cassandra. We will also see which structures reside in memory and which reside on disk.

In the next article, I will give an overview of the key components that use these structures to keep Cassandra running. Further articles will cover each structure/component in more detail.

Cassandra Node Architecture:

Cassandra is cluster software, meaning it has to be installed/deployed on multiple servers which together form a Cassandra cluster. In my previous article, I mentioned how to install Cassandra on a single server using the CCM tool, which simulates a Cassandra cluster on a single machine.

Each server which is part of the cluster is called a node. So a node is essentially a server which runs the Cassandra software and holds some part of the data.

Cassandra distributes data across all nodes in the cluster, so every node is responsible for owning part of the data.
The node architecture of Cassandra looks like below. The nodes form a ring.

[Figure: cassandra_architecture2 – ring of Cassandra nodes]
Structures in Cassandra

Following are the various structures of Cassandra which are present on each node of the cluster (either in memory or on disk):

  • CommitLog
  • SSTable
  • MemTable
  • RowCache
  • KeyCache
  • SSTableIndex
  • SSTableIndexSummary
  • BloomFilter
  • Compression offset

Let's have an overview of each of these structures.

CommitLog [Disk]:

The commit log is an on-disk file which stores a log record of every write transaction happening in Cassandra on that node. Each node configured in the cluster has its own commit log on disk. Whenever a write happens on a node, the commit log on disk is updated first with the changed data, followed by the MemTable in memory. This ensures durability. People who are familiar with Oracle terminology can think of the commit log as the online redo logs.
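The commit log behaviour is controlled per node from cassandra.yaml. A minimal sketch of the relevant settings, assuming the usual 2.0 defaults (verify against your own file):

# excerpt from cassandra.yaml
commitlog_directory: /var/lib/cassandra/commitlog
commitlog_sync: periodic                # fsync the commit log every commitlog_sync_period_in_ms
commitlog_sync_period_in_ms: 10000
commitlog_segment_size_in_mb: 32        # the log is written and recycled in segments of this size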

MemTable [Memory]:

The MemTable is a dedicated in-memory cache created for each Cassandra table. It contains recently read/modified data. Whenever data from a table is read on a node, Cassandra first checks if the latest data is present in the MemTable or not. If the latest data is not present, it reads the data from disk (from the SSTables) and caches it in the MemTable. There is a separate MemTable for each Cassandra table, so reads and writes on individual tables do not block each other. Multiple updates on a single column result in multiple entries in the commit log, but a single entry in the MemTable. The MemTable is flushed to disk when a predefined criterion is met, such as maximum size, timeout, or number of mutations.
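A flush can also be forced manually through nodetool. A quick sketch (the keyspace and table names ks and users are just placeholders):

nodetool flush ks users      # flush the MemTable of table ks.users to a new SSTable on disk
nodetool flush               # flush the MemTables of all keyspaces on this node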

SSTable [Disk]:

These are on-disk tables. Every Cassandra table has SSTable files created on disk. An SSTable comprises 6 files on disk, and all of these files together represent a single SSTable.
Following are the 6 files present on disk for each SSTable:
1) Bloom Filter
2) Index
3) Data
4) Index Summary
5) Compression Info
6) Statistics

The Data file (#3 above) contains the actual data of the table.
All the other files are explained along with the respective components below.
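As an illustration, a data directory listing for a hypothetical keyspace ks and table users might show one SSTable generation like this (2.0-style file names; the generation number and version string will differ on your system, and only the six components named above are shown):

ls /var/lib/cassandra/data/ks/users/
ks-users-jb-1-Data.db
ks-users-jb-1-Index.db
ks-users-jb-1-Filter.db
ks-users-jb-1-Summary.db
ks-users-jb-1-CompressionInfo.db
ks-users-jb-1-Statistics.db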

RowCache [Memory]:

This is an off-heap memory structure on each node which caches complete rows of a table, if that table has the row cache enabled. We can enable/disable the row cache while creating the table, or later via ALTER TABLE (see the example below). Every table in Cassandra has a parameter “caching” whose valid values are:
NONE – no caching
KEYS_ONLY – only key caching
ROWS_ONLY – only complete row caching
ALL – both row and key caching
When a requested row is found in memory in the row cache (latest version), Cassandra can skip all the steps needed to check and retrieve the row from the on-disk SSTable. This provides a huge performance benefit.
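A minimal CQL sketch of this on 2.0, using a placeholder table ks.users (note that the row cache also needs row_cache_size_in_mb set to a non-zero value in cassandra.yaml before it caches anything):

cqlsh> CREATE TABLE ks.users (
   ...     user_id int PRIMARY KEY,
   ...     name text
   ... ) WITH caching = 'rows_only';
cqlsh> ALTER TABLE ks.users WITH caching = 'all';   -- cache both rows and keys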

KeyCache [Memory]:

This is an on-heap memory structure on each node which contains partition keys and their offsets in the SSTables on disk. This helps in reducing disk seeks while reading data from an SSTable. It is configurable at the table level and is enabled by the KEYS_ONLY or ALL setting of the caching parameter of a table. So when a read is requested, Cassandra first checks if the record is present in the row cache (if it is enabled for that table). If the record is not present in the row cache, it goes to the bloom filter, which tells whether the data might exist in an SSTable or definitely does not exist in that SSTable. Based on the result from the bloom filter, Cassandra checks for the keys in the key cache and directly gets the offset position of those keys in the SSTable on disk.
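The overall cache sizes are node-level settings. A sketch of the relevant cassandra.yaml entries, assuming the common 2.0 defaults (treat the values as illustrative):

# excerpt from cassandra.yaml
key_cache_size_in_mb:            # empty means min(5% of heap, 100 MB)
key_cache_save_period: 14400     # seconds between saving the key cache to disk
row_cache_size_in_mb: 0          # 0 disables the off-heap row cache
row_cache_save_period: 0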

SSTableIndex [Disk]:

The primary key index of each SSTable is stored in a separate file on disk (file #2 above). This index is used for faster lookups within the SSTable. A primary key is mandatory for a table in Cassandra so that it can uniquely identify a row in the table. In many cases the primary key is the same as the partition key, based on which data is partitioned and distributed to the various nodes in the cluster.
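As a small illustration of primary key vs. partition key, here is a hypothetical table where the primary key is made up of a partition key plus a clustering column:

cqlsh> CREATE TABLE ks.events (
   ...     device_id int,          -- partition key: decides which node owns the partition
   ...     event_time timestamp,   -- clustering column: orders rows within the partition
   ...     value text,
   ...     PRIMARY KEY (device_id, event_time)
   ... );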

Partition Summary [Memory and Disk]:

The partition summary is an off-heap, in-memory sampling of the partition index, used to speed up access to the index on disk. The default sampling ratio is 128, meaning that for every 128 records of an index in the index file, we have 1 record in the partition summary. Each record of the partition summary holds a key value and its offset position in the index. So when a read request comes for a record and it is not found in the row cache or the key cache, Cassandra checks the index summary for the offset of that key in the index file on disk. Since not all index records are stored in the summary, it only gets a rough estimate of the offset it has to check in the index file, but this still greatly reduces disk seeks.
The partition summary is also stored on disk in a file (file #4 above).
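The sampling ratio is tunable; in 2.0 it is exposed as a per-table index_interval property. A sketch with a placeholder table name (a larger interval means a smaller summary in memory, at the cost of scanning more of the index file per lookup):

cqlsh> ALTER TABLE ks.users WITH index_interval = 256;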
The partition summary looks like below.

[Figure: partition_summary – sampling of the partition index]
Bloom Filter [Memory and Disk]:

A bloom filter is an off-heap, in-memory, hash-based probabilistic structure that is used to test whether a specific member is part of a set or not. It can give false positives, but it can never give a false negative. Meaning, a bloom filter can say that a record might be present in that table on disk and we may then not find the record, but it can never say a record is absent when it is actually present on disk. This helps avoid unnecessary seeks for data which is not present at all.
The bloom filter is also persisted in a disk file (file #1 above), which contains the serialized bloom filter for the partition keys.
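The accepted false-positive rate is tunable per table. A sketch with a placeholder table name (a higher chance shrinks the filter in memory but allows more wasted disk reads):

cqlsh> ALTER TABLE ks.users WITH bloom_filter_fp_chance = 0.01;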

Compression offset maps [Memory and Disk]:

The compression offset map holds the offset information for compressed blocks (chunks). By default all tables in Cassandra are compressed, and the more data that is compressed, the larger the compression offset map grows. When Cassandra needs to look up data, it consults the in-memory compression offset map and decompresses the relevant chunk to get to the columns. Both writes and reads are affected, because chunks have to be compressed and decompressed. The compression offset map is kept as an off-heap component in memory, and it is also saved on disk in a separate file for each SSTable (file #5 above).
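Compression is again a per-table setting. A minimal CQL sketch with a placeholder table name (LZ4Compressor and a 64 KB chunk length are the usual 2.0 defaults):

cqlsh> ALTER TABLE ks.users
   ... WITH compression = {'sstable_compression': 'LZ4Compressor', 'chunk_length_kb': 64};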

So if we put all the above structures together and identify what resides on disk, in on-heap memory, and in off-heap memory, it will look like below.

[Figure: cass_arch – Cassandra structures grouped by location: disk, on-heap memory and off-heap memory]
Hope this helps !!

Apache Cassandra – NoSQL storage solution

These days I am exploring another storage solution – Cassandra.

The Apache Cassandra datastore was originally developed at Facebook as an open source NoSQL data storage system, and its design is based on Amazon's Dynamo. Apache Cassandra is an open source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low-latency operations for all clients.

DataStax has created an enterprise edition of Cassandra which is built on Apache Cassandra. Today we have multiple flavors of Cassandra available, from Apache as well as from DataStax.

Cassandra is a NoSQL database storage solution and it stores data using simple key-value pairs. Along with the enterprise software, DataStax also provides extensive documentation for learning Cassandra. They also provide self-paced and instructor-led training for learning Cassandra.

I have started learning Cassandra using the self-paced training available at the following location – https://academy.datastax.com/courses

Apart from that, DataStax also has a very active blog where they discuss different issues and features available in Cassandra – http://www.datastax.com/dev/blog/

Installation:

You can either go with a full installation of Cassandra on multiple physical nodes and create a cluster, or you can simulate a cluster on a single node using CCM (Cassandra Cluster Manager).

Going for the official Cassandra software on multiple physical nodes might not be feasible for everyone. That is why CCM is the best utility to learn Cassandra.

You can find instructions to install CCM at the following location – http://www.datastax.com/dev/blog/ccm-a-development-tool-for-creating-local-cassandra-clusters
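On a typical Ubuntu box the install boils down to something like the following sketch (the linked post is the authoritative source; package names may vary with your release):

sudo apt-get install python-pip openjdk-7-jdk ant      # Java and ant are needed to run/build Cassandra
sudo pip install ccm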

Valid Versions:

At the time of this writing, the most stable version of Apache Cassandra is 2.1.14, and the latest released version is 2.1.15. There are also older versions, like 2.0.9, which were considered stable.

You can get the complete list of Apache Cassandra releases at – http://archive.apache.org/dist/cassandra/

You can check the DataStax community versions of Cassandra at http://planetcassandra.org/cassandra/

The community version is for learning and is free to download, install and play around with.

CCM Installation Issue:

I faced the following issue when I installed CCM on my Ubuntu 12.04 machine.

ccm create --version=2.0.9 --nodes=6 deo
Downloading http://archive.apache.org/dist/cassandra/2.0.9/apache-cassandra-2.0.9-src.tar.gz to /tmp/ccm-2oKzAH.tar.gz (10.810MB)
  11335077  [100.00%]
Extracting /tmp/ccm-2oKzAH.tar.gz as version 2.0.9 ...
Compiling Cassandra 2.0.9 ...
Deleted /home/local/advaitd/.ccm/repository/2.0.9 due to error
Traceback (most recent call last):
  File "/usr/local/bin/ccm", line 5, in <module>
    pkg_resources.run_script('ccm==2.0.3.1', 'ccm')
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 499, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/lib/python2.7/dist-packages/pkg_resources.py", line 1235, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/ccm-2.0.3.1-py2.7.egg/EGG-INFO/scripts/ccm", line 72, in <module>
    cmd.run()
  File "/usr/local/lib/python2.7/dist-packages/ccm-2.0.3.1-py2.7.egg/ccmlib/cmds/cluster_cmds.py", line 127, in run
    cluster = Cluster(self.path, self.name, install_dir=self.options.install_dir, version=self.options.version, verbose=True)
  File "/usr/local/lib/python2.7/dist-packages/ccm-2.0.3.1-py2.7.egg/ccmlib/cluster.py", line 51, in __init__
    dir, v = self.load_from_repository(version, verbose)
  File "/usr/local/lib/python2.7/dist-packages/ccm-2.0.3.1-py2.7.egg/ccmlib/cluster.py", line 64, in load_from_repository
    return repository.setup(version, verbose)
  File "/usr/local/lib/python2.7/dist-packages/ccm-2.0.3.1-py2.7.egg/ccmlib/repository.py", line 40, in setup
    download_version(version, verbose=verbose, binary=binary)
  File "/usr/local/lib/python2.7/dist-packages/ccm-2.0.3.1-py2.7.egg/ccmlib/repository.py", line 221, in download_version
    raise e
ccmlib.common.CCMError: Error compiling Cassandra. See /home/local/advaitd/.ccm/repository/last.log for details

 

I posted the same error on the CCM GitHub project and immediately got a solution – https://github.com/pcmanus/ccm/issues/268

They suggested using the binary version of Cassandra for the download: -v binary:2.0.9. Cluster creation was successful after switching to the binary version; the working command is sketched below.
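Mirroring the options from the failing command above, the working invocation looks something like this (a sketch; check ccm create --help for the exact flags in your CCM version):

ccm create --version=binary:2.0.9 --nodes=6 deo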

You can create a cluster with as many nodes as you want. All CCM does is create that many directories and treat each one as a separate node.

I created a six node cluster on my Ubuntu machine.

advaitd@desktop:~$ ccm status
Cluster: 'deo'
--------------
node1: UP
node3: UP
node2: UP
node5: UP
node4: UP
node6: UP

CCM Installation details:

CCM creates a hidden directory under your home directory, and a separate installation directory for each node under that hidden directory, as shown below.

advaitd@desktop:~$ pwd
/home/local/advaitd
advaitd@desktop:~$ ls -rlt .ccm
total 12
drwxr-xr-x 3 advaitd domain^users 4096 Apr 23 12:15 repository
-rw-r--r-- 1 advaitd domain^users    4 Apr 23 12:15 CURRENT
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:15 deo
advaitd@desktop:~$ ls -rlt .ccm/deo/
total 28
-rw-r--r-- 1 advaitd domain^users  291 Apr 23 12:15 cluster.conf
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:16 node2
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:16 node1
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:16 node5
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:16 node3
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:16 node6
drwxr-xr-x 8 advaitd domain^users 4096 Apr 23 12:16 node4
advaitd@desktop:~$

So in each of the above node directories we have a Cassandra installation. Each of the above node directories is treated as a separate node, and together they form the cluster.

A Cassandra binary runs from each of the node directories, so we should see 6 Cassandra processes running on that host, as shown below.

advaitd@desktop:~$ ps -aef | grep cassandra | grep -v grep | wc -l
6

I will keep learning and posting more articles on Cassandra as well.

References:

http://cassandra.apache.org/

http://www.datastax.com/

http://en.wikipedia.org/wiki/Apache_Cassandra

https://github.com/pcmanus/ccm/issues

http://www.datastax.com/blog

http://docs.datastax.com/en/index.html