Cassandra for Oracle DBA’s Part 4 – Data Distribution across Nodes

Oracle RAC uses a shared-disk architecture: the data is shared across all nodes, and at any point in time every instance returns the same data, kept consistent through Cache Fusion.

But in Cassandra, the important point is that storage is not shared. It is sharded, that is, partitioned and stored locally on each node.

Looking at the RAC architecture, we see a horizontal landscape of hub nodes (figure 1 below), whereas the Cassandra nodes form a ring (a peer-to-peer architecture rather than hub-and-spoke). In a ring architecture every node can act as a coordinator; unlike RAC, there is no master in the cluster.

Note: Oracle 12c Flex Cluster uses a similar hub-and-leaf arrangement, but the hub nodes (the database nodes) still use shared storage, so do not confuse the two.

Diagrams: RAC & Cassandra

Figure 1: RAC 12c Flex Cluster Architecture (left) and Cassandra Sharded Cluster Nodes (right)

 

Well, if the data is not shared, the common questions that come to a DBA's mind are: 🙂

1. How is the data distributed to the nodes?

2. How do nodes see consistent data when a read happens?

3. When a write happens on one node, how do the other nodes know about it?

4. How about data availability? What if one of the nodes is corrupted or a disk fails?

Before answering these questions, let's recap the ASM diskgroup mirroring and striping methods.

ASM uses diskgroup-level redundancy, aka mirroring, with Normal (two copies) and High (three copies) redundancy, so at any time it maintains that many copies of the data stored in the data diskgroup. Even a disk or failure-group failure will not cause any data loss.

ASM also uses disk-level striping, so it evenly distributes the data across the disks in a diskgroup. When a disk is added or dropped, the data is first redistributed across the disks (rebalancing) before normal disk operations continue. This ensures the data can be read and written evenly, avoiding a single point of contention on the disks.

Keeping those in mind, note that in Cassandra, replication and striping are maintained at the node level rather than the diskgroup level. So each node stores its own set of data, and possibly replicated data from other nodes.

In Cassandra, mirroring is achieved by the Replication Factor, and striping is achieved by the token range assigned to each node by the Partitioner (I am not covering vnodes here).

Partitioner/Tokens:- Cassandra uses a hashing mechanism to assign a token range to each node. When a node is added or removed, the tokens must be re-assigned (unless vnodes are configured) so that data is redistributed to the other/new nodes, much like ASM diskgroup rebalancing. With a token range in place for each node, the other piece required is a table-level partition key: a column whose value determines which node a given row is stored on. When a DML operation is performed, the row's partition key is hashed, the hash is matched against the node token ranges, and the row is written to the owning node. In this way each node holds a non-shared portion of the table's data.

Cassandra supports different hash mechanisms for assigning tokens to nodes, and the partitioner determines how data is distributed across the nodes in the cluster (including replicas). A partitioner is essentially a function that derives a token for a row from its partition key, typically by hashing; each row is then placed in the cluster according to the value of its token. Both the Murmur3Partitioner and RandomPartitioner use tokens to assign roughly equal portions of data to each node and to evenly distribute data from all tables throughout the ring or other grouping, such as a keyspace. This holds even if the tables use different partition keys, such as usernames or timestamps. Moreover, read and write requests to the cluster are also evenly distributed and load balancing is simplified, because each part of the hash range receives an equal number of rows on average.
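
As a small illustration, the CQL token() function shows the hash the partitioner computes for a partition key; the table and keyspace names below (demo.users) are hypothetical:

    -- Hypothetical table: user_id is the partition key, so its hash (token)
    -- decides which node's token range the row falls into.
    CREATE TABLE demo.users (
        user_id text PRIMARY KEY,
        email   text
    );

    INSERT INTO demo.users (user_id, email) VALUES ('geekdba', 'geek@example.com');

    -- token() returns the partitioner's hash for the partition key; the row is
    -- stored on the node(s) whose token range covers this value.
    SELECT user_id, token(user_id) FROM demo.users;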

You can set the partitioner type in the cassandra.yaml file using one of:

  • Murmur3Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
  • RandomPartitioner: org.apache.cassandra.dht.RandomPartitioner
  • ByteOrderedPartitioner: org.apache.cassandra.dht.ByteOrderedPartitioner
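
For example, the default Murmur3Partitioner is configured with a single entry like the following (a minimal sketch of the relevant line only):

    # cassandra.yaml (sample entry): the partitioner must be identical on every
    # node and cannot be changed once data has been written to the cluster
    partitioner: org.apache.cassandra.dht.Murmur3Partitioner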

Mirroring/Replication:- Cassandra stores replicas on multiple nodes to ensure reliability and fault tolerance. A replication strategy determines the nodes where replicas are placed. The total number of replicas across the cluster is referred to as the replication factor.

The Replication Factor is configured at the keyspace level (think of a tablespace/schema in Oracle), and every table in that keyspace maintains that many replicas across the nodes. A replication factor of 1 means there is only one copy of each row, on one node. A replication factor of 2 means two copies of each row, each on a different node. All replicas are equally important; there is no primary or master replica. As a general rule, the replication factor should not exceed the number of nodes in the cluster; however, you can increase the replication factor first and add the desired number of nodes later.
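
As a sketch, a keyspace with a replication factor of 3 in a single datacenter (the keyspace name geek_demo is just an example) is created like this:

    -- SimpleStrategy: 3 copies of every row, each on a different node
    CREATE KEYSPACE geek_demo
      WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};

    -- Verify how the keyspace was defined
    DESCRIBE KEYSPACE geek_demo;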

The replication strategy is set with the class option at the keyspace level, so that each data center maintains its own set of replicas.

Two replication strategies are available:

 

  • SimpleStrategy: Use only for a single data center and one rack. If you ever intend to use more than one data center, use the NetworkTopologyStrategy.
  • NetworkTopologyStrategy: Highly recommended for most deployments because it is much easier to expand to multiple data centers when required by future expansion.

With the above in mind, look at the example below with a replication factor of 3 in a multi-datacenter setup.

Multi-datacenter clusters for geographically separate data
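
A minimal CQL sketch for the multi-datacenter case, assuming two hypothetical datacenters named DC1 and DC2 as defined by the cluster's snitch/topology configuration:

    -- NetworkTopologyStrategy: the replica count is set per datacenter,
    -- here 3 copies of each row in DC1 and 3 copies in DC2
    CREATE KEYSPACE geek_demo_multi
      WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};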

Now let's answer the questions above.

1. How is the data distributed to the nodes?

Ans: Data distribution is based on the token range assigned to each node, so each node stores a different portion of the data.

2. How about data availability? What if one of the nodes is corrupted or a disk fails?

Ans: Since the data is replicated to other nodes, even if a node or disk fails the remaining nodes can still serve the data. This way high availability of the data is maintained.

3. If the data is not shared, how about read consistency and write consistency?

Ans: Cassandra maintains eventual consistency: if it sees that the data digests returned by the replicas do not match, it issues additional read requests and returns the most recent data from the replicated nodes.
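
As a rough illustration in cqlsh, the consistency level controls how many replicas must answer before a read is returned (reusing the hypothetical demo.users table from earlier):

    -- QUORUM: a majority of the replicas must respond; their digests are
    -- compared and the most recent value wins (with read repair if they differ)
    CONSISTENCY QUORUM;
    SELECT user_id, email FROM demo.users WHERE user_id = 'geekdba';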

Read the next post for the Read/Write Paths.

Thanks

Geek DBA
