Adding a node is straightforward:
1. Generate a token list using a token generator script, so you get the token for the new node (a minimal sketch of such a script follows this list).
2. Download the Cassandra software, unpack it, and change the following important parameters in cassandra.yaml:
cluster_name: 'geek_cluster'
seeds: "127.0.0.1,127.0.0.2,127.0.0.3"
listen_address: 127.0.0.4
rpc_address: 127.0.0.4
initial_token:
3. Start Cassandra:
$CASSANDRA_HOME/bin/cassandra -f
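I am not reproducing the exact token script here, but evenly spaced tokens for the default Murmur3Partitioner (token range -2^63 to 2^63-1) can be computed with a small shell loop like this minimal sketch (bc is assumed to be available, to handle the 64-bit arithmetic):

# Minimal sketch: evenly spaced Murmur3Partitioner tokens for an N-node ring
N=3
for i in $(seq 0 $((N - 1))); do
  echo "-(2^63) + $i * (2^64) / $N" | bc
done

For N=3 this prints exactly the three tokens you will see in the ring output below.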
Now, when the new node bootstraps into the cluster, something starts working behind the scenes: data rebalancing.
If you recollect ASM disk operations: when you add or drop a disk at the diskgroup level, the existing data is rebalanced across the remaining or new disks within the diskgroup. Cassandra does the same, but at the node level, based on the token range each node owns.
So with three nodes, the ring shows each node owning 33.33% of the data:
root@wash-i-16ca26c8-prod ~/.ccm $ ccm node1 nodetool ring
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: datacenter1
Address Rack Status State Load Owns Token
3074457345618258602
127.0.0.1 rack1 Up Normal 24.84 KB 33.33% -9223372036854775808
127.0.0.2 rack1 Up Normal 24.8 KB 33.33% -3074457345618258603
127.0.0.3 rack1 Up Normal 24.87 KB 33.33% 3074457345618258602
I added the node with CCM rather than manually:
ccm add --itf 127.0.0.4 --jmx-port 7400 -b node4
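The -b flag enables auto-bootstrap for the new node; it then has to be started before it joins the ring:

ccm node4 start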
Check the status again, after a nodetool repair:
root@wash-i-16ca26c8-prod ~/.ccm $ ccm node1 nodetool ring
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: datacenter1
==========
Address Rack Status State Load Owns Token
3074457345618258602
127.0.0.1 rack1 Up Normal 43.59 KB 33.33% -9223372036854775808
127.0.0.4 rack1 Up Normal 22.89 KB 16.67% -6148914691236517206
127.0.0.2 rack1 Up Normal 48.36 KB 16.67% -3074457345618258603
127.0.0.3 rack1 Up Normal 57.37 KB 33.33% 3074457345618258602
As you can see, with three nodes each owned 33.33%; with four nodes, 127.0.0.2 and the new 127.0.0.4 now own 16.67% each, because node4's token (-6148914691236517206) bisects the range previously owned by 127.0.0.2.
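A node owns the range from the previous token (exclusive) up to its own token (inclusive), and the ownership percentage is simply the range size divided by 2^64. A quick sanity check of node2's new share with bc:

# node2 now owns (-6148914691236517206, -3074457345618258603]
echo "scale=4; ((-3074457345618258603) - (-6148914691236517206)) / 2^64 * 100" | bc
# prints 16.6600, i.e. the 16.67% shown above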
This way node additions and deletions do not cause data loss, since the rebalance happens online and behind the scenes, just like ASM. While the rebalance is running, you can check the following to see how much has completed and how much is pending, much like querying v$asm_operation:
root@wash-i-16ca26c8-prod ~/.ccm/repository/4.5.2/demos/portfolio_manager/bin $ ccm node1 nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 140361
Responses n/a 0 266253
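To keep an eye on the streaming progress during a rebalance, you can simply poll netstats in a loop (a throwaway sketch, nothing more):

# Poll every 5 seconds and show the mode and any active streams
while true; do
  ccm node1 nodetool netstats | grep -E 'Mode|stream'
  sleep 5
done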
If a node is leaving the cluster, this is also visible in the nodetool netstats output:
root@wash-i-16ca26c8-prod ~/.ccm/repository/4.5.2/demos/portfolio_manager/bin $ ccm node4 nodetool netstats
Mode: LEAVING
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 159
Responses n/a 0 238788
Further, to delete a node, nodetool decommission should be used rather than a plain remove, since remove simply drops the node and deletes its data without rebalancing it to the surviving nodes. To demonstrate the danger, I directly removed node4:
root@wash-i-16ca26c8-prod ~ $ ccm node4 remove
The status now shows only three nodes up:
root@wash-i-16ca26c8-prod ~ $ ccm status
Cluster: 'geek_cluster'
node1: UP
node3: UP
node2: UP
root@wash-i-16ca26c8-prod ~ $ ccm node1 nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns Host ID Rack
UN 127.0.0.1 2.09 MB 1 33.3% 1dc82d65-f88d-4b79-9c1b-dc5aa2a55534 rack1
UN 127.0.0.2 3.02 MB 1 23.6% ab247945-5989-48f3-82b3-8f44a3aaa375 rack1
UN 127.0.0.3 3.22 MB 1 33.3% 023a4514-3a74-42eb-be49-feaa69bf098c rack1
DN 127.0.0.4 3.39 MB 1 9.8% 9d5b4aee-6707-4639-a2d8-0af000c25b45 rack1
Note that node4 shows DN (Down/Normal) and still owns 9.8% of the data, which appears to be lost because of the direct remove. I then stopped Cassandra and started it again, with this result:
root@wash-i-16ca26c8-prod ~/.ccm/repository/4.5.2/resources/cassandra/conf $ ccm start
[node1 ERROR] org.apache.cassandra.io.sstable.CorruptSSTableException: org.apache.cassandra.io.compress.CorruptBlockException: (/var/lib/cassandra/data/system/local/system-local-jb-93-Data.db): corruption detected, chunk at 0 of length 261.
I tried a nodetool repair to fix the data, but the repair is not allowed to proceed for any range involving node4:
root@wash-i-16ca26c8-prod ~/.ccm/repository/4.5.2/bin $ ccm node1 nodetool repair
Traceback (most recent call last):
File "/usr/local/bin/ccm", line 86, in <module>
cmd.run()
File "/usr/local/lib/python2.7/site-packages/ccmlib/cmds/node_cmds.py", line 267, in run
stdout, stderr = self.node.nodetool(" ".join(self.args[1:]))
File "/usr/local/lib/python2.7/site-packages/ccmlib/dse_node.py", line 264, in nodetool
raise NodetoolError(" ".join(args), exit_status, stdout, stderr)
ccmlib.node.NodetoolError: Nodetool command '/root/.ccm/repository/4.5.2/bin/nodetool -h localhost -p 7100 repair' failed; exit status: 1; stdout: [2016-07-13 01:19:08,567] Nothing to repair for keyspace 'system'
[2016-07-13 01:19:08,573] Starting repair command #1, repairing 2 ranges for keyspace PortfolioDemo
[2016-07-13 01:19:10,719] Repair session cc495b80-4897-11e6-9deb-e7c99fc0dbe2 for range (-3074457345618258603,3074457345618258602] finished
[2016-07-13 01:19:10,720] Repair session cd8beda0-4897-11e6-9deb-e7c99fc0dbe2 for range (3074457345618258602,-9223372036854775808] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/127.0.0.4) is dead: session failed
[2016-07-13 01:19:10,720] Repair command #1 finished
[2016-07-13 01:19:10,728] Starting repair command #2, repairing 4 ranges for keyspace dse_system
[2016-07-13 01:19:10,735] Repair session cd8e1080-4897-11e6-9deb-e7c99fc0dbe2 for range (-3074457345618258603,3074457345618258602] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/127.0.0.4) is dead: session failed
[2016-07-13 01:19:10,736] Repair session cd8e3790-4897-11e6-9deb-e7c99fc0dbe2 for range (3074457345618258602,-9223372036854775808] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/127.0.0.4) is dead: session failed
[2016-07-13 01:19:10,737] Repair session cd8eacc0-4897-11e6-9deb-e7c99fc0dbe2 for range (-9223372036854775808,-7422755166451980864] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/127.0.0.4) is dead: session failed
[2016-07-13 01:19:10,738] Repair session cd8ed3d0-4897-11e6-9deb-e7c99fc0dbe2 for range (-7422755166451980864,-3074457345618258603] failed with error java.io.IOException: Cannot proceed on repair because a neighbor (/127.0.0.4) is dead: session failed
[2016-07-13 01:19:10,738] Repair command #2 finished
So the best way to delete a node is with decommission; once the node shows DECOMMISSIONED, you can remove it.
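The safe sequence, using the same commands as above, looks like this (decommission streams node4's ranges back to the surviving nodes before it leaves the ring):

ccm node4 nodetool decommission   # stream data out, then leave the ring
ccm node4 nodetool netstats       # wait until Mode: DECOMMISSIONED
ccm node4 remove                  # only now remove the node from CCM

Verifying the result: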
root@wash-i-16ca26c8-prod ~/.ccm/repository/4.5.2/resources/cassandra/conf $ ccm node4 nodetool ring
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: Cassandra
=====================
Address Rack Status State Load Owns Token
3074457345618258602
127.0.0.1 rack1 Up Normal 3.05 MB 33.33% -9223372036854775808
127.0.0.2 rack1 Up Normal 2.99 MB 33.33% -3074457345618258603
127.0.0.3 rack1 Up Normal 3.5 MB 33.33% 3074457345618258602
root@wash-i-16ca26c8-prod ~/.ccm/repository/4.5.2/resources/cassandra/conf $ ccm node1 nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 2
Mismatch (Blocking): 0
Mismatch (Background): 1
Pool Name Active Pending Completed
Commands n/a 0 5116
Responses n/a 0 243591
root@wash-i-16ca26c8-prod ~/.ccm/repository/4.5.2/resources/cassandra/conf $ ccm node1 nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 2
Mismatch (Blocking): 0
Mismatch (Background): 1
Pool Name Active Pending Completed
Commands n/a 0 5116
Responses n/a 0 243607
root@wash-i-16ca26c8-prod ~/.ccm/repository/4.5.2/resources/cassandra/conf $ ccm node2 nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 5955
Responses n/a 0 245289
root@wash-i-16ca26c8-prod ~/.ccm/repository/4.5.2/resources/cassandra/conf $ ccm node3 nodetool netstats
Mode: NORMAL
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 4652
Responses n/a 0 243249
root@wash-i-16ca26c8-prod ~/.ccm/repository/4.5.2/resources/cassandra/conf $ ccm node4 nodetool netstats
Mode: DECOMMISSIONED
Not sending any streams.
Read Repair Statistics:
Attempted: 0
Mismatch (Blocking): 0
Mismatch (Background): 0
Pool Name Active Pending Completed
Commands n/a 0 4431
Responses n/a 0 280491
root@wash-i-16ca26c8-prod ~/.ccm/repository/4.5.2/resources/cassandra/conf $ ccm node4 nodetool removenode
(Note: nodetool removenode takes the Host ID of a dead node and is meant for nodes that cannot be decommissioned; after a clean decommission, ccm node4 remove is all that remains to be done.)
If you recollect the Oracle RAC node deletion, we first deconfig CRS on the node and only then delete it; decommission plays the same role here.
-Thanks
GEEK DBA