Cluster integrity and cluster membership are governed by OCSSD (Oracle Cluster Synchronization Services Daemon), which monitors the nodes using 2 communication channels:
- Private Interconnect aka Network Heartbeat
- Voting Disk based communication aka Disk Heartbeat
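As a quick way to see both channels on a live system (a sketch, assuming an 11gR2 Grid Infrastructure installation with the grid owner's environment set), the interconnect and the voting disks can be listed from any node:
# Public and private (cluster_interconnect) network interfaces
$ oifcfg getif
# Voting disk(s) used for the disk heartbeat
$ crsctl query css votedisk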
Network heartbeat:-
Each node in the cluster is “pinged” every second
- Nodes must respond in css_misscount time (defaults to 30 secs.)
– Reducing the css_misscount time is generally not supported
- Network heartbeat failures will lead to node evictions
- CSSD-log:
[date / time] [CSSD][1111902528]
clssnmPollingThread: node mynodename (5) at 75% heartbeat fatal, removal in 6.7 sec
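To check the misscount actually in effect on a cluster (a minimal sketch, assuming crsctl is in the grid owner's PATH):
# Network heartbeat timeout in seconds (30 by default, as noted above)
$ crsctl get css misscount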
Disk Heartbeat:-
Each node in the cluster “pings” (r/w) the Voting Disk(s) every second
- Nodes must receive a response in (long / short) diskTimeout time
– IF I/O errors indicate clear accessibility problems, the timeout is irrelevant
- Disk heartbeat failures will lead to node evictions
- CSSD-log: …
[CSSD] [1115699552] >TRACE: clssnmReadDskHeartbeat:node(2) is down. rcfg(1) wrtcnt(1) LATS(63436584) Disk lastSeqNo(1)
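The disk heartbeat timeouts can be checked the same way (a sketch; in 11gR2 the long disktimeout defaults to 200 seconds, while the short timeout is derived from misscount and reboottime and applies during reconfiguration):
# Long disk I/O timeout in seconds
$ crsctl get css disktimeout
# Reboot time, used to derive the short disk timeout
$ crsctl get css reboottime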
Now we know that network and disk heartbeat failures can lead to node eviction; in more extreme cases a hung server, a stuck OCSSD, or a kill request from another cluster member can also cause a node to be evicted.
Why should nodes be evicted?
Evicting (fencing) nodes is a preventive measure (it’s a good thing)!
- Nodes are evicted to prevent consequences of a split brain:
– Shared data must not be written by independently operating nodes
– The easiest way to prevent this is to forcibly remove a node from the cluster
How are nodes evicted? – STONITH
Once it is determined that a node needs to be evicted,
- A “kill request” is sent to the respective node(s)
– Using all (remaining) communication channels
- A node (CSSD) is requested to “kill itself” – “STONITH like”
– “STONITH” foresees that a remote node kills the node to be evicted
EXAMPLE: Voting Disk Failure
- Voting Disks and heartbeat communication are used to determine which node(s) must leave the cluster
- In a 2 node cluster, the node with the lowest node number should survive
- In an n-node cluster, the biggest sub-cluster should survive (votes based)
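To see the node numbers that this decision is based on (a sketch using olsnodes, which on 11gR2 can print node numbers and node status):
# Node names, node numbers and status (Active/Inactive) of all cluster members
$ olsnodes -n -s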
EXAMPLE: Network heartbeat failure
- The network heartbeat between nodes has failed
– It is determined which nodes can still talk to each other
– A “kill request” is sent to the node(s) to be evicted
– Using all (remaining) communication channels – here the Voting Disk(s)
- A node is requested to “kill itself”; executor: typically CSSD
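After such an eviction, the surviving nodes record the decision in their Clusterware logs. A sketch of where to look, assuming the 11gR2 log layout under the Grid home and that $GRID_HOME is set (the short hostname in the path is an assumption about your directory layout):
# Clusterware alert log and CSSD log on a surviving node (11gR2 layout)
$ grep -i evict $GRID_HOME/log/$(hostname -s)/alert$(hostname -s).log
$ grep clssnm $GRID_HOME/log/$(hostname -s)/cssd/ocssd.log | tail -50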
EXAMPLE: What if CSSD is stuck or server itself is not responding?
A node is requested to “kill itself”
- BUT CSSD is “stuck” or “sick” (does not execute) – e.g.:
– CSSD failed for some reason
– CSSD is not scheduled within a certain margin
- OCSSDMONITOR (was: oprocd) will take over and execute
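The daemons involved in this protection chain can be seen on each node (a sketch; process names as they typically appear on Linux in 11gR2):
# ocssd.bin performs the heartbeats; cssdagent and cssdmonitor watch ocssd and OS scheduling
$ ps -ef | egrep 'ocssd|cssdagent|cssdmonitor' | grep -v grep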
EXAMPLE: A cluster member (RAC instance) can request the kill of another member (RAC instance)
A cluster member (a RAC instance) can request that another member be killed in order to protect data integrity, for example when the failing instance does not properly write the control file progress record. In such cases OCSSD first tries to kill that member (the instance); if that is not possible, it tries to evict the whole node.
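When an instance is evicted this way, the alert log of the affected database instance typically reports ORA-29740 (“evicted by instance number …”). A sketch of how to check, where the diag path and the instance name orcl1 are assumptions for illustration:
# Look for instance-eviction errors in the database alert log (path/SID are examples)
$ grep ORA-29740 $ORACLE_BASE/diag/rdbms/orcl/orcl1/trace/alert_orcl1.log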
11gR2 Changes –> Important: with 11gR2 (11.2.0.2 and later), fencing (eviction) no longer necessarily means a reboot.
- Until Oracle Clusterware 11.2.0.2, fencing (eviction) meant “re-boot”
- With Oracle Clusterware 11.2.0.2, re-boots will be seen less, because:
– Re-boots affect applications that might run on a node, but are not protected
– Customer requirement: prevent a reboot, just stop the cluster – implemented...
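Whether this rebootless behaviour applies can be confirmed by checking the active Clusterware version (a sketch):
# Rebootless restart requires an active version of 11.2.0.2 or later
$ crsctl query crs activeversion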
How does this work?
With Oracle Clusterware 11.2.0.2, re-boots will be seen less: Instead of fast re-booting the node, a graceful shutdown of the cluster stack is attempted
- It starts with a failure – e.g. network heartbeat or interconnect failure
- Then IO issuing processes are killed; it is made sure that no IO process remains
– For a RAC DB, mainly the log writer and the database writer are of concern
- Once all IO issuing processes are killed, the remaining processes are stopped
– IF the check for a successful kill of the IO processes fails → reboot
- Once all remaining processes are stopped, the stack stops itself with a “restart flag”
- OHASD will finally attempt to restart the stack after the graceful shutdown
Exceptions to the above:-
- IF the check for a successful kill of the IO processes fails → reboot
- IF CSSD gets killed during the operation → reboot
- IF cssdmonitor (oprocd replacement) is not scheduled → reboot
- IF the stack cannot be shutdown in “short_disk_timeout”-seconds → reboot
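Once OHASD has restarted the stack (i.e. in the non-exception case), its state can be verified from the affected node (a sketch using standard crsctl checks):
# Local stack health and the low-level (-init) resources managed by OHASD
$ crsctl check crs
$ crsctl stat res -t -init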
Very useful information. I have also listed Top 4 reasons for Node reboot or node Eviction at http://www.dbas-oracle.com/2013/06/Top-4-Reasons-Node-Reboot-Node-Eviction-in-Real-Application-Cluster-RAC-Environment.html.
Bro!! –
Simple & wonderful!!.. Good work.. added your blog to my list..
Thanks Geek DBA.
Raj.
Hi, whatever you publish is really valuable and makes sense.
Thanks for the needful information and it means a lot.
I hope it continues and rocks
Hi
I have a cluster with 3 nodes: node1, node2, node3. If a node eviction happens, how can I find out which node got evicted?
E.g. if I am logged into node1 and node2 got evicted, can I get that information from node1 itself, or do I have to log in to all the machines and check which node or nodes got evicted?
please help
Thank you