Disclaimer

All information is offered in good faith and in the hope that it may be of use for educational purposes and for the database community, but it is not guaranteed to be correct, up to date or suitable for any particular purpose. db.geeksinsight.com accepts no liability in respect of this information or its use. This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content, and if notified of any such issue I am happy to remove the content. Product and company names mentioned on this website may be the trademarks of their respective owners and are published here for informational purposes only. This is my personal blog. The views expressed on these pages are mine, learnt from other blogs and bloggers to enhance and support the DBA community, and this blog does not represent the thoughts, intentions, plans or strategies of my current employer, Oracle and its affiliates, or any other companies. This website does not offer or take profit for providing this content; it is purely non-profit and for educational purposes only. If you see any issues with the content or any copyright issues, I am happy to remove it if you notify me. Contact Geek DBA Team via geeksinsights@gmail.com

Oracle RAC: Node evictions & 11gR2 node eviction means restart of the cluster stack, not reboot of the node

Cluster integrity and cluster membership are governed by ocssd (the Oracle Cluster Synchronization Services daemon), which monitors the nodes using two communication channels:

- Private Interconnect aka Network Heartbeat
- Voting Disk-based communication aka Disk Heartbeat

Network heartbeat:-

Each node in the cluster is “pinged” every second

  • Nodes must respond within the css_misscount time (default: 30 seconds)
          – Reducing the css_misscount time is generally not supported
  • Network heartbeat failures will lead to node evictions
  • CSSD-log:
        [date / time] [CSSD][1111902528]
        clssnmPollingThread: node mynodename (5) at 75% heartbeat fatal, removal  in 6.7 sec
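To make the timing concrete, here is a minimal Python sketch of how an observing node could track one peer's missed network heartbeats against css_misscount. It is not CSSD's code: `PeerState`, `check_peer` and the 50%/90% warning levels are assumptions for illustration; only the 30-second default and the shape of the 75% message come from the log above.

```python
# Hypothetical model of network-heartbeat miss tracking; not Oracle's CSSD code.
from dataclasses import dataclass
from typing import Tuple

MISSCOUNT_SECS = 30.0          # css_misscount default quoted in the text
WARN_PERCENTS = (50, 75, 90)   # assumption: warning levels as % of misscount


@dataclass
class PeerState:
    name: str
    number: int
    last_heartbeat: float      # seconds on the observer's monotonic clock


def check_peer(peer: PeerState, now: float,
               misscount: float = MISSCOUNT_SECS) -> Tuple[str, str]:
    """Return ("ok" | "warn" | "evict", message) for one peer at time `now`."""
    missed = now - peer.last_heartbeat
    if missed >= misscount:
        return "evict", f"node {peer.name} ({peer.number}) missed heartbeats for {missed:.1f} sec"
    # Highest warning level crossed so far (compared in percent to avoid float fractions).
    crossed = [p for p in WARN_PERCENTS if missed * 100 >= p * misscount]
    if crossed:
        remaining = misscount - missed
        return "warn", (f"node {peer.name} ({peer.number}) at {crossed[-1]}% "
                        f"heartbeat fatal, removal in {remaining:.1f} sec")
    return "ok", ""


print(check_peer(PeerState("mynodename", 5, last_heartbeat=0.0), now=23.3))
```

With 23.3 seconds of silence the peer has crossed the 75% level and has 6.7 seconds left, which is the same situation the CSSD log line describes.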

Disk Heartbeat:-

Each node in the cluster “pings” (r/w) the Voting Disk(s) every second

  • Nodes must receive a response within the (long / short) diskTimeout
            – If I/O errors indicate a clear accessibility problem, the timeout is irrelevant
  • Disk heartbeat failures will lead to node evictions
  • CSSD-log: …
         [CSSD] [1115699552] >TRACE: clssnmReadDskHeartbeat:node(2) is down. rcfg(1) wrtcnt(1) LATS(63436584) Disk lastSeqNo(1)
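The long/short disk timeout rule can be sketched the same way. The numbers are assumptions, not something this post states: 200 seconds is the commonly quoted default for the long disktimeout, and the short timeout (used while the cluster is reconfiguring) is often described as misscount minus reboottime, i.e. 30 - 3 = 27 seconds. Check `crsctl get css disktimeout` and `crsctl get css reboottime` on your own cluster; `disk_heartbeat_ok` is a hypothetical helper.

```python
# Hypothetical sketch of the disk-heartbeat decision; all timeout values are assumptions.
MISSCOUNT = 30                               # seconds, css_misscount default
REBOOTTIME = 3                               # assumption: commonly quoted reboottime default
LONG_DISK_TIMEOUT = 200                      # assumption: default disktimeout
SHORT_DISK_TIMEOUT = MISSCOUNT - REBOOTTIME  # assumption: 27 s, used during reconfiguration


def disk_heartbeat_ok(seconds_since_last_io: float,
                      reconfiguring: bool,
                      io_error: bool) -> bool:
    """Return False when the voting-disk heartbeat should be treated as failed."""
    if io_error:
        # A clear accessibility problem: no need to wait for any timeout.
        return False
    timeout = SHORT_DISK_TIMEOUT if reconfiguring else LONG_DISK_TIMEOUT
    return seconds_since_last_io < timeout
```

The same 100-second stall is tolerated in steady state but is fatal during a reconfiguration, which is why the text speaks of a long and a short timeout.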

So both network and disk heartbeat failures can lead to a node eviction. In more extreme cases, a node can also be evicted when the server or ocssd itself stops responding, or when another cluster member requests it.

Why should nodes be evicted?

Evicting (fencing) nodes is a preventive measure (it’s a good thing)!

  • Nodes are evicted to prevent consequences of a split brain:
        – Shared data must not be written by independently operating nodes
        – The easiest way to prevent this is to forcibly remove a node from the cluster

How are nodes evicted? – STONITH
Once it is determined that a node needs to be evicted,

  • A “kill request” is sent to the respective node(s)
        – Using all (remaining) communication channels
  • A node (CSSD) is requested to “kill itself”, which is “STONITH-like”
        – Classic “STONITH” (Shoot The Other Node In The Head) foresees that a remote node kills the node to be evicted
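The point of using all (remaining) communication channels is that a broken interconnect must not stop the kill request from arriving. A minimal sketch, where each channel is a hypothetical send function (not a real Oracle API) that returns True on delivery or raises `OSError` when the channel is down:

```python
# Hypothetical sketch: deliver a "kill yourself" request over every channel still working.
from typing import Callable, Dict, List


def send_kill_request(node: int,
                      channels: Dict[str, Callable[[int], bool]]) -> List[str]:
    """Try every remaining channel and return the names of those that delivered."""
    delivered: List[str] = []
    for name, send in channels.items():
        try:
            if send(node):
                delivered.append(name)
        except OSError:
            # A dead channel (e.g. the failed interconnect) is skipped, not fatal.
            continue
    return delivered


def broken_interconnect(node: int) -> bool:
    # Hypothetical channel that is down.
    raise OSError("interconnect down")


def voting_disk(node: int) -> bool:
    # Hypothetical channel: pretend the kill request was written to the voting disk.
    return True


print(send_kill_request(2, {"interconnect": broken_interconnect, "voting_disk": voting_disk}))
```

Every channel is tried rather than stopping at the first success, mirroring "using all (remaining) communication channels".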

EXAMPLE: Voting Disk Failure

Voting Disks and heartbeat communication are used to determine which node(s) survive:

  • In a 2 node cluster, the node with the lowest node number should survive
  • In an n-node cluster, the biggest sub-cluster should survive (vote-based)
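The survival rule above fits in a few lines. This sketch assumes sub-clusters are given as sets of node numbers, and it generalises the 2-node rule into a tie-breaker for n-node clusters (between equal-sized sub-clusters, the one containing the lowest node number survives); treat that tie-break as an assumption, and `surviving_subcluster` as a hypothetical helper.

```python
# Hypothetical sketch of choosing the surviving sub-cluster after a split.
from typing import FrozenSet, Iterable, List


def surviving_subcluster(subclusters: Iterable[Iterable[int]]) -> FrozenSet[int]:
    """Biggest sub-cluster survives; on a tie, the one holding the lowest node number."""
    groups: List[FrozenSet[int]] = [frozenset(g) for g in subclusters]
    groups = [g for g in groups if g]  # ignore empty groups
    if not groups:
        raise ValueError("no non-empty sub-clusters given")
    # Key: larger size first (negated), then lower minimum node number.
    return min(groups, key=lambda g: (-len(g), min(g)))


print(surviving_subcluster([{1}, {2}]))            # 2-node split
print(surviving_subcluster([{1, 2}, {3, 4, 5}]))   # 5-node split, 2 vs 3
```

In the 2-node case every sub-cluster has size 1, so the tie-break alone decides and node 1 survives.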

EXAMPLE: Network heartbeat failure

  • The network heartbeat between nodes has failed
          – It is determined which nodes can still talk to each other
          – A “kill request” is sent to the node(s) to be evicted
  • Using all (remaining) communication channels, i.e. the Voting Disk(s)
  • A node is requested to “kill itself”; executor: typically CSSD
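"It is determined which nodes can still talk to each other" can be illustrated as grouping nodes by their working heartbeat links. This sketch uses union-find and assumes reachability is transitive (if node 1 talks to 2 and 2 talks to 3, all three end up in one group); the real CSS reconfiguration logic is more involved, and `find_subclusters` is a hypothetical name. The resulting groups are what the "biggest sub-cluster survives" rule is applied to.

```python
# Hypothetical sketch: group nodes into sub-clusters from working heartbeat links.
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def find_subclusters(nodes: Iterable[int],
                     links: Iterable[Tuple[int, int]]) -> List[FrozenSet[int]]:
    """Return connected groups of nodes, treating each working link as bidirectional."""
    parent: Dict[int, int] = {n: n for n in nodes}

    def find(n: int) -> int:
        # Path halving keeps the union-find trees shallow.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for a, b in links:
        parent[find(a)] = find(b)  # merge the two groups

    groups: Dict[int, Set[int]] = {}
    for n in parent:
        groups.setdefault(find(n), set()).add(n)
    # Deterministic order: by lowest node number in each group.
    return sorted((frozenset(g) for g in groups.values()), key=min)


# 4-node cluster whose interconnect split into {1,2} and {3,4}.
print(find_subclusters([1, 2, 3, 4], [(1, 2), (3, 4)]))
```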

EXAMPLE: What if CSSD is stuck or the server itself is not responding?

A node is requested to “kill itself”

  • BUT CSSD is “stuck” or “sick” (does not execute), e.g.:
          – CSSD failed for some reason
          – CSSD is not scheduled within a certain margin

OCSSDMONITOR (was: oprocd) will then take over and execute the kill
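The hand-over from a stuck CSSD to the monitor is essentially a watchdog check. A hedged sketch: `CssdStatus`, `last_scheduled`, `margin` and `monitor_decision` are hypothetical names, and the actual margin the monitor uses is not stated here.

```python
# Hypothetical sketch of a monitor that executes the kill when CSSD cannot.
from dataclasses import dataclass


@dataclass
class CssdStatus:
    alive: bool            # False if CSSD failed for some reason
    last_scheduled: float  # last time CSSD got CPU and checked in (seconds)


def monitor_decision(status: CssdStatus, now: float, margin: float,
                     kill_requested: bool) -> str:
    """Return who executes a pending kill request: 'cssd', 'monitor' or 'none'."""
    if not kill_requested:
        return "none"
    # Stuck or sick: dead, or not scheduled within the allowed margin.
    stuck = (not status.alive) or (now - status.last_scheduled > margin)
    return "monitor" if stuck else "cssd"
```

Keeping this check in a separate process is the point of the design: a CSSD that is not being scheduled cannot notice its own hang.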

EXAMPLE: A cluster member (RAC instance) can request to kill another member (RAC instance)

A cluster member (RAC instance) can request that another member be killed in order to protect data integrity, for example when the failing instance has not written its control file progress record properly (read here). ocssd then tries to kill that member; if that is not possible, it tries to evict the node.
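The escalation path for a member kill is simply "try the narrow action first, then the broad one". Both callbacks in this sketch are hypothetical placeholders, not Oracle interfaces:

```python
# Hypothetical sketch of member-kill escalation; callbacks are illustrative only.
from typing import Callable


def handle_member_kill_request(kill_member: Callable[[], bool],
                               evict_node: Callable[[], None]) -> str:
    """Try to kill the problem instance; escalate to node eviction if that fails."""
    if kill_member():
        return "member killed"
    # The member could not be killed, so fence the whole node instead.
    evict_node()
    return "node evicted"


print(handle_member_kill_request(lambda: False, lambda: None))
```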

 

11gR2 Changes –> Important: in 11gR2, fencing (eviction) does not necessarily mean a reboot.

  • Until Oracle Clusterware 11.2.0.2, fencing (eviction) meant “re-boot”
  • With Oracle Clusterware 11.2.0.2, re-boots will be seen less, because:
         – Re-boots affect applications that might run on a node but are not protected by the cluster
         – Customer requirement: prevent a reboot, just stop the cluster – implemented...

How does this work?

With Oracle Clusterware 11.2.0.2, re-boots will be seen less: Instead of fast re-booting the node, a graceful shutdown of the cluster stack is attempted

 

  • It starts with a failure – e.g. network heartbeat or interconnect failure
  • Then IO issuing processes are killed; it is made sure that no IO process remains
         – For a RAC DB mainly the log writer and the database writer are of concern
  • Once all IO issuing processes are killed, remaining processes are stopped
         – IF the check for a successful kill of the IO processes fails → reboot
  • Once all remaining processes are stopped, the stack stops itself with a “restart flag”
  • OHASD will finally attempt to restart the stack after the graceful shutdown
Exceptions to the above:-

  • IF the check for a successful kill of the IO processes fails → reboot
  • IF CSSD gets killed during the operation → reboot
  • IF cssdmonitor (oprocd replacement) is not scheduled → reboot
  • IF the stack cannot be shutdown in “short_disk_timeout”-seconds → reboot
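Putting the 11.2.0.2 flow and its exceptions together, the final outcome can be modelled as a single decision function. This is a simplified model of the steps listed above, not Oracle's implementation; the `EvictionContext` fields are hypothetical, and `short_disk_timeout` is passed in rather than assumed to have a particular default.

```python
# Hypothetical model of the 11.2.0.2+ eviction flow described above; not Oracle code.
from dataclasses import dataclass


@dataclass
class EvictionContext:
    io_processes_killed: bool    # did the check for killed IO processes (LGWR, DBWR...) pass?
    cssd_killed_during_op: bool  # was CSSD itself killed during the operation?
    cssdmonitor_scheduled: bool  # did cssdmonitor (oprocd replacement) get scheduled?
    shutdown_seconds: float      # time the graceful stack shutdown took
    short_disk_timeout: float    # short_disk_timeout in seconds


def eviction_outcome(ctx: EvictionContext) -> str:
    """Return 'restart stack' for a graceful eviction, otherwise 'reboot'."""
    # Step 1: no IO-issuing process may survive, otherwise shared data is at risk.
    if not ctx.io_processes_killed:
        return "reboot"
    # The exceptions that force the old fast-reboot behaviour.
    if ctx.cssd_killed_during_op or not ctx.cssdmonitor_scheduled:
        return "reboot"
    # Step 2: remaining processes stop; the stack must be down in time.
    if ctx.shutdown_seconds > ctx.short_disk_timeout:
        return "reboot"
    # Step 3: stack stops with a "restart flag" and OHASD restarts it.
    return "restart stack"


print(eviction_outcome(EvictionContext(True, False, True, 10.0, 27.0)))
```

Every branch except the last ends in a reboot, which matches the idea that the graceful path is attempted first but the reboot remains the safety net.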
