
PMON (ospid: nnnn): terminating the instance due to error 481

 

We hit a known issue in which ASM did not come up on the second node, and CRS and the other resources subsequently failed to start.

Review the Grid alert log and the OS logs:

  • $GRID_HOME/log/<nodename>/alert<nodename>.log

oifcfg shows

$ oifcfg getif

eth0 192.168.2.10 global_clusterinterconnect

eth1 192.168.10.2 global

usb0 169.254.95.0

eth0:2 169.254.96.0

eth0:3 169.254.95.0

 

From 11g R2 onwards (11.2.0.2, I believe) there is a cluster resource called HAIP that manages high availability for the cluster interconnects. Prior to 11gR2, if the cluster interconnect went down, the result was a hang or node evictions, depending on the situation. From 11gR2 onwards we can specify multiple cluster interconnects for a cluster (up to four, as far as I know), which Clusterware manages internally using these non-routable IPs. Essentially, even if one of the physical interfaces is offline, private interconnect traffic can be routed through the other available physical interfaces. This leads to a highly available architecture for private interconnect traffic.

 

A nice explanation from Riyaz’s note:

HAIP, High Availability IP, is the Oracle based solution for load balancing and failover for private interconnect traffic. Typically, Host based solutions such as Bonding (Linux), Trunking (Solaris) etc is used to implement high availability solutions for private interconnect traffic. But, HAIP is an Oracle solution for high availability. During initial start of clusterware, a non-routeable IP address is plumbed on the private subnet specified. That non-routable IP is used by the clusterware and the database for private interconnect traffic.
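To check whether HAIP is actually up on a node, one option is to look at the state of the HAIP resource as reported by crsctl. The sketch below assumes the usual KEY=VALUE output of "crsctl stat res ora.cluster_interconnect.haip -init"; the helper name haip_state is my own and is not part of any Oracle tool.

```shell
#!/bin/sh
# haip_state: print the STATE value of a resource from crsctl output on stdin.
# Assumption: the input uses the default KEY=VALUE layout, for example
#   NAME=ora.cluster_interconnect.haip
#   STATE=ONLINE on <nodename>
haip_state() {
    # Split each line on "=" and print the value of the STATE line only.
    awk -F= '$1 == "STATE" { print $2 }'
}
```

Typical use on a cluster node would be: crsctl stat res ora.cluster_interconnect.haip -init | haip_state. If nothing is printed, or the state is not ONLINE, HAIP is a good place to start looking.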

 

Now back to the issue:-

As you can see, usb0 has been given an address in the 169.254.x.x range (a non-routable, link-local range that the OS assigns internally). Clusterware gets confused by this and is unable to start the CRS resources.

Clusterware picked two IP addresses in the 169.254.x.x subnet on the eth0 private interface, as shown below. These two IP addresses are used by Clusterware and the RAC database for private interconnect traffic.

$ ifconfig -a
...
eth0:2: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 169.254.95.95 netmask ffff8000 broadcast 169.254.95.255
eth0:3: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 3
inet 169.254.95.96 netmask ffff8000 broadcast 169.254.95.255

A review of the database shows that these two IP addresses are used for the private interconnect on node 1.

SQL> select * from gv$cluster_interconnects;
INST_ID    NAME            IP_ADDRESS       IS_ SOURCE
---------- --------------- ---------------- --- -----------------------------
1          eth0:3          169.254.95.95    NO
1          eth0:2          169.254.95.96    NO
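To spot a stray link-local address like the one on usb0 quickly, a small filter over the interface list can help. This is only a sketch: it assumes a Linux host where "ip -o -4 addr show" is available and prints one address per line with the device name in the second field, and the function name find_stray_link_local is hypothetical.

```shell
#!/bin/sh
# find_stray_link_local: list 169.254.x.x IPv4 addresses that sit on any
# interface other than the private interconnect interface given as $1.
# Input on stdin: output of `ip -o -4 addr show` (assumed Linux iproute2 format,
# fields: index, device, "inet", address/prefix, ...).
find_stray_link_local() {
    private_if="$1"
    awk -v priv="$private_if" '
        # Keep IPv4 lines in the link-local range that are not on the private device.
        $3 == "inet" && $4 ~ /^169\.254\./ && $2 != priv { print $2, $4 }
    '
}
```

For example: ip -o -4 addr show | find_stray_link_local eth0. Any line printed names a device, such as usb0, that holds a 169.254.x.x address outside the interconnect and is worth a closer look.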

 

Solution:-

We now know that an extra device such as usb0 (USB printers are another possibility) enabled on the server caused this issue. We asked the Unix team to disable it, and also to disable it at the BIOS level so that the problem does not come back after a reboot.
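If the device cannot be disabled in the BIOS straight away, one possible OS-level stopgap is to keep the interface from coming up at boot. The sketch below assumes a RHEL-style ifcfg file (for example /etc/sysconfig/network-scripts/ifcfg-usb0, which is an assumption about the server layout); the helper name disable_onboot is hypothetical. Check with your Unix team before using anything like this.

```shell
#!/bin/sh
# disable_onboot: force ONBOOT=no in a RHEL-style ifcfg file given as $1.
# Assumption: the file uses simple KEY=VALUE lines, one per line.
disable_onboot() {
    cfg="$1"
    tmp="${cfg}.tmp.$$"
    # Copy every line except an existing ONBOOT setting (grep exits 1 if
    # nothing is left, which is not an error here).
    grep -v '^ONBOOT=' "$cfg" > "$tmp" || true
    # Append the setting that keeps the interface down at boot.
    echo 'ONBOOT=no' >> "$tmp"
    # Write back in place so the original file's owner and mode are preserved.
    cat "$tmp" > "$cfg"
    rm -f "$tmp"
}
```

Example, as root: disable_onboot /etc/sysconfig/network-scripts/ifcfg-usb0. This only affects the next boot; the interface would still need to be taken down once by hand on a running system.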

 

Hope this helps!!!!

4 comments to PMON (ospid: nnnn): terminating the instance due to error 481

  • Hi,

    The usb0 is a device provided by the server’s IMM on the OS side, and this interface is configured to use a DHCP server. Since no link and no DHCP server are available, an address in the 169.254.0.0/16 subnet is picked up, as stated in the RFC. At this point, isn’t the problem in fact that Oracle doesn’t respect the terms of RFC 3927? For me, it’s an Oracle bug…

    • Hi,

      I am not too good with OS and hardware, so what you say may be correct. But as far as the 11gR2 RAC cluster interconnect is concerned, this range (169.254.*.*) is used to provide high availability. If any device is using this range, Clusterware will try to reach it when checking cluster integrity, and since the device cannot respond in the way Oracle Clusterware expects, the nodes are evicted or do not start. This is basically confusion (maybe Oracle is not handling it correctly, or it is a bug as you said) in which the cluster sees devices such as usb0 as network devices.

      -Thanks
      Geek DBA

  • Mane

    Hi,

    I had an error 481 today, but on my 2 cluster nodes the USB network device was already disabled.
    Grid Infrastructure was running on node 1 but couldn’t start on node 2 after a reboot of node 2.

    In Oracle KB (Document ID 1383737.1) I found the solution.
    Node 1 (yes Node ONE, not TWO) had no route to 169.254.0.0/16
    Node 2 had the correct route.

    Adding the route on Node 1 with “route add -net 169.254.0.0 netmask 255.255.0.0 dev bond1” saved my Friday 🙂

    Cheers