All about Database Administration, Tips & Tricks

All information is offered in good faith and in the hope that it may be of use for educational purpose and for Database community purpose, but is not guaranteed to be correct, up to date or suitable for any particular purpose. db.geeksinsight.com accepts no liability in respect of this information or its use. This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content and if notify any such I am happy to remove. Product and company names mentioned in this website may be the trademarks of their respective owners and published here for informational purpose only. This is my personal blog. The views expressed on these pages are mine and learnt from other blogs and bloggers and to enhance and support the DBA community and this web blog does not represent the thoughts, intentions, plans or strategies of my current employer nor the Oracle and its affiliates or any other companies. And this website does not offer or take profit for providing these content and this is purely non-profit and for educational purpose only. If you see any issues with Content and copy write issues, I am happy to remove if you notify me. Contact Geek DBA Team, via geeksinsights@gmail.com

Installer has detected that the diskgroup name provided already exists on the system.

Hello,

After a failed crs installation, we have removed everything by using deconfigure and deinstall utility and freshly started the grid infrastruction installation, and we got this error

Checking Temp space: must be greater than 120 MB.   Actual 2395 MB    Passed
Checking swap space: must be greater than 150 MB.   Actual 32767 MB    Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2013-05-13_05-23-16AM. Please wait ...[FATAL] [INS-30516] Please specify unique disk groups.
   CAUSE: Installer has detected that the diskgroup name provided already exists on the system.
   ACTION: Specify different disk group.

Some background:-
1) Not using asm libraries or ASM lib disk we are just providing luns with udev rules

	[root@**** rsp_files]# cd /dev/oracleasm/disks
	[root@**** rsp_files]# ls -ltr

2) Partitions does not contain any valid partition table ,means no information written on the partitions

	fdisk -l /dev/mapper/asm1grid
	Disk /dev/mapper/asmt1grid: 10.7 GB, 10737745920 bytes
	255 heads, 63 sectors/track, 1305 cylinders
	Units = cylinders of 16065 * 512 = 8225280 bytes
	Disk /dev/mapper/asmt1grid doesn't contain a valid partition table

3) And my deconfigure clusterware and deinstall worked perfectly, no errors,

Where else could be the disk group information that installer is looking and failing?

Well, there is nothing much to scratch out our head, its just there in the lun headers and stayed there. So it wont use it any more we just need to scrap it out. To do the same dd command would be right choice.

          dd if=/dev/zero of=/dev/mapper/asm1grid bs=1048576 count=50

A little explanation, i am padding my /dev/mapper/asm1grid lun with zeros of 1 mb block size and count of 50, this just an ideal , if you want u can add the blocksize also (solaris depends on the size of volume, linux 4096)

Then our installation proceeds without any issue. hope this helps to you as well.

-Thanks
Geek DBA

RAC: Cluster Interconnect performance troubleshooting – Quick reference

A quick troubleshooting reference for cluster interconnect performance in Oracle RAC to Identify network and contention issues.

a) Check for "gc cr lost blocks" wait event in Automatic Workload Repository (AWR)/sysstats.

If found, check for these errors on the NIC:

        Dropped packets/fragments
        Buffer overflows
        Packet reassembly failures or timeouts
        TX/RX errors

Use these commands to find any errors:

        netstat -s
        Ifconfig -a
        ORADEBUG

b) Identify Interconnect performance from AWR.

Under Global Cache and Enqueue Services - Workload Characteristics

        Avg global cache cr block receive time (ms): should be <=15 ms

        Global Cache and Enqueue Services - Messaging Statistics

        Avg message sent queue time on ksxp (ms): should be <1 ms

        Under Interconnect Ping Latency Stats

        Avg Latency 8K msg should be close to Avg Latency 500B msg.

c) These wait events from AWR/sysstat can indicate contention related to RAC.

        GC current block busy
        GV cr block busy
        GC current buffer busy
        GC buffer busy acquire/release

d) These wait events in the AWR indicate that there might be a Hot Block that is causing these wait events. From the AWR Segment Statistics, you can find the objects.

        Enq:TX Index Contention
        Gc buffer busy
        Gc current block busy
        Gc current split

e) This issue will be noticed if multiple sessions are inserting into a single object or are using a sequence, and the indexed column is sequentially increasing. To address the specific issues:

Identify the indexes and Global Hash Partition them.
Increase the Sequence Cache if ordering is not a problem. –

Thanks

Geek DBA

Quick Question: What are the kernel parameters to tune cluster interconnect performance

To tune the cluster interconnect performance typically the messages buffers size that should transmit between the nodes, the following kernel parameters should set. 256K is fairly adequate.

net.core.rmem_default = 262144
net.core.rmem_max = 262144
net.core.wmem_default = 262144
net.core.wmem_max = 262144

-Thanks

Geek DBA

Difference between object_id and data_object_id from dba_objects

Hello,

I have got two questions from my friend.

1) What is the difference between object_id and data_object_id?

2) Will be there any performance impact if data_object_id changes?

For the question #1,

    a) Object_id represents the primary_key or the unique identifier for the object
    b) data_object_id is the pointer (foreign key courtesy TOM) to the data segment (physical segment)

Since some objects like synonyms etc does not have segments they will not have data_object_id's which supports the argument (b) above.

Now lets move on, both will be same or different

Well, the object_id and data_object_id will be same initially, The data object id is assigned to that segment and any kind of change done on the physical segment would lead to give a change in that number for data_object_id and data_object_id will be changed for following cases

      a) Alter table or alter index
      b) Alter table exchange partition
      c) Truncate table 
      d) Alter table move or using redefinition package
      e) if the table is part of a cluster, data_object_id points to cluster segment

For question #2, will be there any performance impact?
a) Answer will be NO, (according to my knowledge, if anyone has got any pointers please comment)
b) A question in my mind, thinking logically when you use rowid in your app (a combination of datafile,block,row) shall it affect? Again data_object_id
is again a kind of logical representation number like foreign key so may not affect the rowid based calls

Any pointers to know more about on this would be most welcome.

-Thanks
Geek DBA

Difference between failed_login_attempts and sec_max_failed_login_attempts

Hello,

As we all know, We have a profile setting called failed_login_attempts, where and which when a user inputs invalid password for a given threshold default is (unlimited) , then the user will be locked. Means the profile has to part with each and every user.

Similarly, if you want to blockade/drop the connections for all users where wrong password attempt reached to defined threshold, oracle 11g introduced a parameter called sec_max_failed_login_attempts (default 10) which protects the database from malfunction access. This feature is especially used when hacker want to access the database with different users (like default users) and try to give random wrong passwords. If and if the hacker failed to crack the password for different users for 10 times then connectivity will be dropped.

This way we can protect the database from malfunction access at database level.

-Thanks
Geek DBA

Recover datafile that accidentally deleted

Thanks to Martin's blog, which show us the data file recovery when it accidentally dropped.
(apologies i lost the original link)

***** DO IT ON YOUR OWN RISK *****

Created tablespace and users and some tables.

	SQL> create tablespace test datafile '/oradata/TESTTest01.dbf' size 100m;
	Tablespace created.

	SQL> create user test identified by test default tablespace test;
	User created.

	SQL> grant connect,resource to test;
	Grant succeeded.

	SQL> grant dba to test;
	Grant succeeded.

	SQL> create table t1 as select * from dba_objects;
	Table created.

	SQL> create table t2 as select * from dba_users;
	Table created.

	SQL> conn test/test
	Connected

	SQL> select count(*) from t1;
	35000

	SQL> select count(*) from t2;
	46

Removed the datafile

	rm -f /oradata/TESTTest01.dbf

Datafile showing recover status as its got deleted

	SQL> Select file#,name,status from v$datafile;


	     FILE#	NAME				STATUS
	----------	-------------------------- 	--------------- 
		38	/oradata/TESTTest01.dbf		RECOVER

Check for the dbwr process ID, The process id for db writer process is 14906, where the dbwriter locks the file headers and useful for this diagnosis.

	
	ps -eaf | grep dbw*_
	oracle   14306     1  0 Apr03 ?        00:02:16 ora_dbw0_

using lsof, find which files the dbwr process accessing or its hanged on,

	[oracle@TESTRAC1]/users/oracle # lsof -p 14306

	COMMAND   PID   USER   FD   TYPE    DEVICE   SIZE/OFF    NODE NAME
	oracle  14306 oracle  cwd    DIR     253,7       4096  615765 /u01/app/oracle/ora112/dbs
	oracle  14306 oracle  rtd    DIR     104,2       4096       2 /
	oracle  14306 oracle  txt    REG     253,7  218550485 1130499 /u01/app/oracle/ora112/bin/oracle
	oracle  14306 oracle  DEL    REG      0,13            2228239 /SYSV48ff1ce4
	oracle  14306 oracle  mem    CHR       1,5               3894 /dev/zero
	oracle  14306 oracle  mem    REG     104,2      35688   83628 /usr/lib64/libnuma.so.1
	oracle  14306 oracle  mem    REG     104,2    1661454   16169 /lib64/libc-2.11.1.so
	oracle  14306 oracle  mem    REG     104,2     108213   16180 /lib64/libnsl-2.11.1.so
	oracle  14306 oracle  mem    REG     104,2     135646   16195 /lib64/libpthread-2.11.1.so
	oracle  14306 oracle  mem    REG     104,2     388274   16177 /lib64/libm-2.11.1.so
	oracle  14306 oracle  mem    REG     104,2      19114   16175 /lib64/libdl-2.11.1.so
	oracle  14306 oracle  mem    REG     104,2       5560   16214 /lib64/libaio.so.1.0.1
	oracle  14306 oracle  mem    REG     253,7     150599   84476 /u01/app/oracle/ora112/lib/libocrutl11.so
	oracle  14306 oracle  mem    REG     253,7    3295575   84394 /u01/app/oracle/ora112/lib/libocrb11.so
	oracle  14306 oracle  mem    REG     253,7    1559829   84362 /u01/app/oracle/ora112/lib/libocr11.so
	oracle  14306 oracle  mem    REG     253,7      12763   84499 /u01/app/oracle/ora112/lib/libskgxn2.so
	oracle  14306 oracle  mem    REG     253,7   16067428  402373 /u01/app/oracle/ora112/lib/libhasgen11.so
	oracle  14306 oracle  mem    REG     253,7     161828   84503 /u01/app/oracle/ora112/lib/libdbcfg11.so
	oracle  14306 oracle  mem    REG     253,7     216789   84310 /u01/app/oracle/ora112/lib/libclsra11.so
	oracle  14306 oracle  mem    REG     253,7    7925240   84384 /u01/app/oracle/ora112/lib/libnnz11.so
	oracle  14306 oracle  mem    REG     104,2      47206   16199 /lib64/librt-2.11.1.so
	oracle  14306 oracle  mem    REG     253,7     993144   84391 /u01/app/oracle/ora112/lib/libskgxp11.so
	oracle  14306 oracle  mem    REG     253,7     532417   84323 /u01/app/oracle/ora112/lib/libcell11.so
	oracle  14306 oracle  mem    REG     104,2     149797   16162 /lib64/ld-2.11.1.so
	oracle  14306 oracle  DEL    REG     253,6              73768 /var/run/nscd/dbQpkhsj
	oracle  14306 oracle  mem    REG     253,7      12315   84306 /u01/app/oracle/ora112/lib/libodm11.so
	oracle  14306 oracle    0r   CHR       1,3        0t0    3883 /dev/null
	oracle  14306 oracle    1w   CHR       1,3        0t0    3883 /dev/null
	oracle  14306 oracle    2w   CHR       1,3        0t0    3883 /dev/null
	oracle  14306 oracle    3r   CHR       1,3        0t0    3883 /dev/null
	oracle  14306 oracle    4r   CHR       1,3        0t0    3883 /dev/null
	oracle  14306 oracle    5r   CHR       1,3        0t0    3883 /dev/null
	oracle  14306 oracle    7r   CHR       1,3        0t0    3883 /dev/null
	oracle  14306 oracle    8r   CHR       1,3        0t0    3883 /dev/null
	oracle  14306 oracle    9r   CHR       1,3        0t0    3883 /dev/null
	oracle  14306 oracle   10r   CHR       1,5        0t0    3894 /dev/zero
	oracle  14306 oracle   11r   CHR       1,5        0t0    3894 /dev/zero
	oracle  14306 oracle   13r   REG     253,7    1092608  583904 /u01/app/oracle/ora112/rdbms/mesg/oraus.msb
	oracle  14306 oracle   14r   DIR       0,3          0  137759 /proc/14306/fd
	oracle  14306 oracle   15r   CHR       1,5        0t0    3894 /dev/zero
	oracle  14306 oracle   18r   REG     253,7    1092608  583904 /u01/app/oracle/ora112/rdbms/mesg/oraus.msb
	oracle  14306 oracle  256u   REG 199,17000   10108928     559 /oradata/TEST/Control1.ctl
	oracle  14306 oracle  257u   REG 199,17000   10108928     560 /oradata/TEST/Control2.ctl
	oracle  14306 oracle  258uW  REG 199,17000  419438592     138 /oradata/TEST/system.dbf
	oracle  14306 oracle  259uW  REG 199,17000  419438592     141 /oradata/TEST/sysaux.dbf
	oracle  14306 oracle  260uW  REG 199,17000 1992302592     142 /oradata/TEST/rollbacks.dbf
	oracle  14306 oracle  295uW  REG 199,17000 1677729792     143 /oradata/TEST/temp.dbf
	oracle  14306 oracle  296uW  REG 199,17000  104865792  260497 /oradata/TEST/Test01.dbf (deleted)

Did you observed?, the last line shows the file we just deleted. Lets check in the process for 296

	[oracle@TESTRAC1] cd /proc/14306/fd #

Cat the process 296 to a orbitory file

	[oracle@TESTRAC1]/proc/14306/fd # cat 296 > /tmp/Test01.dbf

File got created in /tmp

	[oracle@TESTRAC1]/proc/14306/fd # ls -ltr /tmp/Test01.dbf
	-rw-rw-r-- 1 oracle oracle 104865792 2013-04-24 08:45 /tmp/Test01.dbf

Make the old file offline drop

	SQL> alter database datafile '/oradata/TEST/Test01.dbf' offline drop;
	Database altered.

Rename the old file name to new file (i.e in new location)

	SQL> alter database rename file '/oradata/TEST/Test01.dbf' to '/tmp/Test01.dbf';

	SQL> alter database datafile '/tmp/Test01.dbf' online;
	alter database datafile '/tmp/Test01.dbf' online
	*
	ERROR at line 1:
	ORA-01113: file 38 needs media recovery
	ORA-01110: data file 38: '/tmp/Test01.dbf'

Recover the datafile

	SQL> recover datafile '/tmp/Test01.dbf';
	Media recovery complete.

Online the datafile

	SQL> alter database datafile '/tmp/Test01.dbf' online;
	Database altered.

Check the status

	SQL> select file#,name,status from v$datafile;


	     FILE#	NAME				STATUS
	-----------	---------------------------	----------------------------------
		38	/tmp/Test01.dbf			ONLINE

Bingo your file is back, But wait, this may work only when your database is not bounced after the deletion of the file.

Hope this helps
-Thanks
Geek DBA

RAC: Timezone differences in CRS and Database logs

Hello All,

We have got an issue about time zone differences in alert logs , crs logs with system timestamp and with system timestamp.

	 System date time 
	[oracle@TESTRAC1]/ # date
	Tue Apr 24 04:10:11 CEST 2013

But database Alert log shows,

  	Thread 1 advanced to log sequence 235 (LGWR switch)
	Current log# 11 seq# 235 mem# 0: +DATA01/db/onlinelog/redot01g01m01.rdo
	Current log# 11 seq# 235 mem# 1: +FRA01/db/onlinelog/redot01g01m02.rdo
	Fri Apr 24 45:04:50 2013
	Archived Log entry 475 added for thread 1 sequence 234 ID 0x984ece99 dest 1:
	[oracle@TESTRAC1 trace]$ date
	Fri Apr 24 06:46:16 IST 2013
    	 Note: Two hours ahead of the database time and the host time

Few checks before start actual resolution

1) Our database timezone showing correct timestamp only.

	SQL> select systimestamp from dual;
	SYSTIMESTAMP
	---------------------------------------------------------------------------
	24-APR-13 04.13.03.105848 AM +02:00

2) Hardware clock checked and its fine

	[oracle@TESTRAC1] # hwclock
	Tue 24 Apr 2013 04:13:43 AM CEST  -0.072343 seconds

3) Ran the cluster verification utility for timesynchronisation issues and its passed

        ./cluvfy cluvfy comp clocksync

	Verifying Clock Synchronization across the cluster nodes
	Checking if Clusterware is installed on all nodes...
	Check of Clusterware install passed

	Checking if CTSS Resource is running on all nodes...
	Check: CTSS Resource running on all nodes
	  Node Name                             Status
	  ------------------------------------  ------------------------
	  TESTRAC1                          	passed
	  TESTRAC2                          	passed
	Result: CTSS resource check passed


	Querying CTSS for time offset on all nodes...
	Result: Query of CTSS for time offset passed

	Check CTSS state started...
	Check: CTSS state
	  Node Name                             State
	  ------------------------------------  ------------------------
	  TESTRAC1                          	Observer
	  TESTRAC2                          	Observer
	CTSS is in Observer state. Switching over to clock synchronization checks using NTP
	Starting Clock synchronization checks using Network Time Protocol(NTP)...

	NTP Configuration file check started...
	The NTP configuration file "/etc/ntp.conf" is available on all nodes
	NTP Configuration file check passed

	Checking daemon liveness...

	Check: Liveness for "ntpd"
	  Node Name                             Running?
	  ------------------------------------  ------------------------
	  TESTRAC1                          	yes
	  TESTRAC2                         	 yes
	Result: Liveness check passed for "ntpd"
	Check for NTP daemon or service alive passed on all nodes

	Checking NTP daemon command line for slewing option "-x"
	Check: NTP daemon command line
	  Node Name                             Slewing Option Set?
	  ------------------------------------  ------------------------
	  TESTRAC1                         	 yes
	  TESTRAC2                         	 yes
	Result:
	NTP daemon slewing option check passed

	Checking NTP daemon's boot time configuration, in file "/etc/sysconfig/ntpd", for slewing option "-x"

	Check: NTP daemon's boot time configuration
	  Node Name                             Slewing Option Set?
	  ------------------------------------  ------------------------
	  TESTRAC1                        	  yes
	  TESTRAC2                         	 yes
	Result:
	NTP daemon's boot time configuration check for slewing option passed

	Checking whether NTP daemon or service is using UDP port 123 on all nodes

	Check for NTP daemon or service using UDP port 123
	  Node Name                             Port Open?
	  ------------------------------------  ------------------------
	  TESTRAC1                         	 yes
	  TESTRAC2                        	  yes

	NTP common Time Server Check started...
	NTP Time Server ".GPS." is common to all nodes on which the NTP daemon is running
	Check of common NTP Time Server passed

	Clock time offset check from NTP Time Server started...
	Checking on nodes "[TESTRAC1, TESTRAC2]"...
	Check: Clock time offset from NTP Time Server

	Time Server: .GPS.
	Time Offset Limit: 1000.0 msecs
	  Node Name     Time Offset               Status
	  ------------  ------------------------  ------------------------
	  TESTRAC1 	 0.125                     passed
	  TESTRAC2  	0.274                     passed
	Time Server ".GPS." has time offsets that are within permissible limits for nodes "[TESTRAC1, TESTRAC2]".
	Clock time offset check passed
	Result: Clock synchronization check using Network Time Protocol(NTP) passed
	Oracle Cluster Time Synchronization Services check passed
	Verification of Clock Synchronization across the cluster nodes was successful.

4) CTSSD is in observed mode, means our cluster using ntp

So everything is intact, but where is the problem, here you go about the issue and its resolution.

Resolution Steps

This is due to the installation of cluster using a stack that we have in place in our company where that is built in HK
and the timezone parameter set as HK and Oracle install maintains a file called s_crsconfig_TESTRAC1_env for timezone setting for node.
We just run a script while input a parameter file for cluster installation, it will install the crs automatically for given nodes.
But the parameter hardcoded was in HK timezone and where in we installed for India.

1) Check the time zone parameter of the linux operating system

	[root@TESTRAC1 install]# cat /etc/sysconfig/clock
	ZONE="Asia/Kolkata"
	UTC=false
	[root@TESTRAC1 install]#

2) Chek the s_crsconfig_TESTRAC1_env.txt file values, make sure the OS and this Time Zone entries are same.

	[root@TESTRAC1 ~]# cd /u01/gi/oragrid/grid/11.2.0.3/crs/install

	[root@TESTRAC1 install]# cat s_crsconfig_TESTRAC1_env.txt
	### This file can be used to modify the NLS_LANG environment variable, which det
	### For example, a new charset can be configured by setting NLS_LANG=JAPANESE_JA
	### Do not modify this file except to change NLS_LANG, or under the direction of

	TZ=Asia/Hong_Kong
	NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
	TNS_ADMIN=
	ORACLE_BASE=
	[root@TESTRAC1 install]#

	NOTE: Here OS Time Zone is {Asia/Kolkata} and CRS timezone is {Asia/Hong_Kong}, now we need to update the CRS timezone as {Asia/Kolkata}

3) Take the backup of s_crsconfig_TESTRAC1_env.txt file and update the TZ value as OS timezone.

	[root@TESTRAC1 install]# cp s_crsconfig_TESTRAC1_env.txt s_crsconfig_TESTRAC1_env.txt_bkp
	root@TESTRAC1 install]# vi s_crsconfig_TESTRAC1_env.txt
	[root@TESTRAC1 install]# 
	[root@TESTRAC1 install]# cat s_crsconfig_TESTRAC1_env.txt
	### This file can be used to modify the NLS_LANG environment variable, which determines the charset to be used for messages.
	### For example, a new charset can be configured by setting NLS_LANG=JAPANESE_JAPAN.UTF8
	### Do not modify this file except to change NLS_LANG, or under the direction of Oracle Support Services

	TZ=Asia/Kolkata 
	NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
	TNS_ADMIN=
	ORACLE_BASE=
	[root@TESTRAC1 install]#

4) Bounce the HAS on 2 nodes one after other

	[root@TESTRAC1 install]# cd /u01/gi/oragrid/grid/11.2.0.3/bin

	[root@TESTRAC1 bin]# ./crsctl stop has

	[root@TESTRAC1 bin]# ./crsctl start has
	CRS-4123: Oracle High Availability Services has been started.

Post Checks
On node 1
~~~~~~~~
Check the time of the OS and Database and make sure both are same.

	[oracle@TESTRAC1]/ # date
	Tue Apr 30 06:10:11 CEST 2013

	Thread 1 advanced to log sequence 489 (LGWR switch)
	  .....
	Fri Apr 26 06:11:50 2013
	...

Same steps for 2nd node as well.

Hope this helps.

-Thanks
Geek DBA

Time to read few good posts from other blogs

Dear Readers,

We have few good posts in Oracle community this week and here are they, and very interesting though

Jonathan Lewis on Systimestamp puzzles:-

systimestamp

Oracle optimizer blog comes up with method_opt parameter significance in dbms_stats

How does the METHOD_OPT parameter work-

Randolf Geist on

ASM AU Size And LMT AUTOALLOCATE

Arup Nanda on streams not only for streams,

Streams Pool is only for Streams- Think Again!

Tanel Poder shared the good stuff on Oracle upgrades (very huge pdf 500 slides)

Oracle Database 11.2 Upgrade and Migration slides

Apologies , as I am busy with some other stuff and could not post any new from last couple of weeks, but stay expected will be posting a series of posts soon which are underway.

Keep busy reading above, will share some soon. 🙂

-Thanks

Geek DBA

Identifying clustername in Oracle CRS

To identify the clustername in Oracle CRS use

$CRS_HOME/bin/cemutlo –n

Use OCRDUMP
which will create a text file called OCRDUMPFILE open that file and look for this entry

#ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk ‘{print $3}’

In the script $ORA_CRS_HOME/install/rootconfig:

ORA_CRS_HOME=/oracle/product/CRS
CRS_ORACLE_OWNER = Oracle
...
CRS_CLUSTER_NAME = crs

In the the $ORA_CRS_HOME/cdata,

Under cdata, there will be a directory with your clustername

Take an export backup of ocrfile,

#ocrconfig -export /backup/ocr_exp.dmp -s online

It will have the file created under /backup and then look for SYSTEM.css.clustername value from that file. Alternatively you can use “strings” command to read the file /backup/ocr_exp.dmp and then find out the value.

Update by a reader:-

the same can be found by using "$CRS_HOME/bin/olsnodes -c"

How to Query the Cluster Name [ID 577300.1]

[sam id=2]

High cpu usage with library cache locks

We have got a call from team asking about a high cpu usage, and here is our hunting down

load averages:  97.5,  98.8, 100.0;  up 11+22:24:01   01:14:52
410 processes: 276 sleeping, 118 running, 3 zombie, 13 on cpu
CPU states:  0.0% idle, 89.6% user, 10.4% kernel,  0.0% iowait,  0.0% swap
Memory: 32G phys mem, 17G free mem, 20G total swap, 20G free swap

   PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
 24735 oracle     1  44    0 4266M 1834M run      8:13  1.17% oracle
 23492 oracle     1  47    0 4266M 1758M sleep    8:59  1.01% oracle
 25364 oracle     1  54    0 4266M 1927M sleep    9:55  1.00% oracle
 25442 oracle     1  47    0 4267M 1943M run     18:11  1.00% oracle
 25474 oracle     1  44    0 4267M 1943M run      9:45  0.99% oracle
 22904 oracle     1  35    0 4266M 1723M run     15:15  0.96% oracle
 24749 oracle     1  44    0 4266M 1834M run     10:21  0.95% oracle
 21442 oracle     1  37    0 4266M 1586M run     17:08  0.93% oracle
 24852 oracle     1  46    0 4266M 1854M run     11:18  0.92% oracle
 23088 oracle     1  42    0 4266M 1735M run     10:47  0.91% oracle
 23871 oracle     1  47    0 4266M 1766M run     16:52  0.89% oracle
 21093 oracle     1  44    0 4266M 1383M run      7:53  0.88% oracle
 20981 oracle     1  37    0 4266M 1366M run     16:03  0.88% oracle
 22158 oracle     1  47    0 4266M 1634M sleep    9:09  0.88% oracle
 25718 oracle     1  35    0 4266M 1967M run     18:05  0.86% oracle

We understand that from above

 
	89.6% CPU usage by user process
	10.4% by kernel

By looking at command, these are all Oracle process, but in my database server I have four database but how to identify the cpu usage from which database,

Just use Capital C switch while you run top command and then P

It will turn out something like below,

 
 CPU states:  0.0% idle, 89.7% user, 10.3% kernel,  0.0% iowait,  0.0% swap
 Memory: 32G phys mem, 17G free mem, 20G total swap, 20G free swap
 
	    PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
	  22384 oracle     1  49    0 4266M 1686M run     17:26  1.15% oraclePROD (LOCAL=NO)
	  25762 oracle     1  35    0 4266M 1970M run     12:41  0.94% oraclePROD (LOCAL=NO)
	  25157 oracle     1  44    0 4266M 1898M sleep   12:32  0.92% oraclePROD (LOCAL=NO)
	  22709 oracle     1  57    0 4266M 1694M cpu/1    9:44  0.90% oraclePROD (LOCAL=NO)
	  21527 oracle     1  44    0 4266M 1595M run     11:46  0.87% oraclePROD (LOCAL=NO)
	  21690 oracle     1  55    0 4266M 1606M run     17:43  0.87% oraclePROD (LOCAL=NO)
	  21706 oracle     1  55    0 4267M 1611M run     17:13  0.86% oraclePROD (LOCAL=NO)
	  25732 oracle     1  49    0 4266M 1966M run     16:18  0.86% oraclePROD (LOCAL=NO)
	  20981 oracle     1  49    0 4266M 1366M run     16:14  0.85% oraclePROD (LOCAL=NO)
	  21093 oracle     1  49    0 4266M 1383M run      8:03  0.85% oraclePROD (LOCAL=NO)
	  24427 oracle     1  46    0 4266M 1815M run     10:00  0.84% oraclePROD (LOCAL=NO)
	  24132 oracle     1  44    0 4266M 1782M sleep   11:44  0.83% oraclePROD (LOCAL=NO)
	  25430 oracle     1  45    0 4266M 1938M sleep   12:32  0.83% oraclePROD (LOCAL=NO)
	  22667 oracle     1  44    0 4266M 1691M run     11:05  0.82% oraclePROD (LOCAL=NO)
	 25317 oracle     1  53    0 4266M 1926M run      9:56  0.80% oraclePROD (LOCAL=NO)

So these process consuming top CPU were belongs to PROD database only,

But wait see that, CPU usage all are well below 2% but how come utilizing or burning more cpu.

Let's look at how many sessions or process, well ideally many process using 1-2% may consume more, just a thought,

 
	 $ps -eaf | grep wc -l | grep PROD
	 300

300, looks fair... my process /sessions count was set at 1000.

But whats causing, from OS level this is what we can dig, lets go into the database and see, what this processes/sessions are doing

For this, pretty straight view i will use is v$session

 
	SQL> select sid,serial#,logon_time,status,last_call_et/60,sql_id,event,blocking_session from v$session where username!='SYS' and last_call_et/60 > 0 order by logon_time;

	Output edited for breivity:- 
	  SID SERIAL# LOGON_TIME      STATUS     LAST_CALL_ET/60 SQL_ID          EVENT            BLOCKING_SESSION
	----- ------- --------------- ---------- --------------- --------------- ---------------- ----------------
	 3261     428 31-MAR-13       ACTIVE          166.033333 a049q1sr8wndm   resmgr:cpu quant
	 3234       1 31-MAR-13       ACTIVE          166.033333 96qsds85jm6w4   library cache lo
	 3240     216 31-MAR-13       ACTIVE          166.033333 a049q1sr8wndm   library cache lo
	 3222      10 31-MAR-13       ACTIVE          166.033333 96qsds85jm6w4   library cache: m
	 3232     527 31-MAR-13       ACTIVE          165.933333 96qsds85jm6w4   library cache lo
	 3213     211 31-MAR-13       ACTIVE          166.033333 96qsds85jm6w4   library cache lo
	 3209     147 31-MAR-13       ACTIVE          166.033333 96qsds85jm6w4   resmgr:cpu quant
	 3202      71 31-MAR-13       ACTIVE          166.033333 96qsds85jm6w4   resmgr:cpu quant
	 3200      10 31-MAR-13       ACTIVE          154.583333 a049q1sr8wndm   library cache: m
	 3194       7 31-MAR-13       ACTIVE          166.033333 96qsds85jm6w4   library cache lo
	 3184       7 31-MAR-13       ACTIVE          166.033333 96qsds85jm6w4   resmgr:cpu quant
	 ....
	 3015     597 31-MAR-13       ACTIVE          166.033333 a049q1sr8wndm   library cache lo
	 3008     488 31-MAR-13       ACTIVE          166.033333 a049q1sr8wndm   library cache lo
	 3002       9 31-MAR-13       ACTIVE          166.016667 96qsds85jm6w4   library cache: m
	 3000      80 31-MAR-13       ACTIVE          166.033333 a049q1sr8wndm   library cache lo
	 ...
	 2969      85 31-MAR-13       ACTIVE          79.6333333 a049q1sr8wndm   resmgr:cpu quant
	 2960       2 31-MAR-13       ACTIVE          166.033333 a049q1sr8wndm   library cache lo
	 2957      38 31-MAR-13       ACTIVE          166.033333 a049q1sr8wndm   library cache lo
	 2951      20 31-MAR-13       ACTIVE          166.033333 a049q1sr8wndm   library cache lo
	 ....
	 2707     354 31-MAR-13       ACTIVE          163.833333 a049q1sr8wndm   library cache lo
	 2684      14 31-MAR-13       ACTIVE          166.016667 96qsds85jm6w4   cursor: pin S
	 2683     130 31-MAR-13       ACTIVE          166.016667 96qsds85jm6w4   library cache: m
	 2677       7 31-MAR-13       ACTIVE          166.016667 96qsds85jm6w4   library cache lo
	 2673     183 31-MAR-13       ACTIVE          166.016667 96qsds85jm6w4   library cache: m
	 ...
	 2611      27 31-MAR-13       ACTIVE          166.016667 a049q1sr8wndm   resmgr:cpu quant
	 2605      55 31-MAR-13       ACTIVE          166.016667 a049q1sr8wndm   library cache lo
	 2604      93 31-MAR-13       ACTIVE          166.016667 96qsds85jm6w4   resmgr:cpu quant
	 2597     382 31-MAR-13       ACTIVE          166.016667 a049q1sr8wndm   library cache: m
	 2587      25 31-MAR-13       ACTIVE          166.016667 a049q1sr8wndm   resmgr:cpu quant
	 2583     139 31-MAR-13       ACTIVE          166.016667 96qsds85jm6w4   library cache lo
	 2571      95 31-MAR-13       ACTIVE          166.016667 a049q1sr8wndm   library cache: m
         2578     338 31-MAR-13       ACTIVE          166.016667 96

You can also keep blocking_session,row_wait_obj# columns to see the sessions which are blocking and the for what object it was.

Viola, I have got library cache locks from 31-Mar, which apparently burn or turns on keep knocking cpu in smaller times but stay constant and that is the reason we are seeing high cpu load constantly.

Library cache locks and libary cache mutex s, are wait events note for the sessions/users to wait for the object to be available,

Further, if you look at some wait event are showing resmgr:cpu quantum, the session is waiting for cpu to be available to process its request.

To identify which objects are in contention that part of another post already written here -->
http://db.geeksinsight.com/2013/02/14/resolving-library-cache-mutex-x/
For now, we got to know that application has recyled on 31-Mar where in the database sessions were hanged on from that time and not get cleared,

We have cleared/killed those sessions and apparently the cpu usage came down.

In many cases, tracking down cpu usage by oracle is not pretty straight and it depends on case to case. Some times you may have backups running and causing more cpu.

Hope this helps,
-Thanks
Geek DBA

All about Database Administration, Tips & Tricks

New Features for DBA’s

Subscribe to Posts by Email

Subscriber Count

Disclaimer

Recent Posts

Categories

Archives

Pages

Installer has detected that the diskgroup name provided already exists on the system.

RAC: Cluster Interconnect performance troubleshooting – Quick reference

Quick Question: What are the kernel parameters to tune cluster interconnect performance

Difference between object_id and data_object_id from dba_objects

Difference between failed_login_attempts and sec_max_failed_login_attempts

Recover datafile that accidentally deleted

RAC: Timezone differences in CRS and Database logs

Time to read few good posts from other blogs

Identifying clustername in Oracle CRS

High cpu usage with library cache locks

New Features for DBA’s

Follow Me!!!

Subscribe to Posts by Email

Subscriber Count

Disclaimer

Recent Posts

Categories

Archives

Tags

Pages