Disclaimer All information is offered in good faith and in the hope that it may be of use for educational purpose and for Database community purpose, but is not guaranteed to be correct, up to date or suitable for any particular purpose. db.geeksinsight.com accepts no liability in respect of this information or its use.
This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content and if notify any such I am happy to remove.
Product and company names mentioned in this website may be the trademarks of their respective owners and published here for informational purpose only.
This is my personal blog. The views expressed on these pages are mine and learnt from other blogs and bloggers and to enhance and support the DBA community and this web blog does not represent the thoughts, intentions, plans or strategies of my current employer nor the Oracle and its affiliates or any other companies. And this website does not offer or take profit for providing these content and this is purely non-profit and for educational purpose only.
If you see any issues with Content and copy write issues, I am happy to remove if you notify me.
Contact Geek DBA Team, via geeksinsights@gmail.com
|
Hello,
After a failed crs installation, we have removed everything by using deconfigure and deinstall utility and freshly started the grid infrastruction installation, and we got this error
Checking Temp space: must be greater than 120 MB. Actual 2395 MB Passed
Checking swap space: must be greater than 150 MB. Actual 32767 MB Passed
Preparing to launch Oracle Universal Installer from /tmp/OraInstall2013-05-13_05-23-16AM. Please wait ...[FATAL] [INS-30516] Please specify unique disk groups.
CAUSE: Installer has detected that the diskgroup name provided already exists on the system.
ACTION: Specify different disk group.
Some background:-
1) Not using asm libraries or ASM lib disk we are just providing luns with udev rules
[root@**** rsp_files]# cd /dev/oracleasm/disks
[root@**** rsp_files]# ls -ltr
2) Partitions does not contain any valid partition table ,means no information written on the partitions
fdisk -l /dev/mapper/asm1grid
Disk /dev/mapper/asmt1grid: 10.7 GB, 10737745920 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk /dev/mapper/asmt1grid doesn't contain a valid partition table
3) And my deconfigure clusterware and deinstall worked perfectly, no errors,
Where else could be the disk group information that installer is looking and failing?
Well, there is nothing much to scratch out our head, its just there in the lun headers and stayed there. So it wont use it any more we just need to scrap it out. To do the same dd command would be right choice.
dd if=/dev/zero of=/dev/mapper/asm1grid bs=1048576 count=50
A little explanation, i am padding my /dev/mapper/asm1grid lun with zeros of 1 mb block size and count of 50, this just an ideal , if you want u can add the blocksize also (solaris depends on the size of volume, linux 4096)
Then our installation proceeds without any issue. hope this helps to you as well.
-Thanks
Geek DBA
A quick troubleshooting reference for cluster interconnect performance in Oracle RAC to Identify network and contention issues.
a) Check for "gc cr lost blocks" wait event in Automatic Workload Repository (AWR)/sysstats.
If found, check for these errors on the NIC:
Dropped packets/fragments Buffer overflows Packet reassembly failures or timeouts TX/RX errors
Use these commands to find any errors:
netstat -s Ifconfig -a ORADEBUG
b) Identify Interconnect performance from AWR.
Under Global Cache and Enqueue Services - Workload Characteristics
Avg global cache cr block receive time (ms): should be <=15 ms
Global Cache and Enqueue Services - Messaging Statistics
Avg message sent queue time on ksxp (ms): should be <1 ms
Under Interconnect Ping Latency Stats
Avg Latency 8K msg should be close to Avg Latency 500B msg.
c) These wait events from AWR/sysstat can indicate contention related to RAC.
GC current block busy GV cr block busy GC current buffer busy GC buffer busy acquire/release
d) These wait events in the AWR indicate that there might be a Hot Block that is causing these wait events. From the AWR Segment Statistics, you can find the objects.
Enq:TX Index Contention Gc buffer busy Gc current block busy Gc current split
e) This issue will be noticed if multiple sessions are inserting into a single object or are using a sequence, and the indexed column is sequentially increasing. To address the specific issues:
Identify the indexes and Global Hash Partition them. Increase the Sequence Cache if ordering is not a problem. β
Thanks
Geek DBA
To tune the cluster interconnect performance typically the messages buffers size that should transmit between the nodes, the following kernel parameters should set. 256K is fairly adequate.
net.core.rmem_default = 262144 net.core.rmem_max = 262144 net.core.wmem_default = 262144 net.core.wmem_max = 262144
-Thanks
Geek DBA
Hello,
I have got two questions from my friend.
1) What is the difference between object_id and data_object_id?
2) Will be there any performance impact if data_object_id changes?
For the question #1,
a) Object_id represents the primary_key or the unique identifier for the object
b) data_object_id is the pointer (foreign key courtesy TOM) to the data segment (physical segment)
Since some objects like synonyms etc does not have segments they will not have data_object_id's which supports the argument (b) above.
Now lets move on, both will be same or different
Well, the object_id and data_object_id will be same initially, The data object id is assigned to that segment and any kind of change done on the physical segment would lead to give a change in that number for data_object_id and data_object_id will be changed for following cases
a) Alter table or alter index
b) Alter table exchange partition
c) Truncate table
d) Alter table move or using redefinition package
e) if the table is part of a cluster, data_object_id points to cluster segment
For question #2, will be there any performance impact?
a) Answer will be NO, (according to my knowledge, if anyone has got any pointers please comment)
b) A question in my mind, thinking logically when you use rowid in your app (a combination of datafile,block,row) shall it affect? Again data_object_id
is again a kind of logical representation number like foreign key so may not affect the rowid based calls
Any pointers to know more about on this would be most welcome.
-Thanks
Geek DBA
Hello,
As we all know, We have a profile setting called failed_login_attempts, where and which when a user inputs invalid password for a given threshold default is (unlimited) , then the user will be locked. Means the profile has to part with each and every user.
Similarly, if you want to blockade/drop the connections for all users where wrong password attempt reached to defined threshold, oracle 11g introduced a parameter called sec_max_failed_login_attempts (default 10) which protects the database from malfunction access. This feature is especially used when hacker want to access the database with different users (like default users) and try to give random wrong passwords. If and if the hacker failed to crack the password for different users for 10 times then connectivity will be dropped.
This way we can protect the database from malfunction access at database level.
-Thanks
Geek DBA
Thanks to Martin's blog, which show us the data file recovery when it accidentally dropped.
(apologies i lost the original link)
***** DO IT ON YOUR OWN RISK *****
Created tablespace and users and some tables.
SQL> create tablespace test datafile '/oradata/TESTTest01.dbf' size 100m;
Tablespace created.
SQL> create user test identified by test default tablespace test;
User created.
SQL> grant connect,resource to test;
Grant succeeded.
SQL> grant dba to test;
Grant succeeded.
SQL> create table t1 as select * from dba_objects;
Table created.
SQL> create table t2 as select * from dba_users;
Table created.
SQL> conn test/test
Connected
SQL> select count(*) from t1;
35000
SQL> select count(*) from t2;
46
Removed the datafile
rm -f /oradata/TESTTest01.dbf
Datafile showing recover status as its got deleted
SQL> Select file#,name,status from v$datafile;
FILE# NAME STATUS
---------- -------------------------- ---------------
38 /oradata/TESTTest01.dbf RECOVER
Check for the dbwr process ID, The process id for db writer process is 14906, where the dbwriter locks the file headers and useful for this diagnosis.
ps -eaf | grep dbw*_
oracle 14306 1 0 Apr03 ? 00:02:16 ora_dbw0_
using lsof, find which files the dbwr process accessing or its hanged on,
[oracle@TESTRAC1]/users/oracle # lsof -p 14306
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
oracle 14306 oracle cwd DIR 253,7 4096 615765 /u01/app/oracle/ora112/dbs
oracle 14306 oracle rtd DIR 104,2 4096 2 /
oracle 14306 oracle txt REG 253,7 218550485 1130499 /u01/app/oracle/ora112/bin/oracle
oracle 14306 oracle DEL REG 0,13 2228239 /SYSV48ff1ce4
oracle 14306 oracle mem CHR 1,5 3894 /dev/zero
oracle 14306 oracle mem REG 104,2 35688 83628 /usr/lib64/libnuma.so.1
oracle 14306 oracle mem REG 104,2 1661454 16169 /lib64/libc-2.11.1.so
oracle 14306 oracle mem REG 104,2 108213 16180 /lib64/libnsl-2.11.1.so
oracle 14306 oracle mem REG 104,2 135646 16195 /lib64/libpthread-2.11.1.so
oracle 14306 oracle mem REG 104,2 388274 16177 /lib64/libm-2.11.1.so
oracle 14306 oracle mem REG 104,2 19114 16175 /lib64/libdl-2.11.1.so
oracle 14306 oracle mem REG 104,2 5560 16214 /lib64/libaio.so.1.0.1
oracle 14306 oracle mem REG 253,7 150599 84476 /u01/app/oracle/ora112/lib/libocrutl11.so
oracle 14306 oracle mem REG 253,7 3295575 84394 /u01/app/oracle/ora112/lib/libocrb11.so
oracle 14306 oracle mem REG 253,7 1559829 84362 /u01/app/oracle/ora112/lib/libocr11.so
oracle 14306 oracle mem REG 253,7 12763 84499 /u01/app/oracle/ora112/lib/libskgxn2.so
oracle 14306 oracle mem REG 253,7 16067428 402373 /u01/app/oracle/ora112/lib/libhasgen11.so
oracle 14306 oracle mem REG 253,7 161828 84503 /u01/app/oracle/ora112/lib/libdbcfg11.so
oracle 14306 oracle mem REG 253,7 216789 84310 /u01/app/oracle/ora112/lib/libclsra11.so
oracle 14306 oracle mem REG 253,7 7925240 84384 /u01/app/oracle/ora112/lib/libnnz11.so
oracle 14306 oracle mem REG 104,2 47206 16199 /lib64/librt-2.11.1.so
oracle 14306 oracle mem REG 253,7 993144 84391 /u01/app/oracle/ora112/lib/libskgxp11.so
oracle 14306 oracle mem REG 253,7 532417 84323 /u01/app/oracle/ora112/lib/libcell11.so
oracle 14306 oracle mem REG 104,2 149797 16162 /lib64/ld-2.11.1.so
oracle 14306 oracle DEL REG 253,6 73768 /var/run/nscd/dbQpkhsj
oracle 14306 oracle mem REG 253,7 12315 84306 /u01/app/oracle/ora112/lib/libodm11.so
oracle 14306 oracle 0r CHR 1,3 0t0 3883 /dev/null
oracle 14306 oracle 1w CHR 1,3 0t0 3883 /dev/null
oracle 14306 oracle 2w CHR 1,3 0t0 3883 /dev/null
oracle 14306 oracle 3r CHR 1,3 0t0 3883 /dev/null
oracle 14306 oracle 4r CHR 1,3 0t0 3883 /dev/null
oracle 14306 oracle 5r CHR 1,3 0t0 3883 /dev/null
oracle 14306 oracle 7r CHR 1,3 0t0 3883 /dev/null
oracle 14306 oracle 8r CHR 1,3 0t0 3883 /dev/null
oracle 14306 oracle 9r CHR 1,3 0t0 3883 /dev/null
oracle 14306 oracle 10r CHR 1,5 0t0 3894 /dev/zero
oracle 14306 oracle 11r CHR 1,5 0t0 3894 /dev/zero
oracle 14306 oracle 13r REG 253,7 1092608 583904 /u01/app/oracle/ora112/rdbms/mesg/oraus.msb
oracle 14306 oracle 14r DIR 0,3 0 137759 /proc/14306/fd
oracle 14306 oracle 15r CHR 1,5 0t0 3894 /dev/zero
oracle 14306 oracle 18r REG 253,7 1092608 583904 /u01/app/oracle/ora112/rdbms/mesg/oraus.msb
oracle 14306 oracle 256u REG 199,17000 10108928 559 /oradata/TEST/Control1.ctl
oracle 14306 oracle 257u REG 199,17000 10108928 560 /oradata/TEST/Control2.ctl
oracle 14306 oracle 258uW REG 199,17000 419438592 138 /oradata/TEST/system.dbf
oracle 14306 oracle 259uW REG 199,17000 419438592 141 /oradata/TEST/sysaux.dbf
oracle 14306 oracle 260uW REG 199,17000 1992302592 142 /oradata/TEST/rollbacks.dbf
oracle 14306 oracle 295uW REG 199,17000 1677729792 143 /oradata/TEST/temp.dbf
oracle 14306 oracle 296uW REG 199,17000 104865792 260497 /oradata/TEST/Test01.dbf (deleted)
Did you observed?, the last line shows the file we just deleted. Lets check in the process for 296
[oracle@TESTRAC1] cd /proc/14306/fd #
Cat the process 296 to a orbitory file
[oracle@TESTRAC1]/proc/14306/fd # cat 296 > /tmp/Test01.dbf
File got created in /tmp
[oracle@TESTRAC1]/proc/14306/fd # ls -ltr /tmp/Test01.dbf
-rw-rw-r-- 1 oracle oracle 104865792 2013-04-24 08:45 /tmp/Test01.dbf
Make the old file offline drop
SQL> alter database datafile '/oradata/TEST/Test01.dbf' offline drop;
Database altered.
Rename the old file name to new file (i.e in new location)
SQL> alter database rename file '/oradata/TEST/Test01.dbf' to '/tmp/Test01.dbf';
SQL> alter database datafile '/tmp/Test01.dbf' online;
alter database datafile '/tmp/Test01.dbf' online
*
ERROR at line 1:
ORA-01113: file 38 needs media recovery
ORA-01110: data file 38: '/tmp/Test01.dbf'
Recover the datafile
SQL> recover datafile '/tmp/Test01.dbf';
Media recovery complete.
Online the datafile
SQL> alter database datafile '/tmp/Test01.dbf' online;
Database altered.
Check the status
SQL> select file#,name,status from v$datafile;
FILE# NAME STATUS
----------- --------------------------- ----------------------------------
38 /tmp/Test01.dbf ONLINE
Bingo your file is back, But wait, this may work only when your database is not bounced after the deletion of the file.
Hope this helps
-Thanks
Geek DBA
Hello All,
We have got an issue about time zone differences in alert logs , crs logs with system timestamp and with system timestamp.
System date time
[oracle@TESTRAC1]/ # date
Tue Apr 24 04:10:11 CEST 2013
But database Alert log shows,
Thread 1 advanced to log sequence 235 (LGWR switch)
Current log# 11 seq# 235 mem# 0: +DATA01/db/onlinelog/redot01g01m01.rdo
Current log# 11 seq# 235 mem# 1: +FRA01/db/onlinelog/redot01g01m02.rdo
Fri Apr 24 45:04:50 2013
Archived Log entry 475 added for thread 1 sequence 234 ID 0x984ece99 dest 1:
[oracle@TESTRAC1 trace]$ date
Fri Apr 24 06:46:16 IST 2013
Note: Two hours ahead of the database time and the host time
Few checks before start actual resolution
1) Our database timezone showing correct timestamp only.
SQL> select systimestamp from dual;
SYSTIMESTAMP
---------------------------------------------------------------------------
24-APR-13 04.13.03.105848 AM +02:00
2) Hardware clock checked and its fine
[oracle@TESTRAC1] # hwclock
Tue 24 Apr 2013 04:13:43 AM CEST -0.072343 seconds
3) Ran the cluster verification utility for timesynchronisation issues and its passed
./cluvfy cluvfy comp clocksync
Verifying Clock Synchronization across the cluster nodes
Checking if Clusterware is installed on all nodes...
Check of Clusterware install passed
Checking if CTSS Resource is running on all nodes...
Check: CTSS Resource running on all nodes
Node Name Status
------------------------------------ ------------------------
TESTRAC1 passed
TESTRAC2 passed
Result: CTSS resource check passed
Querying CTSS for time offset on all nodes...
Result: Query of CTSS for time offset passed
Check CTSS state started...
Check: CTSS state
Node Name State
------------------------------------ ------------------------
TESTRAC1 Observer
TESTRAC2 Observer
CTSS is in Observer state. Switching over to clock synchronization checks using NTP
Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
The NTP configuration file "/etc/ntp.conf" is available on all nodes
NTP Configuration file check passed
Checking daemon liveness...
Check: Liveness for "ntpd"
Node Name Running?
------------------------------------ ------------------------
TESTRAC1 yes
TESTRAC2 yes
Result: Liveness check passed for "ntpd"
Check for NTP daemon or service alive passed on all nodes
Checking NTP daemon command line for slewing option "-x"
Check: NTP daemon command line
Node Name Slewing Option Set?
------------------------------------ ------------------------
TESTRAC1 yes
TESTRAC2 yes
Result:
NTP daemon slewing option check passed
Checking NTP daemon's boot time configuration, in file "/etc/sysconfig/ntpd", for slewing option "-x"
Check: NTP daemon's boot time configuration
Node Name Slewing Option Set?
------------------------------------ ------------------------
TESTRAC1 yes
TESTRAC2 yes
Result:
NTP daemon's boot time configuration check for slewing option passed
Checking whether NTP daemon or service is using UDP port 123 on all nodes
Check for NTP daemon or service using UDP port 123
Node Name Port Open?
------------------------------------ ------------------------
TESTRAC1 yes
TESTRAC2 yes
NTP common Time Server Check started...
NTP Time Server ".GPS." is common to all nodes on which the NTP daemon is running
Check of common NTP Time Server passed
Clock time offset check from NTP Time Server started...
Checking on nodes "[TESTRAC1, TESTRAC2]"...
Check: Clock time offset from NTP Time Server
Time Server: .GPS.
Time Offset Limit: 1000.0 msecs
Node Name Time Offset Status
------------ ------------------------ ------------------------
TESTRAC1 0.125 passed
TESTRAC2 0.274 passed
Time Server ".GPS." has time offsets that are within permissible limits for nodes "[TESTRAC1, TESTRAC2]".
Clock time offset check passed
Result: Clock synchronization check using Network Time Protocol(NTP) passed
Oracle Cluster Time Synchronization Services check passed
Verification of Clock Synchronization across the cluster nodes was successful.
4) CTSSD is in observed mode, means our cluster using ntp
So everything is intact, but where is the problem, here you go about the issue and its resolution.
Resolution Steps
This is due to the installation of cluster using a stack that we have in place in our company where that is built in HK
and the timezone parameter set as HK and Oracle install maintains a file called s_crsconfig_TESTRAC1_env for timezone setting for node.
We just run a script while input a parameter file for cluster installation, it will install the crs automatically for given nodes.
But the parameter hardcoded was in HK timezone and where in we installed for India.
1) Check the time zone parameter of the linux operating system
[root@TESTRAC1 install]# cat /etc/sysconfig/clock
ZONE="Asia/Kolkata"
UTC=false
[root@TESTRAC1 install]#
2) Chek the s_crsconfig_TESTRAC1_env.txt file values, make sure the OS and this Time Zone entries are same.
[root@TESTRAC1 ~]# cd /u01/gi/oragrid/grid/11.2.0.3/crs/install
[root@TESTRAC1 install]# cat s_crsconfig_TESTRAC1_env.txt
### This file can be used to modify the NLS_LANG environment variable, which det
### For example, a new charset can be configured by setting NLS_LANG=JAPANESE_JA
### Do not modify this file except to change NLS_LANG, or under the direction of
TZ=Asia/Hong_Kong
NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
TNS_ADMIN=
ORACLE_BASE=
[root@TESTRAC1 install]#
NOTE: Here OS Time Zone is {Asia/Kolkata} and CRS timezone is {Asia/Hong_Kong}, now we need to update the CRS timezone as {Asia/Kolkata}
3) Take the backup of s_crsconfig_TESTRAC1_env.txt file and update the TZ value as OS timezone.
[root@TESTRAC1 install]# cp s_crsconfig_TESTRAC1_env.txt s_crsconfig_TESTRAC1_env.txt_bkp
root@TESTRAC1 install]# vi s_crsconfig_TESTRAC1_env.txt
[root@TESTRAC1 install]#
[root@TESTRAC1 install]# cat s_crsconfig_TESTRAC1_env.txt
### This file can be used to modify the NLS_LANG environment variable, which determines the charset to be used for messages.
### For example, a new charset can be configured by setting NLS_LANG=JAPANESE_JAPAN.UTF8
### Do not modify this file except to change NLS_LANG, or under the direction of Oracle Support Services
TZ=Asia/Kolkata
NLS_LANG=AMERICAN_AMERICA.WE8ISO8859P1
TNS_ADMIN=
ORACLE_BASE=
[root@TESTRAC1 install]#
4) Bounce the HAS on 2 nodes one after other
[root@TESTRAC1 install]# cd /u01/gi/oragrid/grid/11.2.0.3/bin
[root@TESTRAC1 bin]# ./crsctl stop has
[root@TESTRAC1 bin]# ./crsctl start has
CRS-4123: Oracle High Availability Services has been started.
Post Checks
On node 1
~~~~~~~~
Check the time of the OS and Database and make sure both are same.
[oracle@TESTRAC1]/ # date
Tue Apr 30 06:10:11 CEST 2013
Thread 1 advanced to log sequence 489 (LGWR switch)
.....
Fri Apr 26 06:11:50 2013
...
Same steps for 2nd node as well.
Hope this helps.
-Thanks
Geek DBA
Dear Readers,
We have few good posts in Oracle community this week and here are they, and very interesting though
Jonathan Lewis on Systimestamp puzzles:-
systimestamp
Oracle optimizer blog comes up with method_opt parameter significance in dbms_stats
How does the METHOD_OPT parameter work-
Randolf Geist on
ASM AU Size And LMT AUTOALLOCATE
Arup Nanda on streams not only for streams,
Streams Pool is only for Streams- Think Again!
Tanel Poder shared the good stuff on Oracle upgrades (very huge pdf 500 slides)
Oracle Database 11.2 Upgrade and Migration slides
Apologies , as I am busy with some other stuff and could not post any new from last couple of weeks, but stay expected will be posting a series of posts soon which are underway.
Keep busy reading above, will share some soon. π
-Thanks
Geek DBA
To identify the clustername in Oracle CRS use
$CRS_HOME/bin/cemutlo βn
or
Use OCRDUMP which will create a text file called OCRDUMPFILE open that file and look for this entry
#ocrdump -stdout -keyname SYSTEM | grep -A 1 clustername | grep ORATEXT | awk β{print $3}β
or
In the script $ORA_CRS_HOME/install/rootconfig:
ORA_CRS_HOME=/oracle/product/CRS CRS_ORACLE_OWNER = Oracle ... CRS_CLUSTER_NAME = crs
or
In the the $ORA_CRS_HOME/cdata,
Under cdata, there will be a directory with your clustername
or
Take an export backup of ocrfile,
#ocrconfig -export /backup/ocr_exp.dmp -s online
It will have the file created under /backup and then look for SYSTEM.css.clustername value from that file. Alternatively you can use βstringsβ command to read the file /backup/ocr_exp.dmp and then find out the value.
Update by a reader:-
the same can be found by using "$CRS_HOME/bin/olsnodes -c"
How to Query the Cluster Name [ID 577300.1]
[sam id=2]
We have got a call from team asking about a high cpu usage, and here is our hunting down
load averages: 97.5, 98.8, 100.0; up 11+22:24:01 01:14:52
410 processes: 276 sleeping, 118 running, 3 zombie, 13 on cpu
CPU states: 0.0% idle, 89.6% user, 10.4% kernel, 0.0% iowait, 0.0% swap
Memory: 32G phys mem, 17G free mem, 20G total swap, 20G free swap
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
24735 oracle 1 44 0 4266M 1834M run 8:13 1.17% oracle
23492 oracle 1 47 0 4266M 1758M sleep 8:59 1.01% oracle
25364 oracle 1 54 0 4266M 1927M sleep 9:55 1.00% oracle
25442 oracle 1 47 0 4267M 1943M run 18:11 1.00% oracle
25474 oracle 1 44 0 4267M 1943M run 9:45 0.99% oracle
22904 oracle 1 35 0 4266M 1723M run 15:15 0.96% oracle
24749 oracle 1 44 0 4266M 1834M run 10:21 0.95% oracle
21442 oracle 1 37 0 4266M 1586M run 17:08 0.93% oracle
24852 oracle 1 46 0 4266M 1854M run 11:18 0.92% oracle
23088 oracle 1 42 0 4266M 1735M run 10:47 0.91% oracle
23871 oracle 1 47 0 4266M 1766M run 16:52 0.89% oracle
21093 oracle 1 44 0 4266M 1383M run 7:53 0.88% oracle
20981 oracle 1 37 0 4266M 1366M run 16:03 0.88% oracle
22158 oracle 1 47 0 4266M 1634M sleep 9:09 0.88% oracle
25718 oracle 1 35 0 4266M 1967M run 18:05 0.86% oracle
We understand that from above
89.6% CPU usage by user process
10.4% by kernel
By looking at command, these are all Oracle process, but in my database server I have four database but how to identify the cpu usage from which database,
Just use Capital C switch while you run top command and then P
It will turn out something like below,
CPU states: 0.0% idle, 89.7% user, 10.3% kernel, 0.0% iowait, 0.0% swap
Memory: 32G phys mem, 17G free mem, 20G total swap, 20G free swap
PID USERNAME LWP PRI NICE SIZE RES STATE TIME CPU COMMAND
22384 oracle 1 49 0 4266M 1686M run 17:26 1.15% oraclePROD (LOCAL=NO)
25762 oracle 1 35 0 4266M 1970M run 12:41 0.94% oraclePROD (LOCAL=NO)
25157 oracle 1 44 0 4266M 1898M sleep 12:32 0.92% oraclePROD (LOCAL=NO)
22709 oracle 1 57 0 4266M 1694M cpu/1 9:44 0.90% oraclePROD (LOCAL=NO)
21527 oracle 1 44 0 4266M 1595M run 11:46 0.87% oraclePROD (LOCAL=NO)
21690 oracle 1 55 0 4266M 1606M run 17:43 0.87% oraclePROD (LOCAL=NO)
21706 oracle 1 55 0 4267M 1611M run 17:13 0.86% oraclePROD (LOCAL=NO)
25732 oracle 1 49 0 4266M 1966M run 16:18 0.86% oraclePROD (LOCAL=NO)
20981 oracle 1 49 0 4266M 1366M run 16:14 0.85% oraclePROD (LOCAL=NO)
21093 oracle 1 49 0 4266M 1383M run 8:03 0.85% oraclePROD (LOCAL=NO)
24427 oracle 1 46 0 4266M 1815M run 10:00 0.84% oraclePROD (LOCAL=NO)
24132 oracle 1 44 0 4266M 1782M sleep 11:44 0.83% oraclePROD (LOCAL=NO)
25430 oracle 1 45 0 4266M 1938M sleep 12:32 0.83% oraclePROD (LOCAL=NO)
22667 oracle 1 44 0 4266M 1691M run 11:05 0.82% oraclePROD (LOCAL=NO)
25317 oracle 1 53 0 4266M 1926M run 9:56 0.80% oraclePROD (LOCAL=NO)
So these process consuming top CPU were belongs to PROD database only,
But wait see that, CPU usage all are well below 2% but how come utilizing or burning more cpu.
Let's look at how many sessions or process, well ideally many process using 1-2% may consume more, just a thought,
$ps -eaf | grep wc -l | grep PROD
300
300, looks fair... my process /sessions count was set at 1000.
But whats causing, from OS level this is what we can dig, lets go into the database and see, what this processes/sessions are doing
For this, pretty straight view i will use is v$session
SQL> select sid,serial#,logon_time,status,last_call_et/60,sql_id,event,blocking_session from v$session where username!='SYS' and last_call_et/60 > 0 order by logon_time;
Output edited for breivity:-
SID SERIAL# LOGON_TIME STATUS LAST_CALL_ET/60 SQL_ID EVENT BLOCKING_SESSION
----- ------- --------------- ---------- --------------- --------------- ---------------- ----------------
3261 428 31-MAR-13 ACTIVE 166.033333 a049q1sr8wndm resmgr:cpu quant
3234 1 31-MAR-13 ACTIVE 166.033333 96qsds85jm6w4 library cache lo
3240 216 31-MAR-13 ACTIVE 166.033333 a049q1sr8wndm library cache lo
3222 10 31-MAR-13 ACTIVE 166.033333 96qsds85jm6w4 library cache: m
3232 527 31-MAR-13 ACTIVE 165.933333 96qsds85jm6w4 library cache lo
3213 211 31-MAR-13 ACTIVE 166.033333 96qsds85jm6w4 library cache lo
3209 147 31-MAR-13 ACTIVE 166.033333 96qsds85jm6w4 resmgr:cpu quant
3202 71 31-MAR-13 ACTIVE 166.033333 96qsds85jm6w4 resmgr:cpu quant
3200 10 31-MAR-13 ACTIVE 154.583333 a049q1sr8wndm library cache: m
3194 7 31-MAR-13 ACTIVE 166.033333 96qsds85jm6w4 library cache lo
3184 7 31-MAR-13 ACTIVE 166.033333 96qsds85jm6w4 resmgr:cpu quant
....
3015 597 31-MAR-13 ACTIVE 166.033333 a049q1sr8wndm library cache lo
3008 488 31-MAR-13 ACTIVE 166.033333 a049q1sr8wndm library cache lo
3002 9 31-MAR-13 ACTIVE 166.016667 96qsds85jm6w4 library cache: m
3000 80 31-MAR-13 ACTIVE 166.033333 a049q1sr8wndm library cache lo
...
2969 85 31-MAR-13 ACTIVE 79.6333333 a049q1sr8wndm resmgr:cpu quant
2960 2 31-MAR-13 ACTIVE 166.033333 a049q1sr8wndm library cache lo
2957 38 31-MAR-13 ACTIVE 166.033333 a049q1sr8wndm library cache lo
2951 20 31-MAR-13 ACTIVE 166.033333 a049q1sr8wndm library cache lo
....
2707 354 31-MAR-13 ACTIVE 163.833333 a049q1sr8wndm library cache lo
2684 14 31-MAR-13 ACTIVE 166.016667 96qsds85jm6w4 cursor: pin S
2683 130 31-MAR-13 ACTIVE 166.016667 96qsds85jm6w4 library cache: m
2677 7 31-MAR-13 ACTIVE 166.016667 96qsds85jm6w4 library cache lo
2673 183 31-MAR-13 ACTIVE 166.016667 96qsds85jm6w4 library cache: m
...
2611 27 31-MAR-13 ACTIVE 166.016667 a049q1sr8wndm resmgr:cpu quant
2605 55 31-MAR-13 ACTIVE 166.016667 a049q1sr8wndm library cache lo
2604 93 31-MAR-13 ACTIVE 166.016667 96qsds85jm6w4 resmgr:cpu quant
2597 382 31-MAR-13 ACTIVE 166.016667 a049q1sr8wndm library cache: m
2587 25 31-MAR-13 ACTIVE 166.016667 a049q1sr8wndm resmgr:cpu quant
2583 139 31-MAR-13 ACTIVE 166.016667 96qsds85jm6w4 library cache lo
2571 95 31-MAR-13 ACTIVE 166.016667 a049q1sr8wndm library cache: m
2578 338 31-MAR-13 ACTIVE 166.016667 96
You can also keep blocking_session,row_wait_obj# columns to see the sessions which are blocking and the for what object it was.
Viola, I have got library cache locks from 31-Mar, which apparently burn or turns on keep knocking cpu in smaller times but stay constant and that is the reason we are seeing high cpu load constantly.
Library cache locks and libary cache mutex s, are wait events note for the sessions/users to wait for the object to be available,
Further, if you look at some wait event are showing resmgr:cpu quantum, the session is waiting for cpu to be available to process its request.
To identify which objects are in contention that part of another post already written here -->
http://db.geeksinsight.com/2013/02/14/resolving-library-cache-mutex-x/
For now, we got to know that application has recyled on 31-Mar where in the database sessions were hanged on from that time and not get cleared,
We have cleared/killed those sessions and apparently the cpu usage came down.
In many cases, tracking down cpu usage by oracle is not pretty straight and it depends on case to case. Some times you may have backups running and causing more cpu.
Hope this helps,
-Thanks
Geek DBA
|
Follow Me!!!