We got a call from the application team about high CPU usage on one of our database servers, and here is how we hunted it down. A quick look at top on the server showed the following:
load averages: 97.5, 98.8, 100.0;  up 11+22:24:01  01:14:52
410 processes: 276 sleeping, 118 running, 3 zombie, 13 on cpu
CPU states: 0.0% idle, 89.6% user, 10.4% kernel, 0.0% iowait, 0.0% swap
Memory: 32G phys mem, 17G free mem, 20G total swap, 20G free swap

  PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
24735 oracle     1  44    0 4266M 1834M run      8:13  1.17% oracle
23492 oracle     1  47    0 4266M 1758M sleep    8:59  1.01% oracle
25364 oracle     1  54    0 4266M 1927M sleep    9:55  1.00% oracle
25442 oracle     1  47    0 4267M 1943M run     18:11  1.00% oracle
25474 oracle     1  44    0 4267M 1943M run      9:45  0.99% oracle
22904 oracle     1  35    0 4266M 1723M run     15:15  0.96% oracle
24749 oracle     1  44    0 4266M 1834M run     10:21  0.95% oracle
21442 oracle     1  37    0 4266M 1586M run     17:08  0.93% oracle
24852 oracle     1  46    0 4266M 1854M run     11:18  0.92% oracle
23088 oracle     1  42    0 4266M 1735M run     10:47  0.91% oracle
23871 oracle     1  47    0 4266M 1766M run     16:52  0.89% oracle
21093 oracle     1  44    0 4266M 1383M run      7:53  0.88% oracle
20981 oracle     1  37    0 4266M 1366M run     16:03  0.88% oracle
22158 oracle     1  47    0 4266M 1634M sleep    9:09  0.88% oracle
25718 oracle     1  35    0 4266M 1967M run     18:05  0.86% oracle
From the above we can see that 89.6% of the CPU is being consumed by user processes and 10.4% by the kernel, with 0% idle.
Looking at the COMMAND column, these are all Oracle processes. But this server hosts four databases, so how do we identify which database the CPU usage is coming from?
Just use the capital C switch while running top (so the full command line of each process is displayed) and then P. The output will then look something like this:
CPU states: 0.0% idle, 89.7% user, 10.3% kernel, 0.0% iowait, 0.0% swap
Memory: 32G phys mem, 17G free mem, 20G total swap, 20G free swap

  PID USERNAME LWP PRI NICE  SIZE   RES STATE    TIME    CPU COMMAND
22384 oracle     1  49    0 4266M 1686M run     17:26  1.15% oraclePROD (LOCAL=NO)
25762 oracle     1  35    0 4266M 1970M run     12:41  0.94% oraclePROD (LOCAL=NO)
25157 oracle     1  44    0 4266M 1898M sleep   12:32  0.92% oraclePROD (LOCAL=NO)
22709 oracle     1  57    0 4266M 1694M cpu/1    9:44  0.90% oraclePROD (LOCAL=NO)
21527 oracle     1  44    0 4266M 1595M run     11:46  0.87% oraclePROD (LOCAL=NO)
21690 oracle     1  55    0 4266M 1606M run     17:43  0.87% oraclePROD (LOCAL=NO)
21706 oracle     1  55    0 4267M 1611M run     17:13  0.86% oraclePROD (LOCAL=NO)
25732 oracle     1  49    0 4266M 1966M run     16:18  0.86% oraclePROD (LOCAL=NO)
20981 oracle     1  49    0 4266M 1366M run     16:14  0.85% oraclePROD (LOCAL=NO)
21093 oracle     1  49    0 4266M 1383M run      8:03  0.85% oraclePROD (LOCAL=NO)
24427 oracle     1  46    0 4266M 1815M run     10:00  0.84% oraclePROD (LOCAL=NO)
24132 oracle     1  44    0 4266M 1782M sleep   11:44  0.83% oraclePROD (LOCAL=NO)
25430 oracle     1  45    0 4266M 1938M sleep   12:32  0.83% oraclePROD (LOCAL=NO)
22667 oracle     1  44    0 4266M 1691M run     11:05  0.82% oraclePROD (LOCAL=NO)
25317 oracle     1  53    0 4266M 1926M run      9:56  0.80% oraclePROD (LOCAL=NO)
So the processes consuming the most CPU all belong to the PROD database.
But wait: each individual process is well below 2% CPU, so how is so much CPU being burned overall?
Let's look at how many sessions or processes there are; a large number of processes each using 1-2% can easily add up to a fully loaded box.
$ ps -eaf | grep PROD | wc -l
300
300 looks fair; my processes/sessions limit was set at 1000.
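If you want to cross-check the same thing from inside the database, v$resource_limit shows current and high-water utilization against the configured limits. A minimal sketch (nothing here is specific to this incident):

SQL> select resource_name, current_utilization, max_utilization, limit_value
     from v$resource_limit
     where resource_name in ('processes','sessions');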
But what is actually causing it? This is about as far as we can dig at the OS level, so let's go into the database and see what these processes/sessions are doing.
For this, the most straightforward view to use is v$session.
SQL> select sid, serial#, logon_time, status, last_call_et/60, sql_id, event, blocking_session
     from v$session
     where username != 'SYS'
       and last_call_et/60 > 0
     order by logon_time;

Output edited for brevity:

  SID SERIAL# LOGON_TIME STATUS  LAST_CALL_ET/60 SQL_ID          EVENT            BLOCKING_SESSION
----- ------- ---------- ------- --------------- --------------- ---------------- ----------------
 3261     428 31-MAR-13  ACTIVE       166.033333 a049q1sr8wndm   resmgr:cpu quant
 3234       1 31-MAR-13  ACTIVE       166.033333 96qsds85jm6w4   library cache lo
 3240     216 31-MAR-13  ACTIVE       166.033333 a049q1sr8wndm   library cache lo
 3222      10 31-MAR-13  ACTIVE       166.033333 96qsds85jm6w4   library cache: m
 3232     527 31-MAR-13  ACTIVE       165.933333 96qsds85jm6w4   library cache lo
 3213     211 31-MAR-13  ACTIVE       166.033333 96qsds85jm6w4   library cache lo
 3209     147 31-MAR-13  ACTIVE       166.033333 96qsds85jm6w4   resmgr:cpu quant
 3202      71 31-MAR-13  ACTIVE       166.033333 96qsds85jm6w4   resmgr:cpu quant
 3200      10 31-MAR-13  ACTIVE       154.583333 a049q1sr8wndm   library cache: m
 3194       7 31-MAR-13  ACTIVE       166.033333 96qsds85jm6w4   library cache lo
 3184       7 31-MAR-13  ACTIVE       166.033333 96qsds85jm6w4   resmgr:cpu quant
....
 3015     597 31-MAR-13  ACTIVE       166.033333 a049q1sr8wndm   library cache lo
 3008     488 31-MAR-13  ACTIVE       166.033333 a049q1sr8wndm   library cache lo
 3002       9 31-MAR-13  ACTIVE       166.016667 96qsds85jm6w4   library cache: m
 3000      80 31-MAR-13  ACTIVE       166.033333 a049q1sr8wndm   library cache lo
...
 2969      85 31-MAR-13  ACTIVE       79.6333333 a049q1sr8wndm   resmgr:cpu quant
 2960       2 31-MAR-13  ACTIVE       166.033333 a049q1sr8wndm   library cache lo
 2957      38 31-MAR-13  ACTIVE       166.033333 a049q1sr8wndm   library cache lo
 2951      20 31-MAR-13  ACTIVE       166.033333 a049q1sr8wndm   library cache lo
....
 2707     354 31-MAR-13  ACTIVE       163.833333 a049q1sr8wndm   library cache lo
 2684      14 31-MAR-13  ACTIVE       166.016667 96qsds85jm6w4   cursor: pin S
 2683     130 31-MAR-13  ACTIVE       166.016667 96qsds85jm6w4   library cache: m
 2677       7 31-MAR-13  ACTIVE       166.016667 96qsds85jm6w4   library cache lo
 2673     183 31-MAR-13  ACTIVE       166.016667 96qsds85jm6w4   library cache: m
...
 2611      27 31-MAR-13  ACTIVE       166.016667 a049q1sr8wndm   resmgr:cpu quant
 2605      55 31-MAR-13  ACTIVE       166.016667 a049q1sr8wndm   library cache lo
 2604      93 31-MAR-13  ACTIVE       166.016667 96qsds85jm6w4   resmgr:cpu quant
 2597     382 31-MAR-13  ACTIVE       166.016667 a049q1sr8wndm   library cache: m
 2587      25 31-MAR-13  ACTIVE       166.016667 a049q1sr8wndm   resmgr:cpu quant
 2583     139 31-MAR-13  ACTIVE       166.016667 96qsds85jm6w4   library cache lo
 2571      95 31-MAR-13  ACTIVE       166.016667 a049q1sr8wndm   library cache: m
 2578     338 31-MAR-13  ACTIVE       166.016667 96
You can also include the blocking_session and row_wait_obj# columns to see which sessions are doing the blocking and which object they are waiting on.
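With a few hundred rows like this, it also helps to roll the active sessions up by wait event and sql_id so the dominant waits stand out at a glance; a simple sketch against the same v$session view:

SQL> select event, sql_id, count(*)
     from v$session
     where status = 'ACTIVE'
       and username is not null
       and username != 'SYS'
     group by event, sql_id
     order by count(*) desc;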
Voila, we have library cache locks that have been sitting there since 31-Mar. Each of these sessions only hits the CPU in small amounts, but they do so constantly, and that is why we are seeing a constantly high CPU load.
Library cache lock and library cache mutex waits are wait events that indicate sessions/users are waiting for a library cache object to become available.
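If you are curious what the two sql_ids captured above (a049q1sr8wndm and 96qsds85jm6w4) are actually running, v$sqlarea (or v$sql) gives you the statement text; a quick sketch:

SQL> select sql_id, executions, sql_text
     from v$sqlarea
     where sql_id in ('a049q1sr8wndm','96qsds85jm6w4');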
Further, some sessions are showing the wait event resmgr:cpu quantum, which means Resource Manager is throttling them and the session is waiting for a slice of CPU to become available to process its request.
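Since resmgr:cpu quantum only shows up when a resource manager plan is active and limiting CPU, it is worth checking which plan (if any) is currently in force; a quick sketch:

SQL> show parameter resource_manager_plan
SQL> select name, is_top_plan from v$rsrc_plan;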
Identifying which objects are in contention is covered in another post, already written here:
http://db.geeksinsight.com/2013/02/14/resolving-library-cache-mutex-x/
For now, we learned that the application was recycled on 31-Mar, and the corresponding database sessions had been hung since that time and were never cleared.
We cleared/killed those sessions and the CPU usage came down.
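For reference, the cleanup itself was nothing more than generating alter system kill session statements for the stuck sessions from v$session and running them. A rough sketch; the predicates here (logon before today, library cache waits) are assumptions you would adapt to your own case:

SQL> select 'alter system kill session '''||sid||','||serial#||''' immediate;'
     from v$session
     where username is not null
       and username != 'SYS'
       and logon_time < trunc(sysdate)
       and event like 'library cache%';

-- for example, for the first session in the earlier output:
SQL> alter system kill session '3261,428' immediate;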
In many cases, tracking down CPU usage by Oracle is not straightforward and it varies from case to case. Sometimes, for example, you may simply have backups running and driving up the CPU.
Hope this helps,
-Thanks
Geek DBA