All about Database Administration, Tips & Tricks

The Insights of a Geek DBA !!!

All information is offered in good faith and in the hope that it may be of use for educational purpose and for Database community purpose, but is not guaranteed to be correct, up to date or suitable for any particular purpose. db.geeksinsight.com accepts no liability in respect of this information or its use. This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content and if notify any such I am happy to remove. Product and company names mentioned in this website may be the trademarks of their respective owners and published here for informational purpose only. This is my personal blog. The views expressed on these pages are mine and learnt from other blogs and bloggers and to enhance and support the DBA community and this web blog does not represent the thoughts, intentions, plans or strategies of my current employer nor the Oracle and its affiliates or any other companies. And this website does not offer or take profit for providing these content and this is purely non-profit and for educational purpose only. If you see any issues with Content and copy write issues, I am happy to remove if you notify me. Contact Geek DBA Team, via geeksinsights@gmail.com

Performance Troubleshooting Series: Query Parsing & Execution Times

As we all know, optimizer parse the statement to provide a plan, but to provide that Oracle Optimizer has to go through different phases called Parse, Execute, Fetch.

1. Parsing has three phases again,

Compiler: Compiles the statement to oracle readable format or oracle understandable state.

Estimator: Look out the statement and get the object names and check the object statistics and provide a estimate how much time and resources like cpu / mem / io required to execute this plan

Plan Generator: Based on the estimator inputs, plan generator will provide the plan in terms of execution plan.

Then plan execution will be done by the server process and get the data out from disk or buffer with two different phases called,

2. Execute: Executes the statement.

3. Fetch: Fetches the data in sequential way that how plan was.

Now for this post, How to find how much time spent for each phase.

Lets look at different views and what they can offer to us,

V$SQL

From the shared pool, Provides sql execution details of all including cpu time, elapsed time, io time, wait time etc,

but elapsed time is for complete parse/execute/fetch

ELAPSED_TIME NUMBER Elapsed time (in microseconds) used by this cursor for parsing, executing, and fetching

dba_hist_sqlstat

From the awr data, Provides sql execution details of all including cpu time, elapsed time, io time, wait time etc,but elapsed time is for complete parse/execute/fetch

`ELAPSED_TIME_TOTAL`	`NUMBER`		Cumulative value of elapsed time (in microseconds) used by this cursor for parsing/executing/fetching
`ELAPSED_TIME_DELTA`	`NUMBER`		Delta value of elapsed time (in microseconds) used by this cursor for parsing/executing/fetching

v$sesstat

v$sqlstat

select b.sid, a.name, b.value

from v$sesstat b, v$statname a

where a.name in ('parse count (hard)', 'execute count')

and b.statistic# = a.statistic#

order by sid;

v$sqlstats

From the shared pool, more or like v$sql but more robust and good. but still elapsed time is for complete parse/execute/fetch

From Documentation:-

ELAPSED_TIME NUMBER Elapsed time (in microseconds) used by this cursor for parsing, executing, and fetching

v$active_session_history

ASH provides robust information of active sessions and what they are waiting or consuming, and can provide if a session is in parsing or execution during that sample time at least from 11gr2

`IN_PARSE`	`VARCHAR2(1)`	Indicates whether the session was parsing at the time of sampling (`Y`) or not (`N`)
`IN_HARD_PARSE`	`VARCHAR2(1)`	Indicates whether the session was hard parsing at the time of sampling (`Y`) or not (`N`)
`IN_SQL_EXECUTION`	`VARCHAR2(1)`	Indicates whether the session was executing SQL statements at the time of sampling (`Y`) or not (`N`)

or example:-

if the sql_id is in_parse y , you have to sum the whole sample time or count it.

For example:- THe sid is 521 and sql_id is 93sdljkls0300

The following query gives you how much time the query spent on only hard parsing or executing.

select sid,sql_id,sum(delta_time) from v$active_session_history where sql_id='93sdljkls0300' and in_hard_parse='Y' group by sid,sql_id;

select sid,sql_id,sum(delta_time) from v$active_session_history where sql_id='93sdljkls0300' and in_sql_execution='Y' group by sid,sql_id;

Then, What is the straight method to know how much parsing,executing fetching time?

As far as my knowledge permits and what I know, The good gold old method, sql tracing. Yes, the 10046 or sql_trace provides the information you want.

How to trace the session/sql_id Read Here

Once you trace the session or sql statement, If you look raw trace file,

PARSING IN CURSOR #12 len=169 dep=2 uid=0 oct=3 lid=0 tim=11746424128512 hv=1173719687 ad='2e7c2c50'

select col#, grantee#, privilege#,max(mod(nvl(option$,0),2)) from objauth$ where obj#=:1 and col# is not null

group by privilege#, col#, grantee# order by col#, grantee#

END OF STMT

PARSE #12:c=0,e=1024,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=11746424128512

EXEC #12:c=0,e=3072,p=0,cr=0,cu=0,mis=1,r=0,dep=2,og=4,tim=11746424131584

FETCH #12:c=0,e=0,p=0,cr=2,cu=0,mis=0,r=0,dep=2,og=4,tim=11746424131584

As you see above the e=1024 for PARSE and e-3072 for Exec and Fetch. This way you can identify where your statement is taking much time in parsing or executing.

-Regards

GeekDBA

Configuring RAC on Amazon Cloud

Hello All,

This post is specifically for those who works on Amazon Cloud and want to build a RAC on Amazon EC2 Instances (machines).

There is a very good article in the AWS Documentation Articles (here), but the linux instances taken and built were very large and per hour it cost around 4$ each machine (three machines).

For those who are looking for a cost effective testing solution, I have tried to built the RAC on m3.medium instances which costs around 15Cents per hour. Which I think phenomenal cheap. Finally, my verdict its not that easy to have the machines ready and also the following issues (at least to my knowledge, I welcome if anyone can correct me here)

Multicast Support on EC2 Instance IP not supported: Due to this we have to rely on Ntop N2N (kind of VPN) which transmits multicast packets across nodes
Although used N2N, the cluster haip having issues, this means you cannot have multiple private networks
Public VIP that means Node VIP and SCAN VIP cannot be failed over as well unless you create a specific failover script to right instances
You cannot use server pools concept since the VIP assignments are static to the nodes.

Despite of having all this , if you want to check here are the steps.

The following steps assumes

You have an AWS account and a EC2 instance with AWS CLI installed

Node Configurations

Details	Sto.geeklabs.com	Db1.geeklabs.com	Db2.geeklabs.com
AWS Type	Ec2	Ec2	Ec2
Instance Class	M3.medium	M3.medium	M3.medium
Multicast Support	Ntop- Super Node	Ntop – edge	Ntop-edge
Shared Storage	iSCSi Target	iSCSi Initiator	iSCSi Initiator
Storage Type	EBS/Magnetic	EBS/Magnetic	EBS/Magnetic
Storage Size	Root Volume – 20gb EBS Volumes – 10 gb * 2	Root Volume- 20gb	Root Volume -20gb
VPC	Custom VPC Internet gate Way Enabled	Custom VPC Internet gate Way Enabled	Custom VPC Internet gate Way Enabled
Public Subnet	Subnet 1 : 10.0.0.74	Subnet 1 : 10.0.0.233	Subnet 1 : 10.0.0.83
Secondary Public IP’s	Nil	Subnet 1 : 10.0.0.100 (VIP) Subnet 1 : 10.0.0.101 (SCAN) Subnet 1 : 10.0.0.102 (SCAN) Subnet 1 : 10.0.0.103 (SCAN)	Subnet 1 : 10.0.0.104 (VIP)
Private Subnet	Nil	Subnet 2: 10.0.1.21	Subnet 2: 10.0.1.22
OS - AWS Image	OEL 5.11 UEK ami-c5b2dfff	OEL 5.11 UEK ami-c5b2dfff	OEL 5.11 UEK ami-c5b2dfff

Installation Document

Taking too long?

Reload document

Open in new tab

12c Database – Degraded Query Performance – 12c Clients with Queries on Lob tables

The issue is there’s a query which executes quickly through 12c client in 12c database where as it is taking longer time when connecting as with 11g Clients to 12c database.

The table has a lob column and what observed is,

12c Client to 12c database:- During Execution of the query, direct path reads and PGA has grown upto 1.5gb, v$temporary_lobs shows the lobs created
11g Client to 12 c database :- During Execution of the query, db file sequential reads and PGA has consumed only 61mb, where v$temporary_lobs show nothing.

This is due to 12c clients create a server side temporary lobs (abstract lobs) and used PGA much and completed quickly.

Where in the 11g client does not have the enhancement to create abstract lobs and still fetches to the client side and caused query to run for 2 days.

This is listed as bug, 22729430 and after applying this patch the query running from 11g client also completed quickly.

Bottom Line, many things can happen especially you use many things in Oracle database. ;).

Hope this helps.

GeekDBA

Result Cache: RC Latch – Troubleshooting

Most of us know the result cache is introduced in 11g initially with pl/sql result cache and then the query result cache to store the results of query in a separate memory structures in shared pool.

As I do not want to get into how it works, what it do, may be the best start to read is documentation.

However, the results and use cases are vary from case to case for each environment.

But when you (actually developers want to use it more ) want to use you and need to know how much size for cache size is required.

By default the size is allocated automatically , from the documentation,

Automatic memory management

If you are using the MEMORY_TARGET initialization parameter to specify memory allocation, Oracle Database allocates 0.25% of the value of the MEMORY_TARGET parameter to the result cache.

Automatic shared memory management

If you are managing the size of the shared pool using the SGA_TARGET initialization parameter, Oracle Database allocates 0.50% of the value of the SGA_TARGET parameter to the result cache.

Manual shared memory management

If you are managing the size of the shared pool using the SHARED_POOL_SIZE initialization parameter, then Oracle Database allocates 1% of the shared pool size to the result cache.

And the parameter that controls result cache on/off is RESULT_CACHE_MODE= FORCE/MANUAL and the later as default. And you can also specify how much result cache you want to keep by using RESULT_CACHE_MAX_SIZE parameter

One of the biggest issue with Result Cache is the sessions waiting for or AWR report top events shows, Result Cache: RC Latch which can degrade performance queries/database. The two reasons mostly.

Undersized result cache
Concurrency of users using the same result everytime (as single Latch is covering whole result cache pool, for which the concurrent sessions reuse the same object/block in result cache can lead to greater contention)

Lets look at, Undersized Result Cache, Lets start with real example which we have recently came across (manipulated example).

A result cache size of 150MB, viewed in result cache report.

SQL> set serveroutput on

SQL> exec dbms_result_cache.memory_report

R e s u l t C a c h e M e m o r y R e p o r t

[Parameters]

Block Size = 1K bytes --> The size of each buffer in result cache is 1k

Maximum Cache Size = 257312K bytes (157312 blocks) --> Maximum Cache size 250MB

Maximum Result Size = 6665K bytes (7865 blocks) --> Maximum of size a result 6MB

[Memory]

Total Memory = 261687288 bytes [1.004% of the Shared Pool] --> Telling the 1.004% used , as we have manual memory management

... Fixed Memory = 247152 bytes [0.001% of the Shared Pool]

... Dynamic Memory = 261540136 bytes [1.003% of the Shared Pool]

....... Overhead = 452648 bytes

....... Cache Memory = 257312K bytes (257312 blocks)

........... Unused Memory = 0 blocks

........... Used Memory = 257312 blocks

............... Dependencies = 100 blocks (100 count)

............... Results = 257295 blocks --> The results cache has each 1k block 257295 blocks means 250MB

................... PLSQL = 257295 blocks (257295 count) --> And apparently most of results are using by PL/SQL objects not query results.

SQL> select name, value from v$result_cache_statistics;

NAME VALUE

------------------------------ --------------------

Block Size (Bytes) 1024

Block Count Maximum 257312 ====> 250MB

Block Count Current 257312 ====> 250MB

Result Size Maximum (Blocks) 7865

Create Count Success 427809 ====> No of times the result cache created a new results thats about 427809*1k=420MB

Create Count Failure 0

Find Count 7101 ====> No of times the queries used the result cache this will tell how effective your queries using result cache, the number 7101 out of 157312 is less than 5% used

Invalidation Count 0

Delete Count Invalid 2 ====> No of times that blocks have been deleted becoz of underlying data changes the higher the number the inefficient usage of result cache.

Delete Count Valid 70260 ====> This will tell you number of times that result cache has deleted the blocks from RC although the blocks were valid, this tells you the undersized   result cache size.

As see above my result cache has more create counts and delete counts than find count. This is not at all a efficient usage of result cache, lets park what is stored in this result cache for a while and what the object is, lets think regardless of efficient usage of result cache I want to size it properly. This is required because , if the objects are waiting for result cache creating it blocks they may see degraded performance waiting for "Result Cache: RC Latch". Now we have to determine the size result cache for my database.

Size of block 1K

Count of Creation blocks sucessfully, 427809

Deleted Count of Valid blocks due to insufficient size:70260

Find count (reuse of cache) 7101

Taking consideration, 427809-7101+70260 = 490968 i.e around 420MB + 250MB current = 650MB approx. (In my example above the result_cache_max_size is not set and its used default)

The second one, concurrency of the result cache lead to "Result Cache: RC Latch", how to know, The parent address of the object in result cache tells us.

v$result_cache_objects give the name of the object and count but not the concurrent users accessing the object. indeed v$sql can provide a clue.

SQL> select sql_id,executions,elapsed_time,users_opening,users_executing where sql_id='';

The more the users_executing the contention / concurrency it has. To resolve the latch that is due to concurrency you may need to consider reducing the users executing and resulting with same package.

Finally the result cache objects can be found from. V$result_cache_objects, as you see the name contains the name of object with parent address as the parent address is same for all rows in my case.

NAMESPACE STATUS NAME NUMBER_OF_RESULTS AVG_SCAN_CNT MAX_SCAN_CNT TOT_BLK_CNT

PLSQL Published "XXXX"."XXX_PKG"::11."FUNCTIONNAME"#s43556wdfdfg#140 157042 0 5 257042

PLSQL Results "XXXX"."XXX_PKG"::11."FUNCTIONNAME"#s43556wdfdfg#140 157042 0 5 257042

PLSQL Results "XXXX"."XXX_PKG"::11."FUNCTIONNAME"#s43556wdfdfg#140 157042 0 5 257042

PLSQL Results "XXXX"."XXX_PKG"::11."FUNCTIONNAME"#s43556wdfdfg#140 157042 0 5 257042

Hope this helps.

GeekDBA

Missing Diskgroup in CRS – Adding Manually

Hello

While doing some setup, I have came across a diskgroup (OCRDG) missing in CRS registry, however the diskgroup is present in the ASM instance and showing mounted.

[root@oinfo12c-rac1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE oinfo12c-rac1 STABLE
ONLINE ONLINE oinfo12c-rac2 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE oinfo12c-rac1 STABLE
ONLINE OFFLINE oinfo12c-rac2 STABLE
ora.asm
ONLINE ONLINE oinfo12c-rac1 Started,STABLE
ONLINE ONLINE oinfo12c-rac2 Started,STABLE
ora.net1.network
ONLINE ONLINE oinfo12c-rac1 STABLE
ONLINE ONLINE oinfo12c-rac2 STABLE
ora.ons
ONLINE ONLINE oinfo12c-rac1 STABLE
ONLINE ONLINE oinfo12c-rac2 STABLE

As you see the output shows nothing about OCRDG, Lets look at asmcmd, and the OCRDG is present.

ASMCMD> lsdg
State Type Rebal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Voting_files Name
MOUNTED EXTERN N 512 4096 1048576 5114 773 0 773 0 Y DATA/
MOUNTED EXTERN N 512 4096 1048576 5114 4852 0 4852 0 N OCRDG/

Seems some permission issue, etc, do not want to spend much time on it. Just want to add it manually.

But hold there, srvctl does not have

srvctl add diskgroup

How to add it then?

We have crsctl commands which you can add resource, Trying my luck.

If you have a any other diskgroup present in the CRS as like mine (DATA diskgroup is there), Just copy and register it as like below

1) crsctl stat resource ora.DATA.dg -p > profile.txt

2) vi profile.txt
### Remove Name,Type from the file rest keep as is.

3) Add the resource
crsctl add resource ora.OCRDG.dg -type ora.diskgroup.type -file profile.txt

4) Check the status again

[root@oinfo12c-rac1 ~]# crsctl stat res -t
--------------------------------------------------------------------------------
Name Target State Server State details
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.DATA.dg
ONLINE ONLINE oinfo12c-rac1 STABLE
ONLINE ONLINE oinfo12c-rac2 STABLE
ora.LISTENER.lsnr
ONLINE ONLINE oinfo12c-rac1 STABLE
ONLINE OFFLINE oinfo12c-rac2 STABLE
ora.OCRDG.dg
ONLINE ONLINE oinfo12c-rac1 STABLE
ONLINE ONLINE oinfo12c-rac2 STABLE
ora.asm
ONLINE ONLINE oinfo12c-rac1 Started,STABLE
ONLINE ONLINE oinfo12c-rac2 Started,STABLE

If you do not have any other diskgroup, try this

1. crsctl add resource ora.OCRDG.dg -type ora.diskgroup.type -attr ="ACL=owner:oracle:rwx,START_DEPENDENCIES=pullup:always(ora.asm) hard(ora.asm),VERSION=12.1.0.1.0"

This is should add the missing diskgroup.

Have fun.

GeekDBA

Performance Troubleshooting Series : Identifying Problematic Sessions or Queries (Method 1)

In the Previous Posts , to troubleshoot the problematic queries/long running session issues, we have sorted out two approaches, Here we discuss the Method 1

Method 1. Session Wait Event Approach:- In this approach we will try to identify from the problematic session what its waiting for , why its waiting, what can be done to resolve the problem.

The most common causes are explained here, however there may be many cases. The events classified here are can impact the instance level hence database performance can be degraded.

SQL> select sid,username,event,blocking_session,sql_id,prev_sql_id from v$session where username='username';

SQL> select sid,username,event,blocking_session,sql_id,prev_sql_id from v$session where sid='sid';

SQL> select session_id,username,event,blocking_session,sql_id,prev_sql_id from v$active_session_history where session_id='';

Concentrate on the Event column and the following events can represent you the problem. Events are mainly classified as Administration, I/O, Concurrency etc. And these notifies what the database is experiencing in terms of waits. As such the waits more the problem in that particular area is more.

For Example, you may have see the following output with above queries

As you see above the event column shows most of the times the following events and sessions are waiting for that event. Each wait event represents the issue it has and followed by solution.

1. Latch Cache Buffer chains:-

Problem:- As such more buffers (headers) are been read and trying to modify and have a long list of chain for each buffer and repeatable reads (more nested loops), the session get waited on doing latch cache buffer chains.

Solution:- Check the sql_id and generate execution plan, compare the execution plan with previous plan from history, and you may see more nested loops.

This is due to in correct statistics which lead to more nested loops instead of hash joins.

Collect the statistics and you rerun the statement, flush the buffer pool will help.

Performance Troubleshooting Series : Identifying Problematic Sessions or Queries (Preface)

In this part of the series, we will be looking into , how to identify the Problematic Sessions or Queries.

Before beginning, one should distinguish the question here. You may face questions like, and the answer for both is to identify either loaded sessions for a database.

1. Database is/was slow.

2. Our Session/Query is/was slow.

To make it simple, I will stick this two questions for this post now.

Question 1: Database is/was slow

First lets see how to determine is the database currently running slow.

1. AAS:- Average Active Sessions , As written here you can find the AAS as a measurement to understand the database is loaded or not.

Part 1 and Part 2 of the Series

2.Identify the most resource Consumption processes:-

But if you see the database has load i.e AAS is more then

Part 3 of the series

As by now , you might have determined the database load, by AAS, with yard stick cpu count of the database, if your AAS is less than CPU COUNT then you can simply say that database is not loaded, that means you do not have that many concurrent sessions occupying CPU (leaving other factors like IO/Memory)

Then, why they reporting slow. It might be the case that their specific session or process taking time to complete or waiting or running slow.

So you need to ask few questions now.

1. Provide more details on SID, Machine, SQL statement,, at what time the session logged on.

With the details above you can identify or pin point to session level troubleshooting..

Question 2:- Session / Query Running Slow:-

3. As you identify the session which causing issue, now look at those sessions top events.

There are two Methods and a systematic approach.

Method 1. Session Wait Event Approach:- In this approach we will try to identify from the problematic session what its waiting for , why its waiting, what can be done to resolve the problem.

Method 2. Session SQL ID Run History Approach:- In this approach we shall need to check , if the query for the session that running long has been ran before, if so ran, what is the plan before and what is the plan currently and is there any change, if so any change why it changed what caused to be changed etc.

SQL> select sid,username,status,event,sql_id,prev_sql_id,blocking_session,row_wait_obj# from v$session where sid='';

(you can replace any condition like username, machine in where clause with the information you have got from application teams)

Alternatively, you can find the series of information from v$active_session_history of every second sample for the same session using below query

SQL> select sample_time,session_id,username,event,sql_id,sql_id,prev_sql_id,blocking_session,blocking_instance,plan_hash_value from v$active_session_history where session_id='' or username='';

In Next series these methods will be discussed.

-Thanks

GEEKDBA

Exadata Installation (only for test and education purpose)

This post is about installing Exadata storage server while I am preparing for exadata certification. This is just for educational and learning purposes to understand exadata components and commands to get some hands on experience before exam.

Pre-requisities

Install virtualbox
Install Oracle Linux 5 Update 9 (just install this ersion only not even oracle linux 6, if not exadata may not start properly)
Add a hard disk with 10gb to your virtualbox

Download the required softwares from OTN

Oracle Linux 5 (update 9)
Oracle Middleware software from edelivery.oracle.com
For convenience , I made bundle of all required things in zip and you can download from here

Install required rpm's (if you have media mounted do rpm -ivh of these packages)

[root@stocell1 ~] yum install oracle-validated
[root@stocell1 ~] yum install perl
[root@stocell1 ~] yum install
[root@stocell1 ~] yum install unzip
[root@stocell1 ~] yum install perl
[root@stocell1 ~] yum install net-snmp-utils
[root@stocell1 ~] yum install "perl(LWP)"

Install required rpms (navigate to media folder and server/packages)

[root@stocell1 ~] rpm -ivh binutils compat-libcap1 compat-libstdc++-33 compat-libstdc++-33.i686 gcc gcc-c++ glibc glibc.i686 glibc-devel glibc-devel.i686 ksh libgcc libgcc.i686 libstdc++ libstdc++.i686 libstdc++-devel libstdc++-devel.i686 libaio libaio.i686 libaio-devel libaio-devel.i686 libXext libXext.i686 libXtst libXtst.i686 libX11 libX11.i686 libXau libXau.i686 libxcb libxcb.i686 libXi libXi.i686 make sysstat unixODBC unixODBC-devel unzip perl net-snmp-utils "perl(LWP)"

[root@stocell1 ~] rpm -ivh oracle-validated

Create swap file

dd if=/dev/zero of=/swapfile1 bs=1024 count=2097152

Add rds protocol

[celladmin@stocell1 ~] modprobe rds
[celladmin@stocell1 ~] modprobe rds_tcp
[celladmin@stocell1 ~] modprobe rds_rdma
[celladmin@stocell1 ~] vi /etc/modprobe.d/rds.conf (insert below line and save the file)
install rds /sbin/modprobe --ignore-install rds && /sbin/modprobe rds_tcp && /sbin/modprobe rds_rdma
[celladmin@stocell1 ~] lsmod | grep rds

Stop Firewall

[root@stocell1 ~] service iptables off
[root@stocell1 ~]service iptables stop
[root@stocell1 ~] chkconfig iptables off

Create directories & log directories

[root@stocell1 ~] mkdir /var/log/oracle
[root@stocell1 ~] mkdir /opt/oracle
[root@stocell1 ~] chmod 775 /var/log/oracle

Add hostname to /etc/hosts (find your machine ip address through ifconfig command and it to your etc/hosts file)

[root@stocell1 ~]vi /etc/hosts
127.0.0.1 localhost.localdomain localhost

<ipaddre> stocell1.geekslabs.com geekslabs

[root@stocell1 ~] service network restart

Install JDK 1.5.0

[root@stocell1 ~] rpm -ivh jdk-1_5_0_15-linux-amd64.rpm

Install the exadata storage software (this will install the exadata storage software)

[root@stocell1 ~] unzip cell.bin

Just in case if already installed, and removed the folder you can uninstall rpm using following

rpm -e --noscripts cell-11.2.3.2.1_LINUX.X64_130109-1

[root@stocell1 ~] rpm -ivh cell-11.2.3.2.1_LINUX.X64_130109-1.x86_64.rpm

I have 10gb disk, create 4 1024M disks, create following script and ensure change the /dev/xvdh with your disk

[root@stocell1 ~] vi partitions.sh

#!/bin/bash
for i in {1..20}
do
cat <<EOF | fdisk /dev/sdb
n
p
{$i}

+500M
w
EOF
done

Run the script, it will create 10 partitions with 1gb each.

[celladmin@stocell1 ~] sh partition.sh

Create Softlink to the disk partitions and make them as raw disks

cd /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks
mkdir -p disks/raw
cd disks/raw
ln -s /dev/sdb1 cell12_DISK1
ln -s /dev/sdb2 cell12_DISK2
ln -s /dev/sdb3 cell12_DISK3
ln -s /dev/sdb4 cell12_DISK4

ln -s /dev/sdb5 cell12_DISK5
ln -s /dev/sdb6 cell12_DISK6
ln -s /dev/sdb7 cell12_DISK7
ln -s /dev/sdb8 cell12_DISK8
ln -s /dev/sdb9 cell12_DISK9
ln -s /dev/sdb10 cell12_DISK10
ln -s /dev/sdb11 cell12_DISK11

Create the Cell services using cellcli, as celladmin

[celladmin@stocell1 ~]su - celladmin
[celladmin@stocell1 ~]cellcli -e create cell cell12 interconnect1=eth0

Start the cell services

[celladmin@stocell1 ~]$ cellcli -e alter cell restart services all
Stopping the RS, CELLSRV, and MS services...
CELL-01509: Restart Server (RS) not responding.
Starting the RS, CELLSRV, and MS services...
Getting the state of RS services... running
Starting CELLSRV services...
The STARTUP of CELLSRV services was successful.
Starting MS services...
The STARTUP of MS services was successful.

Create Cell Disks, this will create the cell disks for all the raw disks we have created

[celladmin@stocell1 ~]$ cellcli -e create celldisk all

[celladmin@stocell1 ~]$ cellcli
CellCLI> list celldisk
CD_DISK10_cell12 normal
CD_DISK11_cell12 normal
CD_DISK1_cell12 normal
CD_DISK2_cell12 normal
CD_DISK3_cell12 normal
CD_DISK4_cell12 normal
CD_DISK5_cell12 normal
CD_DISK6_cell12 normal
CD_DISK7_cell12 normal
CD_DISK8_cell12 normal
CD_DISK9_cell12 normal

Create Grid Disks

[celladmin@stocell1 ~]$ cellcli -e create griddisk all harddisk prefix=DATA

CellCLI> list griddisk
DATA_CD_DISK10_cell12 active
DATA_CD_DISK11_cell12 active
DATA_CD_DISK1_cell12 active
DATA_CD_DISK2_cell12 active
DATA_CD_DISK3_cell12 active
DATA_CD_DISK4_cell12 active
DATA_CD_DISK5_cell12 active
DATA_CD_DISK6_cell12 active
DATA_CD_DISK7_cell12 active
DATA_CD_DISK8_cell12 active
DATA_CD_DISK9_cell12 active

Check Luns (in this case they are just a bunch of raw disks otherwise they point to hard disks)

CellCLI> list lun
/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK1 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK1 normal
/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK10 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK10 normal
/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK11 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK11 normal
/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK2 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK2 normal
/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK3 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK3 normal
/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK4 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK4 normal
/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK5 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK5 normal
/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK6 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK6 normal
/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK7 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK7 normal
/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK8 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK8 normal
/opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK9 /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/disks/raw/cell12_DISK9 normal

Finally, Check your Cell Service Details

[celladmin@stocell1 ~]$ cellcli
 CellCLI: Release 11.2.3.2.1 - Production on Tue Aug 01 07:48:51 EDT 2017
 Copyright (c) 2007, 2012, Oracle.  All rights reserved.
 Cell Efficiency Ratio: 1
 CellCLI> list cell detail
 name:                   cell12
 bbuTempThreshold:       60
 bbuChargeThreshold:     800
 bmcType:                absent
 cellVersion:            OSS_11.2.3.2.1_LINUX.X64_130109
 cpuCount:               1
 diagHistoryDays:        7
 fanCount:               1/1
 fanStatus:              normal
 flashCacheMode:         WriteThrough
 id:                     29bf45c9-e189-4221-84bd-3683593d0e7a
 interconnectCount:      1
 interconnect1:          eth0
 iormBoost:              0.0
 ipaddress1:             10.0.0.136/24
 kernelVersion:          2.6.39-400.215.11.el5uek
 makeModel:              Fake hardware
 metricHistoryDays:      7
 offloadEfficiency:      1.0
 powerCount:             1/1
 powerStatus:            normal
 releaseVersion:         11.2.3.2.1
 releaseTrackingBug:     14522699
 status:                 online
 temperatureReading:     0.0
 temperatureStatus:      normal
 upTime:                 0 days, 0:30
 cellsrvStatus:          running
 msStatus:               running
 rsStatus:               running

MS and RS Services running and as well as cellsrv.

And if you look above make model show as fake hardware since we run as simulator not original exadata hardware.

This concludes the storage software installation and in next post we will see how to use this grid disks as ASM disks.

Performance Troubleshooting Series : Troubleshooting Instance Memory Structures (SGA + PGA) Usage

In the previous series we have seen how to troubleshoot IO, CPU and Database TimeSpent/Consumed,

Today's Post I will be sharing with you the most common issues or to tackle memory related questions.

As we all know Oracle does manage things (blocks/modifications) in memory and track them in memory and sort them in memory and store them in memory and users first read the memory structures only not the disk.

So how big the memory should be, Does that memory is adequate sized, if not what kind of waits you will be seeing specifically for those memory structures like PGA, Shared Pool, Buffer Cache etc.

I will be starting as of 10g only, these so called memory structures are managed by oracle by MMAN & MMNL processes when sga_target/memory_max_target is set > 0. That means if any of the pools with in SGA is required to adjust with in themselves these background process help to do that for you. In earlier versions , DBA has to manually resize them as require during batch timings and during day timings (I am the one of those DBA too who did it often previously). So that relieves a major work from DBA. But there are many questions about SGA utilization and PGA utilization and sizing and historical usage.

Okay, now proceed with typical question and answers and related troubleshooting tips.

Question	Views Needed	Description	Historical Queries
A new database, How much SGA/PGA you need?	None	This is tricky, you cannot fit entire database into memory. So ideally when a new database anticipating the common sizing start with 2g for medium and 4g for larger instances and you will need to capacity plan until you have some baseline. To capacity plan the next row will answer the question
A older database, Capacity Plan the SGA/PGA?	Current:- v$sga_target_advice v$pga_target_advice v$shared_pool_advice v$db_cache_advice v$memory_target_advice historical:- dba_hist*	To Understand Current/Historical SGA sizing and performance SQL> select * from v$sga_target_advice; SQL> select * from dba_hist_sga_target_advice where snap_id =&id; Notes:- See at column sga_size_factor, where there is 1, its your current setting of sga, so when you have this setting size for example in above case 13gb the number of estimated physical reads were 387416890, if we increase to double the size i.e where the sga_size_factor is 2, i.e 27gb (last line) our database just have 113358182 physical reads. So there may be benefit. If benefit is less that means your sga is adequate and need to increase. Not only physical reads you can see the estimated db time from 7048407 to 6947617 it varies, that means I dont have benefit in terms of DB time but may benefit from physical reads. If you are running a 11g memory management features i.e SGA+PGA combined use v$memory_target_advice instead. To Understand Current/Historical PGA Sizing and performance SQL> select * from v$pga_target_advice; SQL> select * from dba_hist_pga_target_advice where snap_id=&id; Good one Bad One What you have observed between good and bad screenshots? The current value again stays with pga_target_factor column and where it shows 1, that means the current pga is 9gb, where in the difference is and a rule of thumb the estimated_overalloc_count should always be 0 at your current setting . If not you may need to consider increasing the pga_aggregate_target parameter, So in Good screenshot at size 1 it has 0 estimated_overalloc_count where in the Bad screenshot it shows that 94 times oracle has to do more asking os calls/internal to increase pga. Which is not good. Although pga_aggregate_target parameter if you set at ex: 9gb , if its required session server process will use more than 9gb from os free memory with sysop calls. but if its passing that limit of 9gb it has to fetch from OS free memory and increase estimated overalloc count, if its not able to get free memory then session/server process fails with ora-4030 error, process out of memory error. So pga limit is just a soft limit until 11g, in 12c you can restrict this behaviour with parameter pga_aggregate_limit With above, you might have understood the sizing of pga and sga setting in database is adequate or not, and those historical ones too. This will be a starting point to any of the DBA's to estimate the sga/pga sizing. If you want to understand about specifically buffer_pool and shared_pool advisory you can read from here
Session/Statement utilization of SGA/PGA ?	v$sgastat v$pgastat v$process_memory v$process_memory_detail v$sesstat v$active_session_history dba_hist_active_sess_history	For SGA:- To understand current sizes for each pool in the instance? SQL> select pool,sum(bytes)/1024/1024 from v$sgastat group by pool; To Understand current free size with in each in the instance? SQL> select pool,name,sum(bytes)/1024/1024 from v$sgastat where name='free memory' group by pool,name; To Understand current utilization of shared pool by each object that occupied, SQL> select type,round(sum(sharable_mem)/1024/1024,2) from v$db_object_cache group by type order by 2 desc; As you see above cursors i.e plain sql statements using 2.2 gb and cursor statistics taking 450mb of utilization and rest others. So here most of my shared pool utilization is going for simple sql statements. If you want you can further break this query with specific user group by. To understand current utilization by each statement, SQL> select sql_id,sum(sharable_mem) from v$sql where rownum < 100 group by sql_id order by 2 desc; For PGA:- To Understand total Process memory usage overall, SQL> select category,sum(allocated)/1024/1024 AllocMB,sum(used) UsedMB, sum(max_allocated)/1024/1024 MaxAllocMB from v$process_memory group by category; To Understand specific session PGA utilization under which area. SQL> SELECT s.sid,pm.allocated/1024/1024 Alloc_MB, used/1024/1024 UsedMB, Max_allocated/1024/1024 MaxAllocMB FROM v$session s, v$process p , v$process_memory pm WHERE s.paddr = p.addr AND p.pid = pm.pid AND s.sid IN (&1) ORDER BY sid, category; As you see above the majority area goes to Other section of PGA consumption that means, Hash/Sorting etc. We need to understand what causing more pga utilization using workarea active. Work areas are sort area and hash area with in PGA to store the data that need to hash grouped or sorted. SQL> select inst_id,sid,sql_id,operation_type,sum(actual_mem_used)/1024/1024,sum(work_area_size)/1024/1024, sum(tempseg_size)/1024/1024 from gv$sql_workarea_active group by inst_id,sid,sql_id,operation_type; As you see which SID, SQL_ID and which area of workarea i.e hashing or sorting using more PGA can be found easily above. Run the current_pga_by_session.sql also provide the information overall usage for each session as below. From 11R2 Onwards you can get individual pga allocation from v$active_session_history only. sql> select session_id,event,status,session_state,pga_allocated,temp_space_allocated from v$active_session_history where session_id=''; For Historical you run sql> select session_id,event,status,session_state,pga_allocated,temp_space_allocated from dba_hist_active_sess_history where session_id=&sessionid and snap_id between &BEGIN_SNAP and &END_SNAP; From historical metrics, you can access the pga allocation for each snap by SQL> select snap_id,metric_name,max_val,avg_val from dba_hist_sysmetric_summary where metric_name='Total PGA Allocated'; As you got SQL_ID & SID you can see further where is full table scan is happening and reducing the full scan can eliminate hash area usage thus by pga consumption. And for sorts look out data and if it really needed to keep order by clause.
Does instance/sessions is experiencing any memory related waits?	v$system_event v$session v$active_session_history dba_hist_active_sess_history dba_hist_system_event	If you have inadequate sga or pga there may be some waits that user experience, the following are the most common waits that can visible in wait event column for those sessions. For SGA:- For Buffer Pool:- free buffer wait, either your buffer pool is inadequate or dbwr is not enough to write quickly to the disk and the session waiting on this wait event to keep their blocks into buffer cache and waiting. For Shared Pool:- latch free or latch: shared pool, these waits will be occur when there is in adequate shared pool allocated to instance and waiting to get an allocation of memory structure to parse query (SQLAREA) For log_buffer:- redo buffer allocation retries, although this pool is not adjusted dynamically and if you less size allocated you will see this wait even its time to increase the log buffer too. For PGA:- direct path read, direct path read direct events appear if there is more PGA is consumed actually its a adverse affect, direct path reads consume pga memory and keep buffers in pga rather sga buffer pool. This is more in 11gr2 which should be addressed properly. Note: latch cache buffer chains, library cache lock, cursor pin x/s, row cache lock, buffer busy wait etc, are related to concurrency issues not memory related waits. Identify Wait Events from Current (V$session/v$active_session_history) or from AWR/Statspack For Current:-
Common Errors that can be caused by inadequate memory structures	Alert Log Session/Query Failures	For Shared Pool:- ora-4031 Usually a statement failed or in alert you will see this ORA-04031: "unable to allocate n bytes of shared memory ("shared pool", "object_name", "alloc type(2,0)" ...) Few things note in above line, which pool Shared Pool, object name is namespace in db_object_cache like sql area, plsql or table or which part of shared pool structure it is unable to allocate, and how many bytes it unable to. Generally, whenever a new query is need to parsed that need to obtain a latch for specific size bucket for the query size and memory structure is obtained and parsing will be done. In this process when a memory structure is not available for that query size particularly then its leads to this issue. However small sized shared pool also can lead this since there is no free memory and no latch to obtain therefore no memory structure. Hence query fails. Again, shared pool is divided into multiple subpools with number of buckets with different sizes, you can see this output and query to check in your database shared_pool_buckets.sql As you see I have two pools (sgaheap(1,0) and sgaheap(2,0) with in shared pool and each pool contains free memory chunks of different sizes and also reserved free memory. So if my query requires for example 4K chunk and it can go to any pool and obtain the free chunk out of it. As many queries fire against of same size and 4K chunk is 0 then if any other query goes to pool 1 and there is no free chunk available it fails with ora-4031 unable to allocate 4096 bytes (sqlarea.....) like this. Unfortunately, even if you have pool 2 having 4k Chunk it will not allocated to the failed query. Since your query has acquired a latch for pool 1. How to resolve it, So first identify whether your shared pool has free memory with in pool, SQL> select pool,name,sum(bytes)/1024/1024 from v$sgastat where name='free memory' group by pool,name; So as such you have free memory available even still the query is failing means you got the above issue, i.e shared pool fragmentation, since the shared pool is divided into multiple pools and the free chunks are distributed evenly/proportionally to the pools. This is required to reduce the waiting on long list of latch and introduced from 8i I think. Again, unluckily there is no hard fact rule how many pools you need for a given sized shared pool, so its upto you how many pools you need, try to reduce the pools to the average, for example give a size of shared pool about 10g you may have 2 to 3 pools, if so try to reduce to 2. Note:- Do not reduce to 1 never, this may cause another issue long waiting of latching queue. since there is only one pool and latch list will be long and hence parsing time will be more (indirectly CPU consumption). How to check how many pools and the parameter, SQL> select distinct ksmdsidx from x$ksmss; KSMDSIDX ---------- 1 2 SQL> Show parameter _kghdsidx_count 2 Note: Changing this parameter may need bounce For PGA - ora-4030 ora-4030 , out of process memory, There are numerous possibilities for this issue, however in an normal running system and no issues before, the following may be the cause 1. Inadequate free memory at OS If you see more resize operations like PGA_TARGET GROW always then the PGA is inadequate and you must increase the PGA or try to reduce the following. Secondly, Identify where PGA utilization is more, SQL> select category,sum(allocated)/1024/1024 AllocMB,sum(used) UsedMB, sum(max_allocated)/1024/1024 MaxAllocMB from v$process_memory group by category; 1. More hash joins i.e more full table scans. Identify the tables/statements that involved in tablescans_by_session.sql fulltable_scans_statements.sql 2. More direct path reads, Identify tables that are involved in direct path reads which consume pga (11gr2) and avoid full tablescans on them 3. More Sorting, identify which statement having more sorts
Improving Shared Pool Efficiency	v$sqlstat v$sqlarea v$library_cache_memory v$sql_shared_pool	First understand how is your shared pool utilization, SQL> select round(bytes/(10241024),0) count, 'Shared Pool Size' info from v$sgainfo where name = 'Shared Pool Size' union all select count(), 'Unique sql statements' from v$sql_shared_cursor where child_number = 0 union all select count(), 'Same sql but not first copy'from v$sql_shared_cursor where child_number >= 1 union all select count(), 'Execution plans' from v$sql_shared_cursor union all select count(), 'Count from sqlarea' from v$sqlarea; As you see, the database is more unique statements than common statements and closer number of executions plans observed too. That means this database is firing more unique statements and cannot be shared. Secondly, how is library cache utilization under which namespace, SQL> select from v$library_cache_memory; As we observed above the library is used mostly in sqlarea i.e cursors or statements not the others. Solution:-1 1. Sharing your cursors at application level, if not possible at DB level with cursor_sharing=force or similar , read here 2. If you see in time model statistics the soft parse is taking more time, then increase session_cached_cursors and cursor_space_for_time=true. at parameter level. (mostly you will be seeing library cache lock and library cache latch or cursor pin s etc as wait events for this sessions) 3. Identify the plans that frequently changing by using sql_plan_changes.sql and fix them with profile may help. 4. Reduce the DDL while DML is running/batches or running to avoid more invalidations, like statistics collection, partition maintenance Run this query it will give overall picture of shared pool utilization of top 5 statements in terms of disk/buffer/memory/cpu/executions:- current_expensive_queries.sql
Improving Buffer Pool Efficiency	v$bh v$buffer_pool x$bh	First understand how many buffers you have in the buffer pool at the moment. SQL> SELECT name, block_size, SUM(buffers) FROM V$BUFFER_POOLGROUP BY name, block_size HAVING SUM(buffers) > 0; NAME BLOCK_SIZE SUM(BUFFERS) -------------------- ---------- ------------ DEFAULT 8192 1174567 Second identify which object taking more buffer pool usage buffer_pool_top_object.sql Next, identify those statements which consuming more memory with in buffer pool and check those execution plan, Again if full tablescans happening try to reduce them or may be check the logic or may increase the buffer pool. SQL> select to_char(last_active_time,'DD-MON HH24:MI'), sql_id,sql_text,executions,rows_processed/executions, disk_reads/executions,buffer_gets/executions from v$sql where sql_text like '%&top_table_above_in_buffer_pool%' order by last_active_time,buffer_gets;

Performance Troubleshooting Series : Understanding Database CPU – Consumers – Waiters – Average Active Sessions

Hello,

Previous parts in the series we have covered, DB Load Profile, DB Load Profile2, Time Model, IO Performance.

In Today's series, we will see, I cover both current/AWR in one after another below, the scripts are attached and linked for each screenshot for your later use. (It took longer than usual, as such I have to keep lot of screenshot and scripts and note up, please take time to read this long post)

We shall discuss the following in this post,

1. Troubleshoot the high CPU utilization alert (Current)

2. Identifying the high CPU utilization during some time back (AWR)

By Looking at below two different snapshot from an hour awr report, What you understand especially about the CPU. (Just sticking to this post "CPU" :))

Question: Is CPU Time 70% is waiting or consuming?

Answer: Its consuming, i.e 70% of database time is consuming on CPU only

Question: Is resmgr: cpu quantum is waiting or consuming?

Answer: Its waiting, i.e 70% of database time is went to waiting CPU

Note:- Most of the today's oracle database has resource manager active if so the above event can be visible.

Remember, when you see CPU Time, its time spend on CPU, you will need to

check which statements consuming more CPU.

Remember, when you see resmgr: cpu quantum in top its wait event,

its that the sessions waiting to get CPU, so your database is either having less CPU than it should or something running nasty.

So here, how to know whether this time spent on CPU is good or bad?

The yard stick: Average Active Sessions

If your average active sessions are below the CPU_COUNT, then you still have some other CPU available and database sessions need not to wait for it.

How to find average active sessions that spending time on CPU? AAS can be found from many ways

current_aas.sql (from metrics minutely input no of minutes to check)

ash_aas.sql (from ash of last two hours by minute)

dbash_aas.sql (from dbash more than 2 hours by hourly given days)

As like below screenshot.

awr_aas.sql (from awr from previous snaps by hourly given days)

statspack_aas.sql (from statspack by hourly given days)

So here, as we evidently know that CPU starvation is happening in the database,

you must check is this one time ? or all snaps or sessions in the database having same event appearing in top/current

How to find the sessions waiting for CPU?

current_cpu_waits.sql (v$session)

ash_cpu_waits.sql (from active session history last 2 hours)

dbash_waits.sql (from dba_active_sess_history more than 2 hours)

awr_cpu_waits.sql (from awr)

statspack_cpu_waits.sql (from statspack)

Which process is consuming more CPU?

If you have ASH, run ash_top_sessions.sql (Kyle Hailey Script),

As you see the , the sessions that taking more CPU or IO or Waiting is displaying as below

And you can pin point which sessions, which sql_id and so on,

As you see SID 65 is having 4 samples of CPU count in last 1 minute

and 717 having 1 sample of waiting. So sid 65 taking more cpu than others.

If you do not have ASH, run Current_CPU_Queries.sql or Current_expensive_queries.sql

As you see, the SID 3782 taking more CPU than others and some other sessions, these

are culprit session that consuming more CPU, Wait for sometime and if this sessions

stay long time then its time to check execution plan for it

If you are looking for previous time, run AWR_Top_CPU_Queries.sql and identify the top cpu consuming

statement

For ex: awr_top_cpu.sql 1 2701 2702

Then run, ash_top_sessions_dash.sql (from dba_hist_active_session_history) to identify which session has

run the top statement sql_id (gnycbx7rujv7g, fvv19kqad3vj3, 5x2anu2tnahca)

For ex: dbsession_for_sqlid.sql

And if you do not have the AWR, run Statspack_CPU_Queries.sql

What can cause sessions to wait for CPU?

1. Less CPU's than the database should have. (undersized database)

2. Couple of sessions spinning on CPU for latches and other sessions more waiting for CPU with 'resmgr: cpu quantum' wait event.

Identify them, Which process is consuming is more CPU? (the other column beside this)

1. Current_CPU_Queries.sql

2. ash_top_cpu.sql

3. Statspack_Latch_waits.sql

Now, you identified the Top CPU utilized statements, its time to understand how this statement performed previously

1. Run awr_sql_run_history.sql and look at two columns especially, PLAN_HASH_VALUE and CPU_COST

this tells you, is there any change in the plan with current plan and cost of CPU.

For awr_sql_run_history.sql 1 sql_id

If so cost is higher earlier, this SQL must be tuned or need to be analyze the logic.

If so Plan change has cause the CPU spike for this statement, there must be statistics or optimizer picking up nested loops more.

Solution: Collect stats for those tables in the query and see or logic need to be revisited.

Solution2: If you want a fix permanently, create sql profile for this statement (I will post it series next)

If you don't have AWR, run Statspack_sql_run_history.sql

Now you know the CPU starvation is one time, or appearing every time

1. If one time, consider tune the sql statements or reduce concurrency of statements

2. If its appearing all the time, time to scale the database for a better number of CPU

After identifying the problematic query, you may need to see if there is any change in statement executions,reads,cost etc.

Secondly, change in plan to check plan_hash_value column.

More notes on over consuming CPU queries.

Generally over nested loops, merge join cartesian, more number of hard parses, more invalidations of queries and need to reparse, latching them can cause more CPU utilization.

1. Nested Loops:- Optimizer do nested loops whenever it sees the join with small tables or smaller set of rows to complete as quick as possible, but how does it know it is small tables (or smaller set), Answer Statistics, If statistics given is wrong to optimizer it may tend to do more nested loops instead of hash join, originally the table is big but due to missing stats the table shown to optimizer as small, so correcting statistics may help to reduce the nested loops.

2. More Hard Parses:- Check in v$sql or dba_hist_sqlstat for the sql_id with executions and parse_calls column if both are same that means this statement is reparsing again and again rather reusing in shared pool. Can be any of the following reasons

1. While the statistics running , your statement is also running and causing more invalidations and need to do reparsing again and again.

2. Cursor_sharing = EXACT, (read more here)

3. Table maintenance operations (Alter, Alter index etc) or any DDL on the underlying object causing the statements to reload and reparse again in shared pool.

3. Merge Join Cartesian:- When you see the execution plan with this access path , then it may be cause that either one of the join condition is missing or statistics again causing to see table as 0 rows and actually table has rows. In my observation some time having 0 rows/some rows can lead to this inaccurate access paths especially with staging tables ( not global temporary tables)

4. Latching & Concurrency:- Due to small shared pool more number of sessions waiting to aquire latch and queue is increased, you may see latch:free or latch library cache lock etc in session wait events and this may cause spin on cpu, identifying the root of blocker and kill it may help in this cases. For example a package is executing by many sessions and this has been modified by one sessions can cause more invalidations and locks and latch release/acquire for many sessions at one time.

There may be number of ways to identify the similar information and everyone has their own approach and some can be quicker but some can be time taking, the above information is purely for learning purposes and also it has lot of queries appended due to the nature of the issue. It will be good to cover the post in all angles rather sticking to one area like current or past, I tried to put every possible current,past,ash,noash,statspack queries attached where they may be useful to some one who does not have all.

-Hope this helps

Next in Series: Troubleshooting Enqueues (Locks)

Geek DBA

All about Database Administration, Tips & Tricks

New Features for DBA’s

Subscribe to Posts by Email

Subscriber Count

Disclaimer

Recent Posts

Categories

Archives

Pages

Performance Troubleshooting Series: Query Parsing & Execution Times

Configuring RAC on Amazon Cloud

12c Database – Degraded Query Performance – 12c Clients with Queries on Lob tables

Result Cache: RC Latch – Troubleshooting

Missing Diskgroup in CRS – Adding Manually

Performance Troubleshooting Series : Identifying Problematic Sessions or Queries (Method 1)

Performance Troubleshooting Series : Identifying Problematic Sessions or Queries (Preface)

Exadata Installation (only for test and education purpose)

Performance Troubleshooting Series : Troubleshooting Instance Memory Structures (SGA + PGA) Usage

Performance Troubleshooting Series : Understanding Database CPU – Consumers – Waiters – Average Active Sessions

New Features for DBA’s

Follow Me!!!

Subscribe to Posts by Email

Subscriber Count

Disclaimer

Recent Posts

Categories

Archives

Tags

Pages