Subscribe to Posts by Email

Subscriber Count

    696

Disclaimer

All information is offered in good faith and in the hope that it may be of use for educational purpose and for Database community purpose, but is not guaranteed to be correct, up to date or suitable for any particular purpose. db.geeksinsight.com accepts no liability in respect of this information or its use. This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content and if notify any such I am happy to remove. Product and company names mentioned in this website may be the trademarks of their respective owners and published here for informational purpose only. This is my personal blog. The views expressed on these pages are mine and learnt from other blogs and bloggers and to enhance and support the DBA community and this web blog does not represent the thoughts, intentions, plans or strategies of my current employer nor the Oracle and its affiliates or any other companies. And this website does not offer or take profit for providing these content and this is purely non-profit and for educational purpose only. If you see any issues with Content and copy write issues, I am happy to remove if you notify me. Contact Geek DBA Team, via geeksinsights@gmail.com

Pages

12c Database : Index clustering factor – Do your own math

From 12c, one can influence the clustering factor i.e set at custom level.

For those who does not know about clustering factor and how does it influence the optimizer to use index or not. Suggesting to read the following, if you already know skip the following and proceed to test case.

The clustering factor is a number which represent the degree to which data is randomly distributed in a table. It is the number of "block switches" while reading a table using an index.

It is used by Oracle's optimizer to help determine the cost associated with index range scans in comparison to full table scans.

To calculate the clustering factor of an index during the gathering of index statistics, Oracle does the following.

For each entry in the index Oracle compares the entry's table rowid block with the block of the previous index entry.
If the block is different, Oracle increments the clustering factor by 1.

Typically, Clustering factor can drastically increase if the table insertions or unordered as the index entry has to revisit the block again and again.

In addition to above, if the table is in ASSM tablespace and concurrent inserts from multiple sessions (parallel sessions) can increase clustering factor due to the fact of freelists and each session will use it own block to insert the same value of data and apparently index entries are have to unordered.

For example:-

	
		Table (6 rows) (3 blocks)
		Id	Name
		1	XXX
		2	XXX
		3	XXX	
		3	XXX	
		2	XXX
		1	XXX
	
		Index will look like this
		Good Clustering factor
		
			Root 
			Branch 1 - 4
			Index Block - Key 	Table block Entries
			Leaf 1 	    - 1		Block 1 - Rows 1
			Leaf 2 	    - 2		Block 2 - Rows 2
			Leaf 3      - 3		Block 3 - Rows 3
	
		Clustering factor will be close to 3 i.e num of blocks, since this need not revist the block again.
	
		Bad clustering factor (unordered table insertions)
			Root 
			Branch 1 - 4
			Index Block - Key 	Table block Entries
			Leaf 1 	      -1 	Block 1 - 1,2
			Leaf 2        -2 	Block 2 - 3,1
			Leaf 3        -3 	Block 3 - 2,3
			
Clustering factor will be close to 5 i.e num of blocks, since this has to revisit the table block again and again,
To search for a value 1, it has to revist the table block 1 and table block 3 and increase the CF from 3 to 4 accordingly
Again for value 3, it has to revist the table block 2, and 3 , and thus increase the CF from 4 to 5 accordingly
As with bad clustering factor optimizer sees there is no benefit to access the table via index it will choose full tablescan.

Test Case:-

Create a table with simple index and perform some deletes and inserts of same values.
This makes the table unordered and also the index entries to revisit the table blocks again 
and again. making clustering factor on higher side.
If so the optimizer will choose full table scan rather index scan and will not use index at all.

	SQL> create table TEST as select object_id,object_name from all_objects;

	Table created.

	SQL> create index test_idx on TEST(object_id) compute statistics;

	Index created.

	/* deleted some rows */
	SQL> delete from TEST where rownum < 10000;

	9999 rows deleted.

	SQL> commit;

	Commit complete.

	/* inserted the rows again with same values that has been deleted and 10000 rows more to get duplicate values */
	SQL>  insert into TEST select object_id,object_name from all_objects where rownum < 20000;

	19999 rows created.

	SQL> commit;

	Commit complete.

	SQL> exec dbms_stats.gather_table_stats(ownname=>'OE',tabname=>'TEST',cascade=>true);

	PL/SQL procedure successfully completed.

Check the index status

	SQL> select tablespace_name,index_name,num_rows leaf_blocks, avg_leaf_blocks_per_key, avg_data_blocks_per_key, clustering_factor, distinct_keys from dba_indexes where index_name='test_idx';

	TABLESPACE_NAME                INDEX_NAME LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR DISTINCT_KEYS
	------------------------------ ---------- ----------- ----------------------- ----------------------- ----------------- -------------
	EXAMPLE                        test_idx               266                       1                       1             58346         90748

As you saw the index clustering factor is closer to num_rows which makes optimizer to derive index cost is
higher as it has to scan more than 50% of table blocks.

Take a look at execution plan

	SQL> explain plan for select object_id from TEST where object_id between 5000 and 20000;

	Explained.

	SQL>  select * from table(dbms_xplan.display());

	PLAN_TABLE_OUTPUT
	--------------------------------------------------------------------
	Plan hash value: 1033171814

	--------------------------------------------------------------------------
	| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
	--------------------------------------------------------------------------
	|   0 | SELECT STATEMENT  |      | 19413 | 97065 |   163   (1)| 00:00:01 |
	|*  1 |  TABLE ACCESS FULL| TEST  | 19413 | 97065 |   163   (1)| 00:00:01 |
	--------------------------------------------------------------------------

	Predicate Information (identified by operation id):
	---------------------------------------------------

	   1 - filter("OBJECT_ID"< =20000 AND "OBJECT_ID">=5000)

	13 rows selected.

As discussed, Full tablescan is performed rather index scan though we are just accessing 15% of the data.

To overcome this situation, 12c offers new table level statistics preference TABLE_CACHE_BLOCKS to set at reasonable (not sure what reasonable) value

The range you can set is from 1 to 255. This value depicts how much of table blocks already cached with index scan during stats collection. For example if you set maximum 255 while collecting index statistics it skips the incrementing the clustering factor by every 255 entries and so on.

Let's take a closer look at it.

	SQL>  exec dbms_stats.set_table_prefs(ownname=>'OE', tabname=>'TEST',pname=>'TABLE_CACHED_BLOCKS', pvalue=> 255);

	PL/SQL procedure successfully completed.

	SQL> EXEC dbms_stats.gather_table_stats(ownname=>'OE', tabname=>'TEST', estimate_percent=> null, cascade=> true,method_opt=>'FOR ALL COLUMNS SIZE 1');

	PL/SQL procedure successfully completed.

	SQL> select tablespace_name,index_name, leaf_blocks, avg_leaf_blocks_per_key, avg_data_blocks_per_key, clustering_factor, distinct_keys from dba_indexes where index_name='test_idx';

	TABLESPACE_NAME                INDEX_NAME LEAF_BLOCKS AVG_LEAF_BLOCKS_PER_KEY AVG_DATA_BLOCKS_PER_KEY CLUSTERING_FACTOR DISTINCT_KEYS
	------------------------------ ---------- ----------- ----------------------- ----------------------- ----------------- -------------
	EXAMPLE                        test_idx               266                       1                       1               960         90748


	SQL> explain plan for select object_id from TEST where object_id between 5000 and 20000;

	Explained.


	SQL> select * from table(dbms_xplan.display());

	PLAN_TABLE_OUTPUT
	--------------------------------------------------------------------
	Plan hash value: 3649445643

	-------------------------------------------------------------------------
	| Id  | Operation        | Name | Rows  | Bytes | Cost (%CPU)| Time     |
	-------------------------------------------------------------------------
	|   0 | SELECT STATEMENT |      | 19413 | 97065 |    45   (0)| 00:00:01 |
	|*  1 |  INDEX RANGE SCAN| test_idx | 19413 | 97065 |    45   (0)| 00:00:01 |
	-------------------------------------------------------------------------

	Predicate Information (identified by operation id):
	---------------------------------------------------

	   1 - access("OBJECT_ID">=5000 AND "OBJECT_ID"< =20000)

	13 rows selected.


Few notes;
	1) Value has been set to maximum 255
	2) clustering factor has came down from 58346 to 960 which is close to num of blocks
	3) Optimizer choose index scan rather full table scan
	
This is pesimistic approach, especially useful for volatile tables but note the following
	1) This setting is for whole table level preference , not for a single column
	2) Need Careful consideration of table_cached_blocks value to reasonable number

Comments are closed.