Subscribe to Posts by Email

Subscriber Count

    696

Disclaimer

All information is offered in good faith and in the hope that it may be of use for educational purpose and for Database community purpose, but is not guaranteed to be correct, up to date or suitable for any particular purpose. db.geeksinsight.com accepts no liability in respect of this information or its use. This site is independent of and does not represent Oracle Corporation in any way. Oracle does not officially sponsor, approve, or endorse this site or its content and if notify any such I am happy to remove. Product and company names mentioned in this website may be the trademarks of their respective owners and published here for informational purpose only. This is my personal blog. The views expressed on these pages are mine and learnt from other blogs and bloggers and to enhance and support the DBA community and this web blog does not represent the thoughts, intentions, plans or strategies of my current employer nor the Oracle and its affiliates or any other companies. And this website does not offer or take profit for providing these content and this is purely non-profit and for educational purpose only. If you see any issues with Content and copy write issues, I am happy to remove if you notify me. Contact Geek DBA Team, via geeksinsights@gmail.com

Pages

Oracle Execution Plan: Common Join Methods explained

In Oracle execution plan, you often see Nested loops/Hash Joins/Merge Joins etc. These are called Join methods between row sources (tables/result sets of query).

You are can find more about Rowsource Join Methods here in the documentation

There are many join methods, out of all the following were common and generic

  • Nested loop join  
  • Hash join 
  • Sort merge join
  • Merge Join Cartesian

Nested Loop Join

Consider the following query

SQL> Select tab1.*, tab2.* from tabl, tab2 where tabl.col1=tab2.col2;

In terms of coding (c ) logic It is processed like:

For i in (select * from tab1) loop
For j in (select * from tab2 where col2=i.col1) loop
Display results;
End loop;
End loop;

The Steps involved in doing nested loop are:

a) Identify outer (driving) table

b) Assign inner (driven) table to outer table.

c) For every row of outer table, access the rows of inner table.

In execution plan it is seen like this:

NESTED LOOPS
outer_loop
inner_loop

 

--------------------------------------------------------------------------------
| Id | Operation 			| Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------
| 0 | SELECT STATEMENT 		| 	| 100 | 2200| 6 (0) | 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID	| E 	| 25  | 225 | 1 (0) | 00:00:01 |
| 2 | NESTED LOOPS 		| 	| 100 | 2200| 6 (0) | 00:00:01 |
| 3 | TABLE ACCESS FULL 		| D 	| 4   | 52  | 3 (0) | 00:00:01 |
|* 4| INDEX RANGE SCAN 		| E_DEPT | 33  |     | 0 (0) | 00:00:01 |
--------------------------------------------------------------------------------

When optimizer uses nested loops?

Optimizer uses nested loop when we are joining tables containing small number of rows with an efficient driving condition. It is important to have an index on column of inner join table as this table is probed every time for a new value from outer table.

Optimizer may not use nested loop in case:

  • No of rows of both the table is quite high
  • Inner query always results in same set of records
  • The access path of inner table is independent of data coming from outer table.

Note: You will see more use of nested loop when using FIRST_ROWS optimizer mode as it works on model of showing instantaneous results to user as they are fetched. There is no need for selecting caching any data before it is returned to user. In case of hash join it is needed and is explained below.

Hash join

Hash joins are used when the joining large tables.The optimizer uses smaller of the 2 tables to build a hash table in memory and the scans the large tables and compares the hash value (of rows from large table) with this hash table to find the joined rows.

The algorithm of hash join is divided in two parts

  1. Build a in-memory hash table on smaller of the two tables.
  2. Probe this hash table with hash value for each row second table

In simpler terms it works like

Build phase

For each row RW1 in small (left/build) table loop

Calculate hash value on RW1 join key

Insert RW1 in appropriate hash bucket.

End loop;

Probe Phase

For each row RW2 in big (right/probe) table loop

Calculate the hash value on RW2 join key

For each row RW1 in hash table loop

If RW1 joins with RW2

Return RW1, RW2

End loop;

End loop;

When optimizer uses hash join?

Optimizer uses has join while joining big tables or big fraction of small tables.

Unlike nested loop, the output of hash join result is not instantaneous as hash joining is blocked on building up hash table.

Note: You may see more hash joins used with ALL_ROWS optimizer mode, because it works on model of showing results after all the rows of at least one of the tables are hashed in hash table.

Example:- For the example above I have no index and both table E and D are having large result set

-----------------------------------------------------------------------------------
| Id| Operation 	         | Name | Rows | Bytes	|TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------
| 0 | SELECT STATEMENT 	| 	| 250G| 5122G	| 	| 3968K(100)| 13:13:45 |
|* 1| HASH JOIN 	         | 	| 250G| 5122G	| 20M	| 3968K(100)| 13:13:45 |
| 2 | TABLE ACCESS FULL	| E 	| 1000K| 8789K	| 	| 2246 (3)  | 00:00:27 |
| 3 | TABLE ACCESS FULL	| D 	| 1000K| 12M	| 	| 2227 (2)  | 00:00:27 |
-----------------------------------------------------------------------------------

Sort merge join

Sort merge join is used to join two independent data sources. They perform better than nested loop when the volume of data is big in tables but not as good as hash joins in general.

They perform better than hash join when the join condition columns are already sorted or there is no sorting required.

The full operation is done in two parts:

  • Sort join operation

get first row RW1 from input1

get first row RW2 from input2.



  • Merge join operation

          while not at the end of either input loop

          if RW1 joins with RW2

          get next row R2 from input 2

          return (RW1, RW2)

          else if RW1 < style=""> get next row RW1 from input 1

          else

          get next row RW2 from input 2

         end loop

Note: If the data is already sorted, first step is avoided.

Important point to understand is, unlike nested loop where driven (inner) table is read as many number of times as the input from outer table, in sort merge join each of the tables involved are accessed at most once. So they prove to be better than nested loop when the data set is large.

When optimizer uses Sort merge join?

When the join condition is an inequality condition (like <, <=, >=). This is because hash join cannot be used for inequality conditions and if the data set is large, nested loop is definitely not an option.

If sorting is anyways required due to some other attribute (other than join) like “order by”, optimizer prefers sort merge join over hash join as it is cheaper.

Note: Sort merge join can be seen with both ALL_ROWS and FIRST_ROWS optimizer hint because it works on a model of first sorting both the data sources and then start returning the results. So if the data set is large and you have FIRST_ROWS as optimizer goal, optimizer may prefer sort merge join over nested loop because of large data. And if you have ALL_ROWS as optimizer goal and if any inequality condition is used the SQL, optimizer may use sort-merge join over hash join

Example:-

select e.ename,d.dname from e, d where e.deptno=d.deptno order by e.deptno;

-----------------------------------------------------------------------------------------
| Id | Operation 		     | Name 		| Rows | Bytes | Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT 	     | 		| 2500K| 52M	| 167 (26)| 00:00:02 |
| 1 | MERGE JOIN 		     | 		| 2500K| 52M	| 167 (26)| 00:00:02 |
| 2 | TABLE ACCESS BYINDEX ROWID | E 	| 10000| 90000    | 102 (1) | 00:00:02 |
| 3 | INDEX FULL SCAN 	     | E_DEPTNO 	| 10000| 	         | 100 (0) | 00:00:02 |
|* 4 | SORT JOIN 		     | 		| 1000 | 13000 	| 25 (4)  | 00:00:01 |
| 5 | TABLE ACCESS FULL	     | D 		| 1000 | 13000 	| 24 (0)  | 00:00:01 |
-----------------------------------------------------------------------------------------

Merge Join Cartesian

A Cartesian join is used when one or more of the tables does not have any join conditions to any other tables in the statement. The optimizer joins every row from one data source with every row from the other data source, creating the Cartesian product of the two sets.

In some cases, a common filter condition between the two tables could be picked up by the optimizer as a possible join condition. This is even more dangerous, because the joins are not flagged in the execution plan as being a Cartesian product.

Cartesian joins generally result from poorly written SQL. has three tables in the FROM clause, but only one join condition joining the two tables.

Example:-

Cartesian Join with Distinct Clause
SELECT DISTINCT h.order_id, l.line_item_id, l.quantity
  FROM order_items l, orders h, order_items l2
 WHERE h.customer_id = :b1
   AND l.order_id = h.order_id;

Plan
-------------------------------------------------
SELECT STATEMENT
 SORT UNIQUE
  MERGE JOIN CARTESIAN
   NESTED LOOPS
    TABLE ACCESS BY INDEX ROWID ORDERS
     INDEX RANGE SCAN ORDER_ID
    TABLE ACCESS BY INDEX ROWID ORDER_ITEMS
     INDEX RANGE SCAN ORDER_ID
   SORT JOIN
    INDEX FAST FULL SCAN ORDER_ID

Here in above case Orders and Order Items is involved in Merge Join cartesian with tight nested loops and join condition between the third table is missing for example.

A = B , B = C must be there, for the third table Oracle automatically creates the condition C=A which is called transitivity retain “retain equi-join pred upon transitive equality pred generation”, Jonathan lewis.

Most of the cases when an merge join Cartesian is happening, below are the possible cases

Check Join Conditions (Most possible when an developer is writing new piece of code)

Start with the table joins. Check if there are any join conditions missing between the given tables in the query. As I said, Cartesian product is only possible when two tables have Many to Many join relation. In normal scenarios, the joins should be of the type: (a.primary_key = b.foreign_key) In most of the cases, adding proper join conditions will remove the Merge Join Cartesian from the plan.

Stale Statistics:- ( Most possible when a huge purge/load has happened to database and stats were stale)

The optimizer selects the best execution plan on the basis of the object statistics. If the statistics are stale, the plan might not be the optimal plan. Gather the statistics using DBMS_STATS package.

Optimizer Bug (_optimizer_transitivity_retain)

The issue with Optimizer regarding (_optimizer_transitivity_retain)

…..

There are other Join methods, for which I gather and add here soon.

-Thanks

Geek DBA

1 comment to Oracle Execution Plan: Common Join Methods explained