Affect of object Statistics on SQL Execution statistics

This is a small article to demonstrate why correct statistics are important and how they affect execution statistics of same plan.
In the past we learned that changing table statistics or index statistics (or rebuilding index) can causes plan for a SQL to change. Because when statistics changes, optimizer will try to generate new plan based on changed statistics.
With 11g, oracle provided baseline to ensure stability in SQL plans. So if you have single baseline enabled for a SQL, you essentially have single plan for that SQL and that plan will not change unless auto tuning job evolve another plan or you manually evolves another plan.

Does it mean that object statistics has no role to play if your plan is fixed ?

Lets have a quick demo. In my below exercise, I will not be changing the plan (as I am using baseline). But we will see that execution statistics such as buffer_gets and disk reads changes as you change object statistics.

I faced this issue on one of our production database where stats were completely inaccurate and when we gathered stats on tables and indexes, execution stats for SQLs changed with same plan.

Setup

I am performing these tests on 11.2.0.4.
Lets create 2 tables T1 and T2. I will use one query on single table to go for FTS and other query joining T1 and T2 and use index.


SQL>create table T1 as select * from dba_tables;

Table created.

SQL>insert into T1 select * from T1;

2804 rows created.

SQL>insert into T1 select * from T1;

5608 rows created.
...
...

SQL>commit;

Commit complete.

SQL>--create index on this table on TABLE_NAME column and gather stats

SQL>create index I_T1_TABLE_NAME on T1(TABLE_NAME);

Index created.

SQL>exec dbms_stats.gather_table_stats(null,'T1');

PL/SQL procedure successfully completed.

Create 2nd table from some other view, lets say dba_tab_statistics


DEO>create table T2 as select * from dba_tab_statistics;

Table created.

DEO>insert into T2 select * from T2;

16289 rows created.

...
...

DEO>commit;

Commit complete.

DEO>--create index on TABLE_NAME column in T2 table

DEO>create index I_T2_TABLE_NAME on T2(TABLE_NAME);

Index created.

DEO>exec dbms_stats.gather_table_stats(null,'T2');

PL/SQL procedure successfully completed.

Following are the table and index level stats values for T1 and T2 and corresponding indexes


DEO>select table_name, num_rows, blocks, avg_row_len from dba_tables where table_name in ('T1','T2');

TABLE_NAME			           NUM_ROWS   BLOCKS     AVG_ROW_LEN
------------------------------ ---------- ---------- -----------
T1				               34783	  3214	     247
T2				               298096	  4351	     99

DEO>select INDEX_NAME, LEAF_BLOCKS, DISTINCT_KEYS, CLUSTERING_FACTOR, NUM_ROWS from dba_indexes where table_name in ('T1','T2');

INDEX_NAME		               LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS
------------------------------ ----------- ------------- ----------------- ----------
I_T1_TABLE_NAME 		       360	       1414	         34675	           34783
I_T2_TABLE_NAME 		       1813	       2364	         74610             298096

So when object statistics are accurate and current, I see following execution statistics for following 2 queries

Query 1:

select a.table_name, sum(b.num_rows) from T1 a, T2 b where a.table_name= b.table_name and a.owner = b.owner and a.table_name = :b1 group by a.table_name;


DEO>var b1 varchar2(40);
DEO>exec :b1 := 'ABC';

PL/SQL procedure successfully completed.

DEO>select a.table_name, sum(b.num_rows) from T1 a, T2 b where a.table_name= b.table_name and a.owner = b.owner and a.table_name = :b1 group by a.table_name;

TABLE_NAME		               SUM(B.NUM_ROWS)
------------------------------ ---------------
ABC				               240678640


Execution Plan
----------------------------------------------------------
Plan hash value: 1483037242

--------------------------------------------------------------------------------------------------------
| Id  | Operation			                 | Name 	       | Rows  | Bytes | Cost (%CPU)| Time	   |
--------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT		             |		           |	 1 |	61 |	 4  (50)| 00:00:01 |
|   1 |  SORT GROUP BY NOSORT		         |		           |	 1 |	61 |	 4  (50)| 00:00:01 |
|   2 |   MERGE JOIN			             |		           |	47 |  2867 |	 4  (50)| 00:00:01 |
|   3 |    SORT JOIN			             |		           |	11 |   396 |	 2  (50)| 00:00:01 |
|   4 |     VIEW			                 | VW_GBC_5	       |	11 |   396 |	 2  (50)| 00:00:01 |
|   5 |      HASH GROUP BY		             |		           |	11 |   297 |	 2  (50)| 00:00:01 |
|   6 |       TABLE ACCESS BY INDEX ROWID    | T2		       |   126 |  3402 |	 1   (0)| 00:00:01 |
|*  7 |        INDEX RANGE SCAN 	         | I_T2_TABLE_NAME |   126 |	   |	 1   (0)| 00:00:01 |
|*  8 |    SORT JOIN			             |		           |	25 |   625 |	 2  (50)| 00:00:01 |
|   9 |     TABLE ACCESS BY INDEX ROWID      | T1		       |	25 |   625 |	 1   (0)| 00:00:01 |
|* 10 |      INDEX RANGE SCAN		         | I_T1_TABLE_NAME |	25 |	   |	 1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("B"."TABLE_NAME"=:B1)
   8 - access("A"."TABLE_NAME"="ITEM_2" AND "A"."OWNER"="ITEM_1")
       filter("A"."OWNER"="ITEM_1" AND "A"."TABLE_NAME"="ITEM_2")
  10 - access("A"."TABLE_NAME"=:B1)

Note
-----
   - SQL plan baseline "SQL_PLAN_gt8npsxnncgm2abc84fa9" used for this statement


Statistics
----------------------------------------------------------
	  1  recursive calls
	  0  db block gets
	157  consistent gets
	  3  physical reads
	  0  redo size
	598  bytes sent via SQL*Net to client
	453  bytes received via SQL*Net from client
	  2  SQL*Net roundtrips to/from client
	  2  sorts (memory)
	  0  sorts (disk)
	  1  rows processed

Above query has done 157 consistent gets and 3 physical reads.

Query 2:

select count(1) from T2 where owner = :b2;


DEO>var b2 varchar2(40);
DEO>exec :b2 := 'SYS';

PL/SQL procedure successfully completed.

DEO>select count(1) from T2 where owner = :b2;

  COUNT(1)
----------
    251088


Execution Plan
----------------------------------------------------------
Plan hash value: 3321871023

---------------------------------------------------------------------------
| Id  | Operation	       | Name | Rows  | Bytes | Cost (%CPU)| Time	  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	  |	1     |	5     |   625   (2)| 00:00:03 |
|   1 |  SORT AGGREGATE    |	  |	1     |	5     |	           |	      |
|*  2 |   TABLE ACCESS FULL| T2   | 19873 | 99365 |   625   (2)| 00:00:03 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"=:B2)

Note
-----
   - SQL plan baseline "SQL_PLAN_9wq67xmrafzda1c6cf506" used for this statement


Statistics
----------------------------------------------------------
	  1  recursive calls
	  0  db block gets
   4350  consistent gets
	  0  physical reads
	  0  redo size
	517  bytes sent via SQL*Net to client
	453  bytes received via SQL*Net from client
	  2  SQL*Net roundtrips to/from client
	  0  sorts (memory)
	  0  sorts (disk)
	  1  rows processed

Above query has done 4350 consistent gets and no physical reads.

Lets fake statistics to random high value and see the affect. Note that both queries above are using baseline, so plan will not change with further executions even after changing object stats

I am going to set table stats and index stats as follows


DEO>exec dbms_stats.SET_TABLE_STATS(null,'T1',NUMROWS=>3221020,NUMBLKS=>202590, AVGRLEN=>150)

PL/SQL procedure successfully completed.

DEO>exec dbms_stats.SET_INDEX_STATS(null,'I_T1_TABLE_NAME',NUMROWS=>3153430,NUMLBLKS=>124232,NUMDIST=>6,AVGLBLK=>2990,AVGDBLK=>19002,CLSTFCT=>76010)

PL/SQL procedure successfully completed.

DEO>exec dbms_stats.SET_TABLE_STATS(null,'T2',NUMROWS=>140000000,NUMBLKS=>13540202, AVGRLEN=>120)

PL/SQL procedure successfully completed.

DEO>exec dbms_stats.SET_INDEX_STATS(null,'I_T2_TABLE_NAME',NUMROWS=>13345304,NUMLBLKS=>1242022,NUMDIST=>12,AVGLBLK=>324,AVGDBLK=>1342,CLSTFCT=>260123)

PL/SQL procedure successfully completed.

DEO>select table_name, num_rows, blocks, avg_row_len from dba_tables where table_name in ('T1','T2');

TABLE_NAME			           NUM_ROWS   BLOCKS     AVG_ROW_LEN
------------------------------ ---------- ---------- -----------
T1				               3221020    202590	 150
T2				               140000000  13540202	 120

DEO>select INDEX_NAME, LEAF_BLOCKS, DISTINCT_KEYS, CLUSTERING_FACTOR, NUM_ROWS from dba_indexes where table_name in ('T1','T2');

INDEX_NAME		               LEAF_BLOCKS DISTINCT_KEYS CLUSTERING_FACTOR NUM_ROWS
------------------------------ ----------- ------------- ----------------- ----------
I_T1_TABLE_NAME 		       124232	   6	         76010             3153430
I_T2_TABLE_NAME 		       1242022	   12	         260123            13345304


I will purge those queries from shared_pool and run those same queries again


DEO>!cat purgesql.sql
accept sql_id prompt 'Enter SQL_ID:- '
DECLARE
 name varchar2(50);
BEGIN
 select distinct address||','||hash_value into name
 from v$sqlarea
 where sql_id like '&sql_id';

 sys.dbms_shared_pool.purge(name,'C',65);

END;
/

DEO>@purgesql
Enter SQL_ID:- 46j56265bmw7u

PL/SQL procedure successfully completed.

DEO>@purgesql
Enter SQL_ID:- gkbtfpmvxw4hn

PL/SQL procedure successfully completed.

Lets run the queries again after changing the stats

Query 1:

select a.table_name, sum(b.num_rows) from T1 a, T2 b where a.table_name= b.table_name and a.owner = b.owner and a.table_name = :b1 group by a.table_name;


DEO>set autotrace on
DEO>select a.table_name, sum(b.num_rows) from T1 a, T2 b where a.table_name= b.table_name and a.owner = b.owner and a.table_name = :b1 group by a.table_name;

TABLE_NAME		               SUM(B.NUM_ROWS)
------------------------------ ---------------
ABC				               240678640


Execution Plan
----------------------------------------------------------
Plan hash value: 1483037242

----------------------------------------------------------------------------------------------------
| Id  | Operation			              | Name 	        | Rows  | Bytes | Cost (%CPU)| Time	   |
----------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT		          |		            |	 1  |	61  |	74   (9)| 00:00:01 |
|   1 |  SORT GROUP BY NOSORT		      |		            |	 1  |	61  |	74   (9)| 00:00:01 |
|   2 |   MERGE JOIN			          |		            |  4464 |   265K|	74   (9)| 00:00:01 |
|   3 |    SORT JOIN			          |		            |	11  |   396 |	71   (6)| 00:00:01 |
|   4 |     VIEW			              | VW_GBC_5	    |	11  |   396 |	71   (6)| 00:00:01 |
|   5 |      HASH GROUP BY		          |		            |	11  |   297 |	71   (6)| 00:00:01 |
|   6 |       TABLE ACCESS BY INDEX ROWID | T2		        | 59222 |  1561K|	67   (0)| 00:00:01 |
|*  7 |        INDEX RANGE SCAN 	      | I_T2_TABLE_NAME | 59222 |	    |	55   (0)| 00:00:01 |
|*  8 |    SORT JOIN			          |		            |  2278 | 56950 |	 3  (67)| 00:00:01 |
|   9 |     TABLE ACCESS BY INDEX ROWID   | T1		        |  2278 | 56950 |	 1   (0)| 00:00:01 |
|* 10 |      INDEX RANGE SCAN		      | I_T1_TABLE_NAME |  2278 |	    |	 1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------

   7 - access("B"."TABLE_NAME"=:B1)
   8 - access("A"."TABLE_NAME"="ITEM_2" AND "A"."OWNER"="ITEM_1")
       filter("A"."OWNER"="ITEM_1" AND "A"."TABLE_NAME"="ITEM_2")
  10 - access("A"."TABLE_NAME"=:B1)

Note
-----
   - SQL plan baseline "SQL_PLAN_gt8npsxnncgm2abc84fa9" used for this statement


Statistics
----------------------------------------------------------
	 26  recursive calls
	 62  db block gets
	231  consistent gets
	  4  physical reads
  14528  redo size
	598  bytes sent via SQL*Net to client
	453  bytes received via SQL*Net from client
	  2  SQL*Net roundtrips to/from client
	  2  sorts (memory)
	  0  sorts (disk)
	  1  rows processed

Note that plan didn’t change, but we see that consistent gets (buffer_gets/exec) has increased from previous value of 157 to 231.
Physical reads are more or less same (just increase of 1).

Lets check 2nd query doing FTS

Query 2:

select count(1) from T2 where owner = :b2;


DEO>set  autotrace on
DEO>select count(1) from T2 where owner = :b2;

  COUNT(1)
----------
      9600

Execution Plan
----------------------------------------------------------
Plan hash value: 3321871023

---------------------------------------------------------------------------
| Id  | Operation	       | Name | Rows  | Bytes | Cost (%CPU)| Time	  |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |	  |	1     |	5     |  1926K  (1)| 02:10:54 |
|   1 |  SORT AGGREGATE    |	  |	1     |	5     |	           |	      |
|*  2 |   TABLE ACCESS FULL| T2   |  9333K|    44M|  1926K  (1)| 02:10:54 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("OWNER"=:B2)

Note
-----
   - SQL plan baseline "SQL_PLAN_9wq67xmrafzda1c6cf506" used for this statement


Statistics
----------------------------------------------------------
	  1  recursive calls
	  0  db block gets
   8195  consistent gets
   4342  physical reads
	  0  redo size
	515  bytes sent via SQL*Net to client
	453  bytes received via SQL*Net from client
	  2  SQL*Net roundtrips to/from client
	  0  sorts (memory)
	  0  sorts (disk)
	  1  rows processed


Huge increase in consistent gets compared to previous run. Previous run showed consistent gets as 4350 when stats were accurate. With increased stats, consistent gets became 8195
Also, there was no disk reads previously may be because all blocks were in buffer. But with changed object level stats, we are seeing 4342 disk reads.

So always make sure you have latest accurate statistics available for your tables and indexes in your database. You may have right plans, but having bad object stats can cause Oracle to work more.

Hope this helps !!

Tracing Single SQL in Oracle

Many times, while doing SQL tuning, we want to trace (event 10046) single SQL in database. Instead of going for module level tracing or session level using DBMS_MONITOR, we can simply use below alter system command to trace specific SQL

Example: I have a table T1 and index T_I_TABLE_NAME on that table.
I am running following SQL and I want to trace on this SQL

select * from T1 where table_name = 'SINGLE_PRODUCT_GROUPS';

I can just find out the SQL ID of above SQL

select sql_id, sql_text from v$sql where sql_text like '%SINGLE_PRODUCT_GROUPS%'

SQL_ID	      SQL_TEXT
------------- --------------------------------------------------------------------------------
8kybysnu4nn34 select * from T1 where table_name = 'SINGLE_PRODUCT_GROUPS'

Once I have the SQL ID, I can use below alter system to trace this SQL


alter system set events 'sql_trace[SQL:8kybysnu4nn34] plan_stat=all_executions,wait=true,bind=true';

Note that even though this is alter system, other SQL IDs run will not have any effect on tracing because we are tracing only specific SQL ID. So unless SQL ID 8kybysnu4nn34 is run, it will not generate any trace file.

Once SQL ID is run and trace is generated, you can turn off tracing using following statement


alter system set events 'sql_trace[SQL:8kybysnu4nn34] off';

This might generate multiple trace files as multiple sessions might run same SQL ID (depending on the application).

Hope this helps !!!

direct path read behavior in Oracle 11.2

Prior to 11g, whenever optimizer goes for full table scan, Oracle used to show “db file scattered read” as wait event. But starting from 11g, a full table scan can show (depending on certain conditions) “direct path read” wait event.

db file scattered read – happens when blocks for a table is read from datafile into buffer cache in SGA

direct path read – happens when blocks for a table is read from datafile into PGA

Problem with fetching data in db cache was, if table is big enough it will use many blocks in db cache causing performance issue. And if multiple SQLs are doing FTS (full table scan) then every session will try to use those blocks in buffer cache by acquiring the latch and can cause “cache buffer chain: latch” contention. This will spike CPU on the host.

With direct path read, block read from datafile goes to PGA and every session has its own chunk of memory allocated in PGA. So multiple sessions doing FTS on same table does not make situation as bad as it used to be with “db file scattered read”

There are certain conditions based on which oracle decides whether to go for direct path read or use buffer cache.
In the reference section, I have mentioned many good blog articles which discusses one or more of these behaviors.

_small_table_threshold

_small_table_threshold : This is a hidden parameter which helps server process decide whether to go for direct path read or buffer cache. Unit of this parameter is number of blocks and default value is 2% of buffer cache size.

DEO>@param
Enter Parameter name: _small_table

Parameter			                  Description						                           Session Va Instance V
------------------------------------- ------------------------------------------------------------ ---------- ----------
_small_table_threshold		          lower threshold level of table size for direct reads	       27107      27107

In my case this is set to 27107 blocks which is approximately 2% of my db_cache_size blocks (you need to convert db_cache_size in terms of Oracle blocks and take 2% of that value to arrive at default value).

If size of table (blocks below high water mark) is > _small_table_threshold then Oracle chooses to go for direct path read.
In many blog articles, its stated that if size of table is > 5 times _small_table_threshold then it goes for direct path read.

During my testing, I was not able to verify 5 times _small_table_threshold. Because as soon as table size if more than _small_table_threshold, it goes for direct path read. I have checked this behavior in 11.2.0.3 as well as in 11.2.0.4

Lets test the dependency of direct path read on this hidden parameter for partitioned as well as non-partitioned table

Non-Partitoned table

My Oracle version is 11.2.0.4. I have a non-partitioned table T1 created from DBA_OBJECTS. You can create any test table of significant size using any view in database or by using “CONNECT BY rownum<=” clause from dual. Gather stats on table.


DEO>select table_name, num_rows, blocks from dba_tables where table_name = 'T1';

TABLE_NAME			           NUM_ROWS   BLOCKS
------------------------------ ---------- ----------
T1				               91200	    3214

My table has 3214 blocks.

Setting _small_table_threshold to 5000


DEO>alter session set "_small_table_threshold" = 5000;

Session altered.

Since table size is < _small_table_threshold, it should not go for direct path read. Instead we should see blocks going into db cache.
We can check which object’s blocks are placed in buffer cache using X$KCBOQH table. So when full table scan goes into db cache, we should be able to see the same using this table.

Lets flush the buffer cache to make sure nothing is present in buffer cache.


DEO>alter system flush buffer_cache;

System altered.

DEO>select count(*) from T1;

  COUNT(*)
----------
     91200

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# = (select object_id from dba_objects where object_name = 'T1') group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93840	   3209

As shown above FTS has read all buffers in db cache.

Now, lets reduce _small_table_threshold to 3000 and flush buffer cache again.

DEO>alter session set "_small_table_threshold"=3000;

Session altered.

DEO>alter system flush buffer_cache;

System altered.

Since table size > _small_table_threshold, we should see direct path read.
We can verify if a session has done direct path read using statistics “physical reads direct” from v$sesstat
Following query shows the same. In this case my SID is 6531.

DEO>r
  1* select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                    2

So current value of direct path read stats is 2 before reading from table. Lets read data from table


DEO>select count(*) from T1;

  COUNT(*)
----------
     91200

DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                    3210

We see that if table size is more than _small_table_threshold it has gone for direct path read.
We can also confirm if db cache has anything related to our table T1. Remember than we flushed buffer cache before running last query on T1 table.

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# = (select object_id from dba_objects where object_name = 'T1') group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93840	      1

We can see 1 block in db cache which is the header block. Header block is always read in buffer cache.

Partitioned Table

In case of partitioned table same behavior was observed but _small_table_threshold is compared with partitions we are selecting instead of entire table.

I have created similar partitioned table with 4 partitions


CREATE TABLE T1_PART
(
    col1 NUMBER,
    col2 VARCHAR2(100)
 )
 PARTITION BY LIST (col1)
 (
    PARTITION PART_1  VALUES (1)
  , PARTITION PART_2  VALUES (2)
  , PARTITION PART_3  VALUES (3)
  , PARTITION PART_4  VALUES (4)
 ) ;

INSERT INTO T1_PART (SELECT mod(rownum,4)+1, rpad('X',100,'X') FROM dual CONNECT BY rownum<=896000);

DEO>select table_name, partition_name, num_rows, blocks, avg_row_len from user_tab_partitions where table_name='T1_PART';

TABLE_NAME		       PARTITION_NAME			              NUM_ROWS   BLOCKS     AVG_ROW_LEN
-------------------    -------------------------------------- ---------- ---------- -----------
T1_PART 		       PART_1				                  128000      1948	    104
T1_PART 		       PART_2				                  128000      1948	    104
T1_PART 		       PART_3				                  128000      1948	    104
T1_PART 		       PART_4				                  128000      1948	    104

4 rows selected.

Each partition is 1948 blocks.
Lets change _small_table_threshold to 1800 and select values from single partition. Since partition size we are selecting > _small_table_threshold, it should go for direct path read


DEO>alter session set "_small_table_threshold"=1800;

Session altered.

DEO>select count(1) from T1_PART where col1 = 1;

  COUNT(1)
----------
    128000

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# = (select object_id from dba_objects where object_name = 'T1_PART' and SUBOBJECT_NAME = 'PART_1') group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93845	      1

DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                    5166

We are seeing “physical reads direct” stats value as 5166. This is a cumulative value. Previous value was 3210.
So it has read 5166 – 3210 = 1956 blocks which is approx same as number of blocks for 1 partition.

Also, we are see 1 block is read in buffer cache which is header block

To verify that _small_table_threshold is checked at partitions we are selecting and not at table level, lets increase _small_table_threshold to 3600 and continue to select single partition


DEO>alter session set "_small_table_threshold"=3600;

Session altered.

DEO>select count(1) from T1_PART where col1 = 1;

  COUNT(1)
----------
    128000

DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                    5168

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# = (select object_id from dba_objects where object_name = 'T1_PART' and SUBOBJECT_NAME = 'PART_1') group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93845	   1949

So no increase in “physical reads direct”, but we can see blocks are read in buffer as sum(NUM_BUF) is showing same as number of blocks for that partition.

If we select 2 partitions, it will again go for direct path read because sum of blocks of 2 partitions > _small_table_threshold (3600)


DEO>select count(1) from T1_PART where col1 in (1,2);

  COUNT(1)
----------
    256000

DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                    9064

Both partitions went for direct path read even though 1 partition was in buffer cache.
Above stat value if cumulative. It has increased by (9064 – 5168) 3896 blocks which is size of 2 partitions.

We can change this behavior using _direct_read_decision_statistics_driven parameter. Check later in this article.

What happens when the blocks are cached in buffer cache ?

If we have atleast 50% of blocks cached in buffer cache, Oracle will choose to use db cache instead of direct path read when using FTS. ALEX FATKULIN has published a very nice article http://afatkulin.blogspot.ie/2009/01/11g-adaptive-direct-path-reads-what-is.html about this behavior and he has provided a function to test this behavior.

But I am seeing different behavior in my instance. In my tests, Oracle is not going for db cache unless 98% blocks are in buffer cache.

In my case for table T1, we can select half the records from table to have approx 50% blocks cached in buffer cache and check the behavior.

Non-Partitioned Table

Lets increased _small_table_threshold to 5000 so that Oracle does not go for direct path read and read approx half the blocks in buffer cache. We are also flushing buffer cache to make sure nothing is present.


DEO>alter system flush buffer_cache;

System altered.

DEO>alter session set "_small_table_threshold"=5000;

Session altered.

DEO>select count(1) from T1 where rownum < 50000;

  COUNT(1)
----------
     49999

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# = (select object_id from dba_objects where object_name = 'T1') group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93840	   1753

DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                     5168

From above output we can see that more than 50% blocks are cached in buffer cache.
Now, lets change _small_table_threshold to 3000 again to check if FTS goes for direct path read or buffer cache.


DEO>alter session set "_small_table_threshold"=3000;

Session altered.

DEO>select count(1) from T1;

  COUNT(1)
----------
     91200

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# = (select object_id from dba_objects where object_name = 'T1') group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93840	   1753

DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct					                        8376

So even when we had more than 50% of blocks in buffer cache, its still going for direct path read. SUM(NUM_BUF) didnt change from 1753 to 3209 (all blocks on T1 table).
I increased number of blocks cached in buffer cache to 90% and it will still going for direct path read.

Only when approx 98% of blocks are cached then Oracle goes for db cache. I validated same behavior in 11.2.0.3 as well.

Partitioned Table

I repeated the same process for partitioned table and found same behavior. In this case Oracle checks for each partition instead of table.

Each partition has 1948 blocks. So lets put 90% of blocks in db cache and check behavior.
Lets increase _small_table_threshold to 3000 deliberately so that Oracle uses db cache.


DEO>alter session set "_small_table_threshold"=3000;

Session altered.

DEO>alter system flush buffer_cache;

System altered.

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# = (select object_id from dba_objects where object_name = 'T1_PART' and SUBOBJECT_NAME = 'PART_1') group by obj#;

no rows selected

DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                    2

DEO>--selecting 90% of records from PART_1
DEO>select count(*) from T1_PART where col1 = 1 and rownum < (128000*0.9);

  COUNT(*)
----------
    115199

DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                    2

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# = (select object_id from dba_objects where object_name = 'T1_PART' and SUBOBJECT_NAME = 'PART_1') group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93845	   1753

We have 90% of blocks of partition PART_1 in buffer cache.
Now we can reduce _small_table_threshold to 1800 and try selecting.


DEO>alter session set "_small_table_threshold"=1800;

Session altered.

DEO>--selecting complete PART_1
DEO>select count(*) from T1_PART where col1 = 1;

  COUNT(*)
----------
    128000

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# = (select object_id from dba_objects where object_name = 'T1_PART' and SUBOBJECT_NAME = 'PART_1') group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93845	   1753

DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                    1950

DEO>

As seen above, even with 90% of blocks in db cache, it goes for direct path read.
Lets repeat the same by putting 98% of blocks


DEO>alter session set "_small_table_threshold"=3000;

Session altered.

DEO>alter system flush buffer_cache;

System altered.

DEO>select count(*) from T1_PART where col1 = 1 and rownum < (128000*0.98);

  COUNT(*)
----------
    125439

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# = (select object_id from dba_objects where object_name = 'T1_PART' and SUBOBJECT_NAME = 'PART_1') group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93845	   1910

DEO>alter session set "_small_table_threshold"=1800;

Session altered.

DEO>select count(*) from T1_PART where col1 = 1;

  COUNT(*)
----------
    128000

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# = (select object_id from dba_objects where object_name = 'T1_PART' and SUBOBJECT_NAME = 'PART_1') group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93845	   1949

DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                     1950

So with 98% of rows/blocks in db cache, Oracle does NOT go for direct path read.

What if we have 1 partition in cache and we select 2 partitions. Will it go for direct path read ?

I have kept _small_table_threshold=3000 which is more than 1 partition size but less than 2 partitions size


DEO>alter session set "_small_table_threshold"=3000;

Session altered.

DEO>select count(*) from T1_PART where col1 = 1;

  COUNT(*)
----------
    128000

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# in (select object_id from dba_objects where object_name = 'T1_PART' and SUBOBJECT_NAME in ('PART_1','PART_2')) group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93845	   1949

We have above PART_1 in db cache and nothing on PART_2. Lets select 2 partitions


DEO>select count(*) from T1_PART where col1 in (1,2);

  COUNT(*)
----------
    256000

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# in (select object_id from dba_objects where object_name = 'T1_PART' and SUBOBJECT_NAME in ('PART_1','PART_2')) group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93845	   1949
     93846	      1

DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                    5848

So even when 1 partition was in db cache, if we select 2 partitions, it will go for direct path read for both partitions.
Above stats value of 5848 is cumulative value from previous value of 1950. Difference comes out to be 3898 which is size in blocks of 2 partitions.

_direct_read_decision_statistics_driven

This parameter mainly affect the behavior of partition table.

We have seen above that if we have a partition cached in db cache and if we select 2 partitions, both partitions will go for direct path read. This happens when _direct_read_decision_statistics_driven is set to true (default).

When _direct_read_decision_statistics_driven parameter is true, Oracle uses table statistics to decide whether to go for direct path read or db cache (using db file scattered read).
When _direct_read_decision_statistics_driven parameter is false, Oracle uses the segment header block (the one which always gets selected in db cache) to decide whehter to go for direct path read or db cache. It means that value of _small_table_threshold will be compared with every segment(partition) and not as a whole with partitions we are selecting.

In all the above excercise _direct_read_decision_statistics_driven was set to true. So Oracle server process used table stats to decide the fetch method.

Let set _direct_read_decision_statistics_driven to false to check behavior.


DEO>alter session set "_direct_read_decision_statistics_driven"=false;

Session altered.

We have 1 partition cached entirely in db cache.


DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# in (select object_id from dba_objects where object_name = 'T1_PART' and SUBOBJECT_NAME in ('PART_1','PART_2')) group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93845	   1949
     93846	      1

Current value of physical reads direct stats is as below and _small_table_threshold is set to 1800 which is less than size of single partition


DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                    3896

DEO>alter session set "_small_table_threshold" = 1800;

Session altered.

Lets select 2 partitions


DEO>select count(1) from T1_PART where col1 in (1,2);

  COUNT(1)
----------
    256000

1 row selected.

DEO>select a.sid, b.name, a.value from v$sesstat a, v$statname b where a.statistic# = b.statistic# and a.statistic# in (97) and a.sid = 6531;

       SID NAME 								                            VALUE
---------- ---------------------------------------------------------------- ----------
      6531 physical reads direct						                    5846

1 row selected.

DEO>select obj#, sum(num_buf) from X$KCBOQH where obj# in (select object_id from dba_objects where object_name = 'T1_PART' and SUBOBJECT_NAME in ('PART_1','PART_2')) group by obj#;

      OBJ# SUM(NUM_BUF)
---------- ------------
     93845	   1949
     93846	      1

2 rows selected.

If we check carefully, we have direct path read done for only 1 partition (5846 – 3896 = 1950). 1950 is size of single partition.
So Oracle has gone for direct path read for 2nd partition and it has used blocks cached in buffer cache for 1st partition.
Since we have set _direct_read_decision_statistics_driven to false, Oracle reads header block of individual segment to decide fetch method to be used for that segment.
Since 1st segment PART_1 was present entirely in db cache, it decided to use that. PART_2 was not present in db cache, only header block was present so it compared PART_2 size with _small_table_threshold and since PART_2 size > _small_table_threshold, it went with direct path read.

In above exercise (_direct_read_decision_statistics_driven = true), when we set _small_table_threshold to 3600 for partition table and tried selecting 2 partitions, both went for direct path read. Now if we do that, both partitions will go to db cache because size of individual partitions (from header block of each segment) will be compared individually and not as sum of blocks we are selecting.

_serial_direct_read

If you want to disable direct path read on FTS, you can make use of _serial_direct_read parameter. Many DBA are increasing value of _small_table_threshold to very high value (higher than any table in database) and prevents direct path reads. But Oracle has provided this hidden parameter _serial_direct_read, which if set to false will disable direct path reads.

Default value of this parameter is true, which means allow direct path reads.

You can also set event 10949 which can also be used to disable direct path reads. Setting this event is like setting _serial_direct_read=false.

alter session set events '10949 trace name context forever, level 1';

Hope this helps !!

References

https://dioncho.wordpress.com/2009/07/21/disabling-direct-path-read-for-the-serial-full-table-scan-11g/
http://afatkulin.blogspot.ie/2009/01/11g-adaptive-direct-path-reads-what-is.html

Optimizer statistics-driven direct path read decision for full table scans (_direct_read_decision_statistics_driven)

http://oracle-tech.blogspot.com.tr/2014/04/smalltablethreshold-and-direct-path.html
http://oracle-tech.blogspot.com.tr/2014/04/directreaddecisionstatistcsdriven.html

Oracle Performance Tuning – ASH Basics

Recently I got an opportunity to dig into performance of database. We were not facing any performance issue, but we wanted to understand whats happening under the hood. What are the bottlenecks and how do we prepare and scale our database further.

In future the load on the databases are going to be crazy and we want to make sure that we look into every consumer of our database and optimize the usage.

I went back into the basics of database performance tuning – ASH. Active session history is the beast which has loads of information and provides information at very granular level – “session”. We can dig into ASH data and generate different reports rolled-up against various entities like module, action, sql_id, user_id etc. We can determine how much each of these entities are using database resources.

DB Time:

One of the most fundamental measure of database performance is DB Time. This is the amount of time a session has spend on database activities. This is a combination of CPU Time + non-Idle wait times. This does not consider any idle wait time so even if session is done with its activity but just sitting idle doing nothing, that idle time is not considered part of DB Time.

This gives a very accurate measure of amount of database resources a session is burning.

Active Session History:

ASH or Active Session History provides snapshots of active sessions connected to database (status=ACTIVE in v$session) and loads of other information related to those sessions about what they were doing when snapshot was taken. To understand how we can extract this information and make sense about it, we need to understand how ASH is build and what that information means.

V$ACTIVE_SESSION_HISTORY

Active session history is exposed by database using a view V$ACTIVE_SESSION_HISTORY. This view stores samples taken every 1 sec. A background process MMNL (Memory manager lite) takes a snapshot from v$session and v$session_wait about the current state of active sessions and dumps the information in V$ACTIVE_SESSION_HISTORY. Note that only active sessions’s snapshot is taken.

V$ACTIVE_SESSION_HISTORY is a memory view stored in the form of circular buffer. It means that this view has size limit on its growth (approx 2MB / CPU). So when it reaches to that size it starts overwriting the oldest records in circular buffer. You can visualize the same using following figure

 

ash-architecture-and-advanced-usage-rmoug2014-7-638

 

 

 

 

 

 

 

 

So how these samples are useful to us in finding DB Time ?

Lets say a session has spend 10 sec of active time in database. It means DB time for that session is 10 sec. During this 10 sec of active time, we will have 10 samples of that session taken as interval time between snapshot is 1 sec. So if we just take a count(*) from V$ACTIVE_SESSION_HISTORY, we will see 10 records for that session. This means that to get DB Time of a session  we just have to take count(*) from V$ACTIVE_SESSION_HISTORY for that session.

Its little difficult to adjust to the fact that time parameter can be obtained using count(*) from a view. But this is because interval time between samples is 1 sec.

So DB Time is actually – [ Number of samples ] X [ Time interval between samples ]

Since time interval between samples is 1 sec, we can get DB time as number of samples which is count(*) from V$ACTIVE_SESSION_HISTORY

Again, calculating DB Time using count(*) from ASH is approximation as sessions which are taking fraction of times < 1 sec will not be counted if they happens between 2 samples and DB time of other sessions which ends just before next samples is taken will be counted until last samples when they were active. But those were very few cases and amount of DB time not measure in such cases will be very less compared to total DB time we are measuring. You can visualize the same using following figure.

 

Aas_ashDBA_HIST_ACTIVE_SESS_HISTORY

DBA_HIST_ACTIVE_SESS_HISTORY is a history table which maintains history of active sessions between AWR snapshots. Since V$ACTIVE_SESSION_HISTORY is a memory view and has limited space for growth, data from V$ACTIVE_SESSION_HISTORY flows to DBA_HIST_ACTIVE_SESS_HISTORY table.

DBA_HIST_ACTIVE_SESS_HISTORY table enables us to view active session data from the past. Data in DBA_HIST_ACTIVE_SESS_HISTORY is retained based on retention level set for AWR snapshots (30 days by default). So every time AWR snapshot happens, data from V$ACTIVE_SESSION_HISTORY goes into this table.

But NOT all data from V$ACTIVE_SESSION_HISTORY goes into DBA_HIST_ACTIVE_SESS_HISTORY. When AWR snapshot happens, it picks only 1 out of 10 samples from V$ACTIVE_SESSION_HISTORY into history table.

Complete visualization of active sessions looks like below

ash-architecture-and-advanced-usage-rmoug2014-7-638

 

 

 

 

 

 

 

 

Since AWR snapshot interval is 1 hour, it will take AWR snapshots every 1 hour and during that time 1 out of 10 samples form V$ACTIVE_SESSION_HISTORY will move to DBA_HIST_ACTIVE_SESS_HISTORY table.

So if want to calculate DB Time from DBA_HIST_ACTIVE_SESS_HISTORY table, we need to consider sample time as 10 sec instead of 1 sec.

DB Time = [ Number of samples ] X [ Time interval between samples ]
        = count(1) * 10

That’s why when we are calculating DB Time of a session or SQL from DBA_HIST_ACTIVE_SESS_HISTORY table, we need to take count(*) * 10.

Calculating DB Time

With this brief understanding of ASH, we can use these tables to find out DB time for a session and consolidate sessions to calculate DB Time for module/user etc.

If we can list modules in terms of % of DB time they are consuming in database, modules with high % of DB time are the one using most resources in database and has high impact to database. Same holds true at user level as well. We can find out which user is contributing highest DB time.

I have given couple of queries to find out % of DB Time different modules are contributing in database. This gives a fair idea about module wise resource consumption in database.

Note that I have randomized module names using DBMS_RANDOM.string(‘L’,TRUNC(DBMS_RANDOM.value(10,21))) to avoid exposing actual module name outside.


set line 999
set pagesize 999
set verify off
col module format a30;
BREAK ON REPORT
COLUMN DUMMY HEADING ''
compute sum of PCT_CPU on REPORT
with total_cpu as
(select   count(1)
 from     V$ACTIVE_SESSION_HISTORY
 where    SESSION_TYPE = 'FOREGROUND')
 select   module,
          count(1) "module_cpu",
          (select * from total_cpu) "total_cpu",
          round((count(1)/(select * from total_cpu))*100,2) PCT_CPU
from      V$ACTIVE_SESSION_HISTORY
where     SESSION_TYPE = 'FOREGROUND'
group by module order by PCT_CPU desc;

MODULE                         module_cpu  total_cpu    PCT_CPU
------------------------------ ---------- ---------- ----------
vvadcagpas                        29355     187065      15.69
cbkcbqchts                        26506     187065      14.17
zugkvhbajxpff                     24655     187065      13.18
tvfqhrkqzktiiceolrfd              10046     187065       5.37
paczxkndgsaqt                      6740     187065        3.6
pzqacdgxeqzbtqhalfc                6220     187065       3.33
nvqqscmayx                         5905     187065       3.16
zbirttdyoxxbslopnm                 5412     187065       2.89
bbztiihehigeqwxvk                  5051     187065        2.7
kvosekxfhkmfw                      4687     187065       2.51
aenyopwmok                         3977     187065       2.13
brfqfuwdyhyrej                     3256     187065       1.74
abruxykxhvqgfg                     3227     187065       1.73
zfrahazoqgdxubpni                  2759     187065       1.47
ddxinnevqd                         2633     187065       1.41
lructcsjkxhxatfidpf                2518     187065       1.35
naawhsfqvcaoathn                   2246     187065        1.2
sljzltjhhigepqfmcji                2142     187065       1.15
chvjvntynkjb                       2102     187065       1.12
zlzhwwhzbhfdtsszfedj               2005     187065       1.07
yvkuwzjkjhyeamco                   1610     187065        .86
fdjwspubzyhyxyzzkgyi               1578     187065        .84
risowlcnkizrbpgt                   1557     187065        .83
ywwdycosdpcfsflaubia               1539     187065        .82
pxlqrkmvreppum                     1496     187065         .8
amivlnlivqeesnqdkto                1404     187065        .75
jjaqyosqsc                         1347     187065        .72
yxkgfgktafvx                       1354     187065        .72
gqistslkwdw                        1328     187065        .71
bhuzvdpklocmnyimtzux               1272     187065        .68
uimycabvbjcczibrn                  1053     187065        .56
ksnvqslyker                         997     187065        .53
ofekwpklwpjtha                      975     187065        .52
wmkmovueneno                        904     187065        .48
umdtggtvdptkubfs                    795     187065        .42
dddegpqpwyalotxtoh                  761     187065        .41
pwycwyhavccwjuufqjx                 733     187065        .39
pxvcbhpfmeuwen                      729     187065        .39
nhezqmgkqccdcog                     716     187065        .38
ixtljofuazhpp                       628     187065        .34
mbotmyoairyqbdg                     600     187065        .32
aocabnswozdg                        586     187065        .31
qtdbdbuffprxj                       586     187065        .31
suiwwwtolcnkbnkcfwxw                560     187065         .3
btbdbtaztym                         562     187065         .3
xmvdgeczrot                         567     187065         .3
gkwzuwlgsh                          538     187065        .29
yhgmyqgqkvzmvbpsi                   463     187065        .25
obahkcsiafagmkxxu                   457     187065        .24
tcjoiyhzpuutbfhegn                  448     187065        .24
brlrkszrpmtw                        426     187065        .23
spzvvwzvoh                          435     187065        .23
ckvvarzrtpedhjl                     334     187065        .18
mnamshpswmsif                       329     187065        .18
gwvwypqrpwxkbugms                   293     187065        .16
clkuqearqhguezagwjlc                302     187065        .16
fskwtuctalnxhhu                     306     187065        .16
dyljhijlwyfmjy                      277     187065        .15
eqdifihzda                          255     187065        .14
nkrvwqrldd                          229     187065        .12
jkevokwkwokbnkbx                    231     187065        .12
niatevdadex                         198     187065        .11
qfjiifhiubvstyp                     200     187065        .11
acquoqslwxedulsmekbu                180     187065         .1
hjfimltalvopvuum                    156     187065        .08
bmyyhtzdqebo                        152     187065        .08
tkehpofgfe                          157     187065        .08
qnhgoqgstvq                         131     187065        .07
ddlloytpmyhpzpmk                    125     187065        .07
bvborwlump                          133     187065        .07
juirsmdjuejot                       124     187065        .07
saeoojlzfepjfybpclls                129     187065        .07
ppfdzcozyhxmf                       135     187065        .07
xbrwqdopemjqrhebglzs                115     187065        .06
zrfdsejwxgitsorat                   119     187065        .06
vsmsefmxnkoeb                        97     187065        .05
dvhoxhzayqyhsny                      98     187065        .05
cwjpiesdhpxvjs                      101     187065        .05
bvffncgtiojaj                        98     187065        .05
flfuciblxfzdcbo                      87     187065        .05
asucxewzdruwyfmg                     66     187065        .04
dztczkivnnbuzyuh                     67     187065        .04
hslcljfjmrhduw                       73     187065        .04
lywascpugxxjidx                      74     187065        .04
cawgdisaagltmie                      73     187065        .04
dpqbuvfkef                           68     187065        .04
tjpaushwfftmm                        54     187065        .03
ulgktcqvsazktojskfo                  56     187065        .03
ckoayyqtcxybkukgqvgl                 49     187065        .03
aqmphwbrngbotpxehyhe                 49     187065        .03
asqllnbivusyu                        60     187065        .03
skvrrqoshnirxbud                     59     187065        .03
vobtqhfewje                          58     187065        .03
xurrifyontk                          34     187065        .02
rliqgkwdwao                          42     187065        .02
ziwftblonh                           36     187065        .02
wzybbgmxkx                           30     187065        .02
vhumxxhimauc                         16     187065        .01
vssaefphdkebdx                       26     187065        .01
vjbjocafrwgdv                        11     187065        .01
xgcwcxrlzrjvysanlijw                 14     187065        .01
ztzxyfykhnxozdn                      20     187065        .01
btlbvffybkyjsn                       12     187065        .01
pmmqhsjuqhdaf                        22     187065        .01
kaumdvjqfcprmaeygr                   11     187065        .01
kadliemxdxhtzleqjr                   28     187065        .01
hlebszxkejjgkfbpmjew                 16     187065        .01
atwvjasykmlmldbipswh                 16     187065        .01
                                                     ----------
sum                                                      99.95

Similarly to check DB Time for sessions in past, we need to use DBA_HIST_ACTIVE_SESS_HISTORY table. Following query will provide % DB time used by individual modules calculated as % of total DB time between 2 AWR snapshots

set line 999
set pagesize 999
set verify off
BREAK ON REPORT
COLUMN DUMMY HEADING ''
compute sum of PCT_CPU on REPORT
with total_cpu as
(select   count(1)*10
from      dba_hist_active_sess_history
where     snap_id >= &begin_snap_id
and       snap_id < &end_snap_id
and       SESSION_TYPE = 'FOREGROUND')
select    module,
          count(1)*10 "module_cpu",
          (select * from total_cpu) "total_cpu",
          round((count(1)*10/(select * from total_cpu))*100,2) PCT_CPU
from      dba_hist_active_sess_history
where     snap_id >= &&begin_snap_id
and       snap_id < &&end_snap_id
and       SESSION_TYPE = 'FOREGROUND'
group by module order by PCT_CPU desc

 

 

Note that we have used count(*) * 10 in above query because 1 of 10 samples go to this table. Also, you need to specify begin_snap_id and end_snap_id and module DB Time usage during those snapshots will be calculated.

CPU Time

If we want to ignore waits and only focus on pure CPU used by modules, we can list modules as per CPU time they consume as % of total CPU time of database.

I think DB time is still better metric than CPU time because DB time covers waits as well. Waits slow down database.

We can calculate CPU time of a session by doing the same count(*) from V$ACTIVE_SESSION_HISTORY but we need to count only those samples during which session was actually using CPU. An ACTIVE session could be either using CPU to do some processing or it could be actively waiting on something (example IO when doing db file sequential read). Active session using CPU is represented by session_state = ‘ON CPU’. So we will count only those samples for session where session_state was ON CPU

Following query will give CPU time used by each module as % of total CPU time using V$ACTIVE_SESSION_HISTORY


set line 999
set pagesize 999
set verify off
BREAK ON REPORT
COLUMN DUMMY HEADING ''
compute sum of PCT_CPU on REPORT
with total_cpu as
(select count(1)
from  V$ACTIVE_SESSION_HISTORY
where   SESSION_TYPE != 'BACKGROUND'
and session_state = 'ON CPU')
select  module,
count(1) "module_cpu",
(select * from total_cpu) "total_cpu",
round((count(1)/(select * from total_cpu))*100,2) PCT_CPU
from  V$ACTIVE_SESSION_HISTORY
where SESSION_TYPE != 'BACKGROUND'
and   session_state = 'ON CPU'
group by module order by PCT_CPU desc

We can find similar CPU time for past sessions using DBA_HIST_ACTIVE_SESS_HISTORY table. Following query will provide CPU time used by module as % of total CPU time between 2 AWR snapshots


set line 999
set pagesize 999
set verify off
BREAK ON REPORT
COLUMN DUMMY HEADING ''
compute sum of PCT_CPU on REPORT
with total_cpu as
(select count(1)*10
from  dba_hist_active_sess_history
where   snap_id >= &&begin_snap_id
and   snap_id < &&end_snap_id
and   SESSION_TYPE != 'BACKGROUND'
and session_state = 'ON CPU')
select  module,
count(1)*10 "module_cpu",
(select * from total_cpu) "total_cpu",
round((count(1)*10/(select * from total_cpu))*100,2) PCT_CPU
from  dba_hist_active_sess_history
where   snap_id >= &&begin_snap_id
and   snap_id < &&end_snap_id
and   SESSION_TYPE != 'BACKGROUND'
and   session_state = 'ON CPU'
group by module order by PCT_CPU desc

Note, that in above query from DBA_HIST_ACTIVE_SESS_HISTORY, we have used count(*) * 10 because only 1 of 10 samples goes into this table.

I think for now above information is sufficient to give a basic idea about ASH.

I will be writing more articles on performance tuning using ASH as I dig further.

Hope this helps !!

Redo Behavior

What it used to be

After Oracle 9.2, there has been a significant changes in redo behavior. Before 10g, Oracle used to have a single log buffer for writing redo information which eventually gets written to online redo logs in round robin fashion. Of course if your database is in archive log mode information from online redo logs will flow to archive logs before that online redo log file gets overwritten.

Why changed

Oracle 10g onwards behaviour has changed because of contention. You see whenever a session makes a change, oracle has to write the change vector to redo buffer and for that it has to acquire a latch “redo allocation”. If you have just single log buffer, Oracle will have a single latch to protect that memory area and any session making changes have to acquire this latch to write redo change vector to redo buffer.
Think about an OLTP system with many session doing many transactions at the same time. This can cause serious contention for redo allocation latch and was not a scalable solution.

What changed

So from 10g onwards, Oracle has split log buffer into multiple buffer – these are called public redo strands. In 10g it was controlled by parameter log_parallelism and in 11g its controlled by hidden parameter _log_parallelism_max. Default value for _log_parallelism_max is decided by number of CPU. Minimum value for this parameter is 2 for CPU count 16 or below. For every 16 CPUs _log_parallelism_max increases by 1. So for 64 CPU machine _log_parallelism_max will default to 4.

You can check number of public strands in your database using following SQL

SQL> select
 PTR_KCRF_PVT_STRAND ,
 FIRST_BUF_KCRFA ,
 LAST_BUF_KCRFA ,
 TOTAL_BUFS_KCRFA ,
 STRAND_SIZE_KCRFA ,
 indx
from
 x$kcrfstrand ;
PTR_KCRF_PVT_STR FIRST_BUF_KCRFA LAST_BUF_KCRFA TOTAL_BUFS_KCRFA STRAND_SIZE_KCRFA INDX
---------------- ---------------- ---------------- ---------------- ----------------- ----------
00 00000000726BD000 0000000075166E00 87376 44736512 0
00 0000000075167000 0000000077C10E00 87376 44736512 1
00 0000000077C11000 000000007A6BAE00 87376 44736512 2
00 000000007A6BB000 000000007D164E00 87376 44736512 3
00 000000007D165000 000000007FC0EE00 87376 44736512 4
00 000000007FC0F000 00000000826B8E00 87376 44736512 5

As you can see from the output, I have 6 strands of around 43MB each. So my log_buffer must be 43 * 6 ~= 256 MB

SQL>select value from v$parameter where name = 'log_buffer';
VALUE
--------------------
268419072

You can also see that for 6 public redo strands, we have 6 latches to protect those memory structures.

SQL>select ADDR, LATCH#, CHILD#, NAME from v$latch_children where name like 'redo allocation%' order by child#;
ADDR LATCH# CHILD# NAME
---------------- ---------- ---------- ------------------------------
0000000FE1971430 187 1 redo allocation
0000000FE19714D0 187 2 redo allocation
0000000FE1971570 187 3 redo allocation
0000000FE1971610 187 4 redo allocation
0000000FE19716B0 187 5 redo allocation
0000000FE1971750 187 6 redo allocation
6 rows selected.

Please note that I am not talking about private redo strands in this post. Private redo strands and in memory undo can be configured separately to further reduce the contention and improve performance. I will talk about private redo strands in my next post.

The Myth

One of the myth is that log switch happens whenever online redo log file is 100% full
Well, its not like that. There is an internal algorithm that determines the log switch moment. This also has a very good reason because Oracle research finds that doing the log switch at the last moment could incur performance problems. As a result, after the log switch occurs, the archivers are copying only the actual information from the redo log files. Since the redo logs are not 100% full after the log switch and the archive logs are not blank padded after the copy operation has finished, this results in uneven, smaller files than the original redo log files.

How log switch happens

So there is an internal algorithm which decides when log switch should happen. I dont know if this internal algorithm is publicized anywhere, but based on my understanding and putting togetther many random articles, I think here is what happens.

When the RDBMS switches into a new online redo logfile, all the log buffer redo strand memory is “mapped” to the logfile space. If the logfile is larger than the log buffer then each strand will map/reserve its strand size worth of logfile space, and the remaining logfile space (the “log residue”) is still available.

Following figure shows the situation

1

 

 

 

 

 

 

 

 

 

 

 

 

 

 

If online redo log file size is smaller than log buffer then the whole logfile space is divided/mapped/reserved equally among all the strands, and there is no unreserved space (ie no log residue).

Following figure shows the situation

2

 

 

 

 

 

 

 

 

 

 

 

 

 

 

When any process fills a strand such that all the reserved underlying logfile space for that strand is used, AND there is no log residue, then a log switch is scheduled.

Consider first case where we have residue space available in online redo logs. Now when a strand is full it will look out for another “mapping” space in redo log file equivalent to strand size. If its available it will use it.

3

 

 

 

 

 

 

 

 

 

 

 

 

 

 

This will continue until last set of equivalue strand space available in online redo log is allocated as shown in figure below

4

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Now at this point if one of the strand is full and it doesnt have space available, it will trigger log switch. So if other strands are not full and we have space available in other strands, that space gets wasted. So if you have 4 strand and at the end one strand is full, you will have space wastage equivalent to size of 3 strands (which are not full).

But since online redo log file is big enough wastage of space equivalue to 3 strands doesnt really look big.

Online redo log size – 1GB
Space wastage – 3 strands = 3 * 64MB = 192MB

So archive log size would be around (1GB – 192MB) = 810MB
Around 20% wastage

Consider second case where we dont have any residue space available. With initial allocations, strands are mapped to online redo logs and when one of the strand is full, it will trigger a log switch as there is no space available in online redo log file. So same space got wasted (3 strands), but in this case since online redo log file is small we see considerable wastage of space.

Online redo log size – 256MB
Space wastage – 3 strands = 3 * 64MB = 192MB

So archive log size would be around (256 MB – 192 MB) = 64 MB
Around 75% wastage.

Disadvantages of small redo logs

So you can see if you online redo logs are very small (size near to log buffer), you will end up wasting lot of space in online redo logs causing frequent log switches.

 Suggestions

Some suggestion to reduce log switches and to make use of all space in online redo logs

  1. Make online redo log files atleast 4-6 times log buffer size. This ensures we have enough number of allocations of strands before log switch happens
  2. Check optimum value of _log_parallelism_max. While Oracle decides default based on CPU, I would say its better to have lower value. Lower value will make less number of strands and wastage will reduce.
  3. Check for optimum value of log_buffer. default value is usually 1MB * number of CPU. In my opinion its not good to go beyond 128MB. But it may vary with system and in your system you may find higher value to be better.

References

Oracle Forum – https://forums.oracle.com/thread/2346745

https://nzdba.wordpress.com/2011/12/18/hollow-redo-logs/

Doc ID 1356604.1

Effect of Net Timeout Parameter in DG configuration

One of the parameter we configure in physical standby setup is about how much amount of time LGWR on primary should wait for physical standby to respond.

When changes happens on primary side, those redo changes are shipped on physical standby database. If physical standby database is down or if standby server is not reachable, we need to have some time limit on how much time primary should wait for standby to respond (and then move ahead without try to ship redo changes to standby). This limit is defined by Net Timeout parameter.

You can check definition in Oracle docs for the same – http://docs.oracle.com/cd/E11882_01/server.112/e17023/dbpropref.htm#i101032

"The NetTimeout configurable database property specifies the number of seconds the LGWR waits for Oracle Net Services to respond to a LGWR request. It is used to bypass the long connection timeout in TCP."

One of the issue I was seeing is my DG broker was giving following error

Dataguard Configuration...
  Protection Mode: MaxAvailability
  Databases:
    orcl_b - Primary database
      Error: ORA-16825: multiple errors or warnings, including fast-start failover-related errors or warnings, detected for the database
    orcl_a - (*) Physical standby database
      Warning: ORA-16817: unsynchronized fast-start failover configuration
  (*) Fast-Start Failover target

When I checked database info in verbose mode, I saw following

DGMGRL> show database verbose orcl_a

Database - orcl_a

  Role:            PHYSICAL STANDBY
  Intended State:  APPLY-ON
  Transport Lag:   1 minute 1 second
  Apply Lag:       3 minutes 7 seconds
  Real Time Query: OFF

This means that even when my DB is in MaxAvailbility mode, I still see lag and standby is not getting in synch with primary.

My broker log file (drc<ORACLE_SID>.log in diagnostic_dest location) was showing following error

08/03/2013 07:51:44
Redo transport problem detected: redo transport for database orcl_a has the following error:
  ORA-16198: Timeout incurred on internal channel during remote archival
Data Guard Broker Status Summary:
  Type                        Name                             Severity  Status
  Configuration               FSF                               Warning  ORA-16607
  Primary Database            orcl_b                              Error  ORA-16825
  Physical Standby Database   orcl_a                            Warning  ORA-16817

Oracle error ORA-16198 represent timeout issue that must be happening while contacting standby site.

When I sanity checked standby, everything was fine. So I checked NET Timeout parameter which define the timeout value when primary should be able to contact standby.

I realized that timeout value is very less on my system.

When you do show database verbose <unique name>, it shows you properties

NetTimeout                      = '4'

In my case it was set to 4, which is very low value.
As soon as I set this value to around 10, everything was back to normal.

There is no standard value for this parameter, but usual value should be between 10-30 depending on the network config you have. Basically primary should be able to contact standby within this timelimit and hear back from standby.

Downside for keeping this value higher is, in case if something goes wrong with your standby, your primary will hang for that much time.

So, in my case if I am setting a value of 10 sec for Net Timeout parameter and something goes wrong with standby, my primary database will keep trying to send redo entry to standby for 10 sec and till that time commit wont happen (if I am in MaxAvailability mode).

So we need to balance out the value of this parameter and make sure we set optimum value.

Hope this helps !!

Reference:

http://docs.oracle.com/cd/E11882_01/server.112/e17023/dbpropref.htm#i101032

Instance/service registration with Database listener

I have seen many times DBAs are getting confused with Static registration and dynamic registration of services/instance with listener.
As far back I remember, dynamic registration of services was introduced in Oracle 9i.

In this article, I am going to cover everything about static and dynamic service/instance registration with listener and what does it mean.

Many times you must have seen following error while making remote connection to database
ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
Descriptor
Understanding of service/instance registration is essential to fix this issue.

Static Registration of instances/service – How it works

Static Instance Registration:

We all know that listener is a separate process that runs on database server and is responsible for receiving client connection and passing connection to database. After connection is handed over to database, listener is out of the picture.

Question is how does listener know what is the instance name/service name and where he should send the client connection ?
One way to do it is using static registration. This is done by including SID_LIST section in listener.ora configuration file.

Example, my listener.ora file looks like following

 

l_deo_001 =
    (description =
        (address_list =
            (address=(protocol=tcp)(host=deo-server.example.com)(port=1529))))
sid_list_l_deo_001 =
    (sid_list =
        (sid_desc =
            (sid_name = deo)
            (oracle_home = /opt/app/oracle/product/db_1))
    )

 

If you see I have a section called “SID_LIST_<listener_name>” which tells listener that SID name is “deo”.

If we check listener status we will see following

lsnrctl status l_deo_001

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 02-APR-2013 09:17:58

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(address=(protocol=tcp)(host=deo-server.example.com)(port=1529)))
STATUS of the LISTENER
------------------------
Alias                     l_deo_001
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                02-APR-2013 09:02:55
Uptime                    0 days 0 hr. 15 min. 2 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/app/oracle/product/db_1/network/admin/listener.ora
Listener Log File         /opt/app/oracle/diag/tnslsnr/db-deo-r1/l_deo_001/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1529)))
Services Summary...
Service "deo" has 1 instance(s).
  Instance "deo", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

“Instance “deo”, status UNKNOWN” meaning that instance DEO is registered with listener l_deo_001.

Why status is unknown is because listener does not know whether the instance is really up and available or not. There is no mechanism to guarantee that the specified instance even exists. The listener just assumes that the instance will be there when a connect request is received.

My tnsping is as below

(DESCRIPTION = (ENABLE = BROKEN) (LOAD_BALANCE = YES) (ADDRESS = (PROTOCOL = TCP)(HOST = deo-server.example.com)(PORT = 1529)) (CONNECT_DATA = (SID = deo)))

So I have “SID = deo” in my connect string. And since this SID/instance is registered with my listener, my connections goes fine with database (Offcource using correct host and port is required).

Static Service Registration:

Same is the case with service. Example if I have following TNS alias which is using SERVICE_NAME in CONNECT_DATA

(DESCRIPTION = (ENABLE = BROKEN) (LOAD_BALANCE = YES) (ADDRESS = (PROTOCOL = TCP)(HOST = deo-server.example.com)(PORT = 1529)) (CONNECT_DATA = (SERVICE_NAME = adg_deo)))

Now if you want to connect to database using above TNS alias which has SERVICE_NAME, you need to have this service registered in listener port

Way to do that using static registration is to use GLOBAL_DBNAME = <service_name> in listener.ora parameter as shown below and bounce listener for this to take affect

l_deo_001 =
    (description =
        (address_list =
            (address=(protocol=tcp)(host=deo-server.example.com)(port=1529))))

sid_list_l_deo_001 =
    (sid_list =
        (sid_desc =
            (sid_name = deo)
            (oracle_home = /opt/app/oracle/product/db_1))
    (sid_desc =
            (GLOBAL_DBNAME = adg_deo.example)
            (sid_name = deo)
            (oracle_home = /opt/app/oracle/product/db_1))
    )

 

After that listener will show adg_deo service

lsnrctl start l_deo_001

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 02-APR-2013 10:24:10

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Starting /opt/app/oracle/product/db_1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 11.2.0.3.0 - Production
System parameter file is /opt/app/oracle/product/db_1/network/admin/listener.ora
Log messages written to /opt/app/oracle/diag/tnslsnr/db-deo-r1/l_deo_001/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1529)))

Connecting to (DESCRIPTION=(address=(protocol=tcp)(host=deo-server.example.com)(port=1529)))
STATUS of the LISTENER
------------------------
Alias                     l_deo_001
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                02-APR-2013 10:24:10
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/app/oracle/product/db_1/network/admin/listener.ora
Listener Log File         /opt/app/oracle/diag/tnslsnr/db-deo-r1/l_deo_001/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1529)))
Services Summary...
Service "adg_deo.example" has 1 instance(s).
  Instance "deo", status UNKNOWN, has 1 handler(s) for this service...
Service "deo" has 1 instance(s).
  Instance "deo", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

 

After that you will be able to connect using SERVICE_NAME = adg_deo.example

Remember that in static registration you need to have instance name in SID_LIST section in listener.ora as specified by (sid_name = deo). If you have some service_name configured in TNS alias, you need to make sure that in case of static registration those service_names should be part of listener.ora file as specified by (GLOBAL_DBNAME = adg_deo.example).

Now, if I remove SID_LIST section from listener.ora, listener still works. Here is the change

My listener.ora now looks as below

l_deo_001 =
    (description =
        (address_list =
            (address=(protocol=tcp)(host=deo-server.example.com)(port=1529))))

 

If I check listener status
lsnrctl start l_deo_001

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 02-APR-2013 09:01:25

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Starting /opt/app/oracle/product/db_1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 11.2.0.3.0 - Production
System parameter file is /opt/app/oracle/product/db_1/network/admin/listener.ora
Log messages written to /opt/app/oracle/diag/tnslsnr/db-deo-r1/l_deo_001/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1529)))

Connecting to (DESCRIPTION=(address=(protocol=tcp)(host=deo-server.example.com)(port=1529)))
STATUS of the LISTENER
------------------------
Alias                     l_deo_001
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                02-APR-2013 09:01:25
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/app/oracle/product/db_1/network/admin/listener.ora
Listener Log File         /opt/app/oracle/diag/tnslsnr/db-deo-r1/l_deo_001/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1529)))
The listener supports no services
The command completed successfully

It says “The listener supports no services”. If you try connecting now, you will hit following error

 

ERROR:
ORA-12514: TNS:listener does not currently know of service requested in connect
Descriptor

Why? because your instance is not registered with listener. So l_deo_001 is just a standalone listener process running on the host with no instances registered with it. It doesnt transfer connections to any database. So connecting to “deo” database using this listener will not work.

Dynamic Registration of instances/service – How it works

Dynamic Instance Registration:

Do we need to have SID_LIST always in listener.ora file ? The answer is NO. This requirement is gone since Oracle 9i when dynamic registration was introduced. In dynamic registration database automatically register the instances/services to listener ports.

Then how does instance gets registered with listener ?

To answer that, I have to explain the concept of local_listener parameter.

If you are using default port (1521) for listener, then database will automatically register the instance with listener. To test this out, lets start another listener which will have default name and port (1521)

lsnrctl start

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 02-APR-2013 09:57:44

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Starting /opt/app/oracle/product/db_1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 11.2.0.3.0 - Production
System parameter file is /opt/app/oracle/product/db_1/network/admin/listener.ora
Log messages written to /opt/app/oracle/diag/tnslsnr/db-deo-r1/listener/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1521)))

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                02-APR-2013 09:57:44
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/app/oracle/product/db_1/network/admin/listener.ora
Listener Log File         /opt/app/oracle/diag/tnslsnr/db-deo-r1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1521)))
The listener supports no services
The command completed successfully

 

It still says that “The listener supports no services”. To have database register the instance, you need to bounce the database so that during the start of DB, it will register the instance name with default listener

Now if you check the listener status, you will see service is registered automatically with default listener.

lsnrctl status

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 02-APR-2013 10:03:13

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                02-APR-2013 09:57:44
Uptime                    0 days 0 hr. 5 min. 28 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/app/oracle/product/db_1/network/admin/listener.ora
Listener Log File         /opt/app/oracle/diag/tnslsnr/db-deo-r1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1521)))
Services Summary...
Service "deo_a.example" has 1 instance(s).
  Instance "deo", status READY, has 1 handler(s) for this service...
Service "deo_a_DGB.example" has 1 instance(s).
  Instance "deo", status READY, has 1 handler(s) for this service...
The command completed successfully

 

But what if you are not using default listener port (1521). Instead, you are using some other port – Say 1529. Oracle doesnt do dyamic registration on non-default listener port automatically.

So how do we tell Oracle what ports our listener is running on ? Answer to this question is local_listener parameter.

Lets stop default listener and start our original listener l_deo_001 on 1529 port. Also, I am not having SID_LIST section in my listener.ora file as I am going for dynamic registration

l_deo_001 =
    (description =
        (address_list =
            (address=(protocol=tcp)(host=deo-server.example.com)(port=1529))))
lsnrctl stop

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 02-APR-2013 10:08:42

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
The command completed successfully

lsnrctl start l_deo_001

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 02-APR-2013 10:10:49

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Starting /opt/app/oracle/product/db_1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 11.2.0.3.0 - Production
System parameter file is /opt/app/oracle/product/db_1/network/admin/listener.ora
Log messages written to /opt/app/oracle/diag/tnslsnr/db-deo-r1/l_deo_001/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1529)))

Connecting to (DESCRIPTION=(address=(protocol=tcp)(host=deo-server.example.com)(port=1529)))
STATUS of the LISTENER
------------------------
Alias                     l_deo_001
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                02-APR-2013 10:10:49
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/app/oracle/product/db_1/network/admin/listener.ora
Listener Log File         /opt/app/oracle/diag/tnslsnr/db-deo-r1/l_deo_001/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1529)))
The listener supports no services
The command completed successfully

In local_listener parameter we can specify non-default ports on which listener is running and Oracle will register services on those ports.

 

Example in my case, I am setting local_listener to following value

alter system set local_listener = '(address_list=(address=(protocol=tcp)(host=deo-server.example.com)(port=1529)))';

As soon as I set the value of local_listener, PMON will know the port number on which listener is running and it will dynamically register the instance with listener.

lsnrctl status l_deo_001

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 02-APR-2013 10:12:35

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(address=(protocol=tcp)(host=deo-server.example.com)(port=1529)))
STATUS of the LISTENER
------------------------
Alias                     l_deo_001
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                02-APR-2013 10:10:49
Uptime                    0 days 0 hr. 1 min. 45 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/app/oracle/product/db_1/network/admin/listener.ora
Listener Log File         /opt/app/oracle/diag/tnslsnr/db-deo-r1/l_deo_001/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1529)))
Services Summary...
Service "deo_a.example" has 1 instance(s).
  Instance "deo", status READY, has 1 handler(s) for this service...
Service "deo_a_DGB.example" has 1 instance(s).
  Instance "deo", status READY, has 1 handler(s) for this service...
The command completed successfully

 

So by setting LOCAL_LISTENER to the values appropriate for the listener, Oracle PMON process is able to contact the listener and register its instance.

 

Dynamic Service Registration:

We have seen dynamic instance registration which needs LOCAL_LISTENER database parameter to be set if we are not running the listener on default port (1521). What if we are using SERVICE_NAME in our TNS connect string as shown below

(DESCRIPTION = (ENABLE = BROKEN) (LOAD_BALANCE = YES) (ADDRESS = (PROTOCOL = TCP)(HOST = deo-server.example.com)(PORT = 1529)) (CONNECT_DATA = (SERVICE_NAME = adg_deo)))

If we want to register the service name dynamically in listener, we need to set service_names parameter in database

Currently we dont have ADG_DEO service in our listener

lsnrctl start l_deo_001

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 03-APR-2013 06:56:24

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Starting /opt/app/oracle/product/db_1/bin/tnslsnr: please wait...

TNSLSNR for Linux: Version 11.2.0.3.0 - Production
System parameter file is /opt/app/oracle/product/db_1/network/admin/listener.ora
Log messages written to /opt/app/oracle/diag/tnslsnr/db-deo-r1/l_deo_001/alert/log.xml
Listening on: (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1529)))

Connecting to (DESCRIPTION=(address=(protocol=tcp)(host=deo-server.example.com)(port=1529)))
STATUS of the LISTENER
------------------------
Alias                     l_deo_001
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                03-APR-2013 06:56:24
Uptime                    0 days 0 hr. 0 min. 0 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/app/oracle/product/db_1/network/admin/listener.ora
Listener Log File         /opt/app/oracle/diag/tnslsnr/db-deo-r1/l_deo_001/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1529)))
Services Summary...
Service "deo" has 1 instance(s).
  Instance "deo", status UNKNOWN, has 1 handler(s) for this service...
The command completed successfully

 

I have local_listener value set as below

 

SYS.DEO>show parameters local_listener

NAME_COL_PLUS_SHOW_PARAM                             TYPE
-------------------------------------------------------------------------------- -----------
VALUE_COL_PLUS_SHOW_PARAM
-------------------------------------------------------------------------------------
local_listener                                     string
(address_list=(address=(protocol=tcp)(host=deo-server.example.com)(port=1529)))

Now I will set service_names parameter as below

SYS.DEO>alter system set service_names = 'adg_deo.example';

System altered.

SYS.DEO>

 

If you check the status of listener now, it will have that service name shown in the list

 

lsnrctl status l_deo_001

LSNRCTL for Linux: Version 11.2.0.3.0 - Production on 03-APR-2013 06:58:04

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(address=(protocol=tcp)(host=deo-server.example.com)(port=1529)))
STATUS of the LISTENER
------------------------
Alias                     l_deo_001
Version                   TNSLSNR for Linux: Version 11.2.0.3.0 - Production
Start Date                03-APR-2013 06:56:24
Uptime                    0 days 0 hr. 1 min. 40 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /opt/app/oracle/product/db_1/network/admin/listener.ora
Listener Log File         /opt/app/oracle/diag/tnslsnr/db-deo-r1/l_deo_001/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=deo-server.example.com)(PORT=1529)))
Services Summary...
Service "adg_deo.example" has 1 instance(s).
  Instance "deo", status READY, has 1 handler(s) for this service...
Service "deo" has 1 instance(s).
  Instance "deo", status UNKNOWN, has 1 handler(s) for this service...
Service "deo_a.example" has 1 instance(s).
  Instance "deo", status READY, has 1 handler(s) for this service...
Service "deo_a_DGB.example" has 1 instance(s).
  Instance "deo", status READY, has 1 handler(s) for this service...
The command completed successfully

So basically all the services which are listed in service_names database parameter will be registered by PMON in listener ports which are mentioned in local_listener parameter.

I hope I am able to clearify how static and dynamic instnace/service registration works.

 

Oracle SQL Patch – I

In my previous posts we have seen fixing plans by applying baselines and profiles.
For profiles we saw in details how to generate the same using SQL hints.

This article is about another feature of Oracle 11.2, called SQL Patch.
I am not sure if this is supported by Oracle, but in days to come they will make this official.
This is a kind of silver bullet for doing minor changes in the plan which is difficult to get it done using baselines and profiles.

What is SQL Patch:

A SQL patch is a SQL manageability object that can be generated by the SQL Repair Advisor, in order to circumvent a plan which causes a failure.  In essence, a SQL patch tells the optimizer to change the plan in some way or avoid an optimization so that the failure does not occur.  For instance, if a failure happens during index access, then generating a plan without that index (or without any index) may work around the failure.  You can read more about this in the documentation here

But we are going to use SQL Patch to fix a query plan.

Lets take an example:

SQL>create table T as select * from dba_objects;

Table created.

SQL>create index T_IDX on T(object_ID);

Index created.

SQL>exec dbms_stats.gather_table_stats(USER,'T',cascade=>true);

PL/SQL procedure successfully completed.

SQL>explain plan for 
  2  select * from T where object_id = 10;

Explained.

SQL>select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1601196873

--------------------------------------------------------------------------
| Id  | Operation      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |     |     1 |    89 |    62   (2)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| T     |     1 |    89 |    62   (2)| 00:00:01 |
--------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("OBJECT_ID"=10)

13 rows selected.

SQL>

We see that its going for a FTS for accessing the table.

Now, lets try to use patch so that same query will start using index.

We can use sys.dbms_sqldiag_internal.i_create_patch procedure to create patch
This procedure needs

– sql text
– hints to be applied
– catagory in which to save the patch
– name of the SQL Patch

If SQL text is too big, I have given a procedure at the end of this artical which can be used. It ask for SQL ID, child number and hint to be applied. Its very easy to use.

So lets try to create a SQL patch using sys.dbms_sqldiag_internal.i_create_patch procedure

 

SQL>exec sys.dbms_sqldiag_internal.i_create_patch(sql_text  => 'select * from T where object_id = 10',hint_text => 'INDEX_RS_ASC(@"SEL$1" "T"@"SEL$1" ("T"."OBJECT_ID"))',category  => 'DEFAULT',name => 'PATCH_gz85dtvwaj4fw');

PL/SQL procedure successfully completed.

SQL>select count(1) from dba_sql_patches where name = 'PATCH_gz85dtvwaj4fw';

  COUNT(1)
----------
     1

1 row selected.

SQL>

 

Here are the meanings of values I provided

sql_text – This is the text of the SQL. If text is too long, conside using PLSQL procedure provided at the end of the article.

hint_text – This is the hint we want to provide. Now this will seem different that what we usually provide in the SQL. If you are not sure of the exact hint, here is what you can do.

I used normal hint in my SQL to generate a plan

explain plan for
select /*+ index(T_IDX T) */ * from T where object_id = 10;

based on this you can parse other_xml column in plan_table using following SQL

SELECT regexp_replace(extractvalue(value(d), '/hint'),'''','''''') plan_hint
        from
        xmltable('/*/outline_data/hint'
                passing (
                        select
                                xmltype(other_xml) as xmlval
                        from    plan_table
                        where   other_xml is not null
                        and     plan_id = (select max(plan_id) from plan_table)
                        and     rownum=1
                        )
                ) d;

PLAN_HINT
--------------------------------------------------------------------------------------------------------------------------------------------
INDEX_RS_ASC(@"SEL$1" "T"@"SEL$1" ("T"."OBJECT_ID"))
OUTLINE_LEAF(@"SEL$1")
OPT_PARAM(''optimizer_index_caching'' 80)
OPT_PARAM(''optimizer_index_cost_adj'' 1)
OPT_PARAM(''_optimizer_connect_by_cost_based'' ''false'')
OPT_PARAM(''_optim_peek_user_binds'' ''false'')
OPT_PARAM(''_b_tree_bitmap_plans'' ''false'')
DB_VERSION(''11.2.0.2'')
OPTIMIZER_FEATURES_ENABLE(''11.2.0.2'')
IGNORE_OPTIM_EMBEDDED_HINTS

10 rows selected.

 

Once you get the output, you can pick the index hint from above – INDEX_RS_ASC(@”SEL$1″ “T”@”SEL$1” (“T”.”OBJECT_ID”))

Catagory can be any catagory you want to have. Usually everything should go to default if not specified.

Name of the patch can also be anything you want to give.

Now, if we check the plan for original query, it will start using index.

 

SQL>select * from table(dbms_xplan.display);

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 470836197

-------------------------------------------------------------------------------------
| Id  | Operation            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |        |      1 |     89 |    200   (0)| 00:00:03 |
|   1 |  TABLE ACCESS BY INDEX ROWID| T     |      1 |     89 |    200   (0)| 00:00:03 |
|*  2 |   INDEX RANGE SCAN        | T_IDX |      1 |        |    100   (0)| 00:00:02 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("OBJECT_ID"=10)

Note
-----
   - SQL patch "PATCH_gz85dtvwaj4fw" used for this statement
   - SQL plan baseline "SQL_PLAN_2anpx5hbuf3cbae82cf72" used for this statement

19 rows selected.

SQL>

 

Also, in the Note section you can see patch “PATCH_gz85dtvwaj4fw” is getting used for this statement.

Hope this helps !!

Redefining tables online – Oracle 11g

Introduction:

One of the many challenges that we face in production environment is make changes to big tables.
If you consider a case of any OLTP systems, its easy to have tables whose size is beyond 10G.

This again depends on the nature of the database and kind of transactions happening on the table.

So lets consider a case where you have a big table which is also a very hot table having very high number of transactions / sec and something that continues 24X7.
Its difficult to take downtime on such tables and making DDL changes to such tables could be a nightmare.

Oracle has a great feature introudcued since Oracle 9i, but many DBAs doesnt seem to be aware of this feature – Online table redefiniation.

Online table redefinition allows you to make DDL changes to the table definition and requires very little downtime (less than a minute).
Techinically its not the same table that gets modified, but its another copy of the same table which has the required modification made.

You might question if we are making a new copy of the table we can as well use CTAS (Create Table as Select) and make the required changes.
But its not just about creating new object and copying the data. Online redefinition does lot more than that.

I will briefly explain you the features of online redefinition, followed by process and then we will straight way get to the examples which will help you to understand better.

Features – What it can do:

Online table redefinition enables you to:

  • Modify the storage parameters of a table or cluster
  • Move a table or cluster to a different tablespace
  • Add, modify, or drop one or more columns in a table or cluster
  • Add or drop partitioning support (non-clustered tables only)
  • Change partition structure
  • Change physical properties of a single table partition, including moving it to a different tablespace in the same schema
  • Change physical properties of a materialized view log or an Oracle Streams Advanced Queueing queue table
  • Re-create a table or cluster to reduce fragmentation

Process – How its done:

To briefly explain the process, it involves following steps

1) Check if the table can be refined by running DBMS_REDEFINITION.CAN_REDEF_TABLE procedure.

We have following restrictions on redefining the table

  • One cannot redefine Materialized Views (MViews) and tables with MViews or MView Logs defined on them.
  • One cannot redefine Temporary and Clustered Tables
  • One cannot redefine tables with BFILE, LONG or LONG RAW columns
  • One cannot redefine tables belonging to SYS or SYSTEM
  • One cannot redefine Object tables
  • Table redefinition cannot be done in NOLOGGING mode (watch out for heavy archiving)

2) Create a new table with all of the desired logical and physical attributes.

If you want to change non-partition table to partition, you can create a new partition table. Structure of the table should be exactly the way you want to convert to.

3) Start redefinition process using DBMS_REDEFINITION.START_REDEF_TABLE procedure.

Be careful before running this command. If you must know, this command will start populating new table from the data in old table.
So if your old table is very big, then you need to have same amount of space available in the tablespace where new table is created.
Also, this command might take very long time if the size is big, so make sure you don’t have any disconnection in between.

If needed you can enable parallel before starting redefinition using following commands

alter session force parallel dml parallel degree-of-parallelism;
alter session force parallel query parallel degree-of-parallelism;

4) Sync new table on regular basis till cut off time

You should use DBMS_REDEFINITION.SYNC_INTERIM_TABLE and keep new table in sync with changes that happens on current production table.
This will reduce the cut off time. Cut off time if when you are going to point everything to new table and services will start writing to new table.
The more you keep new table and current production table in sync, lesser will be cut off time and downtime.

5) Copy dependent objects (such as triggers, indexes, grants, and constraints)

Copy dependent objects (such as triggers, indexes, materialized view logs, grants, and constraints) and statistics from production table to new table.
You should make sure that all dependents are copied.
You can do this manually by creating each dependent object or you can do it automatically using DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS

6) Complete redefinition process by using DBMS_REDEFINITION.FINISH_REDEF_TABLE procedure

This will complete the redefinition process. This needs exclusive lock on production table which you want to redefine.
So you need to arrange for short downtime. However, FINISH_REDEF_TABLE will wait for all pending DML to commit before completing the redefinition.

Example:

Lets take an example:

We have a table T as shown below.
We have a primary key on OBJECT_ID column.
We have a public synonym for table T.

SQL>desc T
 Name                       Null?    Type
 ----------------------------------------- -------- ----------------------------
 OWNER                            VARCHAR2(30)
 OBJECT_NAME                        VARCHAR2(128)
 SUBOBJECT_NAME                     VARCHAR2(30)
 OBJECT_ID                        NUMBER
 DATA_OBJECT_ID                     NUMBER
 OBJECT_TYPE                        VARCHAR2(19)
 CREATED                        DATE
 LAST_DDL_TIME                        DATE
 TIMESTAMP                        VARCHAR2(19)
 STATUS                         VARCHAR2(7)
 TEMPORARY                        VARCHAR2(1)
 GENERATED                        VARCHAR2(1)
 SECONDARY                        VARCHAR2(1)
 NAMESPACE                        NUMBER
 EDITION_NAME                        VARCHAR2(30)

SQL>

Currently this is not a partitioned table

SQL>select table_name, partitioned from user_tables where table_name = 'T';

TABLE_NAME               PAR
------------------------------ ---
T                   NO

Lets try to convert this into partition table.

You can check the meaning of every parameter supplied to below procedures at – http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_redefi.htm
Step 1) Check if the table can be refined by running DBMS_REDEFINITION.CAN_REDEF_TABLE procedure.

SQL>set serveroutput on
SQL>EXEC DBMS_REDEFINITION.CAN_REDEF_TABLE(user,'T',DBMS_REDEFINITION.CONS_USE_PK);

PL/SQL procedure successfully completed.

SQL>

If the table is not a candidate for online redefinition, an error message is raised.

Step 2) Create a new intrim table with all of the desired logical and physical attributes.

For table T lets try to partition by CREATED which is a date column. I am planning to partition by year so we can get 10 partitions

SQL>select to_char(CREATED,'YYYY') from T group by to_char(CREATED,'YYYY');

TO_C
----
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012

10 rows selected.

CREATE TABLE "T_INTRIM"
   (    "OWNER" VARCHAR2(30),
    "OBJECT_NAME" VARCHAR2(128),
    "SUBOBJECT_NAME" VARCHAR2(30),
    "OBJECT_ID" NUMBER,
    "DATA_OBJECT_ID" NUMBER,
    "OBJECT_TYPE" VARCHAR2(19),
    "CREATED" DATE,
    "LAST_DDL_TIME" DATE,
    "TIMESTAMP" VARCHAR2(19),
    "STATUS" VARCHAR2(7),
    "TEMPORARY" VARCHAR2(1),
    "GENERATED" VARCHAR2(1),
    "SECONDARY" VARCHAR2(1),
    "NAMESPACE" NUMBER,
    "EDITION_NAME" VARCHAR2(30),
     CONSTRAINT "T_PK_INTRIM_01" PRIMARY KEY ("OBJECT_ID")
    )
PARTITION BY RANGE(CREATED)
(
PARTITION T_2003 VALUES LESS THAN (TO_DATE('01-JAN-2004','DD-MON-YYYY')),
PARTITION T_2004 VALUES LESS THAN (TO_DATE('01-JAN-2005','DD-MON-YYYY')),
PARTITION T_2005 VALUES LESS THAN (TO_DATE('01-JAN-2006','DD-MON-YYYY')),
PARTITION T_2006 VALUES LESS THAN (TO_DATE('01-JAN-2007','DD-MON-YYYY')),
PARTITION T_2007 VALUES LESS THAN (TO_DATE('01-JAN-2008','DD-MON-YYYY')),
PARTITION T_2008 VALUES LESS THAN (TO_DATE('01-JAN-2009','DD-MON-YYYY')),
PARTITION T_2009 VALUES LESS THAN (TO_DATE('01-JAN-2010','DD-MON-YYYY')),
PARTITION T_2010 VALUES LESS THAN (TO_DATE('01-JAN-2011','DD-MON-YYYY')),
PARTITION T_2011 VALUES LESS THAN (TO_DATE('01-JAN-2012','DD-MON-YYYY')),
PARTITION T_2012 VALUES LESS THAN (TO_DATE('01-JAN-2013','DD-MON-YYYY'))
);

3) Start redefinition process using DBMS_REDEFINITION.START_REDEF_TABLE procedure.

BEGIN
DBMS_REDEFINITION.START_REDEF_TABLE(
   uname        => user,
   orig_table   => 'T',
   int_table    => 'T_INTRIM',
   col_mapping  => NULL,
   options_flag => DBMS_REDEFINITION.CONS_USE_ROWID
   );
END;
/

After this both table should have near about same amount of record

SQL>select count(1) from T_INTRIM;

  COUNT(1)
----------
     61536

SQL>select count(1) from T;

  COUNT(1)
----------
     61536

SQL>

If you have continuous inserts going on your original table than you might have little more records in original table than intrim table.

4) Sync new table on regular basis till cut off time

BEGIN 
DBMS_REDEFINITION.SYNC_INTERIM_TABLE(
   uname      => user, 
   orig_table => 'T', 
   int_table  => 'T_INTRIM'
   );
END;
/

The way this sync works is, online redefinition will automatically create a MLOG table on original table.
In any of the above step we didn’t create any MLOG table on table T.

But if you check now, you will see MLOG table created automatically.

SQL>select log_table from user_snapshot_logs where master = 'T';

LOG_TABLE
------------------------------
MLOG$_T

This is required for syncing changed made to table T.

5) Copy dependent objects (such as triggers, indexes, grants, and constraints)

DECLARE
num_errors PLS_INTEGER;
BEGIN
DBMS_REDEFINITION.COPY_TABLE_DEPENDENTS(
    uname               => user,
    orig_table          => 'T',
    int_table           => 'T_INTRIM',
    copy_indexes        => DBMS_REDEFINITION.CONS_ORIG_PARAMS,
    copy_triggers       => TRUE,
    copy_constraints    => TRUE,
    copy_privileges     => TRUE,
    ignore_errors       => TRUE,
    num_errors          => num_errors);
END;
/

PL/SQL procedure successfully completed.

Before we finish online redefinition you can check if table is partition and data is distributed to all partitions

SQL>select table_name, partitioned from user_tables where table_name in ('T','T_INTRIM');

TABLE_NAME               PAR
------------------------------ ---
T                   NO
T_INTRIM               YES

SQL>select to_char(created,'YYYY'), count(1) from T_INTRIM group by to_char(created,'YYYY');

TO_C   COUNT(1)
---- ----------
2003       7902
2005       1820
2009       2742
2010       6765
2008       2612
2007       1016
2011      10474
2004        756
2012      23474
2006       3975

10 rows selected.

Once we finish redefinition table T will become partition table and T_INTRIM will become non-partition table.
For this it needs exclusive lock.

6) Complete redefinition process by using DBMS_REDEFINITION.FINISH_REDEF_TABLE procedure

BEGIN
DBMS_REDEFINITION.FINISH_REDEF_TABLE(user, 'T', 'T_INTRIM');
END;
/

PL/SQL procedure successfully completed.

SQL>select table_name, partitioned from user_tables where table_name in ('T','T_INTRIM');

TABLE_NAME               PAR
------------------------------ ---
T                   YES
T_INTRIM               NO

SQL>

so now table T is partitioned table.

References:

http://docs.oracle.com/cd/B28359_01/server.111/b28310/tables007.htm

http://docs.oracle.com/cd/B28359_01/appdev.111/b28419/d_redefi.htm

Hope this helps !!

Materialized view Concepts – Discussion Series 3

We have seen Materialized view Concepts – Discussion Series 1 and Materialized view Concepts – Discussion Series 2.
This is the third article about Materialized views.

In this article we are going to discuss “How fast refresh works ?”

 

How fast refresh works ?

As we know in case of fast refresh only the changes that happened on master site (or master table) will be applied to MView on target site.
So the changes that happens on master table will be stored in MLOG table created on top of master table.
This is more efficient way than doing complete refresh.

As we know MLOG can be based on primary key or ROWID. ROWID MLOG is uncommon and is not used because if master table gets moved then ROWID will get changed and changes saved in MLOG will be invalid.
So to identify the changes on master table usually primary key based MLOG is created on top of master table.

Only 1 MLOG can be created on master table even if we have multiple sites refreshing from 1 master table.

Before we check on how fast refresh works, lets understand some of the components of fast refresh

MLOG$ table and its important columns

SNAPTIME$$             – This is a date columns and holds the date of 1st Mview refresh time. Example if we have 3 Mviews registered on 1 master table, than this column will hold oldest refresh date among the 3 Mviews
DMLTYPE$$              – This column tells you the type of DML ( U – Update, D – Delete, I – Insert )
OLD_NEW$$             – This column allow the fast-refresh mechanism to distinguish between rows inserted at the mview site and rows with modified primary key values.
CHANGE_VECTOR$$     – Used for subquery and LOB MViews

Apart from above standard columns in MLOG$ table, there will also be primary key columns. These columns are same as primary key columns of master table

AFTER ROW trigger on the master table

From Oracle8 and onward this trigger is kernelized and is no longer visible in the data dictionary. The same trigger can support both ROWID and primary key MViews and will populate the MLOG$_<table_name> with the proper values. It also populates the SNAPTIME$$ column (indicating the latest refresh time so far a particular row) and the DMLTYPE$$ column. The snaptime$$ column is populated based on the value of the snaptime column in snap$ table at the MView site and is not updated until the log is first used by a MView refresh.

Registration of Mview on master site

Oracle automatically tries to register a materialized view at its master site or master materialized view site when you create the materialized view, and unregisters when you drop it. The same applies to materialized view groups. Registration of fast refreshable materialized view logs in the master database (SYS.SLOG$) is needed to perform fast refreshes. This information is also used to maintain the materialized view log of the master table.

Fast refresh operation

Fast refresh operation consists of 3 phases

  1. Setup Phase
  2. Refresh Phase
  3. Wrap-up Phase

1) Setup Phase:

Setup has to check if the Mview being refreshed is ROWID based Mview or Primary key based Mview.
After that it has to verify if fast-refresh can be performed for this MView. An MView can perform a fast refresh only if it can use the MView log. This can be determined by checking entry in SYS.SLOG$ table on master site.


DB1>select MASTER, SNAPID, SNAPTIME from SYS.SLOG$ where MASTER = 'T';

MASTER                   SNAPID SNAPTIME
------------------------------ ---------- ----------------
T                    15119 2012-10-14:03:01
T                    15120 2012-10-17:09:47

 

If an entry is present in this table for that SNAPID, then it can be fast refreshed.
Its possible that you have created a complete refreshable MView on some master table which didn’t had MLOG. Later point of time you created MLOG and may be wondering why fast refresh not happening.
You can check the entry in this table and verity. Also in such cases you need to drop and recreate Mview on prebuilt table and make it fast refreshable.

Once its confirmed that fast refresh is possible for MView in question, snaptime$$ column is updated in the MLOG$ table of the altered rows to its own refresh date and time for the first MView that refreshes. This value does not change until the rows are eventually purged from the log.

2) Refresh Operation:

After setup phase, a second check is made to see if fast refresh can be done. This time its the date comparison that is done to ensure their is no mismatch of data.
For fast refresh to work, timestamp of oldest/oldest_pk column in mlog$ must be older than the time of last refresh.

How fast refresh work

Lets check out in detail how the date changes happen.
We will keep an eye on 3 tables

1) SYS.SLOG$ on master
2) SYS.MLOG$ on master
3) MLOG$_<master_table> on master (This is out MLOG$ table)

I will explain you with live example.

Master table name – T (on DB1 database)
MLOG table name – MLOG$_T (no DB1 database)

We have 2 snapshot sites and each has fast refreshable MView created.

MView Name – T_REP (on DB2 database)
MView Name – T_REP (on DB3 database)

On Master site:


DB1>select MASTER, SNAPSHOT, SNAPID, SNAPTIME from SYS.SLOG$ where MASTER = 'T';

MASTER                   SNAPSHOT         SNAPID SNAPTIME
------------------------------ ---------------- ---------- ----------------
T                             15119 2012-10-24:01:16
T                             15120 2012-10-24:01:17

Above output represents 2 snapshot site registered on Master table/site

DB1>select MASTER, OLDEST_PK, YOUNGEST, LOG, MTIME, LAST_PURGE_DATE from SYS.MLOG$ where MASTER = 'T';

MASTER                   OLDEST_PK    YOUNGEST     LOG                MTIME         LAST_PURGE_DATE
------------------------------ ---------------- ---------------- ------------------------------ ---------------- ----------------
T                   2012-10-24:01:16 2012-10-24:01:17 MLOG$_T            2012-10-13:22:25 2012-10-24:01:17

 

In SYS.MLOG$ table you will have 1 record for each MLOG$ that you create. Since a master table can have only 1 MLOG you will have 1 record for each master table.

There are no records in MLOG$_T table

DB1>select count(1) from MLOG$_T;

  COUNT(1)
----------
     0

Lets make changes in master table T and see what happens in above 3 tables

DB1>update T set DATA_OBJECT_ID = 0 where OBJECT_ID = 2300;

1 row updated.

DB1>commit;

Commit complete.

DB1>

With above modification, no change seen in SYS.SLOG$ table and SYS.MLOG$ table
DB1>select MASTER, SNAPSHOT, SNAPID, SNAPTIME from SYS.SLOG$ where MASTER = 'T';

MASTER                   SNAPSHOT         SNAPID SNAPTIME
------------------------------ ---------------- ---------- ----------------
T                             15119 2012-10-24:01:16
T                             15120 2012-10-24:01:17

DB1>
DB1>select MASTER, OLDEST_PK, YOUNGEST, LOG, MTIME, LAST_PURGE_DATE from SYS.MLOG$ where MASTER = 'T';

MASTER                   OLDEST_PK    YOUNGEST     LOG                MTIME         LAST_PURGE_DATE
------------------------------ ---------------- ---------------- ------------------------------ ---------------- ----------------
T                   2012-10-24:01:16 2012-10-24:01:17 MLOG$_T            2012-10-13:22:25 2012-10-24:01:17

However we see new entry in MLOG$_T table corresponding to the row changed

DB1>select OBJECT_ID, SNAPTIME$$, DMLTYPE$$, OLD_NEW$$ from MLOG$_T;

 OBJECT_ID SNAPTIME$$        D O
---------- ---------------- - -
      2300 4000-01-01:00:00 U U

OBJECT_ID is the primary key column and value in MLOG$_T table represent the row we changed

Observe the value of SNAPTIME$$ – 4000-01-01. This is a date in future which will not be reached in lifetimes. This date tells us that none of the Mview sites has done a fast refresh of this change.

So lets rewind our statement about second check done during “Refresh Phase”

For fast refresh to work, timestamp of oldest/oldest_pk column in mlog$ must be older than the time of last refresh.

It means oldest_pk <= last refresh time on Mview sites

We can check last refresh time of Mviews by checking DBA_SHAPSHOTS table or DBA_SNAPSHOT_REFRESH_TIMES table on Mview site.  But same thing can be checked on master site using SYS.SLOG$ table

We know oldest_pk timestamp – 2012-10-24:01:16
Min(last_refresh time) – 2012-10-24:01:16

So since oldest_pk <= last refresh time on Mview sites, refresh can proceed

If one of the site does a fast refresh we can see that MLOG$_T.SNAPTIME$$ timestamp gets updated to refresh time

DB2>exec dbms_snapshot.refresh('T_REP','F');

PL/SQL procedure successfully completed.

DB2>

DB1>select OBJECT_ID, SNAPTIME$$, DMLTYPE$$, OLD_NEW$$ from MLOG$_T;

 OBJECT_ID SNAPTIME$$        D O
---------- ---------------- - -
      2300 2012-10-24:01:52 U U

DB1>

We are yet to refresh second site which is registered for this master table and so the record from MLOG$_T is not deleted.

Also, 1 note with respect to SYS.MLOG$ table


DB1>select MASTER, OLDEST_PK, YOUNGEST, LOG, MTIME, LAST_PURGE_DATE from SYS.MLOG$ where MASTER = 'T';

MASTER                   OLDEST_PK    YOUNGEST     LOG                MTIME         LAST_PURGE_DATE
------------------------------ ---------------- ---------------- ------------------------------ ---------------- ----------------
T                   2012-10-24:01:17 2012-10-24:01:52 MLOG$_T            2012-10-13:22:25 2012-10-24:01:52

 

YOUNGEST column represent the latest refresh time. If you have 5 sites, the one you refresh latest will have that timestamp updated in YOUNGEST column
OLDEST_PK column represent the oldest refresh time. If you have 5 sites, the one you refresh first will have that timestamp updated in OLDEST_PK column
OLDEST column is used in ROWID based MLOG

3) Wrap-up Phase

In this phase it checks if all Mviews are refreshed and if the changes in MLOG$_T table has gone to all site and if the entries in MLOG$_T table can be purged.

Again Oracle checks dates in above 3 tables to determine which records in MLOG$ table can be purged.

How MLOG$ purge works

Oracle automatically tracks which rows in a MView log have been used during the refreshes of MViews, and purges these rows from the log so that the log does not grow endlessly. Because multiple simple mviews can use the same MView log, rows already used to refresh one MView may still be needed to refresh another MView. Oracle does not delete rows from the log until all MViews have used them. In the wrap-up phase of the refresh process, the master MView log is purged. It deletes all unnecessary rows from the MView log. Rows in the MView log are unnecessary if their refresh timestamps MLOG$_<table_name>.SNAPTIME$$ are older or equal than the oldest entry in SLOG$.SNAPTIME for this log.

It means rows in MLOG$_<master_table> will be deleted whose MLOG$_<master_table>.SNAPTIME$$ <= min(SYS.SLOG$.SNAPTIME)

Lets take an example

We have a live example going where we have updated 1 record in master table and we have refreshed 1 site (out of 2 sites registered)

Here are the outputs of 2 required tables


DB1>select OBJECT_ID, SNAPTIME$$, DMLTYPE$$, OLD_NEW$$ from MLOG$_T;

OBJECT_ID SNAPTIME$$        D O
---------- ---------------- - -
2300 2012-10-24:01:52 U U

DB1>select MASTER, SNAPSHOT, SNAPID, SNAPTIME from SYS.SLOG$ where MASTER = 'T';

MASTER                   SNAPSHOT         SNAPID SNAPTIME
------------------------------ ---------------- ---------- ----------------
T                             15119 2012-10-24:01:52
T                             15120 2012-10-24:01:17

As you can see SNAPTIME$$ timestamp has the time when this record was first refresh by any site. If we have 3 sites and 1st site refreshes the records this timestamp will get updated.
But if 2nd site refreshes the record, this timestamp will NOT change. However refresh of second site will change the SNAPTIME column in SYS.SLOG$ table. SNAPTIME column in SYS.SLOG$ table always has the latest refresh time for corresponding site.

So we can see that MLOG$_T.SNAPTIME$$ – 2012-10-24:01:52 (this is the time when 1st site got refreshed)
min(SYS.SLOG$.SNAPTIME) – 2012-10-24:01:17

Since MLOG$_T.SNAPTIME$$ > min(SYS.SLOG$.SNAPTIME), row will not be deleted.

If I refresh 2nd site than SYS.SLOG$.SNAPTIME corresponding to that site will get updated with refresh time and in that case MLOG$_T.SNAPTIME$$ <= MIN(SYS.SLOG$.SNAPTIME). When this condition happens it will delete the record from MLOG$ table.

Logically this means that all the sites registered for this master table is been refreshed.

Lets try to refresh 2nd site. This should purge the record from MLOG. We will also see how the dates are updated in 3 tables on master site

After refreshing second site

No rows in MLOG$_T table

DB1>select OBJECT_ID, SNAPTIME$$, DMLTYPE$$, OLD_NEW$$ from MLOG$_T;

no rows selected

SNAPTIME for second site got updated in SYS.SLOG$ table

DB1>select MASTER, SNAPSHOT, SNAPID, SNAPTIME from SYS.SLOG$ where MASTER = 'T';

MASTER                   SNAPSHOT         SNAPID SNAPTIME
------------------------------ ---------------- ---------- ----------------
T                             15119 2012-10-24:01:52
T                             15120 2012-10-24:02:56

In SYS.MLOG$ table, Previous YOUNGEST became OLDEST_PK and new YOUNGEST is the latest refreshed MView timestamp.
Also, LAST_PURGE_DATE gets updated when a record gets purged in MLOG$_T table


DB1>select MASTER, OLDEST_PK, YOUNGEST, LOG, MTIME, LAST_PURGE_DATE from SYS.MLOG$ where MASTER = 'T';

MASTER                   OLDEST_PK    YOUNGEST     LOG                MTIME         LAST_PURGE_DATE
------------------------------ ---------------- ---------------- ------------------------------ ---------------- ----------------
T                   2012-10-24:01:52 2012-10-24:02:56 MLOG$_T            2012-10-13:22:25 2012-10-24:02:56

In next article (probably last one of this discussion series), we will understand problems associated with MView Log, Modifying primary key on master table and what is “I am refresh”.

Hope this helps !!