Fixing SQL plans on ADG using SQL Profiles

With an Active Data Guard (ADG) setup, many read-only applications run on ADG, and sometimes the SQLs used by these applications run into bad plans. ADG being a read-only database, it's not possible to create a baseline or a profile on ADG.

One way to get a good plan for SQLs running on ADG is to run them on the primary first, fix the plan by creating a profile or baseline, and then move the SQLs back to ADG. Practically, I have seen many times that Dev teams are not very flexible about changing the configuration to run the queries on the primary. Either their configuration setup is complex and takes a lot of time and effort to change, or, as is common in bigger enterprises, they have a central configuration file which, when changed, changes the configuration for the entire region. In that case the complete software stack would run on the primary for the entire region, not just one database.

Another way to make the queries run on the primary is to point the ADG CNAME to the primary. That way all the applications that are supposed to run on ADG will start running on the primary. The downside is the increased load on the primary, as all the ADG applications will connect to it. It's too much of a change to fix one SQL on ADG.

This article is about fixing plans on ADG using a profile, without making any other change. We don't have to run the query on the primary to fix the plan.

Fixing SQL plan on ADG

In one of my old articles, “Fixing SQL Plans: The hard way – Part 1”, I wrote about the DBMS_SQLTUNE.IMPORT_SQL_PROFILE procedure. This procedure takes as input the internal SQL hints that the optimizer understands and creates a profile to fix the plan.

In that method, I provide the SQL text as input, but I obtain it by selecting the SQL_TEXT column from either the V$SQLTEXT view or the DBA_HIST_SQLTEXT view.

The following is the relevant part of the PL/SQL block I used to create the profile:

-- ar_profile_hints (sys.sqlprof_attr) is declared and populated with the
-- outline hints of the good plan earlier in the full block.
for sql_rec in (
  select t.sql_id, t.sql_text
  from dba_hist_sqltext t, dba_hist_sql_plan p
  where t.sql_id = p.sql_id
  and p.sql_id = '&SQL_ID_TO_FIX'
  and p.plan_hash_value = &BAD_PLAN_HASH_VALUE
  and p.parent_id is null
)
loop
  DBMS_SQLTUNE.IMPORT_SQL_PROFILE(
    sql_text => sql_rec.sql_text,
    profile  => ar_profile_hints,
    name     => 'PROFILE_33fndgzsas09k');
end loop;

Why can't we use this method for ADG?

We cannot use this method for ADG as-is. You will find the required SQL text in the V$SQLTEXT view on ADG, but you cannot create a SQL profile there, as it's a read-only database.
Also, you cannot create the profile on the primary, because the SQL never ran there and its text is not available in V$SQLTEXT on the primary.

So, how do we fix this?

Method 1:

One method I thought of for fixing the SQL is to create a database link between the primary and ADG (say adg_db_link) and use that DB link to get the SQL text from the standby. The above PL/SQL code then looks like the version below. Note that I have replaced the DBA_HIST* tables with V$ views, because the ADG data is present only in the V$ views; the DBA_HIST* tables contain data captured from the primary database only, as part of AWR snapshots.

-- ar_profile_hints is declared and populated as before; the V$ views are
-- queried over the DB link because the SQL only ran on ADG.
-- Note: v$sqltext stores the statement in 64-byte pieces (one row per piece);
-- for long statements consider v$sql.sql_fulltext instead.
for sql_rec in (
  select t.sql_id, t.sql_text
  from v$sqltext@adg_db_link t, v$sql_plan@adg_db_link p
  where t.sql_id = p.sql_id
  and p.sql_id = '&SQL_ID_TO_FIX'
  and p.plan_hash_value = &BAD_PLAN_HASH_VALUE
  and p.parent_id is null
)
loop
  DBMS_SQLTUNE.IMPORT_SQL_PROFILE(
    sql_text => sql_rec.sql_text,
    profile  => ar_profile_hints,
    name     => 'PROFILE_33fndgzsas09k');
end loop;

You can run the complete PL/SQL block on the primary; the DB link fetches the required SQL text from ADG, and the SQL profile gets created on the primary.

Once this SQL profile is created, it should be picked up by the SQL on ADG. Note that you need to purge the SQL from the shared pool of ADG so that it goes for a hard parse and picks up the profile.

Method 2:

Another method is to pass the SQL text directly as input to the DBMS_SQLTUNE.IMPORT_SQL_PROFILE procedure, instead of fetching it from a view. This is the method I implemented, and it worked for me.

Let's take an example.

The following is the SQL text that I want to fix. The SQL in my environment was completely different, but I cannot provide its actual text. It was very big and complex, so I simplified it and changed the table and column names to make it easy to understand.

select * from ( 
          select this_.col1 as col1,
                 this_.col2 as col2, 
                 this_.col3 as col3, 
                 this_.col4 as col4
          from TAB1 this_ 
          inner join TAB2 compaddres3_ on this_.col2 = compaddres3_.col2 
          inner join TAB3 compshipme1_ on this_.col1=compshipme1_.col1
          where this_.col1 in (:1) 
          and   this_.col2 in (:2 , :3) 
          and   compshipme1_.col1 in (:4 )
          and   this_.col8 = 'OPEN'
          and   this_.col5>=:5 
          and   this_.col5<=:6 
          order by this_.col1 asc ) 
where rownum <= :7

Assuming that the existing plan on the ADG database is bad, we need to find a good plan for this SQL and create a profile so that the new, good plan is picked up on ADG.
We are going to do that without running this SQL on the primary and without pointing the ADG CNAME to the primary. Also, we are not going to make any code changes to add hints.

Following are the steps to fix the plan on ADG

Step 1) Get a good plan for this SQL

You can add hints to the SQL text and do an explain plan on the primary to check whether the plan looks good. Alternatively, if you want to be sure, you can run the hinted SQL on ADG to confirm that it runs efficiently.

I used a few index hints in my SQL to make sure the correct index gets picked, and then did an “explain plan” for the SQL to check the resulting plan.

explain plan for
select * from ( 
          select /*+ index(tab1 I_UNIQ_COL2) */ this_.col1 as col1,
                 this_.col2 as col2, 
                 this_.col3 as col3, 
                 this_.col4 as col4
          from TAB1 this_ 
          inner join TAB2 compaddres3_ on this_.col2 = compaddres3_.col2 
          inner join TAB3 compshipme1_ on this_.col1=compshipme1_.col1
          where this_.col1 in (:1) 
          and   this_.col2 in (:2 , :3) 
          and   compshipme1_.col1 in (:4 )
          and   this_.col8 = 'OPEN'
          and   this_.col5>=:5 
          and   this_.col5<=:6 
          order by this_.col1 asc ) 
where rownum <= :7

I got the following plan after adding the required hints:

Plan hash value: 2524091007

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                 | Name         | Rows | Bytes | Cost (%CPU)| Time     | Pstart| Pstop |
-------------------------------------------------------------------------------------------------------------------------
| 0   | SELECT STATEMENT                          |              | 134  | 1524K | 4134   (3) | 00:00:50 |       |       |
|* 1  |  COUNT STOPKEY                            |              |      |       |            |          |       |       |
| 2   |   VIEW                                    |              | 134  | 1524K | 4134   (3) | 00:00:50 |       |       |
|* 3  |    SORT ORDER BY STOPKEY                  |              | 134  | 71154 | 4134   (3) | 00:00:50 |       |       |
|* 4  |     FILTER                                |              |      |       |            |          |       |       |
| 5   |      NESTED LOOPS                         |              | 134  | 71154 | 4133   (3) | 00:00:50 |       |       |
| 6   |       NESTED LOOPS                        |              | 134  | 43550 | 3731   (3) | 00:00:45 |       |       |
| 7   |        TABLE ACCESS BY INDEX ROWID        | TAB1         | 1    | 22    | 1      (0) | 00:00:01 |       |       |
|* 8  |         INDEX UNIQUE SCAN                 | I_UNIQ_COL2  | 1    |       | 0      (0) | 00:00:01 |       |       |
| 9   |        TABLE ACCESS BY GLOBAL INDEX ROWID | TAB2         | 134  | 40602 | 3730   (3) | 00:00:45 | ROWID | ROWID |
|* 10 |         INDEX RANGE SCAN                  | I_TAB2_COL2  | 242  |       | 3507   (3) | 00:00:43 |       |       |
| 11  |        TABLE ACCESS BY GLOBAL INDEX ROWID | TAB3         | 1    | 206   | 3      (0) | 00:00:01 | ROWID | ROWID |
|* 12 |         INDEX RANGE SCAN                  | PK_TAB3_COL1 | 1    |       | 2      (0) | 00:00:01 |       |       |
-------------------------------------------------------------------------------------------------------------------------

Step 2) Obtain outline hints for the SQL

Once you have done the explain plan for the SQL, you can use FORMAT=>'ADVANCED' to get the necessary outline hints:

select * from table(dbms_xplan.display(format=>'advanced'));

/*+
 BEGIN_OUTLINE_DATA
 USE_NL(@"SEL$54D64B3C" "COMPADDRES3_"@"SEL$2")
 USE_NL(@"SEL$54D64B3C" "THIS_"@"SEL$2")
 LEADING(@"SEL$54D64B3C" "COMPSHIPME1_"@"SEL$3" "THIS_"@"SEL$2" "COMPADDRES3_"@"SEL$2")
 INDEX_RS_ASC(@"SEL$54D64B3C" "COMPADDRES3_"@"SEL$2" ("TAB2"."COL2"))
 INDEX_RS_ASC(@"SEL$54D64B3C" "THIS_"@"SEL$2" ("TAB1"."COL1" "TAB1"."COL2" "TAB1"."COL3" "TAB1"."COL4"))
 INDEX_RS_ASC(@"SEL$54D64B3C" "COMPSHIPME1_"@"SEL$3" ("TAB3"."COL1"))
 NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1")
 OUTLINE(@"SEL$2")
 OUTLINE(@"SEL$3")
 MERGE(@"SEL$2")
 OUTLINE(@"SEL$64EAE176")
 OUTLINE(@"SEL$4")
 OUTLINE_LEAF(@"SEL$1")
 MERGE(@"SEL$64EAE176")
 OUTLINE_LEAF(@"SEL$54D64B3C")
 ALL_ROWS
 OPT_PARAM('_optimizer_extended_cursor_sharing_rel' 'none')
 DB_VERSION('11.2.0.4')
 OPTIMIZER_FEATURES_ENABLE('11.2.0.4')
 IGNORE_OPTIM_EMBEDDED_HINTS
 END_OUTLINE_DATA
 */

Step 3) Use the outline data to create a profile for the query running on ADG

The SQL profile has to be created on the primary, as we cannot create a SQL profile on ADG (a read-only database).
The method described in “Fixing SQL Plans: The hard way – Part 1” takes the SQL ID as input and queries V$SQLTEXT or the DBA_HIST* tables to get the required SQL text, which is then supplied to the DBMS_SQLTUNE.IMPORT_SQL_PROFILE procedure.

But since the SQL text is not available on the primary, we supply it directly as a literal to the DBMS_SQLTUNE.IMPORT_SQL_PROFILE procedure, as shown below.

Note – If you have single quotes in your SQL text, you need to change each of them to two single quotes; the first quote acts as an escape character for the second.

In my SQL text, I have a single-quoted literal, so I had to double its quotes as shown below.

select * from ( 
       select this_.col1 as col1,
              this_.col2 as col2, 
              this_.col3 as col3, 
              this_.col4 as col4
       from TAB1 this_ 
       inner join TAB2 compaddres3_ on this_.col2 = compaddres3_.col2 
       inner join TAB3 compshipme1_ on this_.col1=compshipme1_.col1
       where this_.col1 in (:1) 
       and   this_.col2 in (:2 , :3) 
       and   compshipme1_.col1 in (:4 )
       and   this_.col8 = ''OPEN''
       and   this_.col5>=:5 
       and   this_.col5<=:6 
       order by this_.col1 asc ) 
where rownum <= :7

The above SQL will be given as input to the DBMS_SQLTUNE.IMPORT_SQL_PROFILE procedure.
Similarly, if there are single quotes in the outline hints we extracted, we need to change them to two single quotes, as shown below.

Also, we need to wrap every outline hint in single quotes.

'BEGIN_OUTLINE_DATA',
'USE_NL(@"SEL$54D64B3C" "COMPADDRES3_"@"SEL$2")',
'USE_NL(@"SEL$54D64B3C" "THIS_"@"SEL$2")',
'LEADING(@"SEL$54D64B3C" "COMPSHIPME1_"@"SEL$3" "THIS_"@"SEL$2" "COMPADDRES3_"@"SEL$2")',
'INDEX_RS_ASC(@"SEL$54D64B3C" "COMPADDRES3_"@"SEL$2" ("TAB2"."COL2"))',
'INDEX_RS_ASC(@"SEL$54D64B3C" "THIS_"@"SEL$2" ("TAB1"."COL1" "TAB1"."COL2" "TAB1"."COL3" "TAB1"."COL4"))',
'INDEX_RS_ASC(@"SEL$54D64B3C" "COMPSHIPME1_"@"SEL$3" ("TAB3"."COL1"))',
'NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1")',
'OUTLINE(@"SEL$2")',
'OUTLINE(@"SEL$3")',
'MERGE(@"SEL$2")',
'OUTLINE(@"SEL$64EAE176")',
'OUTLINE(@"SEL$4")',
'OUTLINE_LEAF(@"SEL$1")',
'MERGE(@"SEL$64EAE176")',
'OUTLINE_LEAF(@"SEL$54D64B3C")',
'ALL_ROWS',
'OPT_PARAM(''_optimizer_extended_cursor_sharing_rel'' ''none'')',
'DB_VERSION(''11.2.0.4'')',
'OPTIMIZER_FEATURES_ENABLE(''11.2.0.4'')',
'IGNORE_OPTIM_EMBEDDED_HINTS',
'END_OUTLINE_DATA'

Note that we have enclosed each hint in single quotes and put a comma at the end of every hint except the last one.

So we now have the outline hints and the SQL text to provide as input to the DBMS_SQLTUNE.IMPORT_SQL_PROFILE procedure.

Step 4) Create the required profile on the primary

The following code creates the SQL profile on the primary with the required inputs.

declare
ar_profile_hints sys.sqlprof_attr;
begin
  ar_profile_hints := sys.sqlprof_attr(
     'BEGIN_OUTLINE_DATA',
     'USE_NL(@"SEL$54D64B3C" "COMPADDRES3_"@"SEL$2")',
     'USE_NL(@"SEL$54D64B3C" "THIS_"@"SEL$2")',
     'LEADING(@"SEL$54D64B3C" "COMPSHIPME1_"@"SEL$3" "THIS_"@"SEL$2" "COMPADDRES3_"@"SEL$2")',
     'INDEX_RS_ASC(@"SEL$54D64B3C" "COMPADDRES3_"@"SEL$2" ("TAB2"."COL2"))',
     'INDEX_RS_ASC(@"SEL$54D64B3C" "THIS_"@"SEL$2" ("TAB1"."COL1" "TAB1"."COL2" "TAB1"."COL3" "TAB1"."COL4"))',
     'INDEX_RS_ASC(@"SEL$54D64B3C" "COMPSHIPME1_"@"SEL$3" ("TAB3"."COL1"))',
     'NO_ACCESS(@"SEL$1" "from$_subquery$_001"@"SEL$1")',
     'OUTLINE(@"SEL$2")',
     'OUTLINE(@"SEL$3")',
     'MERGE(@"SEL$2")',
     'OUTLINE(@"SEL$64EAE176")',
     'OUTLINE(@"SEL$4")',
     'OUTLINE_LEAF(@"SEL$1")',
     'MERGE(@"SEL$64EAE176")',
     'OUTLINE_LEAF(@"SEL$54D64B3C")',
     'ALL_ROWS',
     'OPT_PARAM(''_optimizer_extended_cursor_sharing_rel'' ''none'')',
     'DB_VERSION(''11.2.0.4'')',
     'OPTIMIZER_FEATURES_ENABLE(''11.2.0.4'')',
     'IGNORE_OPTIM_EMBEDDED_HINTS',
     'END_OUTLINE_DATA'
 );
 DBMS_SQLTUNE.IMPORT_SQL_PROFILE(
 sql_text => 'select * from ( 
                    select this_.col1 as col1,
                          this_.col2 as col2, 
                          this_.col3 as col3, 
                          this_.col4 as col4
                   from TAB1 this_ 
                   inner join TAB2 compaddres3_ on this_.col2 = compaddres3_.col2 
                   inner join TAB3 compshipme1_ on this_.col1=compshipme1_.col1
                   where this_.col1 in (:1) 
                   and   this_.col2 in (:2 , :3) 
                   and   compshipme1_.col1 in (:4 )
                   and   this_.col8 = ''OPEN''
                   and   this_.col5>=:5 
                   and   this_.col5<=:6 
                   order by this_.col1 asc ) 
             where rownum <= :7',
 profile => ar_profile_hints,
 name => 'PROFILE_d7w7fruzwyh2s'
 );
end;
/

You can run the above code on the primary to create the required profile.
Once the profile is created, purge the SQL on ADG using the sys.dbms_shared_pool.purge procedure.

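To get the cursor details needed for the purge, you can query V$SQLAREA on ADG; a minimal sketch (the SQL_ID here is a placeholder – note that the name argument to the purge call is the address and hash value separated by a comma, and 'C' marks the object as a cursor):

select address, hash_value from v$sqlarea where sql_id = '&SQL_ID';

exec sys.dbms_shared_pool.purge('&address,&hash_value', 'C')
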
When the SQL runs the next time on ADG, it will automatically pick up the profile created above, and the plan will be the one we tested earlier with hints.

I was able to see the following in the NOTE section when I ran DBMS_XPLAN.DISPLAY_CURSOR on ADG.

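A call along the following lines displays the cursor plan together with the NOTE section (the SQL_ID is a placeholder):

select * from table(dbms_xplan.display_cursor('&SQL_ID', null, 'TYPICAL'));
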
Note
-----
   - SQL profile PROFILE_d7w7fruzwyh2s used for this statement

Hope this helps !!

Reference

Fixing SQL Plans: The hard way – Part 1

Fixing SQL Plans: The hard way – Part 2

Fixing SQL Plans: The hard way – Part 3

 

Using UDev to configure ASM disks

This is a small article on using udev on RHEL 7. I have a VirtualBox VM with RHEL 7 on which I am configuring ASM diskgroups.

Before we can configure ASM diskgroups, we need disks/partitions available at the OS level that ASM can recognize. Oracle provides a utility called ASMLib which can be installed and used very easily to configure disks/partitions at the OS level.

ASMLib stamps the header of the partition/disk at the OS level with an Oracle format, making it easily detected by an ASM instance doing disk discovery. You can go through ASMLib installation, configuration and usage in an article written by my friend Anand Prakash: https://aprakash.wordpress.com/2016/05/19/oracle-asmlib

In this article we will focus on using UDev for setting up ASM disks.

Let’s start with some information on UDev

What is Udev?

Udev is the device manager for the Linux 2.6 kernel that creates/removes device nodes in the /dev directory dynamically. It is the successor of devfs and hotplug. It runs in userspace and the user can change device names using Udev rules.

Why Do We Need It?

In older kernels, the /dev directory contained static device files. But with dynamic device creation, device nodes are created only for those devices actually present in the system. Let us see the disadvantages of the static /dev directory, which led to the development of Udev.

Problems Identifying the Exact Hardware Device for a Device Node in /dev

The kernel will assign a major/minor number pair when it detects a hardware device while booting the system. Let us consider two hard disks. The connection/alignment is in such a way that one is connected as a master and the other, as a slave. The Linux system will call them, /dev/hda and /dev/hdb. Now, if we interchange the disks the device name will change. This makes it difficult to identify the correct device that is related to the available static device node. The condition gets worse when there are a bunch of hard disks connected to the system.

Udev provides a persistent device naming system through the /dev directory, making it easier to identify the device.

Huge Number of Device Nodes in /dev

In the static model of device node creation, no method was available to identify the hardware devices actually present in the system. So, device nodes were created for all the devices that Linux was known to support at the time. The huge mess of device nodes in /dev made it difficult to identify the devices actually present in the system.

Not Enough Major/Minor Number Pairs

The number of static device nodes to be included increased a lot in recent times and the 8-bit scheme, that was used, proved to be insufficient for handling all the devices. As a result the major/minor number pairs started running out.

Working of Udev

The Udev daemon listens to the netlink socket that the kernel uses for communicating with user space applications. The kernel will send a bunch of data through the netlink socket when a device is added to, or removed from a system. The Udev daemon catches all this data and will do the rest, i.e., device node creation, module loading etc.

Kernel Device Event Management

  • When bootup is initialized, the /dev directory is mounted in tmpfs.
  • After that, Udev will copy the static device nodes from /lib/udev/devices to the /dev directory.
  • The Udev daemon then runs and collects uevents from the kernel, for all the devices connected to the system.
  • The Udev daemon will parse the uevent data and it will match the data with the rules specified in /etc/udev/rules.d.
  • It will create the device nodes and symbolic links for the devices as specified in the rules.
  • The Udev daemon reads the rules from /etc/udev/rules.d/*.rules and stores them in memory.
  • Udev will receive an inotify event, if any rules were changed. It will read the changes and will update the memory.

Let’s start by creating disks at the OS level. VirtualBox has commands which create block devices at the OS level.

Using Udev to configure disks

Step 1) Creating Block devices at OS level

I am creating 4 disks, each of 10GB, using the following commands:

VBoxManage createhd --filename asm1.vdi --size 10240 --format VDI --variant Fixed
VBoxManage createhd --filename asm2.vdi --size 10240 --format VDI --variant Fixed
VBoxManage createhd --filename asm3.vdi --size 10240 --format VDI --variant Fixed
VBoxManage createhd --filename asm4.vdi --size 10240 --format VDI --variant Fixed

Step 2) Attaching the disk to the correct storage

VBoxManage storageattach 12102 --storagectl "SATA" --port 1 --device 0 --type hdd --medium asm1.vdi --mtype shareable
VBoxManage storageattach 12102 --storagectl "SATA" --port 2 --device 0 --type hdd --medium asm2.vdi --mtype shareable
VBoxManage storageattach 12102 --storagectl "SATA" --port 3 --device 0 --type hdd --medium asm3.vdi --mtype shareable
VBoxManage storageattach 12102 --storagectl "SATA" --port 4 --device 0 --type hdd --medium asm4.vdi --mtype shareable

12102 is the name of the virtual machine. You can see the virtual machine name in the screenshot below.

(Screenshot: VirtualBox Manager showing the virtual machine name)

Step 3) (optional) Making disks shareable

This is an optional step, required only if you are using an ASM cluster. In that case the ASM disks should be made shareable, as they need to be attached to all machines in the cluster. You can use the following commands to make the disks shareable:

VBoxManage modifyhd asm1.vdi --type shareable
VBoxManage modifyhd asm2.vdi --type shareable
VBoxManage modifyhd asm3.vdi --type shareable
VBoxManage modifyhd asm4.vdi --type shareable

Step 4) Check that the disks now show up in VirtualBox

The following screenshot shows the disks added to the correct virtual machine.

(Screenshot: disks attached to the virtual machine)

Also, after you log in to the virtual machine, you should be able to see the new devices under the /dev directory.

[root@advait ~]# ls -l /dev/sd*
brw-rw---- 1 root disk 8, 0 Jul 18 05:36 /dev/sda
brw-rw---- 1 root disk 8, 1 Jul 18 05:36 /dev/sda1
brw-rw---- 1 root disk 8, 2 Jul 18 05:36 /dev/sda2
brw-rw---- 1 root disk 8, 16 Jul 18 05:36 /dev/sdb
brw-rw---- 1 root disk 8, 32 Jul 18 05:36 /dev/sdc
brw-rw---- 1 root disk 8, 48 Jul 18 05:36 /dev/sdd
brw-rw---- 1 root disk 8, 64 Jul 18 05:36 /dev/sde

/dev/sda, /dev/sda1 and /dev/sda2 are partitions of the main device used by the virtual machine.
/dev/sdb, /dev/sdc, /dev/sdd and /dev/sde are the devices we added above.

Step 5) Format the new devices and create partitions

You need to create new partitions using the fdisk utility. I am showing the commands for one partition; you can repeat the same for the remaining partitions.

[root@advait ~]# fdisk /dev/sdb
Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table
Building a new DOS disklabel with disk identifier 0xbda01838.

Command (m for help): n
Partition type:
 p primary (0 primary, 0 extended, 4 free)
 e extended
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-20971519, default 2048): 
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-20971519, default 20971519): 
Using default value 20971519
Partition 1 of type Linux and of size 10 GiB is set

Command (m for help): p

Disk /dev/sdb: 10.7 GB, 10737418240 bytes, 20971520 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk label type: dos
Disk identifier: 0xbda01838

Device Boot Start End Blocks Id System
/dev/sdb1 2048 20971519 10484736 83 Linux

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

This will create a single partition on each of the devices, as the listing further below shows.
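
If you'd rather not repeat the interactive session for every disk, fdisk will also read the same answers from a pipe. A minimal sketch, assuming /dev/sdc, /dev/sdd and /dev/sde are the remaining ASM disks; scripting fdisk this way is a convenience rather than a supported interface, so verify the partition tables afterwards:

# n = new partition, p = primary, 1 = partition number,
# two blank lines accept the default first/last sectors, w = write
for dev in /dev/sdc /dev/sdd /dev/sde; do
  printf 'n\np\n1\n\n\nw\n' | fdisk "$dev"
done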

[root@advait ~]# ls -rlt /dev/sd*1
brw-rw---- 1 root disk 8, 1 Jul 18 05:36 /dev/sda1
brw-rw---- 1 root disk 8, 17 Jul 18 05:40 /dev/sdb1
brw-rw---- 1 root disk 8, 33 Jul 18 05:40 /dev/sdc1
brw-rw---- 1 root disk 8, 49 Jul 18 05:41 /dev/sdd1
brw-rw---- 1 root disk 8, 65 Jul 18 05:41 /dev/sde1

Step 6) Configure UDev rules

Before creating a udev rule, we need to understand what exactly we want to do. We want to create an alias for each of the disks that we created at the OS level, so that each disk is always identified the same way, regardless of the device name Linux assigns to it. This can be done by recognizing each device by some unique ID and assigning the alias to that device. UDev can do just that.

Each disk has a unique SCSI ID. We can use this unique ID to recognize the disk and assign the required alias to it.
We can get the unique SCSI_ID using the following command:

[root@advait ~]# /usr/lib/udev/scsi_id -g -u -d /dev/sdb1
1ATA_VBOX_HARDDISK_VB544d069c-abd3901e

The scsi_id command is located in the /usr/lib/udev/ directory on RHEL 7; in previous releases it used to be in /sbin.

Likewise, we can find the SCSI_ID for each disk that we added.
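
For example, a small loop prints the ID for each new disk (assuming the devices are /dev/sdb through /dev/sde as above):

# print "device: SCSI_ID" for each of the new disks
for dev in /dev/sd[b-e]; do
  echo "$dev: $(/usr/lib/udev/scsi_id -g -u -d $dev)"
done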

Rules are defined in the /etc/udev/rules.d directory. Udev reads these rules and applies them to the devices listed in the /dev directory.

A rule looks like the following:

KERNEL=="sd?1", SUBSYSTEM=="block", -
PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", -
RESULT=="1ATA_VBOX_HARDDISK_VB88ee7178-f28aa887", -
SYMLINK+="asm-disk1", OWNER="oracle", GROUP="dba", MODE="0660"

The following is the explanation of each parameter:

  • KERNEL=="sd?1" – Matches the kernel name of the device. In our case all the partitions are named sd?1 (sdb1, sdc1 etc.), so this key matches the kernel name of our devices.
  • SUBSYSTEM=="block" – Matches the subsystem of the device. SUBSYSTEM could be block, scsi, usb etc.; all of ours are block devices.
  • PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent" – Gets the unique SCSI_ID for the device matched by the first 2 keys (KERNEL and SUBSYSTEM).
  • RESULT=="1ATA_VBOX_HARDDISK_VB88ee7178-f28aa887" – Matches the output of the PROGRAM command against RESULT. If it matches, the actions below are taken.
  • SYMLINK+="asm-disk1" – An action key. If the PROGRAM output matches RESULT, a symlink named asm-disk1 is created, pointing to the device in question.
  • OWNER="oracle" – Also an action key; changes the ownership of the device to the oracle user.
  • GROUP="dba" – Changes the group of the device to the dba group.
  • MODE="0660" – Changes the permissions of the device file to 0660.

So the above rule means that the "sd?1" partition on the disk with the SCSI ID "1ATA_VBOX_HARDDISK_VB88ee7178-f28aa887" will always be reachable via the symlink /dev/asm-disk1, regardless of the letter Linux assigns to the device when it is discovered. In addition, the device will have the correct ownership and permissions for ASM.

We can create such a rule for each device, or, if the number of devices is huge, we can use wildcards and matching patterns to create more generic rules.
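
For example, if all the disks are VirtualBox disks and you only need the ownership and permission actions (individual symlinks still need per-disk rules), a single wildcard rule could cover them all. This is a sketch relying on udev's glob matching in the RESULT key:

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="1ATA_VBOX_HARDDISK_*", OWNER="oracle", GROUP="dba", MODE="0660"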

For my four disks I created individual rules: create a new rule file /etc/udev/rules.d/99-oracle-asm-disks.rules and put the following rules in it, one for each device.

KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="1ATA_VBOX_HARDDISK_VB544d069c-abd3901e", SYMLINK+="asm-disk1", OWNER="oracle", GROUP="dba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="1ATA_VBOX_HARDDISK_VB9a630cc5-d9697727", SYMLINK+="asm-disk2", OWNER="oracle", GROUP="dba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="1ATA_VBOX_HARDDISK_VB446fdf92-8640efff", SYMLINK+="asm-disk3", OWNER="oracle", GROUP="dba", MODE="0660"
KERNEL=="sd?1", SUBSYSTEM=="block", PROGRAM=="/usr/lib/udev/scsi_id -g -u -d /dev/$parent", RESULT=="1ATA_VBOX_HARDDISK_VB3a71b4f2-8c603b78", SYMLINK+="asm-disk4", OWNER="oracle", GROUP="dba", MODE="0660"

Step 7) Load the updated block device partition tables

You can use partprobe to load the partition tables of the block devices:

/sbin/partprobe /dev/sdb
/sbin/partprobe /dev/sdc
/sbin/partprobe /dev/sdd
/sbin/partprobe /dev/sde

Step 8) Test the rules

This is an optional step to check whether the rules work as expected. You can run the following commands to test the rules:

udevadm test /block/sdb/sdb1
udevadm test /block/sdc/sdc1
udevadm test /block/sdd/sdd1
udevadm test /block/sde/sde1

The output of one of the above commands looks like the following:

calling: test
version 219
This program is for debugging only, it does not run any program
specified by a RUN key. It may show incorrect results, because
some values may be different, or not available at a simulation run.

=== trie on-disk ===
tool version: 219
file size: 6984832 bytes
header size 80 bytes
strings 1805856 bytes
nodes 5178896 bytes
Load module index
Created link configuration context.
timestamp of '/etc/udev/rules.d' changed
Reading rules file: /usr/lib/udev/rules.d/10-dm.rules
Reading rules file: /usr/lib/udev/rules.d/100-balloon.rules
Reading rules file: /usr/lib/udev/rules.d/13-dm-disk.rules
Reading rules file: /usr/lib/udev/rules.d/40-redhat.rules
Reading rules file: /usr/lib/udev/rules.d/42-usb-hid-pm.rules
Reading rules file: /usr/lib/udev/rules.d/50-udev-default.rules
Reading rules file: /usr/lib/udev/rules.d/60-alias-kmsg.rules
Reading rules file: /usr/lib/udev/rules.d/60-cdrom_id.rules
Reading rules file: /usr/lib/udev/rules.d/60-drm.rules
Reading rules file: /usr/lib/udev/rules.d/60-keyboard.rules
Reading rules file: /usr/lib/udev/rules.d/60-net.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-alsa.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-input.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-serial.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-storage-tape.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-storage.rules
Reading rules file: /usr/lib/udev/rules.d/60-persistent-v4l.rules
Reading rules file: /usr/lib/udev/rules.d/60-raw.rules
Reading rules file: /usr/lib/udev/rules.d/61-accelerometer.rules
Reading rules file: /usr/lib/udev/rules.d/64-btrfs.rules
Reading rules file: /usr/lib/udev/rules.d/70-mouse.rules
Reading rules file: /usr/lib/udev/rules.d/70-power-switch.rules
Reading rules file: /usr/lib/udev/rules.d/70-touchpad.rules
Reading rules file: /usr/lib/udev/rules.d/70-uaccess.rules
Reading rules file: /usr/lib/udev/rules.d/71-biosdevname.rules
Reading rules file: /usr/lib/udev/rules.d/71-seat.rules
Reading rules file: /usr/lib/udev/rules.d/73-idrac.rules
Reading rules file: /usr/lib/udev/rules.d/73-seat-late.rules
Reading rules file: /usr/lib/udev/rules.d/75-net-description.rules
Reading rules file: /usr/lib/udev/rules.d/75-probe_mtd.rules
Reading rules file: /usr/lib/udev/rules.d/75-tty-description.rules
Reading rules file: /usr/lib/udev/rules.d/78-sound-card.rules
Reading rules file: /usr/lib/udev/rules.d/80-drivers.rules
Reading rules file: /usr/lib/udev/rules.d/80-net-name-slot.rules
Reading rules file: /usr/lib/udev/rules.d/80-net-setup-link.rules
Reading rules file: /usr/lib/udev/rules.d/81-kvm-rhel.rules
Reading rules file: /usr/lib/udev/rules.d/85-nm-unmanaged.rules
Reading rules file: /usr/lib/udev/rules.d/90-alsa-tools-firmware.rules
Reading rules file: /usr/lib/udev/rules.d/90-iprutils.rules
Reading rules file: /usr/lib/udev/rules.d/90-vconsole.rules
Reading rules file: /usr/lib/udev/rules.d/91-drm-modeset.rules
Reading rules file: /usr/lib/udev/rules.d/95-dm-notify.rules
Reading rules file: /usr/lib/udev/rules.d/95-udev-late.rules
Reading rules file: /usr/lib/udev/rules.d/98-kexec.rules
Reading rules file: /usr/lib/udev/rules.d/98-rdma.rules
Reading rules file: /etc/udev/rules.d/99-oracle-asm-disks.rules
Reading rules file: /usr/lib/udev/rules.d/99-systemd.rules
rules contain 24576 bytes tokens (2048 * 12 bytes), 12216 bytes strings
1803 strings (22584 bytes), 1184 de-duplicated (10988 bytes), 620 trie nodes used
GROUP 6 /usr/lib/udev/rules.d/50-udev-default.rules:52
LINK 'disk/by-id/ata-VBOX_HARDDISK_VB544d069c-abd3901e-part1' /usr/lib/udev/rules.d/60-persistent-storage.rules:43
IMPORT builtin 'blkid' /usr/lib/udev/rules.d/60-persistent-storage.rules:72
probe /dev/sdb1 raid offset=0
PROGRAM '/usr/lib/udev/scsi_id -g -u -d /dev/sdb' /etc/udev/rules.d/99-oracle-asm-disks.rules:1
starting '/usr/lib/udev/scsi_id -g -u -d /dev/sdb'
'/usr/lib/udev/scsi_id -g -u -d /dev/sdb'(out) '1ATA_VBOX_HARDDISK_VB544d069c-abd3901e'
'/usr/lib/udev/scsi_id -g -u -d /dev/sdb' [10122] exit with return code 0
OWNER 54321 /etc/udev/rules.d/99-oracle-asm-disks.rules:1
GROUP 54322 /etc/udev/rules.d/99-oracle-asm-disks.rules:1
MODE 0660 /etc/udev/rules.d/99-oracle-asm-disks.rules:1
LINK 'asm-disk1' /etc/udev/rules.d/99-oracle-asm-disks.rules:1
PROGRAM '/usr/lib/udev/scsi_id -g -u -d /dev/sdb' /etc/udev/rules.d/99-oracle-asm-disks.rules:2
starting '/usr/lib/udev/scsi_id -g -u -d /dev/sdb'
'/usr/lib/udev/scsi_id -g -u -d /dev/sdb'(out) '1ATA_VBOX_HARDDISK_VB544d069c-abd3901e'
'/usr/lib/udev/scsi_id -g -u -d /dev/sdb' [10123] exit with return code 0
PROGRAM '/usr/lib/udev/scsi_id -g -u -d /dev/sdb' /etc/udev/rules.d/99-oracle-asm-disks.rules:3
starting '/usr/lib/udev/scsi_id -g -u -d /dev/sdb'
'/usr/lib/udev/scsi_id -g -u -d /dev/sdb'(out) '1ATA_VBOX_HARDDISK_VB544d069c-abd3901e'
'/usr/lib/udev/scsi_id -g -u -d /dev/sdb' [10124] exit with return code 0
PROGRAM '/usr/lib/udev/scsi_id -g -u -d /dev/sdb' /etc/udev/rules.d/99-oracle-asm-disks.rules:4
starting '/usr/lib/udev/scsi_id -g -u -d /dev/sdb'
'/usr/lib/udev/scsi_id -g -u -d /dev/sdb'(out) '1ATA_VBOX_HARDDISK_VB544d069c-abd3901e'
'/usr/lib/udev/scsi_id -g -u -d /dev/sdb' [10125] exit with return code 0
handling device node '/dev/sdb1', devnum=b8:17, mode=0660, uid=54321, gid=54322
preserve permissions /dev/sdb1, 060660, uid=54321, gid=54322
preserve already existing symlink '/dev/block/8:17' to '../sdb1'
found 'b8:17' claiming '/run/udev/links/\x2fasm-disk1'
creating link '/dev/asm-disk1' to '/dev/sdb1'
preserve already existing symlink '/dev/asm-disk1' to 'sdb1'
found 'b8:17' claiming '/run/udev/links/\x2fdisk\x2fby-id\x2fata-VBOX_HARDDISK_VB544d069c-abd3901e-part1'
creating link '/dev/disk/by-id/ata-VBOX_HARDDISK_VB544d069c-abd3901e-part1' to '/dev/sdb1'
preserve already existing symlink '/dev/disk/by-id/ata-VBOX_HARDDISK_VB544d069c-abd3901e-part1' to '../../sdb1'
created db file '/run/udev/data/b8:17' for '/block/sdb/sdb1'
ACTION=add
DEVLINKS=/dev/asm-disk1 /dev/disk/by-id/ata-VBOX_HARDDISK_VB544d069c-abd3901e-part1
DEVNAME=/dev/sdb1
DEVPATH=/block/sdb/sdb1
DEVTYPE=partition
ID_ATA=1
ID_ATA_FEATURE_SET_PM=1
ID_ATA_FEATURE_SET_PM_ENABLED=1
ID_ATA_SATA=1
ID_ATA_SATA_SIGNAL_RATE_GEN2=1
ID_ATA_WRITE_CACHE=1
ID_ATA_WRITE_CACHE_ENABLED=1
ID_BUS=ata
ID_MODEL=VBOX_HARDDISK
ID_MODEL_ENC=VBOX\x20HARDDISK\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
ID_PART_ENTRY_DISK=8:16
ID_PART_ENTRY_NUMBER=1
ID_PART_ENTRY_OFFSET=2048
ID_PART_ENTRY_SCHEME=dos
ID_PART_ENTRY_SIZE=20969472
ID_PART_ENTRY_TYPE=0x83
ID_PART_TABLE_TYPE=dos
ID_REVISION=1.0
ID_SERIAL=VBOX_HARDDISK_VB544d069c-abd3901e
ID_SERIAL_SHORT=VB544d069c-abd3901e
ID_TYPE=disk
MAJOR=8
MINOR=17
SUBSYSTEM=block
TAGS=:systemd:
USEC_INITIALIZED=404668
Unload module index
Unloaded link configuration context.

Step 9) Reload the udev rules

Use the following command to reload the rules:

udevadm control --reload-rules

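Reloading only makes udev re-read the rule files; depending on the setup, you may also need to replay the kernel events so the rules are applied to devices that already exist (a standard udevadm invocation, shown here as an optional extra step):

udevadm trigger --type=devices --action=change
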
Check whether the required symlinks and other actions were applied to the devices:

[root@advait ~]# ls -rlt /dev/asm*
lrwxrwxrwx 1 root root 4 Jul 18 06:20 /dev/asm-disk2 -> sdc1
lrwxrwxrwx 1 root root 4 Jul 18 06:20 /dev/asm-disk3 -> sdd1
lrwxrwxrwx 1 root root 4 Jul 18 06:20 /dev/asm-disk4 -> sde1
lrwxrwxrwx 1 root root 4 Jul 18 06:20 /dev/asm-disk1 -> sdb1

The symlinks above are owned by root, but the devices themselves are owned by oracle:dba.

[root@advait ~]# ls -rlt /dev/sd?1
brw-rw---- 1 root disk 8, 1 Jul 18 05:36 /dev/sda1
brw-rw---- 1 oracle dba 8, 33 Jul 18 06:20 /dev/sdc1
brw-rw---- 1 oracle dba 8, 49 Jul 18 06:20 /dev/sdd1
brw-rw---- 1 oracle dba 8, 65 Jul 18 06:20 /dev/sde1
brw-rw---- 1 oracle dba 8, 17 Jul 18 06:20 /dev/sdb1

Now ASM can identify the disks as /dev/asm-disk*, and these symlinks will persist across host reboots.

Hope this helps !!

References:

https://www.linux.com/news/udev-introduction-device-management-modern-linux-system
http://www.reactivated.net/writing_udev_rules.html

Oracle 12c Learning Series: Automatic Data Optimization – ADO

I am starting an Oracle 12c learning series, where I plan to publish multiple articles on Oracle 12c new features. I hope these articles will be helpful to DBAs aspiring to OCP 12c certification, as well as to others who are planning to implement these features in their databases.

This is a very long post about ADO – Automatic Data Optimization, which is one of the ILM strategies for managing aging data. So please be patient while reading this article. It provides end-to-end details about ADO implementation, command syntax etc.

Typical lifecycle of data

The lifecycle of data starts with insert statements. New data is inserted into the table. At this stage data is either kept in non-compressed form or is compressed by the DBA. Compression for such active data is optimized for DMLs (for example, OLTP table compression): the compression ratio is lower, but the impact on DMLs is minimal.
After a month, activity has subsided, although significant OLTP transactions are still carried out. At this stage, data in the formerly most active partition stays in OLTP-compressed format and a new partition is automatically created.

Two months down the line, the data is rarely modified and rarely accessed. At this stage, the partition can be moved to a low-cost storage tier and to a higher compression level, like Hybrid Columnar Compression (HCC).

After 1 year, the data is considered dormant and is no longer accessed or updated. At this stage, it can be moved to a low-cost storage tier with the highest level of HCC compression and may also be marked as read-only.

How Automatic Data Optimization (ADO) works

Automatic Data Optimization (ADO) is an ILM strategy that uses activity tracking, via a heat map, to determine whether data is still being accessed or not.

The heat map provides the ability to track and mark data as it goes through lifecycle changes. It can track data at the following levels:

  • Data accesses at segment level
  • Data modification at segment level and block level

The collected heat map statistics are stored in the SYSAUX tablespace. They provide a “heat map” of hot and cold data based on how frequently the data is accessed or modified.

ADO allows you to create policies that use heat map statistics to compress and move data when necessary. ADO automatically evaluates and executes the policies that perform compression and storage tiering actions.

The following are the outline steps to implement and use ADO policies:

1)    Enable heat map at the database level. The heat map tracks data access at the segment level and data modification at the segment and row level.

2)    Create ADO policies.

You can create ADO policies at different scopes/level

a)    Tablespace level

b)    Group level

c)     Segment level

d)    Row level

You can also create ADO policies based on different operations to track.

a)    Creation

b)    Access

c)     Modification

You can also provide condition when this policy takes effect

a)    Based on time period. Ex: After 3 days, After 1 week, After 1 year

b)    Tablespace usage above threshold

You can also specify different kind of action for ADO policy

a)    Compression

b)    Moving to different storage tier

c)     Both compression + move

3)    Evaluate and execute ADO policies. By default, segment-level ADO policies are evaluated every day during the maintenance window. The DBA can also create a custom schedule for evaluating ADO policies. Row-level policies are evaluated by MMON every 15 minutes.

4)    Verify the ADO execution details using the DBA_ILMEVALUATIONDETAILS and DBA_ILMRESULTS views.

5)    Verify that the segment was compressed or moved according to the action in the executed policy.

We will now go through all the steps above in detail and see how we can enable ADO policies per our business requirements.

Step 1) Enabling Heat Map

Enabling and disabling heat map

The very first step before you can start using ADO is to enable activity tracking, which you do by setting the new initialization parameter HEAT_MAP:

SQL> alter system set heat_map = on;

System altered.

The above parameter enables activity tracking for the entire instance. Enabling heat map enables tracking of both DML and access at the segment level and stores these details in the relevant tables in the SYSAUX tablespace. It does not track any access details for segments in the SYSTEM and SYSAUX tablespaces.

*** An important thing to note here is that heat map does not work with CDB databases; it works only for non-CDB databases. So you cannot use this feature with a CDB.

You can also enable heat map at the session level using the following command:

SQL> alter session set heat_map = on;

Session altered.

You can turn off heat map by setting the parameter to off as below

SQL> alter system set heat_map = off;

System altered.

Checking heat map details

Once you enable heat map, you can check various details of the activity tracking it provides. You can check when segments were recently accessed, whether the access was by ROWID or full scan, and when the most recent DML was performed on the segment.

V$HEAT_MAP_SEGMENT view

Tracking details of segments are available in real time in the V$HEAT_MAP_SEGMENT view. As soon as a session accesses an object in the database, its heat map information is available immediately in this view.

For example, I just accessed one of the demo tables (SALES) in my test database after enabling heat map, and I can see the following info:


SQL> select object_name, SUBOBJECT_NAME, TRACK_TIME, SEGMENT_WRITE, SEGMENT_READ, FULL_SCAN, LOOKUP_SCAN, con_id from v$heat_map_segment order by 2;

OBJECT_NAME          SUBOBJECT_ TRACK_TIM SEG SEG FUL LOO     CON_ID
-------------------- ---------- --------- --- --- --- --- ----------
SALES                P10        14-JUN-16 NO  NO  YES NO           0
SALES                P11        14-JUN-16 NO  NO  YES NO           0
SALES                P12        14-JUN-16 YES NO  YES NO           0
SALES                P2         14-JUN-16 NO  NO  YES NO           0
SALES                P3         14-JUN-16 NO  NO  YES NO           0
SALES                P4         14-JUN-16 NO  NO  YES NO           0
SALES                P5         14-JUN-16 NO  NO  YES NO           0
SALES                P6         14-JUN-16 NO  NO  YES NO           0
SALES                P7         14-JUN-16 NO  NO  YES NO           0
SALES                P8         14-JUN-16 NO  NO  YES NO           0
SALES                P9         14-JUN-16 NO  NO  YES NO           0
SALES                PM         14-JUN-16 NO  NO  YES NO           0
I_SALES_ITEM_YEAR               14-JUN-16 NO  NO  NO  YES          0
SALES_ITEM                      14-JUN-16 NO  NO  YES NO           0

14 rows selected.

If you see column FULL_SCAN is YES, it means the access was done using full scan.

If you see column SEGMENT_WRITE is YES, it means the modification was done to the table

TRACK_TIME is the timestamp when access or modification was done.

You can also see that the LOOKUP_SCAN column is YES for one row, where the OBJECT_NAME is an index name. So if an index is used by your select plan, its access shows LOOKUP_SCAN as YES.

SYS.HEAT_MAP_STAT$ and DBA_HEAT_MAP_SEG_HISTOGRAM

Data from V$HEAT_MAP_SEGMENT is persisted into the SYS.HEAT_MAP_STAT$ table in the SYSAUX tablespace by a DBMS_SCHEDULER job at regular intervals. This data is then available via the view DBA_HEAT_MAP_SEG_HISTOGRAM. The following shows similar data when we query DBA_HEAT_MAP_SEG_HISTOGRAM:

SQL> select object_name, subobject_name, track_time, SEGMENT_WRITE, full_scan, lookup_scan from DBA_HEAT_MAP_SEG_HISTOGRAM;

OBJECT_NAME          SUBOBJECT_ TRACK_TIM SEG FUL LOO
-------------------- ---------- --------- --- --- ---
SALES                P2         14-JUN-16 NO  YES NO
SALES                P3         14-JUN-16 NO  YES NO
SALES                P4         14-JUN-16 NO  YES NO
SALES                P5         14-JUN-16 NO  YES NO
SALES                P6         14-JUN-16 NO  YES NO
SALES                P7         14-JUN-16 NO  YES NO
SALES                P8         14-JUN-16 NO  YES NO
SALES                P9         14-JUN-16 NO  YES NO
SALES                P10        14-JUN-16 NO  YES NO
SALES                P11        14-JUN-16 NO  YES NO
SALES                P12        14-JUN-16 YES YES NO
SALES                PM         14-JUN-16 NO  YES NO
SALES_ITEM                      14-JUN-16 NO  YES NO
I_SALES_ITEM_YEAR               14-JUN-16 NO  NO  YES

14 rows selected.

DBA_HEATMAP_TOP_TABLESPACES

This view shows heat map information for the top 100 tablespaces. It reports at the tablespace level instead of the segment level. You can find the MIN, MAX and AVG read time, write time, full table scan time and lookup access time for each tablespace.
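
For example, a query along the following lines summarizes the view (the column names below are as documented for 12c; verify them with DESC before relying on them):

select tablespace_name, segment_count, min_writetime, max_writetime
from dba_heatmap_top_tablespaces
order by segment_count desc;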

DBMS_HEAT_MAP package

The DBMS_HEAT_MAP package can be used to get heat map statistics at an even deeper level. The views above only expose heat map statistics at the segment level; if you want to go deeper, for example to check which blocks were accessed and when, you can use the subprograms in the DBMS_HEAT_MAP package.

Following is a summary of the procedures and functions in DBMS_HEAT_MAP package:

  • BLOCK_HEAT_MAP function Returns the last modification time for each block in a table segment
  • EXTENT_HEAT_MAP function Returns the extent-level Heat Map statistics for a table segment
  • OBJECT_HEAT_MAP procedure Shows the minimum, maximum, and average access times for all segments belonging to an object
  • SEGMENT_HEAT_MAP procedure Shows the Heat Map attributes for a segment
  • TABLESPACE_HEAT_MAP procedure Shows minimum, maximum, and average access times for all segments in a tablespace

For example, we can use DBMS_HEAT_MAP.BLOCK_HEAT_MAP to get heat map statistics for the blocks of a segment.

Similarly, the DBMS_HEAT_MAP.EXTENT_HEAT_MAP function provides heat map statistics at the extent level.
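
Both are pipelined table functions, so they can be queried directly. A sketch using the demo schema from later in this post (for a partitioned table like SALES you may need to pass a partition name as well):

select * from table(dbms_heat_map.block_heat_map('DEMO_USER','SALES'));
select * from table(dbms_heat_map.extent_heat_map('DEMO_USER','SALES'));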

Step 2) Enabling ADO policies

Let’s move to the 2nd step of implementing ILM. Now that we have enabled heat map in step 1, we need to create policies at various levels to manage our data. The policies we create use the heat map statistics and take the required action that we define in the policy. There are 4 aspects to creating an ADO policy.

Levels at which you can create ADO policies

  • Tablespace level – A default ADO policy can be defined at the tablespace level; it applies to all segments existing in that tablespace as well as to new segments created in it.
  • Group level – When we define a group-level ADO policy, if a table is eligible for the ADO action defined in the policy, the same action is performed on all dependent objects. If the action is compression and the table has LOB columns, all SecureFiles LOBs are compressed as well.
  • Segment level – Applies to tables and table partitions. If no policy is defined at the segment level and there is a default policy at the tablespace level, the tablespace-level policy applies to the segment; a segment-level policy overrides the tablespace-level policy.
  • Row level – Can be created only for tracking modifications. We cannot have a row-level policy for the creation and access operations.

Operations to track

  • Creation – Action will be taken on specified time condition after creation of data
  • Access – Action will be taken on specified time condition after access of data
  • Modification – Action will be taken on specified time condition after modification of data

Condition when this policy takes effect

  • Based on time period. Ex: After 3 days, After 1 week, After 1 year
  • Tablespace usage above threshold

Action to take for ADO policy

  • Compression
  • Data movement – moving to a different storage tier
  • Both compression + move

You can create ILM policies in the CREATE TABLE statement, or add them to existing tables using ALTER TABLE statements. A segment can have multiple ILM policies defined on it.
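
For instance, a policy can be attached right in the DDL. A sketch with an illustrative table name:

create table orders
( order_id   number,
  order_date date,
  status     varchar2(10)
)
ilm add policy row store compress advanced
segment after 90 days of no modification;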

Before we look at different examples of creating ILM policies, let’s take a look at the various compression levels available. Data can be compressed while it's being inserted, updated, or bulk loaded into a table.

We have the following 4 compression levels available:

  • ROW STORE COMPRESS BASIC

This is the basic level of compression. It compresses data inserted via direct-path operations and does not require the Advanced Compression Option (ACO).

  • ROW STORE COMPRESS ADVANCED

This is renamed syntax for the previous OLTP table compression feature that was part of ACO. ROW STORE COMPRESS ADVANCED on a heap table maps to LOW compression for SecureFiles LOB segments when a GROUP-level ADO policy is implemented.

  • COLUMN STORE COMPRESS FOR QUERY LOW or HIGH

This provides Hybrid Columnar Compression (HCC) at a higher level than ROW STORE COMPRESS. It works well when load performance is critical, frequent queries are run against the data, and no DMLs are expected. COLUMN STORE COMPRESS FOR QUERY LOW or HIGH maps to MEDIUM compression for SecureFiles LOB segments when a GROUP-level ADO policy is implemented.

  • COLUMN STORE COMPRESS FOR ARCHIVE LOW or HIGH

This provides Hybrid Columnar Compression (HCC) at the highest level. It works well when data is accessed very infrequently and no DMLs are run against it. COLUMN STORE COMPRESS FOR ARCHIVE LOW or HIGH maps to HIGH compression for SecureFiles LOB segments when a GROUP-level ADO policy is implemented.

Let’s look at creating ADO policies for the compression action and the data movement action.

ADO policies for Compression action

Following are the different examples of creating ADO compression policies

Tablespace level compression policy

ALTER TABLESPACE USERS DEFAULT ILM ADD POLICY
ROW STORE COMPRESS ADVANCED
SEGMENT AFTER 30 DAYS OF NO ACCESS

In the above command we can see the various sections required in a policy. The ALTER TABLESPACE clause tells us this is the default policy at the tablespace level, applicable to the segments in that tablespace, as indicated by the SEGMENT keyword.

  • ILM ADD POLICY – Required to add a policy to a tablespace or segment.
  • ROW STORE COMPRESS ADVANCED – The compression level to apply, as this is a compression-based ILM policy.
  • SEGMENT – Indicates that this policy applies to all segments in the USERS tablespace.
  • AFTER 30 DAYS OF – The condition that makes this policy eligible to take the compression action.
  • NO ACCESS – The operation to track. We have 3 different operation types, as defined above: access, modification, creation.

So if we alter tablespace USERS with the above command and add the ILM policy, then any segment in the tablespace (which has no other ILM policy set) that is not accessed for more than 30 days will be compressed with ROW STORE COMPRESS ADVANCED.

Group level compression policy

ALTER TABLE SALES ILM ADD POLICY
ROW STORE COMPRESS ADVANCED
GROUP AFTER 60 DAYS OF NO MODIFICATION

In the above example, we are implementing a GROUP-level ILM policy. This policy automatically compresses the SALES table if no modification is done for 60 days. Since this is a GROUP-level policy, dependent LOBs are compressed with LOW compression after 60 days of no modification.

ALTER TABLE SALES MODIFY PARTITION P1 ILM ADD POLICY
COLUMN STORE COMPRESS FOR ARCHIVE HIGH
GROUP AFTER 3 MONTHS OF CREATION

In this example, we are using a GROUP-level compression policy on a partition of a table. This policy automatically compresses partition P1 after 3 months of creation, and the corresponding LOBs are compressed HIGH. This is the highest level of HCC compression. Global indexes are maintained.

Segment level compression policy

ALTER TABLE SALES ILM ADD POLICY
COLUMN STORE COMPRESS FOR QUERY HIGH
SEGMENT AFTER 90 DAYS OF NO MODIFICATION

In this example, we are creating a segment-level compression policy on the SALES table. This policy automatically compresses the SALES table with COLUMN STORE COMPRESS FOR QUERY HIGH if it is not modified for 90 days.

ALTER TABLE SALES ILM ADD POLICY
ROW STORE COMPRESS
SEGMENT AFTER 1 YEAR OF CREATION

This example implements the basic level of compression (ROW STORE COMPRESS [BASIC]) on the SALES table: it will be automatically compressed 1 year after its creation.

Row level compression policy

ALTER TABLE SALES ILM ADD POLICY
ROW STORE COMPRESS ADVANCED
ROW AFTER 30 DAYS OF NO MODIFICATION

The above policy compresses any block of the SALES table in which no row has been modified for the last 30 days. If even one row in a block was modified, the block does not qualify for compression.

An important thing to note here is that row-level compression policies can be created based on modification time only; they cannot be based on creation or access time. Also, the only compression type available for a row-level policy is ROW STORE COMPRESS ADVANCED/BASIC. We cannot use columnar compression for a row-level policy.

ADO policies for Data movement action

This is another type of ADO policy, which can be set at the segment level only; we cannot set it at the tablespace level, group level or any other level. Also, data movement happens at the tablespace level, meaning we move the table or partition to another tablespace, typically on low-cost storage.

A data movement policy takes the following format:

ALTER TABLE <TABLE_NAME> ILM ADD POLICY
TIER TO <TABLESPACE_NAME>
[ CUSTOM ILM RULE | SEGMENT <CONDITION> <OPERATION> ]

TIER TO is a new clause in which we specify the tablespace the segment should move to once it satisfies the condition set. Instead of providing a condition (e.g. AFTER 30 DAYS, AFTER 1 YEAR) and an operation (e.g. CREATION, NO MODIFICATION, NO ACCESS), we can also provide a custom ILM rule in the last section of the command.

Let’s take an example
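
ALTER TABLE SALES ILM ADD POLICY
TIER TO low_cost_storage
SEGMENT AFTER 60 DAYS OF CREATION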

In the above example, we replaced the compression clause with the TIER TO clause.

TIER TO – This clause names the tablespace the segment should move to once it satisfies the condition. In this example, the ILM policy automatically moves the SALES table to the low_cost_storage tablespace after 60 days of creation.

If we don't provide the last part, the condition and operation, then the policy is created such that the data movement action takes place when the tablespace runs out of space. The space thresholds for tablespaces are already set, and we can see their values in the DBA_ILMPARAMETERS view.

The following are the tiering fullness thresholds for a tablespace:

  • TBS PERCENT USED (default 85): Objects with a tiering policy are moved out of the tablespace once it is TBS PERCENT USED percent full (by default, 85% full).
  • TBS PERCENT FREE (default 25): Objects with a tiering policy continue to move out until the source tablespace has TBS PERCENT FREE percent of free space (by default, 25% free).

You can see the current values by querying the DBA_ILMPARAMETERS view, and you can customize these parameters using the DBMS_ILM_ADMIN package.


SQL> select * from dba_ilmparameters;

NAME                      VALUE
-------------------- ----------
ENABLED                       1
RETENTION TIME               30
JOB LIMIT                     2
EXECUTION MODE                2
EXECUTION INTERVAL            1
TBS PERCENT USED             90
TBS PERCENT FREE             15
POLICY TIME                   0

8 rows selected.

For example, if we want to change TBS PERCENT USED from the existing 85% to 90%, we can use the following procedure:

SQL> exec dbms_ilm_admin.customize_ilm(dbms_ilm_admin.tbs_percent_used,90);

PL/SQL procedure successfully completed.

We will check out other procedures and functions in the DBMS_ILM_ADMIN package in a later section of this chapter.

So if we set the following policy, the SALES table will be moved out of its current tablespace to the low_cost_storage tablespace when its tablespace reaches 90% fullness:

ALTER TABLE SALES ILM ADD POLICY
TIER TO low_cost_storage

Also, if we set this policy on many tables in the same tablespace and the tablespace reaches the fullness threshold, the tables are moved to the other tablespace in order of oldest-accessed first.

The last thing to know about data movement policies is that they execute only once in their lifetime. Since they apply only at the segment level, once the policy has executed on a segment and the movement has taken place, the policy is disabled for that segment. This rule applies only to data movement policies, not to compression policies.

CUSTOM_ILM_RULES

Instead of using the condition and operation from the simplified syntax provided by Oracle, you can implement a more complex algorithm for deciding when the ILM policy's action should take place, by specifying a custom ILM rule. The syntax for a custom ILM rule looks like this:

ALTER TABLE SALES ILM ADD POLICY
<Compression clause> | <Data movement clause>
ON <CUSTOM_ILM_RULE>

A custom ILM rule is a function that returns a Boolean. You can implement complex logic inside the function and return the Boolean after evaluating that logic. If the return value is true, the ILM policy takes its action; if false, it does not. A custom ILM rule can be used for both compression and data movement policies.
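
A minimal sketch of such a rule; the function name and logic are illustrative, and the signature (an object number in, a Boolean out) should be verified against your release's documentation:

create or replace function my_archival_rule (objn in number)
return boolean is
begin
  -- illustrative logic: allow the ILM action only after the 5th of the month
  return to_char(sysdate, 'DD') > '05';
end;
/

alter table sales ilm add policy
row store compress advanced segment
on my_archival_rule;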

Implementing multiple policies on segment

A segment can have multiple ADO policies implemented on it. This logically makes sense, as we want different actions taken on data as it ages. For example, we might go for a basic level of compression one week after data insertion, more advanced compression after a month of no modification, and the highest level of compression or movement to low cost storage after a month of no access or a year after creation. To satisfy all these timelines and criteria, it's not unusual to have multiple ILM policies on a segment. But there are rules to follow.

Rule 1) Policies should be based on the same operation or statistic (creation, modification or access)

If you create a segment level policy on a table based on ACCESS, make sure that the other policies are also created based on the ACCESS operation only.

Example:

I create a policy on the SALES table based on ACCESS

SQL> alter table demo_user.sales ilm add policy row store compress advanced segment after 2 days of no access;

Table altered.

Then I create another policy on the same table based on ACCESS

SQL> alter table demo_user.sales ilm add policy column store compress for query high segment after 30 days of no access;

Table altered.

Now, if I try to create a 3rd policy based on creation or modification, I will get a conflict

SQL> alter table demo_user.sales ilm add policy column store compress for archive high segment after 60 days of creation;
alter table demo_user.sales ilm add policy column store compress for archive high segment after 60 days of creation
*
ERROR at line 1:
ORA-38323: policy conflicts with policy 21

Rule 2) Compression level should increase with time

You should not reduce the compression level as the timeline condition increases. In the following example, I tried to set column store compression after 7 days of no access and row store compression (which is a lower level of compression than column store) after 30 days of no access. This causes a conflict

SQL> alter table demo_user.sales ilm add policy column store compress for query high segment after 7 days of no access;

Table altered.

SQL> alter table demo_user.sales ilm add policy row store compress advanced segment after 30 days of no access;
alter table demo_user.sales ilm add policy row store compress advanced segment after 30 days of no access
*
ERROR at line 1:
ORA-38325: policy conflicts with policy 23

Rule 3) Only 1 row level policy allowed per segment

You can have only 1 row level policy on a table. In the following example, I tried to create 2 row level policies on the same table and it was not allowed

SQL> alter table demo_user.sales ilm add policy row store compress advanced row after 7 days of no modification;

Table altered.

SQL> alter table demo_user.sales ilm add policy row store compress advanced row after 30 days of no modification;
alter table demo_user.sales ilm add policy row store compress advanced row after 30 days of no modification
*
ERROR at line 1:
ORA-38325: policy conflicts with policy 24

Also remember that at row level you can only create a policy based on the "no modification" operation.

Rule 4) Segment level policy does not overwrite row level policy

Continuing with the above example, after creating the row level policy, we can still create a segment level policy

SQL> alter table demo_user.sales ilm add policy column store compress for query high segment after 30 days of no modification;

Table altered.

Policy priority and policy Inheritance

As we discussed before, we can create policies at tablespace level (which serve as the default for every segment created in that tablespace without its own policy), at segment level, and even at partition level. The question remains: which policy will actually take effect?

Again, this follows rules. The following simple rules help to understand this

  • A child level policy always overrides a parent level policy for the same action. By this logic, policies set at table partition level override the ones set at table level if both have the same action, and policies at table level override policies at tablespace level if both have the same action
  • If no policies are defined at child level, inheritance takes place. For example, if no ILM policies are defined at partition level, table level policies are applied at partition level. Likewise, if no ILM policies are defined at table level, tablespace level policies are inherited at table level
  • Inheritance is additive if the policy actions are different. For example, if you have a compression policy at tablespace level with a specified time condition and a data movement (tiering) policy at segment level with another condition, then the total effect on the segment is the sum of both policies – i.e. compression + data movement

SQL> select policy_name, object_name, subobject_name, object_type, inherited_from from dba_ilmobjects;

POLICY_NAM OBJECT_NAME          SUBOBJECT_ OBJECT_TYPE                    INHERITED_FROM
---------- -------------------- ---------- ------------------------------ --------------------
P25        SALES                P1         TABLE PARTITION                TABLE
P25        SALES                P10        TABLE PARTITION                TABLE
P25        SALES                P11        TABLE PARTITION                TABLE
P25        SALES                P12        TABLE PARTITION                TABLE
P25        SALES                P2         TABLE PARTITION                TABLE
P25        SALES                P3         TABLE PARTITION                TABLE
P25        SALES                P4         TABLE PARTITION                TABLE
P25        SALES                P5         TABLE PARTITION                TABLE
P25        SALES                P6         TABLE PARTITION                TABLE
P25        SALES                P7         TABLE PARTITION                TABLE
P25        SALES                P8         TABLE PARTITION                TABLE
P25        SALES                P9         TABLE PARTITION                TABLE
P25        SALES                PM         TABLE PARTITION                TABLE
P25        SALES                           TABLE                          POLICY NOT INHERITED
P41        SALES_ITEM                      TABLE                          TABLESPACE

15 rows selected.

We can check the different policies implemented on different objects using the DBA_ILMOBJECTS view.

In the above example, policy P41 is applied to the SALES_ITEM table because the table exists in the USERS tablespace and has no ILM policy defined on it. Since we have a default ILM policy at tablespace level on USERS, SALES_ITEM has inherited this policy from the tablespace; in the INHERITED_FROM column, you can see the value TABLESPACE.

Similarly, policy P25 is applied to a table and all its partitions. We created the policy at table level only and not at partition level, so all partitions of the table inherit the table level policy.

For object type TABLE and policy P25, we see "POLICY NOT INHERITED" in the INHERITED_FROM column. This is because we created a new policy at table level and it is not inherited from any level.

Checking ILM policy details

We have a couple of views that provide details about the ILM policies that exist at various levels in the database.

DBA_ILMPOLICIES

You can view all the policies configured in the database in this view.


SQL> select * from dba_ilmpolicies;

POLICY_NAM POLICY_TYPE   TABLESPACE ENABLE DELETED
---------- ------------- ---------- ------ -------
P24        DATA MOVEMENT            YES    NO
P25        DATA MOVEMENT            YES    NO
P41        DATA MOVEMENT USERS      YES    NO

Even compression policies are represented as data movement policies in this view. This is because data movement happens during compression.

DBA_ILMDATAMOVEMENTPOLICIES


SQL> select policy_name, action_type, compression_level, condition_type, condition_days from DBA_ILMDATAMOVEMENTPOLICIES;

POLICY_NAM ACTION_TYPE COMPRESSION_LEV CONDITION_TYPE         CONDITION_DAYS
---------- ----------- --------------- ---------------------- --------------
P41        COMPRESSION ADVANCED        LAST MODIFICATION TIME             60
P24        COMPRESSION ADVANCED        LAST MODIFICATION TIME              7
P25        COMPRESSION QUERY HIGH      LAST MODIFICATION TIME             30

Step 3) Evaluating and Executing ADO policies

We have 2 ways of evaluating and executing ADO policies – scheduled in the maintenance window, or evaluated and executed manually. Let's look at both approaches.

Scheduled in maintenance window

Row level policies are evaluated and executed every 15 mins by the MMON background process. Segment level policies are evaluated and executed once daily during the maintenance window using a scheduler job. This scheduler job uses the DBMS_ILM.EXECUTE_ILM_TASK procedure to evaluate and execute ADO policies.

** Note that the name of the package is DBMS_ILM, as against DBMS_ILM_ADMIN, which is used for customizing ILM parameters and other admin related tasks. We will look into the DBMS_ILM_ADMIN package in a short while.

In the DBMS_ILM.EXECUTE_ILM_TASK procedure, TASK_ID is an input parameter which is automatically generated during the evaluation process, which also happens during the scheduled maintenance window.

You can change the interval of evaluation and execution using the DBMS_ILM_ADMIN.CUSTOMIZE_ILM procedure. We saw this procedure previously when changing the TBS PERCENT FREE and TBS PERCENT USED parameters. We can use the same procedure to customize the execution interval of ILM tasks.

SQL> exec dbms_ilm_admin.customize_ilm(dbms_ilm_admin.execution_interval, 1);

PL/SQL procedure successfully completed.

Manually executing ADO policies

For manually executing ADO policies, we have 2 options

  1. Directly execute ADO policies without doing any customization
  2. Preview the evaluation of ADO policies, make any desired changes to the task and then execute the task

You can directly execute ADO policies without doing any customization using the DBMS_ILM.EXECUTE_ILM procedure. Note that this procedure is different from the one that runs in the scheduled maintenance window (DBMS_ILM.EXECUTE_ILM_TASK). There are multiple overloaded definitions of this procedure; in the simplest form, you can just pass the owner and table name and all policies on that table will be executed, as sketched below.
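A minimal sketch of such a direct execution, assuming the documented overload that takes the owner and object name and returns the generated task id:

var v_task_id number;
exec DBMS_ILM.EXECUTE_ILM(OWNER => 'DEMO_USER', OBJECT_NAME => 'SALES', TASK_ID => :v_task_id);
print v_task_id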

You can also preview the evaluation result before you execute the task. You can follow the steps below to customize ILM tasks

  • Evaluate a set of ADO policies and preview the result; check which tables/policies will be executed
  • Add/remove objects/subobjects from the evaluated ILM task
  • Execute the final customized ILM task

You can use the following procedures, in the same order, to perform the above steps of customizing ILM tasks

  • Execute the DBMS_ILM.PREVIEW_ILM procedure to preview ILM tasks for the database or a specific schema
  • Execute DBMS_ILM.ADD_TO_ILM or DBMS_ILM.REMOVE_FROM_ILM to add or remove objects/subobjects from the evaluated task
  • Execute DBMS_ILM.EXECUTE_ILM_TASK to execute the above customized ILM task

Example:

var v_task_id number;
exec DBMS_ILM.PREVIEW_ILM(TASK_ID=> :v_task_id, ILM_SCOPE=>DBMS_ILM.SCOPE_SCHEMA);

Let's say the above command generated task_id 1234.

exec DBMS_ILM.ADD_TO_ILM(TASK_ID=>1234, OWN=>'DEMO_USER', OBJ_NAME=>'NEW_TABLE');

exec DBMS_ILM.EXECUTE_ILM_TASK(TASK_ID=>1234);

You can also pass the following 2 parameters to DBMS_ILM.EXECUTE_ILM_TASK

EXECUTION_MODE: ILM_EXECUTION_ONLINE value executes the task online and ILM_EXECUTION_OFFLINE executes the task offline.

EXECUTION_SCHEDULE: The only possible value for this parameter is DBMS_ILM.SCHEDULE_IMMEDIATE
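
A sketch combining both parameters (task id 1234 is the hypothetical id from the preview example above):

exec DBMS_ILM.EXECUTE_ILM_TASK(TASK_ID => 1234, EXECUTION_MODE => DBMS_ILM.ILM_EXECUTION_OFFLINE, EXECUTION_SCHEDULE => DBMS_ILM.SCHEDULE_IMMEDIATE);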

Step 4) Checking execution details of ADO policies

You can check the details of ADO policy evaluation and execution using the following 3 views

  • DBA_ILMEVALUATIONDETAILS
  • DBA_ILMRESULTS
  • DBA_ILMTASKS

DBA_ILMTASKS

Every task that is executed manually, or automatically as scheduled, is recorded in the DBA_ILMTASKS view. It stores the creation time, start time and completion time of each task, and whether the task is completed, failed or active.


SQL> select TASK_ID, STATE, START_TIME, COMPLETION_TIME from DBA_ILMTASKS order by task_id;

   TASK_ID STATE     START_TIME                     COMPLETION_TIME
---------- --------- ------------------------------ ------------------------------
         1 COMPLETED 15-JUN-16 10.19.24.976731 PM   15-JUN-16 10.19.24.976731 PM
         2 COMPLETED 15-JUN-16 10.34.27.587548 PM   15-JUN-16 10.34.27.587548 PM
         3 COMPLETED 15-JUN-16 10.49.30.158279 PM   15-JUN-16 10.49.30.158279 PM
         4 COMPLETED 15-JUN-16 11.04.32.861197 PM   15-JUN-16 11.04.32.861197 PM
         5 COMPLETED 15-JUN-16 11.19.35.192715 PM   15-JUN-16 11.19.35.192715 PM

DBA_ILMEVALUATIONDETAILS

This view can be used to see details of task execution after the evaluation of the ADO policies is completed. If the SELECTED_FOR_EXECUTION column in this view has the value "selected for execution", then the policy has indeed been selected for execution and an ADO job will be executed to satisfy the ILM policy. However, a job will not execute if there is any other value in the SELECTED_FOR_EXECUTION column. A sample query is sketched below.
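
A sketch of such a query (column names as per the 12c data dictionary; task id 1234 is a hypothetical id):

SQL> select task_id, policy_name, object_name, selected_for_execution
     from dba_ilmevaluationdetails
     where task_id = 1234;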

The following values all mean that the ADO policy hasn’t successfully passed the evaluation:

  • Precondition not satisfied
  • Policy disabled
  • Policy overruled
  • Inherited policy overruled
  • Job already exists
  • No operation since last ILM action
  • Target compression not higher than current
  • Statistics not available

DBA_ILMRESULTS

Provides details about completed ILM tasks. It has a STATISTICS column, which is a CLOB providing job-specific statistics, such as space saved via compression.
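
A sketch of a query against this view (assuming the TASK_ID, JOB_NAME, JOB_STATE and STATISTICS columns of the 12c data dictionary):

SQL> select task_id, job_name, job_state, statistics
     from dba_ilmresults
     order by task_id;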

Enabling and disabling ADO policies

You can enable or disable ADO policies at the individual table level, or disable ILM completely at the database level.

Enabling/disabling ADO policies

You can enable/disable an individual ADO policy on a table

SQL> alter table demo_user.sales ilm disable policy P25;

Table altered.

SQL> alter table demo_user.sales ilm enable policy P25;

Table altered.

You can also enable/disable all policies on a table

SQL> alter table demo_user.sales ilm disable_all;

Table altered.

SQL> alter table demo_user.sales ilm enable_all;

Table altered.

Deleting ADO policies

You can delete an individual policy on a table using the policy name

SQL> alter table demo_user.sales ilm delete policy P24;

Table altered.

If you want to delete all policies on a table, you can use delete_all

SQL> alter table demo_user.sales ilm delete_all;

Table altered.

Enable/disable ILM

You can disable ILM completely in the database using the DBMS_ILM_ADMIN.DISABLE_ILM procedure. This disables ILM but keeps all policies intact; those policies will not be executed until we enable ILM again.

SQL> exec DBMS_ILM_ADMIN.DISABLE_ILM;

PL/SQL procedure successfully completed.

SQL> exec DBMS_ILM_ADMIN.ENABLE_ILM;

PL/SQL procedure successfully completed.

Hope this helps !!

Previewing Backup Restore

Introduction:

This is a short article on RMAN showing how we can check whether our backups are really intact and will help us in a critical situation when we have to restore and recover the database.

We have a command option called PREVIEW which we can use with RESTORE DATABASE. This option does not actually restore the datafiles from backup; it just reports the backups that will be used, the SCN to which they can restore, and the SCN beyond which we must recover our database.

Let's take a simple example where I have a complete database backup taken for my database, including archivelogs.

I will run the RESTORE DATABASE PREVIEW command to check whether I can get my database back in case of any issue. Then I will wipe out my database and try to recover from the backup to validate that the PREVIEW option was showing correct information.

Validating Backupsets

We can validate that our backupsets are intact and do not have any corruption. We can check the backupsets for both physical and logical corruption and make sure they are good and can be used for restore.

The following RMAN commands will confirm the same

Physical corruption validation



RMAN> backup validate database archivelog all;

Starting backup at 08-FEB-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=272 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=13 device type=DISK
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
...
...
...

Logical corruption validation


RMAN> backup validate check logical database archivelog all;

Starting backup at 08-FEB-16
using channel ORA_DISK_1
using channel ORA_DISK_2
channel ORA_DISK_1: starting compressed full datafile backup set
channel ORA_DISK_1: specifying datafile(s) in backup set
input datafile file number=00008 name=/u01/app/oracle/oradata/orcl/datafiles/users02.dbf
...
...
...

If the above commands don't report any issues, we can be sure that the backupsets are intact and can be used for restore/recovery.

Checking PREVIEW

We can use the "RESTORE DATABASE PREVIEW" command in RMAN to get a preview of all the backup sets that will be used and the SCN up to which these backups will be able to restore.

This command also gives us the SCN beyond which we need to recover the DB to get all datafiles out of fuzzy status.


RMAN> restore database preview;

Starting restore at 08-FEB-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=58 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=264 device type=DISK
List of Backup Sets
===================
BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
48 Full 209.01M DISK 00:00:43 08-FEB-16
BP Key: 48 Status: AVAILABLE Compressed: YES Tag: TAG20160208T053652
Piece Name: /u01/app/oracle/fast_recovery_area/ORCL/backupset/2016_02_08/o1_mf_nnndf_TAG20160208T053652_ccjvy4y2_.bkp
List of Datafiles in backup set 48
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
1 Full 2955419 08-FEB-16 /u01/app/oracle/oradata/orcl/datafiles/system01.dbf
4 Full 2955419 08-FEB-16 /u01/app/oracle/oradata/orcl/datafiles/undotbs01.dbf
8 Full 2955419 08-FEB-16 /u01/app/oracle/oradata/orcl/datafiles/users02.dbf

BS Key Type LV Size Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
47 Full 152.79M DISK 00:00:41 08-FEB-16
BP Key: 47 Status: AVAILABLE Compressed: YES Tag: TAG20160208T053652
Piece Name: /u01/app/oracle/fast_recovery_area/ORCL/backupset/2016_02_08/o1_mf_nnndf_TAG20160208T053652_ccjvy4yx_.bkp
List of Datafiles in backup set 47
File LV Type Ckp SCN Ckp Time Name
---- -- ---- ---------- --------- ----
3 Full 2955421 08-FEB-16 /u01/app/oracle/oradata/orcl/datafiles/sysaux01.dbf
6 Full 2955421 08-FEB-16 /u01/app/oracle/oradata/orcl/datafiles/users01.dbf
9 Full 2955421 08-FEB-16 /u01/app/oracle/oradata/orcl/datafiles/users03.dbf

...
...
...
List of Backup Sets
===================
BS Key Size Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
54 3.97M DISK 00:00:01 08-FEB-16
BP Key: 54 Status: AVAILABLE Compressed: YES Tag: TAG20160208T055120
Piece Name: /u01/app/oracle/fast_recovery_area/ORCL/backupset/2016_02_08/o1_mf_annnn_TAG20160208T055120_ccjws8m9_.bkp

List of Archived Logs in backup set 54
Thrd Seq Low SCN Low Time Next SCN Next Time
---- ------- ---------- --------- ---------- ---------
1 43 2955401 08-FEB-16 2956061 08-FEB-16

BS Key Size Device Type Elapsed Time Completion Time
------- ---------- ----------- ------------ ---------------
55 80.50K DISK 00:00:00 08-FEB-16
BP Key: 55 Status: AVAILABLE Compressed: YES Tag: TAG20160208T055120
Piece Name: /u01/app/oracle/fast_recovery_area/ORCL/backupset/2016_02_08/o1_mf_annnn_TAG20160208T055120_ccjws9xm_.bkp

List of Archived Logs in backup set 55
Thrd Seq Low SCN Low Time Next SCN Next Time
---- ------- ---------- --------- ---------- ---------
1 44 2956061 08-FEB-16 2956142 08-FEB-16
1 45 2956142 08-FEB-16 2956308 08-FEB-16
RMAN-05119: recovery can not be done to a consistent state.
Media recovery start SCN is 2955419
Recovery must be done beyond SCN 2955603 to clear datafile fuzziness
Finished restore at 08-FEB-16

If you check the last 4 lines, they show the SCN to which RMAN can restore the datafiles – SCN 2955419 – and that recovery must go beyond SCN 2955603 to clear datafile fuzziness.

From SCN 2955419 we need to start applying archive logs. We can identify the log sequence in which this SCN falls using the SQL below


SQL> select sequence#, first_change#, next_change# from v$archived_log
  2  where FIRST_CHANGE# <= 2955419 and NEXT_CHANGE# >= 2955419;

SEQUENCE#  FIRST_CHANGE# NEXT_CHANGE#
---------- ------------- ------------
43         2955401       2956061

So we will need archives from sequence# 43 and we need to recover beyond SCN 2955603.
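
If the full PREVIEW listing is too verbose, RMAN also accepts a SUMMARY modifier that prints a condensed listing of the same backup sets (the recovery SCN lines at the end are unchanged):

RMAN> restore database preview summary;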

Trying restore/recover

Let's say I have a few (or all) datafiles missing for the database. I tried to bounce the DB and got the following error


SQL> startup
ORACLE instance started.

Total System Global Area 943718400 bytes
Fixed Size 2931136 bytes
Variable Size 348128832 bytes
Database Buffers 587202560 bytes
Redo Buffers 5455872 bytes
Database mounted.
ORA-01157: cannot identify/lock data file 1 - see DBWR trace file
ORA-01110: data file 1: '/u01/app/oracle/oradata/orcl/datafiles/system01.dbf'

Let's try to restore


[oracle@advait ~]$ rman target /

Recovery Manager: Release 12.1.0.2.0 - Production on Mon Feb 8 06:56:26 2016

Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.

connected to target database: ORCL (DBID=1429382412, not open)

RMAN> restore database;

Starting restore at 08-FEB-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=248 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=13 device type=DISK

skipping datafile 5; already restored to file /u01/app/oracle/oradata/orcl/pdbseed/system01.dbf
skipping datafile 7; already restored to file /u01/app/oracle/oradata/orcl/pdbseed/sysaux01.dbf
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00003 to /u01/app/oracle/oradata/orcl/datafiles/sysaux01.dbf
channel ORA_DISK_1: restoring datafile 00006 to /u01/app/oracle/oradata/orcl/datafiles/users01.dbf
channel ORA_DISK_1: restoring datafile 00009 to /u01/app/oracle/oradata/orcl/datafiles/users03.dbf
...
...
...
channel ORA_DISK_2: restored backup piece 1
channel ORA_DISK_2: restore complete, elapsed time: 00:01:01
channel ORA_DISK_1: piece handle=/u01/app/oracle/fast_recovery_area/ORCL/2B0D04DCEFAF55F5E0531438A8C0EDC4/backupset/2016_02_08/o1_mf_nnndf_TAG20160208T053652_ccjvzlmt_.bkp tag=TAG20160208T053652
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:00:35
Finished restore at 08-FEB-16

RMAN>

Once the restore completes, let's try to recover


SQL> recover database;
ORA-00279: change 2955419 generated at 02/08/2016 05:36:52 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/fast_recovery_area/ORCL/archivelog/2016_02_08/o1_mf_1_43_%u_.arc
ORA-00280: change 2955419 for thread 1 is in sequence #43
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

As expected, it's asking for recovery starting from sequence# 43.

We can restore the required archivelogs as well


RMAN> restore archivelog from sequence 43;

Starting restore at 08-FEB-16
using channel ORA_DISK_1
using channel ORA_DISK_2

channel ORA_DISK_1: starting archived log restore to default destination
channel ORA_DISK_1: restoring archived log
archived log thread=1 sequence=43
channel ORA_DISK_1: reading from backup piece /u01/app/oracle/fast_recovery_area/ORCL/backupset/2016_02_08/o1_mf_annnn_TAG20160208T055120_ccjws8m9_.bkp
...
...
...
channel ORA_DISK_2: restored backup piece 1
channel ORA_DISK_2: restore complete, elapsed time: 00:00:01
Finished restore at 08-FEB-16

RMAN>

Once the archives are restored, we can carry on with the recovery


SQL> recover database
ORA-00279: change 2955419 generated at 02/08/2016 05:36:52 needed for thread 1
ORA-00289: suggestion :
/u01/app/oracle/fast_recovery_area/ORCL/archivelog/2016_02_08/o1_mf_1_43_cck2fwv
k_.arc
ORA-00280: change 2955419 for thread 1 is in sequence #43
Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
/u01/app/oracle/fast_recovery_area/ORCL/archivelog/2016_02_08/o1_mf_1_43_cck2fwvk_.arc
Log applied.
Media recovery complete.
SQL> alter database open;

Database altered.

SQL>

In this case it only took sequence# 43, as the NEXT_CHANGE# for sequence# 43 was 2956061, which is beyond the SCN 2955603 required to clear the fuzzy state. The rest of the redo information was present in the online redo logs, so the database did crash recovery while opening.

Hope this helps !!

Library Cache: Mutex X – Bug 20879889 – Fixed in 11.2.0.4

I recently encountered a bug related to materialized view logs (MLOGs) causing very high "library cache: mutex X" wait events.

I will briefly describe the debugging steps I tried and the fix for the same.

A few things to note before I proceed:

  • We observed huge numbers of library cache: mutex X wait events whenever we flipped to standby or bounced the DB – that is, whenever the library cache was cold and didn't have the required cursor information and object handles.
  • Load on the database was very high. This includes a very high number of sessions connected to the database and a high number of DMLs performed on the same table by many sessions.
  • The table on which the DMLs are performed has an MLOG created on it based on the primary key. This is required as the changes need to flow to downstream databases via fast-refresh MViews. Around 10 downstream sites were registered on this MLOG.

Root Cause Analysis:

After looking at the wait events, we immediately started digging into the root cause. We wanted to understand the bottleneck causing these wait events and remove whatever it took to reduce them.

I started by checking the top wait events in the last 20 mins from v$active_session_history


SQL>select * from (select event, count(1) from v$active_session_history
2 where sample_time > (sysdate - 20/1440) group by event order by 2 desc) where rownum < 10;

EVENT                                    COUNT(1)
---------------------------------------- ----------
library cache: mutex X                        50943
checkpoint completed                          15170
read by other session                          5487
row cache lock                                 4205
log file sync                                  3137
flashback buf free by RVWR                     1815
db file sequential read                        1675
log file switch completion                     1611
cursor: pin S wait on X                        1516

9 rows selected.

Clearly library cache: mutex X was way higher.

We can check what is causing library cache: mutex X by checking P1 and P2 of that wait event


SQL>select event, p1, count(1) from v$active_session_history where sample_time > (sysdate - 20/1440) and event = 'library cache: mutex X' group by event, p1 order by 3;

EVENT                                    P1         COUNT(1)
---------------------------------------- ---------- ----------
library cache: mutex X                    421399181          1
library cache: mutex X                   3842104349          1
library cache: mutex X                   3477606468          1
library cache: mutex X                   2432877226          1
library cache: mutex X                    955484820          2
library cache: mutex X                        37005         25
library cache: mutex X                   1412465886        297
library cache: mutex X                   2417922189      50615

8 rows selected.

As you can see, the maximum number of waits is on P1 = 2417922189

P1 is the idn – it can be used to find the cursor or object related to the mutex

For library cache: mutex X

  1. P1 represents library cache hash bucket number (if idn <= 131072)
  2. P1 represents hash value of the library cache object under protection (if idn > 131072)

In our case the value was 2417922189, which is greater than 131072, so it represents the hash value of a library cache object.

We can check if this hash value belongs to a cursor (SQL) using v$sql


SQL>select sql_id from v$sql where hash_value = 2417922189;

SQL_ID
-------------
3b7aa6f81x44d

SQL>@sql
Enter SQL_ID:- 3b7aa6f81x44d
old 3: where sql_id = '&SQL_ID'
new 3: where sql_id = '3b7aa6f81x44d'

INSERT /*+ IDX(0) */ INTO "BOOKER"."MLOG$_FULFILLMENT_DEMAND_I"
(dmltype$$,old_new$$,snaptime$$,change_vector$$,xid$$,"WAREHOUSE
_ID","CO_WAREHOUSE_ITEM_ID") VALUES (:d,:o,to_date('4000-01-01:0
0:00:00','YYYY-MM-DD:HH24:MI:SS'),:c,:x,:1,:2)

As you can see, this is a DML on the MLOG table. So clearly the MLOG seems to be the bottleneck.
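
As an aside, had the hash value not matched any cursor in v$sql, non-cursor library cache objects could be checked too; a sketch (v$db_object_cache exposes HASH_VALUE in 11.2+):

SQL> select owner, name, namespace, locks, pins
     from v$db_object_cache
     where hash_value = 2417922189;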

Immediate Action:

The wait events were making everything stand still and nothing was getting processed. Since database availability was critical and we couldn't afford to lose any more time, our immediate action was to drop the MLOG.

But since the DB had thousands of sessions, all stuck on library cache: mutex X, we could not even get a lock to drop the MLOG (not even with ddl_lock_timeout). Killing sessions was not helping as they kept coming back in no time.

So we decided to take a quick bounce of the DB (keeping the listeners down), dropped the MLOG and started the listeners again.

This fixed the issue and there were no more wait events.

On the down side, all downstream sites had to do a complete refresh of their MViews followed by a fast refresh, as they got deregistered from the MLOG. But this was OK as the master table was not very big.

Patch Fix:

We also tried to investigate why an MLOG can cause this issue, but were not able to get any insight.

An MLOG is a table into which the primary keys of changed records in the main table get inserted; when a downstream site refreshes its data, these records get deleted. So we don't expect the MLOG to grow very large. Unfortunately, in this case it was 7.7GB (bigger than the master table itself).

Because the size of the MLOG was so high, whenever a downstream database did a fast refresh it had to update the SNAPTIME$$ column in the MLOG, which records which rows were refreshed by which downstream DB at what time. This update might take more time and hold enqueue locks. But inserts should not be blocked by this update, as an insert adds a new record and does not collide at the enqueue level. Also, we were seeing these library cache wait events just after a bounce or failover.

This was identified as a bug, and Oracle has provided the following patch as a fix for this issue

Patch 20879889: INSERT INTO MV LOG LEAVING TOO MANY OPEN CURSORS AFTER UPGR TO 11.2.0.4

Make sure you have this patch applied if you are using MLOGs in your database.

Hope this helps !!

Reference:

https://support.oracle.com/epmos/faces/PatchDetail?requestId=19179589&_afrLoop=14055698364071&patchId=20879889&_afrWindowMode=0&_adf.ctrl-state=xqyu7jtft_77


Mutex: What do we know ?

In my previous article on Latches, I mentioned various things related to latches including different types and their behavior.

In this article I will describe similar things about mutexes.

Mutexes are a low level serialization/locking mechanism to protect memory structures inside the library cache.

Why was there a change from latches?

Mutexes were introduced in 10.2 and have proved very efficient in managing library cache operations.

Mutexes take less memory than latches. Typically a latch structure takes around 110 bytes on a 32 bit system whereas a mutex takes only 30 bytes. There is a similar difference in the number of instructions it takes to acquire them: a latch takes around 150-200 instructions whereas a mutex takes 30-35 instructions. The down side is that mutexes give less information about waits and blockers compared to latches.

Recalling the basics of library cache architecture: it consists of hash buckets, and each bucket contains a linked list of library cache object handles. Whenever an access to a library cache object happens, the object is hashed to a bucket and the corresponding latch must be obtained to traverse the chain in that bucket. The latch is released when the corresponding object is found or not found.

But there are only 67 library cache latches available to protect the 131,072 buckets created by default in the library cache. So a single latch covers multiple buckets. This creates false contention: if 2 processes try to access 2 different buckets protected by the same latch, one of them has to wait until the other completes traversing its bucket. So even though they are not colliding on the same bucket, they still block each other because of the latch mechanism.

Further, if the required object is found in the library cache, the process needs to pin that object while it is being used. The pin protects the object so that no other process can modify it and so that it will not be discarded from memory while you are accessing it. When you are done, you need to take a latch again to unpin the object. These latches to pin and unpin need memory allocation and deallocation to happen, which is expensive.

Compared to the above process, mutexes have many benefits.

As against 67 latches protecting 131,072 buckets, Oracle introduced 131,072 mutexes – one to protect each bucket. So false contention is reduced drastically. False contention can still occur if 2 processes want to access 2 different objects which belong to the same bucket, but that is very rare.

After a mutex is obtained, the process traverses the chain until it finds the required object. Once an object is found, the process needs to pin the object to access it. But in the case of a mutex, there is no need for a "library cache pin" latch; the mutex itself acts as the pin. A mutex acts as the serialization mechanism both for traversing the linked list and for the cursor pin structure. A mutex pin can be referenced by multiple sessions, providing that all sessions reference the mutex in shared mode only. The total number of sessions referencing a mutex in shared (S) mode is called the reference count (ref count), and it is stored in the mutex itself. Whenever a session wants to pin an object in the shared pool, it increments the ref count of the mutex associated with that object. This is much faster, with no need for any memory allocation or deallocation. Once the process is done with the object, it decrements the ref count. An object is not discarded from memory until its ref count is zero.

Why not increase the number of latches?

a. Latches need more memory, and it is allocated upfront. Mutexes, by contrast, are dynamic and are created when requested; latches are created when the instance starts and their memory locations are externalized in the views v$latch and v$latch_children.

b. The more latches you have, the more work you may have to do in some form of maintenance, reporting or garbage collection. This increases the processing demand on the server.

Now that we know why Oracle introduced mutexes, let's move further in understanding them. A quick way to observe mutex contention is sketched below.
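
The v$mutex_sleep view aggregates how often, and for how long, processes slept on each mutex type and code location; a sketch:

SQL> select mutex_type, location, sleeps, wait_time
     from v$mutex_sleep
     order by sleeps desc;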

Shared, Exclusive and Examine mode:

Shared mode: Multiple sessions can access the memory structure protected by a mutex in shared mode, meaning every session has read access to the structure it protects. Every time a session acquires a mutex in shared mode, the ref count of the mutex needs to be updated in the mutex structure. The number of sessions holding a mutex can be seen in the lower bytes of the P3 value of the mutex wait event (cursor: pin S).

Exclusive mode: This mode is incompatible with all other modes. Only 1 session can hold the mutex in exclusive mode. In this mode, the upper bytes of the P3 value of mutex wait events represent the SID of the holding session.

Examine mode: This mode indicates that the mutex is in transition from shared mode to exclusive mode. During this time no other session can access or modify the mutex. In this mode, the upper bytes of the P3 value of mutex wait events represent the SID of the holding session and the lower bytes represent the number of sessions holding the mutex in shared mode.

Mutex Acquisition:

So how does mutex acquisition work?

We saw in the past article on latches how latch acquisition works. Mutex acquisition is similar in that it tries an immediate get and, if it is not able to acquire the mutex, it spins and then sleeps (though the sleep depends on the Oracle version).

Over the years, with different releases of Oracle, the mutex acquisition algorithm has changed drastically, and I would say it has become more stable with newer releases. In Oracle 10.2, mutexes were used only for pinning objects in the library cache (and only when _kks_use_mutex_pin=true), and there was no sleep.

The algorithm below also applies to the other mutexes, which were introduced in 11g.

Oracle 10.2 – 11.1

Following is the mutex acquisition algorithm in 10.2 – 11.1

- Immediate gets mutex
      - spin gets mutex
            - yield()

A session will try to acquire the mutex in immediate mode; if the mutex is not available, it will spin. If the mutex is still not available, it will yield the CPU, meaning the process is placed at the end of the run queue while still staying "ON CPU". Because of this, CPU consumption was very high. Also, because there was no sleep, the wait interface was not able to record the actual time waited for mutex acquisition. In AWR reports, in the top 5 timed events, we could see a high number of waits, but the total time waited used to be very low and CPU time used to be close to DB time.

If CPU resources are not available, the OS will de-schedule such a process (which is spinning and yielding), and only then does the actual wait event for cursor: pin S get recorded. But if a system has a lot of CPU (like many systems do), the process will not even get de-scheduled, and Oracle thinks it is not waiting at all.

Therefore, in Oracle 10.2-11.1 the "cursor: pin S" wait time is purely a wait for CPU.

For systems which did not have enough spare CPU, "cursor: pin S" used to be among the top wait events, because the process would get de-scheduled from the run queue, and the constant spinning and yielding would make CPU consumption skyrocket. These are the systems where the mutex issue came to light.

Because of these issues, Oracle released patch 6904068: High CPU usage when there are "cursor: pin S" waits.

This patch was released for 11.1 and later back-ported to 10.2. With this patch, Oracle introduced the underscore parameter _first_spare_parameter, with a default value of 1 (centisecond). This parameter provides a sleep time for the mutex if it is not acquired after the first spin. So mutex behavior became similar to latch behavior, but the wait timeout of 10 ms was too high.

On the better side, CPU consumption on the system decreased, and because of the sleep time this wait event (cursor: pin S) showed up correctly in the top 5 timed events (the total time waited was quite accurate because of the sleep). The _first_spare_parameter parameter was dynamic, and if set to 0, mutex acquisition behaved as before, without sleep (aggressive CPU usage).

With all the above discussion, a question arises – if cursor: pin S is a shared mutex which can be acquired by multiple sessions concurrently, why would there be blocking or wait events?

Well, there are a couple of scenarios in which we can see this wait event

  1. If a mutex is in the transition state "E", as mentioned above
  2. Every time a session acquires the mutex in shared mode, it has to update the ref count of that mutex. That can cause contention.

What happened in 11.2 ?

Version 11.2.0.1

The behavior of mutexes in 11.2.0.1 is the same as in 10.2: aggressive, with no sleep. But it was visible as a top event in spite of there being no sleep calls.

This happened because Oracle counted the wait time as the time between the first spin and successful mutex acquisition. So all the yields happening after the first spin were counted in the waits. The total time waited was therefore shown as high in AWR reports and it appeared as one of the top timed events (even without sleep).

Version 11.2.0.2.2 PSU

This version had a drastic change in mutex behavior. Oracle came up with an exhaustive framework for controlling mutexes, called mutex wait schemes.

Oracle introduced several underscore parameters to control the spin, yield and sleep behavior of mutexes.

The following underscore parameters were introduced starting with this version

_mutex_wait_scheme – this parameter was introduced with 3 different values for 3 different mutex wait configurations.

0 - Always yield
1 - Always sleep for _mutex_wait_time
2 - Exponential back off with max sleep time of _mutex_wait_time

_mutex_spin_count – Number of times the process should spin. Default 255
_mutex_wait_time – amount of time process should sleep. Default 1ms

Along with these, Oracle introduced the following parameters to control the yield and sleep of mutexes

_wait_yield_mode – defines whether the process should yield first or sleep first. Possible values are "yield" (default) or "sleep"

_wait_yield_hp_mode – defines high priority processes; default is SMON and VKTM

_wait_yield_sleep_time_msecs – defines how long a process should sleep before it yields again (in milliseconds). Default 1

_wait_yield_sleep_freq – number of yield cycles to do before the process goes to sleep. Default 100

_wait_yield_yield_freq – number of sleep cycles to do before the process yields. Default 20

_mutex_wait_scheme=0

I think in this scheme _mutex_wait_time is not applicable, because the sleep time depends on _wait_yield_sleep_freq. But I am not very sure.

With _mutex_wait_scheme=0 and default values for the _wait_yield* parameters above, mutex acquisition will work the following way

- Immediate mutex gets
      - spin - _mutex_spin_count times
            - yield() - 100 times
                  - sleep - 20 times (20 cycles of 1 ms each)
            - yield() - 100 times
                  - sleep - 20 times (20 cycles of 1 ms each)

Using the above underscore parameters we can vary the behavior. For example, if we change "_wait_yield_mode" to "sleep" instead of "yield", the Oracle process will sleep first

- Immediate mutex gets
      - spin 
            - sleep - 20 times (20 cycles of 1 ms each)
      - yield() - 100 times
            - sleep - 20 times (20 cycles of 1 ms each)
      - yield() - 100 times

_mutex_wait_scheme=1

In this scheme _mutex_wait_time comes into the picture. This is the "always sleep" mode: the process goes to sleep after the first spin, and wakes up after every _mutex_wait_time timeout.

So the behavior will look like the following

- Immediate mutex gets
      - Spin for _mutex_spin_count
            - sleep for _mutex_wait_time
            - sleep for _mutex_wait_time
            - sleep for _mutex_wait_time
            - sleep for _mutex_wait_time
            ...
            ...

_mutex_wait_scheme=2

In this scheme _mutex_wait_time defines the maximum time a process should wait. This is the "exponential backoff" scheme: the wait time increases exponentially until it reaches the maximum set by _mutex_wait_time.

Apart from that, the behavior differs in the initial spin and yield cycle: the process spins and yields 2 times initially before the sleep begins.

So the behavior will look like the following

- Immediate mutex gets
      - Spin for _mutex_spin_count - 2 times
            - yield() - 2 times
                  - sleep - with exponential increase of timeouts

Starting from 11.2.0.2.2, _mutex_wait_scheme=2 is the default behavior

Hope this helps !!

References:

http://blog.tanelpoder.com/2008/08/03/library-cache-latches-gone-in-oracle-11g/

http://oracleinaction.com/latche-lock-pin-mutex/

https://andreynikolaev.wordpress.com

Latches: What do we know ?

Latches are a low level serialization mechanism which protects memory areas inside the SGA. They are lightweight and less sophisticated than enqueues, and can be acquired and released very quickly.

Latch acquisition does not involve any complex algorithm and is based on the test-and-set atomic instruction of the processor.

Latch Classification:

Latches can be classified in multiple ways:

Shared latch and exclusive latch:

A shared latch is one which can be shared by multiple processes/sessions.

For example, if a session wants to read a block in memory and at the same time another session also wants to read the same block, they can both acquire the shared latch. An example of a shared latch is "latch: cache buffer chain". This latch was exclusive in older versions; Oracle changed it to a shared latch from Oracle 9i onwards.

An exclusive latch is one which is acquired when a session wants to modify the block or memory area. An exclusive latch can be held by only 1 session/process at a time and is not compatible with any other latch.

Shared and exclusive latches are not compatible with each other. A process holding a shared latch will block other processes needing the exclusive latch, but will allow other processes needing a shared latch on the same memory area. Similarly, a process holding an exclusive latch will not allow any other process needing either a shared or an exclusive latch.

An example of exclusive latches is the "library cache latches" of previous versions. These were taken in exclusive mode even for traversing a hash bucket. These latches are no longer present in 11g; they have been replaced by mutexes.

As per Andrey Nikolaev, 460 out of 551 latches are exclusive in Oracle 11.2.0.2. In each new version, Oracle tries to make latches more sharable in order to reduce contention.

Another classification of latches is immediate latches and willing-to-wait latches.

Immediate latch and willing-to-wait latch:

Immediate latches are the ones that a session tries to acquire immediately, without waiting. If the latch is not available, the session will not wait; it may get terminated or check for another latch. An example of an immediate latch is the redo copy latch. It is an immediate latch because Oracle only wants to know whether anyone else is currently copying redo data to the log buffer, not who exactly is copying and where, as this does not matter to LGWR.

Willing-to-wait latches are the ones which will wait if the latch is not available. These latches have slightly more complex behavior than immediate latches. Most of the latches in Oracle are willing-to-wait latches.

Why do we see wait events on shared latches ?

After reading about shared latches and exclusive latches, one question comes to mind regarding shared latches.

If the "latch: cache buffer chain" latch is a shared latch (since 9i) and shared latches don't block each other, why do we still see "latch: cache buffer chain" wait events even in the latest versions of Oracle?

The answer to this is explained by Andrey Nikolaev, who gives a very practical account of why this happens. When multiple processes are trying to acquire a shared latch (for example, the CBC latch), we should not see any wait events or processes blocking each other. But the moment another session tries to acquire the latch in exclusive mode (while other sessions hold it shared), the exclusive request is given higher preference than the shared requests. So the session with the exclusive request gets access to the resource and blocks all other sessions wanting the shared latch, forming a queue of sessions willing to acquire it. The strange part of this algorithm is that even after the exclusive latch has been released by the process holding it, the other processes cannot acquire the shared latch concurrently. They remain in the latch queue and get access to the shared latch one by one: the first process in the queue gets the shared latch, and once it is done with its access, it releases the latch and posts the next process that the latch is available. This is one of the major reasons why we see wait events on shared latches.

Process flow for latch acquisition

Following is the process flow for acquiring a latch

 - Immediate latch gets 
       - spin latch gets
             - Add process to queue of latch waits
                   - Sleep until posted

The last step in the process flow (sleep until posted) is a changed behavior. Until 9i, Oracle used to wake up periodically to check whether the latch had been freed, and if the latch was still not available, go back to sleep. So the process flow in 9i used to be the following

 - Immediate latch gets 
       - spin latch gets
             - Add process to queue of latch waits
                   - Sleep for fixed time and wake up
             - immediate latch gets
                   - Sleep for fixed time and wake up
             - immediate latch gets
                   - Sleep for fixed time and wake up

This behavior changed from 10g onwards: the holding process wakes up the waiting process after the latch becomes free. This has been explained well by Tanel Poder, and the behavior is explained in more detail by Alex Fatkulin.

Sometimes the holding process was not able to wake up the process waiting for the latch because of bugs or older kernel versions. So older versions of Oracle used to have a default timeout, so that if the waiting process missed the wakeup call, it would not wait indefinitely but would wake up after the timeout. Oracle introduced the parameter _enable_reliable_latch_waits={true|false}, which alters this behavior. If it is set to true, no timeout is added and the waiting process continues to sleep until it gets posted by the holding process. False gives the behavior with the timeout.

The only latch which is an exception to the above process flow is the "process allocation" latch. This latch does not depend on the holding process to post it when the latch is free; it wakes up periodically to try to acquire the latch.

Latch Behavior

Oracle has introduced a few parameters to control latch behavior. We will discuss these parameters briefly and how they affect shared and exclusive latches.

When a process tries to acquire a latch, it first attempts to acquire it in immediate mode. If the latch is not available in immediate mode, it will spin for a defined number of times and try again. If the latch is still not free, the process goes to sleep.

Process Spin:

Spinning is the act of consuming/burning CPU so that the process stays "ON CPU" and does not get de-scheduled. While spinning, the process burns CPU for a few microseconds in the hope that after that much time the latch will be available for it to acquire. This does increase CPU consumption, but it saves time as it may avoid the need for the process to sleep (which is expensive as it involves context switches).

The number of times a process spins to acquire a latch depends on the type of latch.

Exclusive latch:

For exclusive latches, this "can be" controlled by _spin_count but needs a database bounce. I said "can be" because exclusive latch spins are actually decided by the SPIN column in the x$ksllclass table.


SQL>select indx,spin from x$ksllclass;

      INDX       SPIN
---------- ----------
         0      20000
         1      20000
         2      20000
         3      20000
         4      20000
         5      20000
         6      20000
         7      20000

8 rows selected.

There are 8 classes of latches (as indicated by the INDX column), which we will discuss later in this article. By default all latches belong to class 0.
If we want to change the spin count for exclusive latches, we need to change the value of the SPIN column for class 0. This can be done by changing _spin_count and bouncing the instance (but that will change the spin count for all classes), or by setting the _latch_class_0 parameter (which will change the spin count for class 0 only). We have similar parameters to change the spin count for the other classes (_latch_class_[0-7]).

So changing _spin_count is not a good idea. Instead, we can move the specific latch whose spin count we want to change to another class (1-7) and change the corresponding underscore parameter (_latch_class_[1-7]), as sketched below.
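
A sketch of that approach; the latch name, latch# (228) and spin value here are hypothetical, and underscore parameters should only be changed under Oracle Support guidance (both settings need an instance bounce):

SQL> select latch#, name from v$latchname where name = 'cache buffers chains';

SQL> alter system set "_latch_classes" = "228:1" scope=spfile;
SQL> alter system set "_latch_class_1" = "10000" scope=spfile;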

By default the _spin_count parameter is not applicable to exclusive latches, as its default value is 2000 whereas exclusive latches spin 20000 times, as shown in the x$ksllclass output above.
But explicitly changing _spin_count makes it applicable to exclusive latches as well.

Shared Latch:

For shared latches, the number of times a process spins is _spin_count * 2. This has been proved by Andrey Nikolaev. Also, the _spin_count parameter is applicable to shared latches by default. So, since the default value of _spin_count is 2000, shared latches spin 4000 times (2000 * 2).
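
To see which latches are missing, spinning and sleeping the most, the statistics in v$latch can be ranked; a sketch:

SQL> select * from (
       select name, gets, misses, spin_gets, sleeps
       from v$latch
       order by sleeps desc)
     where rownum <= 10;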

Diagnosing latch contention:

I am not going to say much here because Tanel Poder has already written great scripts – latchprof.sql and latchprofx.sql – which can be used to analyze latch wait events.

Tanel has also written a great article on how to diagnose latch wait events – http://tech.e2sn.com/oracle/troubleshooting/latch-contention-troubleshooting

In the next article, I will try to cover mutexes.

Hope this helps !!

Reference:

https://andreynikolaev.wordpress.com

http://tech.e2sn.com/oracle/troubleshooting/latch-contention-troubleshooting