August 2016 – Persistent Storage Solutions

Oracle 12c Learning Series: Automatic Table Recovery Using RMAN

Published on August 10, 2016 by Advait3 Comments

Table recovery was possible in earlier release as well. Until previous release, if we wanted to recover a table, we had following options

Database point in time recovery (DBPITR)
Tablespace point in time recovery (TSPITR)
Flashback technology

In Oracle 12c, RMAN has been enhanced to perform recovery of table. We have a new command in RMAN which automates complete process of recovering the table. New process does not affect the existing objects in the database and reduces the time and space required for doing table recovery.

How table recovery works?

RMAN uses the database backups taken previously to recovery table or table partitions to a specified time. You need to provide following inputs to RMAN
- Table name or partition names to be recovered
- Point in time until which the recovery should be done
- Whether recovered tables should be imported back to original database
RMAN determines the correct backup based on your inputs
RMAN creates auxiliary instance
RMAN restores the controlfile
RMAN restores necessary files required for obtaining the older image of table. This might include SYSTEM, SYSAUX, UNDO tablespace and tablespace which has required table.
RMAN recovers the table or table partitions in auxiliary instance until the specified time.
RMAN creates data pump export dump file that contains the recovered objects
RMAN imports the recovered objects into original database

Recovery Point-in-Time Options

You can recover a table or table partition to a past point in time by specifying one of the following three clauses:

UNTIL SCN – The system change number (SCN)
UNTIL TIME – Time in NLS_DATE_FORMAT environment variable
UNTIL SEQUENCE – The log sequence number and thread number

Steps for performing table recovery

Let’s take an example. I have a table T1 created in DEO database.

[oracle@advait ~]$ sqlplus advaitd/oracle@deo

SQL> select * from T1;

      COL1
----------
         1
         2
         3

We can note down the current SCN when table is available


SQL> select current_scn from v$database;

CURRENT_SCN
-----------
    1243251

Following are the datafiles available in my database


SQL> select file#, name, con_id from v$datafile order by 3,1;  

     FILE# NAME                                                                                 CON_ID
---------- -------------------------------------------------------------------------------- ----------
         1 /u01/app/oracle/oradata/orcl/orcl/datafiles/system-01.dbf                                 0
         2 /u01/app/oracle/oradata/orcl/orcl/datafiles/sysaux-01.dbf                                 0
         3 /u01/app/oracle/oradata/orcl/orcl/datafiles/undotbs1-01.dbf                               0
         4 /u01/app/oracle/oradata/orcl/orcl/users.dbf                                               0
         5 /u01/app/oracle/oradata/orcl/orcl/datafiles/mgmt_depot.dbf                                0
         6 /u01/app/oracle/oradata/orcl/orcl/datafiles/mgmt.dbf                                      0
         7 /u01/app/oracle/oradata/orcl/orcl/datafiles/mgmt_ad4j.dbf                                 0

Lets drop the table T1


SQL> drop table T1;

Table dropped.

We can use below command to recover this table. Note that the SCN number we are giving below is the SCN number we noted down when table was existing.

RMAN> recover table advaitd.t1 until scn 1243251 auxiliary destination '/u01/app/oracle/oradata/aux';

Let’s see what this command does exactly.

Step 1) Command creates and mounts a new adhoc dummy instance as shown below


RMAN> recover table advaitd.t1 until scn 1243251 auxiliary destination '/u01/app/oracle/oradata/aux';

Starting recover at 26-JUL-16
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=107 device type=DISK
RMAN-05026: WARNING: presuming following set of tablespaces applies to specified Point-in-Time

List of tablespaces expected to have UNDO segments
Tablespace SYSTEM
Tablespace UNDOTBS1

Creating automatic instance, with SID='jCvj'

It uses the controlfile autobackup to restore the controlfile to mount the dummy instance.

Step 2) It then restores the controlfile for the auxiliary instance

Following memory script shows the commands for restoring controlfile


contents of Memory Script:
{
# set requested point in time
set until  scn 1243251;
# restore the controlfile
restore clone controlfile;
 
# mount the controlfile
sql clone 'alter database mount clone database';
 
# archive current online log 
sql 'alter system archive log current';
}

Step 3) Next it restores the tablespaces required to perform tablespace PITR

Following memory script shows that it’s going to restore the required datafiles (until SCN 1243251) into auxiliary destination and switch the datafiles to copy.
When we use UNTIL SCN, RMAN restores the datafiles which are little older then this SCN.


contents of Memory Script:
{
# set requested point in time
set until  scn 1243251;
# set destinations for recovery set and auxiliary set datafiles
set newname for clone datafile  1 to new;
set newname for clone datafile  3 to new;
set newname for clone datafile  2 to new;
set newname for clone tempfile  1 to new;
# switch all tempfiles
switch clone tempfile all;
# restore the tablespaces in the recovery set and the auxiliary set
restore clone datafile  1, 3, 2;
 
switch clone datafile all;
}

Step 4) Recover the datafiles until required SCN specified

Below memory script shows that RMAN tries to recover these datafiles until the specified SCN. Once recovered, it opens the database in read only mode.


contents of Memory Script:
{
# set requested point in time
set until  scn 1243251;
# online the datafiles restored or switched
sql clone "alter database datafile  1 online";
sql clone "alter database datafile  3 online";
sql clone "alter database datafile  2 online";
# recover and open database read only
recover clone database tablespace  "SYSTEM", "UNDOTBS1", "SYSAUX";
sql clone 'alter database open read only';
}

Step 5) Create SPFILE for auxiliary instance and mount the DB again

Following memory script tell us that RMAN is creating SPFILE for the auxiliary instance and including CONTROL_FILES parameter in SPFILE and mounting the auxiliary instance again


contents of Memory Script:
{
   sql clone "create spfile from memory";
   shutdown clone immediate;
   startup clone nomount;
   sql clone "alter system set  control_files = 
  ''/u01/app/oracle/oradata/aux/DEO/controlfile/o1_mf_ct8t8hyq_.ctl'' comment=
 ''RMAN set'' scope=spfile";
   shutdown clone immediate;
   startup clone nomount;
# mount database
sql clone 'alter database mount clone database';
}

Step 6) Restore USERS tablespace where the table belongs

Following memory script shows that RMAN is trying to restore USERS tablespace (datafile 0004) and switching to the copy.


contents of Memory Script:
{
# set requested point in time
set until  scn 1243251;
# set destinations for recovery set and auxiliary set datafiles
set newname for datafile  4 to new;
# restore the tablespaces in the recovery set and the auxiliary set
restore clone datafile  4;
 
switch clone datafile all;
}

Step 7) Recovering the auxiliary database until required SCN

Following memory script shows that RMAN is trying to recover USERS tablespace and other tablespaces that it has restored previously to required SCN. It also opens the database in resetlogs.


contents of Memory Script:
{
# set requested point in time
set until  scn 1243251;
# online the datafiles restored or switched
sql clone "alter database datafile  4 online";
# recover and open resetlogs
recover clone database tablespace  "USERS", "SYSTEM", "UNDOTBS1", "SYSAUX" delete archivelog;
alter clone database open resetlogs;
}

Step 8) Create directory for datapump export and export the required table

Following memory script shows that RMAN is creating a directory for exporting the table and it also exports the table


contents of Memory Script:
{
# create directory for datapump import
sql "create or replace directory TSPITR_DIROBJ_DPDIR as ''
/u01/app/oracle/oradata/aux''";
# create directory for datapump export
sql clone "create or replace directory TSPITR_DIROBJ_DPDIR as ''
/u01/app/oracle/oradata/aux''";
}
executing Memory Script

sql statement: create or replace directory TSPITR_DIROBJ_DPDIR as ''/u01/app/oracle/oradata/aux''

sql statement: create or replace directory TSPITR_DIROBJ_DPDIR as ''/u01/app/oracle/oradata/aux''

Performing export of tables...
   EXPDP> Starting "SYS"."TSPITR_EXP_jCvj_rjBm":  
   EXPDP> Estimate in progress using BLOCKS method...
   EXPDP> Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
   EXPDP> Total estimation using BLOCKS method: 64 KB
   EXPDP> Processing object type TABLE_EXPORT/TABLE/TABLE
   EXPDP> Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
   EXPDP> Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
   EXPDP> . . exported "ADVAITD"."T1"                              5.062 KB       3 rows
   EXPDP> Master table "SYS"."TSPITR_EXP_jCvj_rjBm" successfully loaded/unloaded
   EXPDP> ******************************************************************************
   EXPDP> Dump file set for SYS.TSPITR_EXP_jCvj_rjBm is:
   EXPDP>   /u01/app/oracle/oradata/aux/tspitr_jCvj_52000.dmp
   EXPDP> Job "SYS"."TSPITR_EXP_jCvj_rjBm" successfully completed at Tue Jul 26 19:55:13 2016 elapsed 0 00:00:21
Export completed

Step 9) Importing the table to original database

Following memory script shows, RMAN is trying to import the table to original database. You can provide additional options to prevent RMAN to import the table or you can provide import options to remap the table to different name if required or append the content to existing table.


contents of Memory Script:
{
# shutdown clone before import
shutdown clone abort
}
executing Memory Script

Oracle instance shut down

Performing import of tables...
   IMPDP> Master table "SYS"."TSPITR_IMP_jCvj_tCzd" successfully loaded/unloaded
   IMPDP> Starting "SYS"."TSPITR_IMP_jCvj_tCzd":  
   IMPDP> Processing object type TABLE_EXPORT/TABLE/TABLE
   IMPDP> Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
   IMPDP> . . imported "ADVAITD"."T1"                              5.062 KB       3 rows
   IMPDP> Processing object type TABLE_EXPORT/TABLE/STATISTICS/TABLE_STATISTICS
   IMPDP> Processing object type TABLE_EXPORT/TABLE/STATISTICS/MARKER
   IMPDP> Job "SYS"."TSPITR_IMP_jCvj_tCzd" successfully completed at Tue Jul 26 19:55:24 2016 elapsed 0 00:00:04
Import completed

Step 10) Finally, RMAN will clean up and remove the auxiliary instance


Removing automatic instance
Automatic instance removed
auxiliary instance file /u01/app/oracle/oradata/aux/DEO/datafile/o1_mf_temp_csgwmp8s_.tmp deleted
auxiliary instance file /u01/app/oracle/oradata/aux/JCVJ_PITR_DEO/onlinelog/o1_mf_2_csgwo8fy_.log deleted
auxiliary instance file /u01/app/oracle/oradata/aux/JCVJ_PITR_DEO/onlinelog/o1_mf_1_csgwo7ht_.log deleted
auxiliary instance file /u01/app/oracle/oradata/aux/JCVJ_PITR_DEO/datafile/o1_mf_users_csgwo53j_.dbf deleted
auxiliary instance file /u01/app/oracle/oradata/aux/DEO/datafile/o1_mf_sysaux_csgwkm4b_.dbf deleted
auxiliary instance file /u01/app/oracle/oradata/aux/DEO/datafile/o1_mf_undotbs1_csgwkm4f_.dbf deleted
auxiliary instance file /u01/app/oracle/oradata/aux/DEO/datafile/o1_mf_system_csgwkm2x_.dbf deleted
auxiliary instance file /u01/app/oracle/oradata/aux/DEO/controlfile/o1_mf_csgwkf6f_.ctl deleted
auxiliary instance file tspitr_jCvj_52000.dmp deleted
Finished recover at 26-JUL-16

RMAN>

You can verify that you got the required table or not


SQL> select * from T1;

      COL1
----------
         1
         2
         3

As you can see all the steps are performed by RMAN automatically once we execute single RMAN command to recover the table.

This was a small test database, but in case of production databases, we have to make sure that auxiliary destination has required space available for restoring the required tablespaces. Else table recovery will fail.

Hope this helps !!

Monitoring transaction recovery

Published on August 4, 2016 by Advait2 Comments

Sometimes we end up in a situation where our long running transaction is not completing and we are also not sure how much further time it’s going to take. This happened with one of our DBA where they found MLOG to be bloated because of one orphan snapshot entry. Orphan entries are the one where actual site is not registered on master (no entry in DBA_REGISTERED_SNAPSHOTS), but they see entry for MLOGS (entry in DBA_SNAPSHOT_LOGS). This could happen if we try to drop snapshot from downstream database and it does not get cleaned up on upstream databases.

So in the situation that I faced, upstream team had MLOG which was bloated to 18GB and MLOG also had an index which was bloated to 30GB. (ya, I know its bad :-))

So they identified the orphan snapshot ID and they wanted to purge that from snapshot log to reduce the size of MLOG (after they move the MLOG and rebuild the index after doing the purge).

They used following procedure of DBMS_SNAPSHOT for purging snapshot ID from log

PROCEDURE PURGE_SNAPSHOT_FROM_LOG
 Argument Name                  Type                    In/Out Default?
 ------------------------------ ----------------------- ------ --------
 SNAPSHOT_ID                    BINARY_INTEGER          IN

After they started the activity in the morning and monitoring the same until evening, it was still not complete. I helped them in tracking the progress by checking real time SQL monitoring report and it was showing that session has already read around 60GB and undo used until that time was around 48GB. It was still not clear how the command has read 60GB worth of data when MLOG size was only 18GB.

Also, original base table was just 2GB.

At this point they wanted to kill the session. But killing the session will not help immediately as it has to perform huge rollback as well (48GB of UNDO).

But since command was not completing and took almost entire shift, they decided to kill the session. So session was killed using “ALTER SYSTEM KILL SESSION ‘<sid>,<serial#>’ immediate” and session was marked for kill. But session was just marked as killed and it was still holding the lock (if we check in V$LOCK view). This was because session was doing the rollback. We can monitor the progress of rollback using V$TRANSACTION view

You can look at used_ublk in V$transaction to estimate how long it is going
to take to complete the rollback.

SQL> SELECT a.used_ublk 
 FROM v$transaction a, v$session b
 WHERE a.addr = b.taddr AND b.sid = <SID>;

For example:

If used_ublk showed 29,900 12 hours ago and is now 22,900, it has taken 12 hours to rollback 7,000 entries. It will take approximately another 36 hours to complete depending on the types of transactions that are rolling back.

Recovery was very slow as session was doing serial recovery. Next we found the OS PID of the session and killed the OS process as well so that recovery can happen in the background using SMON. Within few mins PMON performed the clean up and lock was released.

Rollback continued in the background and this is faster than the rollback performed by the session. If we kill the session and the shadow process at OS level, SMON picks up the rollback part and it goes for parallel rollback, which is faster.

V$FAST_START_TRANSACTIONS & X$KTUXE

We can monitor the progress of rollback in V$FAST_START_TRANSACTIONS view.

V$fast_start_transactions -> contains one row for each one of the transactions that Oracle is recovering in Parallel.

FAST_START_PARALLEL_ROLLBACK shows the maximum number of processes which may exist for performing parallel rollback.

In fast-start parallel rollback, the background process SMON acts as a coordinator and rolls back a set of transactions in parallel using multiple server processes.

Fast start parallel rollback is mainly useful when a system has transactions that run a long time before committing, especially parallel Inserts, Updates, Deletes operations. When SMON discovers that the amount of recovery work is above a certain threshold, it automatically begins parallel rollback by dispersing the work among several parallel processes.

The following queries are available to monitor the progress of the transaction recovery

set linesize 100 
 alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS'; 
 select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo", 
 decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) 
 "Estimated time to complete" 
 from v$fast_start_transactions;

Run the above query several times in a row, this will give you a good idea on how SMON is progressing.

In some versions the cputime does not work (always 0), hence the estimated completion time will not be displayed
In some cases the v$fast_start_transactions view will not work. If this is the case then you can query the internal data dictionary view x$ktuxe

The ‘ktuxesiz’ column represents the remaining number of undo blocks required for rollback:

select ktuxeusn, to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Time", ktuxesiz, ktuxesta 
 from x$ktuxe 
 where ktuxecfl = 'DEAD';

I was not able to see recover progress using V$FAST_START_TRANSACTIONS, but I was able to see the progress in x$ktuxe view.


select ktuxeusn, to_char(sysdate,'DD-MON-YYYY HH24:MI:SS') "Time", ktuxesiz, ktuxesta 
from x$ktuxe 
where ktuxecfl = 'DEAD'; 

  KTUXEUSN|Time                      |  KTUXESIZ|KTUXESTA
----------|--------------------------|----------|----------------
      2167|01-AUG-2016 12:05:14      |   5260156|ACTIVE
SQL>/

  KTUXEUSN|Time                      |  KTUXESIZ|KTUXESTA
----------|--------------------------|----------|----------------
      2167|01-AUG-2016 12:05:15      |   5259945|ACTIVE
SRW1NA>/

  KTUXEUSN|Time                      |  KTUXESIZ|KTUXESTA
----------|--------------------------|----------|----------------
      2167|01-AUG-2016 12:05:15      |   5259854|ACTIVE

..
..
..
<After 2-3 hours>

  KTUXEUSN|Time                      |  KTUXESIZ|KTUXESTA
----------|--------------------------|----------|----------------
      2167|01-AUG-2016 16:31:47      |    612697|ACTIVE

Speeding up recovery

We can further improve the speed of recovery by taking following steps

1) There are cases where parallel transaction recovery is not as fast as serial transaction recovery, because the pq slaves are interfering with each other. To check the Parallel Recovery processes and there state query:

select * from v$fast_start_servers;

Column STATE shows the state of the server being IDLE or RECOVERING, if only 1 process is in state RECOVERING while the other processes are in state IDLE, then you should disable Parallel Transaction Recovery. How to do this is outlined in the following note:

Note 238507.1: How to Disable Parallel Transaction Recovery When Parallel Txn Re very is Active

2) If all the rows are showing RECOVERING in STATE column of v$fast_start_servers, then you will get benefitted if you add more threads for doing the recovery.

You can do so by setting value of FAST_START_PARALLEL_ROLLBACK parameter. You should set a value of HIGH if you want to speed up the recovery.

Following are the different values of this parameter

FALSE – Parallel rollback is disabled
LOW – Limits the maximum degree of parallelism to 2 * CPU_COUNT
HIGH -Limits the maximum degree of parallelism to 4 * CPU_COUNT

Note that, this parameter is not dynamic and needs database bounce. Also, If you change the value of this parameter, then transaction recovery will be stopped and restarted with the new implied degree of parallelism. So if you are already done more than half the rollback and you think its not worth to change this parameter, you can leave it. Else if you still change this parameter, recovery will start from the beginning again.

3) Increase the parameter ‘_cleanup_rollback_entries’

This parameter determines number of undo entries to apply per transaction cleanup. The default value is 100. You can change that to, say 400.This parameter cannot be changed dynamically, so in order to change this the database will need to be restarted.

In our specific situation, we knew that huge rollback needs to be performed and we were monitoring the rollback progress from the beginning. So we made a decision at the very beginning to set FAST_START_PARALLEL_ROLLBACK to HIGH and bounce the DB. This improved recovery speed right from the beginning.

References:

SMON: Parallel transaction recovery tried (Doc ID 1458738.1) To BottomTo Bottom

Troubleshooting Database Transaction Recovery (Doc ID 1494886.1)

Database Hangs Because SMON Is Taking 100% CPU Doing Transaction Recovery (Doc ID 414242.1)

SMON: Parallel transaction recovery tried (Doc ID 1458738.1)

Note 238507.1: How to Disable Parallel Transaction Recovery When Parallel Txn Re very is Active