Oracle Database Recovery – Basic Concept

Its been a really long time since I wrote a post. These days seems to be very busy with all Fusion middle ware floating around me. Anyways, here I start again. This post is a kind of kickoff post. I was going through Expert one-on-one book by Thomas Kyte and read a wonderful basic recovery related article from the book. Ohh by the way…everyone should read Expert one-on-one atleast once in his/her life. A must read book, not only for DBA but specifically for developers.

So here is the recovery concept go. This is a very basic concept about how Oracle does crash recovery. I will be putting the same story here, but in my words.

We know the basic DML transactions are insert, update and delete. We will consider a simple transaction and understand : what data gets generated during the transaction, where it is stored and how oracle database recovers in case of crash.

Lets say an insert statement on EMP table

insert into EMP (EMPNO, NAME, JOB, SAL) values (10, ‘Deo’, ‘DBA’, 100);

Following things will happen

  • This statement will modify the data blocks in buffer cache
  • This statement will also fetch the index blocks in the buffer cache and modify the same
  • This statement will generate undo data in buffer cache
  • This statement will generate redo data in redo log buffer. This entries in redo log buffer is to protect the blocks in buffer cache.

So the picture in SGA will looks as shown below

1

So till now the user has executed an insert statement, its not yet committed. The blocks in buffer cache has been protected by entries in redo log buffer.

What if the system crashes at this stage?

If the system crashes at this point, then everything in the memory will get wiped out. It will be like nothing happened. There is no record of this transaction in any of the files (datafiles or redo log file). This is OK. We dont need anything that was in SGA.

What if the buffer cache fills up at this stage?

We are have done the insert, but we have not yet committed. Changed blocks are in buffer cache. Also redo blocks are not yet flushed to online redo log files and are still present in redo log buffer. At this stage suppose db buffer cache fills up. Now DBWR process needs to make a room for new blocks to be fetched for some other user session. In that case DBWR will ask LGWR process to flush the redo log buffer into online redo log files. So the redo log records generated by insert transaction will get flushed to online redo log file. Note that we have NOT yet commited the insert record.

This behavior of DBWR to ask LGWR process to flush redo data is perfect. imagine what will happen if the data in buffer cache gets written to database files before redo data gets flushed into redo log files. If system craches at a point after buffer cache data gets updated in data files and before redo data from redo log buffer gets flushed into online redo log files, then when the database reboots, there is no way to tell whether our insert transaction was commetted or not. Since our transaction was not commited, how will Oracle know that it needs to rollback the insert transaction. That’s why data in redo log buffer gets flushed to online redo log files first. The scenario will look like as shown in fig 2 below.

2

Now the blocks in database buffer cache are protected by the entries in redo log files. Some of this blocks might get written to datafiles in order for DBWR process to make room, or none of them gets written to datafiles if DBWR process is able to make room without touching these blocks. It depends.

What if the system crashes at this stage?

At this stage if the system crashes then all the data in memory will get wiped out. The data blocks in buffer cache will be gone. But we still have the entries in online redo log files. So when the oracle performs crash recovery it reads the online redo log file and roll-forward the transactions. Redo log file has data to regenerate the data blocks that were present in the buffer cache before system crash. So oracle will read the redo log files and generate the data block, the index block and undo data block as well. Note here that undo data block is also treated here as a normal data block and changes to undo data block will make redo entries. So while doing crash recovery it will generate the 3 data blocks and scenario will be similar to fig 2. On further reading of redo log file, oracle will come to know that this insert transaction was never committed. So Oracle will use the undo data generated for this transaction to rollback the data. This will delete the insert entry.

Now before system crash if the data in buffer cache has been applied to datafiles then oracle will reapply the latest data generated and overwrite the previously inserted data. Now everything is back the way it was.

This scenario covers the rudimentary details of crash recovery. Here the recovery is a 2 step process, first oracle roll forwards all the transactions and then it starts rolling back the transactions which are not committed (from the undo data that it generates during roll forward process).

Similar concept applies to the update and delete statements as well.

What happens to the undo data?

The undo data which gets generated during transactions (example in the above scenerio) will be stored for some time. If there is no space left in memory this undo data is flushed to undo segments in undo tablespace and it will reside in undo tablespace for a minimum time period defined by UNDO_RETENTION init.ora paramter. This undo data is used during rolling back a transaction. Please note that for rolling back a transaction redo log files are never used. Redo logs and archive logs are referred (or read) only during recovery of database. During all other time the data is written to redo logs files but never read from it. So for rolling back the transaction undo data saved in undo tablespace is required.

Hope this helps !!

References:

Expert One-on-One – By Thomas Kyte

‘    


    
’


,
*

  



 


    

’    

    
=
 






Oracle Database Recovery Details

Recovery Fundamentals:

This post is to give you information about various recovery fundamental details and how recovery works.

We will start by looking at various SCNs and where they are stored.

There are 3 SCNs basically in control file

  1. Checkpoint SCN
  2. Stop SCN
  3. Thread checkpoint SCN

Checkpoint SCN is the datafile checkpoint SCN when checkpoint happens for datafile. This checkpoint SCN is recorded in datafile header as well.

Stop SCN is the SCN which gets recoreded in control file when datafile is taken in begin backup mode or when datafile is taken offline. This is the checkpoint at a point when datafile header is freezed.

Thread Checkpoint SCN is the one related to online redo log files. This SCN gets generated when ever transaction get recoreded in online redo log file.

When we shut down database with normal or immediate option, all these SCN are synchronized and made equal.

Lets take a quick example:

1) System checkpoint SCN in controlfile

SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
——————
3700901
2) Datafile checkpoint SCN in controlfile

SQL> select name, checkpoint_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE#
——————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3700901

3) Stop SCN in control file

SQL> select name, last_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
LAST_CHANGE#
————
/dy/oracle/product/db10g/dbf/htmldb01.dbf

4) Start SCN in datafile header

SQL> select name, checkpoint_change# from v$datafile_header
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE#
——————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3700901

Shut down the database now and start in mount mode

SQL> shut immediate
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup mount
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size 1984184 bytes
Variable Size 243276104 bytes
Database Buffers 822083584 bytes
Redo Buffers 6397952 bytes
Database mounted.
SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
——————
3722204

SQL> select name, checkpoint_change# , last_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE# LAST_CHANGE#
—————— ————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3722204 3722204

All these SCN values are coming from control file. Here you can see that last_change# from v$datafile was showing NULL. But when we shut down the database this value got updated to same as checkpoint_change#. This last_change# is the stop SCN and checkpoint_change# is the start SCN. So when we shutdown the database it run a checkpoint and makes start SCN = stop SCN.

Lets check the SCN in datafile header

SQL> select name, checkpoint_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE#
——————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3722204
So here we see that datafile header is having same checkpoint # as system checkpoint number.

How oracle decides whethere recovery is required?

When database is started, Oracle checks the system SCN stored in control file and datafiles header. It compared system SCN which each datafile header and it those matches, then next it checks the start SCN and stop SCN in datafile headers, if those are also same then it will open the database else it as for recovery.
Also as soon as we open the database the last_change# in v$datafile_header will be set to NULL again.

Now shutdown the database with abort option.

SQL> shut abort
ORACLE instance shut down.
SQL> startup mount;
ORACLE instance started.

Total System Global Area 1073741824 bytes
Fixed Size 1984184 bytes
Variable Size 243276104 bytes
Database Buffers 822083584 bytes
Redo Buffers 6397952 bytes
Database mounted.
SQL> select checkpoint_change# from v$database;

CHECKPOINT_CHANGE#
——————
3722206

So we can see thet system checkpoint # is 3722206

SQL> select name, checkpoint_change# , last_change# from v$datafile
2 where name like ‘%htmldb%’;

NAME
——————————————————————————–
CHECKPOINT_CHANGE# LAST_CHANGE#
—————— ————
/dy/oracle/product/db10g/dbf/htmldb01.dbf
3722206
Here you can see that datafile header checkpoint SCN is also 3722206, but stop SCN # in controlfile is NULL. If shutdown checkpoint would have happened, then it would have updated the stop SCN for controlfile. But since we used “shut abort”, no checkpoint happened during shutdown. This situation is called “crash recovery”. Here the start SCN of datafile header and stop SCN of datafile header are not matching. This kind of situation is automatically taken care by Oracle. When you open the database, oracle automatically applies the transaction from redo log files and undo tablespace and it will recover the database. Problem happens when system SCN # does not match with datafile header start SCN. This is called “instance recovery”.

During start of database Stop SCN = NULL => Needs crash recovery
During Start of database DATAFILE HEADER START SCN != SYSTEM SCN in control file => Media recovery

When doing media recover we can have 2 situations

1) Datafile header SCN is less then datafile SCN stored in control file.

So when you open the database, Oracle checks the SCN number of datafile present in datafile header and control file. If the SCN matches it will open the datafile, else it will ask for recovery. Now when it ask for recovery, it will check the start SCN of datafile in datafile header. From this SCN onwards it needs recovery. So all the logs having this SCN number and beyond is required for recovery.

2) Datafile header SCN is more then datafile SCN stored in control file.

This kind of situation happens when you use backup control file or when you are recovering using “Backup controlfile”. In such situation since datafile header SCN is higher then control file, Oracle really doesn’t know till what SCN to recover. So you tell Oracle that you are using a “backup controlfile” and that you will tell it when to stop applying redo by replying “cancel.” When Oracle starts recovery, it looks at the datafiles to know the last time a checkpoint was performed on the datafile. Oracle now knows to start applying recovery to the datafile for all SCNs after the SCN in the datafile header. But Oracle does not know when to stop, and eventually, Oracle applies recovery in all of your archived redo logs. You can then tell Oracle to use the redo in the online redo logs. Oracle will ask you where to find more redo. At this point, you tell it to quit applying redo by replying CANCEL.

Once we open in reset logs mode, SCN numbers are synchronized in datafiles and controlfiles and redo sequence numbers are reset to 1.

References:

http://jenniferlinca.wordpress.com/2008/02/08/what-is-the-scn/#comments

Blogged with the Flock Browser

Tags: , , ,

Excess redo log generation during Hot Backup – Oracle 9i

Many of you must have heard or experienced that while taking hot backup of database LGWR process writes aggressively. Meaning that more redo data has been written to redo log file and consecutively more archive logs gets generated.

Here is the common misconception we have in our mind. If some one ask, why excessive redo logs and archive logs are getting generated when we start a hot backup of database ?? Quickly we answer .. Its simple, when we put tablespace in hot backup mode, Oracle will take a check point of tablespace and data files belonging to this tablespace will be freezed. Any user activity happening on objects belonging to this tablespace wont write data to these datafiles, instead it will write data to redo log files. So obviously there will be more redo log file generation.

Well, to some extent this is COMPLETELY WRONG !!!

I will straight way come to the point and explain you what happens when we put the tablespace in hot backup mode.

Your first assumption that datafiles belonging to the tablespace in hot backup mode is freezed is wrong. Datafiles are not freezed, only the datafile headers will be freezed !! So simply imagine that when you put the tablespace in backup mode, Oracle will take a checkpoint and update the datafile headers with checkpoint SCN and there after it is freezed until we take tablespace out of backup mode.

Other datafile (other then header part) remains as normal and data changes happens continuously to this datafile.

Now you may want to ask me “do I mean to say that datafiles gets updated continuously even when we are coping the same to backup location ?”. The answer is YES. Never think that the datafile you are coping is “Consistent”. No, datafiles gets changed continuously !!!

You might want to ask couple of more questions then.

1) If we say that backup file is not consistent and changes continuously, then how come Oracle is able to recover the database when we restore that datafile?

2) If the data changes are anyway happening continuously on data files, then why there is excess redo log generation ?

Thats it !! don’t ask me more then this. Let me explain answers to these questions.

Consider a typical case, where an Oracle database is installed on Linux platform. The standard Oracle block size if 8K and lets say that OS level data block size is 512K. Now when we put the tablespace in “Begin Backup” mode checkpoint has happened and datafile header is freezed. You found which are the files related to this tablespace and started copying using OS command. Now when you copy a datafile using OS command it is going to copy as per OS block size. Lets say when you start copying it gave 8 blocks to you to copy – that means you are copying 4K (512K X 4) to backup location. That means you are copying half of Oracle block to backup location. Now this process of copy can be preempted by Server CPU depending on load. Lets say when you started copying after copy of those 8 block (4K, half of Oracle block), your process get preempted by CPU and it has allocated CPU time to some other important process. Mean while DBWR process changes that block that you have copied halfway (since datafile is not freezed and only header is freezed, continuous updates can happen to datafile).

After a while CPU returns back and gives you next 8 blocks to copy (rest of the halk Oracle block). Now here is the problem !!! we copied half of the oracle block taken at time T0 and another half taken at time T1 and in-between the data block got changed. Does this sounds consistent ? Not to me !! Such type of block is called “Fractured Block”.

Well, since Oracle copies files like this it should do some thing, so that during recovery it wont face any problem.

Usually in case of a normal tablespace (which is not in begin backup mode), when a transaction happens oracle generates redo information and puts in redo log file. This is the bare minimum information that oracle generates in order to redo the information. It does not copy the complete block. Where as in case of begin backup mode, if a transaction happens and changes any block FOR THE FIST TIME, oracle copies the complete block to redo log file. This happens only during first time. If subsequent transaction updates the same block again, oracle will not copy the complete block to redo, instead it will generate minimum information to redo the changes. Now because oracle has to copy the complete block when it changes for the first time in begin backup mode, we say that excess redo gets generated when we put tablespace in begin backup mode.

Question arises, why Oracle has to copy the complete block to redo log files. As you have seen above that during copy of datafile, there can be many fractured blocks, and during restore and recovery its going to put those block back and try to recover. Now assume that block is fractured and oracle has minimum information that it generates in the redo. Under such condition it wont be possible for Oracle to recover such blocks. So instead Oracle just copies the entire block back from redo log files to datafiles during recovery process. This will make the datafile consistent. So recovery process is very important which takes care of all fractured blocks and makes it possible to recover a database.

I hope this explains above 2 questions.

Now you can easily explain why hot backup is not possible if database is in NOARCHIVELOG mode.

When you take a backup using RMAN, it does not generate excessive redo logs. The reason is simple. RMAN is intelligent. It does not use OS block for copying, instead it uses oracle blocks for copying datafiles so the files are consistent.

Hope this helps !!