September 2013 – Persistent Storage Solutions

What it used to be

After Oracle 9.2, there has been a significant changes in redo behavior. Before 10g, Oracle used to have a single log buffer for writing redo information which eventually gets written to online redo logs in round robin fashion. Of course if your database is in archive log mode information from online redo logs will flow to archive logs before that online redo log file gets overwritten.

Why changed

Oracle 10g onwards behaviour has changed because of contention. You see whenever a session makes a change, oracle has to write the change vector to redo buffer and for that it has to acquire a latch “redo allocation”. If you have just single log buffer, Oracle will have a single latch to protect that memory area and any session making changes have to acquire this latch to write redo change vector to redo buffer.
Think about an OLTP system with many session doing many transactions at the same time. This can cause serious contention for redo allocation latch and was not a scalable solution.

What changed

So from 10g onwards, Oracle has split log buffer into multiple buffer – these are called public redo strands. In 10g it was controlled by parameter log_parallelism and in 11g its controlled by hidden parameter _log_parallelism_max. Default value for _log_parallelism_max is decided by number of CPU. Minimum value for this parameter is 2 for CPU count 16 or below. For every 16 CPUs _log_parallelism_max increases by 1. So for 64 CPU machine _log_parallelism_max will default to 4.

You can check number of public strands in your database using following SQL

SQL> select
 PTR_KCRF_PVT_STRAND ,
 FIRST_BUF_KCRFA ,
 LAST_BUF_KCRFA ,
 TOTAL_BUFS_KCRFA ,
 STRAND_SIZE_KCRFA ,
 indx
from
 x$kcrfstrand ;
PTR_KCRF_PVT_STR FIRST_BUF_KCRFA LAST_BUF_KCRFA TOTAL_BUFS_KCRFA STRAND_SIZE_KCRFA INDX
---------------- ---------------- ---------------- ---------------- ----------------- ----------
00 00000000726BD000 0000000075166E00 87376 44736512 0
00 0000000075167000 0000000077C10E00 87376 44736512 1
00 0000000077C11000 000000007A6BAE00 87376 44736512 2
00 000000007A6BB000 000000007D164E00 87376 44736512 3
00 000000007D165000 000000007FC0EE00 87376 44736512 4
00 000000007FC0F000 00000000826B8E00 87376 44736512 5

As you can see from the output, I have 6 strands of around 43MB each. So my log_buffer must be 43 * 6 ~= 256 MB

SQL>select value from v$parameter where name = 'log_buffer';
VALUE
--------------------
268419072

You can also see that for 6 public redo strands, we have 6 latches to protect those memory structures.

SQL>select ADDR, LATCH#, CHILD#, NAME from v$latch_children where name like 'redo allocation%' order by child#;
ADDR LATCH# CHILD# NAME
---------------- ---------- ---------- ------------------------------
0000000FE1971430 187 1 redo allocation
0000000FE19714D0 187 2 redo allocation
0000000FE1971570 187 3 redo allocation
0000000FE1971610 187 4 redo allocation
0000000FE19716B0 187 5 redo allocation
0000000FE1971750 187 6 redo allocation
6 rows selected.

Please note that I am not talking about private redo strands in this post. Private redo strands and in memory undo can be configured separately to further reduce the contention and improve performance. I will talk about private redo strands in my next post.

The Myth

One of the myth is that log switch happens whenever online redo log file is 100% full
Well, its not like that. There is an internal algorithm that determines the log switch moment. This also has a very good reason because Oracle research finds that doing the log switch at the last moment could incur performance problems. As a result, after the log switch occurs, the archivers are copying only the actual information from the redo log files. Since the redo logs are not 100% full after the log switch and the archive logs are not blank padded after the copy operation has finished, this results in uneven, smaller files than the original redo log files.

How log switch happens

So there is an internal algorithm which decides when log switch should happen. I dont know if this internal algorithm is publicized anywhere, but based on my understanding and putting togetther many random articles, I think here is what happens.

When the RDBMS switches into a new online redo logfile, all the log buffer redo strand memory is “mapped” to the logfile space. If the logfile is larger than the log buffer then each strand will map/reserve its strand size worth of logfile space, and the remaining logfile space (the “log residue”) is still available.

Following figure shows the situation

If online redo log file size is smaller than log buffer then the whole logfile space is divided/mapped/reserved equally among all the strands, and there is no unreserved space (ie no log residue).

Following figure shows the situation

When any process fills a strand such that all the reserved underlying logfile space for that strand is used, AND there is no log residue, then a log switch is scheduled.

Consider first case where we have residue space available in online redo logs. Now when a strand is full it will look out for another “mapping” space in redo log file equivalent to strand size. If its available it will use it.

This will continue until last set of equivalue strand space available in online redo log is allocated as shown in figure below

Now at this point if one of the strand is full and it doesnt have space available, it will trigger log switch. So if other strands are not full and we have space available in other strands, that space gets wasted. So if you have 4 strand and at the end one strand is full, you will have space wastage equivalent to size of 3 strands (which are not full).

But since online redo log file is big enough wastage of space equivalue to 3 strands doesnt really look big.

Online redo log size – 1GB
Space wastage – 3 strands = 3 * 64MB = 192MB

So archive log size would be around (1GB – 192MB) = 810MB
Around 20% wastage

Consider second case where we dont have any residue space available. With initial allocations, strands are mapped to online redo logs and when one of the strand is full, it will trigger a log switch as there is no space available in online redo log file. So same space got wasted (3 strands), but in this case since online redo log file is small we see considerable wastage of space.

Online redo log size – 256MB
Space wastage – 3 strands = 3 * 64MB = 192MB

So archive log size would be around (256 MB – 192 MB) = 64 MB
Around 75% wastage.

Disadvantages of small redo logs

So you can see if you online redo logs are very small (size near to log buffer), you will end up wasting lot of space in online redo logs causing frequent log switches.

Suggestions

Some suggestion to reduce log switches and to make use of all space in online redo logs

Make online redo log files atleast 4-6 times log buffer size. This ensures we have enough number of allocations of strands before log switch happens
Check optimum value of _log_parallelism_max. While Oracle decides default based on CPU, I would say its better to have lower value. Lower value will make less number of strands and wastage will reduce.
Check for optimum value of log_buffer. default value is usually 1MB * number of CPU. In my opinion its not good to go beyond 128MB. But it may vary with system and in your system you may find higher value to be better.

References

Oracle Forum – https://forums.oracle.com/thread/2346745

https://nzdba.wordpress.com/2011/12/18/hollow-redo-logs/

Doc ID 1356604.1

Month: September 2013

Redo Behavior