Latches: What do we know?

Latches are a low-level serialization mechanism that protects memory structures inside the SGA. They are lightweight and less sophisticated than enqueues, and they can be acquired and released very quickly.

Latch acquisition does not involve any complex algorithm. It is based on an atomic test-and-set instruction provided by the processor.
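As a minimal sketch of the idea, assuming Python purely as a modeling language: a latch is essentially a single word that a process atomically tests and sets. Python exposes no raw test-and-set instruction, so the non-blocking `threading.Lock.acquire(blocking=False)` stands in for it here. The class and method names are hypothetical and are not Oracle internals.

```python
import threading


class TestAndSetLatch:
    """Toy latch modeled on an atomic test-and-set word (hypothetical names)."""

    def __init__(self) -> None:
        # Stand-in for the latch word. acquire(blocking=False) behaves like
        # test-and-set: it checks "free?" and marks it "held" in one atomic step.
        self._word = threading.Lock()

    def try_get(self) -> bool:
        """One atomic attempt: True if we now hold the latch, False if it was busy."""
        return self._word.acquire(blocking=False)

    def release(self) -> None:
        """Clear the latch word so another process can set it."""
        self._word.release()


latch = TestAndSetLatch()
print(latch.try_get())  # True: the latch was free
print(latch.try_get())  # False: the latch is already held
latch.release()
print(latch.try_get())  # True: free again after release
```

There is no queue and no waiting in this primitive; everything else (spinning, sleeping, posting) is built on top of repeated single attempts like `try_get`.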

Latch Classification:

Latches can be classified in multiple ways:

Shared latch and exclusive latch:

A shared latch is one that can be held by multiple processes/sessions at the same time.

For example, if one session wants to read a block in memory and another session wants to read the same block at the same time, both can acquire the latch in shared mode. An example of a shared latch is the cache buffers chains latch (wait event “latch: cache buffers chains”). This latch was exclusive in older versions, and Oracle changed it to a shared latch from Oracle 9i onwards.

An exclusive latch is acquired when a session wants to modify a block or memory area. An exclusive latch can be held by only one session/process at a time and is not compatible with any other get on the same latch, whether shared or exclusive.

Shared and exclusive gets are not compatible with each other. A process holding a latch in shared mode blocks other processes that need it in exclusive mode, but allows other processes that need it in shared mode on the same memory area. Similarly, a process holding a latch in exclusive mode blocks every other process, whether it needs shared or exclusive mode.
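The compatibility rules above reduce to a small matrix. The sketch below (Python, hypothetical names) encodes them as a function: a request is granted only when both the current holder and the requester want shared mode.

```python
from enum import Enum


class Mode(Enum):
    SHARED = "S"
    EXCLUSIVE = "X"


def compatible(held: Mode, requested: Mode) -> bool:
    """Only shared + shared can coexist; any combination involving exclusive conflicts."""
    return held is Mode.SHARED and requested is Mode.SHARED


for held in Mode:
    for requested in Mode:
        verdict = "compatible" if compatible(held, requested) else "blocked"
        print(f"held={held.value} requested={requested.value} -> {verdict}")
# held=S requested=S -> compatible
# held=S requested=X -> blocked
# held=X requested=S -> blocked
# held=X requested=X -> blocked
```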

An example of exclusive latches is the library cache latches of earlier versions. They were taken in exclusive mode even just to traverse a hash bucket. These latches no longer exist in 11g; they have been replaced by mutexes.

According to Andrey Nikolaev, 460 out of 551 latches are exclusive in Oracle. With each new version, Oracle tries to make more latches shareable in order to reduce contention.

Another classification of latches is immediate versus willing-to-wait.

Immediate latches and willing-to-wait latches:

Immediate latches are those that a session tries to acquire immediately, without waiting. If the latch is not available, the session does not wait: the get simply fails, and the session can go on to do something else, such as check for another latch. An example of an immediate latch is the redo copy latch. It is an immediate latch because Oracle only wants to know whether anyone else is currently copying redo data into the log buffer, not who exactly is copying or where, as this does not matter to LGWR.
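A no-wait get never queues: it either succeeds at once or reports failure. As an illustrative sketch (Python; `immediate_get_any` and the list of child latches are hypothetical, and the idea that a caller moves on to another child latch is an assumption matching "check for another latch" above), a caller facing several child latches of the same type can simply try the next one instead of waiting.

```python
import threading
from typing import List, Optional


def immediate_get_any(children: List[threading.Lock]) -> Optional[int]:
    """Try each child latch once in no-wait mode.

    Returns the index of the child acquired, or None if every child was busy.
    Nothing here ever blocks: acquire(blocking=False) returns immediately.
    """
    for i, child in enumerate(children):
        if child.acquire(blocking=False):
            return i
    return None


children = [threading.Lock() for _ in range(3)]
children[0].acquire()               # child 0 is already held by another process
print(immediate_get_any(children))  # 1: skipped the busy child instead of waiting
print(immediate_get_any(children))  # 2
print(immediate_get_any(children))  # None: all busy, the caller must do something else
```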

Willing-to-wait latches are those for which the process waits if the latch is not available. Their behavior is a little more complex than that of immediate latches. Most latches in Oracle are willing-to-wait latches.

Why do we see wait events on shared latches?

After reading about shared latches and exclusive latches, one question comes to mind regarding shared latches.

If the cache buffers chains latch is a shared latch (from 9i onwards) and shared gets don't block each other, why do we still see “latch: cache buffers chains” wait events even in the latest versions of Oracle?

The answer has been explained by Andrey Nikolaev, who gives a very practical example of why this happens. When multiple processes acquire a latch in shared mode (for example, a CBC latch), we should not see any wait events or processes blocking each other. The moment another session tries to acquire the same latch in exclusive mode, however, the exclusive request is given precedence over shared requests. So the session requesting exclusive mode gets access to the resource and blocks all the other sessions that want the latch in shared mode, forming a queue of sessions waiting for shared access. The strange part of this algorithm is that even after the exclusive holder releases the latch, the waiting processes do not acquire it in shared mode concurrently. They remain in the latch wait queue and get shared access one by one: the first process in the queue gets the latch, and once it is done, it releases the latch and posts the next process that the latch is available. This is one of the major reasons why we see wait events on shared latches.
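A back-of-the-envelope model shows why the one-by-one hand-off hurts. The sketch below (Python; the function is hypothetical, and the 5 µs hold time is an arbitrary illustrative figure, not a measurement) ignores the exclusive holder's own hold time and any posting overhead, and only counts the extra wait each queued shared waiter suffers after the exclusive release.

```python
from typing import List


def shared_wait_times(n_waiters: int, hold_us: int, one_by_one: bool) -> List[int]:
    """Extra wait (in microseconds) for each queued shared-mode waiter after the
    exclusive holder releases the latch.

    one_by_one=True:  waiters are posted in turn, each only after the previous
                      one releases its shared get (the behavior described above).
    one_by_one=False: all waiters would get the latch in shared mode at once.
    """
    if one_by_one:
        # waiter k must wait for the k waiters ahead of it to finish
        return [k * hold_us for k in range(n_waiters)]
    return [0] * n_waiters


serial = shared_wait_times(4, hold_us=5, one_by_one=True)
concurrent = shared_wait_times(4, hold_us=5, one_by_one=False)
print(serial, sum(serial))          # [0, 5, 10, 15] 30
print(concurrent, sum(concurrent))  # [0, 0, 0, 0] 0
```

The total extra wait grows quadratically with the queue length under one-by-one posting, even though every waiter only wants shared access.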

Process flow for latch acquisition

The following is the process flow for acquiring a latch:

 - Immediate latch gets 
       - spin latch gets
             - Add process to queue of latch waits
                   - Sleep until posted
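The four steps above can be sketched as a toy exclusive latch. This is a minimal Python model under stated assumptions, not Oracle's implementation: the class and method names are hypothetical, a `threading.Lock` guards the latch word to make test-and-set atomic, and on release the latch is handed straight to the first queued waiter (a simplification; a real implementation may instead make the woken process retry the get).

```python
import threading
import time
from collections import deque


class PostWaitLatch:
    """Toy exclusive latch: immediate get, spin, queue, sleep until posted."""

    def __init__(self, spin_count: int = 2000) -> None:
        self.spin_count = spin_count
        self._held = False
        self._guard = threading.Lock()   # makes test-and-set of _held atomic
        self._waiters: deque = deque()   # FIFO queue of latch waits (one Event each)

    def _try_get(self) -> bool:
        with self._guard:
            if self._held:
                return False
            self._held = True
            return True

    def get(self) -> str:
        """Acquire the latch; return which step of the flow succeeded."""
        if self._try_get():                  # 1. immediate latch get
            return "immediate"
        for _ in range(self.spin_count):     # 2. spin latch gets
            if self._try_get():
                return "spin"
        posted = threading.Event()
        with self._guard:                    # 3. add process to queue of latch waits
            if not self._held:               # final check avoids a lost wakeup
                self._held = True
                return "spin"
            self._waiters.append(posted)
        posted.wait()                        # 4. sleep until posted
        return "posted"                      # release() handed the latch over

    def release(self) -> None:
        with self._guard:
            if self._waiters:
                self._waiters.popleft().set()  # post next waiter; latch stays held
            else:
                self._held = False

    def waiters(self) -> int:
        """Number of processes currently queued for this latch."""
        with self._guard:
            return len(self._waiters)


latch = PostWaitLatch(spin_count=100)
print(latch.get())                 # immediate: the latch was free

outcome = []
waiter = threading.Thread(target=lambda: outcome.append(latch.get()))
waiter.start()
while latch.waiters() == 0:        # wait until the second thread has queued
    time.sleep(0.001)
latch.release()                    # the holder posts the queued waiter
waiter.join()
print(outcome[0])                  # posted
```

The waiter consumes no CPU while queued; it is woken only by the holder's `release()`, which is the "sleep until posted" behavior.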

The last step in the process flow (sleep until posted) is a change in behavior. Up to 9i, a waiting process used to wake up periodically to check whether the latch had been freed and, if it was still not available, go back to sleep. So the process flow in 9i used to be the following:

 - Immediate latch gets 
       - spin latch gets
             - Add process to queue of latch waits
                   - Sleep for fixed time and wake up
             - immediate latch gets
                   - Sleep for fixed time and wake up
             - immediate latch gets
                   - Sleep for fixed time and wake up
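For contrast, the older periodic-wakeup loop can be sketched as follows. This Python model is hypothetical: `try_get` stands for any single immediate get attempt, the spin phase is omitted for brevity, and the fixed sleep time is an arbitrary illustrative value.

```python
import time
from typing import Callable


def get_with_periodic_wakeup(try_get: Callable[[], bool], sleep_s: float = 0.001) -> int:
    """Older style (as described above): sleep a fixed time, wake up, retry.

    Returns how many times the process slept before the get succeeded.
    No one posts this process; it only notices the latch is free by polling.
    """
    sleeps = 0
    while not try_get():       # immediate latch get
        time.sleep(sleep_s)    # sleep for fixed time and wake up
        sleeps += 1
    return sleeps


attempts = iter([False, False, False, True])  # latch becomes free on the 4th try
print(get_with_periodic_wakeup(lambda: next(attempts)))  # 3
```

Each wakeup costs a context switch whether or not the latch is free, and a latch freed just after the process falls asleep stays unused until the next wakeup.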

This behavior changed from 10g onwards: the holding process wakes up (posts) the waiting process after the latch becomes free. This has been explained well by Tanel Poder, and the behavior is explained in more detail by Alex Fatkulin.

Sometimes the holding process failed to wake up the process waiting for the latch, because of bugs or older kernel versions. So older versions of Oracle used a default timeout: if the waiting process missed the wakeup call, it would not wait indefinitely but would wake up after the timeout. Oracle introduced a parameter, _enable_reliable_latch_waits={true|false}, that alters this behavior. If it is set to true, no timeout is used and the waiting process continues to sleep until it is posted by the holding process. If it is set to false, the timeout behavior applies.
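The effect of that parameter on the sleep step can be sketched like this. The Python function and its `reliable_waits` argument are hypothetical stand-ins for _enable_reliable_latch_waits, and the timeout value is arbitrary; only the true/false semantics come from the text above.

```python
import threading


def sleep_until_posted(posted: threading.Event, reliable_waits: bool,
                       timeout_s: float = 0.01) -> bool:
    """Sleep step of a latch get, with or without a safety timeout.

    reliable_waits=True  ~ _enable_reliable_latch_waits=true: no timeout, the
                           waiter relies entirely on being posted.
    reliable_waits=False ~ the older behavior: wake up after timeout_s even if
                           the post never arrives, so a missed post cannot
                           hang the waiter forever.
    Returns True if woken by a post, False if woken by the timeout (the caller
    would then retry the latch get).
    """
    if reliable_waits:
        posted.wait()
        return True
    return posted.wait(timeout=timeout_s)


missed = threading.Event()        # the holder never posts (e.g. because of a bug)
print(sleep_until_posted(missed, reliable_waits=False))     # False: the timeout rescued us

delivered = threading.Event()
delivered.set()                   # the holder posted the waiter
print(sleep_until_posted(delivered, reliable_waits=True))   # True
```

Calling it with `missed` and `reliable_waits=True` would block forever, which is exactly the risk the timeout was meant to cover.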

The only latch that is an exception to the above process flow is the “process allocation” latch. A process waiting for this latch does not depend on the holding process to post it when the latch is free; it wakes up periodically to try to acquire the latch.

Latch Behavior

Oracle has introduced a few parameters to control latch behavior. We will briefly discuss those parameters and how they affect shared and exclusive latches.

When a process tries to acquire a latch, it first attempts to get it in immediate mode. If the latch is not available, the process spins a defined number of times, retrying the get. If the latch is still not free, the process goes to sleep.

Process Spin:

Spinning means deliberately burning CPU so that the process stays “ON CPU” and is not descheduled. While spinning, the process burns CPU for a few microseconds in the hope that by then the latch will be available to acquire. This increases CPU consumption, but it saves time because it may avoid the need for the process to sleep, which is expensive as it involves context switches.
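Python cannot show the cost of a context switch directly, but it can show the other side of the trade-off: a spinning process consumes CPU for the whole wait, while a sleeping one consumes almost none. The function below is a hypothetical illustration; `time.process_time()` reports the CPU time used by the current process.

```python
import time


def cpu_used_while_waiting(seconds: float, spin: bool) -> float:
    """CPU time this process consumes while waiting `seconds` of wall-clock time.

    spin=True  busy-loops and stays ON CPU, like latch spinning.
    spin=False calls time.sleep and gives up the CPU, like a latch sleep.
    """
    cpu_start = time.process_time()
    if spin:
        deadline = time.perf_counter() + seconds
        while time.perf_counter() < deadline:
            pass                      # burn CPU, re-checking the clock
    else:
        time.sleep(seconds)
    return time.process_time() - cpu_start


print(f"spin : {cpu_used_while_waiting(0.2, spin=True):.3f} s CPU")
print(f"sleep: {cpu_used_while_waiting(0.2, spin=False):.3f} s CPU")
```

Exact figures depend on the machine and its load, but the spinning wait typically reports close to its full duration as CPU time, while the sleep reports close to zero.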

The number of times a process spins to acquire a latch depends on the type of latch.

Exclusive latch:

For exclusive latches, this “can be” controlled by _spin_count, but changing it requires a database bounce. I say “can be” because the spin count for exclusive latches is actually determined by the SPIN column of the x$ksllclass table.

SQL> select indx, spin from x$ksllclass;

      INDX       SPIN
---------- ----------
         0      20000
         1      20000
         2      20000
         3      20000
         4      20000
         5      20000
         6      20000
         7      20000

8 rows selected.

There are 8 classes of latches (as indicated by the INDX column), which we will discuss later in this article. By default, all latches belong to class 0.
If we want to change the spin count for exclusive latches, we need to change the value of the SPIN column for class 0. This can be done by changing _spin_count and bouncing the instance (but that changes the spin count for all classes), or by setting the _latch_class_0 parameter (which changes the spin count for class 0 only). There are similar parameters to change the spin count for the other classes (_latch_class_1 to _latch_class_7).

So changing _spin_count is not a good idea. Instead, we can move the specific latch whose spin count we want to change into another class (1-7) and set the corresponding underscore parameter (_latch_class_1 to _latch_class_7).
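The difference between the two approaches can be modeled with a small lookup table. This Python sketch is hypothetical throughout: `LatchSpinConfig`, its methods and the latch names are illustrative only, and `move_latch` is a placeholder, since how a latch is actually reassigned to another class is not covered here.

```python
from typing import Dict


class LatchSpinConfig:
    """Toy model of per-class spin counts, mirroring x$ksllclass.SPIN."""

    NUM_CLASSES = 8

    def __init__(self, default_spin: int = 20000) -> None:
        self.class_spin: Dict[int, int] = {c: default_spin for c in range(self.NUM_CLASSES)}
        self.latch_class: Dict[str, int] = {}   # latches not listed are in class 0

    def set_spin_count(self, value: int) -> None:
        """Like changing _spin_count (plus a bounce): affects every class."""
        for c in self.class_spin:
            self.class_spin[c] = value

    def set_class_spin(self, cls: int, value: int) -> None:
        """Like setting _latch_class_<cls>: affects only that class."""
        self.class_spin[cls] = value

    def move_latch(self, latch: str, cls: int) -> None:
        """Placeholder for reassigning one latch to another class."""
        self.latch_class[latch] = cls

    def spins_for(self, latch: str) -> int:
        return self.class_spin[self.latch_class.get(latch, 0)]


cfg = LatchSpinConfig()
cfg.move_latch("hot latch", 1)        # hypothetical latch name
cfg.set_class_spin(1, 5000)
print(cfg.spins_for("hot latch"))     # 5000: only the moved latch is affected
print(cfg.spins_for("other latch"))   # 20000: still the class 0 default
```

Calling `set_spin_count` instead would have changed the spin count for every latch in every class.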

By default, the _spin_count parameter does not apply to exclusive latches: the default value of _spin_count is 2000, whereas exclusive latches spin 20000 times, as shown in the x$ksllclass output above.
However, explicitly changing _spin_count makes it apply to exclusive latches as well.

Shared Latch:

For shared latches, the number of spins is _spin_count * 2, as demonstrated by Andrey Nikolaev. Also, the _spin_count parameter applies to shared latches by default. Since the default value of _spin_count is 2000, shared latches spin 4000 times (2000 * 2).
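The spin-count rules for both latch types can be summarized as a small function. This Python sketch only restates the rules given in this article (default _spin_count of 2000, class spin of 20000, shared latches spinning twice _spin_count); the function and constant names are hypothetical.

```python
from typing import Optional

DEFAULT_SPIN_COUNT = 2000    # default _spin_count, per the text
DEFAULT_CLASS_SPIN = 20000   # x$ksllclass.SPIN for class 0, per the output above


def effective_spins(shared: bool, spin_count: Optional[int] = None,
                    class_spin: int = DEFAULT_CLASS_SPIN) -> int:
    """Spins per get according to the rules described in this article.

    spin_count=None means _spin_count was left at its default.
    """
    if shared:
        # Shared latches always honor _spin_count, and spin twice that many times.
        return 2 * (spin_count if spin_count is not None else DEFAULT_SPIN_COUNT)
    # Exclusive latches use their class spin value unless _spin_count was set.
    return class_spin if spin_count is None else spin_count


print(effective_spins(shared=True))                    # 4000
print(effective_spins(shared=False))                   # 20000
print(effective_spins(shared=True, spin_count=5000))   # 10000
print(effective_spins(shared=False, spin_count=5000))  # 5000
```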

Diagnosing latch contention:

I am not going to say much here, because Tanel Poder has already written great scripts, latchprof.sql and latchprofx.sql, which can be used to analyze latch wait events.

Tanel has also written a great article on how to diagnose latch wait events.

In the next article, I will try to cover mutexes.

Hope this helps!!