CM terminated with status 139 (Segmentation fault) – Oracle Apps 11i

Problem Statement:

On one of my instances, I faced an issue with the concurrent managers: none of them were coming up. When I checked the Internal Monitor (which is responsible for bringing up the Internal Manager), it was showing the status Deactivated.
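Before digging into log files, a quick sanity check I usually run as the APPS user is to query the concurrent queue definitions directly. This is only a sketch; FNDICM (Internal Concurrent Manager) and FNDCRM (Conflict Resolution Manager) are the standard seeded queue names:

-- Are any manager processes actually running? (run as APPS)
SELECT concurrent_queue_name, max_processes, running_processes
FROM fnd_concurrent_queues
WHERE concurrent_queue_name IN ('FNDICM', 'FNDCRM');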

Click on Internal Monitor in CM Administration Screen -> Process -> Internal Manager logs

I saw the following error:

========================================================================
Starting PQP10MS6_0729@PQP10MS6 Internal Concurrent Manager -- shell process ID 14385

logfile=/slot02/appmgr/PQP10MS6comn/admin/log/PQP10MS6/PQP10MS6_0729.mgr
PRINTER=noprint
mailto=appmgr02
restart=N
diag=N
sleep=60 (default)
pmon=20 (default)
quesiz=1 (default)

+---------------------------------------------------------------------------+
Application Object Library: Concurrent Processing version 11.5
Copyright (c) 1979, 1999, Oracle Corporation. All rights reserved.
Internal Concurrent Manager started : 29-JUL-2008 22:34:31
+---------------------------------------------------------------------------+

Spawned Process 14472
Process monitor session started : 29-JUL-2008 22:34:31
Spawned Process 14473
Spawned Process 14474
Starting INVTMRPM Concurrent Manager : 29-JUL-2008 22:34:31

Spawned Process 14475
sh: line 71: 14390 Segmentation fault (core dumped) FNDLIBR FND CPMGR "FNDCPMBR sysmgr=\"\" $maxreq $sleep $pmon $quesiz $diag logfile=$logfile $target" <<STOP $sysmanager STOP

The PQP10MS6_0729@PQP10MS6 internal concurrent manager has terminated with status 139 - giving up.

Cause:

The stub files are incorrect and refer to the iAS stubs. Check Metalink note 343249.1 for more details.

Solution:

Apply patch 3830807: PATCH 3293983 NEEDS TO INCLUDE ALL THE MAKEFILES UNDER THE 8.0.6 ORACLE_HOME

Once the patch is applied, relink all executables from adadmin -> 2. Maintain Applications Files menu -> 1. Relink Applications programs.

Bounce all services. This will resolve the issue.

Hope this helps !!

References:

Metalink note ID 343249.1

Error Viewing CM log files – “File server could not verify its initialization parameters” – Oracle Apps 11i

Sometimes we face the following error while viewing CM log files from the Concurrent Manager Administer screen.

‘File server could not verify its initialization parameters’

Cause:

The issue is caused by a syntax error in the listener.ora file on the appmgr side. If you check the APPLFSTT parameter in listener.ora (cd $TNS_ADMIN), you will see there are spaces between the values.

Example:

APPLFSTT=PQP10MS6_806_BALANCE; PQP10MS6_FO; PQP10MS6_BALANCE; PQP10MS6

FNDFS cannot interpret the spaces between the values.

Workaround:

Remove the spaces between the values (so the line reads APPLFSTT=PQP10MS6_806_BALANCE;PQP10MS6_FO;PQP10MS6_BALANCE;PQP10MS6) and bounce the apps listener.

Permanent Fix:

As per metalink note ID 304568.1, apply patch 4244610 to the environment.

Hope this helps !!

Enabling Forms Tracing – Oracle E-Business Suite R12

Introduction:

Sometimes we need to diagnose an issue or error occurring in forms. For such situations we need more information about the problem we are facing, and one of the best ways to get it is tracing.

Many types of tracing are available in forms. We will discuss a couple of methods for enabling tracing and see how they can help us diagnose an issue.

Enabling SQL Tracing

First let us see how to enable normal SQL tracing. In my case I have a form where I enter a few values and then save; saving fires some insert and update statements in the backend. SQL tracing is used to capture those DML statements so the issue can be debugged. When we open the form, the option is available from the menu:

Help -> Diagnostic -> Trace ->

Here we have many options available (the default is 'No Trace'). We can enable tracing by selecting one of the options: 'Regular Trace' gives the least information and 'Trace with Binds and Waits' (level 12) gives the maximum. Be careful while enabling SQL trace with waits and binds, as this will make the trace file huge.

We can also start tracing in the middle of a forms session. For example, in my case I wanted to capture only the insert/update DML statements fired when the form is saved, so after filling in all the field values we can start tracing and the initial select statements will not appear in the trace file.

When we enable the trace, it displays the trace file location (this is the directory pointed to by the USER_DUMP_DEST parameter of the database).
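If you miss the pop-up, a simple query such as the following (run as a DBA user) shows the same directory:

SELECT value FROM v$parameter WHERE name = 'user_dump_dest';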

After you save the form you can stop tracing by selecting ‘No Trace’ again from Help -> Diagnostic -> Trace -> No Trace

Use tkprof to convert the trace file into a readable format.

-bash-2.05b$ tkprof md9ys210_ora_20412_MFG.trc trace_sql.txt

TKPROF: Release 10.2.0.4.0 – Production on Mon Jul 28 23:54:31 2008

Copyright (c) 1982, 2007, Oracle.  All rights reserved.

-bash-2.05b$

Enabling Forms Tracing

Another way of debugging is to enable tracing at the forms level for a particular user. This method is explained in Metalink note 373548.1. By default, it will trace errors only. To utilize the full flexibility of Forms Trace, the record=forms parameter must be used in conjunction with the tracegroup parameter.

Applications Development has created several recommended levels of tracing, which are predefined in the file $ORA_CONFIG_HOME/10.1.2/forms/server/ftrace.cfg

  • light (0-71,192): Records user actions, error messages and form service events.
  • medium (0-97,192,194,195): Includes the events in the light tracegroup and also captures built-in and user-exit events.
  • full (0-98,100-199): Captures all information that is currently available through Forms Tracing.
  • dbsql (32,64,98,100,101,136,137,192): All events related to the database and forms' interaction with it.
  • network (64,32,128,129,131,130,132,133,134,192): All events related to communication between the client tier and the Forms server.

In my case I am enabling the tracing for SQLs.

Please follow the steps below to enable form-level tracing.

  1. Sign on to Oracle Applications as a user with the System Administrator responsibility.
  2. Select the System Administrator responsibility.
  3. Select the Profile -> System function (this will launch Forms).
  4. Search for the Applications user and the profile option Forms Runtime Parameters.
  5. Add the required parameters, using the syntax: record=forms tracegroup=dbsql
    You can use any other value also for tracegroup depending on your requirement. Examples:

    record=forms tracegroup=medium

    Note the space between the parameter/value pairs.

  6. Commit the changes and log out of Applications.
  7. Restart the Forms OC4J group using either adformsctl.sh (servlet) or adformsrvctl.sh (socket).
  8. Log in to Applications as the user whose Forms Runtime Parameters profile option was amended.

Before doing this, the Metalink note also mentions appending "record=forms" to the end of the ICX: Forms Launcher profile for that user. Example:

http://rws60048rems.us.oracle.com:8058/forms/frmservlet?record=forms

But when I tried that, I got the following error:

FRM-90926: Duplicate Parameter on command line.

So I removed record=forms from the profile and was able to generate the trace files without any issue. This may be an issue specific to my instance.

Once you set the 'Forms Runtime Parameters' profile, bounce forms and access a form again; a trace file will be generated at the location defined by FORMS_TRACE_DIR. After you finish navigating in the form and exit, the trace file will be present in the FORMS_TRACE_DIR location.

This file is in binary format and cannot be read directly. You need to convert it into .txt or .html format using the following command:

java -cp $ORACLE_HOME/forms/java/frmxlate.jar oracle.forms.diagnostics.Xlate datafile=forms_8842.trc outputfile=forms_trace1.html outputclass=WriteOutHTML

Here,

datafile=forms_8842.trc is the name of the trace file that was created by the activity in the form.

outputfile=forms_trace1.html is the name of the output file that will be created. You can use .html or .txt.

outputclass=WriteOutHTML decides whether the output file is HTML or text: WriteOut produces a text file, WriteOutHTML produces an HTML file.

The output HTML file will look like this.

Hope this helps !!

References:

Metalink note ID 373548.1

Tuning JVM in Oracle E-Business Suite

Introduction:

This post is based on some interesting articles that I came across; I have provided links to them in the reference section.

The other main inspiration for this article is the performance issues that I face daily in my ST applications.

JVMs are used for a wide variety of applications, from small applets on desktops to web services on large servers, and each JVM manages its memory using one garbage collector or another. In the J2SE platform version 1.4.2 there were four garbage collectors to choose from, but unless the user made an explicit choice the serial collector was always used. In version 5.0 the explicit choice is still optional, but intelligence was added to select a suitable collector automatically based on the class of machine on which the application starts.
Usually the choice of garbage collector (GC) does not matter, as applications perform within their limits, but for some applications it does; for example, a large application that scales to a large number of threads, processors, sockets, and a large amount of memory. Experience shows that smaller applications usually have no problem with GC and run smoothly even with the serial collector, but for a large application it can become a major bottleneck. For a sufficiently large system it is well worthwhile to choose the right garbage collector and to tune it if necessary.

This post is based on the JVM used in Oracle E-Business Suite R12. We will talk about garbage collection in R12 and look at an interesting tool that gives a quick analysis of how GC is behaving in our application.

Generational Collections:

An object is considered garbage when it can no longer be reached from any pointer in the running program. The most straightforward garbage collection algorithms simply iterate over every reachable object. Any objects left over are then considered garbage. The time this approach takes is proportional to the number of live objects, which is prohibitive for large applications maintaining lots of live data.
Beginning with the J2SE platform version 1.2, the virtual machine incorporated a number of different garbage collection algorithms that are combined using generational collection. Let's understand what that means.

Past experience shows that most objects are only needed for a short span of time. The blue area in the diagram below is a typical distribution of object lifetimes: the X axis is the object lifetime measured in bytes allocated, and the Y axis is the total bytes in objects with the corresponding lifetime. The sharp peak on the left shows that the majority of objects can be reclaimed within a short span of time after being allocated.

Some objects do live longer, and so the distribution stretches out to the right. Some applications have very different looking distributions, but a surprisingly large number possess this general shape. Efficient collection is made possible by focusing on the fact that a majority of objects "die young".

To exploit this behaviour the JVM divides memory into different generations, that is, pools holding objects of different ages. Garbage collection occurs in each generation when that generation fills up. Objects are initially allocated in the young generation (the pool where young objects are placed); once the young generation fills up, a minor GC (garbage collection) takes place and cleans up the dead objects (since by that time most of the short-lived objects are already dead). The leftover objects that are still alive are promoted to the older generation (called the tenured generation). When the tenured generation gets full, a full GC takes place, which is quite slow compared to a minor GC because all the live objects are examined.

The figure below shows the different pools in memory. The young generation consists of three sub-spaces: one eden space and two survivor spaces. Objects are initially allocated in eden. One survivor space is empty at any time and serves as the destination of the next copying collection of any live objects in eden and the other survivor space. Objects are copied between the survivor spaces in this way until they are old enough to be tenured, that is, copied to the tenured generation.

The tenured generation is bigger than the young generation.

The permanent generation holds data needed by the virtual machine to describe objects that do not have an equivalent at the Java language level. For example, objects describing classes and methods are stored in the permanent generation.

JVM in Oracle E-Business Suite R12

In Oracle E-Business Suite R12 we specify the minimum and maximum JVM memory sizes to be used by the different processes.
For example, in the application CONTEXT_FILE we specify JVM options for:

  1. Forms process using – “forms_jvm_start_options” parameter
  2. OACore process using – “oacore_jvm_start_options” parameter
  3. OAFM process using – “oafm_jvm_start_options” parameter

Let's look at the various properties we specify for these parameters. Example for OACore:

<oacore_jvm_start_options oa_var="s_oacore_jvm_start_options">-server -verbose:gc -Xmx1024M -Xms256M -XX:MaxPermSize=256M -XX:NewRatio=2 -XX:+PrintGCTimeStamps -XX:+UseTLAB -XX:+UseParallelGC -XX:ParallelGCThreads=2 -Dcom.sun.management.jmxremote -Djava.security.policy=$ORACLE_HOME/j2ee/oacore/config/java2.policy -Djava.awt.headless=true -Dhttp.webdir.enable=false -Doracle.security.jazn.config=/slot/ems2339/appmgr/inst/apps/pz8st210_rws60056rems/ora/10.1.3/j2ee/oacore/config/jazn.xml</oacore_jvm_start_options>

-verbose:gc : Tells the JVM to write a log entry to the oacore log file whenever garbage collection happens. The log file is located at $LOG_HOME/ora/10.1.3/opmn. We will use this log file further down, when I demonstrate the GCViewer tool.

-Xmx1024M : The maximum heap size of the JVM.

-Xms256M : The initial heap size of the JVM. So at startup, "young generation" + "tenured generation" = the -Xms value.

-XX:MaxPermSize : The size of the permanent generation.

-XX:NewRatio : The ratio of tenured generation size to young generation size.

-XX:+PrintGCTimeStamps : Prints a timestamp with every garbage collection entry.

In my case the log file is $LOG_HOME/ora/10.1.3/opmn/default_group~oacore~default_group~1.log (this is the log file for oacore; similarly there are log files for forms and oafm in the same location).
Sample content of the log file is given below.

94562.018: [GC 670227K->595360K(892672K), 0.0221060 secs]
94617.600: [GC 672480K->617324K(799104K), 0.0307160 secs]
94648.483: [GC 694444K->623826K(872384K), 0.0405620 secs]
94706.754: [Full GC 756173K->264184K(790720K), 0.8990440 secs]
94718.575: [GC 458782K->424403K(737536K), 0.0471040 secs]
94740.380: [GC 501646K->436633K(793600K), 0.0656750 secs]
94817.197: [GC 512473K->441116K(795136K), 0.0749340 secs]

Here the first column (94562.018, 94617.600, ...) is the time in seconds, since JVM startup, at which the GC happened. Inside the square brackets it says whether it was a minor GC or a Full GC, followed by a figure such as 670227K->595360K: the number to the left of -> is the occupied heap before the collection and the number to the right is the occupied heap after it. The number in parentheses (892672K) is the total heap size available at that point. The number after the comma is the time the collection took; for example, the first row shows the GC completed in 0.0221060 secs.

Having understood this much, let me introduce a tool called GCViewer, a Java-based tool that presents the content of this log file as graphs, giving a wealth of information that will help us tune our application.

GC Viewer Tool

1) Download and installation :- You can download this tool from this location.
While downloading for use on Windows you can just download the bin file (no need to download the source file) – gcviewer-1.29-bin.zip

Unzip the file and you will get a folder containing a jar file. Double-click the jar file and the application will open as shown below.

2) Uploading the log file :- You can either ftp the oacore log file from $LOG_HOME/ora/10.1.3/opmn or you can give the tool a URL for the log file. To use a URL, copy the log file to the OA_HTML location as shown below.

-bash-3.00$ cp default_group~oacore~default_group~1.log $OA_HTML

In this case URL becomes – http://(hostname):(port)/OA_HTML/default_group~oacore~default_group~1.log
In my case it is

http://rws60056rems.us.oracle.com:8039/OA_HTML/default_group~oacore~default_group~1.log

You can load this log file by giving URL to the tool as shown below and click on OK.

After loading the file the graph is shown below.

This is really not a good sign. In my case a Full GC was happening every second, and that is why I had to dig deeper and think about tuning the JVMs. The black portions that you see in the figure above are the Full GCs. You can show or hide the various metrics using the View button in the top menu. The black lines are the most important for me, as I want to avoid frequent Full GCs. The figure below shows continuous black lines every second, indicating that Full GC is happening constantly.

View for FULL GC

Other important things to look for are the red and blue lines. The red line indicates the total heap size; the blue line shows how much of the heap is actually used.

If you observe carefully, the blue line follows a saw-tooth pattern. This is because as new objects are allocated the used heap grows steadily, as shown by the inclined segments, and when a minor collection happens the size drops suddenly.

The green line shows the time required for garbage collections (shown below). Usually a minor GC takes less time and a Full GC takes more.

Tuning Application

Finally, after all this exercise, we need to use the information for tuning our application. Based on the above observations we can form some rules for tuning.

Whenever a Full GC happens, the memory consumption (blue line) drops back to more or less the same value as after the last full collection. This means that even though some objects were promoted to the older generation, most of them did not become very old either; in other words, perhaps they should not have been promoted to the older generation in the first place.

In that case it is beneficial to increase the size of the young generation to prevent the early promotion of medium-lived objects. The parameter -XX:NewRatio=<value> defines the ratio of tenured generation size to young generation size, so if we reduce this ratio (say to 1), the young generation gets bigger and most objects will live and die in the young generation rather than being promoted to the tenured generation before dying. The tenured generation then fills up over a longer period, which reduces the number of Full GCs. So if there are a huge number of Full GCs and the heap usage drops back to more or less the same level after each one, you can consider reducing the value of the -XX:NewRatio=<value> parameter in the CONTEXT_FILE for oacore (we are concentrating on oacore here). The disadvantage is that the overall footprint will increase as the young generation grows.

So initially the heap starts at 256M (the -Xms value in my case) and then, as memory consumption increases, the JVM gradually grows it to accommodate more objects, up to the maximum set by the -Xmx parameter.

It is usually recommended to allocate as much memory as possible up front instead of waiting for the JVM to increase it automatically. This prevents the application from suffering performance problems during the initial period.

Hope this helps !!!

References:

Metalink Note ID : 342799.1
Tuning Garbage Collection with the 5.0 Java[tm] Virtual Machine
Java Performance Tuning using GC Viewer

Oracle RAC 10g – Cache Fusion

Introduction:

This post is about the Oracle Cache Fusion technology implemented in Oracle Database 10g RAC. We are going to discuss only cache fusion, so you should already have some knowledge of the RAC architecture. Please check the Oracle documentation to understand the Oracle RAC architecture; you can also visit my previous post about Oracle RAC installation for some basic information and installation details.

Cache fusion was partially implemented in Oracle 8i OPS (Oracle Parallel Server). Before Oracle 8i the situation was different: in a multi-instance Oracle Parallel Server, if one instance asked for a block of data that was currently modified by another instance of the same database, the holding instance had to write the block to disk so that the requesting instance could read it. This is called a "disk ping", and it greatly affected the performance of the database. With Oracle 8i, partial cache fusion was implemented.

Oracle 8i OPS has a background process called the Block Server Process, which was responsible for cache fusion. The following table shows the scenarios in which cache fusion works in Oracle 8i OPS and those in which it does not. Of course, these limitations are not present in Oracle 10g RAC.

So in 8i, when the requesting instance asked for a block held in read or write mode by another instance and the block was dirty, cache fusion worked and the block was copied from the cache of the holding instance to the requesting instance. But if the block was not dirty, the requesting instance had to read it from the datafile. Also, if the block was open for write in the holding instance and another instance wanted to update the same block, the holding instance had to write the block to disk so that the requesting instance could read it.

Concept of cache fusion

Cache fusion is basically about fusing the buffer caches of multiple instances into one single cache. For example, if we have three RAC instances using the same datafiles, each with its own buffer cache in its own SGA, cache fusion makes the database behave as if it were a single instance whose total buffer cache is the sum of the buffer caches of all three instances. The figure below shows what I mean.

This behaviour is possible because of the high-speed interconnect that exists in the cluster: each instance is connected to every other instance through it, which makes it possible to share memory between two or more servers. Previously only datafile sharing was possible; now, because of the interconnect, even the cache memory can be shared.

But how does this help? Well, if one instance is updating a data block and another instance needs the same block, the block can be copied from the holding instance's buffer cache to the requesting instance's buffer cache over this high-speed interconnect. The interconnect is a private connection used only by the instances for sending data blocks and other messages; external users cannot use it. It is this interconnect that makes multiple servers behave like a cluster and binds them together.

Moving further, we now know how the cluster is formed, what its backbone is, and what exactly we call "cache fusion". Next we will see how cache fusion works, but before that we need to cover a few concepts that are very important to understand.

We will discuss following topics before discussing Cache Fusion

  1. Cache Coherency
  2. Multi-Version consistency model
  3. Resource Co-ordination – Synchronization
  4. Global Cache Service (GCS)
  5. Global Enqueue Service
  6. Global Resource Directory
  7. GCS resource modes and roles
  8. Past Images
  9. Block access modes and buffer states

I promise this won't be too heavy. Let's look at an overview of these concepts; I won't go into the details, just enough for you to understand cache fusion.

1) Cache Coherency

In a single-instance database, whenever a user queries data he gets a consistent view of it. For example, suppose one user has already read a block of data and changed some rows in the buffer cache. If another user wants to read data from the same block, Oracle makes a copy of that block in the buffer cache and applies the undo information from the undo tablespace to get a consistent view of the data, which is then presented to the user who wants to read it. This is called maintaining consistency of data.
Now consider a multi-instance RAC system, where a data block might not be present in the local instance and a user might be updating that block on some other instance. If the blocks are already available in the local instance they are immediately available to the user; if they are present in some other instance within the cluster, they are transferred into the local buffer cache.
Maintaining the consistency of data blocks in the buffer caches of multiple instances is called "cache coherency".

2) Multi-Version consistency model

The multi-version consistency model distinguishes between the current version of a data block and one or more read-consistent versions of it. The current block is the one that contains all the changes, committed as well as uncommitted. For example, say a user fires a DML statement against a block that is not present in any instance; the block is read from disk into the buffer cache, where the value gets changed. The user then commits and fires another DML statement against the same block. That block is now dirty and contains committed as well as uncommitted changes.
Suppose this block is then requested by another user for reading. Oracle makes a copy, applies the undo information to build a consistent read (CR) copy, and ships it to the requesting instance. Thus we have multiple versions of the same data block, each of them consistent with respect to the user who requested it.
During the course of operation there can be many more versions of the same block, each consistent with respect to some point in time.

3) Resource Co-ordination – Synchronization

In a multi-instance system such as RAC, where the same resources (for example data blocks) are used concurrently, effective synchronization is required to maintain consistency. Within the shared cache, the coordination of concurrent tasks is called synchronization. The synchronization provided by Oracle RAC gives cluster-wide concurrency of resources and in turn ensures the integrity of the shared data. Although this synchronization happens within the cache, there is some cost involved in doing it; at the lowest level it is simply a data copy or data transfer operation.
According to Oracle's studies, accessing a block in the local cache is much faster than accessing it from another instance's cache within the cluster, because the local access is an in-memory copy whereas the remote access has to transfer the data over the high-speed interconnect, which is obviously slower than an in-memory copy. Worst of all is the copy from disk, which is much slower than the other two. The graph below shows the block access times for these three methods.

4) Global Cache Service

The Global Cache Service (GCS) is the main component of the Oracle Cache Fusion technology. It is represented by the LMSn background processes; there can be at most 10 LMS processes per instance. The main function of GCS is to track the status and location of data blocks, where status means the mode and role of the block (I will explain mode and role further on). GCS is the main mechanism by which cache coherency among the multiple caches is maintained, and it is also responsible for block transfer between the instances.
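As a quick check, you can list the LMS processes configured on an instance with a query like the one below (the paddr filter is the usual way to restrict V$BGPROCESS to processes that are actually running):

SELECT name, description
FROM v$bgprocess
WHERE name LIKE 'LMS%'
AND paddr <> '00';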

5) Global Enqueue Service

The Global Enqueue Service (GES) tracks the status of all Oracle enqueuing mechanisms. This covers all non-Cache-Fusion intra-instance operations. GES performs concurrency control on dictionary cache locks, library cache locks and transactions, and it performs this work for resources that are accessed by more than one instance.
Enqueue services are also present in a single-instance database, where they are responsible for locking rows in a table using different locking modes. To understand more about enqueues, check the Oracle documentation on locking.

6) Global Resource Directory

GES and GCS together maintain the Global Resource Directory (GRD). The GRD is like an in-memory database containing details about all the blocks present in the caches: it knows the location of the latest version of a block, the mode of the block, the role of the block (mode and role are discussed shortly), and so on. Whenever a user asks for a data block, GCS gets all this information from the GRD. The GRD is a distributed resource, meaning each instance maintains part of it, and this distributed nature is a key to the fault tolerance of RAC. The GRD is stored in the SGA. Typically the GRD contains the following information, among other things:

  • Data Block Address – This is the address of data block being modified
  • Location of most current version of data block
  • Modes of data block
  • Roles of data block
  • SCN number of data block
  • Image of data block – Could be current image or past image.

7) GCS resource modes and roles

Mode of data block is decided based on whether a resource holder intends to modify the data or read the data. The modes are as follows:

  1. Null (N) Mode: Null mode is the least restrictive mode. It indicates no access rights and acts as a placeholder.
  2. Shared (S) Mode: Shared mode indicates that the block is being read and not modified; other sessions can also read the same block.
  3. Exclusive (X) Mode: Exclusive mode indicates exclusive access to the block. Other instances cannot write to this block, although they can still perform consistent reads on it.

GCS resources also have roles. The following roles exist:

  1. Local: When a data block is first read into an instance from disk it has the local role, meaning that only one copy of the block exists in cache; no other instance's cache has a copy of it.
  2. Global: The global role indicates that multiple copies of the data block exist across the clustered instances. For example, a user connected to one instance requests a data block; the block is read from disk into that instance and granted the local role. If another instance requests the same block, the block is copied to the requesting instance and the role becomes global.

This role and mode information is maintained in GRD (Global Resource Directory) by GCS (Global Cache Service).

8) Past Images

The past image concept was introduced in Oracle 9i to maintain data integrity. In an Oracle database, a typical block is not written to disk immediately after it is dirtied; this is to reduce excessive I/O. When a dirty block is requested by some other instance for write or read purposes, an image of the block is kept in the owning instance and then the block is shipped to the requesting instance. This retained copy of the block is called a past image (PI). In the event of a failure, Oracle can reconstruct the block by reading PIs. It is also possible to have more than one PI of a block, depending on how many times the block was requested while dirty.

A past image of a block is different from a CR (consistent read) image. A past image is used to create a CR copy by applying undo data.

9) Block access modes and buffer states

An additional concurrency control concept is the buffer state, which is the state of a buffer in the local cache of an instance. The buffer state of a block relates to the access mode of the block. For example, if a buffer state is exclusive current (XCUR), the instance owns the resource in exclusive mode.
To see a buffer's state, query the "status" column of the V$BH dynamic performance view (a sample query follows the list below). The V$BH view maps block access modes to buffer state names as follows:

  • With a block access mode of NULL the buffer state name is CR — An instance can perform a consistent read of the block. That is, if the instance holds an older version of the data.
  • With a block access mode of S the buffer state name is SCUR — An instance has shared access to the block and can only perform reads.
  • With a block access mode of X the buffer state name is XCUR — An instance has exclusive access to the block and can modify it.
  • With a block access mode of NULL the buffer state name is PI — An instance has made changes to the block but retains copies of it as past images to record its state before changes.
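A minimal query to see how the buffers in the local cache are distributed across these states looks like this (run as a DBA user; on a given instance you will typically see only a handful of the states above):

SELECT status, COUNT(*)
FROM v$bh
GROUP BY status;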

Only the SCUR and PI buffer states are Real Application Clusters-specific. There can be only one copy of any one block buffered in the XCUR state in the cluster database at any time. To perform modifications on a block, a process must assign an XCUR buffer state to the buffer containing the data block.
For example, if another instance requests read access to the most current version of the same block, then Oracle changes the access mode from exclusive to shared, sends a current read version of the block to the requesting instance, and keeps a PI buffer if the buffer contained a dirty block.

At this point, the first instance has the current block and the requesting instance also has the current block in shared mode. Therefore, the role of the resource becomes global. There can be multiple shared current (SCUR) versions of this block cached throughout the cluster database at any time.

Block transfer using Cache Fusion

Let's consider a detailed example of how block transfer happens between different instances. For this example I am assuming a 3-node RAC system, and also that any DML statement is immediately followed by a commit; so if I say that a user executed an update, that means update + commit. There is, however, no checkpoint until the end.

Stage 1) A data block is requested by user C, who is connected to instance 3, so the block is read from disk into the buffer cache of instance 3.

select sales_rank from salesman where salesid = 10;

Assume this gives a value of 30. The block is read for the first time and is not present in any other instance, so the role of the block is LOCAL and the block is read in SHARED mode. Also there are no past images. We describe this stage as instance 3 holding the block in SL0 (SHARED mode, LOCAL role, 0 past images).

Stage 2) User B issues the same select statement against the salesman table. Instance 2 needs the same block; therefore the block is shipped from instance 3 to instance 2 via the cache fusion interconnect. There is no disk read at this time. Both instances hold the block in SHARED mode (S) with the LOCAL role (L). Note that even though the block is present in more than one instance, the role is still local because the block has not yet been dirtied. Had the block been dirty and then requested by another instance, the role would have changed to global.

Stage 3) User B decides to update the row and commit at instance 2. The new sales rank is 24. At this stage instance 2 acquires an EXCLUSIVE lock to update the data, and the SHARED lock on instance 3 is downgraded to a NULL lock.

update salesman set sales_rank = 24 where salesid = 10; commit;

So instance 2 now holds the block in XL0 (EXCLUSIVE, LOCAL, 0 past images) and instance 3 holds a NULL lock, which is just a placeholder. The role of the block is still LOCAL because the block has been dirtied only on instance 2 and no other instance has a dirty copy of it. If another instance now touches the same dirty block, the role will change to global.

Stage 4) User A, on instance 1, decides to update the same row, and hence the same block, with a sales rank of 40. The block is found to be dirty in instance 2, so it is shipped to instance 1 from instance 2; however, a past image of the block is kept on instance 2 and the lock mode on instance 2 is downgraded to NULL with a GLOBAL role. Instance 2 now has NG1 (NULL lock, GLOBAL role, 1 past image), while instance 1 holds an EXCLUSIVE lock with the GLOBAL role (XG0).

Stage 5) User C executes a select statement on the same row from instance 3. The block on instance 1 is the most recent copy (the GRD knows which instance holds the latest copy), so it is shipped to instance 3. As a result the lock on instance 1 is converted to SHARED, GLOBAL with 1 past image (SG1). The lock changes to SHARED rather than NULL because instance 3 asked for a shared lock (to read the data) and not an exclusive lock (to update it); had instance 3 asked for an exclusive lock, instance 1 would have been left with a NULL lock.

Instance 3 now holds the block in SG0 (SHARED, GLOBAL, 0 past images).

Stage 6) User B issues the same select statement against the salesman table on instance 2. Instance 2 requests a consistent copy of the buffer from another instance, which happens to be the current master.
Therefore instance 1 ships the block to instance 2, where it is acquired as SG1 (SHARED, GLOBAL, 1 past image). So instance 2's state becomes SG1.

Stage 7) User C, on instance 3, updates the same row. Instance 3 therefore requires an exclusive lock, and instances 1 and 2 are downgraded to NULL locks with the GLOBAL role and 1 past image each. Instance 3 holds the block with an EXCLUSIVE lock, GLOBAL role and no past images (XG0).

Stage 8) A checkpoint is initiated and a write to disk takes place at instance 3. As a result the previous past images are discarded (as they are no longer required for recovery) and instance 3 holds the block with an EXCLUSIVE lock, LOCAL role and no past images (XL0).

If any instance subsequently wants to read or write the same block, a copy will again be shipped from instance 3.

References:

Oracle 10g Grid and Real Application Cluster – By Madhu Tumma

Database Statistics and CBO

Introduction

One of the greatest problems with the Oracle cost-based optimizer (CBO) is the failure of the DBA to gather accurate schema statistics. For the CBO to make an intelligent decision about the best execution plan for your SQL, it must have information about the tables and indexes that participate in the query. The CBO depends on the information about tables and indexes present in the data dictionary: the size of the tables, their extent distribution, cardinality, selectivity of column values, and so on. Based on these details the CBO attempts to come up with a highly optimized plan for executing the query.

Optimizer statistics include the following:

  • Table statistics
    • Number of rows
    • Number of blocks
    • Average row length
  • Column statistics
    • Number of distinct values (NDV) in column
    • Number of nulls in column
    • Data distribution (histogram)
  • Index statistics
    • Number of leaf blocks
    • Levels
    • Clustering factor
  • System statistics
    • I/O performance and utilization
    • CPU performance and utilization
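Most of the table-level statistics listed above can be seen directly in the dictionary once they have been gathered. For example (the OE schema is only an illustration):

SELECT table_name, num_rows, blocks, avg_row_len, last_analyzed
FROM dba_tables
WHERE owner = 'OE';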

Gathering statistics Automatically and Manually

In 10g, statistics gathering is usually automated by the scheduler job GATHER_STATS_JOB. This job is created by default when the database is created and runs every day to collect statistics.
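You can confirm that the job exists and is enabled with a query like the following (run as a DBA user):

SELECT job_name, enabled, state
FROM dba_scheduler_jobs
WHERE job_name = 'GATHER_STATS_JOB';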

We can also gather statistics manually using DBMS_STATS. But why would we need to gather statistics manually? Because there are cases where automatic statistics gathering is not adequate: since it runs during an overnight batch window, the statistics on tables that are significantly modified during the day (for example by a bulk data load) may become stale. The following procedures are available in the DBMS_STATS package to gather statistics:

GATHER_INDEX_STATS – Index statistics
GATHER_TABLE_STATS – Table, column, and index statistics
GATHER_SCHEMA_STATS – Statistics for all objects in a schema
GATHER_DICTIONARY_STATS – Statistics for all dictionary objects
GATHER_DATABASE_STATS – Statistics for all objects in a database
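As an illustration, a typical manual call for a single table looks like the sketch below; the schema and table names are placeholders, and cascade => TRUE also gathers the statistics for the table's indexes:

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'OE',
    tabname          => 'ORDERS',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    cascade          => TRUE);
END;
/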

Automatic sampling

An important part of gathering statistics is deciding the sample size. Sampling is specified using the ESTIMATE_PERCENT argument to the DBMS_STATS procedures. The better the quality of the statistics, the better the job the CBO will do when determining your execution plans; unfortunately, a complete analysis of a large database could take days to finish. To avoid this we tell DBMS_STATS the percentage of rows on which to base its analysis.

In earlier releases, the DBA had to guess what percentage of the database provided the best sample size and sometimes under-analyzed the schema. Starting with Oracle9i Database, the ESTIMATE_PERCENT argument allows DBMS_STATS to automatically estimate the "best" percentage of a segment to sample when gathering statistics:

estimate_percent => dbms_stats.auto_sample_size

AUTO_SAMPLE_SIZE lets Oracle determine the best sample size necessary for good statistics, based on the statistical property of the object. Because each type of statistics has different requirements, the size of the actual sample taken may not be the same across the table, columns, or indexes. For example, to collect table and column statistics for all tables in the OE schema with auto-sampling, you could use:

EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS(‘OE’,DBMS_STATS.AUTO_SAMPLE_SIZE);

When the ESTIMATE_PERCENT parameter is manually specified, the DBMS_STATS gathering procedures may automatically increase the sampling percentage if the specified percentage did not produce a large enough sample. This ensures the stability of the estimated values by reducing fluctuations.

Determining Stale Statistics

The important question here is: how does Oracle know that the statistics for a table or index are stale and should be gathered again? In Oracle 9i you could check whether the data in a table had changed significantly by turning on the table monitoring option (ALTER TABLE ... MONITORING) and then checking the DBA_TAB_MODIFICATIONS view for those tables. In 10g the MONITORING clause is no longer needed; instead, modifications are tracked automatically if the initialization parameter STATISTICS_LEVEL is set to TYPICAL or ALL, and statistics are collected by the predefined GATHER_STATS_JOB mentioned previously.

In order to determine whether or not a given database object needs new statistics, Oracle provides a table monitoring facility. This monitoring is enabled by default when STATISTICS_LEVEL is set to TYPICAL or ALL. Monitoring tracks the approximate number of INSERTs, UPDATEs and DELETEs for a table, as well as whether the table has been truncated, since the last time statistics were gathered. The information about table changes can be viewed in the USER_TAB_MODIFICATIONS view. Following a data modification there may be a few minutes' delay while Oracle propagates the information to this view; use the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO procedure to immediately reflect the outstanding monitored information kept in memory.
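For example, to flush the pending monitoring information and then look at the tracked changes for your own tables (the flush call may require the ANALYZE ANY privilege):

EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;

SELECT table_name, inserts, updates, deletes, truncated
FROM user_tab_modifications;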

The GATHER_DATABASE_STATS or GATHER_SCHEMA_STATS procedures gather new statistics for tables with stale statistics when the OPTIONS parameter is set to GATHER STALE or GATHER AUTO. If a monitored table has been modified more than 10%, then these statistics are considered stale and gathered again.

So all the modifications (updates, inserts, deletes) are tracked in USER_TAB_MODIFICATIONS and compared with the number of rows in the table at the time statistics were last collected. If this ratio is greater than 10%, the table is considered stale and becomes eligible for statistics collection.

Dynamic Sampling

What is dynamic sampling ?

Dynamic sampling is used at compile time, when the CBO is building the query plan. It involves gathering information about the tables, indexes and other data dictionary objects required by the query. When we gather statistics this information is readily available, but in cases where statistics have not been gathered we can consider dynamic sampling, where the stats are gathered dynamically and the query plan is generated based on them.

How Dynamic Sampling Works ?

The primary performance attribute is compile time. Oracle determines at compile time whether a query would benefit from dynamic sampling. If so, a recursive SQL statement is issued to scan a small random sample of the table’s blocks, and to apply the relevant single table predicates to estimate predicate selectivities. The sample cardinality can also be used, in some cases, to estimate table cardinality. Any relevant column and index statistics are also collected.

When to Use Dynamic Sampling ?

For a query that normally completes quickly (in less than a few seconds), you will not want to incur the cost of dynamic sampling. However, dynamic sampling can be beneficial under any of the following conditions:

  • A better plan can be found using dynamic sampling.
  • The sampling time is a small fraction of total execution time for the query.
  • The query will be executed many times.

How to Use Dynamic Sampling to Improve Performance ?

You control dynamic sampling with the OPTIMIZER_DYNAMIC_SAMPLING parameter, which can be set to a value from 0 to 10. The default is 2.

  • A value of 0 means dynamic sampling will not be done.
  • Increasing the value of the parameter results in more aggressive application of dynamic sampling, in terms of both the type of tables sampled (analyzed or unanalyzed) and the amount of I/O spent on sampling.

For details of the dynamic sampling levels, refer to the Oracle documentation.
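As a rough sketch, the level can be changed for a session or, more selectively, for a single query via a hint; the table name and alias below are just placeholders:

ALTER SESSION SET optimizer_dynamic_sampling = 4;

SELECT /*+ dynamic_sampling(o 4) */ COUNT(*)
FROM orders o
WHERE order_date > SYSDATE - 30;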

References:

Oracle documentation

Important tables for ADPATCH

Here are some of the important tables used and updated by the ADPATCH utility.

AD_APPL_TOPS

This table holds the various APPL_TOPs in the Oracle Applications installation that have ever been patched.

AD_APPLIED_PATCHES

AD_APPLIED_PATCHES holds information about the “distinct” Oracle Applications patches that have been applied. If 2 patches happen to have the same name but are different in content (eg. “merged” patches), then they are considered distinct and this table will therefore hold 2 records.

AD_BUGS

AD_BUGS holds information about the various Oracle Applications bugs whose fixes have been applied (ie. patched) in the Oracle Applications installation.
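These two tables are handy for quick checks from SQL*Plus. The column names below are as I recall them from the ETRM, so verify them against note 150230.1; the bug number is only an example:

-- Recently applied patches
SELECT patch_name, patch_type, creation_date
FROM ad_applied_patches
ORDER BY creation_date DESC;

-- Has a particular bug fix been applied?
SELECT bug_number, creation_date
FROM ad_bugs
WHERE bug_number = '3830807';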

AD_PATCH_DRIVERS

This table holds information about the patch drivers that comprise a patch.

AD_FILE_VERSIONS

This table holds the various versions of Oracle Applications files (real files, not “pseudo-files”), that have ever been patched or executed in the Oracle Applications installation.

AD_FILES

AD_FILES is the “files repository”. It contains information about the various files that have been patched in the Oracle Applications installation.
Some entries are "pseudo-files" and not real files (e.g. directories), in which case some of the columns are not applicable and hold the value "DUMMY".

AD_PATCH_DRIVER_LANGS

NLS patches (or more specifically, NLS patch drivers) pertain to a language or multiple languages. This table holds that language (or multiple languages).

AD_PATCH_DRIVER_MINIPKS

This table holds information about the various Mini Packs contained in a patch (driver)

AD_PATCH_RUN_BUG_ACTIONS

AD_PATCH_RUN_BUG_ACTIONS holds the various actions present in an "applied" bug (fix). If Autopatch determined not to apply a bug (fix), then this table will not hold any records for that "unapplied" bug fix.

Even though a patch may have been applied on an Oracle Applications installation, some actions in some of its included bugs (fixes) may not have been executed if the Autopatch utility determined that it was not necessary to execute them. In such cases, EXECUTED_FLAG is set to N.

AD_PATCH_RUN_BUGS

AD_PATCH_RUN_BUGS holds information about the various bugs fixed in a specific run of Autopatch.

Even though a patch may have been applied on an Oracle Applications installation, some bugs (fixes) contained in it may not get applied due to some reason. In such cases, the REASON_NOT_APPLIED column holds the reason.

AD_PATCH_RUNS

AD_PATCH_RUNS holds information about the various invocations of Autopatch for applying Oracle Applications patches to a specific release of an Oracle Applications installation.

If multiple drivers are run in one invocation of Autopatch, they result in multiple records in this table. These multiple records will all have the same SESSION_ID (because they arose from one Autopatch invocation), but different TASK_NUMBER’s. The TASK_NUMBER’s in this case will be numbered sequentially as 1, 2, 3, etc.
Note that when the database driver of a Maintenance Pack is applied, it bumps up the release version by creating a new record in AD_RELEASES, which is then pointed to by the UPDATED_TO_RELEASE_ID column of the old record.

AD_RELEASES

AD_RELEASES holds the various Oracle Applications releases that an installation of Oracle Applications has gone through in its entire life cycle.

It should be noted that START_DATE_ACTIVE, END_DATE_ACTIVE and BASE_RELEASE_FLAG are loosely-maintained informational columns and are not accurately maintained, and therefore should not be relied upon heavily.

AD_PATCH_COMMON_ACTIONS

This table holds distinct information about the various actions that are (often repeatedly) performed by Autopatch as part of applying patches.

Hope this helps !!

References:

Metalink ETRM DIRECT ACCESS DOCUMENT 150230.1