Oracle Performance Tuning – ASH Basics

Recently I got an opportunity to dig into the performance of our database. We were not facing any performance issues, but we wanted to understand what's happening under the hood: what the bottlenecks are, and how we can prepare and scale our database further.

In the future the load on our databases is going to be crazy, and we want to make sure we look into every consumer of the database and optimize its usage.

I went back to the basics of database performance tuning – ASH. Active Session History is a beast that holds loads of information at a very granular level – the session. We can dig into ASH data and generate reports rolled up against various entities like module, action, sql_id, user_id etc., and determine how much of the database resources each of these entities is using.

DB Time:

One of the most fundamental measures of database performance is DB Time. This is the amount of time a session has spent on database activity – a combination of CPU time + non-idle wait time. Idle wait time is not considered, so even if a session is done with its activity and is just sitting idle doing nothing, that idle time is not counted as part of DB Time.

This gives a very accurate measure of the amount of database resources a session is burning.
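
Before we even get to ASH, the time model views report DB Time directly. A minimal sketch, assuming you know the session's SID (V$SESS_TIME_MODEL values are in microseconds):

set verify off
-- DB Time and DB CPU accumulated so far by one session,
-- straight from the time model view (values are in microseconds).
select sid, stat_name, round(value/1e6, 2) seconds
from   v$sess_time_model
where  sid = &sid
and    stat_name in ('DB time', 'DB CPU');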

Active Session History:

ASH, or Active Session History, provides snapshots of the active sessions connected to the database (status=ACTIVE in v$session), along with loads of other information about what those sessions were doing when each snapshot was taken. To understand how we can extract this information and make sense of it, we need to understand how ASH is built and what that information means.

V$ACTIVE_SESSION_HISTORY

Active session history is exposed by the database using the view V$ACTIVE_SESSION_HISTORY. This view stores samples taken every 1 sec. A background process, MMNL (Manageability Monitor Lite), takes a snapshot from v$session and v$session_wait of the current state of active sessions and dumps that information into V$ACTIVE_SESSION_HISTORY. Note that only active sessions are sampled.
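
You can peek at the raw samples directly. A simple sketch showing the most recent minute of samples – one row per active session per 1-second sample:

-- Most recent ASH samples: one row per active session per sample.
select sample_time, session_id, session_state, sql_id, event
from   v$active_session_history
where  sample_time > sysdate - interval '1' minute
order by sample_time;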

V$ACTIVE_SESSION_HISTORY is a memory view stored in the form of a circular buffer. This means the view has a size limit on its growth (approx 2MB / CPU), so when it reaches that size it starts overwriting the oldest records in the circular buffer. You can visualize this using the following figure.

[Figure: ASH circular buffer architecture – slide from "ASH Architecture and Advanced Usage", RMOUG 2014]

So how are these samples useful to us in finding DB Time?

Let's say a session has spent 10 sec of active time in the database; its DB time is therefore 10 sec. During those 10 sec of active time, 10 samples of that session are taken, since the interval between samples is 1 sec. So if we just take a count(*) from V$ACTIVE_SESSION_HISTORY, we will see 10 records for that session. This means that to get the DB Time of a session, we just have to take count(*) from V$ACTIVE_SESSION_HISTORY for that session.

It's a little difficult to adjust to the fact that a time measure can be obtained using count(*) from a view, but it works because the interval between samples is 1 sec.

So DB Time is actually – [ Number of samples ] X [ Time interval between samples ]

Since the time interval between samples is 1 sec, we can get DB time as the number of samples, which is count(*) from V$ACTIVE_SESSION_HISTORY.
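
As a concrete sketch, here is the DB time of one specific session, identified by its SID and serial# (both are columns of the view):

-- Approximate DB time in seconds for a single session:
-- each row is a 1-second sample, so count(*) ~= seconds of DB time.
select count(*) db_time_secs
from   v$active_session_history
where  session_id      = &sid
and    session_serial# = &serial;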

Again, calculating DB Time using count(*) from ASH is an approximation: activity shorter than 1 sec that falls entirely between 2 samples is never counted, and a session that finishes just before the next sample is counted only up to the last sample in which it was active. But such cases are few, and the DB time missed this way is very small compared to the total DB time we are measuring. You can visualize this using the following figure.

[Figure: ASH sample timeline – activity shorter than the 1-second sampling interval can fall between samples]

DBA_HIST_ACTIVE_SESS_HISTORY

DBA_HIST_ACTIVE_SESS_HISTORY is a history table which maintains the history of active sessions across AWR snapshots. Since V$ACTIVE_SESSION_HISTORY is a memory view with limited space for growth, data from V$ACTIVE_SESSION_HISTORY flows into the DBA_HIST_ACTIVE_SESS_HISTORY table.

The DBA_HIST_ACTIVE_SESS_HISTORY table enables us to view active session data from the past. Data in DBA_HIST_ACTIVE_SESS_HISTORY is retained based on the retention set for AWR snapshots (8 days by default in recent releases; many sites increase it). So every time an AWR snapshot happens, data from V$ACTIVE_SESSION_HISTORY goes into this table.
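
You can check (and, if needed, change) the AWR interval and retention; a sketch using the DBMS_WORKLOAD_REPOSITORY API, with 30 days of retention as an example value:

-- Current AWR snapshot interval and retention:
select snap_interval, retention from dba_hist_wr_control;

-- Raise retention to 30 days, keeping hourly snapshots (both in minutes):
begin
  dbms_workload_repository.modify_snapshot_settings(
    retention => 30*24*60,
    interval  => 60);
end;
/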

But NOT all data from V$ACTIVE_SESSION_HISTORY goes into DBA_HIST_ACTIVE_SESS_HISTORY. When an AWR snapshot happens, only 1 out of every 10 samples from V$ACTIVE_SESSION_HISTORY is kept in the history table.

The complete flow of active session data can be visualized as below

[Figure: V$ACTIVE_SESSION_HISTORY circular buffer feeding DBA_HIST_ACTIVE_SESS_HISTORY – slide from "ASH Architecture and Advanced Usage", RMOUG 2014]

Since the default AWR snapshot interval is 1 hour, an AWR snapshot is taken every hour, and at that point 1 out of every 10 samples from V$ACTIVE_SESSION_HISTORY moves to the DBA_HIST_ACTIVE_SESS_HISTORY table.

So if we want to calculate DB Time from the DBA_HIST_ACTIVE_SESS_HISTORY table, we need to treat the sample time as 10 sec instead of 1 sec.

DB Time = [ Number of samples ] X [ Time interval between samples ]
        = count(1) * 10

That's why, when we calculate the DB Time of a session or SQL from the DBA_HIST_ACTIVE_SESS_HISTORY table, we take count(*) * 10.
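
For example, a sketch ranking SQL statements by DB time between two AWR snapshots (the snap ids are substitution variables you supply):

-- Top 10 SQL by approximate DB time between two AWR snapshots:
-- each retained row represents ~10 seconds (1-in-10 samples kept).
select * from (
  select sql_id, count(*)*10 db_time_secs
  from   dba_hist_active_sess_history
  where  snap_id >= &&begin_snap_id
  and    snap_id <  &&end_snap_id
  and    session_type = 'FOREGROUND'
  group by sql_id
  order by db_time_secs desc
) where rownum <= 10;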

Calculating DB Time

With this brief understanding of ASH, we can use these views to find the DB time of a session, and consolidate sessions to calculate DB Time per module, user etc.

If we list modules by the % of DB time they consume, the modules with a high % of DB time are the ones using the most resources and having the highest impact on the database. The same holds at the user level: we can find out which user contributes the most DB time.

I have given a couple of queries below to find the % of DB Time different modules contribute in the database. This gives a fair idea of module-wise resource consumption.

Note that I have randomized the module names using DBMS_RANDOM.string('L',TRUNC(DBMS_RANDOM.value(10,21))) to avoid exposing the actual module names.


-- DB Time per module from V$ACTIVE_SESSION_HISTORY.
-- Each row is a 1-second sample, so count(1) approximates seconds of DB time.
-- (The column aliases say "cpu", but the counts here are DB time samples.)
set linesize 999
set pagesize 999
set verify off
col module format a30
BREAK ON REPORT
compute sum of PCT_CPU on REPORT
with total_cpu as
(select   count(1)
 from     V$ACTIVE_SESSION_HISTORY
 where    SESSION_TYPE = 'FOREGROUND')
select    module,
          count(1) "module_cpu",
          (select * from total_cpu) "total_cpu",
          round((count(1)/(select * from total_cpu))*100,2) PCT_CPU
from      V$ACTIVE_SESSION_HISTORY
where     SESSION_TYPE = 'FOREGROUND'
group by  module order by PCT_CPU desc;

MODULE                         module_cpu  total_cpu    PCT_CPU
------------------------------ ---------- ---------- ----------
vvadcagpas                        29355     187065      15.69
cbkcbqchts                        26506     187065      14.17
zugkvhbajxpff                     24655     187065      13.18
tvfqhrkqzktiiceolrfd              10046     187065       5.37
paczxkndgsaqt                      6740     187065        3.6
pzqacdgxeqzbtqhalfc                6220     187065       3.33
nvqqscmayx                         5905     187065       3.16
zbirttdyoxxbslopnm                 5412     187065       2.89
bbztiihehigeqwxvk                  5051     187065        2.7
kvosekxfhkmfw                      4687     187065       2.51
aenyopwmok                         3977     187065       2.13
brfqfuwdyhyrej                     3256     187065       1.74
abruxykxhvqgfg                     3227     187065       1.73
zfrahazoqgdxubpni                  2759     187065       1.47
ddxinnevqd                         2633     187065       1.41
lructcsjkxhxatfidpf                2518     187065       1.35
naawhsfqvcaoathn                   2246     187065        1.2
sljzltjhhigepqfmcji                2142     187065       1.15
chvjvntynkjb                       2102     187065       1.12
zlzhwwhzbhfdtsszfedj               2005     187065       1.07
yvkuwzjkjhyeamco                   1610     187065        .86
fdjwspubzyhyxyzzkgyi               1578     187065        .84
risowlcnkizrbpgt                   1557     187065        .83
ywwdycosdpcfsflaubia               1539     187065        .82
pxlqrkmvreppum                     1496     187065         .8
amivlnlivqeesnqdkto                1404     187065        .75
jjaqyosqsc                         1347     187065        .72
yxkgfgktafvx                       1354     187065        .72
gqistslkwdw                        1328     187065        .71
bhuzvdpklocmnyimtzux               1272     187065        .68
uimycabvbjcczibrn                  1053     187065        .56
ksnvqslyker                         997     187065        .53
ofekwpklwpjtha                      975     187065        .52
wmkmovueneno                        904     187065        .48
umdtggtvdptkubfs                    795     187065        .42
dddegpqpwyalotxtoh                  761     187065        .41
pwycwyhavccwjuufqjx                 733     187065        .39
pxvcbhpfmeuwen                      729     187065        .39
nhezqmgkqccdcog                     716     187065        .38
ixtljofuazhpp                       628     187065        .34
mbotmyoairyqbdg                     600     187065        .32
aocabnswozdg                        586     187065        .31
qtdbdbuffprxj                       586     187065        .31
suiwwwtolcnkbnkcfwxw                560     187065         .3
btbdbtaztym                         562     187065         .3
xmvdgeczrot                         567     187065         .3
gkwzuwlgsh                          538     187065        .29
yhgmyqgqkvzmvbpsi                   463     187065        .25
obahkcsiafagmkxxu                   457     187065        .24
tcjoiyhzpuutbfhegn                  448     187065        .24
brlrkszrpmtw                        426     187065        .23
spzvvwzvoh                          435     187065        .23
ckvvarzrtpedhjl                     334     187065        .18
mnamshpswmsif                       329     187065        .18
gwvwypqrpwxkbugms                   293     187065        .16
clkuqearqhguezagwjlc                302     187065        .16
fskwtuctalnxhhu                     306     187065        .16
dyljhijlwyfmjy                      277     187065        .15
eqdifihzda                          255     187065        .14
nkrvwqrldd                          229     187065        .12
jkevokwkwokbnkbx                    231     187065        .12
niatevdadex                         198     187065        .11
qfjiifhiubvstyp                     200     187065        .11
acquoqslwxedulsmekbu                180     187065         .1
hjfimltalvopvuum                    156     187065        .08
bmyyhtzdqebo                        152     187065        .08
tkehpofgfe                          157     187065        .08
qnhgoqgstvq                         131     187065        .07
ddlloytpmyhpzpmk                    125     187065        .07
bvborwlump                          133     187065        .07
juirsmdjuejot                       124     187065        .07
saeoojlzfepjfybpclls                129     187065        .07
ppfdzcozyhxmf                       135     187065        .07
xbrwqdopemjqrhebglzs                115     187065        .06
zrfdsejwxgitsorat                   119     187065        .06
vsmsefmxnkoeb                        97     187065        .05
dvhoxhzayqyhsny                      98     187065        .05
cwjpiesdhpxvjs                      101     187065        .05
bvffncgtiojaj                        98     187065        .05
flfuciblxfzdcbo                      87     187065        .05
asucxewzdruwyfmg                     66     187065        .04
dztczkivnnbuzyuh                     67     187065        .04
hslcljfjmrhduw                       73     187065        .04
lywascpugxxjidx                      74     187065        .04
cawgdisaagltmie                      73     187065        .04
dpqbuvfkef                           68     187065        .04
tjpaushwfftmm                        54     187065        .03
ulgktcqvsazktojskfo                  56     187065        .03
ckoayyqtcxybkukgqvgl                 49     187065        .03
aqmphwbrngbotpxehyhe                 49     187065        .03
asqllnbivusyu                        60     187065        .03
skvrrqoshnirxbud                     59     187065        .03
vobtqhfewje                          58     187065        .03
xurrifyontk                          34     187065        .02
rliqgkwdwao                          42     187065        .02
ziwftblonh                           36     187065        .02
wzybbgmxkx                           30     187065        .02
vhumxxhimauc                         16     187065        .01
vssaefphdkebdx                       26     187065        .01
vjbjocafrwgdv                        11     187065        .01
xgcwcxrlzrjvysanlijw                 14     187065        .01
ztzxyfykhnxozdn                      20     187065        .01
btlbvffybkyjsn                       12     187065        .01
pmmqhsjuqhdaf                        22     187065        .01
kaumdvjqfcprmaeygr                   11     187065        .01
kadliemxdxhtzleqjr                   28     187065        .01
hlebszxkejjgkfbpmjew                 16     187065        .01
atwvjasykmlmldbipswh                 16     187065        .01
                                                     ----------
sum                                                      99.95

Similarly, to check DB Time for sessions in the past, we need to use the DBA_HIST_ACTIVE_SESS_HISTORY table. The following query gives the % of DB time used by each module, calculated as a % of the total DB time between 2 AWR snapshots.

-- DB Time per module between two AWR snapshots.
-- Only 1 in 10 ASH samples is retained here, so each row represents ~10 seconds.
set linesize 999
set pagesize 999
set verify off
BREAK ON REPORT
compute sum of PCT_CPU on REPORT
with total_cpu as
(select   count(1)*10
 from     dba_hist_active_sess_history
 where    snap_id >= &&begin_snap_id
 and      snap_id <  &&end_snap_id
 and      SESSION_TYPE = 'FOREGROUND')
select    module,
          count(1)*10 "module_cpu",
          (select * from total_cpu) "total_cpu",
          round((count(1)*10/(select * from total_cpu))*100,2) PCT_CPU
from      dba_hist_active_sess_history
where     snap_id >= &&begin_snap_id
and       snap_id <  &&end_snap_id
and       SESSION_TYPE = 'FOREGROUND'
group by  module order by PCT_CPU desc;

Note that we have used count(*) * 10 in the above query because only 1 of 10 samples goes into this table. Also, you need to specify begin_snap_id and end_snap_id; module DB Time usage between those snapshots will be calculated.

CPU Time

If we want to ignore waits and focus only on pure CPU used by modules, we can list modules by the CPU time they consume as a % of the total CPU time of the database.

I think DB time is still a better metric than CPU time, because DB time covers waits as well, and waits slow down the database.

We can calculate the CPU time of a session by doing the same count(*) from V$ACTIVE_SESSION_HISTORY, but counting only those samples during which the session was actually using CPU. An ACTIVE session is either using CPU to do some processing or actively waiting on something (for example, I/O during a db file sequential read). An active session on CPU is represented by session_state = 'ON CPU', so we count only those samples where session_state was 'ON CPU'.
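
To see this split for a single session, a small sketch that groups that session's samples by state (when session_state is 'ON CPU' the event column is null, so we label it 'CPU'):

-- CPU vs. wait breakdown of one session's ASH samples (seconds each):
select session_state,
       nvl(event, 'CPU') event,
       count(*) seconds
from   v$active_session_history
where  session_id      = &sid
and    session_serial# = &serial
group by session_state, event
order by seconds desc;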

The following query gives the CPU time used by each module as a % of total CPU time, using V$ACTIVE_SESSION_HISTORY.


-- CPU time per module: count only samples where the session was ON CPU.
set linesize 999
set pagesize 999
set verify off
BREAK ON REPORT
compute sum of PCT_CPU on REPORT
with total_cpu as
(select   count(1)
 from     V$ACTIVE_SESSION_HISTORY
 where    SESSION_TYPE != 'BACKGROUND'
 and      session_state = 'ON CPU')
select    module,
          count(1) "module_cpu",
          (select * from total_cpu) "total_cpu",
          round((count(1)/(select * from total_cpu))*100,2) PCT_CPU
from      V$ACTIVE_SESSION_HISTORY
where     SESSION_TYPE != 'BACKGROUND'
and       session_state = 'ON CPU'
group by  module order by PCT_CPU desc;

We can find similar CPU time for past sessions using the DBA_HIST_ACTIVE_SESS_HISTORY table. The following query gives the CPU time used by each module as a % of total CPU time between 2 AWR snapshots.


-- CPU time per module between two AWR snapshots (1-in-10 samples retained).
set linesize 999
set pagesize 999
set verify off
BREAK ON REPORT
compute sum of PCT_CPU on REPORT
with total_cpu as
(select   count(1)*10
 from     dba_hist_active_sess_history
 where    snap_id >= &&begin_snap_id
 and      snap_id <  &&end_snap_id
 and      SESSION_TYPE != 'BACKGROUND'
 and      session_state = 'ON CPU')
select    module,
          count(1)*10 "module_cpu",
          (select * from total_cpu) "total_cpu",
          round((count(1)*10/(select * from total_cpu))*100,2) PCT_CPU
from      dba_hist_active_sess_history
where     snap_id >= &&begin_snap_id
and       snap_id <  &&end_snap_id
and       SESSION_TYPE != 'BACKGROUND'
and       session_state = 'ON CPU'
group by  module order by PCT_CPU desc;

Note that in the above query against DBA_HIST_ACTIVE_SESS_HISTORY we have used count(*) * 10, because only 1 of 10 samples goes into this table.

I think the above information is sufficient for now to give a basic idea of ASH.

I will be writing more articles on performance tuning using ASH as I dig further.

Hope this helps !!


Understanding SQL Trace file – Oracle Database 10g

Introduction

Many times, when an application runs slow, one of the usual approaches for a DBA is to enable tracing and capture the SQL queries the application is running, to check whether some SQL the application runs is becoming a bottleneck. But getting a trace file for an application can be very tricky. When a user connects from a front-end application, it is very difficult for us (the DBAs) to track the session information. When the user logs in to the application and does some database activity, a new session gets created, and it is hard to know whether that session belongs to the user we intend to monitor. Also, if the application is written efficiently, a client might disconnect in between uses and reconnect again when some database operation needs to be performed – and the SID changes again. So it is very difficult for a DBA to keep track of the sessions an application cycles through.

To get around this problem we have something called End-to-End Application Tracing.

End to End Tracing:

In end-to-end application tracing, instead of tracing a specific session, we trace using different attributes:

Client identifier – specifies an end user based on the logon Id, such as HR.HR
Service – specifies a group of applications with common attributes, service level thresholds, and priorities; or a single application, such as ACCTG for an accounting application
Module – specifies a functional block, such as Accounts Receivable or General Ledger, of an application
Action – specifies an action, such as an INSERT or UPDATE operation, in a module
Session – specifies a session based on a given database session identifier (SID), on the local instance
Instance – specifies a given instance based on the instance name

To gather the appropriate statistics using PL/SQL, you need to enable statistics gathering for client identifier, service, module, or action using procedures in the DBMS_MONITOR package.

For example, suppose you want to enable statistics gathering for user OE. You can enable statistics for this CLIENT_ID using the following procedure:

EXECUTE DBMS_MONITOR.CLIENT_ID_STAT_ENABLE(client_id => 'OE.OE');

Then, whenever a client logs in tagged with that identifier, statistics for all of that client's activity are aggregated under the identifier. You can see the client identifier in the CLIENT_IDENTIFIER column of the v$session view. (To actually write that activity to a trace file, use the corresponding DBMS_MONITOR.CLIENT_ID_TRACE_ENABLE procedure.)
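
For completeness, this is how a session gets tagged in the first place; a sketch using DBMS_SESSION, with the identifier value mirroring the example above:

-- The application (or a test session) tags itself:
EXECUTE DBMS_SESSION.SET_IDENTIFIER('OE.OE');

-- The DBA can then spot the tagged sessions:
select sid, serial#, client_identifier, module, action
from   v$session
where  client_identifier = 'OE.OE';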

EXECUTE DBMS_MONITOR.CLIENT_ID_STAT_DISABLE(client_id => 'OE.OE'); is used to disable statistics gathering for that client ID (CLIENT_ID_TRACE_DISABLE does the same for tracing).

Similarly, you can also enable tracing for any service, module, or required action, as sketched below. For more information on end-to-end tracing, kindly check the link in the references section of this post.
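
A sketch of enabling (and disabling) trace for a module/action within a service via DBMS_MONITOR; the service, module and action names are the illustrative ones from the list above, not real objects:

-- Trace executions of one module/action within a service, with wait events:
begin
  DBMS_MONITOR.SERV_MOD_ACT_TRACE_ENABLE(
    service_name => 'ACCTG',
    module_name  => 'General Ledger',
    action_name  => 'INSERT',
    waits        => TRUE,
    binds        => FALSE);
end;
/

-- ...and the matching disable:
begin
  DBMS_MONITOR.SERV_MOD_ACT_TRACE_DISABLE(
    service_name => 'ACCTG',
    module_name  => 'General Ledger',
    action_name  => 'INSERT');
end;
/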

TKPROF:

Coming to the contents of the trace file: when a trace file is first generated, you won't be able to understand it, as it is not in a readable format. You need to use a utility called TKPROF in order to make the trace file readable.
The basic inputs to tkprof are 2 file names: the first is the name of the trace file you want to format, and the second is the name of the output file.
Example: tkprof jdbc_traces.trc jdbc_traces.txt
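
tkprof also takes sorting and filtering options; for example, the following (using the same file names) puts the statements with the highest execute and fetch elapsed times first and suppresses recursive SYS statements:

tkprof jdbc_traces.trc jdbc_traces.txt sort=exeela,fchela sys=no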

Contents of Trace file:

After you run the tkprof utility, you get the output file in a readable format. The complete file consists of well-formatted blocks; every block contains the SQL, call statistics for that SQL, miss statistics, the query execution plan and wait statistics.
A sample block from one trace file is shown below:

********************************************************************************

SQL ID : 74637j0wuhxfx
Select BEG_VERSION, EXT_VERSION, FIRST_DATE, FIRST_USER, IND_CHANGE,
 INT_VERSION, I_PROJECT, LAST_DATE, LAST_USER, PROJECT_CODE, PROJECT_NAME,
 PRV_VERSION, V_LAST_DATE
from
 SNP_PROJECT where I_PROJECT = :1

call     count       cpu    elapsed       disk      query    current        rows
------- ------  -------- ---------- ---------- ---------- ----------  ----------
Parse        9      0.00       0.00          0          0          0           0
Execute    823      0.02       0.02          0          0          0           0
Fetch      823      0.05       0.05          0       1646          0         823
------- ------  -------- ---------- ---------- ---------- ----------  ----------
total     1655      0.08       0.08          0       1646          0         823

Misses in library cache during parse: 1
Misses in library cache during execute: 1
Optimizer mode: ALL_ROWS
Parsing user id: 100

Rows     Row Source Operation
-------  ---------------------------------------------------
 1  TABLE ACCESS BY INDEX ROWID SNP_PROJECT (cr=2 pr=0 pw=0 time=0 us cost=1 size=108 card=1)
 1   INDEX UNIQUE SCAN PK_PROJECT (cr=1 pr=0 pw=0 time=0 us cost=0 size=0 card=1)(object id 81820)

Elapsed times include waiting on following events:
 Event waited on                             Times   Max. Wait  Total Waited
 ----------------------------------------   Waited  ----------  ------------
 SQL*Net message to client                     823        0.00          0.00
 SQL*Net message from client                   822      103.76        173.63
********************************************************************************

Now let's understand the meaning of these statistics. The call statistics table has columns count, cpu, elapsed, disk, query, current and rows. The meaning of these columns is given at the start of your trace file.
COUNT   – Number of times a statement was parsed, executed, or fetched.

CPU     – Total CPU time in seconds for all parse, execute, or fetch calls for the statement. This value is zero (0) if TIMED_STATISTICS is not turned on.

ELAPSED – Total elapsed time in seconds for all parse, execute, or fetch calls for the statement. This value is zero (0) if TIMED_STATISTICS is not turned on.

DISK    – Total number of data blocks physically read from the datafiles on disk for all parse, execute, or fetch calls.

QUERY   – Total number of buffers retrieved in consistent mode for all parse, execute, or fetch calls. Usually, buffers are retrieved in consistent mode for queries.

CURRENT – Total number of buffers retrieved in current mode. Buffers are retrieved in current mode for statements such as INSERT, UPDATE, and DELETE.

ROWS    – Total number of rows processed by the SQL statement. This total does not include rows processed by subqueries of the SQL statement.
The rows also give statistics about the 3 call types: Parse, Execute and Fetch.
PARSE   – Translates the SQL statement into an execution plan, including checks for proper security authorization and checks for the existence of tables, columns, and other referenced objects.

EXECUTE – Actual execution of the statement by Oracle. For INSERT, UPDATE, and DELETE statements, this modifies the data. For SELECT statements, this identifies the selected rows.

FETCH   – Retrieves rows returned by a query. Fetches are only performed for SELECT statements.
For example, in the above table we have the following statistics:
– the SQL query was parsed 9 times

– the SQL query was executed 823 times

– the fetch operation was done 823 times

– CPU time taken for parsing was less than 1/100th of a second

– CPU time for execute was 0.02 sec

– CPU time for fetch was 0.05 sec

– there was no measurable elapsed time for parsing

– elapsed time for execute was 0.02 sec

– elapsed time for fetch was 0.05 sec

– no block was fetched from disk, so there were no physical reads

– 1646 blocks were received in consistent mode during the fetch operation. Since this is a SELECT statement, the blocks are shown under the fetch call; for a DML statement they would be shown under the execute call.

– none of the blocks were received in current mode. You will see this value get updated for DML statements or when some DML activity is involved

– the total number of rows processed was 823
Other than this, you can also see the misses in the library cache during each call. If there is no miss, it is not mentioned. The 1 miss for this SQL is very much acceptable: when a SQL statement runs for the first time, it has to be parsed and executed from scratch and its execution plan stored, so the parse and execute calls each show 1 miss.

Notice that the parse call happened 9 times but the miss count is only 1, meaning the statement was hard parsed only once and then stored in the library cache. For the next 8 parse calls, the already-parsed statement from the library cache was reused. So we have 1 miss and 8 hits.

Similarly, the execution plan was built only once, and for the other 822 executions Oracle reused the same plan from the library cache.
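
You can confirm this hit/miss pattern from the shared pool; a sketch against V$SQL, using the SQL ID from the trace block above (it works only while the statement is still cached):

-- Parse calls vs. hard loads for the traced statement:
select sql_id, parse_calls, loads, executions
from   v$sql
where  sql_id = '74637j0wuhxfx';
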
Next, tkprof gives the execution plan (row source operations) for the query, as shown below:

Rows     Row Source Operation
-------  ---------------------------------------------------
 1  TABLE ACCESS BY INDEX ROWID SNP_PROJECT (cr=2 pr=0 pw=0 time=0 us cost=1 size=108 card=1)
 1   INDEX UNIQUE SCAN PK_PROJECT (cr=1 pr=0 pw=0 time=0 us cost=0 size=0 card=1)(object id 81820)

Here several parameters are shown in brackets: (cr=2 pr=0 pw=0 time=0 us cost=1 size=108 card=1)

cr = Consistent Reads

pr = Physical Reads

pw = Physical Writes

time = time taken by this step in microseconds

cost = cost incurred by the step

size = estimated size (in bytes) of the data handled by that step

card = cardinality (the estimated number of rows for that step)

Finally, at the end, the trace gives the wait events for that query, the number of times each event was waited on, the maximum wait time, and the total time waited.

Concluding from the Traces:

With the values above, we need to draw conclusions and decide whether to tune the SQL or not. Unless we have a locking issue or badly performing SQL, we should not worry much about CPU time or elapsed time, because timings come into consideration only when we have badly performing SQL. The key is the number of block visits, both query (that is, subject to read consistency) and current (that is, not subject to read consistency). Segment headers and blocks that are going to be updated are acquired in current mode, but all query and subquery processing requests the data in query mode.

References:

Oracle® Database Performance Tuning Guide

Hope this helps !!