Oracle Performance Tuning – ASH Basics

Recently I got an opportunity to dig into performance of database. We were not facing any performance issue, but we wanted to understand whats happening under the hood. What are the bottlenecks and how do we prepare and scale our database further.

In future the load on the databases are going to be crazy and we want to make sure that we look into every consumer of our database and optimize the usage.

I went back into the basics of database performance tuning – ASH. Active session history is the beast which has loads of information and provides information at very granular level – “session”. We can dig into ASH data and generate different reports rolled-up against various entities like module, action, sql_id, user_id etc. We can determine how much each of these entities are using database resources.

DB Time:

One of the most fundamental measure of database performance is DB Time. This is the amount of time a session has spend on database activities. This is a combination of CPU Time + non-Idle wait times. This does not consider any idle wait time so even if session is done with its activity but just sitting idle doing nothing, that idle time is not considered part of DB Time.

This gives a very accurate measure of amount of database resources a session is burning.

Active Session History:

ASH or Active Session History provides snapshots of active sessions connected to database (status=ACTIVE in v$session) and loads of other information related to those sessions about what they were doing when snapshot was taken. To understand how we can extract this information and make sense about it, we need to understand how ASH is build and what that information means.

V$ACTIVE_SESSION_HISTORY

Active session history is exposed by database using a view V$ACTIVE_SESSION_HISTORY. This view stores samples taken every 1 sec. A background process MMNL (Memory manager lite) takes a snapshot from v$session and v$session_wait about the current state of active sessions and dumps the information in V$ACTIVE_SESSION_HISTORY. Note that only active sessions’s snapshot is taken.

V$ACTIVE_SESSION_HISTORY is a memory view stored in the form of circular buffer. It means that this view has size limit on its growth (approx 2MB / CPU). So when it reaches to that size it starts overwriting the oldest records in circular buffer. You can visualize the same using following figure

 

ash-architecture-and-advanced-usage-rmoug2014-7-638

 

 

 

 

 

 

 

 

So how these samples are useful to us in finding DB Time ?

Lets say a session has spend 10 sec of active time in database. It means DB time for that session is 10 sec. During this 10 sec of active time, we will have 10 samples of that session taken as interval time between snapshot is 1 sec. So if we just take a count(*) from V$ACTIVE_SESSION_HISTORY, we will see 10 records for that session. This means that to get DB Time of a session  we just have to take count(*) from V$ACTIVE_SESSION_HISTORY for that session.

Its little difficult to adjust to the fact that time parameter can be obtained using count(*) from a view. But this is because interval time between samples is 1 sec.

So DB Time is actually – [ Number of samples ] X [ Time interval between samples ]

Since time interval between samples is 1 sec, we can get DB time as number of samples which is count(*) from V$ACTIVE_SESSION_HISTORY

Again, calculating DB Time using count(*) from ASH is approximation as sessions which are taking fraction of times < 1 sec will not be counted if they happens between 2 samples and DB time of other sessions which ends just before next samples is taken will be counted until last samples when they were active. But those were very few cases and amount of DB time not measure in such cases will be very less compared to total DB time we are measuring. You can visualize the same using following figure.

 

Aas_ashDBA_HIST_ACTIVE_SESS_HISTORY

DBA_HIST_ACTIVE_SESS_HISTORY is a history table which maintains history of active sessions between AWR snapshots. Since V$ACTIVE_SESSION_HISTORY is a memory view and has limited space for growth, data from V$ACTIVE_SESSION_HISTORY flows to DBA_HIST_ACTIVE_SESS_HISTORY table.

DBA_HIST_ACTIVE_SESS_HISTORY table enables us to view active session data from the past. Data in DBA_HIST_ACTIVE_SESS_HISTORY is retained based on retention level set for AWR snapshots (30 days by default). So every time AWR snapshot happens, data from V$ACTIVE_SESSION_HISTORY goes into this table.

But NOT all data from V$ACTIVE_SESSION_HISTORY goes into DBA_HIST_ACTIVE_SESS_HISTORY. When AWR snapshot happens, it picks only 1 out of 10 samples from V$ACTIVE_SESSION_HISTORY into history table.

Complete visualization of active sessions looks like below

ash-architecture-and-advanced-usage-rmoug2014-7-638

 

 

 

 

 

 

 

 

Since AWR snapshot interval is 1 hour, it will take AWR snapshots every 1 hour and during that time 1 out of 10 samples form V$ACTIVE_SESSION_HISTORY will move to DBA_HIST_ACTIVE_SESS_HISTORY table.

So if want to calculate DB Time from DBA_HIST_ACTIVE_SESS_HISTORY table, we need to consider sample time as 10 sec instead of 1 sec.

DB Time = [ Number of samples ] X [ Time interval between samples ]
        = count(1) * 10

That’s why when we are calculating DB Time of a session or SQL from DBA_HIST_ACTIVE_SESS_HISTORY table, we need to take count(*) * 10.

Calculating DB Time

With this brief understanding of ASH, we can use these tables to find out DB time for a session and consolidate sessions to calculate DB Time for module/user etc.

If we can list modules in terms of % of DB time they are consuming in database, modules with high % of DB time are the one using most resources in database and has high impact to database. Same holds true at user level as well. We can find out which user is contributing highest DB time.

I have given couple of queries to find out % of DB Time different modules are contributing in database. This gives a fair idea about module wise resource consumption in database.

Note that I have randomized module names using DBMS_RANDOM.string(‘L’,TRUNC(DBMS_RANDOM.value(10,21))) to avoid exposing actual module name outside.


set line 999
set pagesize 999
set verify off
col module format a30;
BREAK ON REPORT
COLUMN DUMMY HEADING ''
compute sum of PCT_CPU on REPORT
with total_cpu as
(select   count(1)
 from     V$ACTIVE_SESSION_HISTORY
 where    SESSION_TYPE = 'FOREGROUND')
 select   module,
          count(1) "module_cpu",
          (select * from total_cpu) "total_cpu",
          round((count(1)/(select * from total_cpu))*100,2) PCT_CPU
from      V$ACTIVE_SESSION_HISTORY
where     SESSION_TYPE = 'FOREGROUND'
group by module order by PCT_CPU desc;

MODULE                         module_cpu  total_cpu    PCT_CPU
------------------------------ ---------- ---------- ----------
vvadcagpas                        29355     187065      15.69
cbkcbqchts                        26506     187065      14.17
zugkvhbajxpff                     24655     187065      13.18
tvfqhrkqzktiiceolrfd              10046     187065       5.37
paczxkndgsaqt                      6740     187065        3.6
pzqacdgxeqzbtqhalfc                6220     187065       3.33
nvqqscmayx                         5905     187065       3.16
zbirttdyoxxbslopnm                 5412     187065       2.89
bbztiihehigeqwxvk                  5051     187065        2.7
kvosekxfhkmfw                      4687     187065       2.51
aenyopwmok                         3977     187065       2.13
brfqfuwdyhyrej                     3256     187065       1.74
abruxykxhvqgfg                     3227     187065       1.73
zfrahazoqgdxubpni                  2759     187065       1.47
ddxinnevqd                         2633     187065       1.41
lructcsjkxhxatfidpf                2518     187065       1.35
naawhsfqvcaoathn                   2246     187065        1.2
sljzltjhhigepqfmcji                2142     187065       1.15
chvjvntynkjb                       2102     187065       1.12
zlzhwwhzbhfdtsszfedj               2005     187065       1.07
yvkuwzjkjhyeamco                   1610     187065        .86
fdjwspubzyhyxyzzkgyi               1578     187065        .84
risowlcnkizrbpgt                   1557     187065        .83
ywwdycosdpcfsflaubia               1539     187065        .82
pxlqrkmvreppum                     1496     187065         .8
amivlnlivqeesnqdkto                1404     187065        .75
jjaqyosqsc                         1347     187065        .72
yxkgfgktafvx                       1354     187065        .72
gqistslkwdw                        1328     187065        .71
bhuzvdpklocmnyimtzux               1272     187065        .68
uimycabvbjcczibrn                  1053     187065        .56
ksnvqslyker                         997     187065        .53
ofekwpklwpjtha                      975     187065        .52
wmkmovueneno                        904     187065        .48
umdtggtvdptkubfs                    795     187065        .42
dddegpqpwyalotxtoh                  761     187065        .41
pwycwyhavccwjuufqjx                 733     187065        .39
pxvcbhpfmeuwen                      729     187065        .39
nhezqmgkqccdcog                     716     187065        .38
ixtljofuazhpp                       628     187065        .34
mbotmyoairyqbdg                     600     187065        .32
aocabnswozdg                        586     187065        .31
qtdbdbuffprxj                       586     187065        .31
suiwwwtolcnkbnkcfwxw                560     187065         .3
btbdbtaztym                         562     187065         .3
xmvdgeczrot                         567     187065         .3
gkwzuwlgsh                          538     187065        .29
yhgmyqgqkvzmvbpsi                   463     187065        .25
obahkcsiafagmkxxu                   457     187065        .24
tcjoiyhzpuutbfhegn                  448     187065        .24
brlrkszrpmtw                        426     187065        .23
spzvvwzvoh                          435     187065        .23
ckvvarzrtpedhjl                     334     187065        .18
mnamshpswmsif                       329     187065        .18
gwvwypqrpwxkbugms                   293     187065        .16
clkuqearqhguezagwjlc                302     187065        .16
fskwtuctalnxhhu                     306     187065        .16
dyljhijlwyfmjy                      277     187065        .15
eqdifihzda                          255     187065        .14
nkrvwqrldd                          229     187065        .12
jkevokwkwokbnkbx                    231     187065        .12
niatevdadex                         198     187065        .11
qfjiifhiubvstyp                     200     187065        .11
acquoqslwxedulsmekbu                180     187065         .1
hjfimltalvopvuum                    156     187065        .08
bmyyhtzdqebo                        152     187065        .08
tkehpofgfe                          157     187065        .08
qnhgoqgstvq                         131     187065        .07
ddlloytpmyhpzpmk                    125     187065        .07
bvborwlump                          133     187065        .07
juirsmdjuejot                       124     187065        .07
saeoojlzfepjfybpclls                129     187065        .07
ppfdzcozyhxmf                       135     187065        .07
xbrwqdopemjqrhebglzs                115     187065        .06
zrfdsejwxgitsorat                   119     187065        .06
vsmsefmxnkoeb                        97     187065        .05
dvhoxhzayqyhsny                      98     187065        .05
cwjpiesdhpxvjs                      101     187065        .05
bvffncgtiojaj                        98     187065        .05
flfuciblxfzdcbo                      87     187065        .05
asucxewzdruwyfmg                     66     187065        .04
dztczkivnnbuzyuh                     67     187065        .04
hslcljfjmrhduw                       73     187065        .04
lywascpugxxjidx                      74     187065        .04
cawgdisaagltmie                      73     187065        .04
dpqbuvfkef                           68     187065        .04
tjpaushwfftmm                        54     187065        .03
ulgktcqvsazktojskfo                  56     187065        .03
ckoayyqtcxybkukgqvgl                 49     187065        .03
aqmphwbrngbotpxehyhe                 49     187065        .03
asqllnbivusyu                        60     187065        .03
skvrrqoshnirxbud                     59     187065        .03
vobtqhfewje                          58     187065        .03
xurrifyontk                          34     187065        .02
rliqgkwdwao                          42     187065        .02
ziwftblonh                           36     187065        .02
wzybbgmxkx                           30     187065        .02
vhumxxhimauc                         16     187065        .01
vssaefphdkebdx                       26     187065        .01
vjbjocafrwgdv                        11     187065        .01
xgcwcxrlzrjvysanlijw                 14     187065        .01
ztzxyfykhnxozdn                      20     187065        .01
btlbvffybkyjsn                       12     187065        .01
pmmqhsjuqhdaf                        22     187065        .01
kaumdvjqfcprmaeygr                   11     187065        .01
kadliemxdxhtzleqjr                   28     187065        .01
hlebszxkejjgkfbpmjew                 16     187065        .01
atwvjasykmlmldbipswh                 16     187065        .01
                                                     ----------
sum                                                      99.95

Similarly to check DB Time for sessions in past, we need to use DBA_HIST_ACTIVE_SESS_HISTORY table. Following query will provide % DB time used by individual modules calculated as % of total DB time between 2 AWR snapshots

set line 999
set pagesize 999
set verify off
BREAK ON REPORT
COLUMN DUMMY HEADING ''
compute sum of PCT_CPU on REPORT
with total_cpu as
(select   count(1)*10
from      dba_hist_active_sess_history
where     snap_id >= &begin_snap_id
and       snap_id < &end_snap_id
and       SESSION_TYPE = 'FOREGROUND')
select    module,
          count(1)*10 "module_cpu",
          (select * from total_cpu) "total_cpu",
          round((count(1)*10/(select * from total_cpu))*100,2) PCT_CPU
from      dba_hist_active_sess_history
where     snap_id >= &&begin_snap_id
and       snap_id < &&end_snap_id
and       SESSION_TYPE = 'FOREGROUND'
group by module order by PCT_CPU desc

 

 

Note that we have used count(*) * 10 in above query because 1 of 10 samples go to this table. Also, you need to specify begin_snap_id and end_snap_id and module DB Time usage during those snapshots will be calculated.

CPU Time

If we want to ignore waits and only focus on pure CPU used by modules, we can list modules as per CPU time they consume as % of total CPU time of database.

I think DB time is still better metric than CPU time because DB time covers waits as well. Waits slow down database.

We can calculate CPU time of a session by doing the same count(*) from V$ACTIVE_SESSION_HISTORY but we need to count only those samples during which session was actually using CPU. An ACTIVE session could be either using CPU to do some processing or it could be actively waiting on something (example IO when doing db file sequential read). Active session using CPU is represented by session_state = ‘ON CPU’. So we will count only those samples for session where session_state was ON CPU

Following query will give CPU time used by each module as % of total CPU time using V$ACTIVE_SESSION_HISTORY


set line 999
set pagesize 999
set verify off
BREAK ON REPORT
COLUMN DUMMY HEADING ''
compute sum of PCT_CPU on REPORT
with total_cpu as
(select count(1)
from  V$ACTIVE_SESSION_HISTORY
where   SESSION_TYPE != 'BACKGROUND'
and session_state = 'ON CPU')
select  module,
count(1) "module_cpu",
(select * from total_cpu) "total_cpu",
round((count(1)/(select * from total_cpu))*100,2) PCT_CPU
from  V$ACTIVE_SESSION_HISTORY
where SESSION_TYPE != 'BACKGROUND'
and   session_state = 'ON CPU'
group by module order by PCT_CPU desc

We can find similar CPU time for past sessions using DBA_HIST_ACTIVE_SESS_HISTORY table. Following query will provide CPU time used by module as % of total CPU time between 2 AWR snapshots


set line 999
set pagesize 999
set verify off
BREAK ON REPORT
COLUMN DUMMY HEADING ''
compute sum of PCT_CPU on REPORT
with total_cpu as
(select count(1)*10
from  dba_hist_active_sess_history
where   snap_id >= &&begin_snap_id
and   snap_id < &&end_snap_id
and   SESSION_TYPE != 'BACKGROUND'
and session_state = 'ON CPU')
select  module,
count(1)*10 "module_cpu",
(select * from total_cpu) "total_cpu",
round((count(1)*10/(select * from total_cpu))*100,2) PCT_CPU
from  dba_hist_active_sess_history
where   snap_id >= &&begin_snap_id
and   snap_id < &&end_snap_id
and   SESSION_TYPE != 'BACKGROUND'
and   session_state = 'ON CPU'
group by module order by PCT_CPU desc

Note, that in above query from DBA_HIST_ACTIVE_SESS_HISTORY, we have used count(*) * 10 because only 1 of 10 samples goes into this table.

I think for now above information is sufficient to give a basic idea about ASH.

I will be writing more articles on performance tuning using ASH as I dig further.

Hope this helps !!

Advertisements

“show slave hosts” on master not reporting hostname

If you have multiple MySQL slaves or complex topology, its difficult to remember what slaves are replicating from which master.

We can go to slave servers and check “show slave status” which list master host, but if we want to find out other way, we have can run “show slave hosts” from master.
This will list down all the slaves that are getting replicated from this master.

Only problem I faced was, “show slave hosts” was not reporting hostnames. They were empty.

root [fleetdb] >show slave hosts;
+-----------+------+------+-----------+--------------------------------------+
| Server_id | Host | Port | Master_id | Slave_UUID                           |
+-----------+------+------+-----------+--------------------------------------+
|         3 |      | 3306 |         2 | c21f28d2-c243-11e4-baf5-2c600c20dba4 |
+-----------+------+------+-----------+--------------------------------------+
1 row in set (0.01 sec)

After bit of RTFM, I came to know we need to set a parameter – “report_host” while starting slave. This will register slave hostname while connecting to master (for replication) and same will be shown in “show slave hosts”

This is a read-only parameter meaning, we need to bounce mysql instance after setting this parameter in /etc/my.cnf

So I set the parameter in /etc/my.cnf as below

report_host=slave-host1.mydb.example.com

We can specify any name here and same will be registered in master. But it make sense to provide actual host name.

Once I bounce mysql instance, my slave is started and it connects to master.
Now I check “show slave hosts” and it shows this host.

root [fleetdb] >show slave hosts;
+-----------+------------------------------+------+-----------+--------------------------------------+
| Server_id | Host                         | Port | Master_id | Slave_UUID                           |
+-----------+------------------------------+------+-----------+--------------------------------------+
|         3 | slave-host1.mydb.example.com | 3306 |         2 | c21f28d2-c243-11e4-baf5-2c600c20dba4 |
+-----------+------------------------------+------+-----------+--------------------------------------+
1 row in set (0.00 sec)

So we can easily know what slaves are connected to this master.

Hope this helps !!