Database Statistics and CBO

Introduction

One of the greatest problems with the Oracle Cost-based optimizer was the failure of the Oracle DBA to gather accurate schema statistics. In order for the CBO to make an intelligent decision about the best execution plan for your SQL, it must have information about the table and indexes that participate in the query. CBO depends on the information about tables and indexes present in data dictionary. This information includes size of the tables, its extent distribution, cardinality, and selectivity of column values etc. Based on these details CBO makes an attempt to come up with a highly optimized plan for executing the query.

Optimizer statistics include the following:

  • Table statistics
    • Number of rows
    • Number of blocks
    • Average row length
  • Column statistics
    • Number of distinct values (NDV) in column
    • Number of nulls in column
    • Data distribution (histogram)
  • Index statistics
    • Number of leaf blocks
    • Levels
    • Clustering factor
  • System statistics
    • I/O performance and utilization
    • CPU performance and utilization

Gathering statistics Automatically and Manually

Usually statistics gathering in 10g is automated using a scheduler job “GATHER_STATS_JOB”. This job is created by default when a database is created and run every day to collect statistics.

We can also gather statistics manually using DBMS_STATS. But why do we need to manually gather statistics? This is because, there are cases where automatic statistics gathering may not be adequate. Because the automatic statistics gathering runs during an overnight batch window, the statistics on tables which are significantly modified during the day may become stale. Example in case of bulk load of data into the table. Following are the procedures available in DBMS_STATS package to gather statistics

GATHER_INDEX_STATS – Index statistics
GATHER_TABLE_STATS – Table, column, and index statistics
GATHER_SCHEMA_STATS – Statistics for all objects in a schema
GATHER_DICTIONARY_STATS – Statistics for all dictionary objects
GATHER_DATABASE_STATS – Statistics for all objects in a database

Automatic sampling

While gathering the statistics the important part comes in deciding the sample size to be considered while gathering the stats. Sampling is specified using the ESTIMATE_PERCENT argument to the DBMS_STATS procedures. The better the quality of the statistics, the better the job that the CBO will do when determining your execution plans. Unfortunately, doing a complete analysis on a large  database could take days to finish. So to avoid this we tell DBMS_STATS to consider some percentage of rows on which it can base its analysis.

In earlier releases, the DBA had to guess what percentage of the database provided the best sample size and sometimes underanalyzed the schema. Starting with Oracle9i Database, the  estimate_percent argument is a great way to allow Oracle’s dbms_stats to automatically estimate the “best” percentage of a segment to sample when gathering statistics

estimate_percent => dbms_stats.auto_sample_size

AUTO_SAMPLE_SIZE lets Oracle determine the best sample size necessary for good statistics, based on the statistical property of the object. Because each type of statistics has different requirements, the size of the actual sample taken may not be the same across the table, columns, or indexes. For example, to collect table and column statistics for all tables in the OE schema with auto-sampling, you could use:

EXECUTE DBMS_STATS.GATHER_SCHEMA_STATS(‘OE’,DBMS_STATS.AUTO_SAMPLE_SIZE);

When the ESTIMATE_PERCENT parameter is manually specified, the DBMS_STATS gathering procedures may automatically increase the sampling percentage if the specified percentage did not produce a large enough sample. This ensures the stability of the estimated values by reducing fluctuations.

Determining Stale Statistics

The important questions comes here is how does Oracle knows that the statistics for the table or index is stale and should be gathered? In Oracle 9i you could check if the data in a table had changed significantly by turning on the table monitoring option (ALTER TABLE … MONITORING) and then checking the view DBA_TAB_MODIFICATIONS for those tables. In 10g, the MONITORING statement is gone. Instead, statistics are collected automatically if the initialization parameter STATISTIC_LEVEL is set to TYPICAL or ALL. This is done by pre defined job “GATHER_STATS_JOB” mentioned previously.

In order to determine whether or not a given database object needs new database statistics, Oracle provides a table monitoring facility. This monitoring is enabled by default when STATISTICS_LEVEL is set to TYPICAL or ALL. Monitoring tracks the approximate number of INSERTs, UPDATEs, and DELETEs for that table, as well as whether the table has been truncated, since the last time statistics were gathered. The information about changes of tables can be viewed in the USER_TAB_MODIFICATIONS view. Following a data-modification, there may be a few minutes delay while Oracle propagates the information to this view. Use the DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO procedure to immediately reflect the outstanding monitored information kept in the memory.

The GATHER_DATABASE_STATS or GATHER_SCHEMA_STATS procedures gather new statistics for tables with stale statistics when the OPTIONS parameter is set to GATHER STALE or GATHER AUTO. If a monitored table has been modified more than 10%, then these statistics are considered stale and gathered again.

So all the modifications (update, insert, delete) are stored in USER_TAB_MODIFICATIONS table and these are compared with number of rows in a table at the time of statistics collection. If this ratio is greater then 10% then that table is eligible for statistics collection.

Dynamic Sampling

What is dynamic sampling ?

Dynamic sampling is used at compile time when a CBO is making query plan. This involves gathering the information about tables and indexes and other data dictionaly information required in the query. When we gather statistics this information is available ready, but in cases where the statistics are not gathered, we can considered using dynamic samplying where the stats are gathered dynamically and based on this information query plan is generated.

How Dynamic Sampling Works ?

The primary performance attribute is compile time. Oracle determines at compile time whether a query would benefit from dynamic sampling. If so, a recursive SQL statement is issued to scan a small random sample of the table’s blocks, and to apply the relevant single table predicates to estimate predicate selectivities. The sample cardinality can also be used, in some cases, to estimate table cardinality. Any relevant column and index statistics are also collected.

When to Use Dynamic Sampling ?

For a query that normally completes quickly (in less than a few seconds), you will not want to incur the cost of dynamic sampling. However, dynamic sampling can be beneficial under any of the following conditions:

  • A better plan can be found using dynamic sampling.
  • The sampling time is a small fraction of total execution time for the query.
  • The query will be executed many times.

How to Use Dynamic Sampling to Improve Performance ?

You control dynamic sampling with the OPTIMIZER_DYNAMIC_SAMPLING parameter, which can be set to a value from 0 to 10. The default is 2.

  • A value of 0 means dynamic sampling will not be done.
  • Increasing the value of the parameter results in more aggressive application of dynamic sampling, in terms of both the type of tables sampled (analyzed or unanalyzed) and the amount of I/O spent on sampling.

For dynamic sampling levels refer Oracle documentation

References:

Oracle documentation

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s