BNFD - Your DBA NoteBook On The Net!!!

Friday, April 9, 2010

Optimizer Statstics & Index Usage!!!

Optimizer Statistics -

To Verify -

SQL> SELECT TABLE_NAME, NUM_ROWS, BLOCKS, AVG_ROW_LEN,
TO_CHAR(LAST_ANALYZED, 'MM/DD/YYYY HH24:MI:SS')
FROM DBA_TABLES
WHERE TABLE_NAME IN ('SO_LINES_ALL','SO_HEADERS_ALL','SO_LAST_ALL');

TABLE_NAME NUM_ROWS BLOCKS AVG_ROW_LEN LAST_ANALYZED
------------ -------- ------- ----------- -------------
SO_HEADERS_ALL 1632264 209014 149 07/29/2009 00:59:51
SO_LINES_ALL 10493845 1922196 263 07/29/2009 01:16:09
SO_LAST_ALL


Note - SO_LAST_ALL has no statistics.

Verifying Index Statistics
To verify that index statistics are available and assist you in determining which are the best indexes to use in an application, execute the following statement against the dictionary DBA_INDEXES view:

SQL> SELECT INDEX_NAME "NAME", NUM_ROWS, DISTINCT_KEYS "DISTINCT",
1 LEAF_BLOCKS, CLUSTERING_FACTOR "CF", BLEVEL "LEVEL",
2 AVG_LEAF_BLOCKS_PER_KEY "ALFBPKEY"
3 FROM DBA_INDEXES
4 WHERE owner = 'SH'
5* ORDER BY INDEX_NAME;


NAME NUM_ROWS DISTINCT LEAF_BLOCKS CF LEVEL ALFBPKEY
------------------------------ ---------- ---------- ----------- ---------- ---------- ----------
CUSTOMERS_PK 50000 50000 454 4405 2 1
PRODUCTS_PK 10000 10000 90 1552 1 1
PRODUCTS_PROD_CAT_IX 10000 4 99 4422 1 24
PRODUCTS_PROD_SUBCAT_IX 10000 37 170 6148 2 4
SALES_PROD_BIX 6287 909 1480 6287 1 1
SALES_PROMO_BIX 4727 459 570 4727 1 1

6 rows selected.

Optimizer Index Determination Criteria
The optimizer uses the following criteria when determining which index to use:

Number of rows in the index (cardinality)

Number of distinct keys. These define the selectivity of the index.

Level or height of the index. This indicates how deeply the data 'probe' must search in order to find the data.

Number of leaf blocks in the index. This is the number of I/Os needed to find the desired rows of data.

Clustering factor (CF). This is the collocation amount of the index block relative to data blocks. The higher the CF, the less likely the optimizer is to select this index.

Average leaf blocks per key (ALFBKEY). Average number of leaf blocks in which each distinct value in the index appears, rounded to the nearest integer. For indexes that enforce UNIQUE and PRIMARY KEY constraints, this value is always one.

Determining if You Have Chosen the Right Index
Use the following notes to assist you in deciding whether you have chosen an appropriate index for a table, data, and query:

DISTINCT
Consider index ap_invoices_n3, the number of distinct keys is two. The resulting selectivity based on index ap_invoices_n3 is poor, and the optimizer is not likely to use this index. Using this index fetches 50% of the data in the table. In this case, a full table scan is cheaper than using index ap_invoices_n3.

Index Cost Tie
The optimizer uses alphabetic determination: If the optimizer determines that the selectivity, cost, and cardinality of two finalist indexes is the same, then it uses the two indexes' names as the deciding factor. It chooses the index with name beginning with a lower alphabetic letter or number.

Verifying Column Statistics
To verify that column statistics are available, execute the following statement against the dictionary's DBA_TAB_COL_STATISTICS view:

SQL> SELECT COLUMN_NAME, NUM_DISTINCT, NUM_NULLS, NUM_BUCKETS, DENSITY

FROM DBA_TAB_COL_STATISTICS
WHERE TABLE_NAME ="PA_EXPENDITURE_ITEMS_ALL"
ORDER BY COLUMN_NAME;




This returns the following data:

COLUMN_NAME NUM_DISTINCT NUM_NULLS NUM_BUCKETS DENSITY
------------------------------ ------------ ---------- ----------- ----------
BURDEN_COST 4300 71957 1 .000232558
BURDEN_COST_RATE 675 7376401 1 .001481481
CONVERTED_FLAG 1 16793903 1 1
COST_BURDEN_DISTRIBUTED_FLAG 2 15796 1 .5
COST_DISTRIBUTED_FLAG 2 0 1 .5
COST_IND_COMPILED_SET_ID 87 6153143 1 .011494253
EXPENDITURE_ID 1171831 0 1 8.5337E-07
TASK_ID 8648 0 1 .000115634
TRANSFERRED_FROM_EXP_ITEM_ID 1233787 15568891 1 8.1051E-07


Verifying column statistics are important for the following conditions:

Join conditions

When the WHERE clause includes a column(s) with a bind variable; for example:

column x = :variable_y



In these cases, the stored column statistics can be used to get a representative cardinality estimation for the given expression.

Consider the data returned in the previous example.

NUM_DISTINCT Column Statistic
Low
The number of distinct values for the columns CONVERTED_FLAG is one. In this case this column has only one value. If in the WHERE clause, then there is a bind variable on column CONVERTED_FLAG = :variable_y, say. If CONVERTED_FLAG is low, as the case in this example, then this leads to poor selectivity, and CONVERTED_FLAG is a poor candidate to be used as the index.

Column COST_BURDEN_DISTRIBUTED_FLAG: NUM_DISTINCT = 2. Likewise, this is low. COST_BURDEN_DISTRIBUTED_FLAG is not a good candidate for an index unless there is much skew or there are a lot of nulls. If there is data skew of, say, 90%, then 90% of the data has one particular value and 10% of the data has another value. If the query only needs to access the 10%, then a histogram is needed on that column in order for the optimizer to recognize the skew and use an index on this column.

High
NUM_DISTINCT is more than 1 million for column EXPEDITURE_ID. If there is a bind variable on column EXPENDITURE_ID, then this leads to high selectivity (implying high density of data on this column). In other words, EXPENDITURE_ID is a good candidate to be used as the index.

NUM_NULL Column Statistic
NUM_NULLS indicates the number of null statistics.

Low
For example, if a single column index has few nulls, such as the COST_DISTRIBUTED_FLAG column, and if this column is used as the index, then the resulting data set is large.

High
If there are many nulls on a particular column, such as the CONVERTED_FLAG column, and if this column is used as the index, then the resulting data set is small. This means that COST_DISTRIBUTED_FLAG is a more appropriate column to index.

DENSITY Column Statistic
This indicates the density of the values of that column. This is calculated by 1 over NUM_DISTINCT.

Column Statistics and Join Methods
Column statistics are useful to help determine the most efficient join method, which, in turn, is also based on the number of rows returned.