Tuesday, January 31, 2012

Locking statistics of tables

Having accurate statistics of database objects is mandatory in order to get the best execution path, and usually you can collect statistics using the DBMS_STATS package with no problem at all. But sometimes you have special tables that needs special treatment, like "temporary" tables (normal tables that are populated and its data modified very fast) that can change a lot the execution path of a SQL sentence and drive you crazy trying to pinpoint the root cause of that execution path changes.

As a demonstration of this behavior we will create two tables, one that will change a lot (test1) and other that is more or less static (test2):

SQL> create table test1 (id number primary key, id_char varchar2(15));

Table created.

SQL> declare
mynum number := 1;
begin
while mynum < 10 loop
insert into test1 values(mynum,to_char(mynum));
mynum := mynum + 1;
end loop;
end;
/

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> create table test2 (id number primary key, id_char varchar2(15));

Table created.

SQL> declare
mynum number := 1;
begin
while mynum < 100000 loop
insert into test2 values(mynum,to_char(mynum));
mynum := mynum + 1;
end loop;
end;
/

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

At the beginning test1 table has 9 records, and we want to join test1 and test2 tables to get a few records, but first we will gather statistics of test1:

SQL> BEGIN DBMS_STATS.GATHER_TABLE_STATS( OWNNAME => 'MYUSER', TABNAME => 'TEST1'); END;
2 /

PL/SQL procedure successfully completed.

SQL> select a.ID, b.ID_CHAR from test1 a join test2 b on (a.ID=b.ID) where a.ID between 4 and 7;

ID ID_CHAR
---------- ---------------
4 4
5 5
6 6
7 7

Next we will get the current execution plan for this small query:

SQL> EXPLAIN PLAN FOR
select a.ID, b.ID_CHAR from test1 a join test2 b on (a.ID=b.ID) where a.ID between 4 and 7;

Explained.

SQL> @$ORACLE_HOME/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3677731951

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 56 | 4 (0) | 00:00:01 |
| 1 | NESTED LOOPS | | 4 | 56 | 4 (0) | 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST2 | 5 | 55 | 4 (0) | 00:00:01 |
|* 3 | INDEX RANGE SCAN | SYS_C004858 | 5 | | 3 (0) | 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | SYS_C004857 | 1 | 3 | 0 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("B"."ID">=4 AND "B"."ID"<=7)
4 - access("A"."ID"="B"."ID")
filter("A"."ID">=4 AND "A"."ID"<=7)

18 rows selected.

This execution plan looks good and really doesn't matter a lot because we got just a few records, but wait! Nine records is not the typical size of test1 table; now we will create more records:

SQL> delete from test1;

9 rows deleted.

SQL> declare
mynum number := 1;
begin
while mynum < 10000 loop
insert into test1 values(mynum,to_char(mynum));
mynum := mynum + 1;
end loop;
end;
/

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

You might think that there is not a lot of difference having ten or ten thousand records in test1, but Oracle's optimizer does not think the same way and this time the execution plan is different:

SQL> EXPLAIN PLAN FOR
2 select a.ID, b.ID_CHAR from test1 a join test2 b on (a.ID=b.ID) where a.ID between 4000 and 7000;

Explained.

SQL> @$ORACLE_HOME/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2418654178

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 14 | 3 (0) | 00:00:01 |
| 1 | NESTED LOOPS | | | | | |
| 2 | NESTED LOOPS | | 1 | 14 | 3 (0) | 00:00:01 |
|* 3 | INDEX RANGE SCAN | SYS_C004857 | 1 | 3 | 1 (0) | 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | SYS_C004858 | 1 | | 1 (0) | 00:00:01 |
| 5 | TABLE ACCESS BY INDEX ROWID| TEST2 | 1 | 11 | 2 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("A"."ID">=4000 AND "A"."ID"<=7000)
4 - access("A"."ID"="B"."ID")
filter("B"."ID"<=7000 AND "B"."ID">=4000)

19 rows selected.

You might think that retrieving four records is different of retrieving four thousand and that's right, but the optimizer choose the execution path based in wrong data; look the amount of rows Oracle thinks it will work to.

You will see it clearly refreshing the test1 table statistics and getting the new execution path:

SQL> BEGIN DBMS_STATS.GATHER_TABLE_STATS( OWNNAME => 'SYS', TABNAME => 'TEST1'); END;
2 /

PL/SQL procedure successfully completed.

SQL> EXPLAIN PLAN FOR
select a.ID, b.ID_CHAR from test1 a join test2 b on (a.ID=b.ID) where a.ID between 4000 and 7000;

Explained.

SQL> @$ORACLE_HOME/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3677731951

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3001 | 45015 | 17 (12) | 00:00:01 |
| 1 | NESTED LOOPS | | 3001 | 45015 | 17 (12) | 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST2 | 3002 | 33022 | 15 (0) | 00:00:01 |
|* 3 | INDEX RANGE SCAN | SYS_C004858 | 3002 | | 8 (0) | 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | SYS_C004857 | 1 | 4 | 0 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("B"."ID">=4000 AND "B"."ID"<=7000)
4 - access("A"."ID"="B"."ID")
filter("A"."ID">=4000 AND "A"."ID"<=7000)

18 rows selected.

Can you see the problem? Having to deal with this kind of tables you may: use hints to force the optimal execution of the SQL sentence, refresh statistics every time you change a table like this, or lock the statistics when you have a typical table size or optimal execution path.

We will consider that at this point the test1 table has a typical size and its statistics help the optimizer to chose an optimal execution plan, therefore we will lock its statistics this way:

SQL> exec DBMS_STATS.LOCK_TABLE_STATS ('MYUSER', 'TEST1');

PL/SQL procedure successfully completed.

Now we will repopulate again the test1 table with nine records:

SQL> delete from TEST1;

9999 rows deleted.

SQL> declare
mynum number := 1;
begin
while mynum < 10 loop
insert into test1 values(mynum,to_char(mynum));
mynum := mynum + 1;
end loop;
end;
/

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> BEGIN DBMS_STATS.GATHER_TABLE_STATS( OWNNAME => 'MYUSER', TABNAME => 'TEST1'); END;
2 /
BEGIN DBMS_STATS.GATHER_TABLE_STATS( OWNNAME => 'MYUSER', TABNAME => 'TEST1'); END;
*
ERROR at line 1:
ORA-20005: object statistics are locked (stattype = ALL)
ORA-06512: at "SYS.DBMS_STATS", line 23829
ORA-06512: at "SYS.DBMS_STATS", line 23880
ORA-06512: at line 1

As long as the statistics of test1 table were locked you cannot gather new statistics unless you unlock the statistics of the table. Anyway, the point of doing this is to help the optimizer to chose a good execution plan:

SQL> EXPLAIN PLAN FOR
select a.ID, b.ID_CHAR from test1 a join test2 b on (a.ID=b.ID) where a.ID between 4 and 7;

Explained.

SQL> @$ORACLE_HOME/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3677731951

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 4 | 60 | 4 (0) | 00:00:01 |
| 1 | NESTED LOOPS | | 4 | 60 | 4 (0) | 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST2 | 5 | 55 | 4 (0) | 00:00:01 |
|* 3 | INDEX RANGE SCAN | SYS_C004858 | 5 | | 3 (0) | 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | SYS_C004857 | 1 | 4 | 0 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("B"."ID">=4 AND "B"."ID"<=7)
4 - access("A"."ID"="B"."ID")
filter("A"."ID"<=7 AND "A"."ID">=4)

18 rows selected.

SQL> delete from TEST1;

9 rows deleted.

SQL> declare
mynum number := 1;
begin
while mynum < 10000 loop
insert into test1 values(mynum,to_char(mynum));
mynum := mynum + 1;
end loop;
end;
/

PL/SQL procedure successfully completed.

SQL> commit;

Commit complete.

SQL> EXPLAIN PLAN FOR
select a.ID, b.ID_CHAR from test1 a join test2 b on (a.ID=b.ID) where a.ID between 4000 and 7000;

Explained.

SQL> @$ORACLE_HOME/rdbms/admin/utlxpls.sql

PLAN_TABLE_OUTPUT
----------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3677731951

--------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
--------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 3001 | 45015 | 17 (12) | 00:00:01 |
| 1 | NESTED LOOPS | | 3001 | 45015 | 17 (12) | 00:00:01 |
| 2 | TABLE ACCESS BY INDEX ROWID| TEST2 | 3002 | 33022 | 15 (0) | 00:00:01 |
|* 3 | INDEX RANGE SCAN | SYS_C004858 | 3002 | | 8 (0) | 00:00:01 |
|* 4 | INDEX UNIQUE SCAN | SYS_C004857 | 1 | 4 | 0 (0) | 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

3 - access("B"."ID">=4000 AND "B"."ID"<=7000)
4 - access("A"."ID"="B"."ID")
filter("A"."ID">=4000 AND "A"."ID"<=7000)

18 rows selected.

As you can see, this time the execution plan was the same for ten or ten thousand records in test1 table and the number of rows managed in the execution plans were more or less accurate.

By the way, if you want to unlock the statistics of a table you can do it this way:

SQL> exec DBMS_STATS.UNLOCK_TABLE_STATS ('MYUSER', 'TEST1');

PL/SQL procedure successfully completed.

SQL> BEGIN DBMS_STATS.GATHER_TABLE_STATS( OWNNAME => 'MYUSER', TABNAME => 'TEST1'); END;
2 /

PL/SQL procedure successfully completed.


More information:

Locking or Unlocking Statistics
Preserving Statistics using DBMS_STATS.LOCK_TABLE_STATS

No comments:

Post a Comment