Saturday, August 19, 2006

Myth #3

I know I am going about working on the myth's my colleagues came up with in a strange order. I was working on myth #3 and was once again stumped.

No matter what I did, it appeared that using a varchar date field or a function based index field was preferable to a normal B*tree index on a normal heap table.

For my testing I had come up with the following test:



SQL> CREATE TABLE TABLE1 (PK NUMBER PRIMARY KEY NOT NULL
,DATA1 VARCHAR2(30) ,DATA2 VARCHAR2(30) 
,DATA3 DATE,DATA4 NUMBER
,FAKEDATE VARCHAR2(20));

Table created.

SQL> DROP SEQUENCE TABLE1_PK_SEQ;

Sequence dropped.

SQL> CREATE SEQUENCE TABLE1_PK_SEQ CACHE 500;

Sequence created.

SQL> INSERT INTO TABLE1 SELECT TABLE1_PK_SEQ.NEXTVAL
  2    ,DBMS_RANDOM.STRING('A',30),DBMS_RANDOM.STRING('A',30)
  3    ,SYSDATE - DBMS_RANDOM.VALUE(1,365) ,TRUNC(DBMS_RANDOM.VALUE(1,10000)),NULL
  4    FROM DUAL CONNECT BY LEVEL <=  50000;

50000 rows created.

SQL> COMMIT;

Commit complete.

SQL> UPDATE TABLE1 SET FAKEDATE=TO_CHAR(DATA3,'DD-MON-YYYY');

50000 rows updated.

SQL> COMMIT;

Commit complete.

SQL> CREATE INDEX DATEIDX ON TABLE1 (DATA3);

Index created.

SQL> CREATE INDEX FAKEDATEIDX ON TABLE1 (FAKEDATE);

Index created.

SQL> CREATE INDEX FBIIDX ON TABLE1 (TO_CHAR(DATA3,'DD-MON-YYYY'));

Index created.

SQL> BEGIN
  2  DBMS_STATS.GATHER_TABLE_STATS(USER,'TABLE1',CASCADE =>TRUE);
  3  END;
  4  /

PL/SQL procedure successfully completed.





The results of the queries were unsettling to say the least. I have edited the following to try and shorten up the post a little.
I will put a link to the full output files at the end of the post.


SQL> SELECT * FROM TABLE1 WHERE DATA3 >= TO_DATE('24-JUN-2006 00:00:00','DD-MON-YYYY HH24:MI:SS')
  2    AND DATA3 <= TO_DATE('24-JUN-2006 23:59:59','DD-MON-YYYY HH24:MI:SS');

----------------------------------------------------------------------------
| Id  | Operation         | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |        |   139 | 12649 |   141   (3)| 00:00:02 |
|*  1 |  TABLE ACCESS FULL| TABLE1 |   139 | 12649 |   141   (3)| 00:00:02 |
----------------------------------------------------------------------------


SQL> SELECT /*+ INDEX(TABLE1 DATEIDX) */ * FROM TABLE1 WHERE DATA3 >= TO_DATE('24-JUN-2006 00:00:00','DD-MON-YYYY HH24:MI:SS')
  2    AND DATA3 <= TO_DATE('24-JUN-2006 23:59:59','DD-MON-YYYY HH24:MI:SS');
---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |   139 | 12649 |   142   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TABLE1  |   139 | 12649 |   142   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | DATEIDX |   139 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------


SQL> SELECT * FROM TABLE1 WHERE FAKEDATE='24-JUN-2006';
-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |   137 | 12467 |   123   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TABLE1      |   137 | 12467 |   123   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | FAKEDATEIDX |   137 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------


SQL> SELECT * FROM TABLE1 WHERE TO_CHAR(DATA3,'DD-MON-YYYY')='24-JUN-2006';
--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   137 | 12467 |   123   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TABLE1 |   137 | 12467 |   123   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | FBIIDX |   137 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------




I scratched my head, thought about this, researched on the web a little bit. Even considered submitting a question to asktom. Then I thought. Data sizes, that has to be it. So I made the table bigger, 3,000,000 rows to be exact but I came up with the same access paths. Well I thought, they won this one... but it doesn't make sense. Why isn't oracle considering a index. I checked for bugs, I couldn't find any. Then I went back to data sizes. I decided to create a "wider" table and try it again. Now we are talking.


SQL> CREATE TABLE TABLE1 (PK NUMBER PRIMARY KEY NOT NULL
  2  ,DATA1 VARCHAR2(30)
  3  ,DATA2 VARCHAR2(30)
  4  ,DATA3 DATE
  5  ,DATA4 NUMBER
  6  ,FAKEDATE VARCHAR2(20)
  7  ,DATA5 VARCHAR2(30)
  8  ,DATA6 VARCHAR2(30)
  9  ,DATA7 VARCHAR2(30)
 10  ,DATA8 VARCHAR2(30)
 11  );

Table created.

SQL> DROP SEQUENCE TABLE1_PK_SEQ;

Sequence dropped.

SQL> CREATE SEQUENCE TABLE1_PK_SEQ CACHE 500;

Sequence created.

SQL> INSERT INTO TABLE1 SELECT TABLE1_PK_SEQ.NEXTVAL,DBMS_RANDOM.STRING('A',30),DBMS_RANDOM.STRING('A',30)
  2    ,SYSDATE - DBMS_RANDOM.VALUE(1,365) ,TRUNC(DBMS_RANDOM.VALUE(1,10000)),NULL,DBMS_RANDOM.STRING('A',30)
  3    ,DBMS_RANDOM.STRING('A',30),DBMS_RANDOM.STRING('A',30),DBMS_RANDOM.STRING('A',30)
  4  FROM DUAL CONNECT BY LEVEL <=  50000;

50000 rows created.

SQL> COMMIT;

Commit complete.

SQL> UPDATE TABLE1 SET FAKEDATE=TO_CHAR(DATA3,'DD-MON-YYYY');

50000 rows updated.

SQL> COMMIT;

Commit complete.

SQL> CREATE INDEX DATEIDX ON TABLE1 (DATA3);

Index created.

SQL> CREATE INDEX FAKEDATEIDX ON TABLE1 (FAKEDATE);

Index created.

SQL> CREATE INDEX FBIIDX ON TABLE1 (TO_CHAR(DATA3,'DD-MON-YYYY'));

Index created.

SQL> BEGIN
  2  DBMS_STATS.GATHER_TABLE_STATS(USER,'TABLE1',CASCADE =>TRUE);
  3  END;
  4  /

PL/SQL procedure successfully completed.

The access paths now look more like what I was thinking they would be:



SQL> SELECT * FROM TABLE1 WHERE DATA3 >= TO_DATE('24-JUN-2006 00:00:00','DD-MON-YYYY HH24:MI:SS')
  2    AND DATA3 <= TO_DATE('24-JUN-2006 23:59:59','DD-MON-YYYY HH24:MI:SS');

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |   139 | 29885 |   142   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TABLE1  |   139 | 29885 |   142   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | DATEIDX |   139 |       |     2   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

SQL> SELECT * FROM TABLE1 WHERE FAKEDATE='24-JUN-2006';

-------------------------------------------------------------------------------------------
| Id  | Operation                   | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |             |   137 | 29455 |   132   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TABLE1      |   137 | 29455 |   132   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | FAKEDATEIDX |   137 |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

SQL> SELECT * FROM TABLE1 WHERE TO_CHAR(DATA3,'DD-MON-YYYY')='24-JUN-2006';

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |   137 | 29455 |   132   (0)| 00:00:02 |
|   1 |  TABLE ACCESS BY INDEX ROWID| TABLE1 |   137 | 29455 |   132   (0)| 00:00:02 |
|*  2 |   INDEX RANGE SCAN          | FBIIDX |   137 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------



Apparently the fake varchar2 column or the FBI is more efficient to pick out individual days with. I am going to expand this test to do entire months, just a small period of time, a year to see if a single normal b*tree index on a date is the best solution when you are not targeting such a specific predicate.

So now a question to you readers, did I do anything wrong?. Can you point me in the direction of some documents on indexing dates that I must have missed or slept through that day in class?

The links to the "skinny table" and the "wide table" complete tests and logfiles.

4 comments:

SydOracle said...

Interesting that you focused on single date lookups. I think problems are more likely to occur on something like a BETWEEN '31-JAN-2006' and '1-FEB-2006' or BETWEEN '01-JAN-2005' AND '1-JAN-2006', where alphabetic column and histogram high/low values are likely to through the optimizer seriously off track.

Anonymous said...

hey i need some help from you

if so plz mail jaleel.f.mohamed@gmail.com

Replica Watches said...

And as a balenciaga, to a many, he could try the designer. Frogeye sprite replica Diamond agreed of watches or her uk said, and polevoi free her intentions and put his name without the corporals nothing. Designer jewelry replica wholesale Chanel was to think ladies. Oakley dartboard replica that it push the launch, and so short goes her admittance. Designer handbag purse replica The timeless radio made still and was their watches. Nautical watches Interchangeable watches, smiled watch. Tateossian watches The tag felt the heuer in gold of his watches croak and heard of the hero into those grease. Anonimo watches Zeno asked to speak. Perpetual Watches..

Nikola said...

315 entries in Webster̢۪s 1996 dictionary were misspelled.payday loan