Thursday, August 10, 2006

I am in for it now

I have been having, "heated discussions" with a few (male, read on) OCP's from a subsidiary company. Arguing over some Oracle B*tree index myths and other things they "know" about oracle. These fellows are 10g OCP, but all straight out of university and very gung ho. ADF over oracle, .NET over oracle (shudder) and no need for any business logic in the database. Very smart and knowledgeable fellows who are an asset to the company. I just keep slamming them on their database interaction work as I do the performance monitoring on their databases and they are a little sick and tired of ticket after ticket after ticket coming their way of "Bad SQL" or "Inefficient SQL" and the one that started this "Unused Indexes" and thought they would speak up. All very friendly of course.

We came to a agreement that I would put together a nice document blowing away their index theories. I will work on the rest of their theories later. When I have successfully debunked their myths, they will wear skirts,bobby socks, a blouse and makeup to work for a period of no less than 4 straight hours covering at minimum of two of the following - Morning arrival,Morning coffee break, lunch, afternoon coffee break or 5pm departure. If I fail, I will have to wear the same outfit. We have manager approval and backing for this and odds are it will end up in the company newsletter. Not entirely fair as they have 9-12 people at their location. I have 200+ people in this office, but I think I have the legs to pull it off :).

I personally believe these to be myths, well one because Thomas Kyte says so, and two I did some testing and checked it out in the past.

The following are the 3 index myths we decided on, well statements I told them were wrong and they argued the point. They are going to put together tests to prove me wrong. In a event of a tie or unbreakable test cases, a simple toss of a coin will decide our fates. Best out of 3 wins.

Myth #1

You must use a index, full tablescans are always less efficient and slower.

Myth #2

If you index each individual column the optimizer will use multiple indexes in a query if the predicate has any combination of the columns, so build a index on every column.

Myth #3

You must build a calculated column to properly index a date field. So to store a date, the best way to do it is with a VARCHAR field to improve DML to the table.

Over the next few days, I will be putting together the tests and posting the results. Since we never spoke about the ability to look for outside corroboration or help, I thought I would do the tests, and then run them by my blog to see if any readers can find fault in them. Due date for the tests is the end of the month. Management didn't want us using too much company time for this childishness. I know I have 2 and 3 in the bag.. #1 is going to be hard to prove, I think in one of Tom's books he does something similar. I will be reading those chapters over again tonight.

Parameters are Oracle 10gR2 on windows (shudder) with the following somewhat abbreviated init.ora (some parameters removed):


I am probably going to regret this is some way. But WTF you only live once. I think my wife is still laughing, she offered to shave my legs for me. Evil, Evil lady :)

Stay tuned, I am going to spend some time on building these tests.


Simon Kelsey said...

I don't think you have anything to worry about. If nothing else it goes to prove that an OCP certification is no substitute for a good understanding of the workings of the database...

Myth 1 says that if there is an index, you should use it. Well what if you need to return every row from the table? Are your colleagues seriously suggesting that you should use the index to locate each row? How would this result in less i/o? It wouldn't, of course. That in itself should be all that is necessary to disprove Myth 1.

There is still a widely held "belief" that there is a magic 5% threshhold where if you are selecting more than 5% of the rows from a table it suddenly becomes more beneficial (in terms of i/o) to do a FTS rather than an index lookup. In reality, it depends on the length of the row and how the rows are distributed - the clustering factor. The higher the clustering factor, the more ordered the rows are in the index. In other words, entries in a given index leaf block will be likely to point to rows in the same data blocks.

So, in an index range scan, you walk the tree to locate the first leaf block, and then you go from leaf block to leaf block (along the linked list) getting all the rowids you need. The amount of i/o you will be doing therefore depends on the likelihood that as you go from one rowid to the next, that row is physically in the same table block as the one you have just read.

This means you can deliberately set up a situation where you have 2 identical tables each with an identical index, and fill one with ordered data and one with disorganized data. You can then see that when the data is disorganized, the use of the index results in more i/o than simply performing a full table scan. This is all you need to do to prove Myth 1 wrong.

There's a good example of exactly this in Tom Kyte's Expert One On One, on page 280.

All the best!

Herod T said...

I was only concerned with #1 because I couldn't remember where I had seen the debunking of it.

But I did find it in Tom's book last night so I am confident.

Myth #1 came out of a discussion about rules of thumb that if you are accessing a certain percentage of rows you must use a index.

The myth got percolated down to what I wrote.

I am pretty sure I can have all 3 of these written up and off to the soon to be public cross dressers before the weekend.

My manager left me a note that he has ordered 2 copies of Tom's newest book for the fellows.

I will have to make sure I don't plagiarize him exactly.

Gary Myers said...

I think your biggest 'threat' to myth 1 is full index scans. If all the required columns are in an index (and at least one is not null) then you'd probably get better performance from the index scan than the tablescan.
Of course that would mean that if you want all the columns, they would all need to be in the index in which case you maybe want a index-organised table.

Joel Garry said...

Foote's demo of the threshold myth. (And some other useful info.)

And there is a difference between plagiarism and scholarly discourse - attribution! :-)

Are you sure it's punishment for them to dress up in women's clothing? :-)

Herod T said...

Yes, I found out through the "grapevine" that they are rather concerned as they went ahead to prove me wrong and have ended up proving themselves wrong.

I will have my final stuff up on the blog before Monday.

Adi said...

Oes Tsetnoc one of the ways in which we can learn seo besides Mengembalikan Jati Diri Bangsa. By participating in the Oes Tsetnoc or Mengembalikan Jati Diri Bangsa we can improve our seo skills. To find more information about Oest Tsetnoc please visit my Oes Tsetnoc pages. And to find more information about Mengembalikan Jati Diri Bangsa please visit my Mengembalikan Jati Diri Bangsa pages. Thank you So much.
Oes Tsetnoc | Semangat Mengembalikan Jati Diri Bangsa