Myths on bitmap indexes
Although bitmap indexes are already available in Oracle 8i or even before (some sources even say 7.3!!!), they are still surrounded by many myths. Time for some clearing up. And believe it or not, every single myth mentioned here I heard many times in several projects.
First, I would like to point out that these are my personal experiences on specific projects and that they everyone’s experience with what is said ‘in the field’ may be different. It possibly is. And that brings me to the second point:
Secondly, in no way I claim to hold all the wisdom on this subject. I noticed that there are a lot of half-true opinions and ‘heard-of’ stories about bitmap index use (and other Oracle features, in fact). I will not pretend to tell the definitive story on Bitmap Indexes. However, my personal experiences can be useful to trigger you to do some investigation
in your specific environment as well; who knows what unexpected pleasure you
can derive from using Bitmap Indexes. At the very least my findings can stir up
some discussion with others bringing their experiences to the table.
Some people (one of them being Jonathan Lewis) pointed out that I was not entirely correct (or thorough) in my statements about bitmap indexes. He is surely right about that. So I took the liberty of rewriting part of this blog. I’m sure that it still doesn’t cover every case, so be warned. The only thing I ‘m trying to say is: check everything you read (including everything I’ve written here) or hear, on the internet, in classroom training events or during conference presentations . Test it in your own specific case, in your application on your database version. There is no general rule of thumbs. Do not make assumptions, especially not if they are based on bold statements written down on the internet.
Myth 1 – Bitmap indexes take up a lot more space than B-Tree indexes.
mydb> create table bm_test( datum date, ind1 varchar2(1));<br />Tabel is aangemaakt.
mydb> desc bm_test<br /> Naam Null? Type<br /> ----------------------------------------- -------- ----------------------------
DATUM DATE<br /> IND1 VARCHAR2(1)
mydb> create index btr1 on bm_test( ind1 );
Index is aangemaakt.
mydb> select segment_name, bytes, blocks, extents<br /> 2 from user_segments where segment_type = 'INDEX';
SEGMENT_NAME BYTES BLOCKS EXTENTS<br />--------------------------------- ---------- ----------<br />BTR1 1228800 150 30
mydb> drop index btr1;
Index is verwijderd.<br />
mydb> create bitmap index bmi1 on bm_test( ind1 );
Index is aangemaakt.
mydb> select segment_name, bytes, blocks, extents<br /> 2 from user_segments where segment_type = 'INDEX';
SEGMENT_NAME BYTES BLOCKS EXTENTS<br /><br />-------------------------- ---------- ----------<br />BMI1 40960 5 1
1 rij is geselecteerd.
There, 41 kB for the bitmap index compared to 1.2 MB for a B-tree, or 5 blocks vs 150 blocks. Keep in mind that the space occupied by bm indexes is related to many factors. In your environment, results may be entirely different.
Myth 2 – Bitmap indexes are only suitable for data warehouses
I’m still not sure what is the logic behind this thesis, because it is usually not followed by any explanation. I can imagine that for example usage flags are often queried and have extreme poor performance with b-tree indexes. Check the explain plans:
use of btree index on column with values ‘J’ and ‘N’
Execution plan<br />----------------------------------------------------------<br /> 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=19 Card=13415 Bytes=<br /> 134150)
1 0 TABLE ACCESS (FULL) OF 'BM_TEST' (TABLE) (Cost=19 Card=134<br /> 15 Bytes=134150)
Statistics<br />----------------------------------------------------------<br /> 1 recursive calls<br /> 0 db block gets<br /> 65 consistent gets<br /> 59 physical reads<br /> 0 redo size<br /> 574 bytes sent via SQL*Net to client<br /> 283 bytes received via SQL*Net from client<br /> 3 SQL*Net roundtrips to/from client<br /> 0 sorts (memory)<br /> 0 sorts (disk)<br /> 22 rows processed
use of bitmap index on same column
Execution plan<br />----------------------------------------------------------<br /> 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=24 Bytes=240)<br /> 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BM_TEST' (TABLE) (Cost=5<br /> Card=24 Bytes=240)
2 1 BITMAP CONVERSION (TO ROWIDS)<br /> 3 2 BITMAP INDEX (SINGLE VALUE) OF 'BMI1' (INDEX (BITMAP))
Statistics<br />----------------------------------------------------------<br /> 136 recursive calls<br /> 0 db block gets<br /> 26 consistent gets<br /> 3 physical reads<br /> 0 redo size<br /> 595 bytes sent via SQL*Net to client<br /> 283 bytes received via SQL*Net from client<br /> 3 SQL*Net roundtrips to/from client<br /> 6 sorts (memory)<br /> 0 sorts (disk)<br /> 22 rows processed
Myth 3 – Bitmap indexes have slow performance on DML actions
This might me one of the origins for Myth 2. I’ve tried some updates on those 80K records, but found no proof this is a real problem. I did some timing experiments (which are not the most trustworthy, I know!), and in some cases the update on a table with bitmap was even twice as fast as with a b-tree. To be honest, I’m not too surprised with the amount of blocks used in mind. Especially look at the difference in Redo size and recursive calls:
mydb> create index btr1 on bm_tst( ind1) ;
Index is aangemaakt.
mydb> update bm_test set ind1 ='X';
80487 rijen zijn bijgewerkt.
Verstreken: 00:00:19.03
Execution plan<br />----------------------------------------------------------<br /> 0 UPDATE STATEMENT Optimizer=CHOOSE (Cost=51 Card=80487 Bytes=<br /> 160974)
1 0 UPDATE OF 'BM_TEST'<br /> 2 1 TABLE ACCESS (FULL) OF 'BM_TEST' (TABLE) (Cost=51 Card=8<br /> 0487 Bytes=160974)
Statistics<br />----------------------------------------------------------<br /> 3190 recursive calls<br /> 410208 db block gets<br /> 1319 consistent gets<br /> 365 physical reads<br /> 57035576 redo size<br /> 475 bytes sent via SQL*Net to client<br /> 445 bytes received via SQL*Net from client<br /> 4 SQL*Net roundtrips to/from client<br /> 42 sorts (memory)<br /> 0 sorts (disk)<br /> 80487 rows processed
<br />mydb> drop index btr1;
Index is verwijderd.
mydb> create bitmap index bmi1 on bm_test( ind1);
Index is aangemaakt.
mydb> update bm_test set ind1 ='Y';
80487 rijen zijn bijgewerkt.
Verstreken: 00:00:11.18
Execution plan<br />----------------------------------------------------------<br /> 0 UPDATE STATEMENT Optimizer=CHOOSE (Cost=3 Card=80487 Bytes=1<br /> 60974)
1 0 UPDATE OF 'BM_TEST'<br /> 2 1 TABLE ACCESS (BY INDEX ROWID) OF 'BM_TEST' (TABLE) (Cost<br /> =3 Card=80487 Bytes=160974)
3 2 BITMAP CONVERSION (TO ROWIDS)<br /> 4 3 BITMAP INDEX (FULL SCAN) OF 'BMI1' (INDEX (BITMAP))
Statistics<br />----------------------------------------------------------<br /> 50 recursive calls<br /> 88282 db block gets<br /> 212 consistent gets<br /> 965 physical reads<br /> 20102176 redo size<br /> 474 bytes sent via SQL*Net to client<br /> 445 bytes received via SQL*Net from client<br /> 4 SQL*Net roundtrips to/from client<br /> 2 sorts (memory)<br /> 1 sorts (disk)<br /> 80487 rows processed
There is a big catch here though, and I think that is exactly where the thesis ‘bm-indexes are slow’ comes from. When you do an update on a bitmap index, all the records in your table which have the same value will be locked untill you commit or rollback. Mind you, only the records that have the same key value. So not the entire table as is sometimes assumed. Also, this will occur only when you update the key value column, not when you update another column in the table.
What does all this say? Again, not much more than that it is crucial to know what your application does to (or with) your data, and what the and benefits and risks of every type of index is (or broader, every feature in the Oracle database). Is there a lot of DML on these low-cardinality columns? Orderstatus fields might be changed often, gender fields probably not (unless you’re writing software for a gender-change clinic). This is something few people can accurately predict beforehand.
Conclusion: test, test, test, do not make assumptions.
For those who are really advanturous, I have been working on something called FastBit. It is available for free under LGPL license. You may download the source code at http://sdm.lbl.gov/fastbit/src/. There are quite a bit of research work behind this software. In 2001, we found it to be about 12 times faster than the best commercially available compressed bitmap indexes, see http://crd.lbl.gov/%7Ekewu/ps/LBNL-48975.html and http://crd.lbl.gov/%7Ekewu/ps/LBNL-49627.html. There are also extensive theoretical analysis of the method as well, see http://crd.lbl.gov/%7Ekewu/ps/LBNL-49626.html. You can also see a small number of applications at http://sdm.lbl.gov/fastbit/applications.html.