Although bitmap indexes are already available in Oracle 8i or even before (some sources even say 7.3!!!), they are still surrounded by many myths. Time for some clearing up. And believe it or not, every single myth mentioned here I heard many times in several projects.
First, I would like to point out that these are my personal experiences on specific projects and that they everyone’s experience with what is said ‘in the field’ may be different. It possibly is. And that brings me to the second point:
Secondly, in no way I claim to hold all the wisdom on this subject. I noticed that there are a lot of half-true opinions and ‘heard-of’ stories about bitmap index use (and other Oracle features, in fact). I will not pretend to tell the definitive story on Bitmap Indexes. However, my personal experiences can be useful to trigger you to do some investigation
in your specific environment as well; who knows what unexpected pleasure you
can derive from using Bitmap Indexes. At the very least my findings can stir up
some discussion with others bringing their experiences to the table.
Some people (one of them being Jonathan Lewis) pointed out that I was not entirely correct (or thorough) in my statements about bitmap indexes. He is surely right about that. So I took the liberty of rewriting part of this blog. I’m sure that it still doesn’t cover every case, so be warned. The only thing I ‘m trying to say is: check everything you read (including everything I’ve written here) or hear, on the internet, in classroom training events or during conference presentations . Test it in your own specific case, in your application on your database version. There is no general rule of thumbs. Do not make assumptions, especially not if they are based on bold statements written down on the internet.
Myth 1 – Bitmap indexes take up a lot more space than B-Tree indexes.
mydb> create table bm_test( datum date, ind1 varchar2(1)); Tabel is aangemaakt.
mydb> desc bm_test Naam Null? Type ----------------------------------------- -------- ----------------------------
DATUM DATE IND1 VARCHAR2(1)
mydb> create index btr1 on bm_test( ind1 );
Index is aangemaakt.
mydb> select segment_name, bytes, blocks, extents 2 from user_segments where segment_type = 'INDEX';
SEGMENT_NAME BYTES BLOCKS EXTENTS --------------------------------- ---------- ---------- BTR1 1228800 150 30
mydb> drop index btr1;
Index is verwijderd.
mydb> create bitmap index bmi1 on bm_test( ind1 );
Index is aangemaakt.
mydb> select segment_name, bytes, blocks, extents 2 from user_segments where segment_type = 'INDEX';
SEGMENT_NAME BYTES BLOCKS EXTENTS -------------------------- ---------- ---------- BMI1 40960 5 1
1 rij is geselecteerd.
There, 41 kB for the bitmap index compared to 1.2 MB for a B-tree, or 5 blocks vs 150 blocks. Keep in mind that the space occupied by bm indexes is related to many factors. In your environment, results may be entirely different.
Myth 2 – Bitmap indexes are only suitable for data warehouses
I’m still not sure what is the logic behind this thesis, because it is usually not followed by any explanation. I can imagine that for example usage flags are often queried and have extreme poor performance with b-tree indexes. Check the explain plans:
use of btree index on column with values ‘J’ and ‘N’
Execution plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=19 Card=13415 Bytes= 134150)
1 0 TABLE ACCESS (FULL) OF 'BM_TEST' (TABLE) (Cost=19 Card=134 15 Bytes=134150)
Statistics ---------------------------------------------------------- 1 recursive calls 0 db block gets 65 consistent gets 59 physical reads 0 redo size 574 bytes sent via SQL*Net to client 283 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 0 sorts (memory) 0 sorts (disk) 22 rows processed
use of bitmap index on same column
Execution plan ---------------------------------------------------------- 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=5 Card=24 Bytes=240) 1 0 TABLE ACCESS (BY INDEX ROWID) OF 'BM_TEST' (TABLE) (Cost=5 Card=24 Bytes=240)
2 1 BITMAP CONVERSION (TO ROWIDS) 3 2 BITMAP INDEX (SINGLE VALUE) OF 'BMI1' (INDEX (BITMAP))
Statistics ---------------------------------------------------------- 136 recursive calls 0 db block gets 26 consistent gets 3 physical reads 0 redo size 595 bytes sent via SQL*Net to client 283 bytes received via SQL*Net from client 3 SQL*Net roundtrips to/from client 6 sorts (memory) 0 sorts (disk) 22 rows processed
Myth 3 – Bitmap indexes have slow performance on DML actions
This might me one of the origins for Myth 2. I’ve tried some updates on those 80K records, but found no proof this is a real problem. I did some timing experiments (which are not the most trustworthy, I know!), and in some cases the update on a table with bitmap was even twice as fast as with a b-tree. To be honest, I’m not too surprised with the amount of blocks used in mind. Especially look at the difference in Redo size and recursive calls:
mydb> create index btr1 on bm_tst( ind1) ;
Index is aangemaakt.
mydb> update bm_test set ind1 ='X';
80487 rijen zijn bijgewerkt.
Verstreken: 00:00:19.03
Execution plan ---------------------------------------------------------- 0 UPDATE STATEMENT Optimizer=CHOOSE (Cost=51 Card=80487 Bytes= 160974)
1 0 UPDATE OF 'BM_TEST' 2 1 TABLE ACCESS (FULL) OF 'BM_TEST' (TABLE) (Cost=51 Card=8 0487 Bytes=160974)
Statistics ---------------------------------------------------------- 3190 recursive calls 410208 db block gets 1319 consistent gets 365 physical reads 57035576 redo size 475 bytes sent via SQL*Net to client 445 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 42 sorts (memory) 0 sorts (disk) 80487 rows processed
mydb> drop index btr1;
Index is verwijderd.
mydb> create bitmap index bmi1 on bm_test( ind1);
Index is aangemaakt.
mydb> update bm_test set ind1 ='Y';
80487 rijen zijn bijgewerkt.
Verstreken: 00:00:11.18
Execution plan ---------------------------------------------------------- 0 UPDATE STATEMENT Optimizer=CHOOSE (Cost=3 Card=80487 Bytes=1 60974)
1 0 UPDATE OF 'BM_TEST' 2 1 TABLE ACCESS (BY INDEX ROWID) OF 'BM_TEST' (TABLE) (Cost =3 Card=80487 Bytes=160974)
3 2 BITMAP CONVERSION (TO ROWIDS) 4 3 BITMAP INDEX (FULL SCAN) OF 'BMI1' (INDEX (BITMAP))
Statistics ---------------------------------------------------------- 50 recursive calls 88282 db block gets 212 consistent gets 965 physical reads 20102176 redo size 474 bytes sent via SQL*Net to client 445 bytes received via SQL*Net from client 4 SQL*Net roundtrips to/from client 2 sorts (memory) 1 sorts (disk) 80487 rows processed
There is a big catch here though, and I think that is exactly where the thesis ‘bm-indexes are slow’ comes from. When you do an update on a bitmap index, all the records in your table which have the same value will be locked untill you commit or rollback. Mind you, only the records that have the same key value. So not the entire table as is sometimes assumed. Also, this will occur only when you update the key value column, not when you update another column in the table.
What does all this say? Again, not much more than that it is crucial to know what your application does to (or with) your data, and what the and benefits and risks of every type of index is (or broader, every feature in the Oracle database). Is there a lot of DML on these low-cardinality columns? Orderstatus fields might be changed often, gender fields probably not (unless you’re writing software for a gender-change clinic). This is something few people can accurately predict beforehand.
Conclusion: test, test, test, do not make assumptions.
For those who are really advanturous, I have been working on something called FastBit. It is available for free under LGPL license. You may download the source code at http://sdm.lbl.gov/fastbit/src/. There are quite a bit of research work behind this software. In 2001, we found it to be about 12 times faster than the best commercially available compressed bitmap indexes, see http://crd.lbl.gov/%7Ekewu/ps/LBNL-48975.html and http://crd.lbl.gov/%7Ekewu/ps/LBNL-49627.html. There are also extensive theoretical analysis of the method as well, see http://crd.lbl.gov/%7Ekewu/ps/LBNL-49626.html. You can also see a small number of applications at http://sdm.lbl.gov/fastbit/applications.html.
I have a data warehouse, which is bulk loaded on a nightly basis. When doing the INSERTs into a table with multiple BITMAP indexes, performance slows to a crawl. However, disabling and then re-creating all of the bitmap indexes would take even longer than waiting for this slow INSERT performance. Any hints/ideas here? Thanks from Robert H. Goretsky of Hoboken, NJ
There are a lot of interest ideas in the academic research publications
, does anyone have any idea where oracle is going to adopt some of them?
After I encountered a bitmap index of BLEVEL 12 (see http://asktom.oracle.com/pls/ask/f?p=4950:8:::::F4950_P8_DISPLAYID:1208435961077#16907176305528) I am more care full with usage of BITMAP INDEX on dynamic data.
This was in 8i. But things are changing in Oracle sometimes more quickly that the Myths can react…
Jaromir D.B. Nemec
I don’t think I’ve ever heard any of these "myths" expressed in this way. However, taking them in order: 1) I have had several requests for help regarding the way in which bitmap indexes grow dramatically in size following a series of small changes – especially when commits are intermittent. Bitmap indexes BECOME large when subject to OLTP-like activity. 2) Bitmap indexes are often highly appropriate for data warehouses. Bitmap indexes are usually highly inappropriate for OLTP systems unless they are indexing read-only tables – because of the concurrency effects, and the way in which bitmap indexes can grow dramatically because during updates. 3) DML on tables with bitmap indexes is often subject to concurrency problems, and can easily run into deadlock problems because of the structure of the index entries. This can result in VERY slow processing. There are very good reasons why you should not use bitmap indexes in OLTP systems, unless the tables you index are read-only (or nearly read-only). Might I suggest that you consider taking this blog entry down, doing some more tests, and then rewriting it. Regards Jonathan Lewis
Due to the nature of bitmap indexes (the rowid mut be calculated) do we not run into heavy CPU Usage instead of I/O Usage having a lot of DML?
Greetings
Karl
Laurent: ah, that explains it. The case is a bit more specific than I thought. So when you insert in another session a value which is “locked” in the first session, the transaction waits for the other to complete. This of course causes massive waits which causes performance degradation. Interesting!
did you try my test on 10.2.0.3?
Session 1:
SQL> create table t(name varchar2(20), gender varchar2(10));
Table created.
SQL> create bitmap index i on t(gender);
Index created.
SQL> insert into t values (‘Marc’, ‘male’);
1 row created.
Session 2:
SQL> insert into t values (‘Mary’,’female’);
1 row created.
SQL> insert into t values (‘Jack’,’male’);
As you see, the fact that it is not convenient for OLTP is more than a myth!
Regards and thanks for starting the debate 😛
Edwin: I performed your test on Oracle 10.2.0.2.0 and in that version it’s not an issue anymore, I can’t reproduce the . So that proves there’s still myths!
I do not claim that everything in my blog is true, or that I did cover every possible case. On the contrary! What I do want to say is: do not dismiss any “new” feature because it doesn’t work in your particular situation or database version. Test what works best in every situation and in every new version of the database. See for yourself, test and try, and choose the best alternative.
Many examples I see here are about Oracle 8.1.7 or even 8.0.3. It is not said that those problems are fixed in newer versions, but they probably are.
I think Stewart words it very well: it are not so much myths as it are half-truths. Many of those half-truths exist and it’s up to us to find out what is true and what not. Every time again. And when I read all the comments (thank you all again!) I see that there’s still a lot more to find out for me on this subject. 🙂
We had a look at using bitmap indexes in an OLTP application, and it was a very poor fit and we abandoned it.
Firstly we hit “Myth 1” with the size of the bitmap indexes expanding MASSIVELY when lots of single row inserts/updates/deletes (or small number of rows) were done for the indexed columns. These single/few row changes are the bread-and-butter of OLTP databases.
Secondly, “Myth 2”. Its not so much the nature of the data, but the nature of the data processing. Data warehouse queries expect to deal with hundreds/thousands and even millions of rows (eg the orders submitted on a given day). OLTP queries expect to deal with single/several/dozens of rows (eg the items on a specific order). Bitmap indexes are used on RELATIVELY low cardinality columns, ie ones that may identify thousands of records out of millions, not ones that identify dozens of records out of millions.
Now many ‘OLTP’ databases will have some queries/reports that do deal with thousands of rows, and bitmap indexes may be appropriate there. However bitmaps aren’t really appropriate for the ‘bread-and-butter’ OLTP queries.
Thirdly, “Myth 3”. Most of the ‘poor performance’ comes from locking/concurrency issues and from the IO associated with Myth 1.
Function based indexes are a totally different issue and they are definately under represented.
William: what the explain plan in the second example proves, is that bitmap indexes can be used for low-cardinality (and b-trees absolutely not), so that there is indeed a use for bitmap indexes in every type of database. That sounds like an open door, but I still see a lot of systems where not a single bitmap or function-based index is used, just because of vague and half-understood reasons and assumptions like the ones I mentioned. Good point about the use of combined multiple bitmap-indexes, I’ve never tried that before to be honest.
Anyway, the discussion is flaring up and that’s a good thing to start with. We can all learn from that!
Seems to me that a OLTP systeem has a lot of concurrent users vs a Datawarehouse systeem with a small (one?) number of users.
Your tests are one user only.
Do this:
create table edwin ( kol number(1) );
create bitmap index ed_1 on edwin (kol);
insert into edwin values (1);
insert into edwin values (2);
insert into edwin values (3);
commit;
update edwin set kol=3 where kol = 1; –> do not commit;
Open another session en try
update edwin set kol=4 where kol = 2;
The second session will be locked out by the first session.
(Bitmap locks multiple records with an update on one record).
That’s for me reason enough to be very carefull with the use of
bitmaps in OLTP environments.
(tested on 8.1.7.4.1)
Another myth : bitmap is only good for gender and yes/no columns. You can have a good bitmap index on a column with 100’000 different values, in case the table has 100’000’000 rows.
I almost never saw a function based bitmap index nor a join bitmap index, but they could be useful too.
Again about myth 1, a bitmap index is usually much smaller than a btree, the example I gave is just a special case
Apologies:
> However #2 as states is about their unsuitability for OLTP in terms “the nature of the dataâ€
should have been
> However #2 is about their unsuitability for OLTP in terms of “the nature of the dataâ€
In your Myth 2 test, is that a comparison of a bitmap index acceess vs a full table scan? I’m not sure what that proves, other than that indexes can improve performance. Perhaps if a bitmap index were faster than a btree index it might support an argument that bitmap indexes are underrated. However #2 as states is about their unsuitability for OLTP in terms “the nature of the data”, and I’m not sure what test it would take to disprove that.
The myth I hear most often and don’t see listed above is “a column with a small number of distinct values is a good candidate for a bitmap index”, which is only half true. I think multiple low-cardinality columns that are queried in combination may be candidates for multiple bitmap indexes, but one on its own is not. Of course that could be a myth too 😉
On “Myth 3 – Bitmap indexes have slow performance on DML actions”; I think you need to create a stress to simulate an OLTP environment. Updates to bitmapped columns, and general insertion/deletion of data can cause serious lock contention problems and can degrade the quality of the indexes quite dramatically.
Concurrency is very important in OLTP environments. So I believe his subject can not be explained by a responce time analysis.
On “Myth 1 – Bitmap indexes take up a lot more space than B-Tree indexes.”; The size of the bitmap index varies dramatically with the distribution of the data.
Check these resources please for further details;
Understanding Bitmap indexes
http://www.jlcomp.demon.co.uk/03_bitmap_1i.html
http://jonathanlewis.wordpress.com/2006/11/29/bitmap-indexes/
Oracle® Database Data Warehousing Guide 10g Release 2 (10.2)
http://download-uk.oracle.com/docs/cd/B19306_01/server.102/b14223/indexes.htm#i1004902
Best regards,
Tonguc
Laurent: wasn’t sure about your point… I think you were agreeing with me. That was the point I (hopefully) was making: that DML statements affect the size of bitmaps, but inherently they are smaller than b-trees.
about myth 1, please have a look at Note 161113.1
There are a lot of myths about bitmap indexes… but these are not the ones I would focus on. The real myth is that they are only good for low-cardinality columns… which most people translate to mean “Male/Female” or “Y/N”. They are the best choice for low cardinality, but they also outperform b-trees in a lot of not-so-low cardinality situations when the amount of data is very high. They would never be very useful for extremely high cardinality situations, though. Also, bitmaps are a requirement for certain data warehouse features such as star transformation.
The “myth” about the size has a relationship to the “myth” about DML… because they are not really myths… but instead half-truths. Bitmaps grow disproportinately large when compared to b-trees when DML is executed against them. Because of the structure of the bitmap, insert/updates cause them to continue to grow with no limit, where b-trees tend to find a size/fit that maintains relatively constant. Also, as DML is performed against a bitmap, their selectivity (and performance) begins to drop. I would imagine that the bitmap index that is 100X larger (described above by Laurent) would not be so if it were dropped and recreated. Otherwise, I would be curious to see the test case, because that seems very unlikely.
Although Oracle has made strides in 10g to alleviate the DML issue (and succeeded, to a certain degree), because of their very nature, DML will always need to be avoided with bitmaps unless rebuilds are performed often. Again, this is because extreme DML against a bitmap will eventually render them useless.
In data warehouse environments, I always mark bitmaps unusable prior to the load. This is not necessarily because of the performance issues with loading (in 10g, but in 9i it was,) but more because they are less useful (and indeed much larger) after the DML is performed.
Don’t take my word for it… a simple test will tell you whether they are applicable for your situation. But be careful to include all the parameters. Don’t simple investigate their size after creation… make sure you investigate the size in a real-world situation, where updates/inserts are going on (if the do). Also, consider their performance (cost but also actual execution) prior to DML and after. I am certain you will see the difference.
The BIG problem with bitmap+oltp: if you have a table T(NAME,GENDER) and you do an INSERT(MARC,MALE), it will prevent another session from doing INSERT(LAURENT,MALE)
about myth 1, I have it once a bitmap index on an apparently appropriate column where the bitmap was 100x bigger than the table, but could not explain it 🙁
I think with the help of dbms_job you may do a load test for the update on the table with bitmap index, and tail the alert.log 🙂
With OLTP systems concurrency is very important.
Best regards.
note: in your “Myth 1” part both indexes are shown to be created b-tree.
Some questions:
1. What version is this? Bitmap index improved quite a bit in 10g over 9i
2. Try concurrent updates on bitmap indexed columns, not just single user. You mentioned OLTP, and multi-user concurrent single-row updates are the norms for OLTP.