Oracle InMemory compared to indexing image56

Oracle InMemory compared to indexing

In August 2014 Oracle released its RDBM 12.1.0.2 with a potentially useful and exiting new option: Database InMemory. Upon reading about it it became clear to me that this is a powerful option, worth examining deeper.

This blog will briefly describe what InMemory is and what it isn’t. The emphasis however is on practical examples. I didn’t have a real database at hand for some ultimate real life experience, but a local database on VirtualBox proved to be enough to show some interesting details.

 

First of all, what is Database InMemory?

I’ll tell you what it’s not: it does not mean that the database is completely loaded into RAM, thereby avoiding a lot of physical disk access. That doesn’t even come close to describing it.

Then what is it?

Using the InMemory option, specific segments of the database, like tables, materialized views, tablespaces or partitions, can be loaded in a separate part of the SGA. And they are stored in a special way known as InMemory column format.

This format is particularly well designed for column operations like the sum of all values in one column. It eliminates the need to go through all the other information in other columns that you don’t need. It is therefor good for BI-like operations. Operations that are often taken away from the OLTP database as to not disturb daily use of the database. InMemory should enable to have both OLTP and BI on the same database, thus reducing the number of databases and eliminating complex maintenance like ETL processes. As an added bonus, all queries can access all data up to the latest commit and not until last-night-when-the-ETL-job-ran.

Note that InMemory is no more than an extra representation of already existing data. It is an extra and doesn’t change anything to storage or traditional presense in SGA.

Here’s a picture that helps understanding.

 

image

 

The optimizer takes the precense of InMemory tables automatically in its considerations to make a plan. That is very important because it makes the use of it transparent: plans are automatically recalculated when a table is made InMemory.

Now for the real thing.

I’ve got a table with twelve columns and over 62 million rows. I made sure it’s in the buffer cache already.

My inmemory setting are


SQL> show parameter inmemory

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
inmemory_clause_default              string
inmemory_force                       string      DEFAULT
inmemory_max_populate_servers        integer     1
inmemory_query                       string      ENABLE
inmemory_size                        big integer 3G
inmemory_trickle_repopulate_servers_ integer     1
percent
optimizer_inmemory_aware             boolean     TRUE
SQL>

What tables are inmemory?


SQL> select owner,segment_name,populate_status from v$im_segments;

no rows selected

Elapsed: 00:00:00.00
SQL>

Nothing there.

Now we’ll do a query on one column in particular:


SQL> select sum(iata_code) from owner1.table1;

SUM(IATA_CODE)
--------------
1.2232E+13

Elapsed: 00:00:13.04
SQL>

How is this code handled?


SQL> explain plan for select sum(iata_code) from owner1.table1;

Explained.

Elapsed: 00:00:00.01
SQL> @utlxpls

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3207558960

-----------------------------------------------------------------------------
| Id  | Operation          | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |        |     1 |     2 |   140K  (1)| 00:00:06 |
|   1 |  SORT AGGREGATE    |        |     1 |     2 |            |          |
|   2 |   TABLE ACCESS FULL| TABLE1 |    62M|   119M|   140K  (1)| 00:00:06 |
-----------------------------------------------------------------------------

9 rows selected.

Elapsed: 00:00:00.02
SQL>

Now put that table inmemory and do the same queries. Mind the extra parameters ‘priority critical’. It ensures that the table will be put in inmemory immediately. Default behaviour is that the inmemory clause will only be executed the first time the table is scanned.


SQL> alter table owner1.table1 inmemory priority critical;

Table altered.

Elapsed: 00:00:00.06
SQL> select owner,segment_name,populate_status from v$im_segments;

OWNER   SEGMENT_NAME   POPULATE_STATUS
------- ------------- -----------------
OWNER1  TABLE1         COMPLETED

Elapsed: 00:00:00.02
SQL> select sum(iata_code) from owner1.table1;

SUM(IATA_CODE)
--------------
1.2232E+13

Elapsed: 00:00:00.29
SQL> explain plan for select sum(iata_code) from owner1.table1;

Explained.

Elapsed: 00:00:00.02
SQL> @utlxpls

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3207558960

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |     2 |  6140  (11)| 00:00:01 |
|   1 |  SORT AGGREGATE             |        |     1 |     2 |            |          |
|   2 |   TABLE ACCESS INMEMORY FULL| TABLE1 |    62M|   119M|  6140  (11)| 00:00:01 |
--------------------------------------------------------------------------------------

9 rows selected.

Elapsed: 00:00:00.05
SQL>

This is a spectacular improvement.  Execution time dropped from 13.4 seconds to 0.29 seconds, that’s 46 times faster. But wait: wouldn’t we achieve a similar result using an index on that column? Let’s see.


SQL> create index owner1.iata_code on owner1.table1(iata_code);

Index created.

Elapsed: 00:00:28.17
SQL> select sum(iata_code) from owner1.table1;

SUM(IATA_CODE)
--------------
1.2232E+13

Elapsed: 00:00:00.23
SQL> explain plan for select sum(iata_code) from owner1.table1;

Explained.

Elapsed: 00:00:00.00
SQL> @utlxpls

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2487845580

-----------------------------------------------------------------------------------
| Id  | Operation             | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |           |     1 |     2 |  1260   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE       |           |     1 |     2 |            |          |
|   2 |   INDEX FAST FULL SCAN| IATA_CODE |    62M|   119M|  1260   (1)| 00:00:01 |
-----------------------------------------------------------------------------------

9 rows selected.

Elapsed: 00:00:00.01
SQL>

Hmm, pity. The good old index beat the brand new inmemory. But wait. Maybe this was too simple. So, let’s make it a bit more complicated.


SQL> explain plan for select sum(iata_code) from owner1.table1 where code_role='HD';

Explained.

Elapsed: 00:00:00.01
SQL> @utlxpls

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3207558960

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |     6 |  6153  (12)| 00:00:01 |
|   1 |  SORT AGGREGATE             |        |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS INMEMORY FULL| TABLE1 |  4177K|    23M|  6153  (12)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------

2 - inmemory("CODE_ROLE"='HD')
filter("CODE_ROLE"='HD')

15 rows selected.

Elapsed: 00:00:00.01
SQL>

So we went back to the InMemory table access. But that’s not fair, there’s no index on code_role. So let’s create one and see what happens.


SQL> create index owner1.code_role on owner1.table1(code_role);

Index created.

Elapsed: 00:01:04.81
SQL> explain plan for select sum(iata_code) from owner1.table1 where code_role='HD';

Explained.

Elapsed: 00:00:00.01
SQL> @utlxpls

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3207558960

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |     6 |  6153  (12)| 00:00:01 |
|   1 |  SORT AGGREGATE             |        |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS INMEMORY FULL| TABLE1 |  4177K|    23M|  6153  (12)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------

2 - inmemory("CODE_ROLE"='HD')
filter("CODE_ROLE"='HD')

15 rows selected.

Elapsed: 00:00:00.02
SQL>

Good. Now it uses InMemory. And that seems logical: compare it to the plan that would have been followed had we not had InMemory:


SQL> alter table owner1.table1 no inmemory;

Table altered.

Elapsed: 00:00:00.27
SQL> explain plan for select sum(iata_code) from owner1.table1 where code_role='                   HD';

Explained.

Elapsed: 00:00:00.02
SQL> @utlxpls

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 1157679052

--------------------------------------------------------------------------------------------
| Id  | Operation               | Name             | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                  |     1 |     6 | 89823   (1)| 00:00:04 |
|   1 |  SORT AGGREGATE         |                  |     1 |     6 |            |          |
|*  2 |   VIEW                  | index$_join$_001 |  4177K|    23M| 89823   (1)| 00:00:04 |
|*  3 |    HASH JOIN            |                  |       |       |            |          |
|*  4 |     INDEX RANGE SCAN    | CODE_ROLE        |  4177K|    23M|  8561   (1)| 00:00:01 |
|   5 |     INDEX FAST FULL SCAN| IATA_CODE        |  4177K|    23M|  5785   (1)| 00:00:01 |

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

2 - filter("CODE_ROLE"='HD')
3 - access(ROWID=ROWID)
4 - access("CODE_ROLE"='HD')

19 rows selected.

SQL>

But it doesn’t end here: what if we had one index on both columns?


SQL> alter table owner1.table1 no inmemory;

Table altered.

SQL> drop index owner1.code_role;

Index dropped.

SQL> drop index owner1.iata_code;

Index dropped.

SQL> create index owner1.iata_code_role on owner1.table1(code_role,iata_code);

Index created.

SQL> explain plan for select sum(iata_code) from owner1.table1 where code_role='HD';

Explained.

SQL> @utlxpls

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 4094297780

------------------------------------------------------------------------------------
| Id  | Operation         | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |                |     1 |     6 |  9214   (1)| 00:00:01 |
|   1 |  SORT AGGREGATE   |                |     1 |     6 |            |          |
|*  2 |   INDEX RANGE SCAN| IATA_CODE_ROLE |  4177K|    23M|  9214   (1)| 00:00:01 |
------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------

2 - access("CODE_ROLE"='HD')

14 rows selected.

SQL>

Now put the table InMemory again. I expect the optimizer to choose the InMemory option than, based on the cost above.


SQL> alter table owner1.table1 inmemory priority critical;

Table altered.

SQL> explain plan for select sum(iata_code) from owner1.table1 where code_role='HD';

Explained.

SQL> @utlxpls

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 3207558960

--------------------------------------------------------------------------------------
| Id  | Operation                   | Name   | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |        |     1 |     6 |  6153  (12)| 00:00:01 |
|   1 |  SORT AGGREGATE             |        |     1 |     6 |            |          |
|*  2 |   TABLE ACCESS INMEMORY FULL| TABLE1 |  4177K|    23M|  6153  (12)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):

PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------
---------------------------------------------------

2 - inmemory("CODE_ROLE"='HD')
filter("CODE_ROLE"='HD')

15 rows selected.

SQL>

We’ve seen that InMemory can indeed speed up selects, albeit not magnitudes faster than with indexes. I wonder what the impact will be on OLTP. After all, this columnar presentation needs to be maintained with every OLTP on the table. However, indexes also need to be maintained and that involves disk IO, both for the index itself as for undo and for redo and archiving.

I tried to make some more examples but it turned out that my database and environment are too limited. I even got contradictionary results.

And of course, a table has to be in memory only once to serve many purposes. For indexes, it’s not uncommon to have many of them on one table, serving different purposes. I’ve seen large databases where the indexes actually took up more space then the data.

 

I might make a part 2 of this blog some day. For now I leave it to you, readers, to gather more results. I think this option really has a lot of potential and I’m eager to read some results from the real world.