My challenge: replicate changes from database A to database B, once per day. The tables involved are pretty big (100M records). I really would not want to process them all into database B. However, the tables do not have a LAST_DATE_MODIFIED column that has a timestamp that is updated with the current timestamp whenever the record is changed (and when it is first created). Fortunately, we can work with a pseudo LAST_MODIFIED timestamp in Oracle Databases, based on a mechanism called Flashback Versions Query. And before you start wondering: is that available in my version or edition of the database or is that something I have to buy as extra option, I can reassure you: every Oracle Database starting with 10g has this feature and it does not cost you anything to use it. In fact, the underlying mechanism is what allows you to run read consistent queries – something that has been available in Oracle from (almost) the very beginning.
In this article I will tell you how you can get a pseudo last_modified timestamp for every record in every table. There are a few caveats to bear in mind:
- the flashback version query uses the UNDO data created by Oracle whenever a transaction is committed; we can only look back in time as far as the available UNDO data allows. This may mean that for rows that have not been updated recently, we do not get a result. Because I am looking for recently changed rows, this is acceptable for me because the UNDO data allows we to look back in time far enough to cover what I call recently
- the value we get for the pseudo last_modified timestamp is accurate only to about 3 seconds; this is good enough for my use case
- the value we get for the pseudo last_modified timestamp for a row is actually the timestamp for the most recent change to any row in the data block that contains the row; this means that the timestamp we get is the upper limit: the record has not been updated more recently than the value we get – but it could have been updated a lot less recently; this means we typically will get more rows reported as recently changed than in fact were recently changed; for my use case, the reduction in the number of records I have to process is still huge, so this overshooting does not worry me much.
The short cut to getting a last_date_modified timestamp for records queried from a table is by extending the query to a flashback versions query, by adding VERSIONS BETWEEN TIMESTAMP SYSTIMESTAMP – 1 AND SYSTIMESTAMP to the table reference in the FROM clause. This extension instructs the Oracle Database to use not just the current rows in the table as the data source for the query but retrieve all versions of the records in the table that have existed in the table in the last 24 hours (SYSTIMESTAMP –1). When we run a flashback versions query, we can use the pseudo-column reference versions_starttime. This gives us the creation time for each row version produced by the query – at least: if the creation time for that (older) row version is still available in the UNDO.
Here we see the flashback versions query – over the last 15 minutes – and the resulting row versions. Most records have been updated once in the last 15 minutes. only the first one (Aart) was completely new – inserted just a few minutes ago.
To get the proper last_modified timestamp for each record – and only a single row returned for each primary key – we extend the query a little bit further:
select *
from ( select e.first_name
, e.versions_starttime
, row_number() over (partition by e.employee_id order by e.versions_starttime) rn
from employees VERSIONS BETWEEN TIMESTAMP SYSTIMESTAMP – 15/24/60 AND SYSTIMESTAMP e
)
where rn = 1;
The resulting rows have either (perhaps) been changed in the last 15 minutes (when their versions_starttime is set) or they have most definitely not been modified (when it is not).
For my use case, it becomes quite simple to rule out many records that have not been modified or created in the last 24 hours. A small number of the records that have a value for versions_starttime will not actually have been changed but live in data block that contains another row that has been changed.
Comparing surrogate with a real LAST_DATE_MODIFIED column
In a database that I can fully control, I can add a real column to track the LAST_MODIFIED column, like this:
alter table employees add (last_date_modified timestamp default systimestamp)
In order to have the column value set whenever the record is updated, I create this trigger:
CREATE OR REPLACE TRIGGER emp_upd_trg
BEFORE update ON employees
FOR EACH ROW
ENABLE
BEGIN
:new.last_date_modified:= systimestamp;
END;
With each update, column LAST_MODIFIED of type timestamp is updated.
I can now compare the value from the real column with the value from the surrogate column based on Flashback Version Query. For example:
You will find small discrepancies – up to 3 seconds – between the rea;l column and the pseudo column. The discrepancy would be much bigger for records in a block with another row more recently updated. In my test this is not the case.
For my use case, these discrepancies are not meaningful. So I can happily use the upper limit for last modified timestamp produced by the flashback versions query.
Resources
LiveSQL Script with my experiments – https://livesql.oracle.com/apex/livesql/s/nskp9owjkcjhd8fs4dykzh3bq
Oracle Base – Flashback Versions Query – https://oracle-base.com/articles/10g/flashback-version-query-10g
Introduction to Oracle System Change Number – https://www.red-gate.com/simple-talk/databases/oracle-databases/oracle-system-change-number-an-introduction/
Flashback Versions Query pseudocolumns – Oracle Database docs – https://docs.oracle.com/cd/B19306_01/server.102/b14200/pseudocolumns003.htm#i1009358