Tracking the moving history of averages and other aggregates – Flashback Aggregates in Oracle SQL

You may have seen we write on Flashback functionality in the Oracle Database. It’s an area that I like – especially since the introduction of the Flashback Data Archive in Oracle 11g. As an application developer, I typically focus on Flashback Query (AS OF) and Flashback Versions query (VERSIONS BETWEEN …). The latter provides an overview of the entire history of records in a table (in so far the UNDO area or the Flashback Data Archive has the information available). The former allows us to query an entire table at a certain moment in the past.

This article looks at a combination of the two. Take the following situation – using table EMP once more as our sample set: once upon a time, all employees in Department 10 see their salary increased by 15%. The next month, all CLERKs are made happy with a $200 pay raise. Two months later, MANAGERs are cut back – salary decrease of 10%. And finally, in this brief history of time, everyone hired in 1981 is awarded 150 extra in their regular paycheck.

Using Flashback Versions query, we can find out for every employee what the various salary levels are that they have been at. And we could create a chart with the line of salary levels plotted against the time. However, that does not give us the evolution of the average salary (or any other old aggregate) over all employee records versus time. We cannot meaningfully calculate aggregate values against EMP VERSIONS – except for the minimum and maximum values that have occurred. Using Flashback Query (EMP AS OF TIMESTAMP) it is possible to calculate the average salary as it existed at one specific moment in time. That is helpful – but does not yet give us the values we need to track the evolution of the average salary over time.

I have not been able to find a SQL-only solution to this challenge. However, with a bit of PL/SQL – a Table Function – it is fairly easy to construct a simple query that returns all values that have existed throughout history for a certain aggregate – like the average salary. Here is how that goes.

This first query – points_in_time – returns all the distinct timestamp values from the EMP VERSIONS query.

with points_in_time as
( select distinct versions_starttime starttime
  from   emp versions between timestamp minvalue and maxvalue
)
select *
from   points_in_time

In other words: all moment in time that demarcate a new value for at least one Employee record. By taking all these points in time, we have all moments at which the value of the aggregate may have changed – missing none.

Image

So if we could only perform the Flashback Query (AS OF TIMESTAMP) at each of the timestamps returned by this query, we have the evolution of the average salary through time.

Well, I tried the following (and variations on this theme) – but not surprisingly this will not fly:

Image

So a trick is required. The trick involves an Abstract Data Type – num_and_time – and two Nested Table Types – num_and_time_tbl and timestamp_tbl:

create type timestamp_tbl as table of timestamp

create type num_and_time
as object (number_value number, point_in_time timestamp)

create type num_and_time_tbl as table of num_and_time

Next we create a Table Function that consumes an instance of of the timestamp_tbl.

This function iterates over the timestamps in the table and performs the Flashback AS OF TIMESTAMP query to calculate the average salary as it existed at some point in time for every timestamp found in the nested table. For each average salary it calculates, the function create an instance of the num_and_time type and adds it to the num_and_time_tbl instance the function will eventually return.

create or replace
function  average_salary
( p_points_in_time timestamp_tbl
) return num_and_time_tbl
is
  l_avg_sals num_and_time_tbl:= num_and_time_tbl();
  l_index number;
begin
  l_index := p_points_in_time.first;
  loop
    exit when l_index is null;
    l_avg_sals.extend;
    select num_and_time( avg(sal), p_points_in_time(l_index))
    into   l_avg_sals( l_avg_sals.last)
    from   emp as of timestamp p_points_in_time(l_index)
    ;
    l_index:= p_points_in_time.next(l_index);
  end loop;
  return l_avg_sals;
end average_salary;

With this Table Function at our disposal, the construction of the entire query is not very hard anymore, using the MULTISET operator to turn the points in time in line view result into a timestamp_tbl instance and the TABLE operator to query the num_and_time_tbl in a relational way:

 with points_in_time as
( select distinct nvl(versions_starttime, versions_endtime) point_in_time
  from   emp versions between timestamp minvalue and maxvalue
  union
  select  systimestamp from dual
)
select round(number_value) avg_sal, point_in_time
from   table( average_salary (cast (multiset(  select * from points_in_time) as timestamp_tbl)))
order
by     point_in_time

The outcome of this query for the series of salary changes mentioned at the beginning of this article is as follows (except of course for the timestamps – this has been a pressure cooker salary awarding process):

Image

Resources

Download scripts for this article: movingHistoricalAverage.