Death of the End Date - How LAG and LEAD help fight redundancy 20188367001

Death of the End Date – How LAG and LEAD help fight redundancy

Yesterday just prior to our AMIS Query on ADF Faces I was talking to Toon Koppelaars about quite something else: the end_date column in tables that contain contiguous records with BEGIN_DATE and END_DATE where there is no overlap and there are no gaps. Every BEGIN_DATE is the day after the END_DATE of the previous record and every END_DATE is the day before the BEGIN_DATE of the next record. Except of course for the first and last record. We agreed that in this situation, the END_DATE is really redundant information: it is always available from the next record. Situations where you will find this type of record include product price history, resume entries, inventory records, allocation data etc.

You could enter a debate whether END_DATE is or is not a property of the record. When you do your ER-modelling, would you include the end date as an attribute? And if you do, is that because you are implicitly taking into account the problems you may have if you have to derive the END_DATE from the BEGIN_DATE of a next record? It is a bit like building to fences between two adjacent gardens: once fence can clearly mark the border of both gardens, but somehow – lack of trust in your neighbour’s intentions? – you decide to have two coinciding fences.

Is it a problem if you have this END_DATE column where it is really only redundant or at least denormalized information?Well, you have to go through some additional steps:

  • When you insert a new record at the end, you have to update the current latest record and set is END_DATE to the BEGIN_DATE minus one of the new record
  • When you insert a new record somewhere in the middle, you have two update two existing records: change the END_DATE of the one before the new record and update the BEGIN_DATE of the record following the new record
  • If you change the BEGIN_DATE or END_DATE of an existing record, you have to update the END_DATE or BEGIN_DATE of the preceding or following record

Think of the garden fences: when you start a new garden between two existing gardens, you need move two fences and build two new ones. When you start a new garden at the end of street, you need to build two fences. If you buy piece of the garden next-doors, you have to move two fences.

Of course, for all these updates you will have to work around the mutating table challenge!

Alternatively, you can force the user to make all the correct changes by enforcing business rules that explicitly check that there are no gaps between the records – all end_date and begin_date are adjacent – and that there is no overlap between the records.

It sounds cumbersome. And it is. So why do it?

Well, the reasons most frequently given for having this END_DATE column – or this TO_POSITION because the same reasoning applies to data describing position – are:

  • It is convenient to quicly find the current record: WHERE END_DATE IS NULL
  • It is convenient to show in reports for records for which period they apply by showing BEGIN DATE and END DATE explicitly
  • To quickly find out for how long a certain record was valid: END_DATE – BEGIN_DATE

It is not hard to see that for these three cases, having the END_DATE is convenient. But not at all necessary. Let’s take a look at how Analytical Functions can help us to get these three pieces of functionality just as easily, without using END_DATE.

I have a simple example with product pricing data. Note: these are not actual figures, I have just made something up. I start out traditionally, using a column END_DATE.

create table product_prices
( product    varchar2(50)
, price      number(8,2)
, begin_date date
, end_date   date
)
/
insert into product_prices
values
( 'JDeveloper'
, 2500
, '01-JAN-01'
, '20-SEP-03'
)
/
insert into product_prices
values
( 'JDeveloper'
, 1000
, '21-SEP-03'
, '15-NOV-04'
)
/
insert into product_prices
values
( 'JDeveloper'
, 800
, '16-NOV-04'
, '15-JUL-05'
)
/
insert into product_prices
values
( 'JDeveloper'
, 0
, '16-JUL-05'
, null
)
/

insert into product_prices
values
( 'Eclipse'
, 0
, '30-JUN-01'
, null
)
/

insert into product_prices
values
( 'JBuilder'
, 5000
, '06-MAR-02'
, '30-AUG-03'
)
/
insert into product_prices
values
( 'JBuilder'
, 2500
, '31-AUG-03'
, '30-AUG-04'
)
/
insert into product_prices
values
( 'JBuilder'
, 1500
, '31-AUG-04'
, '30-JAN-05'
)
/
insert into product_prices
values
( 'JBuilder'
, 500
, '31-JAN-05'
, '28-FEB-05'
)
/
insert into product_prices
values
( 'JBuilder'
, 1500
, '01-MAR-05'
, null
)
/

Selecting the current price for all products is now quite simple:

select product
,      price
from   product_prices
where  end_date is null
/
PRODUCT                                                 PRICE
-------------------------------------------------- ----------
JDeveloper                                                  0
Eclipse                                                     0
JBuilder                                                 1500

Finding out for every price record from when until when it was valid is also very simple:

select product
,      price
,      begin_date
,      end_date
,      trunc(nvl(end_date,sysdate) - begin_date) days
from   product_prices
order
by     product
,      begin_date
/
PRODUCT                                                 PRICE BEGIN_DAT END_DATE        DAYS
-------------------------------------------------- ---------- --------- --------- ----------
Eclipse                                                     0 30-JUN-01                 1553
JBuilder                                                 5000 06-MAR-02 30-AUG-03        542
JBuilder                                                 2500 31-AUG-03 30-AUG-04        365
JBuilder                                                 1500 31-AUG-04 30-JAN-05        152
JBuilder                                                  500 31-JAN-05 28-FEB-05         28
JBuilder                                                 1500 01-MAR-05                  213
JDeveloper                                               2500 01-JAN-01 20-SEP-03        992
JDeveloper                                               1000 21-SEP-03 15-NOV-04        421
JDeveloper                                                800 16-NOV-04 15-JUL-05        241
JDeveloper                                                  0 16-JUL-05                   76

So it would seem that going through the trouble of having the additional column and the triggers to deal correctly with inserts and updates really pays off. Or does it? Using Analytical Functions, we can forget about the END_DATE column and retrieve the same information almost as easily. First, to find the current price record for each product – there are several ways to go about it, I only show three of them:

alter table product_prices drop column end_date
/
select distinct
       product
,      first_value( price) over (partition by product
                                 order by begin_date desc
                               ) current_price
from   product_prices
/
PRODUCT                                            CURRENT_PRICE
-------------------------------------------------- -------------
Eclipse                                                        0
JBuilder                                                    1500
JDeveloper                                                     0

Or:

select product
,      price
from   ( select product
         ,      price
         ,      row_number() over ( partition by product
                                    order by begin_date desc
                                  ) rn
         from   product_prices
       )
where  rn = 1
/

or:

select product
,      price
from   ( select product
         ,      price
         ,      lead(begin_date-1) over ( partition by product
                                 order by begin_date
                                ) end_date
         from   product_prices
       )
where  end_date is null
/

Now to find the period for which each price record is applicable:

select product
,      price
,      begin_date
,      lead(begin_date-1) over ( partition by product
                                 order by begin_date
                                ) end_date
from   product_prices
order
by     product
,      begin_date
/
PRODUCT                                                 PRICE BEGIN_DAT END_DATE
-------------------------------------------------- ---------- --------- ---------
Eclipse                                                     0 30-JUN-01
JBuilder                                                 5000 06-MAR-02 30-AUG-03
JBuilder                                                 2500 31-AUG-03 30-AUG-04
JBuilder                                                 1500 31-AUG-04 30-JAN-05
JBuilder                                                  500 31-JAN-05 28-FEB-05
JBuilder                                                 1500 01-MAR-05
JDeveloper                                               2500 01-JAN-01 20-SEP-03
JDeveloper                                               1000 21-SEP-03 15-NOV-04
JDeveloper                                                800 16-NOV-04 15-JUL-05
JDeveloper                                                  0 16-JUL-05

Now it would seem that with only a few simple enhancements to the queries, we can achieve exactly results without the END_DATE as we did with the END_DATE. And we can forget the complexity of the business rules or the triggers to deal with new or updated records as well as the redundant column. One fence should be enough!

4 Comments

  1. Garri January 20, 2006
  2. Sreenivasarao December 2, 2005
  3. Toon Koppelaars October 3, 2005
  4. Jeff Moss September 30, 2005