Finding the longest streak using SQL Analytical Functions

Some IT organizations that I visit have a sign somewhere indicating the number of days that have passed without a major issue. Production sites with proper respect for health and safety to something similar with signs like this one:

Image

In this article, I am wondering about two things:

  • given a list of all incidents, can I easily derive the longest streak of days without accident
  • and the reverse: given a list of events, can I easily find the longest consecutive string of days on which that particular type of event happened (this time not accidents but more positive things like ‘the boss giving a compliment’, ‘an inspection passed’ , ‘a shiny new blog article published’, ‘my neighbour not drinking an alcoholic beverage’ or ‘my son getting up on time’

Using Analytical Functions, it should be fairly easy to find out both things, I suspect – although it turns out that the first is a little more straightforward.

First of all, finding the longest period without incidents given the list of incidents is straightforward. And note that this is the same query as ‘the longest time between two employees hired (in EMP)’ .

Given the following Summer Incidents Registry:

Image

What is the longest incident free period? Using the LEAD function, this question is easily answered. From each incident, we look at the next one (using LEAD) and calculate the incident free period as the difference between the two dates. Longest period or the largest date difference is what we are looking for:

select *
from   ( select date_of_occurrence "Start of incident free period"
         ,      lead(date_of_occurrence) over (order by date_of_occurrence )  - date_of_occurrence "Length of period"
         from   incident_registry
         order
         by     2 desc nulls last
       )
where  rownum = 1

The result:

Image

Find the longest streak – consecutive period of events happening

The second challenge seems very similar. But while it is similar, it is slightly more complex as well. We need to look at a list of events – or dates on which events occurred – and find the longest period in which the event happened at least once on every day. Take for example this data:

Image

You may be able to very quickly spot the longest streak.

But now this data is stored in a table in the database and we need the SQL engine to retrieve the longest streak. Let’s see how that might work.

The core of the query retrieves all instances and for each instance the number of days since the previous instance as well as the number of days until the next instance.

select instance_date
,      instance_date - lag(instance_date) over (order by instance_date)
,      lead(instance_date) over (order by instance_date) - instance_date
from   instances

We can define a “running streak” as the set of instances starting with an instance that has 2 or more days since the previous instance and ending with an instance that has 2 or more days until the next instance.

select instance_date
                ,      1 +lead(case when before_next >1 then instance_date end) ignore nulls over (order by instance_date )
                       - instance_date streak_duration
                ,      since_previous
                ,      before_next
                from ( select instance_date
                       ,      instance_date - lag(instance_date) over (order by instance_date) since_previous
                       ,      lead(instance_date) over (order by instance_date) - instance_date before_next
                       from   instances
                     )

Image

With this second query, we find the streak duration using our definition of the beginning and ending of a running streak. Using a combination of a CASE expression – to only produce a value for a record that concludes a streak – and the IGNORE NULL predicate in the LEAD function to find the instance date for the instance that concludes the current streak:

Image

With two simple steps, we can now find the longest running streak:

select instance_date
,      streak_duration
from   ( select *
         from ( select instance_date
                ,      1 +lead(case when before_next >1 then instance_date end) ignore nulls over (order by instance_date )
                       - instance_date streak_duration
                ,      since_previous
                ,      before_next
                from ( select instance_date
                       ,      instance_date - lag(instance_date) over (order by instance_date) since_previous
                       ,      lead(instance_date) over (order by instance_date) - instance_date before_next
                       from   instances
                     )
              )
         where  nvl(since_previous,2) > 1
         and    before_next < 2
         order
         by     streak_duration desc
       )
where  rownum = 1

Image

which I am sure you had identified from the calendar view above long time ago.

Resources

Download scripts for this article: streaks.

One Response

  1. Rob van Wijk July 5, 2011