Some IT organizations that I visit have a sign somewhere indicating the number of days that have passed without a major issue. Production sites with proper respect for health and safety to something similar with signs like this one:
In this article, I am wondering about two things:
- given a list of all incidents, can I easily derive the longest streak of days without accident
- and the reverse: given a list of events, can I easily find the longest consecutive string of days on which that particular type of event happened (this time not accidents but more positive things like ‘the boss giving a compliment’, ‘an inspection passed’ , ‘a shiny new blog article published’, ‘my neighbour not drinking an alcoholic beverage’ or ‘my son getting up on time’
Using Analytical Functions, it should be fairly easy to find out both things, I suspect – although it turns out that the first is a little more straightforward.
First of all, finding the longest period without incidents given the list of incidents is straightforward. And note that this is the same query as ‘the longest time between two employees hired (in EMP)’ .
Given the following Summer Incidents Registry:
What is the longest incident free period? Using the LEAD function, this question is easily answered. From each incident, we look at the next one (using LEAD) and calculate the incident free period as the difference between the two dates. Longest period or the largest date difference is what we are looking for:
select * from ( select date_of_occurrence "Start of incident free period" , lead(date_of_occurrence) over (order by date_of_occurrence ) - date_of_occurrence "Length of period" from incident_registry order by 2 desc nulls last ) where rownum = 1
Find the longest streak – consecutive period of events happening
The second challenge seems very similar. But while it is similar, it is slightly more complex as well. We need to look at a list of events – or dates on which events occurred – and find the longest period in which the event happened at least once on every day. Take for example this data:
You may be able to very quickly spot the longest streak.
But now this data is stored in a table in the database and we need the SQL engine to retrieve the longest streak. Let’s see how that might work.
The core of the query retrieves all instances and for each instance the number of days since the previous instance as well as the number of days until the next instance.
select instance_date , instance_date - lag(instance_date) over (order by instance_date) , lead(instance_date) over (order by instance_date) - instance_date from instances
We can define a “running streak” as the set of instances starting with an instance that has 2 or more days since the previous instance and ending with an instance that has 2 or more days until the next instance.
select instance_date , 1 +lead(case when before_next >1 then instance_date end) ignore nulls over (order by instance_date ) - instance_date streak_duration , since_previous , before_next from ( select instance_date , instance_date - lag(instance_date) over (order by instance_date) since_previous , lead(instance_date) over (order by instance_date) - instance_date before_next from instances )
With this second query, we find the streak duration using our definition of the beginning and ending of a running streak. Using a combination of a CASE expression – to only produce a value for a record that concludes a streak – and the IGNORE NULL predicate in the LEAD function to find the instance date for the instance that concludes the current streak:
With two simple steps, we can now find the longest running streak:
select instance_date , streak_duration from ( select * from ( select instance_date , 1 +lead(case when before_next >1 then instance_date end) ignore nulls over (order by instance_date ) - instance_date streak_duration , since_previous , before_next from ( select instance_date , instance_date - lag(instance_date) over (order by instance_date) since_previous , lead(instance_date) over (order by instance_date) - instance_date before_next from instances ) ) where nvl(since_previous,2) > 1 and before_next < 2 order by streak_duration desc ) where rownum = 1
which I am sure you had identified from the calendar view above long time ago.
Download scripts for this article: streaks.