Last week, this weblog hit the 9M post reads mark. Since we started technology.amis.nl/blog in July 2004, it took us some 5 years to get to that point. And I started to wonder when we reached earlier milestones, like 100k, 1M and 5M. This took me to some SQL puzzles that before I started in anger might have been difficult to solve, but turned out to be rather straightforward – thanks to Analytical functions.

In this article a brief report on how to reconstruct history using Running Sums and a bit of LAG/LEAD analysis.

Suppose we have a table that records the number of Weblog Postreads for every day since the start of the blog:

```create table blog_statistics
( day_stamp   date
)
/

```

And for the sake of this argument, let’s have some sample data in this table – a little under 3 years worth of sample data:

```insert into blog_statistics
select sysdate - 100 + day_seq
,      50 * dbms_random.value
from   ( select level  day_seq
from   dual
connect
by     level &lt; 1000
)

```

We can find out easily on what day we first reached the 500 mark:

```select min(day_stamp)
from   ( select day_stamp
from   blog_statistics
)
/

MIN(DAY_S
---------
06-APR-09

```

Using the running sum we get with sum() over (order by), we list the total post read count from one day to the next, and the first row that crosses the 500 mark with tis running sum is the milestone date.

Finding several milestone dates in a single query – the 1000, 5000 and 10000 mark – can be done with this query:

```select min(day_stamp)
,      milestones.milestone
from   ( select day_stamp
from   blog_statistics
)
,      ( select 1000 mark
,      1    milestone
from   dual
union all
select 5000
,      2
from   dual
union all
select 10000
,      3
from   dual
) milestones
group
by     milestones.milestone
/

MIN(DAY_S  MILESTONE
--------- ----------
28-APR-09          1
22-SEP-09          2
26-APR-10          3

```

The next question I would like to have answered is: when did we achieve the first 25% of the grand total of postreads we have received until today:

```select min(day_stamp)
from   ( select day_stamp
,      sum(number_of_reads) over ( order by day_stamp asc)/ sum(number_of_reads) over () running_percentage
from   blog_statistics
)
where  running_percentage &gt; 0.25

MIN(DAY_S
---------
12-JUL-09
```

Then of course it would be nice to see the date on which we have 50 and 75% as well:

```select min(day_stamp)
,      milestones.percentage_mark
from   ( select day_stamp
,      sum(number_of_reads) over ( order by day_stamp asc)/ sum(number_of_reads) over () running_percentage
from   blog_statistics
)
,      ( select 0.25 * level percentage_mark from dual connect by level &lt;= 4 ) milestones
where  running_percentage &gt;= percentage_mark
group
by     milestones.percentage_mark
order
by     milestones.percentage_mark

MIN(DAY_S PERCENTAGE_MARK
--------- ---------------
12-JUL-09             .25
14-NOV-09              .5
21-MAR-10             .75
29-JUL-10               1

```

Finally, let’s see how the number of days needed to get from one percentage mark milestone to another evolves over time. Is the period needed shortening or does it increase?

Using the LAG function, it is easy to find the day stamp for the next milestone. Therefore, it is easy to calculate the numbers of days between two consecutive milestones.

```select day_stamp
,      day_stamp - lag(day_stamp) over (order by day_stamp asc)  days
from   (
select min(day_stamp) day_stamp
,      milestones.percentage_mark
from   ( select day_stamp
,      sum(number_of_reads) over ( order by day_stamp asc)/ sum(number_of_reads) over () running_percentage
from   blog_statistics
)
,      ( select 0.25 * level percentage_mark from dual connect by level &lt;= 4 ) milestones
where  running_percentage &gt;= percentage_mark
group
by     milestones.percentage_mark
order
by     milestones.percentage_mark
)
/

DAY_STAMP       DAYS
--------- ----------
12-JUL-09
14-NOV-09        125
21-MAR-10        127
29-JUL-10        130
```

From very limited, randomly generated data, we seem to spot a pattern that makes the periods to get from one milestone to the next is increasing, or so it seems.

Trying with 10 milestones:

```select day_stamp
,      day_stamp - lag(day_stamp) over (order by day_stamp asc)  days
from   (
select min(day_stamp) day_stamp
,      milestones.percentage_mark
from   ( select day_stamp
,      sum(number_of_reads) over ( order by day_stamp asc)/ sum(number_of_reads) over () running_percentage
from   blog_statistics
)
,      ( select 0.1 * level percentage_mark from dual connect by level &lt;= 10 ) milestones
where  running_percentage &gt;= percentage_mark
group
by     milestones.percentage_mark
order
by     milestones.percentage_mark
)
/

DAY_STAMP       DAYS
--------- ----------
04-MAY-09
20-JUN-09         47
04-AUG-09         45
17-SEP-09         44
14-NOV-09         58
01-JAN-10         48
18-FEB-10         48
13-APR-10         54
04-JUN-10         52
29-JUL-10         55
```

It makes you wonder how random the results from dbms_random are exactly.

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PL/SQL), Service Oriented Architecture, BPM, ADF, JavaScript, Java in various shapes and forms and many other things. Author of the Oracle Press books: Oracle SOA Suite 11g Handbook and Oracle SOA Suite 12c Handbook. Frequent presenter on conferences such as JavaOne and Oracle OpenWorld. Presenter for Oracle University Celebrity specials.