# Reaching Milestones – some Analytical SQL-etics

Lucas Jellema

Last week, this weblog hit the 9M post reads mark. Since we started technology.amis.nl/blog in July 2004, it took us some 5 years to get to that point. And I started to wonder when we reached earlier milestones, like 100k, 1M and 5M. This took me to some SQL puzzles that before I started in anger might have been difficult to solve, but turned out to be rather straightforward – thanks to Analytical functions.

In this article a brief report on how to reconstruct history using Running Sums and a bit of LAG/LEAD analysis.

Suppose we have a table that records the number of Weblog Postreads for every day since the start of the blog:

```create table blog_statistics
( day_stamp   date
)
/

```

And for the sake of this argument, let’s have some sample data in this table – a little under 3 years worth of sample data:

```insert into blog_statistics
select sysdate - 100 + day_seq
,      50 * dbms_random.value
from   ( select level  day_seq
from   dual
connect
by     level < 1000
)

```

We can find out easily on what day we first reached the 500 mark:

```select min(day_stamp)
from   ( select day_stamp
from   blog_statistics
)
/

MIN(DAY_S
---------
06-APR-09

```

Using the running sum we get with sum() over (order by), we list the total post read count from one day to the next, and the first row that crosses the 500 mark with tis running sum is the milestone date.

Finding several milestone dates in a single query – the 1000, 5000 and 10000 mark – can be done with this query:

```select min(day_stamp)
,      milestones.milestone
from   ( select day_stamp
from   blog_statistics
)
,      ( select 1000 mark
,      1    milestone
from   dual
union all
select 5000
,      2
from   dual
union all
select 10000
,      3
from   dual
) milestones
group
by     milestones.milestone
/

MIN(DAY_S  MILESTONE
--------- ----------
28-APR-09          1
22-SEP-09          2
26-APR-10          3

```

The next question I would like to have answered is: when did we achieve the first 25% of the grand total of postreads we have received until today:

```select min(day_stamp)
from   ( select day_stamp
,      sum(number_of_reads) over ( order by day_stamp asc)/ sum(number_of_reads) over () running_percentage
from   blog_statistics
)
where  running_percentage > 0.25

MIN(DAY_S
---------
12-JUL-09
```

Then of course it would be nice to see the date on which we have 50 and 75% as well:

```select min(day_stamp)
,      milestones.percentage_mark
from   ( select day_stamp
,      sum(number_of_reads) over ( order by day_stamp asc)/ sum(number_of_reads) over () running_percentage
from   blog_statistics
)
,      ( select 0.25 * level percentage_mark from dual connect by level <= 4 ) milestones
where  running_percentage >= percentage_mark
group
by     milestones.percentage_mark
order
by     milestones.percentage_mark

MIN(DAY_S PERCENTAGE_MARK
--------- ---------------
12-JUL-09             .25
14-NOV-09              .5
21-MAR-10             .75
29-JUL-10               1

```

Finally, let’s see how the number of days needed to get from one percentage mark milestone to another evolves over time. Is the period needed shortening or does it increase?

Using the LAG function, it is easy to find the day stamp for the next milestone. Therefore, it is easy to calculate the numbers of days between two consecutive milestones.

```select day_stamp
,      day_stamp - lag(day_stamp) over (order by day_stamp asc)  days
from   (
select min(day_stamp) day_stamp
,      milestones.percentage_mark
from   ( select day_stamp
,      sum(number_of_reads) over ( order by day_stamp asc)/ sum(number_of_reads) over () running_percentage
from   blog_statistics
)
,      ( select 0.25 * level percentage_mark from dual connect by level <= 4 ) milestones
where  running_percentage >= percentage_mark
group
by     milestones.percentage_mark
order
by     milestones.percentage_mark
)
/

DAY_STAMP       DAYS
--------- ----------
12-JUL-09
14-NOV-09        125
21-MAR-10        127
29-JUL-10        130
```

From very limited, randomly generated data, we seem to spot a pattern that makes the periods to get from one milestone to the next is increasing, or so it seems.

Trying with 10 milestones:

```select day_stamp
,      day_stamp - lag(day_stamp) over (order by day_stamp asc)  days
from   (
select min(day_stamp) day_stamp
,      milestones.percentage_mark
from   ( select day_stamp
,      sum(number_of_reads) over ( order by day_stamp asc)/ sum(number_of_reads) over () running_percentage
from   blog_statistics
)
,      ( select 0.1 * level percentage_mark from dual connect by level <= 10 ) milestones
where  running_percentage >= percentage_mark
group
by     milestones.percentage_mark
order
by     milestones.percentage_mark
)
/

DAY_STAMP       DAYS
--------- ----------
04-MAY-09
20-JUN-09         47
04-AUG-09         45
17-SEP-09         44
14-NOV-09         58
01-JAN-10         48
18-FEB-10         48
13-APR-10         54
04-JUN-10         52
29-JUL-10         55
```

It makes you wonder how random the results from dbms_random are exactly.

#### Lucas Jellema

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director and Oracle Developer Champion. Solution architect and developer on diverse areas including SQL, JavaScript, Kubernetes & Docker, Machine Learning, Java, SOA and microservices, events in various shapes and forms and many other things. Author of the Oracle Press book Oracle SOA Suite 12c Handbook. Frequent presenter on user groups and community events and conferences such as JavaOne, Oracle Code, CodeOne, NLJUG JFall and Oracle OpenWorld.
Happy
0 %
0 %
Excited
0 %
Sleepy
0 %
Angry
0 %
Surprise
0 %

5 Star
0%
4 Star
0%
3 Star
0%
2 Star
0%
1 Star
0%

## 2 thoughts on “Reaching Milestones – some Analytical SQL-etics”

1. Well spotted! I did change the 500 to 1000 in the insert statement when I was preparing the blog article, after I had created and run the SQL. It does not change a bit in the meaningful content of the article however. And thanks for the congrats.
Lucas

2. Hi Lucas,

You pasted the wrong insert statement: the query results suggest the table contains 499 rows, while the insert statement says 999.
And congratulations with reaching the 9M mark ðŸ™‚

Regards,
Rob.