<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Advanced SQL to find valid periods &#8211; juggling with outer joins, running totals and analytical functions</title>
	<atom:link href="http://technology.amis.nl/2012/12/23/advanced-sql-to-find-valid-periods-juggling-with-outer-joins-running-totals-and-analytical-functions/feed/" rel="self" type="application/rss+xml" />
	<link>http://technology.amis.nl/2012/12/23/advanced-sql-to-find-valid-periods-juggling-with-outer-joins-running-totals-and-analytical-functions/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=advanced-sql-to-find-valid-periods-juggling-with-outer-joins-running-totals-and-analytical-functions</link>
	<description></description>
	<lastBuildDate>Tue, 11 Jun 2013 22:09:58 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5.1</generator>
	<item>
		<title>By: MetteMusens</title>
		<link>http://technology.amis.nl/2012/12/23/advanced-sql-to-find-valid-periods-juggling-with-outer-joins-running-totals-and-analytical-functions/#comment-7320</link>
		<dc:creator>MetteMusens</dc:creator>
		<pubDate>Fri, 18 Jan 2013 12:51:12 +0000</pubDate>
		<guid isPermaLink="false">http://technology.amis.nl/?p=20452#comment-7320</guid>
		<description><![CDATA[Hi Lucas

This is my version of a solution.

Step 1 - split the time line into all possible elementary intervals
Step 2 - join in the in and out periods
Step 3 - discard all intervals except those covered by I periods only
Step 4 - merge adjacent intervals into one big period

WITH timeline AS
 (
 -- chop the time line into intervals, regardless of in or out
 -- (only the distinct dates are needed, so equal dates cannot create empty intervals)
SELECT  mydate start_date, LEAD(mydate) OVER (ORDER BY mydate) - 1 end_date
FROM
     (
     SELECT start_date mydate FROM policy_periods
       UNION
      SELECT end_date + 1 FROM policy_periods
      )
)
, intervals AS(
-- join the periods to the intervals they overlap
-- LISTAGG (Oracle 11gR2 and later) replaces the undocumented WM_CONCAT
SELECT policy_id, start_date,  end_date,  LISTAGG(in_or_out, ',') WITHIN GROUP (ORDER BY in_or_out) event_part_ids
FROM
     (
     SELECT DISTINCT policy_id,  t.start_date,  t.end_date,  in_or_out
       FROM    timeline t
         JOIN   policy_periods i  ON nvl(i.end_date, DATE &#039;9999-12-31&#039;)  &gt;= t.start_date  AND i.start_date &lt;= nvl(t.end_date, DATE &#039;9999-12-31&#039;)
      )
GROUP BY policy_id, start_date, end_date
-- keep only the intervals that are covered by I periods alone
HAVING  LISTAGG(in_or_out, ',') WITHIN GROUP (ORDER BY in_or_out) = &#039;I&#039;
)
----------------
-- merge two or more consecutive intervals into one complete period
select policy_id
,   min(connect_by_root start_date) as start_date
, end_date
from intervals
where connect_by_isleaf = 1
connect by nocycle prior end_date + 1 = start_date and prior policy_id = policy_id
group by policy_id, end_date
order by policy_id, end_date
;
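
For readers who want to follow the four steps outside the database, here is a minimal Python sketch of the same idea for a single policy. It only illustrates the steps and is not a translation of the query: it assumes the rows are (start, end, kind) tuples with inclusive datetime.date bounds and kind I or O, that no period is open-ended, and the function name in_only_periods is made up for this example.

```python
from datetime import date, timedelta

DAY = timedelta(days=1)


def in_only_periods(periods):
    """Return merged (start, end) periods covered by I periods only.

    periods: list of (start, end, kind) tuples for ONE policy, with inclusive
    datetime.date bounds and kind "I" or "O" (assumed: no open-ended periods).
    """
    # Step 1: every start date and every end date + 1 day is a breakpoint,
    # which splits the time line into elementary pieces [a, next_a - 1].
    points = sorted({s for s, _, _ in periods} | {e + DAY for _, e, _ in periods})
    pieces = [(a, b - DAY) for a, b in zip(points, points[1:])]

    # Steps 2 and 3: collect the kinds of all periods overlapping a piece
    # and keep the piece only when that set is exactly {"I"}.
    kept = []
    for a, b in pieces:
        kinds = {k for s, e, k in periods if b >= s and e >= a}
        if kinds == {"I"}:
            kept.append((a, b))

    # Step 4: merge pieces where one ends the day before the next one starts.
    merged = []
    for a, b in kept:
        if merged and merged[-1][1] + DAY == a:
            merged[-1] = (merged[-1][0], b)
        else:
            merged.append((a, b))
    return merged


# two periods: 2012-01-01..2012-02-29 and 2012-04-01..2012-06-30
print(in_only_periods([
    (date(2012, 1, 1), date(2012, 6, 30), "I"),
    (date(2012, 3, 1), date(2012, 3, 31), "O"),
]))
```

Using a set of kinds per piece plays the role of the HAVING check in step 3: any O in the set, or no covering period at all, drops the piece.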

Best regards
Mette]]></description>
		<content:encoded><![CDATA[<p>Hi Lucas</p>
<p>This is my version of a solution.</p>
<p>Step 1 &#8211; split the time line into all possible elementary intervals<br />
Step 2 &#8211; join in the in and out periods<br />
Step 3 &#8211; discard all intervals except those covered by I periods only<br />
Step 4 &#8211; merge adjacent intervals into one big period</p>
<pre class="wp-code-highlight prettyprint">
WITH timeline AS
 (
 -- chop the time line into intervals, regardless of in or out
 -- (only the distinct dates are needed, so equal dates cannot create empty intervals)
SELECT  mydate start_date, LEAD(mydate) OVER (ORDER BY mydate) - 1 end_date
FROM
     (
     SELECT start_date mydate FROM policy_periods
       UNION
      SELECT end_date + 1 FROM policy_periods
      )
)
, intervals AS(
-- join the periods to the intervals they overlap
-- LISTAGG (Oracle 11gR2 and later) replaces the undocumented WM_CONCAT
SELECT policy_id, start_date,  end_date,  LISTAGG(in_or_out, &#039;,&#039;) WITHIN GROUP (ORDER BY in_or_out) event_part_ids
FROM
     (
     SELECT DISTINCT policy_id,  t.start_date,  t.end_date,  in_or_out
       FROM    timeline t
         JOIN   policy_periods i  ON nvl(i.end_date, DATE &#039;9999-12-31&#039;)  &gt;= t.start_date  AND i.start_date &lt;= nvl(t.end_date, DATE &#039;9999-12-31&#039;)
      )
GROUP BY policy_id, start_date, end_date
-- keep only the intervals that are covered by I periods alone
HAVING  LISTAGG(in_or_out, &#039;,&#039;) WITHIN GROUP (ORDER BY in_or_out) = &#039;I&#039;
)
----------------
-- merge two or more consecutive intervals into one complete period
select policy_id
,   min(connect_by_root start_date) as start_date
, end_date
from intervals
where connect_by_isleaf = 1
connect by nocycle prior end_date + 1 = start_date and prior policy_id = policy_id
group by policy_id, end_date
order by policy_id, end_date
;
</pre>
<p>Best regards<br />
Mette</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Erik van Roon</title>
		<link>http://technology.amis.nl/2012/12/23/advanced-sql-to-find-valid-periods-juggling-with-outer-joins-running-totals-and-analytical-functions/#comment-7318</link>
		<dc:creator>Erik van Roon</dc:creator>
		<pubDate>Mon, 14 Jan 2013 17:28:19 +0000</pubDate>
		<guid isPermaLink="false">http://technology.amis.nl/?p=20452#comment-7318</guid>
		<description><![CDATA[Lucas,

You said &quot;I would be interested in learning about yours&quot;
Ok, here goes.

I too accepted the challenge, and tried to build a solution without first taking a sneak peek at yours.
I hoped to be able to solve it, and end up with a similar solution.
Like you said: &quot;SQL challenges typically can be dealt with in a number of ways&quot;.
I ended up with a query that (in my very humble and irrelevant opinion) is somewhat simpler, and therefore perhaps easier to understand for a developer who didn&#039;t build it, but is forced to do maintenance on it.
I&#039;d hereby like to offer you my solution.

I&#039;m not trying to say my solution is better than yours.
In fact, I can imagine that I overlooked something.
Although my query produces the same end result, and although I did my best to implement the requirements rather than just reproduce the desired data, there might still be something wrong with it.
If so, I&#039;d like to hear about it.

I added some extra records (policy_id = 3) to your data for 2 test-cases that I thought should be in there.
a) 3 periods that overlap: record 2 overlaps 1, 3 overlaps 2 but 3 does not overlap 1
b) 2 periods that do not overlap, but do &#039;connect&#039;: period 1 ends at date X and period 2 starts at X+1
(in fact your query sees the last case as 2 separate periods)

My added data:
3 overlapping periods
&lt;pre&gt;
insert into policy_periods values (3, to_date(&#039;01-01-2012&#039;,&#039;DD-MM-YYYY&#039;), to_date(&#039;01-06-2012&#039;,&#039;DD-MM-YYYY&#039;), &#039;I&#039;);
insert into policy_periods values (3, to_date(&#039;01-03-2012&#039;,&#039;DD-MM-YYYY&#039;), to_date(&#039;01-09-2012&#039;,&#039;DD-MM-YYYY&#039;), &#039;I&#039;);
insert into policy_periods values (3, to_date(&#039;01-08-2012&#039;,&#039;DD-MM-YYYY&#039;), to_date(&#039;01-11-2012&#039;,&#039;DD-MM-YYYY&#039;), &#039;I&#039;);
&lt;/pre&gt;
2 connecting periods
&lt;pre&gt;
insert into policy_periods values (3, to_date(&#039;15-11-2012&#039;,&#039;DD-MM-YYYY&#039;), to_date(&#039;15-12-2012&#039;,&#039;DD-MM-YYYY&#039;), &#039;I&#039;);
insert into policy_periods values (3, to_date(&#039;16-12-2012&#039;,&#039;DD-MM-YYYY&#039;), to_date(&#039;31-12-2012&#039;,&#039;DD-MM-YYYY&#039;), &#039;I&#039;);
commit;
&lt;/pre&gt;

So, what does my query do?
First, like you, I decided that within an ID and type (in or out) I need to fuse periods that overlap (or connect) together.
So in the WITH clause there is a query &#039;aggregated_periods&#039;.
Here, in the innermost query, I check for each record whether its period overlaps (or connects to) the periods before it (for this id and type).
If it doesn&#039;t, the start_date is selected. If it does, the start_date is selected as being NULL, because I don&#039;t care about that start_date.
Next, from the result-set I select the last_value of this start_date with the &#039;ignore nulls&#039; option. This way all records that &#039;belong together&#039; because they overlap/connect will get the same start_date: the first (and only not-null) one.
Then from this result-set for every record I select the MAX(end_date) for the combination of id, type and start_date.
Now all records that ‘belong together’ have the same start and end date, so a final DISTINCT completes my quest for aggregated periods.
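
The aggregation step can be sketched in a few lines of Python (an illustration only, not part of the query; the name aggregate_periods and the (start, end) tuple layout with inclusive datetime.date bounds are assumptions). Sorting by start date and comparing each start with the largest end date seen so far, rather than only the end date of the directly preceding row, also fuses a period that lies completely inside an earlier, longer one.

```python
from datetime import date, timedelta

DAY = timedelta(days=1)


def aggregate_periods(periods):
    """Fuse overlapping or connecting periods of ONE policy_id / in_or_out combination.

    periods: iterable of (start, end) tuples with inclusive datetime.date bounds.
    """
    merged = []
    for start, end in sorted(periods):
        # merged[-1][1] is the largest end date so far; the period overlaps or
        # connects when it starts no later than the day after that date
        if merged and merged[-1][1] >= start - DAY:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# one period 2012-01-01..2012-12-31: both shorter periods lie inside the first one
print(aggregate_periods([
    (date(2012, 1, 1), date(2012, 12, 31)),
    (date(2012, 2, 1), date(2012, 2, 28)),
    (date(2012, 6, 1), date(2012, 6, 30)),
]))
```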

Since the &#039;out&#039; records are dominant (they overrule everything), I need to somehow select the periods that are NOT blocked by any out-period.
The query &#039;valid_periods&#039; determines these periods.
For every &#039;out&#039; period in aggregated_periods it selects a period as:
start_date = the day after the end_date of the previous out-period, NULL means &#039;the beginning of time&#039;
end_date = the day before the start_date of this out-period, NULL means &#039;the end of time&#039;
One extra period is added for the period starting the day after the last &#039;out&#039; end_date until the end of time.
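
As a minimal Python sketch of what valid_periods computes (assumptions: the out periods of one policy are already aggregated and non-overlapping, bounds are inclusive datetime.date values, None stands for the beginning or end of time, and the name gaps_between_outs is hypothetical):

```python
from datetime import date, timedelta

DAY = timedelta(days=1)


def gaps_between_outs(out_periods):
    """Periods of one policy that are not blocked by any (aggregated) O period.

    out_periods: non-overlapping (start, end) tuples with inclusive datetime.date
    bounds. None in the result means the beginning or the end of time.
    """
    if not out_periods:
        return []  # like valid_periods: no rows for a policy without O periods
    gaps = []
    prev_end = None
    for start, end in sorted(out_periods):
        # from the day after the previous O period up to the day before this one
        gap_start = None if prev_end is None else prev_end + DAY
        gaps.append((gap_start, start - DAY))
        prev_end = end
    # plus one open-ended period after the last O period
    gaps.append((prev_end + DAY, None))
    return gaps


# [(None, 2012-02-29), (2012-04-01, None)]
print(gaps_between_outs([(date(2012, 3, 1), date(2012, 3, 31))]))
```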

Finally the main query takes the &#039;in&#039; periods from aggregated_periods.
Valid_periods does not return records if there are no &#039;out&#039; periods, so the &#039;in&#039; periods are outer joined to the valid_periods that overlap them.
If no records are returned by valid_periods, this causes a join to NULL values, meaning: &#039;from the beginning until the end of time&#039;.
Then for every selected record the start_date becomes the greatest of the in- and valid-start_date, and the end_date becomes the least of the in- and valid-end_date.
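
The final step, clipping each in period to the valid periods it overlaps, can be sketched like this in Python (hypothetical helper clip_in_periods; inclusive datetime.date bounds, None again means an open bound). In this sketch an empty list of valid periods is read as: the policy has no out periods, so the in periods are kept whole; when out periods do exist, an in period without any overlapping valid period is dropped.

```python
from datetime import date


def clip_in_periods(in_periods, valid):
    """Intersect aggregated I periods with the valid (unblocked) periods of one policy.

    Bounds are inclusive datetime.date values; None in `valid` means open-ended.
    An empty `valid` list is taken to mean that the policy has no O periods.
    """
    if not valid:
        return sorted(in_periods)
    result = []
    for in_start, in_end in in_periods:
        for v_start, v_end in valid:
            start = in_start if v_start is None else max(in_start, v_start)  # GREATEST
            end = in_end if v_end is None else min(in_end, v_end)  # LEAST
            if end >= start:  # only pairs that actually overlap
                result.append((start, end))
    return sorted(result)


# [(2012-01-01, 2012-03-31), (2012-07-01, 2012-12-31)]
print(clip_in_periods(
    [(date(2012, 1, 1), date(2012, 12, 31))],
    [(None, date(2012, 3, 31)), (date(2012, 7, 1), None)],
))
```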
&lt;pre&gt;
SELECT policy_id
,      in_or_out
,      start_date
,      end_date
FROM   policy_periods
ORDER BY policy_id
,        in_or_out
,        start_date
,        end_date
;
&lt;/pre&gt;

&lt;pre&gt;
      POLICY_ID I START_DAT END_DATE
--------------- - --------- ---------
              1 I 10-OCT-10 15-MAY-11
              1 I 20-NOV-11 04-JUN-12
              1 I 05-APR-12 12-AUG-12
              1 I 31-OCT-12 01-MAR-13
              1 I 31-JAN-13 01-JUN-13
              1 O 15-OCT-10 19-OCT-10
              1 O 14-FEB-11 10-SEP-11
              1 O 01-FEB-12 15-APR-12
              1 O 01-JUN-12 21-JUL-12
              2 I 20-NOV-10 24-JUN-11
              2 I 05-APR-11 12-AUG-12
              2 O 14-FEB-10 10-SEP-11
              3 I 01-JAN-12 01-JUN-12
              3 I 01-MAR-12 01-SEP-12
              3 I 01-AUG-12 01-NOV-12
              3 I 15-NOV-12 15-DEC-12
              3 I 16-DEC-12 31-DEC-12

17 rows selected.
&lt;/pre&gt;
&lt;pre&gt;
WITH aggregated_periods
AS   (SELECT DISTINCT
             policy_id
      ,      in_or_out
      ,      start_date
      ,      MAX(end_date) OVER (PARTITION BY policy_id,in_or_out,start_date)  end_date
      FROM   (SELECT policy_id
              ,      in_or_out
              ,      LAST_VALUE (start_date IGNORE NULLS) OVER (PARTITION BY policy_id,in_or_out ORDER BY start_date_org NULLS LAST)   start_date
              ,      end_date
              FROM   (SELECT policy_id
                      ,      in_or_out
                      ,      start_date          start_date_org
                      ,      CASE
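                               -- running MAX of all earlier end dates (instead of LAG) also catches a period inside an earlier, longer one
                               WHEN start_date -1 &lt;= MAX (end_date) OVER (PARTITION BY policy_id,in_or_out ORDER BY start_date,end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                               THEN NULL         -- overlap with or connecting to an earlier period, so not interested in this start_date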
                               ELSE start_date
                             END                 start_date
                      ,      end_date
                      FROM   policy_periods
                     )
             )
     )
,    valid_periods
AS   (SELECT policy_id
      ,      LAG(end_date) OVER (PARTITION BY policy_id ORDER BY start_date) + 1    start_date
      ,      start_date -1                                                          end_date
      FROM   aggregated_periods
      WHERE  in_or_out        = &#039;O&#039;
      UNION
      SELECT policy_id
      ,      MAX(end_date) + 1     start_date
      ,      NULL                  end_date
      FROM   aggregated_periods
      WHERE  in_or_out        = &#039;O&#039;
      GROUP BY policy_id
     )
SELECT per_in.policy_id
,      GREATEST (per_in.start_date
                ,NVL(per_val.start_date
                    ,per_in.start_date
                    )
                )               start_date
,      LEAST    (per_in.end_date
                ,NVL(per_val.end_date
                    ,per_in.end_date
                    )
                )               end_date
FROM   aggregated_periods           per_in
LEFT OUTER JOIN   valid_periods     per_val
  ON   per_val.policy_id       = per_in.policy_id
  -- And valid_period should overlap in-period
  AND  (   per_val.start_date IS NULL
        OR per_val.start_date &lt;= per_in.end_date
       )
  AND  (   per_val.end_date  IS NULL
        OR per_val.end_date   &gt;= per_in.start_date
       )
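WHERE  per_in.in_or_out    = &#039;I&#039;
  -- no overlapping valid period is only fine when the policy has no &#039;out&#039; periods at all;
  -- otherwise this &#039;in&#039; period is completely blocked and must not be returned
AND    (   per_val.policy_id IS NOT NULL
        OR NOT EXISTS (SELECT NULL
                       FROM   aggregated_periods per_out
                       WHERE  per_out.policy_id = per_in.policy_id
                       AND    per_out.in_or_out = &#039;O&#039;
                      )
       )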
ORDER BY per_in.policy_id
,        start_date
;
&lt;/pre&gt;

&lt;pre&gt;
      POLICY_ID START_DAT END_DATE
--------------- --------- ---------
              1 10-OCT-10 14-OCT-10
              1 20-OCT-10 13-FEB-11
              1 20-NOV-11 31-JAN-12
              1 16-APR-12 31-MAY-12
              1 22-JUL-12 12-AUG-12
              1 31-OCT-12 01-JUN-13
              2 11-SEP-11 12-AUG-12
              3 01-JAN-12 01-NOV-12
              3 15-NOV-12 31-DEC-12

9 rows selected.
&lt;/pre&gt;]]></description>
		<content:encoded><![CDATA[<p>Lucas,</p>
<p>You said &#8220;I would be interested in learning about yours&#8221;<br />
Ok, here goes.</p>
<p>I too accepted the challenge, and tried to build a solution without first taking a sneak peek at yours.<br />
I hoped to be able to solve it, and end up with a similar solution.<br />
Like you said: &#8220;SQL challenges typically can be dealt with in a number of ways&#8221;.<br />
I ended up with a query that (in my very humble and irrelevant opinion) is somewhat simpler, and therefore perhaps easier to understand for a developer who didn&#8217;t build it, but is forced to do maintenance on it.<br />
I&#8217;d hereby like to offer you my solution.</p>
<p>I&#8217;m not trying to say my solution is better than yours.<br />
In fact, I can imagine that I overlooked something.<br />
Although my query produces the same end result, and although I did my best to implement the requirements rather than just reproduce the desired data, there might still be something wrong with it.<br />
If so, I&#8217;d like to hear about it.</p>
<p>I added some extra records (policy_id = 3) to your data for 2 test-cases that I thought should be in there.<br />
a) 3 periods that overlap: record 2 overlaps 1, 3 overlaps 2 but 3 does not overlap 1<br />
b) 2 periods that do not overlap, but do &#8216;connect&#8217;: period 1 ends at date X and period 2 starts at X+1<br />
(in fact your query sees the last case as 2 separate periods)</p>
<p>My added data:<br />
3 overlapping periods</p>
<pre class="wp-code-highlight prettyprint">
insert into policy_periods values (3, to_date(&#039;01-01-2012&#039;,&#039;DD-MM-YYYY&#039;), to_date(&#039;01-06-2012&#039;,&#039;DD-MM-YYYY&#039;), &#039;I&#039;);
insert into policy_periods values (3, to_date(&#039;01-03-2012&#039;,&#039;DD-MM-YYYY&#039;), to_date(&#039;01-09-2012&#039;,&#039;DD-MM-YYYY&#039;), &#039;I&#039;);
insert into policy_periods values (3, to_date(&#039;01-08-2012&#039;,&#039;DD-MM-YYYY&#039;), to_date(&#039;01-11-2012&#039;,&#039;DD-MM-YYYY&#039;), &#039;I&#039;);
</pre>
<p>2 connecting periods</p>
<pre class="wp-code-highlight prettyprint">
insert into policy_periods values (3, to_date(&#039;15-11-2012&#039;,&#039;DD-MM-YYYY&#039;), to_date(&#039;15-12-2012&#039;,&#039;DD-MM-YYYY&#039;), &#039;I&#039;);
insert into policy_periods values (3, to_date(&#039;16-12-2012&#039;,&#039;DD-MM-YYYY&#039;), to_date(&#039;31-12-2012&#039;,&#039;DD-MM-YYYY&#039;), &#039;I&#039;);
commit;
</pre>
<p>So, what does my query do?<br />
First, like you, I decided that within an ID and type (in or out) I need to fuse periods that overlap (or connect) together.<br />
So in the WITH clause there is a query &#8216;aggregated_periods&#8217;.<br />
Here, in the innermost query, I check for each record whether its period overlaps (or connects to) the periods before it (for this id and type).<br />
If it doesn&#8217;t, the start_date is selected. If it does, the start_date is selected as being NULL, because I don&#8217;t care about that start_date.<br />
Next, from the result-set I select the last_value of this start_date with the &#8216;ignore nulls&#8217; option. This way all records that &#8216;belong together&#8217; because they overlap/connect will get the same start_date: the first (and only not-null) one.<br />
Then from this result-set for every record I select the MAX(end_date) for the combination of id, type and start_date.<br />
Now all records that ‘belong together’ have the same start and end date, so a final DISTINCT completes my quest for aggregated periods.</p>
<p>Since the &#8216;out&#8217; records are dominant (they overrule everything), I need to somehow select the periods that are NOT blocked by any out-period.<br />
The query &#8216;valid_periods&#8217; determines these periods.<br />
For every &#8216;out&#8217; period in aggregated_periods it selects a period as:<br />
start_date = the day after the end_date of the previous out-period, NULL means &#8216;the beginning of time&#8217;<br />
end_date = the day before the start_date of this out-period, NULL means &#8216;the end of time&#8217;<br />
One extra period is added for the period starting the day after the last &#8216;out&#8217; end_date until the end of time.</p>
<p>Finally the main query takes the &#8216;in&#8217; periods from aggregated_periods.<br />
Valid_periods does not return records if there are no &#8216;out&#8217; periods, so the &#8216;in&#8217; periods are outer joined to the valid_periods that overlap them.<br />
If no records are returned by valid_periods, this causes a join to NULL values, meaning: &#8216;from the beginning until the end of time&#8217;.<br />
Then for every selected record the start_date becomes the greatest of the in- and valid-start_date, and the end_date becomes the least of the in- and valid-end_date.</p>
<pre class="wp-code-highlight prettyprint">
SELECT policy_id
,      in_or_out
,      start_date
,      end_date
FROM   policy_periods
ORDER BY policy_id
,        in_or_out
,        start_date
,        end_date
;
</pre>
<pre class="wp-code-highlight prettyprint">
      POLICY_ID I START_DAT END_DATE
--------------- - --------- ---------
              1 I 10-OCT-10 15-MAY-11
              1 I 20-NOV-11 04-JUN-12
              1 I 05-APR-12 12-AUG-12
              1 I 31-OCT-12 01-MAR-13
              1 I 31-JAN-13 01-JUN-13
              1 O 15-OCT-10 19-OCT-10
              1 O 14-FEB-11 10-SEP-11
              1 O 01-FEB-12 15-APR-12
              1 O 01-JUN-12 21-JUL-12
              2 I 20-NOV-10 24-JUN-11
              2 I 05-APR-11 12-AUG-12
              2 O 14-FEB-10 10-SEP-11
              3 I 01-JAN-12 01-JUN-12
              3 I 01-MAR-12 01-SEP-12
              3 I 01-AUG-12 01-NOV-12
              3 I 15-NOV-12 15-DEC-12
              3 I 16-DEC-12 31-DEC-12

17 rows selected.
</pre>
<pre class="wp-code-highlight prettyprint">
WITH aggregated_periods
AS   (SELECT DISTINCT
             policy_id
      ,      in_or_out
      ,      start_date
      ,      MAX(end_date) OVER (PARTITION BY policy_id,in_or_out,start_date)  end_date
      FROM   (SELECT policy_id
              ,      in_or_out
              ,      LAST_VALUE (start_date IGNORE NULLS) OVER (PARTITION BY policy_id,in_or_out ORDER BY start_date_org NULLS LAST)   start_date
              ,      end_date
              FROM   (SELECT policy_id
                      ,      in_or_out
                      ,      start_date          start_date_org
                      ,      CASE
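                               -- running MAX of all earlier end dates (instead of LAG) also catches a period inside an earlier, longer one
                               WHEN start_date -1 &lt;= MAX (end_date) OVER (PARTITION BY policy_id,in_or_out ORDER BY start_date,end_date ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                               THEN NULL         -- overlap with or connecting to an earlier period, so not interested in this start_date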
                               ELSE start_date
                             END                 start_date
                      ,      end_date
                      FROM   policy_periods
                     )
             )
     )
,    valid_periods
AS   (SELECT policy_id
      ,      LAG(end_date) OVER (PARTITION BY policy_id ORDER BY start_date) + 1    start_date
      ,      start_date -1                                                          end_date
      FROM   aggregated_periods
      WHERE  in_or_out        = &#039;O&#039;
      UNION
      SELECT policy_id
      ,      MAX(end_date) + 1     start_date
      ,      NULL                  end_date
      FROM   aggregated_periods
      WHERE  in_or_out        = &#039;O&#039;
      GROUP BY policy_id
     )
SELECT per_in.policy_id
,      GREATEST (per_in.start_date
                ,NVL(per_val.start_date
                    ,per_in.start_date
                    )
                )               start_date
,      LEAST    (per_in.end_date
                ,NVL(per_val.end_date
                    ,per_in.end_date
                    )
                )               end_date
FROM   aggregated_periods           per_in
LEFT OUTER JOIN   valid_periods     per_val
  ON   per_val.policy_id       = per_in.policy_id
  -- And valid_period should overlap in-period
  AND  (   per_val.start_date IS NULL
        OR per_val.start_date &lt;= per_in.end_date
       )
  AND  (   per_val.end_date  IS NULL
        OR per_val.end_date   &gt;= per_in.start_date
       )
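WHERE  per_in.in_or_out    = &#039;I&#039;
  -- no overlapping valid period is only fine when the policy has no &#039;out&#039; periods at all;
  -- otherwise this &#039;in&#039; period is completely blocked and must not be returned
AND    (   per_val.policy_id IS NOT NULL
        OR NOT EXISTS (SELECT NULL
                       FROM   aggregated_periods per_out
                       WHERE  per_out.policy_id = per_in.policy_id
                       AND    per_out.in_or_out = &#039;O&#039;
                      )
       )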
ORDER BY per_in.policy_id
,        start_date
;
</pre>
<pre class="wp-code-highlight prettyprint">
      POLICY_ID START_DAT END_DATE
--------------- --------- ---------
              1 10-OCT-10 14-OCT-10
              1 20-OCT-10 13-FEB-11
              1 20-NOV-11 31-JAN-12
              1 16-APR-12 31-MAY-12
              1 22-JUL-12 12-AUG-12
              1 31-OCT-12 01-JUN-13
              2 11-SEP-11 12-AUG-12
              3 01-JAN-12 01-NOV-12
              3 15-NOV-12 31-DEC-12

9 rows selected.
</pre>
]]></content:encoded>
	</item>
</channel>
</rss>
