SQL based Charts - Scatterplot for x,y coordinate data sets (growth curve, euro/dollar ratio) image78

SQL based Charts – Scatterplot for x,y coordinate data sets (growth curve, euro/dollar ratio)

Today, against all priorities, I have resumed an old hobby: the creation of charts with nothing but SQL to work with. Several years back I created pure SQL Hi-Lo Charts, Pie Charts, Gauge (Speedometer), Bar Chart and Gantt Chart (all these chart types and their underlying SQL are available on the AMIS Technology Blog). Today, I felt an urgent need for a scatterplot – a simple X-Y axis presentation of two variable data points. Examples are the Euro/Dollar conversion ratio against time, a growth curve (height vs. age), position vs. time, nerdiness vs. distance from Silicon Valley etc. Oracle SQL has many facilities – especially the WITH clause for subquery factoring – that make the creation of charts not as hard as you might think. Their usefulness is still pretty questionable though. imageTo be honest: the Scatterplot took a little more effort than I had anticipated. Maybe I am getting a little rusty in SQL (with all my Java and SOA activities and all). Anyways, one of the charts I will be creating in SQL looks like this: It presents the growth curve for a female subject between ages 9 and a half years and 18 years. In that period, the subject grew from just over 134 cm to about 170 cm. The chart has a Y-axis with some markers to indicate the value scale (for the Y-coordinate of the data points). The horizontal axis has similar markers to indicate the value scale for the X-values. imageBoth axes are configurable in the underlying SQL-query: their value range, the number of markers and the overall size of the chart. The marker used in the chart (* in this case) can also be set. The data set for this sample chart can be presented as a simple table: This table does not exist as a database table – instead, an inline view is created that injects the data in to the chart generator, as we will see in a moment. Two types were created to help adapt the data to the SQL query:

 

 

 

create type measurement_t as object (x number(5,2), y number(12,5));

create type measurement_tbl_t as table of measurement_t;

Using these types, the Sub Query is defined as follows:

with age_height_measurements as
( select x, y
from table( measurement_tbl_t
( measurement_t(9.5, 137)
, measurement_t(10.0, 138)
, measurement_t(10.5, 139)
, measurement_t(11, 142)
, measurement_t(11.5, 144)
, measurement_t(12, 147)
, measurement_t(12.5, 150)
, measurement_t(13, 153)
, measurement_t(13.5, 155)
, measurement_t(14, 158)
, measurement_t(14.5, 161)
, measurement_t(15, 163)
, measurement_t(15.5, 165)
, measurement_t(16, 167)
, measurement_t(16.5, 168)
, measurement_t(17.0, 169)
, measurement_t(17.5, 169)
, measurement_t(18.0, 169)
)
)
)
, chart_query as -- the query that drives the chart; one bar per groupid, one sub-stack per category
( select x
, y
from age_height_measurements
)

Based on this sub query, the more generic chart-initialization subqueries chart_query (for the data), chart_dimensions (for the configuration of the chart, independent from the data) and limits (to specify the value ranges for the two axes) are defined:

, chart_query as -- the query that drives the chart; one bar per groupid, one sub-stack per category
( select x
, y
from dollar_euro_measurements
)
, chart_dimensions as
( select 20 height
, 50 width -- the width of the chart, can be changed
, 10 x_margin -- positions to the left of the y-axis
, 5 y_marker_distance
, 20 x_marker_distance
, '*' marker
from dual
)
, limits as
( select min(y)*0.8 low_limit -- decide to use 0 as the baseline for the chartÅ› y-axis or some other value
, max(y)*1.2 high_limit
, min(x) start_x
, max(x) end_x
from chart_query
)

The Y-axis with the marker values can be produced using two additional queries. One produces a record for each line in the chart and the other finds the lines that should have an Y-axis marker as well as the value associated with that marker:

, lines as
( select level line
from chart_dimensions
connect by level <= height
)
, y_markers as
( select line
, (height-line)/height * (high_limit - low_limit) + low_limit marker_value
from lines cross join chart_dimensions cross join limits
where mod(height - line, y_marker_distance) = 0
)

The X-axis and its markers is produced as two additional records, to be added (UNION ALL) to the final result of the query:

...
union all
select height+1 line, lpad(' ',x_margin)||axis
from x_axis_1 cross join chart_dimensions
where x = width
union all
select height + 2 line, lpad(' ',x_margin)||axis
from x_axis_2 cross join chart_dimensions
where x = width

imageThe lines produced by these two queries look like this:

image The subquery columns produces a single record for every column in the chart (determined by the width setting in chart_dimensions).

The x_axis subquery produces a record for each column and determines for the column if it represents a marker or not (resulting in a | or _ character) and if it is a marker, it also determines the value to be displayed for the marker. Subquery x_axis_1 is a recursive subquery that creates the axis itself by concatenating together all characters (_ and |) produced in the per-column-records produced by x_axis. Subquery x_axis_2 is similar, also using the recursive subquery mechanism. This query creates line with marker values in the right columns – matching the markers.

, columns as
( select level x
from chart_dimensions
connect by level <= width
)
, x_axis as
( select x
, case mod(width - x, x_marker_distance)
when 0 then '|' else '_'
end character
, case
when mod(width-x , x_marker_distance)= 0 then substr( lpad(x/width * ( end_x- start_x) + start_x,4), 1,4)
when mod(width -x, x_marker_distance)< 4 then ''
else ' '
end marker_value
from columns cross join chart_dimensions cross join limits
)
, x_axis_1 (axis, x) as
( select character axis
, x
from x_axis
where x = 1
union all
select axis || character
, x_axis.x
from x_axis_1
join
x_axis
on (x_axis_1.x +1 = x_axis.x)
)
, x_axis_2 (axis, x) as
( select to_char(marker_value) axis
, x
from x_axis
where x = 1
union all
select axis || marker_value
, x_axis.x
from x_axis_2
join
x_axis
on (x_axis_2.x +1 = x_axis.x)
)

Subquery values_on_line finds for each line which values should be plotted on that line. These values to be plotted in the chart are subsequently gathered by the chart_lines sub query. This query returns records for all lines in the chart using a recursive subquery that joins with values_on_line. This query works its way through all columns for all lines and for each column determines whether it should contain a marker or a space character.

, values_on_line as
(
select line
, c.x
, round((c.x - start_x)/(end_x - start_x) * width) col
from lines l
cross join limits
cross join chart_dimensions
left outer join
chart_query c
on (c.y between (height-line-0.5)/height * (high_limit - low_limit) + low_limit
and
(height-line+0.5)/height * (high_limit - low_limit) + low_limit
)
)
, chart_lines (line, chart_line, col) as
( select line
, '|' axis
, 0
from lines
union all
select c.line
, chart_line|| case when v.line is null then ' ' else marker end
, c.col+1
from chart_lines c
left outer join
values_on_line v
on (v.line = c.line and c.col+1 = v.col)
cross join chart_dimensions
where c.col < width
)

Finally the main query sweeps all results together – combining the y_markers with the chart_lines and adding the x_axis_1 and x_axis_2:

select line
, substr(lpad(nvl(to_char(round(marker_value,1)),' '),25),26-x_margin,x_margin) ||chart_line y
from lines
left outer join
y_markers
using (line)
join
chart_lines
using (line)
cross join chart_dimensions
where chart_lines.col = width
union all
select height+1 line, lpad(' ',x_margin)||axis
from x_axis_1 cross join chart_dimensions
where x = width
union all
select height + 2 line, lpad(' ',x_margin)||axis
from x_axis_2 cross join chart_dimensions
where x = width
order
by line

The complete query is now:

with age_height_measurements as
( select x, y
from table( measurement_tbl_t
( measurement_t(9.5, 137)
, measurement_t(10.0, 138)
, measurement_t(10.5, 139)
, measurement_t(11, 142)
, measurement_t(11.5, 144)
, measurement_t(12, 147)
, measurement_t(12.5, 150)
, measurement_t(13, 153)
, measurement_t(13.5, 155)
, measurement_t(14, 158)
, measurement_t(14.5, 161)
, measurement_t(15, 163)
, measurement_t(15.5, 165)
, measurement_t(16, 167)
, measurement_t(16.5, 168)
, measurement_t(17.0, 169)
, measurement_t(17.5, 169)
, measurement_t(18.0, 169)
)
)
)
, chart_query as -- the query that drives the chart; one bar per groupid, one sub-stack per category
( select x
, y
from age_height_measurements
), chart_dimensions as
( select 20 height
, 50 width -- the width of the chart, can be changed
, 10 x_margin -- positions to the left of the y-axis
, 5 y_marker_distance
, 20 x_marker_distance
, '*' marker
from dual
)
, limits as
( select min(y)*0.8 low_limit -- decide to use 0 as the baseline for the chartÅ› y-axis or some other value
, max(y)*1.2 high_limit
, min(x) start_x
, max(x) end_x
from chart_query
)
, lines as
( select level line
from chart_dimensions
connect by level <= height
)
, y_markers as
( select line
, (height-line)/height * (high_limit - low_limit) + low_limit marker_value
from lines cross join chart_dimensions cross join limits
where mod(height - line, y_marker_distance) = 0
)
, columns as
( select level x
from chart_dimensions
connect by level <= width
)
, x_axis as
( select x
, case mod(width - x, x_marker_distance)
when 0 then '|' else '_'
end character
, case
when mod(width-x , x_marker_distance)= 0 then substr( lpad(x/width * ( end_x- start_x) + start_x,4), 1,4)
when mod(width -x, x_marker_distance)< 4 then ''
else ' '
end marker_value
from columns cross join chart_dimensions cross join limits
)
, x_axis_1 (axis, x) as
( select character axis
, x
from x_axis
where x = 1
union all
select axis || character
, x_axis.x
from x_axis_1
join
x_axis
on (x_axis_1.x +1 = x_axis.x)
)
, x_axis_2 (axis, x) as
( select to_char(marker_value) axis
, x
from x_axis
where x = 1
union all
select axis || marker_value
, x_axis.x
from x_axis_2
join
x_axis
on (x_axis_2.x +1 = x_axis.x)
)
, values_on_line as
(
select line
, c.x
, round((c.x - start_x)/(end_x - start_x) * width) col
from lines l
cross join limits
cross join chart_dimensions
left outer join
chart_query c
on (c.y between (height-line-0.5)/height * (high_limit - low_limit) + low_limit
and
(height-line+0.5)/height * (high_limit - low_limit) + low_limit
)
)
, chart_lines (line, chart_line, col) as
( select line
, '|' axis
, 0
from lines
union all
select c.line
, chart_line|| case when v.line is null then ' ' else marker end
, c.col+1
from chart_lines c
left outer join
values_on_line v
on (v.line = c.line and c.col+1 = v.col)
cross join chart_dimensions
where c.col < width
)
select line
, substr(lpad(nvl(to_char(round(marker_value,1)),' '),25),26-x_margin,x_margin) ||chart_line y
from lines
left outer join
y_markers
using (line)
join
chart_lines
using (line)
cross join chart_dimensions
where chart_lines.col = width
union all
select height+1 line, lpad(' ',x_margin)||axis
from x_axis_1 cross join chart_dimensions
where x = width
union all
select height + 2 line, lpad(' ',x_margin)||axis
from x_axis_2 cross join chart_dimensions
where x = width
order
by line

This chart displays the Dollar/Euro conversion ratio for the last three years image Using this query combined with the data for the Dollar/Euro conversion ratio for the last three years, it is easy to create another scatterplot:
image The start of the the query producing this chart is like this:

with dollar_euro_measurements as
( select x, y
from table( measurement_tbl_t
( measurement_t(10.08, 1.43)
, measurement_t(10.16, 1.37)
, measurement_t(10.25, 1.136)
, measurement_t(10.5, 1.28)
, measurement_t(10.75, 1.39)
, measurement_t(11.0, 1.34)
, measurement_t(11.25, 1.45)
, measurement_t(11.5, 1.43)
, measurement_t(11.75, 1.37)
, measurement_t(12.0, 1.29)
, measurement_t(12.25, 1.33)
, measurement_t(12.5, 1.236)
, measurement_t(12.75, 1.30)
, measurement_t(13.0, 1.33)
, measurement_t(13.25, 1.30)
, measurement_t(13.5, 1.31)
, measurement_t(13.75, 1.36)
, measurement_t(14.0, 1.36)
, measurement_t(14.25, 1.38)
, measurement_t(14.375, 1.37)
)
)
)
, chart_query as -- the query that drives the chart; one bar per groupid, one sub-stack per category
( select x
, y
from dollar_euro_measurements
)
, chart_dimensions as
( select 20 height
,...