SQL based Charts - Scatterplot for x,y coordinate data sets (growth curve, euro/dollar ratio) image78

SQL based Charts – Scatterplot for x,y coordinate data sets (growth curve, euro/dollar ratio)

Today, against all priorities, I have resumed an old hobby: the creation of charts with nothing but SQL to work with. Several years back I created pure SQL Hi-Lo Charts, Pie Charts, Gauge (Speedometer), Bar Chart and Gantt Chart (all these chart types and their underlying SQL are available on the AMIS Technology Blog). Today, I felt an urgent need for a scatterplot – a simple X-Y axis presentation of two variable data points. Examples are the Euro/Dollar conversion ratio against time, a growth curve (height vs. age), position vs. time, nerdiness vs. distance from Silicon Valley etc. Oracle SQL has many facilities – especially the WITH clause for subquery factoring – that make the creation of charts not as hard as you might think. Their usefulness is still pretty questionable though. imageTo be honest: the Scatterplot took a little more effort than I had anticipated. Maybe I am getting a little rusty in SQL (with all my Java and SOA activities and all). Anyways, one of the charts I will be creating in SQL looks like this: It presents the growth curve for a female subject between ages 9 and a half years and 18 years. In that period, the subject grew from just over 134 cm to about 170 cm. The chart has a Y-axis with some markers to indicate the value scale (for the Y-coordinate of the data points). The horizontal axis has similar markers to indicate the value scale for the X-values. imageBoth axes are configurable in the underlying SQL-query: their value range, the number of markers and the overall size of the chart. The marker used in the chart (* in this case) can also be set. The data set for this sample chart can be presented as a simple table: This table does not exist as a database table – instead, an inline view is created that injects the data in to the chart generator, as we will see in a moment. Two types were created to help adapt the data to the SQL query:

 

 

 

1
2
3
create type measurement_t as object (x number(5,2), y number(12,5));
 
create type measurement_tbl_t as table of measurement_t;

Using these types, the Sub Query is defined as follows:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
with age_height_measurements as
( select x, y
from table( measurement_tbl_t
( measurement_t(9.5, 137)
, measurement_t(10.0, 138)
, measurement_t(10.5, 139)
, measurement_t(11, 142)
, measurement_t(11.5, 144)
, measurement_t(12, 147)
, measurement_t(12.5, 150)
, measurement_t(13, 153)
, measurement_t(13.5, 155)
, measurement_t(14, 158)
, measurement_t(14.5, 161)
, measurement_t(15, 163)
, measurement_t(15.5, 165)
, measurement_t(16, 167)
, measurement_t(16.5, 168)
, measurement_t(17.0, 169)
, measurement_t(17.5, 169)
, measurement_t(18.0, 169)
)
)
)
, chart_query as -- the query that drives the chart; one bar per groupid, one sub-stack per category
( select x
, y
from age_height_measurements
)

Based on this sub query, the more generic chart-initialization subqueries chart_query (for the data), chart_dimensions (for the configuration of the chart, independent from the data) and limits (to specify the value ranges for the two axes) are defined:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
, chart_query as -- the query that drives the chart; one bar per groupid, one sub-stack per category
( select x
, y
from dollar_euro_measurements
)
, chart_dimensions as
( select 20 height
, 50 width -- the width of the chart, can be changed
, 10 x_margin -- positions to the left of the y-axis
, 5 y_marker_distance
, 20 x_marker_distance
, '*' marker
from dual
)
, limits as
( select min(y)*0.8 low_limit -- decide to use 0 as the baseline for the chartś y-axis or some other value
, max(y)*1.2 high_limit
, min(x) start_x
, max(x) end_x
from chart_query
)

The Y-axis with the marker values can be produced using two additional queries. One produces a record for each line in the chart and the other finds the lines that should have an Y-axis marker as well as the value associated with that marker:

1
2
3
4
5
6
7
8
9
10
11
, lines as
( select level line
from chart_dimensions
connect by level <= height
)
, y_markers as
( select line
, (height-line)/height * (high_limit - low_limit) + low_limit marker_value
from lines cross join chart_dimensions cross join limits
where mod(height - line, y_marker_distance) = 0
)

The X-axis and its markers is produced as two additional records, to be added (UNION ALL) to the final result of the query:

1
2
3
4
5
6
7
8
9
...
union all
select height+1 line, lpad(' ',x_margin)||axis
from x_axis_1 cross join chart_dimensions
where x = width
union all
select height + 2 line, lpad(' ',x_margin)||axis
from x_axis_2 cross join chart_dimensions
where x = width

imageThe lines produced by these two queries look like this:

image The subquery columns produces a single record for every column in the chart (determined by the width setting in chart_dimensions).

The x_axis subquery produces a record for each column and determines for the column if it represents a marker or not (resulting in a | or _ character) and if it is a marker, it also determines the value to be displayed for the marker. Subquery x_axis_1 is a recursive subquery that creates the axis itself by concatenating together all characters (_ and |) produced in the per-column-records produced by x_axis. Subquery x_axis_2 is similar, also using the recursive subquery mechanism. This query creates line with marker values in the right columns – matching the markers.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
, columns as
( select level x
from chart_dimensions
connect by level <= width
)
, x_axis as
( select x
, case mod(width - x, x_marker_distance)
when 0 then '|' else '_'
end character
, case
when mod(width-x , x_marker_distance)= 0 then substr( lpad(x/width * ( end_x- start_x) + start_x,4), 1,4)
when mod(width -x, x_marker_distance)< 4 then ''
else ' '
end marker_value
from columns cross join chart_dimensions cross join limits
)
, x_axis_1 (axis, x) as
( select character axis
, x
from x_axis
where x = 1
union all
select axis || character
, x_axis.x
from x_axis_1
join
x_axis
on (x_axis_1.x +1 = x_axis.x)
)
, x_axis_2 (axis, x) as
( select to_char(marker_value) axis
, x
from x_axis
where x = 1
union all
select axis || marker_value
, x_axis.x
from x_axis_2
join
x_axis
on (x_axis_2.x +1 = x_axis.x)
)

Subquery values_on_line finds for each line which values should be plotted on that line. These values to be plotted in the chart are subsequently gathered by the chart_lines sub query. This query returns records for all lines in the chart using a recursive subquery that joins with values_on_line. This query works its way through all columns for all lines and for each column determines whether it should contain a marker or a space character.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
, values_on_line as
(
select line
, c.x
, round((c.x - start_x)/(end_x - start_x) * width) col
from lines l
cross join limits
cross join chart_dimensions
left outer join
chart_query c
on (c.y between (height-line-0.5)/height * (high_limit - low_limit) + low_limit
and
(height-line+0.5)/height * (high_limit - low_limit) + low_limit
)
)
, chart_lines (line, chart_line, col) as
( select line
, '|' axis
, 0
from lines
union all
select c.line
, chart_line|| case when v.line is null then ' ' else marker end
, c.col+1
from chart_lines c
left outer join
values_on_line v
on (v.line = c.line and c.col+1 = v.col)
cross join chart_dimensions
where c.col < width
)

Finally the main query sweeps all results together – combining the y_markers with the chart_lines and adding the x_axis_1 and x_axis_2:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
select line
, substr(lpad(nvl(to_char(round(marker_value,1)),' '),25),26-x_margin,x_margin) ||chart_line y
from lines
left outer join
y_markers
using (line)
join
chart_lines
using (line)
cross join chart_dimensions
where chart_lines.col = width
union all
select height+1 line, lpad(' ',x_margin)||axis
from x_axis_1 cross join chart_dimensions
where x = width
union all
select height + 2 line, lpad(' ',x_margin)||axis
from x_axis_2 cross join chart_dimensions
where x = width
order
by line

The complete query is now:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
with age_height_measurements as
( select x, y
from table( measurement_tbl_t
( measurement_t(9.5, 137)
, measurement_t(10.0, 138)
, measurement_t(10.5, 139)
, measurement_t(11, 142)
, measurement_t(11.5, 144)
, measurement_t(12, 147)
, measurement_t(12.5, 150)
, measurement_t(13, 153)
, measurement_t(13.5, 155)
, measurement_t(14, 158)
, measurement_t(14.5, 161)
, measurement_t(15, 163)
, measurement_t(15.5, 165)
, measurement_t(16, 167)
, measurement_t(16.5, 168)
, measurement_t(17.0, 169)
, measurement_t(17.5, 169)
, measurement_t(18.0, 169)
)
)
)
, chart_query as -- the query that drives the chart; one bar per groupid, one sub-stack per category
( select x
, y
from age_height_measurements
), chart_dimensions as
( select 20 height
, 50 width -- the width of the chart, can be changed
, 10 x_margin -- positions to the left of the y-axis
, 5 y_marker_distance
, 20 x_marker_distance
, '*' marker
from dual
)
, limits as
( select min(y)*0.8 low_limit -- decide to use 0 as the baseline for the chartś y-axis or some other value
, max(y)*1.2 high_limit
, min(x) start_x
, max(x) end_x
from chart_query
)
, lines as
( select level line
from chart_dimensions
connect by level <= height
)
, y_markers as
( select line
, (height-line)/height * (high_limit - low_limit) + low_limit marker_value
from lines cross join chart_dimensions cross join limits
where mod(height - line, y_marker_distance) = 0
)
, columns as
( select level x
from chart_dimensions
connect by level <= width
)
, x_axis as
( select x
, case mod(width - x, x_marker_distance)
when 0 then '|' else '_'
end character
, case
when mod(width-x , x_marker_distance)= 0 then substr( lpad(x/width * ( end_x- start_x) + start_x,4), 1,4)
when mod(width -x, x_marker_distance)< 4 then ''
else ' '
end marker_value
from columns cross join chart_dimensions cross join limits
)
, x_axis_1 (axis, x) as
( select character axis
, x
from x_axis
where x = 1
union all
select axis || character
, x_axis.x
from x_axis_1
join
x_axis
on (x_axis_1.x +1 = x_axis.x)
)
, x_axis_2 (axis, x) as
( select to_char(marker_value) axis
, x
from x_axis
where x = 1
union all
select axis || marker_value
, x_axis.x
from x_axis_2
join
x_axis
on (x_axis_2.x +1 = x_axis.x)
)
, values_on_line as
(
select line
, c.x
, round((c.x - start_x)/(end_x - start_x) * width) col
from lines l
cross join limits
cross join chart_dimensions
left outer join
chart_query c
on (c.y between (height-line-0.5)/height * (high_limit - low_limit) + low_limit
and
(height-line+0.5)/height * (high_limit - low_limit) + low_limit
)
)
, chart_lines (line, chart_line, col) as
( select line
, '|' axis
, 0
from lines
union all
select c.line
, chart_line|| case when v.line is null then ' ' else marker end
, c.col+1
from chart_lines c
left outer join
values_on_line v
on (v.line = c.line and c.col+1 = v.col)
cross join chart_dimensions
where c.col < width
)
select line
, substr(lpad(nvl(to_char(round(marker_value,1)),' '),25),26-x_margin,x_margin) ||chart_line y
from lines
left outer join
y_markers
using (line)
join
chart_lines
using (line)
cross join chart_dimensions
where chart_lines.col = width
union all
select height+1 line, lpad(' ',x_margin)||axis
from x_axis_1 cross join chart_dimensions
where x = width
union all
select height + 2 line, lpad(' ',x_margin)||axis
from x_axis_2 cross join chart_dimensions
where x = width
order
by line

This chart displays the Dollar/Euro conversion ratio for the last three years image Using this query combined with the data for the Dollar/Euro conversion ratio for the last three years, it is easy to create another scatterplot:
image The start of the the query producing this chart is like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
with dollar_euro_measurements as
( select x, y
from table( measurement_tbl_t
( measurement_t(10.08, 1.43)
, measurement_t(10.16, 1.37)
, measurement_t(10.25, 1.136)
, measurement_t(10.5, 1.28)
, measurement_t(10.75, 1.39)
, measurement_t(11.0, 1.34)
, measurement_t(11.25, 1.45)
, measurement_t(11.5, 1.43)
, measurement_t(11.75, 1.37)
, measurement_t(12.0, 1.29)
, measurement_t(12.25, 1.33)
, measurement_t(12.5, 1.236)
, measurement_t(12.75, 1.30)
, measurement_t(13.0, 1.33)
, measurement_t(13.25, 1.30)
, measurement_t(13.5, 1.31)
, measurement_t(13.75, 1.36)
, measurement_t(14.0, 1.36)
, measurement_t(14.25, 1.38)
, measurement_t(14.375, 1.37)
)
)
)
, chart_query as -- the query that drives the chart; one bar per groupid, one sub-stack per category
( select x
, y
from dollar_euro_measurements
)
, chart_dimensions as
( select 20 height
,...