Today, against all priorities, I have resumed an old hobby: the creation of charts with nothing but SQL to work with. Several years back I created pure SQL Hi-Lo Charts, Pie Charts, Gauge (Speedometer), Bar Chart and Gantt Chart (all these chart types and their underlying SQL are available on the AMIS Technology Blog). Today, I felt an urgent need for a scatterplot – a simple X-Y axis presentation of two variable data points. Examples are the Euro/Dollar conversion ratio against time, a growth curve (height vs. age), position vs. time, nerdiness vs. distance from Silicon Valley etc. Oracle SQL has many facilities – especially the WITH clause for subquery factoring – that make the creation of charts not as hard as you might think. Their usefulness is still pretty questionable though. To be honest: the Scatterplot took a little more effort than I had anticipated. Maybe I am getting a little rusty in SQL (with all my Java and SOA activities and all). Anyways, one of the charts I will be creating in SQL looks like this: It presents the growth curve for a female subject between ages 9 and a half years and 18 years. In that period, the subject grew from just over 134 cm to about 170 cm. The chart has a Y-axis with some markers to indicate the value scale (for the Y-coordinate of the data points). The horizontal axis has similar markers to indicate the value scale for the X-values.
Both axes are configurable in the underlying SQL-query: their value range, the number of markers and the overall size of the chart. The marker used in the chart (* in this case) can also be set. The data set for this sample chart can be presented as a simple table: This table does not exist as a database table – instead, an inline view is created that injects the data in to the chart generator, as we will see in a moment. Two types were created to help adapt the data to the SQL query:
1 2 3 | create type measurement_t as object (x number(5,2), y number(12,5)); create type measurement_tbl_t as table of measurement_t; |
Using these types, the Sub Query is defined as follows:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | with age_height_measurements as ( select x, y from table ( measurement_tbl_t ( measurement_t(9.5, 137) , measurement_t(10.0, 138) , measurement_t(10.5, 139) , measurement_t(11, 142) , measurement_t(11.5, 144) , measurement_t(12, 147) , measurement_t(12.5, 150) , measurement_t(13, 153) , measurement_t(13.5, 155) , measurement_t(14, 158) , measurement_t(14.5, 161) , measurement_t(15, 163) , measurement_t(15.5, 165) , measurement_t(16, 167) , measurement_t(16.5, 168) , measurement_t(17.0, 169) , measurement_t(17.5, 169) , measurement_t(18.0, 169) ) ) ) , chart_query as -- the query that drives the chart; one bar per groupid, one sub-stack per category ( select x , y from age_height_measurements ) |
Based on this sub query, the more generic chart-initialization subqueries chart_query (for the data), chart_dimensions (for the configuration of the chart, independent from the data) and limits (to specify the value ranges for the two axes) are defined:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | , chart_query as -- the query that drives the chart; one bar per groupid, one sub-stack per category ( select x , y from dollar_euro_measurements ) , chart_dimensions as ( select 20 height , 50 width -- the width of the chart, can be changed , 10 x_margin -- positions to the left of the y-axis , 5 y_marker_distance , 20 x_marker_distance , '*' marker from dual ) , limits as ( select min (y)*0.8 low_limit -- decide to use 0 as the baseline for the chartś y-axis or some other value , max (y)*1.2 high_limit , min (x) start_x , max (x) end_x from chart_query ) |
The Y-axis with the marker values can be produced using two additional queries. One produces a record for each line in the chart and the other finds the lines that should have an Y-axis marker as well as the value associated with that marker:
1 2 3 4 5 6 7 8 9 10 11 | , lines as ( select level line from chart_dimensions connect by level <= height ) , y_markers as ( select line , (height-line)/height * (high_limit - low_limit) + low_limit marker_value from lines cross join chart_dimensions cross join limits where mod(height - line, y_marker_distance) = 0 ) |
The X-axis and its markers is produced as two additional records, to be added (UNION ALL) to the final result of the query:
1 2 3 4 5 6 7 8 9 | ... union all select height+1 line, lpad( ' ' ,x_margin)||axis from x_axis_1 cross join chart_dimensions where x = width union all select height + 2 line, lpad( ' ' ,x_margin)||axis from x_axis_2 cross join chart_dimensions where x = width |
The lines produced by these two queries look like this:
The subquery columns produces a single record for every column in the chart (determined by the width setting in chart_dimensions).
The x_axis subquery produces a record for each column and determines for the column if it represents a marker or not (resulting in a | or _ character) and if it is a marker, it also determines the value to be displayed for the marker. Subquery x_axis_1 is a recursive subquery that creates the axis itself by concatenating together all characters (_ and |) produced in the per-column-records produced by x_axis. Subquery x_axis_2 is similar, also using the recursive subquery mechanism. This query creates line with marker values in the right columns – matching the markers.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | , columns as ( select level x from chart_dimensions connect by level <= width ) , x_axis as ( select x , case mod(width - x, x_marker_distance) when 0 then '|' else '_' end character , case when mod(width-x , x_marker_distance)= 0 then substr( lpad(x/width * ( end_x- start_x) + start_x,4), 1,4) when mod(width -x, x_marker_distance)< 4 then '' else ' ' end marker_value from columns cross join chart_dimensions cross join limits ) , x_axis_1 (axis, x) as ( select character axis , x from x_axis where x = 1 union all select axis || character , x_axis.x from x_axis_1 join x_axis on (x_axis_1.x +1 = x_axis.x) ) , x_axis_2 (axis, x) as ( select to_char(marker_value) axis , x from x_axis where x = 1 union all select axis || marker_value , x_axis.x from x_axis_2 join x_axis on (x_axis_2.x +1 = x_axis.x) ) |
Subquery values_on_line finds for each line which values should be plotted on that line. These values to be plotted in the chart are subsequently gathered by the chart_lines sub query. This query returns records for all lines in the chart using a recursive subquery that joins with values_on_line. This query works its way through all columns for all lines and for each column determines whether it should contain a marker or a space character.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | , values_on_line as ( select line , c.x , round((c.x - start_x)/(end_x - start_x) * width) col from lines l cross join limits cross join chart_dimensions left outer join chart_query c on (c.y between (height-line-0.5)/height * (high_limit - low_limit) + low_limit and (height-line+0.5)/height * (high_limit - low_limit) + low_limit ) ) , chart_lines (line, chart_line, col) as ( select line , '|' axis , 0 from lines union all select c.line , chart_line|| case when v.line is null then ' ' else marker end , c.col+1 from chart_lines c left outer join values_on_line v on (v.line = c.line and c.col+1 = v.col) cross join chart_dimensions where c.col < width ) |
Finally the main query sweeps all results together – combining the y_markers with the chart_lines and adding the x_axis_1 and x_axis_2:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | select line , substr(lpad(nvl(to_char(round(marker_value,1)), ' ' ),25),26-x_margin,x_margin) ||chart_line y from lines left outer join y_markers using (line) join chart_lines using (line) cross join chart_dimensions where chart_lines.col = width union all select height+1 line, lpad( ' ' ,x_margin)||axis from x_axis_1 cross join chart_dimensions where x = width union all select height + 2 line, lpad( ' ' ,x_margin)||axis from x_axis_2 cross join chart_dimensions where x = width order by line |
The complete query is now:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 | with age_height_measurements as ( select x, y from table ( measurement_tbl_t ( measurement_t(9.5, 137) , measurement_t(10.0, 138) , measurement_t(10.5, 139) , measurement_t(11, 142) , measurement_t(11.5, 144) , measurement_t(12, 147) , measurement_t(12.5, 150) , measurement_t(13, 153) , measurement_t(13.5, 155) , measurement_t(14, 158) , measurement_t(14.5, 161) , measurement_t(15, 163) , measurement_t(15.5, 165) , measurement_t(16, 167) , measurement_t(16.5, 168) , measurement_t(17.0, 169) , measurement_t(17.5, 169) , measurement_t(18.0, 169) ) ) ) , chart_query as -- the query that drives the chart; one bar per groupid, one sub-stack per category ( select x , y from age_height_measurements ), chart_dimensions as ( select 20 height , 50 width -- the width of the chart, can be changed , 10 x_margin -- positions to the left of the y-axis , 5 y_marker_distance , 20 x_marker_distance , '*' marker from dual ) , limits as ( select min (y)*0.8 low_limit -- decide to use 0 as the baseline for the chartś y-axis or some other value , max (y)*1.2 high_limit , min (x) start_x , max (x) end_x from chart_query ) , lines as ( select level line from chart_dimensions connect by level <= height ) , y_markers as ( select line , (height-line)/height * (high_limit - low_limit) + low_limit marker_value from lines cross join chart_dimensions cross join limits where mod(height - line, y_marker_distance) = 0 ) , columns as ( select level x from chart_dimensions connect by level <= width ) , x_axis as ( select x , case mod(width - x, x_marker_distance) when 0 then '|' else '_' end character , case when mod(width-x , x_marker_distance)= 0 then substr( lpad(x/width * ( end_x- start_x) + start_x,4), 1,4) when mod(width -x, x_marker_distance)< 4 then '' else ' ' end marker_value from columns cross join chart_dimensions cross join limits ) , x_axis_1 (axis, x) as ( select character axis , x from x_axis where x = 1 union all select axis || character , x_axis.x from x_axis_1 join x_axis on (x_axis_1.x +1 = x_axis.x) ) , x_axis_2 (axis, x) as ( select to_char(marker_value) axis , x from x_axis where x = 1 union all select axis || marker_value , x_axis.x from x_axis_2 join x_axis on (x_axis_2.x +1 = x_axis.x) ) , values_on_line as ( select line , c.x , round((c.x - start_x)/(end_x - start_x) * width) col from lines l cross join limits cross join chart_dimensions left outer join chart_query c on (c.y between (height-line-0.5)/height * (high_limit - low_limit) + low_limit and (height-line+0.5)/height * (high_limit - low_limit) + low_limit ) ) , chart_lines (line, chart_line, col) as ( select line , '|' axis , 0 from lines union all select c.line , chart_line|| case when v.line is null then ' ' else marker end , c.col+1 from chart_lines c left outer join values_on_line v on (v.line = c.line and c.col+1 = v.col) cross join chart_dimensions where c.col < width ) select line , substr(lpad(nvl(to_char(round(marker_value,1)), ' ' ),25),26-x_margin,x_margin) ||chart_line y from lines left outer join y_markers using (line) join chart_lines using (line) cross join chart_dimensions where chart_lines.col = width union all select height+1 line, lpad( ' ' ,x_margin)||axis from x_axis_1 cross join chart_dimensions where x = width union all select height + 2 line, lpad( ' ' ,x_margin)||axis from x_axis_2 cross join chart_dimensions where x = width order by line |
This chart displays the Dollar/Euro conversion ratio for the last three years Using this query combined with the data for the Dollar/Euro conversion ratio for the last three years, it is easy to create another scatterplot:
The start of the the query producing this chart is like this:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 | with dollar_euro_measurements as ( select x, y from table ( measurement_tbl_t ( measurement_t(10.08, 1.43) , measurement_t(10.16, 1.37) , measurement_t(10.25, 1.136) , measurement_t(10.5, 1.28) , measurement_t(10.75, 1.39) , measurement_t(11.0, 1.34) , measurement_t(11.25, 1.45) , measurement_t(11.5, 1.43) , measurement_t(11.75, 1.37) , measurement_t(12.0, 1.29) , measurement_t(12.25, 1.33) , measurement_t(12.5, 1.236) , measurement_t(12.75, 1.30) , measurement_t(13.0, 1.33) , measurement_t(13.25, 1.30) , measurement_t(13.5, 1.31) , measurement_t(13.75, 1.36) , measurement_t(14.0, 1.36) , measurement_t(14.25, 1.38) , measurement_t(14.375, 1.37) ) ) ) , chart_query as -- the query that drives the chart; one bar per groupid, one sub-stack per category ( select x , y from dollar_euro_measurements ) , chart_dimensions as ( select 20 height ,... |