Building a Route Planner with Oracle SQL – Finding the quickest route in a graph

Routeplanners have always been somewhat magical in my eyes. Finding the quickest route from A to B in a graph of thousands of points and connections. I have tried to imagine how to approach such as problem. I have even come up with several solutions that I think might help to implement a performant system for unveiling the fasted itinerary between two points in the graph.

The other day, while preparing for the Quiz on SQL and PL/SQL that Alex and I will do during ODTUG 2006 in two weeks time, I got hooked again on this same challenge. This time, the challenge translated to: write a SQL query that will find the quickest path between two points.

In this article I will describe what solution I came up with this time, using the Oracle Hierarchical Query and some of its more recent operators. I will demonstrate a query that will give us the quickest route between any two points in the graph shown below:
Building a Route Planner with Oracle SQL - Finding the quickest route in a graph routeplan1

....
 

The starting situation is the graph depicted above. This is the graphical representation of the points and connections. The same information is held in two tables: POINTS and CONNECTIONS.

Their create scripts are like this:

create table points
( id    number(5)
, name  varchar2(30)
)
/

create table connections
( from_id number(5)
, to_id   number(5)
, distance number(5)
)
/

They contain the data for the graph shown above, with five points and a total of six connections. Their contents are shown below:
Building a Route Planner with Oracle SQL - Finding the quickest route in a graph routeplan2The question I would like to answer with a SQL query is: what is the shortest route – in terms of distance – from A to D. Of course with this simple graph we can easily tell ourselves by merely glancing at the graph. It is from A to C to D, a total distance of 7. But can a query give me the same answer?

I create a helper-view to simplify the query we will write later on. This view joins together points (source), connections and points (destination), so that it will be easier to write the hierarchical query. Note: we could also have used an in-line view with the same query:

create or replace view point_connections
as
select from_point.name from_point
,      to_point.name   to_point
,      conn.distance  distance
from   points from_point
,      connections conn
,      points to_point
where  conn.from_id = from_point.id
and    conn.to_id = to_point.id
/

This view contains all connections. Note: I have assumed a directed graph where a connection from A to B does not necessarily mean there is also a connection from B to A. It would be simple to include those reverse connections as well, for example by extending the view definition with a UNION ALL and the same query with the from_id and to_id switched in the WHERE clause.

The query I intend to write will treat the graph as a tree. It will perform a CONNECT BY query, starting at Point A and using the point_connection to connect by prior to_point = from_point:

select level number_of_transfers
,      sys_connect_by_path(distance,'+') route_cost
,      CONNECT_BY_ROOT from_point||sys_connect_by_path(to_point,'/') route
from   point_connections pc
connect
by     nocycle prior to_point = from_point
start
with   from_point = 'A'  -- starting point of the trip
/
 

This query will return all routes starting at A , including the ones that do not end at D. Note that NOCYCLE was added to prevent routes from circling endlessly through the graph. Suppose that A connects to C connects to B connects back to A – a very real possibility. In that case, without the NOCYCLE, we would have blown up the query. Since in this query all routes have to start at A, the expression CONNECT_BY_ROOT from_point is a little overdone, as it will always return 'A'. However, to make the query as generic as possible, I have used CONNECT_BY_ROOT from_point to return for each record the value of from_point in the first step of the route.

For each route it will return an indication of the total distance, through the route_cost column that contains an expression seemingly adding together the distances for all connections travelled in the returned route:

NUMBER_OF_TRANSFERS ROUTE_COST      ROUTE
------------------- --------------- ---------
                  1 +3              A/B
                  2 +3+5            A/B/D
                  2 +3+2            A/B/E
                  3 +3+2+1          A/B/E/D
                  1 +6              A/C
                  2 +6+1            A/C/D
                  2 +6+2            A/C/E
                  3 +6+2+1          A/C/E/D

8 rows selected.

We need to filter the above query result on only the routes that end at D. Subsequently, we need to order them by route_cost, returning the shortest route first. The latter is achieved using a simple PL/SQL function that calculates the distance based on the route_cose expression, simply by performing a little piece of Dynamic SQL. Note: this trick of turning the value from a SYS_CONNNECT_BY_PATH expression into a calculated value is described in Mastering Oracle PL/SQL: Practical Solutions by Connor McDonald, Chaim Katz, Christopher Beck, Joel R. Kallman, David C. Knox.

The PL/SQL function used for calculating the ROUTE_COST is like this:

create or replace
function  evaluate_string
( p_string in varchar2
) return number
is
  l_result number;
begin
  execute immediate
    'select '||p_string||' from dual'
    into l_result;
  return l_result;
end evaluate_string;
/

The p_string that is passed into this function is assumed to contain a valid SQL expression. This expression is pasted into a simple query against dual. The result is returned. Usage for example:

select evaluate_string('5*4')
from   dual
/
EVALUATE_STRING('5*4')
----------------------
                    20

Which is quite silly as it is the exact same thing as

select 5*4
from   dual
/

The value of this little function is apparent when the query returns an expression we cannot predict. For example:

select ename
,      sal
     ||case job
       when 'SALESMAN'
       then '+'
       when 'CLERK'
       then '*'
       when 'PRESIDENT'
       then '/'
       else '-'
       end
     ||((sysdate - hiredate)/25) new_sal
from   emp
/

To know the values for new_sal, we need to dynamically calculate those values from the new_sal expressions, like this: 

select ename
,      evaluate_string(new_sal) new_salary
from   ( select ename
         ,      sal
              ||case job
                when 'SALESMAN'
                then '+'
                when 'CLERK'
                then '*'
                when 'PRESIDENT'
                then '/'
                else '-'
                end
              ||((sysdate - hiredate)/25) new_sal
         from   emp
       )
/
 

Note: this function is wide open to SQL Injection. Be sure to not allow other programs or users to make use of the function or add a test on the value of p_string – for example p_string may not contain the word FROM. Without such measures, someone might abuse the function with calls like:

declare
  l_result number;
begin
  l_result:= evaluate_string('(select sal from emp where ename=''KING'')');
end;
/

 thereby getting at our most secret data!  

We can now rewrite the query to include filter and order by clauses, finally getting what we want: the quickest route from A to D:

 

select graph.*
,      evaluate_string(graph.route_cost) distance
from ( select level number_of_transfers
       ,      sys_connect_by_path(distance,'+') route_cost
       ,      CONNECT_BY_ROOT from_point||sys_connect_by_path(to_point,'/') route
       from   point_connections pc
       connect
       by     nocycle prior to_point = from_point
       start
       with   from_point = 'A'  -- starting point of the trip
     ) graph
where regexp_like(route,'D$') -- D is the final destination
order
by    distance
/

 

 

The result:

Building a Route Planner with Oracle SQL - Finding the quickest route in a graph routeplan3

So clearly the quickest way to get from A to D is through C with a total distance of 7.

Eating a little more of the pudding

To prove our approach a little further, let's build a road from  E to D. Our graph(ic) now looks like this:

Building a Route Planner with Oracle SQL - Finding the quickest route in a graph routeplan4

The data in the table CONNECTIONS is changed like this:

Building a Route Planner with Oracle SQL - Finding the quickest route in a graph routeplan5

If we execute our query again – note: the exact same query as we are interested in finding an answer to the exact same question – we get a new answer. And already this answer is not entirely trivial. What would you think is the quickest way from A to D, just by looking at the picture of the graph? The answer:

Building a Route Planner with Oracle SQL - Finding the quickest route in a graph routeplan6

So going from A to B to E to D results in the quicket route.

I haven't got a clue as to the performance of this query with larger numbers of points and connections. I would expect we should do anything we can to only start routes at meaningful points and try to stop investigating a route as soon as we know it goes nowhere. Well, that is for later.

9 Comments

  1. Frank Zhou December 12, 2007
  2. Lucas Jellema October 22, 2006
  3. Steffen Mazanek August 22, 2006
  4. Gracie June 21, 2006
  5. Karl June 12, 2006
  6. salem June 7, 2006
  7. Justin Lokitz June 7, 2006
  8. Justin Lokitz June 7, 2006
  9. Jeroen van Wilgenburg June 6, 2006