Routeplanners have always been somewhat magical in my eyes. Finding the quickest route from A to B in a graph of thousands of points and connections. I have tried to imagine how to approach such as problem. I have even come up with several solutions that I think might help to implement a performant system for unveiling the fasted itinerary between two points in the graph.
The other day, while preparing for the Quiz on SQL and PL/SQL that Alex and I will do during ODTUG 2006 in two weeks time, I got hooked again on this same challenge. This time, the challenge translated to: write a SQL query that will find the quickest path between two points.
In this article I will describe what solution I came up with this time, using the Oracle Hierarchical Query and some of its more recent operators. I will demonstrate a query that will give us the quickest route between any two points in the graph shown below:
The starting situation is the graph depicted above. This is the graphical representation of the points and connections. The same information is held in two tables: POINTS and CONNECTIONS.
Their create scripts are like this:
create table points ( id number(5) , name varchar2(30) ) / create table connections ( from_id number(5) , to_id number(5) , distance number(5) ) /
They contain the data for the graph shown above, with five points and a total of six connections. Their contents are shown below:
The question I would like to answer with a SQL query is: what is the shortest route – in terms of distance – from A to D. Of course with this simple graph we can easily tell ourselves by merely glancing at the graph. It is from A to C to D, a total distance of 7. But can a query give me the same answer?
I create a helper-view to simplify the query we will write later on. This view joins together points (source), connections and points (destination), so that it will be easier to write the hierarchical query. Note: we could also have used an in-line view with the same query:
create or replace view point_connections as select from_point.name from_point , to_point.name to_point , conn.distance distance from points from_point , connections conn , points to_point where conn.from_id = from_point.id and conn.to_id = to_point.id /
This view contains all connections. Note: I have assumed a directed graph where a connection from A to B does not necessarily mean there is also a connection from B to A. It would be simple to include those reverse connections as well, for example by extending the view definition with a UNION ALL and the same query with the from_id and to_id switched in the WHERE clause.
The query I intend to write will treat the graph as a tree. It will perform a CONNECT BY query, starting at Point A and using the point_connection to connect by prior to_point = from_point:
select level number_of_transfers , sys_connect_by_path(distance,'+') route_cost , CONNECT_BY_ROOT from_point||sys_connect_by_path(to_point,'/') route from point_connections pc connect by nocycle prior to_point = from_point start with from_point = 'A' -- starting point of the trip /
This query will return all routes starting at A , including the ones that do not end at D. Note that NOCYCLE was added to prevent routes from circling endlessly through the graph. Suppose that A connects to C connects to B connects back to A – a very real possibility. In that case, without the NOCYCLE, we would have blown up the query. Since in this query all routes have to start at A, the expression CONNECT_BY_ROOT from_point is a little overdone, as it will always return 'A'. However, to make the query as generic as possible, I have used CONNECT_BY_ROOT from_point to return for each record the value of from_point in the first step of the route.
For each route it will return an indication of the total distance, through the route_cost column that contains an expression seemingly adding together the distances for all connections travelled in the returned route:
NUMBER_OF_TRANSFERS ROUTE_COST ROUTE ------------------- --------------- --------- 1 +3 A/B 2 +3+5 A/B/D 2 +3+2 A/B/E 3 +3+2+1 A/B/E/D 1 +6 A/C 2 +6+1 A/C/D 2 +6+2 A/C/E 3 +6+2+1 A/C/E/D 8 rows selected.
We need to filter the above query result on only the routes that end at D. Subsequently, we need to order them by route_cost, returning the shortest route first. The latter is achieved using a simple PL/SQL function that calculates the distance based on the route_cose expression, simply by performing a little piece of Dynamic SQL. Note: this trick of turning the value from a SYS_CONNNECT_BY_PATH expression into a calculated value is described in Mastering Oracle PL/SQL: Practical Solutions by Connor McDonald, Chaim Katz, Christopher Beck, Joel R. Kallman, David C. Knox.
The PL/SQL function used for calculating the ROUTE_COST is like this:
create or replace function evaluate_string ( p_string in varchar2 ) return number is l_result number; begin execute immediate 'select '||p_string||' from dual' into l_result; return l_result; end evaluate_string; /
The p_string that is passed into this function is assumed to contain a valid SQL expression. This expression is pasted into a simple query against dual. The result is returned. Usage for example:
select evaluate_string('5*4') from dual / EVALUATE_STRING('5*4') ---------------------- 20
Which is quite silly as it is the exact same thing as
select 5*4 from dual /
The value of this little function is apparent when the query returns an expression we cannot predict. For example:
select ename , sal ||case job when 'SALESMAN' then '+' when 'CLERK' then '*' when 'PRESIDENT' then '/' else '-' end ||((sysdate - hiredate)/25) new_sal from emp /
To know the values for new_sal, we need to dynamically calculate those values from the new_sal expressions, like this:
select ename , evaluate_string(new_sal) new_salary from ( select ename , sal ||case job when 'SALESMAN' then '+' when 'CLERK' then '*' when 'PRESIDENT' then '/' else '-' end ||((sysdate - hiredate)/25) new_sal from emp ) /
Note: this function is wide open to SQL Injection. Be sure to not allow other programs or users to make use of the function or add a test on the value of p_string – for example p_string may not contain the word FROM. Without such measures, someone might abuse the function with calls like:
declare l_result number; begin l_result:= evaluate_string('(select sal from emp where ename=''KING'')'); end; /
thereby getting at our most secret data!
We can now rewrite the query to include filter and order by clauses, finally getting what we want: the quickest route from A to D:
select graph.* , evaluate_string(graph.route_cost) distance from ( select level number_of_transfers , sys_connect_by_path(distance,'+') route_cost , CONNECT_BY_ROOT from_point||sys_connect_by_path(to_point,'/') route from point_connections pc connect by nocycle prior to_point = from_point start with from_point = 'A' -- starting point of the trip ) graph where regexp_like(route,'D$') -- D is the final destination order by distance /
The result:
So clearly the quickest way to get from A to D is through C with a total distance of 7.
Eating a little more of the pudding
To prove our approach a little further, let's build a road from E to D. Our graph(ic) now looks like this:
The data in the table CONNECTIONS is changed like this:
If we execute our query again – note: the exact same query as we are interested in finding an answer to the exact same question – we get a new answer. And already this answer is not entirely trivial. What would you think is the quickest way from A to D, just by looking at the picture of the graph? The answer:
So going from A to B to E to D results in the quicket route.
I haven't got a clue as to the performance of this query with larger numbers of points and connections. I would expect we should do anything we can to only start routes at meaningful points and try to stop investigating a route as soon as we know it goes nowhere. Well, that is for later.
Here is a pure SQL solution for finding the shortest path (distance) between the source and destination
http://www.jlcomp.demon.co.uk/faq/shortest_distance.html
Thanks,
Frank
We got a reference to this article on the SQL Server discussion forum: http://forums.microsoft.com/MSDN/ShowPost.aspx?PostID=800277&SiteID=1. I doubt however this will work in SQL Server.
Lucas
How does your algorithm scale?
Using the above code on a table with each node having distance data for all the other nodes around, the result show that the route could go back to the original node – doesn’t ” nocycle” take acre of that? Here is the a sample of the result
3 +3+3+3 StRita/JCampbl/StRita/SisTheresa 9
3 +3+5+1 StRita/JCampbl/BrockHS/SisTheresa 9
4 +3+3+5+1 StRita/JCampbl/StRita/BrockHS/SisTheresa 12
5 +3+3+6+1+1 StRita/JCampbl/StRita/JrHigh/BrockHS/SisTheresa 14
As you can see, St Rita appears twice in the route. Any suggestions would be a great help. Thanks
I am very impressed about what via SQL can be done. Even Lucas Example is an easy one it could be an idea for using it in an application. I see always an area of practical use without the demand of Oracle Spatial.
Greetings
Karl
Ever looked at the network data model that is supported by oracle Spatial Option?
The link in the last post for the Oracle10g Spatial Network Model should be: http://www.oracle.com/technology/products/spatial/pdf/10gr2_collateral/spatial_twp_ntwrkdatamod_10gr2_0512.pdf
Lucas…although I think what you’ve done here is quite impressive, Oracle10g already has this functionality (and quite a bit more) and it’s called Oracle Spatial. In fact, Orace10g Spatial has actual multimodal route planning as well as complex network (graph) analysis. Check out http://www.oracle.com/technology/products/spatial/pdf/10gr2_collateral/spatial_twp_ntwrkdatamod_10gr2_0512.pdff for more. Again, your work is pretty fantastic, but you may find that you are in fact trying to reinvent the wheel.
Really interesting that this is possible with a database. But I think it’s more efficient to write a small program to solve this problem. The problem you described is called the shortest path problem http://en.wikipedia.org/wiki/Shortest_path_problem
The most famous algorithm to solve this problem is developed by Dijkstra.
The graph in the article is too simple to show the benefit of the algorithm, so I created my own to explain the basics of the algorithm:
http://technology.amis.nl/blog/wp-content/images/routeplangraafjeroen2.gif
First we have to pick a random path to start with. It’s not completely random (but if you implement this with a computer it is). Let’s take ABEG (yes, I know ADG is the shortest)
The length of ABEG is 5. This is the shortest path so far. From B we can also branch to H. But 1+5>5, that path is to long, no matter what it does after point H. So we can stop here.
The algorithm also deals with cycles. Take ABEIJ for example. We can loop infinitely in the cycle EIJ, but at a certain moment it’s longer than 5 and we can stop (note that you still need some kind of cycle detection if this is the branch you started with, but a simple maximum length would do the trick because you can guess that the shortest path is not more than 20)
Let’s take the ADG now. ADG is length 3, it’s less than 5 so our new shortest path is 3. When we take the branch AC it’s 3, so we can stop immediately and conclude the shortest path is 3.
But the article still can be very useful for other problems or when you need the lengths of all routes (ie. for routing algorithms of computer networks)