Writing a Word Search puzzle solver in SQL

11

Brace yourself for yet another pretty useless demonstration of the unexpected wonders of SQL as well as my apparent lack of meaningful things to do with my life.

Undoubtedly, you are familiar with Word Search puzzles, the type of puzzle that present a matrix with a mash of letters in which specific words have been hidden. For example:

This Word Search contains the names of several countries (such as italy, holland, poland, spain, france, japan, togo, peru). If you look closely, you will find them:

In this article, I will show you – and I know this is something you have been waiting for – how a SQL Query can solve these puzzles for you:

....

The first in solving our Word Search puzzles is expressing them in a format we can handle in a SQL query. Of course we can store them in database tables, but we can take a short-cut, using a Collection Type and the Table Operator.

We define the type WORD_LIST

create type word_list is table of varchar2(500)/

and use it to instantiate in situ in our query a collection of strings, each representing one row in the Word Search Matrix:

with matrix as( select rownum seq  ,      lower(column_value)  txt  from   table ( word_list( 'FYYHNRD'                          , 'RLJCINU'                          , 'AAWAAHR'                          , 'NTKLPNE'                          , 'CILFSAP'                          , 'EOGOTPN'                          , 'HPOLAND'                          )               ))

The above in-line view is the initial section our Word Search Solving Query. The result of this query is:

The basic algoritm of the Word Search Solver is like this:

  • iterate over all cells in the matrix (all letters in the diagram)
  • from each cell, start a search in all eight directions: up and down, left and right and four diagonal combinations of up & down and left & right
  • for each all eight searches, check if one of the looked for words is found; if so, report the word, the cell where it starts and the direction in which it was found

The key to implementing this algoritm is an hierarchical query – as we shall see in a moment.

For the matrix in line view demonstrated above, our next step is an in-line view to visit all individual cells (letters) in the matrix. This is relatively simple:

, cells as  ( select distinct         seq, col  ,      substr(txt, col, 1) cell  from   matrix         cross join         ( select rownum col           from   matrix         )  )

Going forward, we have to explore eight directions. Directions are defined by two parameters that specify the movement along the horizontal and the vertical axis. The parameters can be -1, 0 or 1 – meaning [left, no horizontal movement, right] and [up, no vertical movement, down] respectively. Note that they cannot both be 0, as that would mean no exploration at all. These directions are generated in another in-line view in a somewhat contrived way; here is definitely room for improving the query.

, direction as  ( select distinct           sign(rownum - 2) dir    from   ( select count(*)             from   dual             group             by cube (1,2)           )    ), directions as   ( select h.dir hor    ,      v.dir ver    ,      case h.dir           when 1 then 'left'           when -1 then 'right'           end horizontal    ,      case v.dir           when -1           then 'down'           when 1           then 'up'            end vertical    from   direction h           inner join           direction v    on     ( not (h.dir=0 and v.dir=0) )  )

However, it does the job. Note that the directions in-line view returns all eight routes, both expressed in -1/0/1 notation as well as human-readable format.

The final in-line view we use is the one that provides the list of words to search for. In the example, we are looking for the names of countries. We inject these names in the following way into our query:

, words as   ( select column_value word    from   table( word_list('italy', 'holland' , 'poland' ,'france'                           ,'canada', 'japan','india','china','togo','peru'                           )                 )   )

Here of course we use the same ‘trick’ as before of generating rows on the fly by dynamically, inside the query, instantiating a collection and accessing it through the TABLE operator.

Now that all preparations have been made – the inline views that prepared all data used in the query (and note that we do not query any data at all from any database table or view) – we can start looking for answers.

The real heart of the query is this piece:

from   ( select c1.*         ,      hor         ,      ver         ,      horizontal         ,      vertical         from   cells c1                cross join                directions        )        cross join         wordsstart with cell = substr(words.word,1,1)connect by prior col-hor  =  col            and            prior seq-ver = seq            and            prior hor = hor            and            prior ver = ver           and           level < length(words.word)+1           and           cell = substr(words.word, level,1)

Here we join cells with directions – giving us one row to process for each cell for each of the eight directions. Then we join with words, to start an exploration from each cell in each of the eight directions for all of the search words. Alternatively, we could perhaps find all paths from all cells and try to join those paths to the search words. I leave that as an exercise to the reader.. (as lazy authors tend to write in their books).

The heart of the matter is next. We only start exploring from a cell if it contains the first letter of the current search word, why else bother. Then we start an hierarchical query, going from one cell to the next depending on the current direction, as specified by hor and ver (for horizontal and vertical movement). Suppose we are at cell row 4, column 3 and hor and ver are 1 (to the left) and -1 (going down). The hierarchical condition prior col-hor = col states that for the next cell in the hierarchical path its col value should be equal to the prior col value (which is 3) minus hor (which is 1) – so the next cell’s col value should be 2. Following similar reasoning, the next cell’s seq (or row) value should be 4 – (-1) = 5. So the next cell in the hierarchical path is row 5, column 2 – indeed going down and to the left.

The next two conditions ensure that we continue in the same direction from each subsequent cell in the hierarchical path. The last two conditions ensure that we do not keep exploring an obviously fruitless path: if the we have traversed more levels than the search word has characters we can stop the search and if the current cell does not contain the character it should for the current search word, we too can quit.

Adding a thin additional layer on top of this query, in order to filter all rows that did not find a search word, the final Word Search query becomes the following:

with matrix as( select rownum seq  ,      lower(column_value)  txt  from   table ( word_list( 'FYYHNRD'                          , 'RLJCINU'                          , 'AAWAAHR'                          , 'NTKLPNE'                          , 'CILFSAP'                          , 'EOGOTPN'                          , 'HPOLANDI'                          )               )), cells as  ( select distinct         seq, col  ,      substr(txt, col, 1) cell  from   matrix         cross join         ( select rownum col           from   matrix         )  ), direction as  ( select distinct           sign(rownum - 2) dir    from   ( select count(*)             from   dual             group             by cube (1,2)           )    ), directions as   ( select h.dir hor    ,      v.dir ver    ,      case h.dir           when 1 then 'left'           when -1 then 'right'           end horizontal    ,      case v.dir           when -1           then 'down'           when 1           then 'up'            end vertical    from   direction h           inner join           direction v    on     ( not (h.dir=0 and v.dir=0) )  ), words as   ( select column_value word    from   table( word_list('italy', 'holland' , 'poland' ,'france'                           ,'canada', 'japan','india','china','togo','peru'                           )                 )   )select seq + ver * (length(word)-1)  "Starting at row",      col + hor * (length(word)-1)  " and Column ",      horizontal  "Going",      vertical    " ",      cast(word as varchar2(25))        "you will find the word"from   ( select distinct                cell         ,      seq         ,      col         ,      hor         ,      ver         ,      sys_connect_by_path(cell,'*') p         ,      word                  ,      horizontal                  ,      vertical         from   ( select c1.*                  ,      hor                  ,      ver                  ,      horizontal                  ,      vertical                  from   cells c1                         cross join                         directions                )                cross join                 words         start with cell = substr(words.word,1,1)         connect by prior col-hor  =  col                     and                     prior seq-ver = seq                     and                     prior hor = hor                     and                     prior ver = ver                    and                    level < length(words.word)+1                    and                    cell = substr(words.word, level,1)        )where  translate(p,'Q*','Q') = word/

Let’s try a somewhat more challenging puzzle as well:

The solution – searching for terminology very much part of our daily life at AMIS – looks like this:

The query outlined above returns with:

The correct answer – after four minutes. It is clear that the search time increases rapidly with a larger set of search words and especially a larger matrix.

Resources

Download the queries demonstrated above: wordSearchQueries.txt.

Share.

About Author

Lucas Jellema, active in IT (and with Oracle) since 1994. Oracle ACE Director for Fusion Middleware. Consultant, trainer and instructor on diverse areas including Oracle Database (SQL & PLSQL), Service Oriented Architecture, BPM, ADF, Java in various shapes and forms and many other things. Author of the Oracle Press book: Oracle SOA Suite 11g Handbook. Frequent presenter on conferences such as JavaOne, Oracle OpenWorld, ODTUG Kaleidoscope, Devoxx and OBUG. Presenter for Oracle University Celebrity specials.

11 Comments

  1. Incredible, it reminds me of the “obfuscated C-code contest”, where one of the participants wrote the towers of Hanoi (recursive) using C pre-processor directives only.

    Maybe it is time to start an obfuscated SQL-query contest too. On the other hand, this would be a boring contest, as we would know the winner in advance ;-)

  2. And now I think about it, another solution for your “direction table” using the word_list type

    , direction as
    ( select to_number column_value ) dir
    from table( word_list( ‘-1′, ‘0’, ‘1’ ) )
    )

    Anton

  3. The algoritm is very brilliant! And its simple at the same time:

    from ( select *
    from cells
    , directions
    ) x
    connect by

    And the idea to use SQL to solve a puzzle like this is brilliant too, it has only one small drawback for real puzzles: it needs the “words” to search for :)

    But the thing I like the best is using the table type for generating some rows with data: select … from table ( word_list( …

    Anton

  4. Anton,

    As always you have improved way beyond my original – and apparently not only suboptimal but even faulty – code. Thanks for the improvements – it is faster and more robust.

    I still pride myself in the core of the algoritm…

  5. A brilliant post indeed. As an interested reader I did tried to understand everything. And offcourse I did tried to solve it the other way around, "find all paths from all cells and try to join those paths to the search words". On my computer runs the query from Lucas in about 25 seconds. A solution which "joins all paths from all cells" runs in less than one second!!. And it finds more solutions too. As I am a lazy author too, I did not bother to find the differences or whats causing it. Anton

    Download the code here: puzzle.sql

  6. How about a query to list all the letters that weren’t used – since most word search puzzles hide a final word or phrase in all the leftover spaces. Or is that another “exercise for the reader”? :)