Links to
Sudoku; or algorithms
1,
2,
3, or
4, or
5,
userguide
or download
There comes a time when the monotony of work
needs a little refreshing diversion. Some of us like to relax (or
frustrate) the time a way with a Sudoku puzzle which is fast
becoming pandemic craze.
I enjoy solving these puzzles myself, but with
a lust for a challenge, I could not resist the opportunity to
develop a Sudoku puzzle solver written entirely in Oracle SQL and
PL/SQL.
By developing this solver I have also
demonstrated how it is possible to develop a relatively simple
solution to solve a reasonably difficult problem.
Others have now joined in to help enhance the
usability of the solver, with special thanks to
Patrick
Barel who has developed a very simple but effective Windows
frontend in Delphi using
Qnxo, which is now
included in the download.
The solver will solve most of
The Times Fiendish puzzles. The solver application does not
support guessing, so you may find puzzles that it will not solve,
though there is an argument that a Sudoku puzzle should be
solvable without guessing.
I must point out that I am not a Sudoku fanatic
and not trying to present myself as someone who knows much about
Sudoku puzzles. There are plenty of websites that do that with a
huge amount of detail and attention. I am merely demonstrating how
SQL can be used to solve an everyday problem.
The algorithms I have used in my solution were
inspired by a very interesting article on Sudoku puzzle solving
written by
Paul Stephens, a UK Computer Journalist.
The Concepts
The
approach used to solve a Sudoku puzzle with a computer is not as
complex as you might think. However, it is probably not one you
would necessarily contemplate using to solve the puzzle manually, or
at least until you get stuck.
The approach is based on determining all possible candidates for a
particular cell and then applying some basic rules and pattern
matching to eliminate redundant candidates. As candidates get
removed, remaining candidates become certainties, either by being:
• The
remaining candidate in a cell.
• The candidate with the last remaining value in a box, row or
column.
So the
algorithms are doing two jobs; identifying certainties; and
eliminating candidates. It is a just a case of iterating through
these two steps applying the available algorithms, until no more
candidates remain.
It is worth stressing that the solver does not support guessing, so
some tricky puzzles may not be solvable, though there is an argument
that a Sudoku puzzle should be solvable without guessing.
Solution Design
I have chosen SQL and Oracle PL/SQL to solve
the puzzle, firstly because it is the programming language I use
every day, and secondly SQL is probably the most suited language for
implementing set theory solutions. The solver is implemented into a
single stored procedure supported by a number internal procedures
and functions. In hind sight, it would have been probably more
correct to have developed it as a package, but since it is not
really loosing out on the benefits gained by packages, I have kept
it as a stored procedure, unless it evolves into something more
sophisticated.
The algorithms are written purely in SQL, and
PL/SQL is only used to control the iterations and solution output.
Any other procedural language could be used to implement this
solution, such as java, vb, php, Delphi, etc, with the same SQL
embedded. However, the SQL itself is Oracle specific (mainly because
of the DECODE function), but may adapt quite easily to other
flavours of SQL if they can support the same degree of subquerying.
Being a SQL solution the design consists of
some database tables. It is a very simple design with one table
holding dynamic data i.e. the answers table, and the others hold
static reference data such as cells and combinations.
Puzzle Numbers  This table holds a list
of all the possible numbers in a cell. For a standard Sudoku puzzle,
this is only 9 rows holding values 1 to 9.
Cells  This table holds an entry for
each cell in the puzzle using row id (horizontal line of 9 cells)
and column id (vertical line of 9 cells) as the primary key, and box
id (3x3 cells) as an attribute, e.g. cell 2,3 is in box 1; cell 6,6
is box 5, etc.
The table only holds 81 rows of data for a
standard Sudoku puzzle. The table definition is:
Name
Null? Type
  
ROW_ID
NOT NULL NUMBER
COL_ID
NOT NULL NUMBER
BOX_ID
NOT NULL NUMBER
Combinations  This table only contains
static data and used for the set based algorithm. It lists every
member number of a given combination set. A combination set is a
group of unique numbers that form a set with n members, e.g. numbers
3, 5, and 7 belong to the set "357", and 5, 6, and 8 all belong to
the "568" set, 4 and 6 belong to the "46" set and so on. Since these
are not permutations, i.e. where the order of the numbers is
significant, set "234" is the same as "432", so in order to keep set
identifier consistent, it always represented with numbers in
sequence, i.e. use "234" rather than "432".
Name
Null? Type
  
SET_ID
NOT NULL NUMBER
PUZZLE_NUMBER
NOT NULL NUMBER SET_SIZE
NOT NULL NUMBER
The primary key of this table is the
combination of set id and member number. The set size is there to
filter on combination sets of a particular size. These are some
examples of the combinations table entries:

Set Id 
Puzzle Number 
Set Size 


34 
3 
2 


34 
4 
2 


123 
1 
3 


123 
2 
3 


123 
3 
3 


124 
1 
3 


124 
2 
3 


124 
4 
3 

There is an entry for every possible set of
combined numbers, from sets of 2 numbers through to 5 numbers, i.e.
12 through to 56789.
Answers  This table is the only dynamic
table and holds the starting numbers to the puzzle plus all initial
candidates to any cells not containing a solved number.
Name
Null? Type
  
PUZZLE_ID
NOT NULL NUMBER
STEP_NO
NOT NULL NUMBER
ROW_ID
NOT NULL NUMBER
COL_ID
NOT NULL NUMBER
PENCIL_MARK_IND NOT
NULL NUMBER
ANSWER
NOT NULL NUMBER
BOX_ID
NUMBER
Candidates are referred to as pencil marks,
where a pencil mark goes through several states as indicated by the
pencil_mark_ind flag, which has the following values:

1 
The pencil mark has been rubbed out. 


0 
The pencil mark is now a certainty (all starting numbers are
set to this value). 


1 
Normal pencil mark setting. i.e. candidate. 


2 
A temporary pencil mark setting identified by an algorithm
step. This can then be used to identify other candidates to
be rubbed out. 

The initial puzzle clues (which have a pencil
mark = 0) are loaded via an Excel spreadsheet that I have put
together which generates a series of insert statements from numbers
input into a grid. The data is inserted into a staging table called
puzzle_load, and copied to the answers table during the
execution of the solver.
Algorithms
The solver only uses four different algorithms.
A typical "The Times Fiendish" Sudoku puzzle usually needs to use
all of the algorithms to be solved, however "The Times Difficult"
can be solved usually with just using the first two.
The algorithms are split into two categories 
those that identify certainties in cells (first two) and those that
eliminate candidates (last two).
The algorithms are explained in more detail in
the following pages:
1. Singles  Cell
2. Singles Box
3. Cross Hatching
4. Partial Members Set
5. Full Member Set
The iterations of these algorithms are not
applied strictly in the sequence shown above. The Singles algorithms
are always applied when any candidates have been removed by the
other algorithms. Since the singles algorithms can remove candidates
themselves, the algorithm is reiterated until no more certainties
are identified. Then the next algorithm in sequence can be applied
and iterated continuously until no more candidates are eliminated,
followed by the singles algorithms again.
Conclusion
This Sudoku Solver solution has resulted in a
fairly simple design and very little coding, and demonstrates how
powerful SQL is in solving complex problems. It is far too easy for
a programmer to go off and develop a solution using a procedural
based programming language, one that they may be more familiar with
or one they consider more in fashion with the latest technological
preachings, than attempting to push the programming logic into SQL.
I am not saying all problems are better solved with SQL, but I often
find solutions far more complex than they need to be, mainly because
of the reluctance to use SQL efficiently and effectively, or
reluctance to accept that SQL is a superior programming option.
