Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

"Unique" Sudoku Puzzles 1

Status
Not open for further replies.

SamBones

Programmer
Aug 8, 2002
3,186
0
36
US
This is not a "puzzle" in the normal sense. More of a programming problem that's been rattling around in my head.

While playing Sudoku the other day, I started wondering if I had ever played that exact puzzle before. I play a lot of Sudoku (it's the go-to bathroom book in our house). If you are unfamiliar with it, just hit up Google.

Anyway, I was looking at a puzzle and noticed that there are two parts to it's "uniqueness". One is which positions have a hint number revealed, and two, what the number is in those positions. Here's an example puzzle just pulled randomly from the web.

[pre]
8 . . | . . 6 | 2 9 .
. 1 . | 3 5 . | . . 8
. 5 6 | 9 2 . | . 1 .
---------------------
. 8 . | . 6 2 | . 3 4
9 . 2 | . . . | 5 . 7
5 4 . | 8 7 . | . 2 .
---------------------
. 2 . | . 3 1 | 8 4 .
3 . . | . 9 4 | . 5 .
. 9 4 | 6 . . | . . 2
[/pre]
What I was wondering was, is there a concise way to create a code to uniquely identify this puzzle? A code that could be recorded and compared with other codes to identify a repeat puzzle? (not that I would actually do that, but just wondering HOW I could do that).

One twist, I consider this puzzle to be functionally identical to the previous example...

[pre]
9 . . | . . 7 | 3 1 .
. 2 . | 4 6 . | . . 9
. 6 7 | 1 3 . | . 2 .
---------------------
. 9 . | . 7 3 | . 4 5
1 . 3 | . . . | 6 . 8
6 5 . | 9 8 . | . 3 .
---------------------
. 3 . | . 4 2 | 9 5 .
4 . . | . 1 5 | . 6 .
. 1 5 | 7 . . | . . 3
[/pre]
That is, the numbers are different (all incremented by 1, 9 wraps to 1), but the pattern the numbers present are identical to the previous puzzle, so the means of solving it are the same. This is the part that was bugging me. Are they just publishing the same puzzle with the numbers moved around?

Even this puzzle would be functionally the same...

[pre]
8 . . | . . 6 | 2 1 .
. 9 . | 3 5 . | . . 8
. 5 6 | 1 2 . | . 9 .
---------------------
. 8 . | . 6 2 | . 3 4
1 . 2 | . . . | 5 . 7
5 4 . | 8 7 . | . 2 .
---------------------
. 2 . | . 3 9 | 8 4 .
3 . . | . 1 4 | . 5 .
. 1 4 | 6 . . | . . 2
[/pre]
In this puzzle, just the 9's and 1's are swapped, but the logic of solving it is the same as the first puzzle.

So, I've come up with a couple solutions for a code to uniquely identify the puzzle, but they are anything but concise. I'll save the description of my solution(s) for a bit, but I will say they use bitmaps and rely on the symmetry in the puzzle. These three example puzzles above all encode to the same code using my current best shot...

[tt]1161129923444.1234567172435123684379781933651864874823[/tt]

One thing about my attempt is that the "code" gets shorter for the more difficult puzzles since there are fewer hint numbers to be encoded.

So, I guess that means there are two parts to this "puzzle" post.

Part 1: Can you come up with a way of generating a code that uniquely identifies a Sudoku puzzle, accounting for number shifts that don't affect the solution pattern?

Part 2: Can you figure out my encoding scheme? (it's pretty simple brute force actually)


 
The 40 digit part of your encoding scheme seems obvious to me. The 13 digit part not so much.

The premise of your objective seems unattainable though. I've also wondered if I've solved the same puzzle more than once. The problem is that the initial values need not contain the same quantity of numbers, nor do they need to be the same values, nor do they need to be the same positions. This all but guarantees enough starting configurations for the same solution to be impossible to relate to one another by any means other than the solution. To prove my point on this, I solved the puzzle, juggled the numbers and chose different seed values. I kept the same number of seed values (40), but even this was probably not necessary. The puzzle below has the same solution to above.

[pre]. . .9| 3 . 6 | . . 8
. . . | . 8 9 | 3 . 1
3 8 6 | . . 1 | . 5 .
---------------------
9 . 5 | 8 6 . | 2 . 4
. . 7 | 5 . 4 | . 1 .
8 . 4 | 1 . . | 5 . 6
---------------------
6 7 . | 9 . . | 1 . .
. . . | . 2 . | 6 . 5
5 2 . | 6 1 8 | . . 7[/pre]

Side note, if you transcribed your puzzle correctly, this is a poor example as it does not have a unique solution, it actually has 4 solutions. My example is 1 of those solutions.





**********************************************
What's most important is that you realise ... There is no spoon.
 
While differnt seeding may end up with the same result you still may consider it a different puzzle. Let alone less seed values will make the puzzle harder, they may also give more different solutions, as you found 4.

I think the categorization should not be about the result pattern, but about a) Number of seed digits and b) their pattern both in positioning and in the given increments and permutations of the numbers. That's all you need to categorize.

I have to think about this more, I don't get the encoding, neither part of the 13.40 number.

I think of way to normalize a pattern. An idea to eleminate diffrent incrments would be to set the first digit found in a puzzle to 1 and decrement all other digits in th same manner (including the wrap around). It wouldn't make puzzles with permutations equal, though. And since geometrically mirrored and rotated puzzles would also be equal even for permutated puzzles this step of normalization would not find equal puzzles when mirrored or rotated. So it's a weak first thought only on the way to a normalized puzzle format.

Bye, Olaf.




 
If the puzzle can be solved four different ways, it may be because I got a bad puzzle. For the post I just Googled "sudoku" and copied the first puzzle I found. Plus I may have introduced typos since I was kind of hand drawing it to avoid having to insert images. I'll get a better example puzzle tomorrow.

I'll explain the encoded number in a bit. This puzzle just happens to encode to 13.40 because there are 40 seed values. A puzzle with fewer hint digits, or digits in different locations, would possibly generate fewer digits in the encoded value. But, it can be used to regenerate the puzzle, or be used to identify similar puzzles.

The 13.40 code could also be expressed as: 1161129923444.ABCDEFGAGBDCEABCFHDCGIGHAICCFEAHFDHGDHBC

From this I can reconstruct the puzzle.

 
Thanks for not fully spoiling that detail puzzle about your encoding.

I had another idea about normalizing that would work out for the solved puzzle only, though, eg you can normalize any permutations of digits by replacing the first row with 1 to 9 and permuating all other digits in the same manner as that replacement permutates the digits of the first row. Since increments with wraparound is indeed just a special case of permutations that's also already in it.

It's obviously not possible in that exact manner for the non solved puzzle. Your encoding starts of 1 to 7 or A to G indicate you're doing something similar anyway, but since there are still some rotations and flips to consider I wonder whether you really get all similar puzzles encoded in the same way.

Bye, Olaf.
 
Yeah, the second number in my code is a little obscure, just because I'm using the digits 1 through 9 to encode the digits 1 through 9, even though the 'code' digit 1 may not encode the 'solution' digit 1 and so on. That's why in the ABCD... version, the purpose of the second part of the code is a little more obvious.

Here's a better example taken from one of my Sudoku books. This example was chosen for a reason.

[pre]
. . . | . . . | . 3 6
4 . 2 | . 5 . | . . 8
1 6 . | 4 . . | . 2 .
---------------------
. 7 . | . . 8 | 2 . .
. . . | . 9 . | . . .
. . 3 | 7 . . | . 4 .
---------------------
. 8 . | . . 6 | . 9 3
3 . . | . 1 . | 5 . 2
7 9 . | . . . | . . .
[/pre]
This would encode to the following (11.27)...

[pre]15718715777.123456723486491836291175489[/pre]

Or, using the Alpha encoding...

[pre]15718715777.ABCDEFGBCDHFDIAHCFBIAAGEDHI[/pre]

And, if I wanted to actually be able to recreate this EXACT puzzle, I would use this code...

[pre]15718715777.364258164278293748693315279[/pre]

Given that 11.27 code, I can recreate the exact puzzle.

 
Encoding second part 40 digit example) [hide]The 40 digit part of the coding simply reads left to right 1 row at a time and assigns an arbitrary number/symbol to each value found.
The first 3 rows of numbers are actually 8629135856921. Each digit in the sequence is assigned a unique identifier for the order in which it appears. 8 appears first so it get a digit 1, 6 is second, 2 is 3rd, etc. The first 7 digits are unique from one another so they result in 1234567. the Eighth digit repeats the first digit so it is assigned a 1 again.

When the digits are replaced later on it does not matter which digit is chosen for each relative position. I believe there are 9! (362880) ways the numbers could be replaced and result in the same puzzle from a solving process perspective.

What this does not tell us is the positions of those numbers. Somehow I believe this is what the first 13 digits conveys but I haven't sussed that out as yet.
[/hide]


**********************************************
What's most important is that you realise ... There is no spoon.
 
Exactly!

Here's what the first part of the code is and how it's generated...

Given this as an example...

[pre]
. . . | . . . | . 3 6
4 . 2 | . 5 . | . . 8
1 6 . | 4 . . | . 2 .
---------------------
. 7 . | . . 8 | 2 . .
. . . | . 9 . | . . .
. . 3 | 7 . . | . 4 .
---------------------
. 8 . | . . 6 | . 9 3
3 . . | . 1 . | 5 . 2
7 9 . | . . . | . . .
[/pre]
As kwbMitel said, the second part of the code identifies the pattern of 'hint' digits that were given, but it doesn't provide information on where in the puzzle those are placed. That's the role of the first part of the code.

It's a bitmap of where in the puzzle hint digits were placed, but only of the first half. That's because the puzzle is point symetrical with the center point in the puzzle, so we don't really need to map the whole thing. I first go through the puzzle, left to right, top to bottom, and put a 0 if nothing is there, or a 1 if something is there, only down to the middle spot. That gives this (asterisks for cells I don't care about).

[pre]
0 0 0 | 0 0 0 | 0 1 1
1 0 1 | 0 1 0 | 0 0 1
1 1 0 | 1 0 0 | 0 1 0
---------------------
0 1 0 | 0 0 1 | 1 0 0
0 0 0 | 0 1 * | * * *
* * * | * * * | * * *
---------------------
* * * | * * * | * * *
* * * | * * * | * * *
* * * | * * * | * * *
[/pre]
This gives us the binary value 1110101000111010001001000110000001, which is a decimal 15718715777, the first part of the code.

So the two parts of the code are, 1) where hints are located, and 2) what the hints are.

 
The reason this puzzle was selected as a better example...

[pre]
. . . | . . . | . 3 6
4 . 2 | . 5 . | . . 8
1 6 . | 4 . . | . 2 .
---------------------
. 7 . | . . 8 | 2 . .
. . . | . 9 . | . . .
. . 3 | 7 . . | . 4 .
---------------------
. 8 . | . . 6 | . 9 3
3 . . | . 1 . | 5 . 2
7 9 . | . . . | . . .
[/pre]
...is because there are fewer hints (which makes the second part of the code smaller) and the first row starts with seven blanks (which makes the first part of the code smaller).
 
@ SamBones: Regarding the first part of the code

[Hide]I would never have figured that out BTW. Are you limiting your method to symmetrical puzzles or are you assuming that they all are symmetrical? You might notice that my variation on your puzzle above is not symetrical. As such, your coding method will not work for that puzzle I believe. I like your method but I would allow more flexibility by using 21 digits and converting to hexidecimal[/hide]

**********************************************
What's most important is that you realise ... There is no spoon.
 
I don't think I've ever seen an asymmetric clue Sudoku before. I know that some of the online Sudoku solvers will puke an error message if an asymmetric puzzle is input.

But you're right, while symmetry may be most common now, it looks like it was introduced in 1986 (according to Wikipedia).


I would think you'd just need to expand the first field bitmap to be 91 bits long. but that can generate a very large number. Then it's getting close to just listing out the entire puzzle like this (which I have seen online)...

[pre]000000036402050008160400020070008200000090000003700040080006093300010502790000000[/pre]

One of the original goals was to produce a key of minimal size.
 
@SamBones[hide]Re: I don't think I've ever seen an asymmetric clue Sudoku before.

I just spent the last 10 minutes looking at random printable Sudoku puzzles online and couldn't find a symmetrical one out of about 5 sites and 30 some puzzles.

Assuming your "91" to be a typo for 81.

Taking asymmetry into account, I think the effort required to covert to a shorter number is offset by the simplicity of just listing the 81 digits.[/hide]

**********************************************
What's most important is that you realise ... There is no spoon.
 
>15718715777.364258164278293748693315279

OK, that reveals you assign 1 or A to the first seed digit, B to the second one, etc. unless digits already have a code, of course, as same original digits have to have same code. Using letters is fine. That's what I thought of but didn't believe, when I said:

Your encoding will only cover same patterned puzzles, not sibling puzzles, which are just geometrically rotated, as their "first digit" would not be the same. Well, it's a weak argument, as the same applies to setting the first row of a result to 1-9 or A-I. So siblings due to geometrical symmetries have to be considered seperately anyway.

I see the wikipedia link shining through (a known isssue of the spoiler TGML tag), I know it has a lot of considerations about the possible number of puzzles etc.

There are also row swap operations, that change the geometry of the digit distribution in the cells in non symmetrcal ways and still can be considered the same nature of the puzzle, for example the considerations for row 1-3 are the same, no matter how you otde them.

From the perspective of the normalization of the solution you could therefore sort each three rows to aggregate all the row permutations into one puzzle solution. You may again apply that to the unsolved seed outset considering all unfilled cells as 0, but I doubt this will lead to a real and straight forward normalization of all similar puzzles.

So to categorize puzzles we need other rules and ideas than for solution categorization.

I still haven't figured out the first digits of your encoding, they would need to tell something about the positioning of the digits. Since that is a binary pattern you may just have decoded that in a 81 digit binary number, that could be converted as up to 25 decimal places decimal number.

Your encoding scheme does not fit its purpose, but that's also why you said you were coming here to ask for inspirations and ideas. I don't yet have a better idea, though.

Bye, Olaf.
 
@SamBones - An Alternative Method

I used simple substitution via a Matrix - This comes from my cipher experience

the numbers 1-9 are substituted by A-I

Numbers preceded by:
[pre][ul]
[li]0 Spaces = A-I[/li]
[li]1 Space = J-R[/li]
[li]2 Spaces = S-Z and -[/li]
[li]3 Spaces = a-i[/li]
[li]4 Space = j-r[/li]
[li]5 Spaces = s-z and +[/li]
[li]etc as needed[/li]
[/ul][/pre]
This results in your 40 seed values becoming HoBISLEhNFIBSZXBLDIKNPEDQGTTUAHDLiDN-DFk



**********************************************
What's most important is that you realise ... There is no spoon.
 
Oops, yeah, 81, not 91. Pardon my fat fingers. [bigsmile]

kwbMitel said:
Taking asymmetry into account, I think the effort required to covert to a shorter number is offset by the simplicity of just listing the 81 digits.

Well, just listing the 81 digits only encodes that exact puzzle. To get similar puzzles (same puzzle (pattern and hints) but with the digits in different order), it would need to look like this...

[pre]0000000ABC0D0E000FGB0C000D00H000FD000000I000000AH000C00F000B0IAA000G0E0DHI0000000[/pre]

Actually, that's not bad. Then maybe to compress the zeroes out, use numeric digits. The numeric digit represents how many zeros, or blanks. Something like this...

[pre]7ABC1D1E3FGB1C3D2H3FD6I6AH3C2F3B1IAA3G1E1DHI7[/pre]

That's pretty concise, encodes to be able to identify similar puzzles, and will allow asymmetric puzzles. It's size will vary depending on number of hints in the puzzle. It would also be very easy to code (Java, C, etc). I'm liking this one.

@kwbMitel, I don't quite understand your substitution matrix, but I assume it's something similar to this encoding.

@Olaf, wow, I hadn't considered row swap (and of course column swap). But that does make complete sense that it doesn't actually alter the puzzle topologically. I'll have to roll this one around in my head for a bit.

OlafDoschke said:
Your encoding scheme does not fit its purpose, but that's also why you said you were coming here to ask for inspirations and ideas.

Actually my first encoding scheme does fit my purpose (as far as I can tell). It just seemed very clumsy and brutish, and produced a very large 'key'. I was looking to optimise it, or replace it with something more elegant or clever.

And my hidden reason for coming here was that I love the Puzzles for Programmers Forum and it's been a long time since anything was posted here to play with. [bigsmile]

 
@kwbMitel, I don't quite understand your substitution matrix, but I assume it's something similar to this encoding.

Yes and no.

Using your method, the encoding length will vary depending on how the spaces are distributed. A 40 clue puzzle where each clue does not have a neighbour would result in an 81 character encoding.

My method embeds the spaces in a single character in such a way that the number of characters will always match the number of clues given. When I encoded my example, I used the actual digits for encoding so that particular one is an exact match for the puzzle. It is just as easy to encode by order of digit found much like your original dot40 code

Here is an example of a matrix for encoding
[pre] Spaces before digit
0 1 2 3 4 5 6+
1st digit A J S a j s etc
2nd digit B K T b k t
3rd digit C L U c l u
4th digit D M V d m v
5th digit E N W e n w
6th digit F O X f o x
7th digit G P Y g p y
8th digit H Q Z h q z
9th digit I R - i r +[/pre]

I'm not sure how many columns may be necessary in this matrix but 9 seems like a reasonable number that would cover most puzzles

Using your first puzzle(s) as an example, it would encode as follows for non-specific numbers

[pre]
8 . . | . . 6 | 2 9 .
. 1 . | 3 5 . | . . 8
. 5 6 | 9 2 . | . 1 .
---------------------
. 8 . | . 6 2 | . 3 4
9 . 2 | . . . | 5 . 7
5 4 . | 8 7 . | . 2 .
---------------------
. 2 . | . 3 1 | 8 4 .
3 . . | . 9 4 | . 5 .
. 9 4 | 6 . . | . . 2
[/pre]

[pre]The 8 is the 1st digit and has 0 preceding spaces so that converts to an A
The 6 is the 2nd digit and has 4 preceding spaces so that converts to an k
The 2 is the 3rd digit and has 0 preceding spaces so that converts to an C
The 9 is the 4th digit and has 0 preceding spaces so that converts to an D
The 1 is the 5th digit and has 2 preceding spaces so that converts to an W
The 3 is the 6th digit and has 1 preceding spaces so that converts to an O
The 5 is the 7th digit and has 0 preceding spaces so that converts to an G
The 8 is the 1st digit and has 3 preceding spaces so that converts to an a
The 5 is the 7th digit and has 1 preceding spaces so that converts to an P
The 6 is the 2nd digit and has 0 preceding spaces so that converts to an B
The 9 is the 8th digit and has 0 preceding spaces so that converts to an H
The 2 is the 3rd digit and has 0 preceding spaces so that converts to an C
The 1 is the 5th digit and has 2 preceding spaces so that converts to an W
[/pre]
ETC...


AkCDWOGaPBHCW etc...

**********************************************
What's most important is that you realise ... There is no spoon.
 
Oh, awesome! That's really clever.

Mine was encoding to approximately 2x the number of hints (depending on leading/trailing spaces or not).

[bigsmile]

 
Yes, the scheme using more encoding characters obviously wins in it's length, but less characters can be compressed, too. Sam, your encoding wouldn't need any 0, so you could encode 1 space with 0, 2 with 1 etc and go up to 10, anyway the encoding of digit+spaces in one character is even better, but makes me wonder what chars you'd need to use for puzzles with vast spaces, you'd need something to denote spaces without any digit, so eg : might mean something like 6 spaces without a digit, more spaces following, so a :A would mean first digit after 6 spaces and :J first digit after 7 spaces or ::A first digit after 12 spaces, etc. This way you can stop at the chart you already have for up to 5 spaces.

Bye, Olaf.
 
Olaf, read through kwbMitel's encoding. It takes into account spaces and only results in a code with as many characters as the number of hints.

As far as puzzles with "vast spaces", they can never be too vast. According to the Wikipedia page...

[URL unfurl="true" said:
https://en.wikipedia.org/wiki/Sudoku[/URL]]...the fewest givens that render a solution unique—was proven to be 17 in January 2012 (confirmed in September 2013). A number of valid puzzles with 17 givens have been found for the standard variation without a symmetry constraint, by Japanese puzzle enthusiasts, and 18 with the givens in rotationally symmetric cells.

With 81 squares, and 17 hints, the maximum blank squares is 63. With those evenly distributed above and below the midpoint, due to symmetry, the maximum possible run of blanks would be 32. Like this ('H' for Hint).

[pre]. . . | . . . | . . .
. . . | . . . | . . .
. . . | . . . | . . .
---------------------
. . . | . . H | H H H
H H H | H H H | H H H
H H H | H . . | . . .
---------------------
. . . | . . . | . . .
. . . | . . . | . . .
. . . | . . . | . . .[/pre]

But that's not a solvable puzzle. There has to be some distribution of hints so that there aren't three blank rows on top and bottom.

I have seen puzzles with blank squares, and puzzles with individual blank rows or columns, but there is always a number of hints in that row/column group to allow you to solve the blanks. So I don't think you will ever encounter a very large number of contiguous spaces "in the wild".
 
Well, I have understood the schema very well.

It doesn't need to end here, but 5 spaces isn't much, is it? So yo surley would need some more columns. And while there are more than just big and small letters and digits you need 9 more characters for each further space you want to be able to encode. The overall encoding shouldn't be too unreadable, should it? Even if it's just a code you would generate to compare puzzles. The idea to use the digits to encode spaces only again would be feasable. Since you only need them to represent more than 5 digits (in the scheme so far) 0 could mean 6, up to 9 meaning 15 spaces. And if two digits are used they can of course represent all numbers up to 99, so you'd never need more.

It means some puzzles code would be longer than others, but does that matter?

Bye, Olaf.

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top