THE NP-COMPLETE ARCADE

Here, for your amusement and experimentation, is a set of applets that are based on well known NP-complete problems.

click image to go to applet page
Click on the image to go to the applet.
click image to go to applet page

The Million Dollar Sudoku Puzzle

Had you solved this 'super fiendish' puzzle, and posted your solution to The Times, you might have got yourself a bottle of bubbly. To earn that, you would have used a fair number of the deductive processes that you've developed over the last few years

Alternatively, you could have used something like the Java applet shown on the left. It's a non-deductive sudoku solver .

What I mean is that the program doesn't use any sudoku strategies. It simply looks for a permutation that fits the basic rules, and the given numbers.

The generalised form of this puzzle, solved in this way, is one of a steadily growing number of problems classed as NP-Complete that together constitute the PvsNP problem. I won't attempt to explain this. There are articles about it all over the net. You'll find a good introduction in the Wikipedia. If you want the full works, read Stephen Cook's statement of the problem. It was he who formulated the question. 

The Clay Institute of Mathematics are offering a prize of $1,000,000 to anyone who can solve the PvsNP problem. It's one of the seven Millenium Problems.

The world of technology would give a great deal more. 

Regardless of whether or not the PvsNP question is answered, the theoretically hard NP-complete problems in its domain are easily solvable in many real-world circumstances.

Sudoku puzzles are created in the expectation that they will be solved. Generally, there is only one solution. Normally they are solved by deduction. They can also be solved, as they are here, using similar methods to those used to solve the Travelling Salesman Problem. 

If you want a solution to a prize-winning sudoku puzzle, there are faster and more reliable solvers freely available elsewhere on the web. This applet is more fun. Its a bit like fruit machine.

One thing that's said about NP-complete problems is that if you've solved one you've solved them all. There are some that say that sudoku is the one to concentrate on.

Misterioso

Some years ago, I wrote a Java application to browse my collection of jazz recordings. One of the things I wanted do with it, was compile collections of tracks that played for exactly the amount of time that I specified. That was in the days of cartridges, when you aimed to fill each side exactly so that there was no need to rewind when you turned the tape over. 

When I tested the program, I was amazed when it came up with a set of tracks to exactly fit the tape. Not because the program was particularly complicated, but because I was testing it with just 20 tracks.

I've put the algorithm into the applet shown below. Click on the image if you want to play with it.

 

click image to go to applet page

 

What I didn't know then, was that the problem of selecting a subset of numbers from a set, to exactly total a target value is one of the best known PvsNP problems. It is the subset sum problem. See wiki again.

So does that get me a million dollars? Unfortunately not!The method it uses is a random process. By its nature, its too 'hit and miss' to qualify. However, for my purposes, it works perfectly.

Sudoku 17

I've run my sudoku applet with one of Gordon Royle's Minimum Sudokus. These have a single solution with just 17 givens (or hints, or clues, whatever you want to call them). As my program doesn't use any sudoku strategies, I had expected it to stupidly chomp through it in a reasonably short time........ But it didn't.

I was disappointed at first, but now I'm just a bit amazed. It's not because of the single solution (not a problem up until now). Nor is it the scarcity of the givens (see sad puzzle to the right). It just seems to have found it too difficult.

If you want to see the applet struggle with a minimum sudoku, go to my applet page. It's the second applet down.

click image to go to applet page
click image to go to applet page

Sit Down You're Rocking The Boat!

This little trick really blows my mind. Dressed up as a clever balancing act, it is simply a special case of the subset sum problem. It's called the partition problem.

For this demonstration, you have 20 sealed crates to load into the hold of a ship. Their weights are random numbers in the range 1-50,000 units. You have to flip them, port and starboard, so that you have exactly the same weight (or a difference of 1 unit if the total weight is an odd number) on either side.

What really amazes me is that 20 randomly generated numbers of this size, are so amenable to being partioned in this way.

The applet allows you to do the partitioning manually if you feel up to it. It also allows you to alter the numbers that you've been given.

 

A Question Of Balance

This next applet will help you find some answers to questions that no doubt occurred to you while you were playing with the 'StevedoreStomp' applet above. What happens with larger numbers of crates, and bigger crates.

With this one you can increase the number of crates to 1 million, and the maximum crate size to 1 billion. You can run for as long as you're prepared to keep pressing the 'CONTINUE' button that pops up every 10 million iterations. You can also play around with the 'STOMP' factor, that controls the degree to which applet is focussed on its goal.

Here are some statistics to keep in mind if the applet doesn't seem to be getting anywhere......

Number of crates Time required to review all the ways of dividing them between the 2 holds
20 1 second
30 20 minutes
40 12 days
50 35 years
60 37,000 years
70 37,000,000 years
80 Twice the age of the universe
1,000,000 Don't bother thinking about it!

click image to go to applet page

That's not to say that balancing the cargo gets harder simply as you increase the number of crates. In fact, the number of crates has very little direct effect on how long it takes to balance the cargo. (see however, last paragraph in this section, where I'm talking about stomp).

To balance a cargo of 1 million crates in the range 1 to 50,000 takes a few seconds.....   around the same time as it does for 20.

It is the maximum crate weight that matters. To balance 80 crates with max. weight 50,000 takes seconds. To balance the same number with max.weight 1 billion takes around half an hour.

The weights of the crates, that are randomly generated for this applet, range from 1 unit to the maximum number of units that you specify. As the range is increased, the difference between one crate and the next in weight also increases.  It is these differences that have to be combined to cancel each other out. For a given range of differences, you need a minimum number of them to do the trick.

So long as you have a sufficient number of crates to make it almost sure  that an exact balance is obtainable, the number of crates appears to be irrelevant.

The table below shows results from a large number of runs, using an applet similar to the one here....but automated for multiple runs. It shows the maximum crate weight that can be exactly balanced without a single failure in the series of runs for each Number of Crates. I started out with 5,000 runs each, and cut it down to 1,000 after I'd passed the 25-crate mark.

Minimum number of crates

Half the number of possible arrangements

Maximum crate weight 

Average number of iterations

n c=2^(n-1) m i
20 524,288 15,000 31,835
21 1,048,576 35,000 72,669
22 2,097,152 69,000 142,112
23 4,194,304 140,000 308,916
24 8,388,608 283,000 631,578
25 16,777,216 564,000 1,251,452
26 33,554,432 1,128,000 2,651,895
27 67,108,864 2,267,000 5,291,363

There's a wide variation in the number of iterations it takes the applet to balance a cargo....even with successive runs on the same cargo. The figures in the columns 3 and 4 are a bit wobbly, but I think it's clear enough how they progress. As n is incremented by 1, the observed values of m and i are doubled. The calculated value of c is also doubled.  The value of m is approximately c/32 or 2^(n-5), and i is approximately m*2 or c/16 or 2^(n-4).

Using these relationships, we can say that with 80 crates we can balance a maximum crate weight of 37,778,931,862,957,161,709,568 , and that it would take, on average, double that number of iterations to do so.

If we reckon that we can work through 1,000,000 iterations per second, this would take 2,395,924,141 years.

Looking at it one way, we could say that taking this long to shunt 80 crates around to get them balanced, has all the hallmarks of an NP-complete problem.

Looking at it another way, if we had a applet that just counted up to the maximum crate weight, it would work through around half of those iterations. We wouldn't regard that as a NP-complete problem.

So, in this case, is the applet concerned with working through the combinations of 80 crates, or with balancing large numbers, to get a successful result? Are these necessarily the same thing?

A 23-digit number is not that impressive to look at, but it's a hell of a lot of beans.

I've written another version of the applet that allows you to set a minimum crate weight. I think this is is a bit more realistic. It allows you to work with big crates that vary relatively little in weight.  The closer you set the minimum to the maximum, the more you apply the additional contraint that the number of crates on each side is the same. The amounts to be balanced are the excesses of the crate weights over the minumum, so the minimum number of crates and the iterations are reduced accordingly. Generally, the need to have the same number of crates on each side, is not a problem, but there will be limiting cases where it makes a difference.

STOMP

The stomp value is the percentage number of regressive changes that are disallowed. All the above trials were carried out with a value of 50. At that value, the number of crates has very little effect on the resolution time.

The stomp value itself has only a moderate effect on the running time........when the number of crates is small. With large number of crates, it's more evident. Here are the results of a series of runs (1000 per stomp value) for a maximum crate weight of 50,000.

Stomp Value Average iterations with 23 crates Average iterations with 100 crates Average iterations with 1,000 crates Average iterations with 1,000,000 crates
0 149,169 266,192 893,861 24,556,773
10 133,687 206,081 403,299 518,989
20 123,461 167,718 243,180 259,776
30 110,188 133,769 156,932 164,251
50 98,227 105,568 107,117 117,066
70 95,135 93,992 94,019 90,660
80 98,436 86,769 82,210 79,539
90 117,183 84,006 74,383 75,111
100 fails fails fails 79,432

 

Reality Check

Nothing I've said in the above section makes the theoretical partition problem any easier than it's stated to be.  I'm not addressing the PvsNP question here, I'm merely trying to discover the limits to the practical application of the basic random algorithm that I'm using.

In the real world the the crate weights would be input with their crate numbers, rather than generated by a random number generator as here. The output would be a list of crate numbers and their locations*.

If your cargo is, or looks to be, a normal set of crate weights in the ranges that are covered by these applets, the required result should be produced in seconds, or minutes, or an hour at most, depending on the crate weight range.

If process doesn't produce an exact balance in the expected time, then there probably isn't one. This can happen when you have more than enough crates, but most likely it's an indication that you haven't.

If the difference is small, and there are sufficient numbers of crates, it is worth repeating the operation a few times. Sometimes the process will take a lot longer than the average time. Within a few reruns it should finish in less than the average time (if there's a solution).

If it comes up with the same unbalanced result each time, it has probably got the best result available. The difference should be small.

If the difference is large, then there's almost certainly an anomaly in the data. If this is not easily spotted, you can cut your data up into checkable chunks**, and balance each of these. You'll probably only find a single bad one. More than one chunk of quirky data would reduce its quirkyness.

One obvious cause of imbalance would be a single crate that weighed more than all the others put together. This could be a data error. If you had one of these you should probably get your data entry controls tightened up before going any further. The balancing operation is very tolerant. With a lot of crates it is very difficult to come up with a set of crates that won't balance....even when the data is anomalous...and disasterously wrong!

If you don't get an exact balance, and it's important to know whether it's possible, the table at the beginning of this section will give you some idea of how long it will take you to find that it is not.

* You could have more than 2 locations, and relationships that were other than straight equality, without making the problem necessarily much more complicated. The method works in as well with bin packing as it does with partitioning. Even more complicated arrangements such as those sorted by sudoku applet would not be out of the question.

**Splitting the data into chunks can also be useful even when your data has no quirks. If you had a million crates you might not want to be too fussy about the location of the whole lot. You could set aside 100 crates that you would later use to create an asymetric balance to offset the imbalance of the roughly balanced 999,900.

 

To be continued......

contact: hagarain@aol.com