Friday, December 21, 2007

Why planning is hard

After my last post about planning, I thought some more about the issue and had something close to an epiphany.

When you plan, in the solitaire sense, you need rules governing what moves are legal - transformation rules. If you treat these rules as black boxes, understanding them only by playing around with positions and seeing how they behave, you can only do so much. An important rule might be usable extremely rarely, but nevertheless be the key to success if you specifically aim to reach a position where it is applicable. This means that you might miss how important a rule (or an exception to a rule) is when just "black-boxing" it, because its usefulness or purpose might never come up.

Even if you have no such rare rules in your system, the best you can hope for when analyzing a system by black-boxing is to formulate your own internal rules for how the system seems to behave.

Thus, the reason planning is hard is that you need to be able to analyze and understand code in order to understand transformation rules in general, and understanding code is hard. Whatever system you are planning for, you need to understand when you may take actions and what those actions do, i.e. understand its transformation rules.

A small prediction

The reason good planning is key to analyzing code and proving things in formal systems is that you need to understand code in order to plan. Thus, I postulate, when something can analyze code better than I can, it will quickly learn to do everything else intelligence-based better than I can. The implication probably works both ways, so the first program that is thoroughly smarter than I am will most likely have its foundation laid upon the ability to reason about code.

Wednesday, December 12, 2007

Deceptively simple game

I would pay a handsome sum (say $1 million, if I could raise it) for a program that could do the following well-defined, seemingly simple task.

General solitaire solver

Take as input a list of rules for a solitaire-like game. The rules are deterministic transformation rules, defining which moves are legal given a certain position. The rules will be given in whatever Turing-complete language the solver likes - for example, a simple Scheme dialect without side effects, or a subset of x86 machine code.

As long as it solves the task, the solver is free to treat the rules as black boxes that take one position and output a, possibly empty, list of potential positions.

The solver then takes an initial position as its second input and one or more target positions as its final input. In fact, to make it more general, it takes a function that tells whether a position is a target or not.

As output, I want a sequence of transformations that leads from the initial position to a target. It does not have to be the shortest sequence, just a sequence. Also, I want the answer reasonably fast - at least as fast as I could solve it myself.
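
To make the interface concrete, here is a minimal sketch in Python of what the contract could look like, with plain breadth-first search standing in for the actual, much smarter, solver. The names solve, rules and is_target are my own illustration, not part of any specification:

    from collections import deque

    def solve(rules, initial, is_target):
        """rules: a list of functions, each mapping a position to a
        (possibly empty) list of successor positions. Returns a list
        of (rule_index, position) steps, or None if no target is
        reachable. Plain BFS - the naive baseline a real solver must beat."""
        parent = {initial: None}  # position -> (previous position, rule index)
        frontier = deque([initial])
        while frontier:
            pos = frontier.popleft()
            if is_target(pos):
                path = []  # walk back through parents to recover the moves
                while parent[pos] is not None:
                    prev, rule_index = parent[pos]
                    path.append((rule_index, pos))
                    pos = prev
                return list(reversed(path))
            for i, rule in enumerate(rules):
                for nxt in rule(pos):
                    if nxt not in parent:  # positions must be hashable here
                        parent[nxt] = (pos, i)
                        frontier.append(nxt)
        return None  # no sequence of moves reaches a target

Of course, blind BFS is exactly what the money is not for; the point is a solver that does better than this on problems where the search space explodes.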

Extra features

While I would be very happy with just the above, here are some extra features that would be nice.

  • Instead of a binary target function, let me use a continuous target function, and output a sequence that reaches an end position with as good a score as possible.
  • Accept one or more opponents. This would be useful for playing games - go, shogi, chess, etc. - but apart from that would probably be a step towards the stochastic feature below.
  • Allow the transformation rules to behave in a stochastic/probabilistic manner.

Why?

I have no particular desire to solve solitaire automatically, so why would I want a generalized solitaire solver? Well, if you can input the rules for solitaire, you can also input the rules for Towers of Hanoi, which I used as an example of difficult planning in another post. Suddenly you can solve a whole range of reasoning problems: mathematical proofs, reasoning about programs and all sorts of interesting and important stuff.
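
To illustrate, the Towers of Hanoi rules fit this input format directly. A sketch reusing the hypothetical solve function above; the encoding of a position as a tuple of pegs is my own choice:

    def hanoi_rules(n_pegs=3):
        """One transformation rule per (source, destination) peg pair.
        A position is a tuple of pegs; each peg is a tuple of disc
        sizes from bottom to top."""
        def move(src, dst):
            def rule(pos):
                if pos[src] and (not pos[dst] or pos[src][-1] < pos[dst][-1]):
                    pegs = list(pos)
                    pegs[dst] = pegs[dst] + (pegs[src][-1],)
                    pegs[src] = pegs[src][:-1]
                    return [tuple(pegs)]
                return []  # illegal move: no successor positions
            return rule
        return [move(s, d) for s in range(n_pegs)
                           for d in range(n_pegs) if s != d]

    # Three discs, all starting on the first peg:
    initial = ((3, 2, 1), (), ())
    goal = ((), (), (3, 2, 1))
    plan = solve(hanoi_rules(), initial, lambda pos: pos == goal)
    print(len(plan))  # 7 moves for 3 discs, i.e. 2^3 - 1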

It is interesting to think about the problem from the point of view of solving solitaire or some other simple one-man game. I think it makes it less intimidating.

Friday, September 7, 2007

The optimal IQ test

The hardest part for me, when thinking about seed AI and optimal optimization, is coming up with a good fitness (IQ) test. Since you need the test to run fast, you end up testing that the algorithm can get somewhere fast, i.e. checking only the extreme beginning of a performance curve that ultimately must continue to be good for many thousand times longer. What we want to measure is something like the Big O performance of the algorithm in the limit, not what it looks like during the first second of its life. Another problem is that we want the intelligence to be as general as possible and not over-specialized on solving a few test cases.

A fitness test of a fitness test

Recently I got a new idea of what constitutes a good IQ test. Our current approach to seed AI is about developing a really good programmer that can program better versions of itself. A good fitness test is one where a program testing well on it correlates strongly with that program being able to generate new programs that get even better scores. Not only is this a necessary criterion; it might be sufficient. Any test under which a program is likely to produce new programs that perform well (strictly: reach a new global optimum) on the test might be a good fitness test of what we are after. The test that produces new Masters (see this post) most frequently might be the best test. Getting the most new Masters over time also ensures that the test does not take unnecessarily long to run.

I am not completely sure, but we might need to force all tests to start with a kernel of an intelligence test (compress this string, predict this numeric sequence, something like that), just to set it off in the right direction and eliminate trivial solutions, like giving all programs a random IQ from some distribution. The trivial solution of giving all programs a perfect IQ would not be a candidate, because no new globally optimal solutions would be found, so no new Masters would come, and thus that fitness test would not test well on the fitness test test (am I making sense?).
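
As a hedged sketch of the idea, assuming the Master/Challenger search is wrapped in a hypothetical function run_search(test, budget) that returns the list of promotions to Master it produced, the meta-fitness of a candidate test could simply be its promotion rate:

    def meta_fitness(candidate_test, budget_seconds):
        """Score a fitness test on the fitness test test: the number of
        new Masters (programs that reached a new global optimum on
        candidate_test) found per second of search under that test."""
        promotions = run_search(candidate_test, budget_seconds)  # hypothetical
        return len(promotions) / budget_seconds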

Having a fitness test of our fitness test suggests that we can start by evolving a good test, or even more beautifully, co-evolve solution and test.

Perhaps I am just dreaming, but it sure would be a beautiful algorithm if it worked...

Thursday, September 6, 2007

Coincidence?

I once read a short story about the creation of the world's most powerful computer. In essence, each time they tried to turn it on, they had some minor misfortune: a power outage, the maid accidentally tripping on, and unplugging, the power cord, etc. The highly technical twist at the end was that since we live in a Multiverse, everything that can happen does happen, in a separate universe. It turns out that the computer was so advanced (or something) that it turned into a black hole when switched on, destroying all life. Since the observers could only exist in the universes where the computer remained switched off, they experienced these "coincidences" that protected them.

A database of all human knowledge

When I read up a bit on Cyc the other day, I came upon a competing project that I myself once added some mindpixels to.
Mindpixel was a web-based collaborative artificial intelligence project which aimed to create a database of millions of human validated true/false statements, or probabilistic propositions.
Unfortunately, the project is now defunct, since its founder Chris McKinstry committed suicide on January 23, 2006.

Well, never fear, because from the Mindpixel page on Wikipedia, we learn that Open Mind Common Sense is a similar project, run by MIT, whose goal is to build a large common sense knowledge base from the contributions of many thousands of people across the Web.

Unfortunately, that project is also stalling, since Push Singh, who was slated to become a professor at the MIT Media Lab and lead the Commonsense Computing group in 2007, committed suicide on Tuesday, February 28, 2006 - just a month after the other visionary of web knowledge, Chris McKinstry.

Let the unreasonable conspiracy theories commence.

Wednesday, September 5, 2007

12:50, press Return

The deed is done.

My friend Nils and I made a "sprint" last night, in which we finished the first version of our seed AI.

First we made a simple IQ-function that tests how well a program (a Program Generator or PG) can generate new programs (leaves) from feedback on how close a leaf is to what we want.

A PG that receives the best IQ so far gets a chance to generate new PGs; in effect, it becomes a Program Generator Generator. We call this state a Challenger. When a Challenger generates a new PG with the best IQ so far, it has verified that not only does it have good IQ, it can also produce other programs with good IQ, and it is thus promoted to the status of Master (and the smart PG gets to be Challenger).
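
A minimal sketch of the promotion logic (the function names iq and generate are placeholders, not our actual code):

    def tournament_step(challenger, best_iq, iq, generate):
        """One round of the Challenger/Master tournament.
        generate(pg) runs a PG and collects the programs it emits;
        iq(pg) scores a program on the fitness test."""
        for candidate in generate(challenger):
            score = iq(candidate)
            if score > best_iq:
                # The Challenger produced a new global best, proving it
                # can make smarter programs: it is promoted to Master,
                # and the smart candidate becomes the new Challenger.
                return candidate, score, challenger
        return challenger, best_iq, None  # no promotion this round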

The programs are generated and run in a circular buffer under a virtual machine, where all sequences of integers are valid programs and no operator can throw exceptions. Such a VM is much slower than machine code, but the process gets faster than if it were running on bare-bones x86, because on x86 (or another architecture) most byte sequences are meaningless and will throw exceptions, which are slow to process. A PG that is good enough (for example, a human) to understand how to write code without generating (many) exceptions would theoretically run faster on x86, but our current primitive PGs benefit from a virtual environment.
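
To illustrate the idea, here is a toy reconstruction (not our actual instruction set) of a VM in which every sequence of integers is a valid program and no operation can fault:

    def run_vm(program, steps=1000, mem_size=256):
        """Toy VM: program is a non-empty list of arbitrary integers.
        Code and data share one circular buffer, every integer decodes
        to some opcode, and every operand is wrapped modulo the buffer
        size, so nothing can throw an exception."""
        mem = [program[i % len(program)] for i in range(mem_size)]
        pc, acc, out = 0, 0, []
        for _ in range(steps):                   # hard step limit, no halting needed
            op = mem[pc] % 5                     # every integer is some opcode
            arg = mem[(pc + 1) % mem_size] % mem_size
            if op == 0:   acc = (acc + mem[arg]) % 256      # add
            elif op == 1: mem[arg] = acc                    # store
            elif op == 2: acc = mem[arg]                    # load
            elif op == 3: out.append(acc)                   # output
            elif op == 4: pc = arg - 2                      # jump (wraps below)
            pc = (pc + 2) % mem_size
        return out

Every opcode and operand is taken modulo something, so the worst a garbage program can do is waste its step budget - exactly the property that makes searching over random programs cheap.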

Anyway, we wrote the code, pressed Enter (the title is a reference to the nice movie Pi), and voilà, our random-generating seed started finding programs more intelligent than itself - Challengers. After a while (Nils calls it 15 seconds), a Challenger managed to become Master, and after a longer while the Master produced a Challenger that later became the third-generation Master. Spectacular!

Now we just need a better IQ test and a way to inspect the generated programs! Well, we also need tons and tons of hardware. This is the sort of task that could happily use up Google's entire computer armada for a year and still benefit from more. Hmm.. perhaps if I ask them nicely..

Thursday, August 30, 2007

Seed AI

Le grand assumption

The assumption of seed AI is this:
If we can make a program intelligent enough, a "seed" of intelligence, we can also make it gradually improve itself.
If intelligence can be expressed as a short formula (think Maxwell's equations or E = mc²), we might not need to make a seed; we would simply have to find that formula. In general, the No-Free-Lunch theorem implies that there must always be scope for improvement, but there are nevertheless some promising paths that I will post about some other time.

Related to seed AI is the point where an AI can read and make sense of human text, such as Wikipedia, Principia Mathematica, etc. If we can reach that goal, an AI would quickly acquire superhuman cross-disciplinary knowledge, which in turn would help it to digest ever more advanced text. To get there, a program has to have plenty of common sense that we all take for granted. Cyc is an ambitious, long-running, project that tries to collect all this "common sense".

A superintelligent AI would be incredibly useful. Useful beyond your wildest fantasies.

A more intelligent program is likely harder to improve, but at the same time a more intelligent program is better at improving, so we can have reasonable hope for the improvement process to continue indefinitely (or perhaps converge to a single point - the formula for intelligence), although it is hard to guess what the improvement curve will look like. Will the difficulty increase much faster than the capacity? No one knows. It is tempting to make an analogy with humans and note how hard it is for us to rewire the brain to make ourselves fundamentally more intelligent. For most programs this is probably very different. A program is made to be modified; it is software and not, like our brains, firmware or wetware.

If we want to talk about improving programs, we have to define what it means to improve one's intelligence, and thus what it means to be intelligent. We want intelligent systems to be useful. Useful intelligence is, just like science, about prediction, planning and pattern recognition. These are all so intertwined as to be more or less the same thing.

Prediction

Given certain input, we want to predict what the outcome might be. It is nice if this prediction involves not only the most likely outcome, but also estimates of the probabilities of all the possible outcomes. Even better is if the predictor gives an indication of how certain it is about the probabilities.

If I roll a regular die, I am fairly sure that the probability of a 3 showing up is about 16.7%; of course the die might be damaged or otherwise unfair, or perhaps I miscalculated 1/6 or misunderstand the laws of probability, etc. Nevertheless, I am fairly certain. On the other hand, I estimate the probability of Sweden beating Brazil the next time they meet in soccer at about 10%, but I am fairly uncertain about that figure. Thus I should be cautious about acting on it, for example by not taking bets. I am, however, quite certain that I am uncertain about my last probability estimate. It is probably not very useful to continue this recursion further, neither for me nor for a program, so I'll be quite satisfied if my AI knows certainties concerning probabilities, but not certainties about certainties.
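
One standard way to capture both the probability estimate and the certainty about it (my choice of formalism here, not something from the post) is a Beta distribution over the unknown probability: its mean is the estimate, and its spread is how unsure we are about it:

    from math import sqrt

    def beta_summary(successes, failures):
        """Mean and standard deviation of a Beta(successes+1, failures+1)
        posterior - the estimate and how uncertain we are about it."""
        a, b = successes + 1, failures + 1
        mean = a / (a + b)
        var = (a * b) / ((a + b) ** 2 * (a + b + 1))
        return mean, sqrt(var)

    # 100 rolls of a die, 17 threes: estimate ~0.18 with a tight spread.
    print(beta_summary(17, 83))
    # 10 Sweden-Brazil matches, 1 Swedish win: estimate ~0.17 as well,
    # but with a much wider spread - similar probability, less certainty.
    print(beta_summary(1, 9))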

Two classic examples where prediction is useful are weather forecasts and the stock market.


Planning

Prediction is closely related to planning. One way of formalizing planning is to make an enormous tree, where each choice I can make is a branching point and every consequence, along with its probability, is also a branching point. In a complex world, most of my millions of choices/actions will not have any bearing on whether I reach a specific goal, so the tree gets unfeasibly large. The first step is to quickly predict which paths might actually have significance for reaching my goal, thus pruning the tree. Then I have to predict what the consequences of my actions are likely to be, making a model of the outside world. Now I have a tree where I can start searching for a solution - in other words, make a plan.

A classic example of a planning problem is Towers of Hanoi. It is trivially easy to make a program that solves Towers of Hanoi, but it is harder to construct a general AI that, given the rules of the game, solves it in general. You cannot just exhaustively search your decision tree, because Towers of Hanoi with 30 discs requires 2^30 - 1 = 1073741823 moves to complete. This means that the depth of the tree is about 10^9 and, given at least two branches at each level, the tree has on the order of 2^(10^9) nodes. That amounts to more than a 1 followed by 300 million zeroes - a ridiculously large number. The planner must reason about the effects of the rules and recognize the pattern for moving the discs.
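
The trivially easy special-purpose program is the classic recursion (standard textbook code, not something from the original post):

    def hanoi(n, src, dst, spare, moves):
        """Move n discs from src to dst in exactly 2^n - 1 moves - no search."""
        if n == 0:
            return
        hanoi(n - 1, src, spare, dst, moves)  # clear the n-1 smaller discs away
        moves.append((src, dst))              # move the largest disc
        hanoi(n - 1, spare, dst, src, moves)  # stack the smaller discs back on top

    moves = []
    hanoi(3, 'A', 'C', 'B', moves)
    print(moves)  # 7 moves; with n = 30 the same code yields 2^30 - 1 moves

The general planner has no such shortcut handed to it; it must discover this recursive pattern from the rules alone.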


Pattern recognition

Recognizing patterns is, among other things, the useful ability to spot that given this, that follows more or less frequently. A neat way of deciding whether you have spotted a pattern is to invoke Minimum Description Length, or MDL. 10101010101010... can be described by listing the exact digits, or as a repeating pattern of 10s, or as alternating 1s and 0s. Which description is chosen depends on what language you have chosen to express your pattern in; for longer patterns it makes less and less difference which language you chose. The same reasoning applies to, for example, a picture. If we have a completely black 1000 x 1000 pixel square with a white circle, 500 pixels in diameter, in the middle, then that description is much shorter than actually encoding the image pixel by pixel. We have recognized a pattern.

Notice the close relationship between pattern recognition and compression.
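
A rough way to see this relationship in practice (off-the-shelf compression as a stand-in for a real MDL calculation):

    import os, zlib

    patterned = b'10' * 5000        # "101010..." - a short description exists
    noise = os.urandom(10000)       # no pattern for the compressor to find

    print(len(zlib.compress(patterned)))  # a few dozen bytes
    print(len(zlib.compress(noise)))      # about 10000 bytes - incompressible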

Intelligence test

Constructing a true intelligence test that can be executed reasonably fast would be very useful in general AI research. You have to be careful when designing such a test, because if it is too simple you will end up with an AI that is specialized in solving exactly your test and nothing else.

If we had such a test, a fairly simple, but very interesting, experiment could be made.
  1. Start with a program that produces random output. The seed!
  2. Measure its intelligence. This producer of random noise is now your first and most intelligent program.
  3. Interpret the currently best program's output as new programs and measure the intelligence of these programs; give this intelligence as feedback to the generating program.
  4. Whenever a program that is more intelligent than the previous most intelligent program is found, use it as the new generator to search for even more intelligent programs.
You might need to add some precautions so that you do not enter an evolutionary dead end, for example by letting different promising generators run in parallel, but the above points are the basic gist of it. This will let you find out how much more time it takes for each successively more intelligent program to construct an even more intelligent one. If you are very, very lucky and have constructed your intelligence test very well, this might even suffice as the Seed.
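
A minimal sketch of those four steps, assuming some intelligence test measure_iq and an interpreter emit_programs that reads a program's output stream as candidate programs (both hypothetical names):

    import random

    def seed_experiment(measure_iq, emit_programs, rounds=1000):
        """Steps 1-4 above: start from random noise and climb.
        (The feedback channel to the generator is omitted in this sketch.)"""
        best = [random.randrange(256) for _ in range(64)]  # step 1: the seed
        best_iq = measure_iq(best)                         # step 2
        for _ in range(rounds):
            for candidate in emit_programs(best):          # step 3
                iq = measure_iq(candidate)
                if iq > best_iq:                           # step 4: new generator
                    best, best_iq = candidate, iq
        return best, best_iq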

In coming posts I will describe what the mathematically perfect predictor looks like and what the mathematically perfect planner looks like. They are, at least on the surface, surprisingly dissimilar.