Thursday, August 30, 2007

Seed AI

Le grand assumption

The assumption of seed AI is this:
If we can make a program intelligent enough, a "seed" of intelligence, we can also make it gradually improve itself.
If intelligence can be expressed as a short formula (think Maxwell's equations or E = mc²), we might not need to make a seed. We will simply have to find that formula. In general, the No-Free-Lunch theorem suggests that no single method can be best across all possible problems, so there should always be scope for improvement, but there are nevertheless some promising paths that I will post about some other time.

Related to seed AI is the point where an AI can read and make sense of human text, such as Wikipedia, Principia Mathematica, etc. If we can reach that goal, an AI would quickly acquire superhuman cross-disciplinary knowledge, which in turn would help it to digest ever more advanced text. To get there, a program has to have plenty of the common sense that we all take for granted. Cyc is an ambitious, long-running project that tries to collect all this "common sense".

A superintelligent AI would be incredibly useful. Useful beyond your wildest fantasies.

A more intelligent program is likely harder to improve, but at the same time a more intelligent program is better at improving, so we can reasonably hope that the improvement process will continue indefinitely (or perhaps converge to a single point - the formula for intelligence), although it is hard to guess what the improvement curve will look like. Will the difficulty increase much faster than the capacity? No one knows. It is tempting to make an analogy with humans and note how hard it is for us to rewire the brain to make us fundamentally more intelligent. For most programs this is probably very different. A program is made to be modified: it is software and not, like our brains, firmware or wetware.

If we want to talk about improving programs, we have to define what it means to improve one's intelligence, and thus what it means to be intelligent. We want intelligent systems to be useful. Useful intelligence is, just like science, about prediction, planning and pattern recognition. These are all so intertwined as to be more or less the same thing.

Prediction

Given certain input, we want to predict what the outcome might be. It is nice if this prediction involves not only the most likely outcome, but also estimates of the probabilities of all the possible outcomes. Even better is if the predictor gives an indication of how certain it is about those probabilities.

If I roll a regular die, I am fairly sure that the probability of a 3 showing up is about 16.7%. Of course, the die might be damaged or otherwise unfair, or perhaps I miscalculated 1 / 6 or misunderstand the laws of probability, etc. Nevertheless, I am fairly certain. On the other hand, I estimate the probability of Sweden beating Brazil the next time they meet in soccer at about 10%, but I am fairly uncertain about that figure. Thus I should be cautious about acting on it, for example by not taking bets. I am, however, quite certain that I am uncertain about my last probability estimate. It is probably not very useful to continue this recursion further, neither for me nor for a program, so I'll be quite satisfied if my AI knows certainties concerning probabilities, but not certainties about certainties.
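One way to make "a probability plus a certainty about that probability" concrete is to describe each estimate with a Beta distribution, whose mean is the probability and whose spread shows how unsure we are. The following is only a sketch under assumptions: the pseudo-counts (1000 and 5000 for the die, 1 and 9 for the match) are invented for illustration and are not from any real data, and `beta_mean_std` is a hypothetical helper name.

```python
from math import sqrt

def beta_mean_std(successes, failures):
    """Mean and standard deviation of a Beta(successes, failures) distribution."""
    a, b = successes, failures
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, sqrt(variance)

# Assumed pseudo-counts: lots of evidence for the die, very little for the match.
die_mean, die_std = beta_mean_std(1000, 5000)   # "a 3 shows up"
match_mean, match_std = beta_mean_std(1, 9)     # "Sweden beats Brazil"

print(f"die:   p = {die_mean:.3f}, spread = {die_std:.3f}")
print(f"match: p = {match_mean:.3f}, spread = {match_std:.3f}")
```

The means match the 16.7% and 10% in the text. The match estimate has a much wider spread, which is the numerical version of "I am fairly uncertain about that figure".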

Two classic examples where prediction is useful are weather forecasts and the stock market.


Planning

Prediction is closely related to planning. One way of formalizing planning is to make an enormous tree, where each choice I can make is a branching point and every consequence, along with its probability, is also a branching point. In a complex world most of my millions of choices/actions will not have any bearing on me reaching a specific goal, so the tree gets infeasibly large. The first step is to quickly predict which paths might actually matter for reaching my goal, thus pruning the tree. Then I have to predict what the consequences of my actions are likely to be, making a model of the outside world. Now I have a tree where I can start searching for a solution, in other words make a plan.
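A tree with choice branches and probability branches can be evaluated by taking the best option at each choice and the probability-weighted average at each consequence. The sketch below is a minimal illustration of that idea only. The tuple layout, the names `expected_value` and `tree`, and the payoffs are all assumptions made up for the example, and the pruning step is left out entirely.

```python
def expected_value(node):
    """Evaluate a small decision tree.

    A node is one of:
      ("leaf", payoff)
      ("choice", [child, ...])                 # I pick the best child
      ("chance", [(probability, child), ...])  # the world picks a child
    """
    kind, content = node
    if kind == "leaf":
        return content
    if kind == "choice":
        return max(expected_value(child) for child in content)
    if kind == "chance":
        return sum(p * expected_value(child) for p, child in content)
    raise ValueError(f"unknown node kind: {kind}")

# Hypothetical example: a risky action (50% chance of payoff 10, else 0)
# versus a safe action with a guaranteed payoff of 4.
tree = ("choice", [
    ("chance", [(0.5, ("leaf", 10)), (0.5, ("leaf", 0))]),
    ("leaf", 4),
])

print(expected_value(tree))
```

Here the risky action is worth 5 on average, so it beats the safe 4. In a realistic world the tree is far too large to walk like this, which is why the pruning and modelling steps matter.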

A classic example of a planning problem is Towers of Hanoi. It is trivially easy to make a program that solves Towers of Hanoi, but it is harder to construct a general AI that, given only the rules of the game, works out how to solve it. You cannot just exhaustively search your decision tree, because Towers of Hanoi with 30 discs requires 2^30 - 1 = 1073741823 moves to complete. This means that the depth of the tree is about 10^9 and, given at least two paths on each level, it has at least 2^(10^9) nodes. That amounts to more than a 1 followed by 300 million zeroes - a ridiculously large number. The planner must reason about the effects of the rules and recognize the pattern for moving the discs.
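To illustrate both halves of that claim, here is a sketch of the "trivially easy" hand-written solver, together with the arithmetic behind the size of the search tree. The function names and peg labels are my own choices for the example. The digit count is computed with logarithms, since the number itself is far too large to write out.

```python
import math

def hanoi(n, source="A", target="C", spare="B"):
    """Yield the moves (from_peg, to_peg) that transfer n discs to target."""
    if n == 0:
        return
    yield from hanoi(n - 1, source, spare, target)   # clear the way
    yield (source, target)                           # move the largest disc
    yield from hanoi(n - 1, spare, target, source)   # stack the rest on top

def min_moves(n):
    """Number of moves the recursive solution uses for n discs."""
    return 2 ** n - 1

moves = list(hanoi(3))
print(len(moves), moves)

depth = min_moves(30)
print(depth)

# Decimal digits of 2**depth, i.e. the size of a binary tree of that depth.
digits = math.floor(depth * math.log10(2)) + 1
print(digits)
```

The solver already contains the pattern that a general planner would have to discover for itself. The last number printed is the length in digits of the node count, which comes out at over 300 million.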


Pattern recognition

Recognizing patterns is, among other things, the useful ability to spot that, given this, that follows more or less frequently. A neat way of deciding if you have spotted a pattern is to invoke Minimum Description Length or MDL. 10101010101010... can be described with the exact digits, or as a repeating pattern of 10s, or as alternating 1 and 0. Which one is chosen depends on what language you have chosen to express your pattern in. For longer patterns it makes less and less difference what language you choose. The same reasoning applies to, for example, a picture. If we have a completely black 1000 x 1000 pixel square with a white 500 pixel (in diameter) circle in the middle, then that description is much shorter than actually encoding the image pixel for pixel. We have recognized a pattern.

Notice the close relationship between pattern recognition and compression.
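A rough way to see this relationship is to hand both a patterned string and patternless noise to an off-the-shelf compressor. This is only a sketch: zlib is a stand-in that gives an upper bound on the description length in one particular "language", not a true MDL measure, and `compressed_size` is a helper name made up for the example.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of data."""
    return len(zlib.compress(data, 9))

patterned = b"10" * 5000       # 10,000 bytes of "101010..."
noise = os.urandom(10000)      # 10,000 bytes with no intended pattern

print(len(patterned), "->", compressed_size(patterned))
print(len(noise), "->", compressed_size(noise))
```

The patterned input shrinks to a tiny fraction of its size, while the noise does not shrink at all. The compressor has, in a crude way, recognized the pattern.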

Intelligence test

Constructing a true intelligence test that can be executed reasonably fast would be very useful in general AI research. You have to be careful when designing such a test, because if it is too simple you will end up with an AI that is specialized in solving exactly your test and nothing else.

If we had such a test, a fairly simple, but very interesting, experiment could be made.
  1. Start with a program that produces random output. The seed!
  2. Measure its intelligence. This producer of random noise is now your first and most intelligent program.
  3. Interpret the currently best program's output as new programs and measure the intelligence of these programs. Give this intelligence as feedback to the generating program.
  4. Whenever a program that is more intelligent than the previous most intelligent program is found, use it as the new generator to search for even more intelligent programs.
You might need to add some precautions so that you do not enter an evolutionary dead end, for example by letting different promising generators run in parallel, but the above points are the basic gist of it. This will let you find out how much more time it takes for each successively more intelligent program to construct an even more intelligent program. If you are very, very lucky and have constructed your intelligence test very well, this might even suffice as the Seed.
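The shape of that loop can be sketched in a few lines, with heavy caveats. Everything here is a toy stand-in and an assumption of mine: a "program" is just a list of numbers, `intelligence_test` is a hypothetical score (distance to a made-up `TARGET`), and `generate` replaces "interpret the output as a new program" with a small random change to the current best. The feedback to the generator and the parallel generators are not modelled. What the sketch does show is the bookkeeping of steps 1 to 4 and how to record how long each improvement takes.

```python
import random

TARGET = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical "ideal behaviour" for the toy test

def intelligence_test(program):
    """Toy stand-in score: higher is better, 0 is a perfect match."""
    return -sum(abs(p - t) for p, t in zip(program, TARGET))

def generate(program, rng):
    """Toy stand-in generator: the best program emits an altered copy of itself."""
    child = list(program)
    i = rng.randrange(len(child))
    child[i] += rng.choice([-1, 1])
    return child

def search(steps=2000, seed=0):
    rng = random.Random(seed)
    best = [rng.randrange(10) for _ in TARGET]   # step 1: a random seed program
    best_score = intelligence_test(best)         # step 2: measure it
    history = [(0, best_score)]                  # (step found, score)
    for step in range(1, steps + 1):
        candidate = generate(best, rng)          # step 3: new program from the best one
        score = intelligence_test(candidate)
        if score > best_score:                   # step 4: promote the better program
            best, best_score = candidate, score
            history.append((step, best_score))
    return best, best_score, history

best, best_score, history = search()
print(best, best_score)
print(history)
```

The gaps between the step numbers in `history` are the toy version of "how much more time it takes for each successively more intelligent program". With a real intelligence test, both the test and the generator would of course be vastly harder to build.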

In coming posts I will describe what the mathematically perfect predictor looks like and what the mathematically perfect planner looks like. They are, at least on the surface, surprisingly dissimilar.

1 comment:

cognomad said...

Thanks for the comment on my knol, David!
You're right, it sounds very similar on a high level, & I am sure there are many people who'd agree with the definition. But I don't know of anyone who used it to derive a universal, low-level, quantitative criterion to select inputs & algorithms. The key is to start from the beginning: raw sensory inputs, & "test" their predictive value, in the process discovering more & more complex patterns. That's what scalability is all about, if you can't evaluate pixels, it'll be super-exponentially more difficult to start from more complex data. That's why I think Cyc, NLP, & high-level approaches in general are hopeless for AGI.
I am sorry, but your "Intelligence test" idea, besides being entirely hypothetical & presumably externally administered, has it exactly backwards. Just like many Algorithmic Learning approaches, you want to generate patterns & algorithms, instead of discovering them in the real world. Quite simply, we predict from experience; these patterns & algorithms will have *no* predictive value beyond mere chance, unless they're derived from the experience. Notice that the difference between patterns & algorithms is strictly in their origin: the former are discovered & the latter are "invented".