Marcus Hutter – Universal Artificial Intelligence

Universal Artificial Intelligence

Last year I did a series of interviews with Marcus Hutter while he was down in Melbourne for the Singularity Summit Australia 2012.

Marcus will also be speaking at the Science, Technology & the Future conference in Melbourne, Australia, on Nov 30 – Dec 1 2013.

Hutter uses Solomonoff’s inductive inference as a mathematical formalization of Occam’s razor. Hutter adds to this formalization the expected value of an action: computable theories that are shorter (in the sense of Kolmogorov complexity) carry more weight when calculating the expected value of an action across all computable theories that perfectly describe previous observations.
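To make the weighting concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not Hutter's construction: the three hand-picked "theories", their bit lengths, and the guess-the-next-bit reward are all hypothetical stand-ins for the (incomputable) sum over every computable theory.

```python
from fractions import Fraction

# Hypothetical toy hypothesis class: each entry is (name, length_in_bits, rule),
# where rule(t) gives the bit the theory predicts at time step t.
HYPOTHESES = [
    ("all zeros", 2, lambda t: 0),
    ("alternate 0,1", 3, lambda t: t % 2),
    ("zeros, then a one at t=4", 6, lambda t: 1 if t == 4 else 0),
]

def posterior(observations):
    """Weight every theory that reproduces the data exactly by 2^-length."""
    weights = {}
    for name, length, rule in HYPOTHESES:
        if all(rule(t) == bit for t, bit in enumerate(observations)):
            weights[name] = Fraction(1, 2 ** length)
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

def expected_value(action, observations):
    """Expected reward of guessing `action` as the next bit (reward 1 if right)."""
    t = len(observations)
    post = posterior(observations)
    rules = {name: rule for name, _, rule in HYPOTHESES}
    return sum(p for name, p in post.items() if rules[name](t) == action)

observations = [0, 0, 0, 0]
values = {a: expected_value(a, observations) for a in (0, 1)}
best_action = max(values, key=values.get)
print(values, best_action)
```

Two theories survive the four observed zeros, and the shorter one dominates the expected value, so the sketch guesses another zero.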

At any time, given the limited observation sequence so far, what is the Bayes-optimal way of selecting the next action? Hutter proved that the answer is to use Solomonoff’s universal prior to predict the probability of each possible future, and execute the first action of the best policy (a policy is any program that will output all the next actions and input all the next perceptions up to the horizon). A policy is the best if, on a weighted average of all the possible futures, it will maximize the predicted reward up to the horizon. He called this universal algorithm AIXI.
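In Hutter's published work this is usually condensed into a single expression. The notation below is not spelled out in the interview and follows the standard presentation: $U$ is a universal Turing machine, $q$ ranges over programs of length $\ell(q)$, $a$, $o$ and $r$ are actions, observations and rewards, $k$ is the current cycle and $m$ is the horizon.

```latex
a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
  \bigl[ r_k + \cdots + r_m \bigr]
  \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Read from right to left: the innermost sum is the Solomonoff-style weight of all programs consistent with the interaction history, the bracket is the reward collected up to the horizon, and the alternating max and sum are the planning over future actions and percepts.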

Below is the transcription of the part of the interview series where Marcus talks about intelligence, Bounded Rationality, and AIXI.

What is Intelligence?

Intelligence is a very difficult concept (maybe that’s the reason why many people try to avoid diluting it or consider more narrow alternatives). I’ve worked on this question for many, many years now. We went through the literature (psychology literature, philosophy literature, AI literature) to see what definitions individuals, researchers, and also groups came up with, and they are very diverse. But there seems to be one recurrent theme, and if you want to put it in one sentence, then you could define intelligence as:
“an agent’s ability to achieve goals in a wide range of environments”, or to succeed in a wide range of environments.
Now look at this sentence and ask, “wow, how can this single sentence capture the complexity of intelligence?” There are two answers to this question. First: many aspects of intelligence are emergent properties of intelligence, like being able to learn – if I want to succeed or solve a problem I need to acquire new knowledge, so learning is an emergent phenomenon of this definition.
And the second answer is: this is just a sentence that consists of a few words. What you really have to do, and that’s the hard part, is to transform it into meaningful equations and then study these equations. And that’s what I have done in the last 12 years.
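One such equation, taken from the published work of Legg and Hutter rather than from the interview itself, turns the one-sentence definition into a measure. The symbols are assumptions of this note: $\pi$ is the agent, $E$ is a class of computable environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$, and $V^{\pi}_{\mu}$ is the expected total reward of $\pi$ in $\mu$.

```latex
\Upsilon(\pi) := \sum_{\mu \in E} 2^{-K(\mu)} \, V^{\pi}_{\mu}
```

"Achieving goals" becomes expected reward, "a wide range of environments" becomes the sum over $E$, and simpler environments count for more.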

Bounded Rationality

It is an interesting question whether resource bounds should be included in any definition of intelligence or not, and the natural answer is of course they should. Well, there are several problems: the first one is that nobody ever came up with a reasonable theory of bounded rationality (people have tried), so it seems to be very hard. And this is not specific to AI or intelligence, but it seems to be symptomatic in science. If you look at several fields (e.g. the crown discipline, physics), theories have been developed: Newton’s mechanics, General Relativity Theory, Quantum Field Theory, the Standard Model of Particle Physics. They are more and more precise, but they get less and less computable, and having a computable theory is not a principle in developing these theories. Of course at some point you have to test these theories and you want to do something with them, and then you need a computable theory – this is a very difficult issue (and you have to approximate them or do something about it) – but having computational resources built into the fundamental theories is not how things work, at least in physics, and also if you look at other disciplines.
You design theories so that they describe your phenomenon as well as possible, and the computational aspect is secondary. Of course if it is incomputable and you can’t do anything with it, you have to come up with another theory, but this always comes second. And only in computer science (and this comes naturally) do computer scientists try to think about how they can design an efficient algorithm to solve a problem, and since AI is traditionally sitting in the computer science department, the mainstream thought is “how can I build a resource bounded artificial intelligent system”. And I agree that ultimately this is what we want. But the problem is so hard that we (or a large fraction of the scientists) should take this approach: model the problem first, define the problem first, and once we are confident that we have solved this problem, then go to the second phase, and try to approximate the theory, try to make a computational theory out of it. And then there are many, many possibilities: you could still try to develop a resource bounded theory of intelligence, which will be very, very hard if you want to have it principled, or you do some heuristics… or… or… many options. Or the short answer: maybe I am not smart enough to come up with a resource bounded theory of intelligence, therefore I have only developed one without resource constraints (that would be the short answer).


Ok so now we have this informal definition that intelligence is an agent’s ability to succeed or achieve goals in a wide range of environments. The point is you can formalize this theory, and we have done that and it is called AIXI. Or rather, Universal AI is the general field theory and AIXI is the particular agent which acts optimally in this sense.
So that works as follows: it has a planning component, and it has a learning component. What the learning component does is: think about a robot walking around in the environment, and at the beginning it has little or no knowledge about the world, so what it has to do is to acquire data/knowledge of the world and then build its own model of the world, how the world works. And it does that using very powerful general theories on how to learn a model from data, from very complex scenarios. This theory is rooted in Kolmogorov complexity, algorithmic information theory – the basic idea is you look for the simplest model which describes your data sufficiently well. And this agent or robot has to do this continuously: it gets new data and updates its model. So now the agent has this model, that is the learning part. Now it can use this model for predicting the future… And then it uses these predictions in order to make decisions, so the agent now thinks: if I do this action, and this action… this will now happen and this is good or bad. I’ll come to the good or bad part soon. And if I do this other action it is maybe better or worse. And then the “only” thing the agent has to do is think about all the potential future action sequences and take the one which is best according to the model which the agent has learned, which is not perfect but which over time gets better and better. Finally you have to qualify what “best” means, and that’s the utility part or succeeding: the agent gets occasional reward from the teacher, who could be just a human, or the reward could be built in (for instance if the battery level is low it is bad, if it’s high it is good, if it finds a rock on Mars it is good, if it falls down a cliff it’s bad), so we have these rewards, and the goal of the agent is to maximize its reward over its lifetime. That’s the planning part.
So first comes the learning part, then the prediction part, then the planning part, and then it gets to actions and the cycle continues.
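The cycle can be sketched in a few lines of Python. This is a toy under loud assumptions, not AIXI: the model class is three hypothetical hand-written rules with made-up bit lengths instead of all computable environments, the models are deterministic so learning reduces to elimination, and planning looks only one step ahead.

```python
# Hypothetical model class: (name, length_in_bits, reward_rule), where
# reward_rule(action) is the reward the model predicts for that action.
MODELS = [
    ("reward for action 0", 2, lambda a: 1 if a == 0 else 0),
    ("reward for action 1", 3, lambda a: 1 if a == 1 else 0),
    ("reward for any action", 4, lambda a: 1),
]
ACTIONS = (0, 1)
RULES = {name: rule for name, _, rule in MODELS}

def true_environment(action):
    """The world the agent actually lives in; unknown to the agent."""
    return 1 if action == 1 else 0

def learn(weights, action, reward):
    """Learning: drop models that contradict the new data, then renormalise."""
    kept = {name: w for name, w in weights.items()
            if RULES[name](action) == reward}
    total = sum(kept.values())
    return {name: w / total for name, w in kept.items()}

def predict(weights, action):
    """Prediction: mixture-expected reward for one candidate action."""
    return sum(w * RULES[name](action) for name, w in weights.items())

def plan(weights):
    """Planning: pick the action with the highest predicted reward."""
    return max(ACTIONS, key=lambda a: predict(weights, a))

# Simpler models start with more weight (2^-length), then normalise.
prior = {name: 2.0 ** -length for name, length, _ in MODELS}
weights = {name: w / sum(prior.values()) for name, w in prior.items()}

history = []
for step in range(5):
    action = plan(weights)                     # plan with the current model
    reward = true_environment(action)          # act and receive a reward
    weights = learn(weights, action, reward)   # update the model
    history.append((action, reward))
print(history)
```

The agent first trusts the simplest model, gets no reward, discards the models that predicted one, and from then on picks the rewarded action.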
So this theory, the AIXI agent, is mathematically rigorously well defined. It is essentially unique, and you can prove amazing properties of this agent – in a certain sense you can prove that it’s the most intelligent system possible. I am translating the mathematical theorems into words, which is a little tricky, but that’s the essence. The downside is that it’s incomputable. You asked before about resource bounded intelligence: AIXI needs infinite computational resources, and in order to do something with it you need to approximate it, and we have done this in recent years also. At the moment it is at the toy stage, so it can play PacMan, Tic-Tac-Toe, some simple form of Poker, and some other games… The point is not that it is able to play PacMan or Tic-Tac-Toe (they are not hard), the point is that the agent has no knowledge about these games, it starts really blank, and just by interacting with the environment – it does not even know the rules of the game – by interacting with this poker environment or PacMan environment it figures out what is going on, and learns how to behave well.
The cool thing really, and the difference from many other projects (there is Deep Blue, which plays chess better than the grandmasters, but it was a system specifically designed to play chess, and it can’t play Go), is that this system is not tailored to any particular application. If you interface it with any problem (in theory it can be any problem: chess, solving a scientific problem) it will learn to do that very well and indeed optimally. The approximations we have at the moment are, of course, very limited, but if you look at these approximations they use standard compressors for the model learning part. There is nothing about PacMan in these data compressors: they are standard data compressors. For the planning part we use standard Monte-Carlo (random search), which has nothing to do with a particular problem or a game – and this approximation is already able to learn these various games by itself. There is no PacMan knowledge built in. The only thing (of course) you have to do is to interface the game with this agent. For PacMan you have these pixels in a 15×15 grid, and each square is a wall, is free, is food or there is a ghost, and this piece of information you give this agent, and then it gets negative reward if it gets eaten by a ghost, positive reward if it eats a pellet, and that’s it, and the goal of the agent is to maximize reward, and everything else is figured out by itself.
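The Monte-Carlo planning idea can be illustrated with a short Python sketch. Everything concrete in it is an assumption for illustration: the cell codes, the one-dimensional corridor instead of a 15×15 grid, the reward values, and above all the hand-written `step` simulator, which stands in for the model that the real agent has to learn from experience.

```python
import random

# Hypothetical cell codes; the interview only says each square is a wall,
# free, food, or a ghost.
WALL, FREE, FOOD, GHOST = "#", ".", "o", "G"
MOVES = {"left": -1, "right": +1}

def step(cells, pos, action):
    """Toy simulator: returns (new_cells, new_pos, reward, done)."""
    target = pos + MOVES[action]
    if cells[target] == WALL:
        return cells, pos, 0.0, False          # bumped into a wall
    if cells[target] == GHOST:
        return cells, target, -10.0, True      # eaten by the ghost
    reward = 1.0 if cells[target] == FOOD else 0.0
    new_cells = cells[:target] + (FREE,) + cells[target + 1:]
    return new_cells, target, reward, False

def rollout(cells, pos, first_action, horizon, rng):
    """Return of one simulated future: first_action, then random moves."""
    total, action = 0.0, first_action
    for _ in range(horizon):
        cells, pos, reward, done = step(cells, pos, action)
        total += reward
        if done:
            break
        action = rng.choice(list(MOVES))
    return total

def monte_carlo_plan(cells, pos, horizon=3, samples=200, seed=0):
    """Estimate each action's value by averaging random rollouts."""
    rng = random.Random(seed)
    values = {
        a: sum(rollout(cells, pos, a, horizon, rng) for _ in range(samples)) / samples
        for a in MOVES
    }
    return max(values, key=values.get), values

# Ghost to the left of the agent (index 2), a pellet to the right.
corridor = (WALL, GHOST, FREE, FOOD, FREE, WALL)
best, values = monte_carlo_plan(corridor, pos=2)
print(best, values)
```

Nothing in `rollout` or `monte_carlo_plan` refers to PacMan; only the simulator and the reward signal do, which is the sense in which the planner is problem-independent.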

Video Interviews

For more video interviews please Subscribe to Adam Ford’s YouTube Channel

YouTube Playlist of Interview Series with Marcus Hutter:

At Singularity Summit Australia 2012 – “Can Intelligence Explode?”