## Marcus Hutter – Universal Artificial Intelligence

Last year I did a series of interviews with Marcus Hutter while he was down in Melbourne for the Singularity Summit Australia 2012.

Marcus will also be speaking at the [highlight]Science, Technology & the Future conference in Melbourne on Nov 30 – Dec 1 2013 in Melbourne, Australia.[/highlight]

Hutter uses Solomonoff’s inductive inference as a mathematical formalization of Occam’s razor. Hutter adds to this formalization the expected value of an action: shorter (Kolmogorov complexity) computable theories have more weight when calculating the expected value of an action across all computable theories which perfectly describe previous observations.

At any time, given the limited observation sequence so far, what is the Bayes-optimal way of selecting the next action? Hutter proved that the answer is to use Solomonoff’s universal prior to predict the probability of each possible future, and execute the first action of the best policy (a policy is any program that will output all the next actions and input all the next perceptions up to the horizon). A policy is the best if, on a weighted average of all the possible futures, it will maximize the predicted reward up to the horizon. He called this universal algorithm AIXI.

Below is the transcription of the part of the interview series where Marcus talks about intelligence, Bounded Rationality, and AIXI.

## What is Intelligence?

“an agents ability to achieve goals in a wide range of environments”, or to succeed in a wide range of environments.

Now look at this sentence and ask, “wow, how can this single sentence capture the complexity of intelligence?” There are two answers to this question. First: many aspects of intelligence are emergent properties of intelligence, like being able to learn – if I want to succeed or solve a problem I need to acquire new knowledge, so learning is an emergent phenomenon of this definition.

And the second answer is: this is just a sentence that consists of a few words, what you really have to do, and that’s the hard part, is to transform it into meaningful equations and then study these equations: And that’s what I have done in the last 12 years.

## Bounded Rationality

You design theories so that they describe your phenomenon as well as possible and the computational aspect is secondary. Of course if it is in-computable and you can’t do anything with it, you have to come up with another theory, but this always comes second. And only in computer science (and this comes naturally) computer scientists try to think about how they can design an efficient algorithm to solve my problem, and since AI is sitting in the computer science department traditionally, the mainstream thought is “how can I build a resource bounded artificial intelligent system”. And I agree that ultimately this is what we want. But the problem is so hard, that we (or a large fraction of the scientists) should take this approach, model the problem first, define the problem first, and once we are confident that we have solved this problem, then go to the second phase, and try to approximate the theory, try to make a computational theory out of it. And then there are many many possibilities, then you could still try to develop a resource bounded theory of intelligence, which will be very very hard if you want to have it principled, or you do some heuristics… or .. or .. or… many options. Or the short answer maybe I am not smart enough to come up with a resource bounded theory of intelligence, therefore I have only developed one without resource constraints (that would be the short answer).

## AIXI

So that works as follows: it has a planning component, and it has a learning component. What the learning component does is: think about a robot walking around in the environment, and at the beginning it has little or no knowledge about the world, so what it has to do is to acquire data/knowledge of the world and then build its own model of the world, how the world works. And it does that using very powerful general theories on how to learn a model from data, from very complex scenarios. This theory is rooted in Kolomogrov complexity, algorithmic information theory – the basic idea is you look for the simplest model which describe your data sufficiently well. And this agent or robot has to do this continuously, gets new data and updates its model. So now the agent has this model, that is the learning part. Now it can use this model for predicting the future… And then it uses these predictions in order to make decisions, so the agent now thinks if I do this action, and this action… this will now happen and this is good or bad. I’ll come to the good or bad part soon. And if I do this other action it is maybe better or worse. And then the “only” thing what the agent has to do is think about all the potential future action sequences and take the one which is best according to the model which the agent has learned, which is not perfect but which over time gets better and better. Finally you have to qualify what does “best” mean, and that’s the utility part or succeeding: the agent gets occasional reward from the teacher, who could be just a human or the reward could be built in (for instance if the battery level is low it is bad, if it’s high it is good, if it finds a rock on Mars it is good, if it falls down a cliff it’s bad), so we have these rewards, and the goal of the agent is to maximize his reward over it’s lifetime. That’s the planning part. So first comes the learning part, then the prediction part, then the planning part, and then it gets to actions and the cycle continues.

So this theory, the AIXI agent, it’s mathematically rigorously well defined. It is essentially unique, and you can prove amazing properties of this agent – in a certain sense you can prove that it’s the most intelligent system possible. I am translating the mathematical theorems into words, which is a little tricky but that’s the essence. The downside is that it’s in-computable. You asked before about the resource bounded intelligence so AIXI needs infinite computational resources, and in order to do something with it you need to approximate it, and we have done this in recent years also. At the moment it is at the toy stage so it can play PacMan, Tic Tac Toe, some simple form of Poker, and some other games… The point is not that it is able to play PacMan or Tic-Tac-Toe (they are not hard), the point is that the agent has no knowledge about these games, it starts really blank, and just by interacting with the environment – it does not even know the rules of the game – by interacting with this poker environment or PacMan environment it figures out what is going on, and learns how to behave well.

The cool thing really is and the difference to many other projects (there is Deep Blue who plays chess better than the Grand Masters, but it was systems specifically designed to play chess, and it can’t play go), this system is not tailored to any particular application. If you interface it with any problem (in theory it can be any problem: chess, solving a scientific problem) it will learn to do that very well and indeed optimally. The approximations we have at the moment, are of course, very limited, but if you look at these approximations they use standard compressors for the model learning part; There is nothing about PacMan in these data compressors: they are standard data compressors. For the planning part we use standard Monte-Carlo (random search) which has nothing to do with a particular problem, or a game – and this approximation is already able to learn by itself {these various games}. There is no PacMan knowledge built in. The only thing (of course) you have to do is to interface the game with this agent For PacMan you have these pixels in a 15×15 grid, and each square is a wall, is free, is food or there is a ghost, and this piece of information you give this agent and then it gets negative reward if it gets eaten by a ghost, positive reward if it eats a pallet, and that’s it, and the goal of the agent is to maximize reward, and everything else is figured out by itself.

At Singularity Summit Australia 2012 – “Can Intelligence Explode?”