# Project definition

## The LEIA project

The goal of the LEIA project is to design and implement a computer program that
autonomously learns expectations in the form of a probabilistic model of its environment.
The program has to learn autonomously in the sense that it does not depend on task-related
evaluations of its performance. The program has to show 'curiosity' rather than
'obedience'. Learning has to be reasonably efficient in the sense that the program does not
slow down excessively as its experience grows. An early prototype shows that such a program
is possible. LEIA is focused on learning itself rather than on the goal of the learning system.
The danger of investigating, for example, the effect of medication on the health of patients
is that *we* learn about the domain. If we preprogram that knowledge in a system, that
system doesn't learn at all. Such a system makes more or less accurate predictions based
on *a priori* knowledge. The real challenge in LEIA is to understand fundamental
aspects of learning, such as uncertainty, freedom of action, and cooperation.

### Autonomous learning

The central hypothesis of the LEIA project is that it is possible for a computer program to
learn from its environment without the requirement that the environment guides the learning
process. From now on we will use the term *learning system* instead of *learning
computer program* when other (natural) learning systems are implied too. We want to
study possibilities for strategies that enable a learning system to learn in an arbitrary
environment in the sense that it uses actions to find and verify regularities in the
environment. These regularities can be used to update a probabilistic model of the environment
that improves over time in a statistically quantifiable way. In short: we want a learning
system that shows exploratory behavior rather than a learning system that relies on teaching
behavior from its environment and preprogrammed *a priori* knowledge. We aim for a
system that learns to form reasonable expectations of an arbitrary environment, rather than a
system that approximates the true process behind a severely limited environment. It is widely
believed that "A computer produces nothing more than you put into it." A program that
interacts with its environment guided by nothing more than curiosity will prove that statement
wrong.

### Interactive learning

The first basic assumption for the LEIA project is that autonomous learning systems must be interactive. Theories like statistical inference and off-line learning are restricted to learning systems that can only predict. Those learning systems cannot influence their environment in any way. Of course there are situations where predictions are sufficient, but there are indications that interaction can improve the learning speed of a learning system dramatically. LEIA intends to find out if, and how, this could work.

### Model growth

The hypothesis that autonomous learning is possible grew from the combination of two existing ideas: the minimum description length principle (MDL) and interactive learning.

#### The minimum description length principle.

A second basic assumption for the LEIA project is that knowledge *about* the environment
can be measured *independent* of that environment. The rationale for that assumption is
hidden in the theory behind the minimum description length principle (MDL).

MDL is a statistical inference technique that selects a probabilistic model for a data set. The data set might have been generated by a process that is more complex than can be described with the set of models that we use. Suppose we want to make a probabilistic model of natural language. We could use probabilistic models that predict the next word in a text, based on the previous word in the text only. We know that we use more context when producing prose. Nevertheless MDL can guide a learning algorithm to choose an optimal model (in a certain sense) from the available models.
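As a concrete illustration of such a restricted model class, a bigram model conditions each word only on its immediate predecessor. The sketch below is a minimal, hypothetical example (not part of the LEIA code); the function name and the toy corpus are invented for illustration.

```python
from collections import Counter, defaultdict

def bigram_model(words):
    """Estimate P(next word | previous word) from a word sequence (illustrative sketch)."""
    # Count how often each word follows each other word.
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    # Normalize the counts into conditional probabilities.
    return {prev: {w: c / sum(nxts.values()) for w, c in nxts.items()}
            for prev, nxts in counts.items()}

model = bigram_model("the cat sat on the mat".split())
# model["the"] is the distribution over words observed to follow "the"
```

Such a model plainly ignores most of the context a human writer uses, yet MDL can still rank the candidate bigram models against one another.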

The MDL principle is based on the fact that a statistical model can be used as a data compressor. The better a model fits the data, the more efficiently that model can be used to compress that data. MDL works by the comparison of two quantities: the complexity of the candidate model and the efficiency of that model for compressing the data set. In order to express these two quantities in such a way that they can be compared, the data set is re-encoded in a two part code. The first part of the code encodes the candidate model. The second part of the code encodes the data, using the candidate model as a compression scheme. An MDL model for the data is defined as a model that achieves the smallest total code length possible for any model in our class of probabilistic models.
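The two-part code can be made concrete for a very simple model class. The sketch below is a hypothetical illustration (not LEIA code) using Bernoulli models with the parameter quantized to 8 bits: the first part of the code pays for the parameter, the second part pays the Shannon code length of the data under that parameter, and MDL selects the parameter that minimizes the sum.

```python
import math

def two_part_code_length(data, p, precision_bits=8):
    """Total MDL code length in bits for binary data under a Bernoulli(p) model."""
    # Part one: encode the model, i.e. the quantized parameter p.
    model_bits = precision_bits
    # Part two: Shannon code length of the data, -log2 of its likelihood under p.
    ones = sum(data)
    zeros = len(data) - ones
    data_bits = -(ones * math.log2(p) + zeros * math.log2(1 - p))
    return model_bits + data_bits

data = [1, 1, 1, 0, 1, 1, 0, 1]
# Search the quantized parameter grid for the smallest total code length.
grid = [(k + 0.5) / 256 for k in range(256)]
mdl_p = min(grid, key=lambda p: two_part_code_length(data, p))
# mdl_p lands close to the empirical frequency 6/8 = 0.75
```

With a richer model class the first part of the code would also have to encode the model's structure, not just one parameter, which is what penalizes overly complex candidates.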

#### Using the minimum description length principle to measure knowledge.

The code length of the MDL model of a data set is a good measure for the amount of knowledge that the learning system has obtained from the data set. Particularly noteworthy is the fact that this measure is not defined as the proximity to the 'true' process that determines the behavior of the environment. That is just as well, because the process that determines the behavior of the environment may be unknown or uncomputable or just too complex for the type of models that we use. Even better: a learning system can compute the amount of knowledge in its own model. MDL defines a measure of statistical knowledge that can be used by the learning system itself.

#### Using the minimum description length principle interactively.

The third assumption for the LEIA project is that autonomous learning systems should optimize
*model growth*. This assumption follows from the first two. In an interactive setting, the
data set would describe the interaction between the learning system and its environment up to a
certain time. The resulting model can be used together with a possible strategy to predict a
possible future for the interaction. These predictions can in turn be used to select the
'best' strategy.

In itself this technique can be applied to every learning task that is studied in the area of
Computational Learning Theory. (There is a theorem that states that every model class can be
transformed into a class of probabilistic models and that MDL models perform well under a large
variety of loss/credit functions.) In interactive learning, however, the measure of knowledge
that is implied by MDL turns out to be very useful. The rate of growth of the complexity of
the model can be used by an interactive learning system as the criterion to optimize when
choosing actions. Of course the environment has a large part in determining the interaction,
but the *criterion to optimize is independent of the environment*. In that sense a
learning system that optimizes model growth can be truly autonomous.
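The decision rule can be sketched in a few lines. Everything below is hypothetical scaffolding (the `simulate` callback and the toy numbers are invented for illustration): for each candidate action the current model predicts possible futures, each future is scored by the code length of the model it would lead to, and the action with the largest expected model growth wins.

```python
def choose_action(actions, model_bits_now, simulate):
    """Pick the action whose predicted futures grow the model the most.

    simulate(action) -> list of predicted model code lengths (in bits)
    after the interaction that the action would trigger.
    """
    def expected_growth(action):
        futures = simulate(action)
        return sum(f - model_bits_now for f in futures) / len(futures)
    return max(actions, key=expected_growth)

# Toy predictions: exploring is expected to grow the model more than repeating.
predicted = {"explore": [120.0, 130.0], "repeat": [101.0, 102.0]}
best = choose_action(["explore", "repeat"], 100.0, lambda a: predicted[a])
```

Note that the criterion itself (growth in code length) refers only to the learning system's own model, never to an externally supplied reward.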

### Computational complexity

The final basic assumption for the LEIA project is that a computer program that implements the
theory should be *practical*. This implies some severe limitations on our expectations of
'optimal' statistical models and 'optimal' strategies for choosing actions.
Practicality of algorithms is enforced by placing limits on two kinds of complexity associated
with computations. Time complexity measures the number of atomic steps that must be performed to
complete the algorithm. Space complexity measures the amount of information that must be kept
around during the calculations. Sometimes there is a trade-off between these two forms of
computational complexity: an algorithm can run faster when it keeps more data around, or it can
be economical with space but slow.
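A classic example of this trade-off, independent of LEIA itself, is memoization: caching intermediate results trades space for time.

```python
from functools import lru_cache

def fib_slow(n):
    # Exponential time, constant extra space: every subproblem is recomputed.
    return n if n < 2 else fib_slow(n - 1) + fib_slow(n - 2)

@lru_cache(maxsize=None)
def fib_fast(n):
    # Linear time, but every intermediate result stays cached in memory.
    return n if n < 2 else fib_fast(n - 1) + fib_fast(n - 2)
```

Both functions compute the same values; they differ only in where the cost is paid.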

The largest class of probabilistic models that we could theoretically use is the class of semi-computable semi-measures. The computation of members of this class can be very expensive in both space and time. However, the most important reason why we cannot use this class of models, is that we cannot use MDL on this class. This follows from the fact that the length of the two-part code is not a computable function of the code for the model and the description of the data seen so far. Even for more structured model classes, like the class of all probabilistic context-free grammars, the time required to compute the length of the two-part code is still exponential in the length of the data. Therefore we want to search for efficient algorithms to compute the description length of two-part codes for model classes that are as rich as possible.

Once we have an MDL model (or a close approximation) we need to find a strategy or policy to choose actions that will lead as fast as possible to the largest possible model growth. We expect that techniques from Operations Research will enable us to find candidate policies efficiently.

Visit the LEIA project page on GitHub for project details.