I present a quantitative approach to interactive learning and adaptive behavior that integrates model making and decision making into one theoretical framework. The approach rests on a simple principle: the observer's behavior and the observer's internal representation of the world should together yield maximal predictive power at minimal complexity. Classes of optimal action policies and of optimal models can be derived from an objective function that reflects this trade-off between prediction and complexity. The resulting optimal models summarize, at different levels of abstraction, the process's causal organization in the presence of the feedback due to the learner's actions. A fundamental consequence of the proposed principle is that the optimal action policies have the emergent property of balancing exploration and control. Interestingly, the explorative component is also present in the absence of policy randomness, i.e., in the optimal deterministic behavior. Exploration is therefore not the same as policy randomization; it follows directly from requiring maximal predictive power in the presence of feedback. This stands in contrast to, for example, Boltzmann exploration, which is frequently used in Reinforcement Learning (RL) and which explores by randomizing the policy. Time permitting, I will discuss what happens when one incorporates explicit goals and rewards into the theory, as is popular in RL.
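For readers unfamiliar with the comparison point, the following is a minimal sketch of Boltzmann (softmax) exploration in its standard textbook form. It is not the method proposed in the talk. The function names, the example action values, and the temperature settings are illustrative assumptions. The sketch shows that in this scheme exploration comes entirely from sampling a randomized policy, and that it vanishes as the temperature approaches zero, where the policy becomes the deterministic greedy choice.

```python
import math
import random


def boltzmann_probs(values, temperature):
    """Return softmax action probabilities for the given action values.

    `values` are hypothetical estimated action values; `temperature`
    controls how random the policy is (an illustrative parameter).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(values)
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def sample_action(values, temperature, rng=random):
    """Sample an action index from the Boltzmann distribution."""
    probs = boltzmann_probs(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]


def greedy_action(values):
    """Deterministic limit: always pick the highest-valued action."""
    return max(range(len(values)), key=lambda i: values[i])


if __name__ == "__main__":
    action_values = [1.0, 2.0, 3.0]  # made-up values for illustration
    for t in (10.0, 1.0, 0.1):
        probs = boltzmann_probs(action_values, t)
        print(t, [round(p, 3) for p in probs])
    print("greedy:", greedy_action(action_values))
```

At high temperature the probabilities approach uniform, and at low temperature nearly all mass sits on the greedy action, so a deterministic policy of this kind does not explore. The abstract's claim is that the proposed principle yields exploration even in that deterministic case.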
ICTP - Strada Costiera, 11
I - 34151 Trieste Italy (+39) 040 2240 111 email@example.com