The assumption of rational traders has been the subject of heated debate in theoretical economics for the past 40 years. Recent experimental evidence gathered by cognitive neuroscientists suggests that the way we learn in simple tasks is in stark contrast with the rationality assumption: our learning is biased. Confirmation bias, for example, is the tendency to incorporate information that agrees with our priors and to disregard information that contradicts them.
These findings call for an explanation. In particular, if evolution selected such biases, they should be beneficial in some circumstances: for example, in tasks with two asymmetric bandits, there are 'optimal biases' that allow individuals to increase their average earned reward. These additional gains arise because biased beliefs are magnified, which makes it easier to distinguish between two similar options.
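As an illustration of how a learning bias can be encoded in a two-option task, the following is a minimal sketch and not the model analysed in this work. It assumes Bernoulli rewards, a softmax choice rule, and a value update with two learning rates: one for outcomes that confirm the choice and one for outcomes that contradict it. The function `run_agent` and the parameters `alpha_conf`, `alpha_disc`, and `beta` are hypothetical names introduced here.

```python
import math
import random


def run_agent(p_arms, alpha_conf, alpha_disc, beta=5.0, n_trials=1000, seed=0):
    """Simulate one agent on a two-armed Bernoulli bandit.

    p_arms      -- reward probabilities of the two arms (assumed setup)
    alpha_conf  -- learning rate for positive prediction errors (confirming)
    alpha_disc  -- learning rate for negative prediction errors (disconfirming)
    beta        -- inverse temperature of the softmax choice rule
    Returns the average reward and the final value estimates.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]  # initial beliefs about the two arms
    total = 0.0
    for _ in range(n_trials):
        # Softmax over two options reduces to a logistic in the value difference.
        p_choose_1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        arm = 1 if rng.random() < p_choose_1 else 0
        reward = 1.0 if rng.random() < p_arms[arm] else 0.0
        delta = reward - q[arm]  # prediction error for the chosen arm
        # Asymmetric update: good news about the chosen arm is weighted
        # differently from bad news.
        alpha = alpha_conf if delta > 0 else alpha_disc
        q[arm] += alpha * delta
        total += reward
    return total / n_trials, q


# Compare an unbiased learner with a biased one on two similar arms.
unbiased_reward, _ = run_agent((0.6, 0.4), alpha_conf=0.1, alpha_disc=0.1)
biased_reward, _ = run_agent((0.6, 0.4), alpha_conf=0.15, alpha_disc=0.05)
print(unbiased_reward, biased_reward)
```

Setting `alpha_conf` equal to `alpha_disc` recovers an unbiased learner, so the two can be compared on the same task. A single run is noisy, and any comparison of average rewards should be taken over many seeds.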
Interestingly, in these contexts, the optimal bias corresponds to learning dynamics that break detailed balance and are therefore irreversible, even in the stationary state. We argue, by means of analytical calculations and numerical simulations, that the optimal bias corresponds to dynamics that maximize the entropy production in the stationary state. In this setting, the stationary state with maximum entropy production allows agents to safely explore the environment without becoming stuck in a sub-optimal belief.
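For reference, the link between broken detailed balance and entropy production can be written down for a generic Markov process on discrete states. This is the standard Schnakenberg expression, not necessarily the form used in the calculations here. The stationary distribution $\pi_i$ and the transition rates $w_{ij}$ from state $i$ to state $j$ are notation assumed for this sketch.

```latex
\dot{S} \;=\; \frac{1}{2} \sum_{i,j}
  \left( \pi_i w_{ij} - \pi_j w_{ji} \right)
  \ln \frac{\pi_i w_{ij}}{\pi_j w_{ji}} \;\ge\; 0
```

Each term is non-negative, and $\dot{S}$ vanishes exactly when detailed balance, $\pi_i w_{ij} = \pi_j w_{ji}$ for all pairs of states, holds. A strictly positive value therefore signals irreversible dynamics in the stationary state.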