Empowerment (artificial intelligence)

Empowerment in the field of artificial intelligence formalises and quantifies (via information theory) the potential an agent perceives that it has to influence its environment.^[1]^[2] An agent that follows an empowerment-maximising policy acts to maximise its future options (typically up to some limited horizon). Empowerment can be used as a (pseudo) utility function that guides action using only information gathered from the local environment, rather than an externally imposed goal, and is thus a form of intrinsic motivation.^[3]

The empowerment formalism depends on a probabilistic model commonly used in artificial intelligence. An autonomous agent operates in the world by taking in sensory information and acting to change its state, or that of the environment, in a cycle of perceiving and acting known as the perception-action loop. Agent state and actions are modelled by random variables ( $S:s\in {\mathcal {S}},A:a\in {\mathcal {A}}$ ) indexed by time ( $t$ ). The choice of action depends on the current state, and the future state depends on the choice of action, so the perception-action loop unrolled in time forms a causal Bayesian network.
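As an illustration, the unrolled loop can be written as a factorised joint distribution. The form below is a sketch under two assumptions not stated above: the dynamics are Markov, so that $S_{t+1}$ depends only on $S_{t}$ and $A_{t}$, and the agent chooses actions through a policy $p(a_{t}\mid s_{t})$ over a finite horizon $T$.

$p(s_{0},a_{0},s_{1},\dots ,a_{T-1},s_{T})=p(s_{0})\prod _{t=0}^{T-1}p(a_{t}\mid s_{t})\,p(s_{t+1}\mid s_{t},a_{t})$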

Definition

Empowerment ( ${\mathfrak {E}}$ ) is defined as the channel capacity ( $C$ ) of the actuation channel of the agent, and is formalised as the maximal possible information flow between the actions of the agent and the effect of those actions some time later. Empowerment can be thought of as the future potential of the agent to affect its environment, as measured by its sensors.^[3]

${\mathfrak {E}}:=C(A_{t}\longrightarrow S_{t+1})\equiv \max _{p(a_{t})}I(A_{t};S_{t+1})$
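For finite action and state sets, the maximisation over $p(a_{t})$ can be carried out numerically, for instance with the Blahut-Arimoto algorithm, a standard iterative method for the capacity of a discrete channel. The following is a minimal sketch, assuming the transition model is available as a matrix `p_s_given_a[a, s]`; the function names, tolerance and iteration limit are illustrative choices, not taken from the cited sources.

```python
import numpy as np


def _kl_rows(p, marginal, log=np.log):
    """Row-wise KL divergence D(p(.|a) || marginal), with 0 * log 0 taken as 0."""
    safe = np.where(marginal > 0, marginal, 1.0)  # avoid division by zero
    with np.errstate(divide="ignore"):
        log_ratio = np.where(p > 0, log(p / safe), 0.0)
    return np.sum(p * log_ratio, axis=1)


def channel_capacity(p_s_given_a, tol=1e-10, max_iter=10_000):
    """Estimate max over p(a) of I(A; S') in bits with Blahut-Arimoto.

    p_s_given_a[a, s] is the probability of reaching state s after action a;
    every row must sum to 1. Returns (capacity_in_bits, maximising p(a)).
    """
    p = np.asarray(p_s_given_a, dtype=float)
    q = np.full(p.shape[0], 1.0 / p.shape[0])  # start from uniform p(a)
    for _ in range(max_iter):
        d = _kl_rows(p, q @ p)      # divergence of each action from the mixture
        q_new = q * np.exp(d)       # re-weight actions by distinctiveness
        q_new /= q_new.sum()
        converged = np.max(np.abs(q_new - q)) < tol
        q = q_new
        if converged:
            break
    capacity_bits = float(q @ _kl_rows(p, q @ p, log=np.log2))
    return capacity_bits, q


if __name__ == "__main__":
    # Hypothetical noisy actuator: two actions, each succeeding with p = 0.9.
    capacity, action_dist = channel_capacity([[0.9, 0.1], [0.1, 0.9]])
    print(capacity, action_dist)
```

In this sketch, four actions that lead to four distinct states with certainty give about 2 bits, while actions that all induce the same distribution over next states give 0 bits.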

In a discrete-time model, empowerment can be computed for a given number of cycles into the future, which is referred to in the literature as 'n-step' empowerment.^[4]

${\mathfrak {E}}(A_{t}^{n}\longrightarrow S_{t+n})=\max _{p(a_{t},...,a_{t+n-1})}I(A_{t},...,A_{t+n-1};S_{t+n})$
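When transitions are deterministic, the mutual information reduces to the entropy of the final state, so n-step empowerment is the logarithm of the number of distinct states reachable with n actions. The sketch below illustrates this special case on a hypothetical 5 by 5 grid world; the grid, the action names and the rule that blocked moves leave the agent in place are all assumptions made for the example.

```python
import math
from itertools import product

# Hypothetical action set: stay put or move one cell along an axis.
MOVES = {
    "stay": (0, 0),
    "north": (0, 1),
    "south": (0, -1),
    "east": (1, 0),
    "west": (-1, 0),
}


def grid_step(state, action, size=5):
    """Deterministic transition: a move that would leave the grid is blocked."""
    x, y = state
    dx, dy = MOVES[action]
    nx, ny = x + dx, y + dy
    if 0 <= nx < size and 0 <= ny < size:
        return (nx, ny)
    return state


def n_step_empowerment(state, n, step=grid_step, actions=tuple(MOVES)):
    """n-step empowerment in bits for deterministic dynamics.

    Enumerates every length-n action sequence and counts the distinct
    final states; the result is log2 of that count.
    """
    reachable = set()
    for sequence in product(actions, repeat=n):
        s = state
        for a in sequence:
            s = step(s, a)
        reachable.add(s)
    return math.log2(len(reachable))


if __name__ == "__main__":
    for start in [(2, 2), (0, 0)]:
        print(start, [round(n_step_empowerment(start, n), 3) for n in (1, 2)])
```

The centre of the grid has higher empowerment than a corner because fewer of its action sequences are blocked by walls. Enumeration grows exponentially with n, so this approach is only practical for short horizons.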

The unit of empowerment depends on the base of the logarithm. Base 2 is commonly used, in which case the unit is bits.
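As a short worked example of the change of base: if the natural logarithm is used instead, the unit is nats, and the two values differ by a constant factor. An agent whose actions reliably lead to four distinct states has

${\mathfrak {E}}=\log _{2}4=2{\text{ bits}}=2\ln 2{\text{ nats}}\approx 1.386{\text{ nats}}$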

Contextual Empowerment

In general, the choice of action (action distribution) that maximises empowerment varies from state to state. Knowing the empowerment of an agent in a specific state is useful, for example to construct an empowerment-maximising policy. State-specific empowerment can be found using the more general formalism for 'contextual empowerment'.^[4] Here $C$ is a random variable describing the context (e.g. the state), not the channel capacity used in the definition.

${\mathfrak {E}}(A_{t}^{n}\longrightarrow S_{t+n}{\mid }C)=\sum _{c{\in }C}p(c){\mathfrak {E}}(A_{t}^{n}\longrightarrow S_{t+n}{\mid }C=c)$
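The sum is a probability-weighted average of the empowerment values obtained in each context. A minimal sketch, assuming the per-context values have already been computed and the contexts are given as dictionary keys (the function name and the example numbers are hypothetical):

```python
import math


def contextual_empowerment(p_context, empowerment_by_context):
    """Average per-context empowerment, weighted by the context probability p(c).

    p_context maps each context c to p(c); empowerment_by_context maps each
    context c to the empowerment computed with the context fixed to c.
    """
    total = sum(p_context.values())
    if not math.isclose(total, 1.0, rel_tol=1e-9):
        raise ValueError("context probabilities must sum to 1")
    return sum(p * empowerment_by_context[c] for c, p in p_context.items())


if __name__ == "__main__":
    # Hypothetical agent that is in a corner a quarter of the time.
    p_c = {"corner": 0.25, "centre": 0.75}
    e_c = {"corner": math.log2(3), "centre": math.log2(5)}
    print(contextual_empowerment(p_c, e_c))
```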

Application

Empowerment maximisation can be used as a pseudo-utility function that enables agents to exhibit intelligent behaviour without requiring the definition of external goals, for example balancing a pole in a cart-pole scenario where no indication of the task is provided to the agent.^[4] Empowerment has been applied in studies of collective behaviour^[5] and in continuous domains.^[6]^[7] As is the case with Bayesian methods in general, computing empowerment becomes expensive as the number of actions and the time horizon grow, but approaches to improve efficiency have led to its use in real-time control.^[8] Empowerment has also been used for intrinsically motivated reinforcement learning agents playing video games^[9] and in the control of underwater vehicles.^[10]

References

[1] Klyubin, A., Polani, D., and Nehaniv, C. (2005a). All else being equal be empowered. Advances in Artificial Life, pages 744–753.
[2] Klyubin, A., Polani, D., and Nehaniv, C. (2005b). Empowerment: A universal agent-centric measure of control. In Evolutionary Computation, 2005. The 2005 IEEE Congress on, volume 1, pages 128–135. IEEE.
[3] Salge, C; Glackin, C; Polani, D (2014). "Empowerment -- An Introduction". In Prokopenko, M (ed.). Guided Self-Organization: Inception. Emergence, Complexity and Computation. Vol. 9. Springer. pp. 67–114. arXiv:1310.1863. doi:10.1007/978-3-642-53734-9_4. ISBN 978-3-642-53733-2. S2CID 9662065.
[4] Klyubin, A., Polani, D., and Nehaniv, C. (2008). Keep your options open: an information-based driving principle for sensorimotor systems. PLOS ONE, 3(12):e4018. https://dx.doi.org/10.1371%2Fjournal.pone.0004018
[5] Capdepuy, P., Polani, D., & Nehaniv, C. L. (2007, April). Maximization of potential information flow as a universal utility for collective behaviour. In 2007 IEEE Symposium on Artificial Life (pp. 207-213). IEEE.
[6] Jung, T., Polani, D., & Stone, P. (2011). Empowerment for continuous agent-environment systems. Adaptive Behavior, 19(1), 16-39.
[7] Salge, C., Glackin, C., & Polani, D. (2013). Approximation of empowerment in the continuous domain. Advances in Complex Systems, 16(02n03), 1250079.
[8] Karl, M., Soelch, M., Becker-Ehmck, P., Benbouzid, D., van der Smagt, P., & Bayer, J. (2017). Unsupervised real-time control through variational empowerment. arXiv preprint arXiv:1710.05101.
[9] Mohamed, S., & Rezende, D. J. (2015). Variational information maximisation for intrinsically motivated reinforcement learning. arXiv preprint arXiv:1509.08731.
[10] Volpi, N. C., De Palma, D., Polani, D., & Indiveri, G. (2016). Computation of empowerment for an autonomous underwater vehicle. IFAC-PapersOnLine, 49(15), 81-87.
