Authors: Erik M. Lintunen, Nadia M. Ady, Sebastian Deterding, Christian Guckelsberger
arXiv:2502.07423v1 [cs.AI] 11 Feb 2025
Towards a Formal Theory of the Need for Competence via Computational Intrinsic Motivation
Erik M. Lintunen (erik.lintunen@aalto.fi),
Nadia M. Ady (nadia.ady@aalto.fi),
Sebastian Deterding (s.deterding@imperial.ac.uk),
Christian Guckelsberger (christian.guckelsberger@aalto.fi)
Abstract
Computational models offer powerful tools for formalising psychological theories, making them both testable and applicable in digital contexts. However, they remain little used in the study of motivation within psychology. We focus on the “need for competence”, postulated as a key basic human need within Self-Determination Theory (SDT), arguably the most influential psychological framework for studying intrinsic motivation (IM). The need for competence is treated as a single construct across SDT texts. Yet, recent research has identified multiple, ambiguously defined facets of competence in SDT. We propose that these inconsistencies may be alleviated by drawing on computational models from the field of artificial intelligence, specifically from the domain of reinforcement learning (RL). By aligning the aforementioned facets of competence (effectance, skill use, task performance, and capacity growth) with existing RL formalisms, we provide a foundation for advancing competence-related theory in SDT and motivational psychology more broadly. The formalisms reveal underlying preconditions that SDT fails to make explicit, demonstrating how computational models can improve our understanding of IM. Additionally, our work can support a cycle of theory development by inspiring new computational models formalising aspects of the theory, which can then be tested empirically to refine the theory. While our research lays a promising foundation, empirical studies of these models in both humans and machines are needed, inviting collaboration across disciplines.
Keywords: need for competence; self-determination theory; intrinsic motivation; motivational psychology; theory development; intrinsically motivated reinforcement learning; computational modelling
Introduction
Self-Determination Theory (SDT) (Deci and Ryan, 1985) is one of the most influential psychological theories of motivation, including intrinsic motivation (IM). It has also inspired artificial intelligence (AI) research on computational IM (e.g., Oudeyer and Kaplan, 2007). Computational IM equips artificial agents with abilities for open-ended development, task-agnostic learning, and dealing with the potential sparsity of rewards (Colas et al., 2022, p. 1161). Notably, SDT, like many psychological theories, is expressed in “soft” verbal propositions. In light of the current theory crisis (Oberauer and Lewandowsky, 2019), verbal theory has drawn increasing critique: theories that are not formally specified may fail to produce hypotheses that can be rigorously tested. While many psychology researchers have advocated for computational modelling as a paradigm for theory development (e.g., Marsella et al., 2010; Oberauer and Lewandowsky, 2019; Robinaugh et al., 2021), formal specification in the form of precisely defined computational models remains largely absent in SDT. Judging by the most recent integrative volume of the theory, the Oxford Handbook of Self-Determination Theory (Ryan, 2023), methodological advances chiefly focus on new psychometric approaches and neuroscientific evidence.
Here, we make the case that computational models of IM can benefit the development of SDT. More specifically, Deterding et al. (2024) have shown that the need for competence, one of the basic psychological needs underpinning IM in SDT, is poorly specified, with multiple definitions that lack clear operationalisation. They derive four distinct facets of competence by means of a detailed analysis of SDT texts spanning many decades (1985–2023), reproduced in Table 1 (Deterding et al., 2024, pp. 7–10). We focus on matching computational formalisms to each of these facets.
Table 1: The four distinct facets of competence identified by Deterding et al. (2024, pp. 7–10).

Facet                  | Description
Effectance (C1)        | Observing that one’s action causes consequences, which can be unintended.
Skill use (C2)         | Observing (a) an opportunity (an appropriate situation) to use a skill and/or (b) that one uses a particular capacity in the course of one’s action.
Task performance (C3)  | Observing that one performs (a) well at an intended task, (b) to an extent that requires a certain skill or skill level.
Capacity growth (C4)   | Observing gain in the (a) strength and/or (b) range of one’s skills.
One reason computational models of IM offer a powerful paradigm for developing SDT is that translating a concept like competence into an algorithmic description induces a different way of thinking about theory: the process forces us to specify abstract and informal verbal theories to the point that they can be turned into a system that can actually produce observable behaviour. Observing how a theory behaves can “reveal implicit assumptions and hidden complexities” in verbal theory, thus inspiring theory construction (Marsella et al., 2010, p. 4). Murayama’s (2022) recent adoption of reinforcement learning (RL) as an underpinning framework for a theory of curiosity is a good case in point, as is Ady et al.’s (2022) work on specific machine curiosity.
To advance formal development of SDT and competence-related theory in motivational psychology more broadly, a key challenge is identifying formalisms that sufficiently approximate the theory. In this paper, we align each distinct competence facet with examples of formalisms from the computational literature. We then discuss how each example formalises aspects of the theory. These formalisms not only lay a foundation for specifying the need for competence but also unveil a wealth of existing implementations, simulators, and evaluation metrics from the AI literature for use in psychological motivation research. In future work, these will allow us to test whether formalisations fitting the facets, either individually or in conjunction, produce empirical results that align with propositions made by the theory.
Background: Computational IM & RL
We begin by justifying why RL is a suitable domain from which to draw computational models of competence. Additionally, to support understanding of the models we discuss, we introduce some key concepts from intrinsically motivated RL (most notably, goals and skills and their relationship to the reward function) and recommend further reading.
The idea that IM might be driven by biological reward systems (Barto et al., 2004, p. 113, in reference to Kakade and Dayan, 2002; Dayan and Balleine, 2002) has driven the choice of methods centrally used in computational IM: RL (Baldassarre, 2022, p. 252). Due to their usefulness, intrinsically motivated RL methods have been studied extensively in the computational RL literature at large (see reviews by Colas et al., 2022; Aubret et al., 2023; Lidayan et al., 2024). An assumption underlying much work in RL is the reward hypothesis: “all of what we mean by goals and purposes can well be thought of as the maximization of the expected value of the cumulative sum of a received scalar signal (called reward)” (Sutton and Barto, 2018, p. 53; cf., Silver et al., 2021, p. 4). This hypothesis informs the formalisms used for competence-based intrinsic motivation (CB-IM), which tend to rely on the assumption that every goal can be sufficiently defined by a reward function. Intrinsically motivated RL is characterised by the use of task-agnostic intrinsic reward functions that collate or compare agent-internal variables independently of their semantics (Oudeyer and Kaplan, 2008, p. 3; cf., Berlyne, 1965, p. 246 and Oudeyer and Kaplan, 2007).
Many of our examples operate within the framework of goal-conditioned reinforcement learning (GCRL) (see review by Liu et al., 2022), which extends the standard definition of a reward function to be conditioned on goals. A goal, or a goal-defining variable, is a parameter to the reward function (cf., Colas et al., 2022, p. 1165; Aubret et al., 2023, p. 6). Then, a skill is a policy given a goal, optimising for the return according to the reward conditioned on that goal. Examples of goal-defining variables include indices (e.g., Eysenbach et al., 2019) or elements drawn from a learned distribution (e.g., Nair et al., 2018). Their role is simply to indicate which reward function the agent is aiming to maximise.
Goals can be viewed as “a set of constraints ... that the agent seeks to respect” (Colas et al., 2022, p. 1165, emphasis in original). While the most immediate intuition of a goal is often as a desired state for the agent to reach (e.g., Kaelbling, 1993, p. 1; Schaul et al., 2015, p. 2), the formalism allows for a more general set of constraints on behaviour (Lintunen et al., 2024b, pp. 22–23). In effect, any behaviour that can be defined by attempting to maximise some reward function on the environment can be formulated as a goal–skill pairing.
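The goal-conditioned setup described above can be made concrete with a toy example. The sketch assumes a one-dimensional corridor of integer positions and reads each goal in the most immediate way, as a desired state; the environment, the reward, and the greedy stand-in policy are our own illustrative choices, not drawn from the cited works. It shows the structural point only: the goal is a parameter that selects a reward function, and a skill is the policy obtained by fixing that parameter.

```python
from typing import Callable

# Toy setting (our assumption): states are integer positions 0..9 on a corridor;
# actions move left (-1), stay (0), or move right (+1).
ACTIONS = (-1, 0, +1)
LOW, HIGH = 0, 9


def step(state: int, action: int) -> int:
    """Deterministic transition, clipped to the corridor ends."""
    return min(max(state + action, LOW), HIGH)


def reward(next_state: int, goal: int) -> float:
    """Goal-conditioned reward: the goal parameterises which reward applies.
    Here goal g is read as the constraint 'be at position g'."""
    return 1.0 if next_state == goal else 0.0


def skill(goal: int) -> Callable[[int], int]:
    """A skill is a policy given a goal. A greedy one-step lookahead on the
    distance to the goal stands in for a learned policy."""
    def policy(state: int) -> int:
        return min(ACTIONS, key=lambda a: abs(step(state, a) - goal))
    return policy


def rollout(goal: int, start: int, horizon: int = 20) -> float:
    """Return collected under reward(., goal) while following skill(goal)."""
    pi, s, ret = skill(goal), start, 0.0
    for _ in range(horizon):
        s = step(s, pi(s))
        ret += reward(s, goal)
    return ret


# Same start state, different goals: the goal selects the reward function
# and, with it, the behaviour.
return_goal_3 = rollout(goal=3, start=0)  # reaches 3 after 3 steps, then stays: 18.0
return_goal_9 = rollout(goal=9, start=0)  # reaches 9 after 9 steps: 12.0
```

Swapping `reward` for any other function of `(next_state, goal)` would express a different kind of constraint without changing the rest of the structure.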
Aligning Computational Models With SDT
Based on a review of classic (Oudeyer and Kaplan, 2007) and state-of-the-art (Colas et al., 2022; Aubret et al., 2023) computational IM, we next turn our focus to matching computational formalisms to each of the competence facets (Table 1). Our matching is non-exhaustive; instead, we focus on exemplifying how each facet could be formalised.
Effectance (C1)
Let us consider the case where competence in SDT refers to effectance, i.e., motivation by observing that one’s own actions cause change in the environment. To recognise effectance, an agent needs a mechanism to distinguish whether a change in the environment is caused by the agent itself or by some external factor. An example of a computational model fitting effectance is Rewarding Impact-Driven Exploration (RIDE) (Raileanu and Rocktäschel, 2020), as part of which the agent is rewarded for the amount of change it causes in its state observation: the “impact” of the agent’s actions, measured between two consecutive observations. Specifically, the impact-driven reward is defined as:

$$R_t(s_t, s_{t+1}) := \frac{\lVert \phi(s_{t+1}) - \phi(s_t) \rVert_2}{\sqrt{N(s_{t+1})}}, \qquad (1)$$

where $\phi(s_t)$ and $\phi(s_{t+1})$ are the learned representations of two consecutive states and $N(s_{t+1})$ represents or approximates the number (count) of times the latter state has been visited (Raileanu and Rocktäschel, 2020, pp. 4–5).
In RIDE, the agent computes its reward using the learned representation of the observation as opposed to the raw sensory space (e.g., pixels). Inspired by Pathak et al. (2017), the parameters of the model $\phi$ are optimised so that an agent learns the features it can control; that is, it learns to ignore aspects of the environment it cannot affect. Decomposing the reward function, the agent is rewarded for a combination of changes in those state features that it is able to consistently have some effect on (Eq. 1, numerator) and the novelty of the state into which its action leads (Eq. 1, denominator). The effect of the denominator is that the reward obtainable from particular states diminishes the more the agent observes them, even if the agent’s action has the same effective outcome as previously observed. In recent SDT texts, effectance motivation does not require novelty, but the original usage of the term required observations to be novel (White, 1959, p. 322). By having the reward given by RIDE wane the more a particular situation is perceived, RIDE offers a simple formalisation of this original novelty aspect of effectance motivation.
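As a rough illustration of Eq. 1 and of how its reward wanes, the sketch below assumes the embeddings $\phi(s)$ are already given as plain tuples (in RIDE they are learned) and uses an exact visit count keyed by a hashable state label, which is our simplification of a count that "represents or approximates" the number of visits. Repeating the same transition keeps the numerator fixed while the reward shrinks.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


class ImpactReward:
    """Minimal sketch of the RIDE-style impact reward in Eq. 1.

    Assumptions (ours): embeddings are supplied by the caller rather than
    learned, and N(s_{t+1}) is an exact visit count of the next state.
    """

    def __init__(self) -> None:
        self.visits: Counter = Counter()

    def __call__(self, phi_s: Sequence[float], phi_next: Sequence[float],
                 next_state: Hashable) -> float:
        self.visits[next_state] += 1          # count this visit first
        impact = math.dist(phi_next, phi_s)   # L2 norm of the embedding change
        return impact / math.sqrt(self.visits[next_state])


reward_fn = ImpactReward()
# The same transition (an embedding change of length 5) repeated three times,
# ending in a hypothetical state labelled "lamp_on":
rewards = [reward_fn((0.0, 0.0), (3.0, 4.0), "lamp_on") for _ in range(3)]
# rewards is about [5.0, 3.536, 2.887]: identical impact, waning reward.
```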
Skill use (C2)
We next consider the case in which competence refers to skill use or the “exercise ... of one’s capacities” (Ryan and Deci, 2017, p. 86). Following Deterding et al. (2024, p. 6), we cover the specific understanding of “exercise” as implementing or bringing something to bear (Merriam-Webster.com, 2025); an alternative interpretation as engaging a skill repeatedly in order to grow it will be treated under Capacity growth (C4). As per Table 1, competence as skill-use motivation can be understood in two different ways: observing an opportunity (an appropriate situation) to use a skill (C2.a) and observing that one uses a particular capacity in the course of one’s action (C2.b).
An opportunity (an appropriate situation) to use a skill (C2.a)
Broadly, there are two sets of computational approaches used to model an agent learning about opportunities to use particular skills: those in which the agent ignores the current state of the world and those in which it does not. In both cases, hierarchy is used: an agent learns a distribution over its set of policies (i.e., a high-level policy) to determine which skill to use. In the former case, the high-level policy selects skills in the same way regardless of the agent’s state observation; here, an “appropriate” opportunity can be thought of as a learning opportunity, determined not by observing the environment but from information about how the learning is proceeding. In the latter case, identifying an opportunity to use a particular skill can also draw on observations of the environment.
Specifically, the first case of C2.a, in which an agent is learning about opportunities to use a skill while ignoring the current state of the world, is well exemplified by computational approaches using Learning Progress (LP), which approximates some derivative of performance over time (initially proposed by Oudeyer et al., 2007, p. 7; see also related earlier work by Schmidhuber, 1991, p. 1461). In systems employing LP, it is typically used as a measure to support the selection of intermediate-difficulty goals with respect to the changing skills of an agent over time. LP, as an intrinsic reward for goal selection, has been well studied across both machine and human reinforcement learners (e.g., Colas et al., 2019; Ten et al., 2021; Molinaro et al., 2024), and could thus offer a good starting point for formalising C2.a.
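Because "some derivative of performance over time" admits many estimators, the following is only one minimal possibility, chosen by us for illustration: LP is approximated as the difference between the mean performance in the most recent window of trials and in the window before it. The window size and the binary success encoding are assumptions, not taken from the cited systems.

```python
from collections import deque
from statistics import mean


class LearningProgress:
    """Windowed finite-difference estimate of learning progress for one goal.

    Assumption (ours): LP = mean(recent window) - mean(previous window),
    over the last 2 * window outcomes (e.g., 1 = success, 0 = failure).
    """

    def __init__(self, window: int = 5) -> None:
        self.window = window
        self.history: deque = deque(maxlen=2 * window)

    def update(self, outcome: float) -> None:
        self.history.append(outcome)

    def value(self) -> float:
        if len(self.history) < 2 * self.window:
            return 0.0  # not enough evidence yet
        values = list(self.history)
        older, recent = values[: self.window], values[self.window:]
        return mean(recent) - mean(older)


lp = LearningProgress(window=3)
for outcome in [0, 0, 0, 1, 1, 0]:
    lp.update(outcome)
# lp.value() is 2/3 (about 0.667): recent trials succeed more often than older ones.
```

A negative value signals that performance on the goal is declining; whether a system treats that as interesting (as with absolute LP) is a separate design choice.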
While a reward function prioritises the choice of skills, an agent also needs a mechanism to select a skill from that prioritisation. One commonly used mechanism is Thompson sampling (Thompson, 1933), also known as probability matching, where an agent draws a “belief”, choosing a skill according to a probability distribution over its set of skills (learned from some reward signal such as LP values), and then acts greedily with respect to the drawn skill until it has more “evidence” (e.g., a change in LP values) and its “belief” (i.e., skill-selection policy) is updated. Thompson sampling has been shown to explain aspects of human exploration in certain tasks (Gershman, 2018), making it a candidate for a psychologically plausible formalism of human skill selection.
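A minimal Thompson-sampling sketch for skill selection follows. The text leaves the belief model open, so we assume (our choice) a Gaussian belief over each skill's mean reward with known observation noise and a conjugate update; the reward fed in could be an LP value, but here it is just a number from a hypothetical reward signal. Each round samples one plausible mean per skill and picks the skill whose sample is highest, so each skill is chosen with the probability that it is currently believed to be best.

```python
import random
from dataclasses import dataclass


@dataclass
class GaussianBelief:
    """Posterior over one skill's mean reward (conjugate normal model).

    Assumption (ours): observations are Gaussian with known variance `noise_var`.
    """
    mean: float = 0.0
    var: float = 1.0
    noise_var: float = 0.25

    def sample(self, rng: random.Random) -> float:
        return rng.gauss(self.mean, self.var ** 0.5)

    def update(self, reward: float) -> None:
        precision = 1.0 / self.var + 1.0 / self.noise_var
        self.mean = (self.mean / self.var + reward / self.noise_var) / precision
        self.var = 1.0 / precision


def thompson_select(beliefs: list[GaussianBelief], rng: random.Random) -> int:
    """Draw one sample per skill and act greedily on the drawn values."""
    draws = [b.sample(rng) for b in beliefs]
    return max(range(len(draws)), key=draws.__getitem__)


# Usage with a hypothetical reward signal in which skill 2 is most rewarding.
rng = random.Random(0)
true_mean = [0.1, 0.3, 0.8]
beliefs = [GaussianBelief() for _ in true_mean]
counts = [0, 0, 0]
for _ in range(500):
    k = thompson_select(beliefs, rng)
    counts[k] += 1
    beliefs[k].update(rng.gauss(true_mean[k], 0.5))
# Typically, most selections end up on skill 2.
```

As the belief about a skill sharpens, its samples vary less, so exploration of clearly inferior skills fades without an explicit exploration schedule.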
An example of a system formalising the second case of C2.a, in which an agent learns about opportunities to use particular skills while accounting for the current state of the world, is Variational Intrinsic Control (VIC). In VIC, the appropriate skill for a given starting state is decided using a separate learned model of how empowering the skill has been in the past. We mean empowering in the sense of its definition in the AI literature,¹ as the combined ability of an agent to control its environment and to sense this control (cf., Salge et al., 2014, p. 2; Gregor et al., 2016, p. 2). The decision over skills is conditional on the state of the world, in that the policy for choosing skills is updated based on how empowering the skill turned out to be for that given situation. The distribution over skills is updated during skill use, such that more empowering skills become more likely to be chosen (Gregor et al., 2016, p. 4), using the reward (Gregor et al., 2016, p. 2):
$$R_t(s_0, s_t, g) := \underbrace{\log q_{\boldsymbol{\phi}}(g \mid s_0, s_t)}_{(\alpha)} - \underbrace{\log p_{\boldsymbol{\theta}}(g \mid s_0)}_{(\beta)}. \qquad (2)$$
(α) The discriminator, $q_{\boldsymbol{\phi}}$, is an arbitrary variational distribution parametrised by $\boldsymbol{\phi}$ (Barber and Agakov, 2003, p. 2). Given the first ($s_0$) and last ($s_t$) observations induced by the skill corresponding to the goal being pursued, $q_{\boldsymbol{\phi}}$ defines a probability distribution over goals. Successfully discriminating goals in the observation space requires the agent to observe distinct regions of the state space. If a goal is not discriminable based on observations, two or more skills are producing overlapping behaviours (and therefore, the skills lack diversity); conversely, if a goal is discriminable, then the corresponding skill is inducing trajectories unique to that skill. Thus, this reward term is maximised when there is no uncertainty in the prediction.

(β) The probability model, $p_{\boldsymbol{\theta}}(g \mid s_0)$, parametrised by $\boldsymbol{\theta}$, represents the goal-selection policy. As part of the reward, the purpose of this term is to reinforce high entropy: the agent should keep pursuing a wide range of goals, for each starting state, $s_0$, as the reward is higher when the goal had a low probability of being selected.
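The two terms of Eq. 2 can be evaluated directly once both distributions are known. The sketch below assumes, for illustration only, that $q_{\boldsymbol{\phi}}$ and $p_{\boldsymbol{\theta}}$ are already given as lists of probabilities over a small discrete set of goals (in VIC both are learned). It shows the reward rising when the end state identifies the goal confidently (α) and falling when that goal was already near-certain to be chosen (β).

```python
import math
from typing import Sequence


def vic_reward(q_goal: Sequence[float], p_goal: Sequence[float], goal: int) -> float:
    """Eq. 2 for a discrete goal: log q_phi(g | s0, st) - log p_theta(g | s0).

    q_goal: discriminator's distribution over goals for one (s0, st) pair.
    p_goal: goal-selection distribution over goals for the same s0.
    """
    alpha = math.log(q_goal[goal])  # (alpha) how clearly the end state identifies g
    beta = math.log(p_goal[goal])   # (beta) how likely g was to be selected
    return alpha - beta


uniform_p = [0.25, 0.25, 0.25, 0.25]
confident_q = [0.97, 0.01, 0.01, 0.01]  # end state reveals goal 0
confused_q = [0.25, 0.25, 0.25, 0.25]   # end state reveals nothing

r_confident = vic_reward(confident_q, uniform_p, goal=0)  # log(0.97) + log(4), about 1.356
r_confused = vic_reward(confused_q, uniform_p, goal=0)    # 0.0
r_habitual = vic_reward(confident_q, [0.97, 0.01, 0.01, 0.01], goal=0)  # 0.0: goal was near-certain
```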
Given a starting state, $s_0$, a goal, $g$, is selected according to the goal-selection distribution $p_{\boldsymbol{\theta}}(g \mid s_0)$, and the agent pursues it until the termination state $s_t$. Then, $p_{\boldsymbol{\theta}}$ is reinforced based on how empowering the corresponding skill was (estimated using Eq. 2). The final state becomes the new starting state, $s_0 \leftarrow s_t$, and the process is repeated. This can motivate the agent to follow skills that lead to distinct end states (α) and learn a wide range of skills for any given situation (β).

¹ “Empowerment” has also been studied in the context of human exploration, e.g., as “exploring options that enable the generation of as many more options as possible” (Brändle et al., 2023, p. 1482).
That one uses a particular capacity in the course of one’s action (C2.b)
While C2.a is more about finding suitable opportunities to use particular skills, such as in the case of VIC learning a state-conditional policy over skills, C2.b is more about an agent being motivated by observing that a skill is in use at all. The first term of the VIC reward function (Eq. 2, α) can be adapted to emphasise an agent’s ability to distinguish that a particular skill is in use at a given time, by only conditioning on the current state. The reward for one such system, Diversity Is All You Need (DIAYN) (Eysenbach et al., 2019), can be defined² as:

$$R_t(s_{t+1}, g) := \log q_{\boldsymbol{\phi}}(g \mid s_{t+1}), \qquad (3)$$

where $q_{\boldsymbol{\phi}}$ is an arbitrary variational distribution parametrised by $\boldsymbol{\phi}$ (refer back to the text following Eq. 2 for detail). An agent is rewarded for its ability to distinguish the current goal, $g$; thus, the overall return is maximised when the agent can perfectly infer which of its skills is in use at any given time.
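To show how Eq. 3 rewards being able to tell which skill is in use, the sketch below assumes (our simplification) a discriminator given as a softmax over hand-set logits rather than a trained network. It also offers the variant that subtracts $\log p(g)$: the DIAYN paper's reward includes that term, and under the uniform skill prior it is the constant $\log K$ for $K$ skills, so it shifts rewards without reordering them.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax; stands in for a trained discriminator."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def diayn_reward(disc_logits: Sequence[float], goal: int,
                 include_prior: bool = False) -> float:
    """Eq. 3: log q_phi(g | s_{t+1}). Optionally subtract log p(g) for a
    uniform prior over K skills, i.e. add the constant log K."""
    q = softmax(disc_logits)
    r = math.log(q[goal])
    if include_prior:
        r += math.log(len(q))  # -log(1/K) = log K
    return r


# Hypothetical discriminator logits for K = 3 skills at state s_{t+1}.
distinct = [5.0, 0.0, 0.0]    # state clearly indicates skill 0
ambiguous = [0.0, 0.0, 0.0]   # state is shared by all skills

r_distinct = diayn_reward(distinct, goal=0)    # about -0.013, close to the maximum of 0
r_ambiguous = diayn_reward(ambiguous, goal=0)  # log(1/3), about -1.099
```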
DIAYN, in further differentiation from VIC, ignores the state of the world in its choice of skill, instead choosing skills uniformly at random; in other words, it learns skills that are empowering on average over the state space. For further examples, one can turn to other variational empowerment methods (see Choi et al., 2021, p. 3). The choice to fix the distribution over skills as uniform is common in the CB-IM literature, although it is not an intuitive choice for a psychologically plausible model of goal selection, except perhaps in special cases such as the absence of prior information. This approach ensures that all skills receive, in expectation, an equal amount of training signal to improve. DIAYN learns a diverse set of skills across a wide array of situations, such as running, walking, hopping, flipping, or gliding to move through a space (Eysenbach et al., 2019, p. 5). It thus shows that a formalism motivated by skill use can drive skill learning.
Task performance (C3)
Next, we discuss computational models for two possible ways of understanding task performance: observing that one performs well at an intended task (C3.a) and performing to an extent that requires a certain skill or skill level (C3.b).
Observing that one performs well at an intended task (C3.a)
Given that an agent has set itself a goal (task), if the agent has a mechanism to estimate its progress towards the chosen goal, it can use that estimate as a proxy for its own performance on the task. An example system of this kind is Reinforcement Learning With Imagined Goals (RIG) (Nair et al., 2018). RIG encodes the agent’s observations into a lower-dimensional latent space from which the agent samples goals to pursue. Defining the reward as the negative distance between the goal and the current state in latent space encourages the agent to move its observations closer to the goal:

$$R_t(s_{t+1}, g) := -\lVert \phi(s_{t+1}) - \phi(g) \rVert_A, \qquad (4)$$

where $\phi(s_{t+1})$ is the learned state representation of $s_{t+1}$, $\phi(g)$ denotes the goal drawn from the latent space, and $A$ is a matrix that can give more importance to some dimensions of the latent space over others (Nair et al., 2018, p. 5).

² See Eysenbach et al., 2019, Appendix A, p. 12 for discussion on a constant term omitted from Eq. 3.
In contrast to DIAYN, in which the agent learns to master a fixed number of skills, in RIG the agent selects from an infinite number of goals (the space is continuous). This results in a sort of smoothness of the space that the agent can benefit from: if one goal is sampled near another that the agent has already learned to achieve, it may be able to generalise and use some parts of an existing policy to more efficiently learn to achieve the new goal. In the RIG system, goals take on the intuitive meaning of desired states for the agent to reach (discussed in Background: Computational IM & RL), and “performing well” at an intended task is operationalised as minimising the distance, in the learned latent space, between the agent’s observations and its self-generated goals. Over time, RIG agents learn to generate goals similar to the observations they make of their environment, while simultaneously learning the skills necessary to consistently perform well (reach the goals).
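Eq. 4 leaves the norm $\lVert \cdot \rVert_A$ implicit. Assuming the common weighted form $\lVert x \rVert_A = \sqrt{x^\top A x}$ for a positive semi-definite $A$ (our assumption here), the reward can be sketched as below, with latent vectors given directly rather than produced by RIG's learned encoder. With $A$ as the identity this is the negative Euclidean distance; zeroing a diagonal entry makes the agent indifferent to matching the goal along that dimension.

```python
import math
from typing import Sequence


def weighted_norm(x: Sequence[float], A: Sequence[Sequence[float]]) -> float:
    """||x||_A = sqrt(x^T A x), assuming A is positive semi-definite."""
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return math.sqrt(quad)


def rig_reward(z_next: Sequence[float], z_goal: Sequence[float],
               A: Sequence[Sequence[float]]) -> float:
    """Eq. 4: negative A-weighted distance between latent state and latent goal."""
    diff = [a - b for a, b in zip(z_next, z_goal)]
    return -weighted_norm(diff, A)


identity = [[1.0, 0.0], [0.0, 1.0]]
ignore_second_dim = [[1.0, 0.0], [0.0, 0.0]]  # hypothetical: second latent dimension irrelevant

r_full = rig_reward([1.0, 2.0], [4.0, 6.0], identity)              # -5.0
r_partial = rig_reward([1.0, 2.0], [4.0, 6.0], ignore_second_dim)  # -3.0
```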
Specifically to an extent that requires a certain skill or skill level (C3.b)
While RIG captures the idea of motivation by performance on a self-selected task (C3.a), it does not take into account the level of skill needed to perform well on the task (C3.b). For this, we can look to other system designs inspired by the idea of preferring to focus on goals or skills at an appropriate level of difficulty, given what the agent already knows. To identify skills of appropriate difficulty, such systems can make use of Automatic Curriculum Learning (ACL) (for a review see Portelas et al., 2021), enabling an agent to self-organise its goal selection. This can significantly reduce the number of training examples required for an agent to learn to reach its goals (Florensa et al., 2018, p. 1).
For example, in the Continual Universal Reinforcement learning with Intrinsically mOtivated sUbstitutionS (CURIOUS) system by Colas et al. (2019), instead of considering task performance as distance from the goal, the system considers its capability to achieve its goals. Specifically, the agent is motivated to pursue goals in certain parts of the sensory space in which it believes it is becoming increasingly or decreasingly successful at achieving its goals, estimated by measuring how successful it has been in the recent past, that is, its LP. The underpinning logic of the system is that such goals will be of optimal difficulty, neither too easy nor too difficult. In the CURIOUS experiments, an agent manipulates cubes by learning to control a robotic arm (Colas et al., 2019, Supplementary p. 3). Each observation includes information about the gripper and the objects of interest (e.g., their position and rotation), and each goal is a desired observation. The high-dimensional observation space makes learning challenging.
Furthermore, some parts of the sensory space are irrelevant for achieving certain goals; for example, if the goal is to manipulate one of several cubes, the agent might not need information about the other cubes. To ignore such irrelevant information in its formulation of goals, CURIOUS modularises the observation space, which is also the goal space, with each module being a specific, hand-defined subspace of the space. A CURIOUS agent estimates its own LP for each module, and uses the absolute values of the LP estimates to compute a probability distribution for selecting the module to generate a goal within (Colas et al., 2019, p. 4). This mechanism helps the agent avoid modules containing goals that are too difficult or for which it already has sufficient skill, and helps it return to skills it is forgetting (Colas et al., 2019, pp. 2–3). In this way, the selection of different goals is driven by self-assessment of task performance.
The subjective LP for a given module, $LP(M_i)$, is defined as the derivative of subjective competence, $C(M_i)$, with respect to time, where $C(M_i)$ is defined as the mean over results from self-evaluated trials performed by the agent (see Colas et al., 2019, p. 4). The results are stored in a queue of preset length, and each result is binary: either 1 for a successful trial or 0 for a failure. This mean can be viewed as the frequentist estimate of the probability of success. Finally, the probability of choosing a module is computed as:

$$p(M_i) := \varepsilon \times \frac{1}{N} + (1 - \varepsilon) \times \frac{\lvert LP(M_i) \rvert}{\sum_{j=1}^{N} \lvert LP(M_j) \rvert}, \qquad (5)$$

in which $\varepsilon \in [0, 1]$ is a hyperparameter that can be used to ensure that even poorly performing modules are occasionally selected, and $N$ corresponds to the total number of modules. As $\varepsilon$ approaches 1, the probability distribution over modules resembles a uniform distribution. When $\varepsilon$ is smaller than 1, the modules with the highest absolute LP will have the highest probability of being selected.
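Eq. 5 is simple enough to transcribe directly. The sketch below takes the per-module LP estimates as given (however they were computed) and returns the module-selection distribution, plus a sampler. The fallback to a uniform distribution when every $\lvert LP \rvert$ is zero is our own addition, since Eq. 5 is undefined in that case; the LP values in the usage lines are made up.

```python
import random
from typing import Sequence


def module_probabilities(lp: Sequence[float], eps: float) -> list[float]:
    """Eq. 5: mix a uniform distribution with one proportional to |LP(M_i)|."""
    if not 0.0 <= eps <= 1.0:
        raise ValueError("eps must lie in [0, 1]")
    n = len(lp)
    total = sum(abs(v) for v in lp)
    if total == 0.0:
        return [1.0 / n] * n  # our assumption: no progress signal anywhere, so sample uniformly
    return [eps / n + (1.0 - eps) * abs(v) / total for v in lp]


def select_module(lp: Sequence[float], eps: float, rng: random.Random) -> int:
    """Sample a module index according to Eq. 5."""
    probs = module_probabilities(lp, eps)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical LP estimates for three modules: one improving, one being
# forgotten (negative LP), and one that has plateaued.
lp_values = [0.30, -0.10, 0.0]
probs = module_probabilities(lp_values, eps=0.2)
# probs is about [0.667, 0.267, 0.067]: the forgotten module still gets attention.
```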
In the same vein, an emerging ACL paradigm, Unsupervised Environment Design (UED) (for a review see Rutherford et al., 2024), formalises the problem of automatically generating tasks for agents based on some criterion, for instance, the difference between the current agent and (in practice, an approximation of) an optimal agent (Rutherford et al., 2024, p. 1). In UED, environments can be thought of as goals for an agent to pursue (each associated with its own reward function). As part of one such system, Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (OMNI-EPIC), agents self-generate novel, interesting, and solvable goals, matched for an agent’s current capabilities (Faldor et al., 2024, p. 2).
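One criterion mentioned above, the gap between the current agent and an (approximately) optimal agent, is commonly called regret in the UED literature. The sketch below assumes a hypothetical set of candidate environments for which an estimate of the best achievable return and the agent's current return are already available, and simply proposes the environment with the largest estimated gap. How those estimates are obtained, which is the hard part in practice, is left out.

```python
from typing import Mapping


def estimated_regret(optimal_return: float, agent_return: float) -> float:
    """Regret estimate: how far the agent's return falls short of the best-known return."""
    return optimal_return - agent_return


def propose_environment(estimates: Mapping[str, tuple[float, float]]) -> str:
    """Pick the environment with the largest estimated regret.

    `estimates` maps a (hypothetical) environment name to
    (estimated optimal return, current agent return).
    """
    return max(estimates, key=lambda env: estimated_regret(*estimates[env]))


candidates = {
    "already_mastered": (1.0, 0.95),  # little left to learn
    "within_reach": (1.0, 0.40),      # solvable, agent not there yet
    "unsolvable": (0.0, 0.0),         # no agent can score here
}
chosen = propose_environment(candidates)  # 'within_reach'
```

Because an unsolvable environment has no gap between optimal and current performance, a regret criterion naturally steers away from it, which is one way of matching tasks to current capabilities.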
While a significant proportion of ACL methods optimise task difficulty or competence progress with respect to current capabilities, ACL can also motivate agents to optimise goal selection based on other criteria that can be interpreted as proxies for “skill level” (from C3.b). Pong et al. (2020) extend RIG (see C3.a) by weighting the selection of previously visited states used in learning the goal-generation model, encouraging the agent to generate goals resembling less frequently visited states. This weighting can be seen as a heuristic for attending to tasks that are expected to be non-trivial for the agent. As a second example, Lintunen et al. (2024a) introduce a method motivating agents to select goals based on how much the agent believes pursuing them will diversify its entire set of skills. As ACL systems often consider the level of skill needed to perform well on the task (where the task could be defined as learning a diverse set of skills), autocurricula methods offer a good starting point for formalising C3.b.
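The weighting idea attributed to Pong et al. (2020) can be illustrated roughly as follows. We assume, following our understanding of their Skew-Fit method, that each stored state is weighted by its estimated density raised to a negative exponent, so rarer states receive more weight when training the goal generator; the density values and the exponent below are made up for illustration.

```python
from typing import Sequence


def skewed_weights(densities: Sequence[float], alpha: float = -1.0) -> list[float]:
    """Normalised sampling weights proportional to density ** alpha.

    With alpha < 0, frequently visited (high-density) states are down-weighted
    and rarely visited ones up-weighted; alpha = 0 recovers uniform weights.
    """
    if not -1.0 <= alpha <= 0.0:
        raise ValueError("alpha is expected to lie in [-1, 0]")
    raw = [d ** alpha for d in densities]
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical density estimates for three previously visited states.
densities = [0.5, 0.4, 0.1]  # the last state was rarely visited
weights = skewed_weights(densities, alpha=-1.0)
# weights is about [0.138, 0.172, 0.690] (unnormalised: [2.0, 2.5, 10.0])
```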
Capacity growth (C4)
Competence, understood as capacity growth, can be interpreted in two different ways: observing gain in the strength (C4.a) and/or range (C4.b) of one’s skills. We saw that a reward reflecting a derivative of some measure of performance (e.g., LP) can direct the agent towards goals requiring a certain level of skill (C3.b); the same derivative directly encourages capacity growth or improvement (C4.a). Although these two facets appear to have different meanings, from a computational perspective, they can be closely aligned.
An example of a system that explores both C4.a and C4.b (both repeatedly engaging in skills and extending the agent’s skill repertoire) is Intrinsically Motivated Reinforcement Learning (IMRL), described by Barto et al. (2004) and Singh et al. (2004). IMRL extends the agent’s repertoire of skills (C4.b): each time a “salient” event is observed, the agent adds a new skill associated with the salient event.³ After this first observation, the agent learns how to return to observe the salient event again from any other situation via experience over time; this is the new skill associated with the salient event. Strictly speaking, the skill repertoire expands as a matter of course, not because the agent is intrinsically rewarded for expanding it.

³ Barto et al. (2004) leave to future work what salient means and how to determine whether an event is salient or not. They simply hard-code events like turning on a light or ringing a bell as “salient.”

The intrinsic reward used in the IMRL system (Singh et al., 2004, p. 4) can be defined as:

$$R_t(s_t, s_{t+1}, g_{s_{t+1}}) \propto \begin{cases} 1 - P_{g_{s_{t+1}}}(s_{t+1} \mid s_t) & \text{if } s_{t+1} \text{ is salient,} \\ 0 & \text{otherwise,} \end{cases} \qquad (6)$$

where $P_{g_{s_{t+1}}}$ is what Precup and Sutton (1997) call a multi-time model for skill $g_{s_{t+1}}$. $P_{g_{s_{t+1}}}$ is not quite a probability estimate for the transition from $s_t$ to $s_{t+1}$ given the agent is following skill $g_{s_{t+1}}$, but that is close to the correct intuition.

The agent only receives a non-zero intrinsic reward when it reaches a salient state. The intrinsic reward here can be thought of as modelling how surprising it is that the agent reached $s_{t+1}$ immediately from state $s_t$ (which it has just done) by following the skill $g_{s_{t+1}}$. If the agent expected, without uncertainty, to transition directly from $s_t$ to $s_{t+1}$, then $P_{g_{s_{t+1}}}$ is 1 and the reward is 0 (the transition is very unsurprising). If the agent did not expect it could transition from $s_t$ to $s_{t+1}$ by following skill $g_{s_{t+1}}$, then $P_{g_{s_{t+1}}}$ is 0 and the reward is 1 (maximally surprising). If the agent is uncertain, $P_{g_{s_{t+1}}}$ will be somewhere in between. However, if the agent expected that it could reach $s_{t+1}$ from $s_t$ by following skill $g_{s_{t+1}}$, just that it would take more than a single step, $P_{g_{s_{t+1}}}$ will be discounted based on the length of the path; this discounting is part of how the multi-time model is constructed.
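The cases just described can be traced with a small sketch. It assumes a deterministic skill known to reach the salient state in exactly $k$ steps. Conventions differ on where discounting starts, so, to match the description above (a certain one-step transition gives a model value of 1), we assume the value $\gamma^{k-1}$ for a hypothetical discount $\gamma$; a general multi-time model instead accumulates discounted probabilities over all possible durations. The proportionality constant in Eq. 6 is exposed as `scale`.

```python
def multi_time_value(steps: int, gamma: float = 0.9) -> float:
    """Hypothetical helper: model value for a skill that deterministically
    reaches the target state in `steps` steps. Discounting starts after the
    first step (our convention), so a certain one-step transition gives 1.0."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    return gamma ** (steps - 1)


def imrl_reward(model_value: float, next_is_salient: bool, scale: float = 1.0) -> float:
    """Eq. 6 with proportionality constant `scale`: reward only at salient states."""
    if not next_is_salient:
        return 0.0
    return scale * (1.0 - model_value)


r_expected = imrl_reward(1.0, True)                   # certain one-step arrival: 0.0
r_surprise = imrl_reward(0.0, True)                   # unexpected arrival: 1.0
r_longer = imrl_reward(multi_time_value(3), True)     # expected, but in 3 steps: about 0.19
r_not_salient = imrl_reward(0.3, False)               # non-salient state: 0.0
```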
Another example of C4.b, that is, being motivated by growing the number of skills in the agent’s repertoire, is the aforementioned VIC (see C2.a). The VIC reward function motivates the agent to grow its number of effective skills: in expectation, the second term of the reward function (Eq. 2, β) increases as the distribution over skills becomes more uniform. In theory, this encourages the distribution over skills not to collapse to a small number of skills. In practice, since there are numerous other factors that affect how agents learn, this does not always happen (for empirical results see Eysenbach et al., 2019, Appendix E, pp. 17–18).
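The "in expectation" claim can be checked numerically: the expected value of the β term, $-\log p_{\boldsymbol{\theta}}(g \mid s_0)$, under goals drawn from $p_{\boldsymbol{\theta}}(\cdot \mid s_0)$ is the entropy of the goal-selection distribution, which is largest when that distribution is uniform. The distributions below are arbitrary examples for four skills.

```python
import math
from typing import Sequence


def expected_beta_term(p: Sequence[float]) -> float:
    """E_{g ~ p}[-log p(g)], i.e. the Shannon entropy of p in nats."""
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0.0)


uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]
collapsed = [1.0, 0.0, 0.0, 0.0]

h_uniform = expected_beta_term(uniform)      # log(4), about 1.386
h_skewed = expected_beta_term(skewed)        # about 0.940
h_collapsed = expected_beta_term(collapsed)  # 0.0: the beta term contributes nothing
```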
Advancing the Theory of Competence in SDT
Why should human motivation researchers care about the computational models we have identified? By identifying distinct ways in which the different facets of the need for competence in SDT (Table 1) can be formalised, we better understand the ways in which the theory is underspecified. For instance, LP-based methods (discussed in C2, C3, C4) can exemplify skill use as an opportunity to use a particular skill (C2.a), task performance that requires a certain skill or skill level (C3.b), and capacity growth in the sense of observing gains in the strength of one’s skills (C4.a). Given that SDT does not acknowledge distinct facets, finding a single formalism that fits multiple facets supports the theory. On the other hand, some facets are formally distinct. For example, it is unclear how effectance (C1), being motivated by extending one’s repertoire of skills (C4.b), and being motivated by observing that one performs well at an intended task (C3.a) relate to each other and to the other facets. Different interpretations of the theory might lead to drastically different operationalisations of the need for competence. Our research can alleviate this problem by uncovering ways in which the theory conflates formally distinct concepts, thus prompting a review of the theory.
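As an illustration of how a single LP-based formalism can touch several facets at once, the sketch below records a competence score per skill (task performance), measures learning progress as the absolute change between the mean of the most recent window of scores and the mean of the window before it (capacity growth), and selects the skill with the highest progress to practise next (skill use). This is one common way of computing LP, not necessarily the one used by any method discussed in C2, C3, or C4; the class name, window size, and skill names are hypothetical.

```python
from collections import deque


class LearningProgressTracker:
    """Hypothetical per-skill tracker of absolute learning progress (LP)."""

    def __init__(self, window: int = 5):
        self.window = window
        # For each skill, keep only the last 2 * window competence scores.
        self.history: dict[str, deque] = {}

    def record(self, skill: str, competence: float) -> None:
        """Store a competence score in [0, 1], e.g. 1.0 for success and 0.0 for failure."""
        buffer = self.history.setdefault(skill, deque(maxlen=2 * self.window))
        buffer.append(competence)

    def progress(self, skill: str) -> float:
        """|mean of recent window - mean of older window|; 0.0 until both windows are full."""
        scores = list(self.history.get(skill, ()))
        if len(scores) < 2 * self.window:
            return 0.0
        older, recent = scores[: self.window], scores[self.window :]
        return abs(sum(recent) / self.window - sum(older) / self.window)

    def select_skill(self) -> str:
        """Greedily pick the recorded skill with the highest LP (raises if none recorded)."""
        return max(self.history, key=self.progress)


if __name__ == "__main__":
    tracker = LearningProgressTracker(window=5)
    for step in range(10):
        tracker.record("reach", step / 10)  # steadily improving skill
        tracker.record("jump", 0.2)         # stagnant skill
    print(tracker.progress("reach"), tracker.progress("jump"))
    print(tracker.select_skill())
```

Absolute progress responds to both improvements and drops in competence. A purely greedy choice, as here, never revisits skills whose LP is momentarily zero, so practical LP-based methods typically mix in some random skill selection.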
Another reason human motivation researchers should care about this work is that the computational formalisms reveal underlying preconditions that SDT fails to make explicit, such as that a competence-motivated agent is able to recognise when actions cause effects (C1) and to differentiate between distinct skills (C2, C3, C4). Similarly, concepts from intrinsically motivated RL, such as novelty, diversity, and progress towards goals, underlie competence-driven learning in computational contexts, yet are often overlooked in SDT. Our research sheds light on mechanisms used to computationally fulfil such preconditions, and these could be investigated for further insights. Additionally, our work can support a cycle of theory development by inspiring new computational models formalising aspects of the theory, which can then be tested empirically to refine the theory.
Future Work and Conclusion
For computational IM to be effective in informing theory on motivation, more empirical work is needed in the comparison and evaluation of computational IM. While some work exists comparing behaviour induced by different intrinsic reward functions (e.g., Santucci et al., 2012; Biehl et al., 2018; Linke et al., 2020), as does work studying variables such as the choice of representation (e.g., Burda et al., 2019), more research is needed on the properties of different models, such as how they behave in varying environments and bodies. Showing that RL formalisms can be aligned with SDT uncovers a wealth of resources for motivation research, such as existing implementations of models, simulators, and evaluation metrics. These resources can enable large-scale simulation studies to explore psychologically plausible computational IMs and how they may explain human data, thus contributing to formal competence-related theory. However, further work is needed to determine how to compare artificial and biological agents effectively (a project partially undertaken by researchers in representational alignment; e.g., Sucholutsky et al., 2024).
Further research is also required to understand the extent to which computational IM research intersects with intrinsically motivated learning as understood in psychology. Early work on computational IM emerged at the intersection of developmental psychology and robotics, attempting to model intrinsically motivated human behaviours, but a large portion of state-of-the-art intrinsically motivated RL is not necessarily aligned with relevant goals of understanding the human mind or developing safe (human-aligned) AI systems.
As most formalisms we identified relate to the pursuit of goals, we need to better understand how humans represent and generate goals (Colas et al., 2024, p. 1). The topic has not been well studied for various reasons, including computational intractability (Byers et al., 2024, p. 1). Excitingly, goals have recently emerged as a multidisciplinary area of research, cutting across AI, cognitive science, and developmental psychology (e.g., Colas et al., 2022; Molinaro and Collins, 2023; Chu et al., 2024; Davidson et al., 2024). Recent work by Molinaro and Collins (2023, p. 1150) suggests that goals affect many agent dynamics, such as representing the environment, choosing relevant actions, and evaluating rewards.
In summary, we build on recent research arguing that the need for competence is currently poorly specified in SDT. We do so by identifying computational formalisms to inspire empirical work that could resolve this issue. We show that the four distinct facets of competence (effectance, skill use, task performance, and capacity growth) can be aligned with existing formalisms from intrinsically motivated RL, highlighting the potential of computational models to provide deeper insights into competence-motivated behaviours. While this is a promising first step, further research is required to study these computational models empirically, inviting collaboration from motivation researchers across disciplines.
Acknowledgments
We thank the members of the Intrinsically Motivated Open-ended Learning (IMOL) community for engaging with the work and for providing feedback in Paris (2023) and Vancouver (2024). We also thank the members of the Autotelic Interaction Research (AIR) group for feedback throughout the project. EML and CG received financial support from the Research Council of Finland (NEXT-IM, grant 349036) and NMA from the Helsinki Institute for Information Technology.
References
Ady, N. M., Shariff, R., Günther, J., and Pilarski, P. M. (2022). Five properties of specific curiosity you didn’t know curious machines should have. arXiv preprint arXiv:2212.00187.
Aubret, A., Matignon, L., and Hassas, S. (2023). An information-theoretic perspective on intrinsic motivation in reinforcement learning: A survey. Entropy, 25(2).
Baldassarre, G. (2022). Intrinsic motivations for open-ended learning. In Cognitive Robotics, pages 251–270. The MIT Press.
Barber, D. and Agakov, F. (2003). The IM algorithm: a variational approach to information maximization. In Advances in Neural Information Processing Systems, pages 201–208.
Barto, A. G., Singh, S., and Chentanez, N. (2004). Intrinsically motivated learning of hierarchical collections of skills. In International Conference on Development and Learning, pages 112–120.
Berlyne, D. (1965). Structure and Direction in Thinking. John Wiley & Sons.
Biehl, M., Guckelsberger, C., Salge, C., Smith, S. C., and Polani, D. (2018). Expanding the active inference landscape: More intrinsic motivations in the perception-action loop. Frontiers in Neurorobotics, 12.
Brändle, F., Stocks, L. J., Tenenbaum, J. B., Gershman, S. J., and Schulz, E. (2023). Empowerment contributes to exploration behaviour in a creative video game. Nature Human Behaviour, 7(9):1481–1489.
Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., and Efros, A. A. (2019). Large-scale study of curiosity-driven learning. In International Conference on Learning Representations.
Byers, J. B., Zhao, B., and Niv, Y. (2024). Modeling goal selection with program synthesis. In Intrinsically Motivated Open-ended Learning workshop at NeurIPS 2024.
Choi, J., Sharma, A., Lee, H., Levine, S., and Gu, S. S. (2021). Variational empowerment as representation learning for goal-conditioned reinforcement learning. In International Conference on Machine Learning, pages 1953–1963.
Chu, J., Tenenbaum, J. B., and Schulz, L. E. (2024). In praise of folly: flexible goals and human cognition. Trends in Cognitive Sciences, 28(7):628–642.
Colas, C., Chu, J., Molinaro, G., and Hawkins, R. (2024). What should I do now? Goal-centric outlooks on learning, exploration, and communication. In Proceedings of the Annual Meeting of the Cognitive Science Society.
Colas, C., Fournier, P., Chetouani, M., Sigaud, O., and Oudeyer, P.-Y. (2019). CURIOUS: Intrinsically motivated modular multi-goal reinforcement learning. In International Conference on Machine Learning, pages 1331–1340.
Colas, C., Karch, T., Sigaud, O., and Oudeyer, P.-Y. (2022). Autotelic agents with intrinsically motivated goal-conditioned reinforcement learning: a short survey. Journal of Artificial Intelligence Research, 74:1159–1199.
Davidson, G., Todd, G., Togelius, J., Gureckis, T. M., and Lake, B. M. (2024). Goals as reward-producing programs. arXiv preprint arXiv:2405.13242.
Dayan, P. and Balleine, B. W. (2002). Reward, motivation, and reinforcement learning. Neuron, 36(2):285–298.
Deci, E. L. and Ryan, R. M. (1985). Intrinsic Motivation and Self-Determination in Human Behavior. Springer US.
Deterding, S., Lintunen, E. M., Guckelsberger, C., and Ady, N. M. (2024). Why self-determination theory needs formal modelling: The case of competence and balanced challenge. PsyArXiv preprint 10.31234/osf.io/n6x8s.
Eysenbach, B., Gupta, A., Ibarz, J., and Levine, S. (2019). Diversity is all you need: Learning skills without a reward function. In International Conference on Learning Representations.
Faldor, M., Zhang, J., Cully, A., and Clune, J. (2024). Omni-epic: Open-endedness via models of human notions of interestingness with environments programmed in code. arXiv preprint arXiv:2405.15568.
Florensa, C., Held, D., Geng, X., and Abbeel, P. (2018). Automatic goal generation for reinforcement learning agents. In International Conference on Machine Learning, pages 1515–1528.
Gershman, S. J. (2018). Deconstructing the human algorithms for exploration. Cognition, 173:34–42.
Gregor, K., Rezende, D. J., and Wierstra, D. (2016). Variational intrinsic control. arXiv preprint arXiv:1611.07507.
Kaelbling, L. P. (1993). Learning to achieve goals. In International Joint Conference on Artificial Intelligence, pages 1094–1098.
Kakade, S. and Dayan, P. (2002). Dopamine: generalization and bonuses. Neural Networks, 15(4):549–559.
Lidayan, A., Dennis, M., and Russell, S. (2024). BAMDP shaping: a unified theoretical framework for intrinsic motivation and reward shaping. arXiv preprint arXiv:2409.05358.
Linke, C., Ady, N. M., White, M., Degris, T., and White, A. (2020). Adapting behavior via intrinsic reward: A survey and empirical study. Journal of Artificial Intelligence Research, 69:1287–1332.
Lintunen, E. M., Ady, N. M., and Guckelsberger, C. (2024a). Diversity progress for goal selection in discriminability-motivated RL. In Intrinsically Motivated Open-ended Learning workshop at NeurIPS 2024.
Lintunen, E. M., Ady, N. M., Guckelsberger, C., and Deterding, S. (2024b). Advancing self-determination theory with computational intrinsic motivation: The case of competence. PsyArXiv preprint 10.31234/osf.io/7qy35.
Liu, M., Zhu, M., and Zhang, W. (2022). Goal-conditioned reinforcement learning: Problems and solutions. In International Joint Conference on Artificial Intelligence, pages 5502–5511.
Marsella, S., Gratch, J., and Petta, P. (2010). Computational models of emotion. In A Blueprint for Affective Computing: A sourcebook and manual. Oxford University Press.
Merriam-Webster.com (2025). Retrieved from https://www.merriam-webster.com.
Molinaro, G., Colas, C., Oudeyer, P.-Y., and Collins, A. (2024). Latent learning progress drives autonomous goal selection in human reinforcement learning. In Advances in Neural Information Processing Systems.
Molinaro, G. and Collins, A. G. (2023). A goal-centric outlook on learning. Trends in Cognitive Sciences, 27(12):1150–1164.
Murayama, K. (2022). A reward-learning framework of knowledge acquisition: An integrated account of curiosity, interest, and intrinsic–extrinsic rewards. Psychological Review, 129(1):175–198.
Nair, A. V., Pong, V., Dalal, M., Bahl, S., Lin, S., and Levine, S. (2018). Visual reinforcement learning with imagined goals. In Advances in Neural Information Processing Systems, pages 9191–9200.
Oberauer, K. and Lewandowsky, S. (2019). Addressing the theory crisis in psychology. Psychonomic Bulletin & Review, 26(5):1596–1618.
Oudeyer, P.-Y. and Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics, 1:1–14. Article 6.
Oudeyer, P.-Y. and Kaplan, F. (2008). How can we define intrinsic motivation? In International Conference on Epigenetic Robotics.
Oudeyer, P.-Y., Kaplan, F., and Hafner, V. V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11(2):265–286.
Pathak, D., Agrawal, P., Efros, A. A., and Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning, pages 2778–2787.
Pong, V., Dalal, M., Lin, S., Nair, A., Bahl, S., and Levine, S. (2020). Skew-fit: State-covering self-supervised reinforcement learning. In International Conference on Machine Learning, pages 7783–7792.
Portelas, R., Colas, C., Weng, L., Hofmann, K., and Oudeyer, P.-Y. (2021). Automatic curriculum learning for deep RL: a short survey. In International Joint Conference on Artificial Intelligence, pages 4819–4825.
Precup, D. and Sutton, R. S. (1997). Multi-time models for temporally abstract planning. In Advances in Neural Information Processing Systems, pages 1050–1056.
Raileanu, R. and Rocktäschel, T. (2020). RIDE: Rewarding impact-driven exploration for procedurally-generated environments. In International Conference on Learning Representations.
Robinaugh, D. J., Haslbeck, J. M. B., Ryan, O., Fried, E. I., and Waldorp, L. J. (2021). Invisible Hands and Fine Calipers: A Call to Use Formal Theory as a Toolkit for Theory Construction. Perspectives on Psychological Science, 16(4):725–743.
Rutherford, A., Beukman, M., Willi, T., Lacerda, B., Hawes, N., and Foerster, J. N. (2024). No regrets: Investigating and improving regret approximations for curriculum discovery. In Advances in Neural Information Processing Systems.
Ryan, R. M. (2023). The Oxford Handbook of Self-Determination Theory. Oxford University Press.
Ryan, R. M. and Deci, E. L. (2017). Self-determination theory: Basic psychological needs in motivation, development, and wellness. Guilford.
Salge, C., Glackin, C., and Polani, D. (2014). Empowerment–an introduction. In Guided Self-Organization: Inception, pages 67–114. Springer Berlin Heidelberg.
Santucci, V. G., Baldassarre, G., and Mirolli, M. (2012). Intrinsic motivation mechanisms for competence acquisition. In International Conference on Development and Learning and Epigenetic Robotics, pages 1–6.
Schaul, T., Horgan, D., Gregor, K., and Silver, D. (2015). Universal value function approximators. In International Conference on Machine Learning, pages 1312–1320.
Schmidhuber, J. (1991). Curious model-building control systems. In IEEE International Joint Conference on Neural Networks, pages 1458–1463.
Silver, D., Singh, S., Precup, D., and Sutton, R. S. (2021). Reward is enough. Artificial Intelligence, 299.
Singh, S., Barto, A., and Chentanez, N. (2004). Intrinsically motivated reinforcement learning. In Advances in Neural Information Processing Systems, pages 1281–1288.
Sucholutsky, I., Muttenthaler, L., Weller, A., Peng, A., Bobu, A., Kim, B., Love, B. C., Cueva, C. J., Grant, E., Groen, I., Achterberg, J., Tenenbaum, J. B., Collins, K. M., Hermann, K. L., Oktar, K., Greff, K., Hebart, M. N., Cloos, N., Kriegeskorte, N., Jacoby, N., Zhang, Q., Marjieh, R., Geirhos, R., Chen, S., Kornblith, S., Rane, S., Konkle, T., O’Connell, T. P., Unterthiner, T., Lampinen, A. K., Müller, K.-R., Toneva, M., and Griffiths, T. L. (2024). Getting aligned on representational alignment. arXiv preprint arXiv:2310.13018.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. The MIT Press, second edition.
Ten, A., Kaushik, P., Oudeyer, P.-Y., and Gottlieb, J. (2021). Humans monitor learning progress in curiosity-driven exploration. Nature Communications, 12(1):5972.
Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3–4):285–294.
White, R. W. (1959). Motivation reconsidered: The concept of competence. Psychological Review, 66(5):297–333.