arxiv

Paper 2502.07423

Towards a Formal Theory of the Need for Competence via Computational Intrinsic Motivation

Authors: Erik M. Lintunen, Nadia M. Ady, Sebastian Deterding, Christian Guckelsberger

Published: 2025-02-11

Abstract:

Computational models offer powerful tools for formalising psychological theories, making them both testable and applicable in digital contexts. However, they remain little used in the study of motivation within psychology. We focus on the "need for competence", postulated as a key basic human need within Self-Determination Theory (SDT) -- arguably the most influential psychological framework for studying intrinsic motivation (IM). The need for competence is treated as a single construct across SDT texts. Yet, recent research has identified multiple, ambiguously defined facets of competence in SDT. We propose that these inconsistencies may be alleviated by drawing on computational models from the field of artificial intelligence, specifically from the domain of reinforcement learning (RL). By aligning the aforementioned facets of competence -- effectance, skill use, task performance, and capacity growth -- with existing RL formalisms, we provide a foundation for advancing competence-related theory in SDT and motivational psychology more broadly. The formalisms reveal underlying preconditions that SDT fails to make explicit, demonstrating how computational models can improve our understanding of IM. Additionally, our work can support a cycle of theory development by inspiring new computational models formalising aspects of the theory, which can then be tested empirically to refine the theory. While our research lays a promising foundation, empirical studies of these models in both humans and machines are needed, inviting collaboration across disciplines.

Paper Content:

arXiv:2502.07423v1 [cs.AI] 11 Feb 2025

Erik M. Lintunen (erik.lintunen@aalto.fi), Nadia M. Ady (nadia.ady@aalto.fi), Sebastian Deterding (s.deterding@imperial.ac.uk), Christian Guckelsberger (christian.guckelsberger@aalto.fi)
Keywords: need for competence; self-determination theory; intrinsic motivation; motivational psychology; theory development; intrinsically motivated reinforcement learning; computational modelling

Introduction

Self-Determination Theory (SDT) (Deci and Ryan, 1985) is one of the most influential psychological theories of motivation, including intrinsic motivation (IM). It has also inspired artificial intelligence (AI) research on computational IM (e.g., Oudeyer and Kaplan, 2007). Computational IM equips artificial agents with abilities for open-ended development, task-agnostic learning, and dealing with the potential sparsity of rewards (Colas et al., 2022, p. 1161). Notably, SDT, like many psychological theories, is expressed in “soft” verbal propositions. In light of the current theory crisis (Oberauer and Lewandowsky, 2019), verbal theory has drawn increasing critique: theories that are not formally specified may fail to produce hypotheses that can be rigorously tested. While many psychology researchers have advocated for computational modelling as a paradigm for theory development (e.g., Marsella et al., 2010; Oberauer and Lewandowsky, 2019; Robinaugh et al., 2021), formal specification in the form of precisely defined computational models remains largely absent in SDT. Judging by the most recent integrative volume of the theory, the Oxford Handbook of Self-Determination Theory (Ryan, 2023), methodological advances chiefly focus on new psychometric approaches and neuroscientific evidence.

Here, we make the case that computational models of IM can benefit the development of SDT. More specifically, Deterding et al. (2024) have shown that the need for competence, one of the basic psychological needs underpinning IM in SDT, is poorly specified, with multiple definitions that lack clear operationalisation.
They derive four distinct facets of competence by means of a detailed analysis of SDT texts spanning many decades (1985–2023), reproduced in Table 1 (Deterding et al., 2024, pp. 7–10). We focus on matching computational formalisms to each of these facets.

Table 1: The four distinct facets of competence identified by Deterding et al. (2024, pp. 7–10).

Effectance (C1): Observing that one’s action causes consequences, which can be unintended.
Skill use (C2): Observing (a) an opportunity (an appropriate situation) to use a skill and/or (b) that one uses a particular capacity in the course of one’s action.
Task performance (C3): Observing that one performs (a) well at an intended task, (b) to an extent that requires a certain skill or skill level.
Capacity growth (C4): Observing gain in the (a) strength and/or (b) range of one’s skills.

One reason computational models of IM offer a powerful paradigm for developing SDT is that translating a concept like competence into an algorithmic description induces a different way of thinking about theory: the process forces us to specify abstract and informal verbal theories to the point that they can be turned into a system that can actually produce observable behaviour. Observing how a theory behaves can “reveal implicit assumptions and hidden complexities” in verbal theory, thus inspiring theory construction (Marsella et al., 2010, p. 4). Murayama’s (2022) recent adoption of reinforcement learning (RL) as an underpinning framework for a theory of curiosity is a good case in point, as is Ady et al.’s work on specific machine curiosity (2022), to name a few.

To advance formal development of SDT and competence-related theory in motivational psychology more broadly, a key challenge is identifying formalisms that sufficiently approximate the theory. In this paper, we align each distinct competence facet with examples of formalisms from the computational literature.
We then discuss how each example formalises aspects of the theory. These formalisms not only lay a foundation for specifying the need for competence but also unveil a wealth of existing implementations, simulators, and evaluation metrics from the AI literature for use in psychological motivation research. In future work, these will allow us to test whether formalisations fitting the facets, either individually or in conjunction, produce empirical results that align with propositions made by the theory.

Background: Computational IM & RL

We begin by justifying why RL is a suitable domain from which to draw computational models of competence. Additionally, to support understanding of the models we discuss, we introduce some key concepts from intrinsically motivated RL -- most notably, goals and skills and their relationship to the reward function -- and recommend further reading.

The idea that IM might be driven by biological reward systems (Barto et al., 2004, p. 113, in reference to Kakade and Dayan, 2002; Dayan and Balleine, 2002) has driven the choice of methods centrally used in computational IM: RL (Baldassarre, 2022, p. 252). Due to their usefulness, intrinsically motivated RL methods have been studied extensively in the computational RL literature at large (see reviews by Colas et al., 2022; Aubret et al., 2023; Lidayan et al., 2024). An assumption underlying much work in RL is the reward hypothesis: “all of what we mean by goals and purposes can well be thought of as the maximization of the expected value of the cumulative sum of a received scalar signal (called reward)” (Sutton and Barto, 2018, p. 53; cf., Silver et al., 2021, p. 4). This hypothesis informs the formalisms used for competence-based intrinsic motivation (CB-IM), which tend to rely on the assumption that every goal can be sufficiently defined by a reward function.
Intrinsically motivated RL is characterised by the use of task-agnostic intrinsic reward functions that collate or compare agent-internal variables independently of their semantics (Oudeyer and Kaplan, 2008, p. 3; cf., Berlyne, 1965, p. 246 and Oudeyer and Kaplan, 2007).

Many of our examples operate within the framework of goal-conditioned reinforcement learning (GCRL) (see review by Liu et al., 2022), which extends the standard definition of a reward function to be conditioned on goals. A goal, or a goal-defining variable, is a parameter to the reward function (cf., Colas et al., 2022, p. 1165; Aubret et al., 2023, p. 6). Then, a skill is a policy given a goal, optimising for the return according to the reward conditioned on that goal. Examples of goal-defining variables include indices (e.g., Eysenbach et al., 2019) or elements drawn from a learned distribution (e.g., Nair et al., 2018). Their role is simply to indicate which reward function the agent is aiming to maximise.

Goals can be viewed as “a set of constraints ... that the agent seeks to respect” (Colas et al., 2022, p. 1165, emphasis in original). While the most immediate intuition of a goal is often as a desired state for the agent to reach (e.g., Kaelbling, 1993, p. 1; Schaul et al., 2015, p. 2), the formalism allows for a more general set of constraints on behaviour (Lintunen et al., 2024b, pp. 22–23). In effect, any behaviour that can be defined by attempting to maximise some reward function on the environment can be formulated as a goal–skill pairing.

Aligning Computational Models With SDT

Based on a review of classic (Oudeyer and Kaplan, 2007) and state-of-the-art (Colas et al., 2022; Aubret et al., 2023) computational IM, we next turn our focus to matching computational formalisms to each of the competence facets (Table 1). Our matching is non-exhaustive, and instead we focus on exemplifying how each facet could be formalised.
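Before turning to the individual facets, the goal and skill terminology from the background section can be made concrete with a minimal sketch. It is an illustration under stated assumptions, not a method from any cited paper: the names `goal_conditioned_reward` and `make_skill` are hypothetical, the goal is taken to be a desired state, the reward is assumed to be the negative Euclidean distance to it, and the "skill" is a one-step greedy policy rather than a policy optimised for the full return.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]

def goal_conditioned_reward(state: Vector, goal: Vector) -> float:
    """Reward parameterised by a goal (assumption: negative distance to a desired state)."""
    return -math.dist(state, goal)

def make_skill(goal: Vector, actions: Sequence[Vector]) -> Callable[[Vector], Vector]:
    """Return a skill: a policy for one goal.

    Simplification: the policy is greedy over the next step only, whereas the
    text defines a skill as optimising the return conditioned on the goal.
    """
    def policy(state: Vector) -> Vector:
        def reward_after(action: Vector) -> float:
            next_state = tuple(s + a for s, a in zip(state, action))
            return goal_conditioned_reward(next_state, goal)
        return max(actions, key=reward_after)
    return policy

# Hypothetical 2-D world with four unit moves.
ACTIONS = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]

skill_right = make_skill((2.0, 0.0), ACTIONS)  # same reward form, different goal ...
skill_up = make_skill((0.0, 2.0), ACTIONS)     # ... yields a different skill

state = (0.0, 0.0)
for _ in range(2):
    action = skill_right(state)
    state = tuple(s + a for s, a in zip(state, action))
```

The only thing that differs between the two skills is the goal passed to the shared reward function, which is the sense in which a goal-defining variable indicates which reward the agent is maximising.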
Effectance (C1)

Let us consider the case where competence in SDT refers to effectance, i.e., motivation by observing that one’s own actions cause change in the environment. To recognise effectance, an agent needs a mechanism to distinguish whether a change in the environment is caused by the agent itself or some external factor. An example of a computational model fitting effectance is Rewarding Impact-Driven Exploration (RIDE) (Raileanu and Rocktäschel, 2020), as part of which the agent is rewarded for the amount of change it causes in its state observation: the “impact” of the agent’s actions, measured between two consecutive observations. Specifically, the impact-driven reward is defined as:

$$R_t(s_t, s_{t+1}) := \frac{\lVert \phi(s_{t+1}) - \phi(s_t) \rVert_2}{\sqrt{N(s_{t+1})}}, \tag{1}$$

where $\phi(s_t)$ and $\phi(s_{t+1})$ are the learned representations of two consecutive states and $N(s_{t+1})$ represents or approximates the number (count) of times the latter state has been visited (Raileanu and Rocktäschel, 2020, pp. 4–5).

In RIDE, the agent computes its reward using the learned representation of the observation as opposed to the raw sensory space (e.g., pixels). Inspired by Pathak et al. (2017), the parameters of the model $\phi$ are optimised so that an agent learns the features it can control -- that is, it learns to ignore aspects of the environment it cannot affect. Decomposing the reward function, the agent is rewarded for a combination of changes in those state features that it is able to consistently have some effect on (Eq. 1, numerator) and the novelty of the state into which its action leads (Eq. 1, denominator). The effect of the denominator is that the reward obtainable from particular states diminishes the more the agent observes them, even if the agent’s action has the same effective outcome as previously observed. In recent SDT texts, effectance motivation does not require novelty, but the original usage of the term required observations to be novel (White, 1959, p. 322).
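The interplay between the numerator and denominator of Eq. 1 can be illustrated with a small sketch. It assumes fixed, hand-written embeddings in place of a learned $\phi$ and an exact tabular visit counter for $N$; the function name `impact_reward` and the example values are hypothetical and not taken from the RIDE implementation.

```python
import math
from collections import Counter

def impact_reward(phi_s, phi_next, visit_count):
    """Eq. 1: L2 change between consecutive embeddings, divided by sqrt(visit count)."""
    change = math.sqrt(sum((b - a) ** 2 for a, b in zip(phi_s, phi_next)))
    return change / math.sqrt(visit_count)

# Hypothetical embeddings of two consecutive states (stand-ins for a learned phi).
phi_s, phi_next = (0.0, 0.0), (3.0, 4.0)

counts = Counter()  # assumption: exact tabular counts for N(s_{t+1})
rewards = []
for _ in range(4):
    counts[phi_next] += 1  # register the visit to the state just reached
    rewards.append(impact_reward(phi_s, phi_next, counts[phi_next]))
```

In this toy setting the same transition, with the same impact of 5, yields a smaller reward on every repetition, because only the visit count in the denominator changes.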
By having the reward given by RIDE wane the more a particular situation is perceived, RIDE offers a simple formalisation of this original novelty aspect of effectance motivation.

Skill use (C2)

We next consider the case in which competence refers to skill use or the “exercise ... of one’s capacities” (Ryan and Deci, 2017, p. 86). Following Deterding et al. (2024, p. 6), we cover the specific understanding of “exercise” as implementing or bringing something to bear (Merriam-Webster.com, 2025); an alternative interpretation as engaging a skill repeatedly in order to grow it will be treated under Capacity growth (C4). As per Table 1, competence as skill-use motivation can be understood in two different ways: observing an opportunity (an appropriate situation) to use a skill (C2.a) and that one uses a particular capacity in the course of one’s action (C2.b).

An opportunity (an appropriate situation) to use a skill (C2.a)

Broadly, there are two sets of computational approaches used to model an agent learning about opportunities to use particular skills: those in which the agent ignores the current state of the world and those in which it does not. In both cases, hierarchy is used: an agent learns a distribution over its set of policies (i.e., a high-level policy) to determine which skill to use. In the former case, an agent follows the same policy regardless of its state observation -- in this case, an “appropriate” opportunity can be thought of as a learning opportunity, determined not by observing the environment, but from information about how the learning is proceeding. In the latter case, identifying an opportunity to use a particular skill can also draw on observations of the environment.
Specifically, the first case of C2.a, in which an agent is learning about opportunities to use a skill while ignoring the current state of the world, is well exemplified by computational approaches using Learning Progress (LP), which approximates some derivative of performance over time (initially proposed by Oudeyer et al., 2007, p. 7; see also related earlier work by Schmidhuber, 1991, p. 1461). In systems employing LP, it is typically used as a measure to support the selection of intermediate-difficulty goals with respect to the changing skills of an agent over time. LP, as an intrinsic reward for goal selection, has been well studied across both machine and human reinforcement learners (e.g., Colas et al., 2019; Ten et al., 2021; Molinaro et al., 2024), and could thus offer good starting points for formalising C2.a.

While a reward function prioritises the choice of skills, an agent also needs a mechanism to select a skill from that prioritisation. One commonly used mechanism is Thompson sampling (Thompson, 1933), also known as probability matching, where an agent draws a “belief” -- choosing a skill according to a probability distribution (learned from some reward signal such as LP values) over its set of skills -- and then acts greedily with respect to the drawn skill until it has more “evidence” (e.g., a change in LP values) and its “belief” (i.e., skill-selection policy) is updated. Thompson sampling has been shown to explain aspects of human exploration in certain tasks (Gershman, 2018), making it a candidate for a psychologically plausible formalism of human skill selection.

An example of a system formalising the second case of C2.a, in which an agent learns about opportunities to use particular skills while accounting for the current state of the world, is Variational Intrinsic Control (VIC).
In VIC, the appropriate skill for a given starting state is decided using a separate learned model of how empowering the skill has been in the past. We mean empowering in the sense of its definition in the AI literature (footnote 1), as the combined ability of an agent to control its environment and to sense this control (cf., Salge et al., 2014, p. 2; Gregor et al., 2016, p. 2). The decision over skills is conditional on the state of the world, in that the policy for choosing skills is updated based on how empowering the skill turned out to be for that given situation. The distribution over skills is updated during skill use, such that more empowering skills become more likely to be chosen (Gregor et al., 2016, p. 4), using the reward (Gregor et al., 2016, p. 2):

$$R_t(s_0, s_t, g) := \underbrace{\log q_{\boldsymbol{\phi}}(g \mid s_0, s_t)}_{(\alpha)} - \underbrace{\log p_{\boldsymbol{\theta}}(g \mid s_0)}_{(\beta)}. \tag{2}$$

(α) The discriminator, $q_{\boldsymbol{\phi}}$, is an arbitrary variational distribution parametrised by $\boldsymbol{\phi}$ (Barber and Agakov, 2003, p. 2). Given the first ($s_0$) and last ($s_t$) observations induced by the skill corresponding to the goal being pursued, $q_{\boldsymbol{\phi}}$ defines a probability distribution over goals. Successfully discriminating goals in the observation space requires the agent to observe distinct regions of the state space. If a goal is not discriminable based on observations, two or more skills are producing overlapping behaviours (and therefore, the skills lack diversity); conversely, if a goal is discriminable, then the corresponding skill is inducing trajectories unique to that skill. Thus, this reward term is maximised when there is no uncertainty in the prediction.

(β) The probability model, $p_{\boldsymbol{\theta}}(g \mid s_0)$, parametrised by $\boldsymbol{\theta}$, represents the goal-selection policy.
As part of the reward, the purpose of this term is to reinforce high entropy: the agent should keep pursuing a wide range of goals, for each starting state, $s_0$, as the reward is higher when the goal had a low probability of being selected.

Given a starting state, $s_0$, a goal, $g$, is selected according to the distribution defined by the goal-selection policy $p_{\boldsymbol{\theta}}$, which the agent pursues until the termination state $s_t$. Then, $p_{\boldsymbol{\theta}}$ is reinforced based on how empowering the corresponding skill was (estimated using Eq. 2). The final state becomes the new starting state, $s_0 \leftarrow s_t$, and the process is repeated. This can motivate the agent to follow skills that lead to distinct end states (α) and learn a wide range of skills for any given situation (β).

Footnote 1: “Empowerment” has also been studied in the context of human exploration, e.g., as “exploring options that enable the generation of as many more options as possible” (Brändle et al., 2023, p. 1482).

That one uses a particular capacity in the course of one’s action (C2.b)

While C2.a is more about finding suitable opportunities to use particular skills, such as in the case of VIC learning a state-conditional policy over skills, C2.b is more about an agent being motivated by observing that a skill is in use at all. The first term of the VIC reward function (Eq. 2, α) can be adapted to emphasise an agent’s ability to distinguish that a particular skill is in use at a given time, by only conditioning on the current state. The reward for one such system, Diversity Is All You Need (DIAYN) (Eysenbach et al., 2019), can be defined (footnote 2) as:

$$R_t(s_{t+1}, g) := \log q_{\boldsymbol{\phi}}(g \mid s_{t+1}), \tag{3}$$

where $q_{\boldsymbol{\phi}}$ is an arbitrary variational distribution parametrised by $\boldsymbol{\phi}$ (refer back to the text following Eq. 2 for detail). An agent is rewarded for its ability to distinguish the current goal, $g$; thus, the overall return is maximised when the agent can perfectly infer which of its skills is in use at any given time.
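The behaviour of the two discriminator-based rewards (Eqs. 2 and 3) can be checked numerically with a short sketch. The probabilities are passed in directly as hypothetical numbers; in VIC and DIAYN they would come from learned models, which are not reproduced here. The function names are illustrative only.

```python
import math

def vic_reward(q_goal: float, p_goal: float) -> float:
    """Eq. 2: log q(g | s_0, s_t) - log p(g | s_0)."""
    return math.log(q_goal) - math.log(p_goal)

def diayn_reward(q_goal: float) -> float:
    """Eq. 3 as written in the text (constant term omitted): log q(g | s_{t+1})."""
    return math.log(q_goal)

n_skills = 4
uniform = 1.0 / n_skills  # hypothetical uniform goal-selection probability

# Discriminator is certain which goal was pursued: term (alpha) is at its maximum.
confident = vic_reward(q_goal=1.0, p_goal=uniform)

# Discriminator cannot tell the skills apart: it does no better than the prior.
confused = vic_reward(q_goal=uniform, p_goal=uniform)

# The pursued goal had a low selection probability: term (beta) raises the reward.
rare_goal = vic_reward(q_goal=1.0, p_goal=0.05)

diayn_best = diayn_reward(1.0)        # skill perfectly identifiable from the state
diayn_chance = diayn_reward(uniform)  # skill indistinguishable from the others
```

With four skills, the confident case gives log 4, the confused case gives 0, and the rarely selected goal gives a larger reward than the confident case, matching the reading of terms (α) and (β) in the text.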
DIAYN, in further differentiation from VIC, ignores the state of the world in its choice of skill, instead choosing skills uniformly at random -- in other words, learning skills that are empowering on average over the state space. For further examples, one can turn to other variational empowerment methods (see Choi et al., 2021, p. 3). The choice to fix the distribution over skills as uniform -- although not an intuitive choice for a psychologically plausible model of goal selection, except maybe in special cases such as the absence of prior information -- is common in the CB-IM literature. This approach ensures that all skills receive, in expectation, an equal amount of training signal to improve. DIAYN learns a diverse set of skills across a wide array of situations, such as running, walking, hopping, flipping, or gliding to move through a space (Eysenbach et al., 2019, p. 5). It thus shows that a formalism motivated by skill use can drive skill learning.

Footnote 2: See Eysenbach et al., 2019, Appendix A, p. 12 for discussion on a constant term omitted here.

Task performance (C3)

Next, we discuss computational models for two possible ways of understanding task performance: observing that one performs well at an intended task (C3.a) and performing to an extent that requires a certain skill or skill level (C3.b).

Observing that one performs well at an intended task (C3.a)

Given an agent has set itself a goal (task), if an agent has a mechanism to estimate its progress towards a chosen goal, it can use that estimate as a proxy for its own performance on the task. An example system of this kind is Reinforcement Learning With Imagined Goals (RIG) (Nair et al., 2018). RIG encodes the agent’s observations into a lower-dimensional latent space from which the agent samples goals to pursue. Defining the reward as the negative distance between the goal and the current state in latent space encourages the agent to move its observations closer to the goal:

$$R_t(s_{t+1}, g) := -\lVert \phi(s_{t+1}) - \phi(g) \rVert_A, \tag{4}$$

where $\phi(s_{t+1})$ is the learned state representation of $s_{t+1}$, $\phi(g)$ denotes the goal drawn from the latent space, and $A$ is a matrix that can give more importance to some dimensions of the latent space over others (Nair et al., 2018, p. 5).

In contrast to DIAYN, in which the agent learns to master a fixed number of skills, in RIG, the agent selects from an infinite number of goals (the space is continuous). This results in a sort of smoothness of the space that the agent can benefit from: if one goal is sampled near another that the agent has already learned to achieve, it may be able to generalise and use some parts of an existing policy to more efficiently learn to achieve the new goal. In the RIG system, goals take on the intuitive meaning as desired states for the agent to reach (discussed in Background: Computational IM & RL), and “performing well” at an intended task is operationalised as minimising the semantic distance between the agent’s observations and its self-generated goals. Over time, RIG agents learn to generate goals similar to the observations they make of their environment, while simultaneously learning the skills necessary to consistently perform well (reach the goals).

Specifically to an extent that requires a certain skill or skill level (C3.b)

While RIG captures the idea of motivation by performance on a self-selected task (C3.a), it does not take into account the level of skill needed to perform well on the task (C3.b). For this, we can look to other system designs inspired by the idea of preferring to focus on goals or skills at an appropriate level of difficulty, given what the agent already knows.
To identify skills of appropriate difficulty, such systems can make use of Automatic Curriculum Learning (ACL) (for a review see Portelas et al., 2021), enabling an agent to self-organise its goal selection. This can significantly reduce the number of training examples required for an agent to learn to reach its goals (Florensa et al., 2018, p. 1).

For example, in the Continual Universal Reinforcement learning with Intrinsically mOtivated sUbstitutionS (CURIOUS) system by Colas et al. (2019), instead of considering task performance as distance from the goal, the system considers its capability to achieve its goals. Specifically, the agent is motivated to pursue goals in those parts of the sensory space in which it believes itself to be increasingly or decreasingly successful at achieving its goals, estimated by measuring how successful it has been in the recent past -- that is, its LP. The underpinning logic of the system is that such goals will be of optimal difficulty, neither too easy nor too difficult.

In the CURIOUS experiments, an agent manipulates cubes by learning to control a robotic arm (Colas et al., 2019, Supplementary p. 3). Each observation includes information about the gripper and the objects of interest (e.g., their position and rotation), and each goal is a desired observation. The high-dimensional observation space makes learning challenging. Furthermore, some parts of the sensory space are irrelevant for achieving certain goals -- for example, if the goal is to manipulate one of several cubes, the agent might not need information about the other cubes. To ignore such irrelevant information in its formulation of goals, CURIOUS modularises the observation space, which is also the goal space, with each module being a specific, hand-defined subspace of the space.
A CURIOUS agent estimates its own LP for each module, and uses the absolute values of the LP estimates to compute a probability distribution for selecting the module to generate a goal within (Colas et al., 2019, p. 4). This mechanism helps the agent avoid modules containing goals that are too difficult or for which it already has sufficient skill, as well as to return to skills it is forgetting (Colas et al., 2019, pp. 2–3). In this way, the selection of different goals is driven by self-assessment of task performance.

The subjective LP, for a given module, $LP(M_i)$, is defined as the derivative of subjective competence, $C(M_i)$, with respect to time, where $C(M_i)$ is defined as the mean over results from self-evaluated trials performed by the agent (see Colas et al., 2019, p. 4). The results are stored in a queue of preset length, and each result is binary: either 1 for a successful trial or 0 for a failure. This can be viewed as the frequentist estimate of the probability of success. Finally, the probability of choosing a module is computed as:

$$p(M_i) := \varepsilon \times \frac{1}{N} + (1 - \varepsilon) \times \frac{|LP(M_i)|}{\sum_{j=1}^{N} |LP(M_j)|}, \tag{5}$$

in which $\varepsilon \in [0, 1]$ is a hyperparameter that can be used to ensure that even poorly performing modules are occasionally selected, and $N$ corresponds to the total number of modules. As $\varepsilon$ approaches 1, the probability distribution over modules resembles a uniform distribution. When $\varepsilon$ is smaller than 1, the modules with the highest absolute LP will have the highest probability of being selected.

In the same vein, an emerging ACL paradigm, Unsupervised Environment Design (UED) (for a review see Rutherford et al., 2024), formalises the problem of automatically generating tasks for agents based on some criterion, for instance, the difference between the current agent and (in practice, an approximation of) an optimal agent (Rutherford et al., 2024, p. 1). In UED, environments can be thought of as goals for an agent to pursue (each associated with their own reward function).
As part of one such system, Open-endedness via Models of human Notions of Interestingness with Environments Programmed in Code (OMNI-EPIC), agents self-generate novel, interesting, and solvable goals, matched for an agent’s current capabilities (Faldor et al., 2024, p. 2).

While a significant proportion of ACL methods optimise task difficulty or competence progress with respect to current capabilities, ACL can also motivate agents to optimise goal selection based on other criteria that can be interpreted as proxies for “skill level” (from C3.b). Pong et al. (2020) extend RIG (see C3.a) by weighting the selection of previously visited states used in learning the goal-generation model, encouraging the agent to generate goals resembling less frequently visited states. This weighting can be seen as a heuristic for attending to tasks that are expected to be non-trivial for the agent. As a second example, Lintunen et al. (2024a) introduce a method motivating agents to select goals based on how much the agent believes pursuing them will diversify its entire set of skills. As ACL systems often consider the level of skill needed to perform well on the task (where the task could be defined as learning a diverse set of skills), autocurricula methods offer a good starting point for formalising C3.b.

Capacity growth (C4)

Competence, understood as capacity growth, can be interpreted in two different ways: observing gain in the strength (C4.a) and/or range (C4.b) of one’s skills. While we saw that a reward reflecting a derivative of some measure of performance (e.g., LP) can direct the agent towards goals requiring a certain level of skill (C3.b), the same derivative directly encourages capacity growth or improvement (C4.a). While these two facets appear to have different meanings, from a computational perspective, they can be closely aligned.
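To make the shared role of LP in C3.b and C4.a concrete, the following sketch computes subjective competence, an LP estimate, and the module-selection probabilities of Eq. 5 for three hypothetical modules. Two choices are assumptions of this sketch rather than details given in the text: the time derivative of competence is approximated by the difference between the newer and older halves of the result queue, and the distribution falls back to uniform when no module shows any progress. The module names and queue contents are invented for illustration.

```python
import random
from collections import deque

def competence(results):
    """Subjective competence C(M_i): mean of binary trial outcomes."""
    results = list(results)
    return sum(results) / len(results) if results else 0.0

def learning_progress(results):
    """Assumed LP estimate: newer-half competence minus older-half competence."""
    results = list(results)
    half = len(results) // 2
    if half == 0:
        return 0.0
    return competence(results[half:]) - competence(results[:half])

def module_probabilities(lps, epsilon):
    """Eq. 5: epsilon-mixture of a uniform and an |LP|-proportional distribution."""
    n = len(lps)
    total = sum(abs(lp) for lp in lps)
    if total == 0:  # assumption: uniform fallback when no module shows progress
        return [1.0 / n] * n
    return [epsilon / n + (1 - epsilon) * abs(lp) / total for lp in lps]

# Hypothetical result queues of preset length 6 (1 = success, 0 = failure).
queues = {
    "improving": deque([0, 0, 0, 1, 1, 1], maxlen=6),
    "mastered": deque([1, 1, 1, 1, 1, 1], maxlen=6),
    "forgetting": deque([1, 1, 1, 1, 0, 0], maxlen=6),
}
names = list(queues)
lps = [learning_progress(queues[name]) for name in names]
probs = module_probabilities(lps, epsilon=0.1)

# Probability matching: sample the next module in proportion to probs.
chosen = random.Random(0).choices(names, weights=probs, k=1)[0]
```

Because Eq. 5 uses the absolute value of LP, the "forgetting" module (negative LP) is selected far more often than the "mastered" one (zero LP), which retains only the uniform share of ε/N.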
An example of a system that explores both C4.a and C4.b -- both repeatedly engaging in skills and extending the agent’s skill repertoire -- is Intrinsically Motivated Reinforcement Learning (IMRL) described by Barto et al. (2004) and Singh et al. (2004). IMRL extends the agent’s repertoire of skills (C4.b): each time a “salient” event is observed, the agent adds a new skill associated with the salient event (footnote 3). After this first observation, the agent learns how to return to observe the salient event again from any other situation via experience over time -- this is the new skill associated with the salient event. Strictly speaking, the skill repertoire expands as a matter of course, not because the agent is intrinsically rewarded for expanding it. The intrinsic reward used in the IMRL system (Singh et al., 2004, p. 4) can be defined as:

$$R_t(s_t, s_{t+1}, g_{s_{t+1}}) :\propto \begin{cases} 1 - P^{g_{s_{t+1}}}(s_{t+1} \mid s_t) & \text{if } s_{t+1} \text{ is salient} \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

where $P^{g_{s_{t+1}}}$ is what Precup and Sutton (1997) call a multi-time model for skill $g_{s_{t+1}}$. $P^{g_{s_{t+1}}}$ is not quite a probability estimate for the transition from $s_t$ to $s_{t+1}$ given the agent is following skill $g_{s_{t+1}}$, but that is close to the correct intuition. The agent only receives a non-zero intrinsic reward when it reaches a salient state. The intrinsic reward here can be thought of as modelling how surprising it is that the agent reached $s_{t+1}$ immediately from state $s_t$ -- which it has just done -- by following the skill $g_{s_{t+1}}$. If the agent expected, without uncertainty, to transition directly from $s_t$ to $s_{t+1}$, then $P^{g_{s_{t+1}}}$ is 1 and the reward is 0 (the transition is very unsurprising). If the agent did not expect it could transition from $s_t$ to $s_{t+1}$ by following skill $g_{s_{t+1}}$, then $P^{g_{s_{t+1}}}$ is 0 and the reward is 1 (maximally surprising). If the agent is uncertain, $P^{g_{s_{t+1}}}$ will be somewhere in between. However, if the agent expected that it could reach $s_{t+1}$ from $s_t$ by following skill $g_{s_{t+1}}$, just that it would take more than a single step, $P^{g_{s_{t+1}}}$ will be discounted based on the length of the path; this discounting is part of how the multi-time model is constructed.

Footnote 3: Barto et al. (2004) leave to future work what salient means and how to determine whether an event is salient or not. They simply hard-code events like turning on a light or ringing a bell as “salient.”

Another example of C4.b, that is, being motivated by growing the number of skills in the agent’s repertoire, is the aforementioned VIC (see C2.a). The VIC reward function motivates the agent to grow its number of effective skills: in expectation, the second term of the reward function (Eq. 2, β) increases as the distribution over skills becomes more uniform. In theory, this encourages the distribution over skills not to collapse to a small number of skills. In practice, since there are numerous other factors that affect how agents learn, this does not always happen (for empirical results see Eysenbach et al., 2019, Appendix E, pp. 17–18).

Advancing the Theory of Competence in SDT

Why should human motivation researchers care about the computational models we have identified? By identifying distinct ways the different facets of the need for competence in SDT (Table 1) can be formalised, we better understand the ways the theory is underspecified. For instance, LP-based methods (discussed in C2, C3, C4) can exemplify skill use as an opportunity to use a particular skill (C2.a), task performance that requires a certain skill or skill level (C3.b), and capacity growth in the sense of observing gains in the strength of one’s skills (C4.a). Given that SDT does not acknowledge distinct facets, finding such a formalism that fits multiple facets supports the theory. On the other hand, some facets are formally distinct.
For example, it is unclear how effectance (C1), being motivated by extending one’s repertoire of skills (C4.b), or being motivated by observing that one performs well at an intended task (C3.a) relate to each other and other facets. Different interpretations of the theory might lead to drastically different operationalisations of the need for competence. Our research can alleviate this problem by uncovering ways the theory conflates formally distinct concepts, thus prompting review of the theory.

Another reason human motivation researchers should care about this work is that the computational formalisms reveal underlying preconditions that SDT fails to make explicit, such as that a competence-motivated agent is able to recognise when actions cause effects (C1) and differentiate between distinct skills (C2, C3, C4). Similarly, concepts from intrinsically motivated RL, such as novelty, diversity, and progress towards goals, underlie competence-driven learning in computational contexts, yet are often overlooked in SDT. Our research shines light on mechanisms used to computationally fulfil such preconditions, and these could be investigated for further insights. Additionally, our work can support a cycle of theory development by inspiring new computational models formalising aspects of the theory, which can then be tested empirically to refine the theory.

Future Work and Conclusion

For computational IM to be effective in informing theory on motivation, more empirical work is needed in the comparison and evaluation of computational IM. While some work exists comparing behaviour induced by different intrinsic reward functions (e.g., Santucci et al., 2012; Biehl et al., 2018; Linke et al., 2020), as does work studying variables such as the choice of representation (e.g., Burda et al., 2019), more research is needed on the properties of different models, such as how they behave in varying environments and bodies.
Showing that RL formalisms can be aligned with SDT uncovers a wealth of resources for motivation research, such as existing implementations of models, simulators, and evaluation metrics. These resources can enable large-scale simulation studies to explore psychologically plausible computational IMs and how they may explain human data, thus contributing to formal competence-related theory. However, we need work to decide how to effectively compare artificial and biological agents (a project partially undertaken by researchers in representational alignment; e.g., Sucholutsky et al., 2024).

Further research is also required to understand the extent to which computational IM research intersects with intrinsically motivated learning as understood in psychology. Early work on computational IM emerged at the intersection of developmental psychology and robotics, attempting to model intrinsically motivated human behaviours, but a large portion of state-of-the-art intrinsically motivated RL is not necessarily aligned with relevant goals of understanding the human mind or developing safe (human-aligned) AI systems.

As most formalisms we identified relate to the pursuit of goals, we need to better understand how humans represent and generate goals (Colas et al., 2024, p. 1). The topic has not been well studied for various reasons, including computational intractability (Byers et al., 2024, p. 1). Excitingly, goals have recently emerged as a multidisciplinary area of research, cutting across AI, cognitive science, and developmental psychology (e.g., Colas et al., 2022; Molinaro and Collins, 2023; Chu et al., 2024; Davidson et al., 2024). Recent work by Molinaro and Collins (2023, p. 1150) suggests that goals affect many agent dynamics, such as representing the environment, choosing relevant actions, and evaluating rewards.

In summary, we build on recent research arguing that the need for competence is at present poorly specified in SDT.
We do so by identifying computational formalisms to inspire empirical work to resolve this issue. We show that the four distinct facets of competence (effectance, skill use, task performance, and capacity growth) can be aligned with existing formalisms from intrinsically motivated RL, highlighting the potential of computational models to provide deeper insights into competence-motivated behaviours. While this is a promising first step, further research is required to empirically study these computational models, inviting collaboration from motivation researchers across disciplines.

Acknowledgments

We thank the members of the Intrinsically Motivated Open-ended Learning (IMOL) community for engaging in the work and for providing feedback in Paris (2023) and Vancouver (2024). We also thank the members of the Autotelic Interaction Research (AIR) group for feedback throughout the project. EML and CG received financial support from the Research Council of Finland (NEXT-IM, grant 349036) and NMA from the Helsinki Institute for Information Technology.

References

Ady, N. M., Shariff, R., Günther, J., and Pilarski, P. M. (2022). Five properties of specific curiosity you didn’t know curious machines should have. arXiv preprint arXiv:2212.00187.
Aubret, A., Matignon, L., and Hassas, S. (2023). An information-theoretic perspective on intrinsic motivation in reinforcement learning: A survey. Entropy, 25(2).
Baldassarre, G. (2022). Intrinsic motivations for open-ended learning. In Cognitive Robotics, pages 251–270. The MIT Press.
Barber, D. and Agakov, F. (2003). The IM algorithm: a variational approach to information maximization. In Advances in Neural Information Processing Systems, pages 201–208.
Barto, A. G., Singh, S., and Chentanez, N. (2004). Intrinsically motivated learning of hierarchical collections of skills. In International Conference on Development and Learning, pages 112–120.
Berlyne, D. (1965). Structure and Direction in Thinking. John Wiley & Sons.
Biehl, M., Guckelsberger, C., Salge, C., Smith, S. C., and Polani, D. (2018). Expanding the active inference landscape: More intrinsic motivations in the perception-action loop. Frontiers in Neurorobotics, 12.
Brändle, F., Stocks, L. J., Tenenbaum, J. B., Gershman, S. J., and Schulz, E. (2023). Empowerment contributes to exploration behaviour in a creative video game. Nature Human Behaviour, 7(9):1481–1489.
Burda, Y., Edwards, H., Pathak, D., Storkey, A., Darrell, T., and Efros, A. A. (2019). Large-scale study of curiosity-driven learning. In International Conference on Learning Representations.
Byers, J. B., Zhao, B., and Niv, Y. (2024). Modeling goal selection with program synthesis. In Intrinsically Motivated Open-ended Learning workshop at NeurIPS 2024.
Choi, J., Sharma, A., Lee, H., Levine, S., and Gu, S. S. (2021). Variational empowerment as representation learning for goal-conditioned reinforcement learning. In International Conference on Machine Learning, pages 1953–1963.
Chu, J., Tenenbaum, J. B., and Schulz, L. E. (2024). In praise of folly: flexible goals and human cognition. Trends in Cognitive Sciences, 28(7):628–642.
Colas, C., Chu, J., Molinaro, G., and Hawkins, R. (2024). What should I do now? Goal-centric outlooks on learning, exploration, and communication. In Proceedings of the Annual Meeting of the Cognitive Science Society.
Colas, C., Fournier, P., Chetouani, M., Sigaud, O., and Oudeyer, P.-Y. (2019). CURIOUS: Intrinsically motivated modular multi-goal reinforcement learning. In International Conference on Machine Learning, pages 1331–1340.
Colas, C., Karch, T., Sigaud, O., and Oudeyer, P.-Y. (2022). Autotelic agents with intrinsically motivated goal-conditioned reinforcement learning: a short survey. Journal of Artificial Intelligence Research, 74:1159–1199.
Davidson, G., Todd, G., Togelius, J., Gureckis, T. M., and Lake, B. M. (2024). Goals as reward-producing programs. arXiv preprint arXiv:2405.13242.
Dayan, P. and Balleine, B. W. (2002). Reward, motivation, and reinforcement learning. Neuron, 36(2):285–298.
Deci, E. L. and Ryan, R. M. (1985). Intrinsic Motivation and Self-Determination in Human Behavior. Springer US.
Deterding, S., Lintunen, E. M., Guckelsberger, C., and Ady, N. M. (2024). Why self-determination theory needs formal modelling: The case of competence and balanced challenge. PsyArXiv preprint 10.31234/osf.io/n6x8s.
Eysenbach, B., Gupta, A., Ibarz, J., and Levine, S. (2019). Diversity is all you need: Learning skills without a reward function. In International Conference on Learning Representations.
Faldor, M., Zhang, J., Cully, A., and Clune, J. (2024). OMNI-EPIC: Open-endedness via models of human notions of interestingness with environments programmed in code. arXiv preprint arXiv:2405.15568.
Florensa, C., Held, D., Geng, X., and Abbeel, P. (2018). Automatic goal generation for reinforcement learning agents. In International Conference on Machine Learning, pages 1515–1528.
Gershman, S. J. (2018). Deconstructing the human algorithms for exploration. Cognition, 173:34–42.
Gregor, K., Rezende, D. J., and Wierstra, D. (2016). Variational intrinsic control. arXiv preprint arXiv:1611.07507.
Kaelbling, L. P. (1993). Learning to achieve goals. In International Joint Conference on Artificial Intelligence, pages 1094–1098.
Kakade, S. and Dayan, P. (2002). Dopamine: generalization and bonuses. Neural Networks, 15(4):549–559.
Lidayan, A., Dennis, M., and Russell, S. (2024). BAMDP shaping: a unified theoretical framework for intrinsic motivation and reward shaping. arXiv preprint arXiv:2409.05358.
Linke, C., Ady, N. M., White, M., Degris, T., and White, A. (2020). Adapting behavior via intrinsic reward: A survey and empirical study. Journal of Artificial Intelligence Research, 69:1287–1332.
Lintunen, E. M., Ady, N. M., and Guckelsberger, C. (2024a). Diversity progress for goal selection in discriminability-motivated RL. In Intrinsically Motivated Open-ended Learning workshop at NeurIPS 2024.
Lintunen, E. M., Ady, N. M., Guckelsberger, C., and Deterding, S. (2024b). Advancing self-determination theory with computational intrinsic motivation: The case of competence. PsyArXiv preprint 10.31234/osf.io/7qy35.
Liu, M., Zhu, M., and Zhang, W. (2022). Goal-conditioned reinforcement learning: Problems and solutions. In International Joint Conference on Artificial Intelligence, pages 5502–5511.
Marsella, S., Gratch, J., and Petta, P. (2010). Computational models of emotion. In A Blueprint for Affective Computing: A sourcebook and manual. Oxford University Press.
Merriam-Webster.com (2025). Retrieved from https://www.merriam-webster.com.
Molinaro, G., Colas, C., Oudeyer, P.-Y., and Collins, A. (2024). Latent learning progress drives autonomous goal selection in human reinforcement learning. In Advances in Neural Information Processing Systems.
Molinaro, G. and Collins, A. G. (2023). A goal-centric outlook on learning. Trends in Cognitive Sciences, 27(12):1150–1164.
Murayama, K. (2022). A reward-learning framework of knowledge acquisition: An integrated account of curiosity, interest, and intrinsic–extrinsic rewards. Psychological Review, 129(1):175–198.
Nair, A. V., Pong, V., Dalal, M., Bahl, S., Lin, S., and Levine, S. (2018). Visual reinforcement learning with imagined goals. In Advances in Neural Information Processing Systems, pages 9191–9200.
Oberauer, K. and Lewandowsky, S. (2019). Addressing the theory crisis in psychology. Psychonomic Bulletin & Review, 26(5):1596–1618.
Oudeyer, P.-Y. and Kaplan, F. (2007). What is intrinsic motivation? A typology of computational approaches. Frontiers in Neurorobotics, 1:1–14. Article 6.
Oudeyer, P.-Y. and Kaplan, F. (2008). How can we define intrinsic motivation? In International Conference on Epigenetic Robotics.
Oudeyer, P.-Y., Kaplan, F., and Hafner, V. V. (2007). Intrinsic motivation systems for autonomous mental development. IEEE Transactions on Evolutionary Computation, 11(2):265–286.
Pathak, D., Agrawal, P., Efros, A. A., and Darrell, T. (2017). Curiosity-driven exploration by self-supervised prediction. In International Conference on Machine Learning, pages 2778–2787.
Pong, V., Dalal, M., Lin, S., Nair, A., Bahl, S., and Levine, S. (2020). Skew-fit: State-covering self-supervised reinforcement learning. In International Conference on Machine Learning, pages 7783–7792.
Portelas, R., Colas, C., Weng, L., Hofmann, K., and Oudeyer, P.-Y. (2021). Automatic curriculum learning for deep RL: a short survey. In International Joint Conference on Artificial Intelligence, pages 4819–4825.
Precup, D. and Sutton, R. S. (1997). Multi-time models for temporally abstract planning. In Advances in Neural Information Processing Systems, pages 1050–1056.
Raileanu, R. and Rocktäschel, T. (2020). RIDE: Rewarding impact-driven exploration for procedurally-generated environments. In International Conference on Learning Representations.
Robinaugh, D. J., Haslbeck, J. M. B., Ryan, O., Fried, E. I., and Waldorp, L. J. (2021). Invisible Hands and Fine Calipers: A Call to Use Formal Theory as a Toolkit for Theory Construction. Perspectives on Psychological Science, 16(4):725–743.
Rutherford, A., Beukman, M., Willi, T., Lacerda, B., Hawes, N., and Foerster, J. N. (2024). No regrets: Investigating and improving regret approximations for curriculum discovery. In Advances in Neural Information Processing Systems.
Ryan, R. M. (2023). The Oxford Handbook of Self-Determination Theory. Oxford University Press.
Ryan, R. M. and Deci, E. L. (2017). Self-determination theory: Basic psychological needs in motivation, development, and wellness. Guilford.
Salge, C., Glackin, C., and Polani, D. (2014). Empowerment–an introduction. In Guided Self-Organization: Inception, pages 67–114. Springer Berlin Heidelberg.
Santucci, V. G., Baldassarre, G., and Mirolli, M. (2012). Intrinsic motivation mechanisms for competence acquisition. In International Conference on Development and Learning and Epigenetic Robotics, pages 1–6.
Schaul, T., Horgan, D., Gregor, K., and Silver, D. (2015). Universal value function approximators. In International Conference on Machine Learning, pages 1312–1320.
Schmidhuber, J. (1991). Curious model-building control systems. In IEEE International Joint Conference on Neural Networks, pages 1458–1463.
Silver, D., Singh, S., Precup, D., and Sutton, R. S. (2021). Reward is enough. Artificial Intelligence, 299.
Singh, S., Barto, A., and Chentanez, N. (2004). Intrinsically motivated reinforcement learning. In Advances in Neural Information Processing Systems, pages 1281–1288.
Sucholutsky, I., Muttenthaler, L., Weller, A., Peng, A., Bobu, A., Kim, B., Love, B. C., Cueva, C. J., Grant, E., Groen, I., Achterberg, J., Tenenbaum, J. B., Collins, K. M., Hermann, K. L., Oktar, K., Greff, K., Hebart, M. N., Cloos, N., Kriegeskorte, N., Jacoby, N., Zhang, Q., Marjieh, R., Geirhos, R., Chen, S., Kornblith, S., Rane, S., Konkle, T., O’Connell, T. P., Unterthiner, T., Lampinen, A. K., Müller, K.-R., Toneva, M., and Griffiths, T. L. (2024). Getting aligned on representational alignment. arXiv preprint arXiv:2310.13018.
Sutton, R. S. and Barto, A. G. (2018). Reinforcement Learning: An Introduction. The MIT Press, second edition.
Ten, A., Kaushik, P., Oudeyer, P.-Y., and Gottlieb, J. (2021). Humans monitor learning progress in curiosity-driven exploration. Nature Communications, 12(1):5972.
Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3–4):285–294.
White, R. W. (1959). Motivation reconsidered: The concept of competence. Psychological Review, 66(5):297–333.