The Influence of Prior Discourse on Conversational Agent-Driven
Decision-Making
STEPHEN PILLI, University College Dublin, Ireland
VIVEK NALLUR, University College Dublin, Ireland
Persuasion through conversation has been the focus of much research. Nudging is a popular strategy to influence decision-making in
physical and digital settings. However, conversational agents employing “nudging” have not received significant attention. We explore
the manifestation of cognitive biases—the underlying psychological mechanisms of nudging—and investigate how the complexity of
prior dialogue tasks impacts decision-making facilitated by conversational agents. Our research used a between-group experimental
design, involving 756 participants randomly assigned to either a simple or complex task before encountering a decision-making
scenario. Three scenarios were adapted from Samuelson's classic experiments on status-quo bias, the underlying mechanism of default
nudges. Our results aligned with previous studies in two out of three simple-task scenarios. Increasing task complexity consistently
shifted effect-sizes toward our hypothesis, though bias was significant in only one case. These findings inform conversational nudging
strategies and highlight inherent biases relevant to behavioural economics.
ACM Reference Format:
Stephen Pilli and Vivek Nallur. 2018. The Influence of Prior Discourse on Conversational Agent-Driven Decision-Making. In Proceedings
of Make sure to enter the correct conference title from your rights confirmation email (Conference acronym ’XX). ACM, New York, NY,
USA, 20 pages. https://doi.org/XXXXXXX.XXXXXXX
1 Introduction
To say that computers permeate every aspect of our modern life is now trite. Digital interfaces mediate interactions with
loved ones, transport, medical checkups, entertainment, and sometimes even food and sexual choices. The function and
form of these interfaces influence not only our view of the world, but also how we influence the world. Our decisions
on which action to take, which option to ignore, and what aspect of a problem to pay attention to, are all affected
by the contextual elements within which the decision scenario appears. When these contextual elements are created,
enhanced, elided, magnified, or deleted by digital interfaces, it would be reasonable to say (we hope) that the digital
interface is critically affecting the decision itself.
In its simplest form, decision-making requires a decision scenario and a set of alternatives to choose from. The
presentation of alternatives invariably influences the decision-maker, and therefore even a random or ‘unthoughtful’
Authors’ Contact Information: Stephen Pilli, stephen.pilli@ucdconnect.ie, University College Dublin, Dublin, Ireland; Vivek Nallur, vivek.nallur@ucd.ie,
University College Dublin, Dublin, Ireland.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components
of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on
servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
©2018 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Manuscript submitted to ACM
arXiv:2503.04692v1 [cs.HC] 6 Mar 2025
presentation creates an impact (whether intended or not). This has led to terms like ‘choice architecture’ being used to
characterise the presentation of alternatives in decision scenarios [29]. An etiology of decision-making reveals that
the alternatives we pay attention to are impacted by the cognitive biases that all human beings have [31]. These have
often been called shortcuts in decision-making [16]. At this juncture, it is important to note that these shortcuts are not
necessarily good or bad in the traditional sense of evaluating decisions. Rather, they pertain to the speed with which
certain pathways in the brain get activated in particular decision-making contexts. These shortcuts arise from human
experiences, emotions, beliefs, and preferences, with different kinds of decisions being affected by different biases.
Other common factors include cognitive limitations (such as memory), information limitations, emotions, feelings, and
social norms [11]. Over a hundred cognitive biases have been catalogued over the years in the field of behavioural
economics [5]. For example, status quo bias is the preference for maintaining one's current or previous state of
affairs, or a preference to not undertake any action to change this current or previous state [25]. An instance of where
such a bias has been used in the real world is the mobile payment app Square, where customers are required to actively
choose a “no tipping” option if they decide not to leave a tip. Such a choice architecture has resulted in increased
tipping, particularly in situations where tipping has traditionally been minimal or uncommon [4]. Such an arrangement
of alternatives exploits individuals’ resistance to change, thereby leading them to stick with the default suggested
tip amount.
A particular form of digital interface that has seen a rapid rise in popularity is the AI-enhanced conversational
agent, also called a chatbot. Chatbots have seen adoption in domains as diverse as financial decision-making,
career counselling, and psychometric testing [10] [7] [19]. Fluency in language generation, using LLMs, has significantly
advanced the acceptance of chatbots as a viable digital interface. In conversational choice environments like task-
oriented conversational agents ( i.e., a chatbot built for a specific task), decision-making occurs in the context of a
dialogue. A user is cognitively involved in making a decision and depends on the chatbot, both for informational
context as well as articulation of alternatives. In these contexts, it is trivial for the chatbot to use particular kinds
of utterances to exploit the likelihood of a cognitive bias. We use ‘likelihood’ advisedly, since all human beings are
not affected by the same set of biases. Nor are they activated in the same situations. Again, we wish to reiterate that
we make no value-judgement on these interactions; the chatbot could very well use utterances that diminish the
likelihood of the bias, or use the bias to guide the human towards a good decision. An utterance, or a sequence of
utterances, could use multiple factors such as emotional valence or dialogue complexity to activate (or diminish) the bias.
Based on this, we pose the following research question.
How does cognitive load, induced by the task complexity, influence susceptibility to cognitive biases in
decision-making scenarios, presented by a task-oriented chatbot?
To address this research question, we create a prototype of a task-oriented chatbot that can engage with individuals
on a preference elicitation task. The chatbot is designed to present decision scenarios that are adapted from Samuelson’s
classic experiments with status-quo bias [ 25]. First, to establish a baseline, we attempt to replicate status-quo bias
using a chatbot. A simple preference elicitation task is employed for the prior dialogues. Subsequently, in our second
set of experiments we increase the task complexity of the prior dialogue and observe how it influences the status-
quo bias against the baseline. We repeat the experiments with three distinct decision scenarios: budget allocation,
investment decision-making, and college job selection. These scenarios have been previously replicated by Xiao et
al. [33], establishing their suitability as test cases. Through these experiments, we establish a replicable relation between
dialogue task complexity of a task-oriented chatbot and the manifestation of status-quo bias in subsequent decision
scenarios.
The rest of this paper is structured as follows: the Background section introduces key concepts such as decision-
making in conversational choice environments, status-quo bias, and cognitive load. The Related Work section reviews
previous research and experiments at the intersection of cognitive biases and conversational agents, highlighting gaps in
existing studies. The Hypothesis Development section presents hypotheses, focusing on the influence of task complexity
on decision-making. The Methodology section explains the experimental design, including chatbot implementation,
decision scenarios, and cognitive load measurement. It also describes data collection and quality control procedures.
The Results section presents findings from the experiments, showing how task complexity affects cognitive load and
decision-making. The Conclusion and Future Work section summarises key insights and suggests areas for further
research, such as refining chatbot designs, testing different biases, and improving experimental methods.
2 Background
We begin by introducing the Conversational Choice Environment, where task-oriented dialogue systems facilitate
decision-making through structured interactions. Next, we examine Status Quo Bias, a cognitive bias that leads individuals to
favour existing states over change, even when better alternatives are available. We discuss foundational experiments
demonstrating the presence of this bias and its implications in various decision-making contexts. Following this, we
introduce Cognitive Load, which refers to the mental effort required to process information. We explore its impact on
task performance and decision-making, particularly in chatbot interactions. We review studies on how cognitive load
can be manipulated through dialogue design and how it affects user experience.
2.1 Conversational Choice Environment
Task-oriented dialogue systems are designed to have algorithmic yet natural language interaction with humans.
Primarily, the dialogue is designed to accomplish a defined set of goals or tasks, with various factors contributing to its
success. A plethora of studies suggest that aspects of a chatbot dialogue, such as anthropomorphism, personality, trust,
and many other factors, play a key role in improving human-like dialogue [6].
A Conversational Choice Environment (CCE) is a specific instance within a dialogue, facilitated by conversational
decision-maker interface technologies (such as chatbots, personal assistants, and virtual assistants), that requires people to make
judgements or decisions. A CCE primarily consists of two parts: the first is the decision scenario¹, and the second is
the dialogue that leads up to the decision scenario, which we call the decision context². A decision scenario comprises
a real-world or hypothetical scenario with a finite list of alternatives. Decision scenarios occur naturally in any
task-oriented dialogue, such as flight booking, restaurant selection, or trip planning. Task-oriented
dialogue is traditionally implemented by task-oriented dialogue systems (TODs). In recent times, TODs have evolved with
advancements like ANYTOD [36] and TOD-BERT [32], enabling zero-shot learning and improved dialogue management.
Large Language Models (LLMs) offer greater flexibility, as seen in SGP-TOD, achieving state-of-the-art performance [35].
The second part of a Conversational Choice Environment is the decision context. The decision context can be defined as the
dialogue that leads to the decision scenario. Alternatively, it is the dialogue or the discourse prior to the decision
¹ The words “decision scenario” and “choice problem” are used interchangeably in literature and in this paper.
² The words “decision context”, “prior dialogue”, and “prior discourse” are also used interchangeably.
scenario. A personality test for creating an e-portfolio, or a scenario-based question before suggesting an investment
plan, are examples of a decision context.
A formal representation of a dialogue with decision context and choice problem is given as follows:
\[
\mathcal{D} = \{\, u^{sys}_{1},\ u^{usr}_{2},\ u^{sys}_{3},\ \ldots,\ \underbrace{u^{sys}_{t-k},\ \ldots,\ u^{usr}_{t-1}}_{\text{Decision context}},\ \underbrace{u^{sys}_{t},\ u^{usr}_{t+1}}_{\text{Decision scenario and response}},\ \ldots \,\}
\]
A dialogue $\mathcal{D}$ consists of utterances of the dialogue system (sys) and the user (usr). The utterances between $t-k$ and $t$ act
as the decision context to the decision scenario at $t$. The length of the decision context is $k-1$.
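As a rough illustration of this representation, the sketch below splits a dialogue into decision context, decision scenario, and response. The `Utterance` class, the `split_dialogue` function, and the toy utterances are hypothetical and are not taken from the chatbot implementation. The slice treats utterances $t-k$ through $t-1$ as the context, which is only one possible reading of the boundaries above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Utterance:
    """One turn of the dialogue D (hypothetical container)."""
    index: int    # 1-based position in the dialogue
    speaker: str  # "sys" or "usr"
    text: str


def split_dialogue(
    dialogue: List[Utterance], t: int, k: int
) -> Tuple[List[Utterance], Utterance, Utterance]:
    """Return (decision context, decision scenario, user response).

    Assumption: the context is taken as utterances t-k .. t-1, the scenario
    is the system utterance at t, and the response is the user utterance at t+1.
    """
    if k < 1 or t - k < 1 or t + 1 > len(dialogue):
        raise ValueError("t and k must select utterances inside the dialogue")
    context = dialogue[t - k - 1 : t - 1]  # list is 0-based, indices are 1-based
    scenario = dialogue[t - 1]
    response = dialogue[t]
    return context, scenario, response


# Toy dialogue: system and user alternate, starting with the system.
toy = [
    Utterance(i, "sys" if i % 2 == 1 else "usr", f"utterance {i}")
    for i in range(1, 9)
]
context, scenario, response = split_dialogue(toy, t=7, k=4)
```

Keeping the context as a separate list makes it straightforward to swap a simple prior dialogue for a complex one while leaving the scenario utterance unchanged.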
2.2 Status Quo Bias
Status quo bias is present in decision-making when individuals favour maintaining current conditions
over change, regardless of rationality. For example, in a restaurant recommendation scenario, users may prefer the
chatbot’s default suggestion over exploring other options. Rational choice theory posits that decisions are made to
maximise utility; however, status-quo bias contradicts this, as individuals opt for familiarity despite potential gains [ 18].
In decisions under uncertainty, status-quo bias manifests as a preference for familiar outcomes, even if alternative
options promise superior outcomes [ 30]. Similarly, in decisions under certainty, individuals exhibit a reluctance to
deviate from existing circumstances, even when objectively advantageous alternatives exist. Thus, status-quo bias
impedes rational decision-making by anchoring individuals to current states, hindering exploration of potentially
beneficial alternatives.
Over the last few decades, various experiments have shown strong evidence for status-quo bias. One of the foundational
experiments for status-quo bias was conducted by Samuelson and Zeckhauser [25]. They performed controlled
experiments using a questionnaire containing a series of eight choice problems, which they called decision scenarios. The
participants in the experiments were students enrolled in economics classes at Boston University School of Management
and at the Kennedy School of Government at Harvard University. A total of 486 students took part in the study. The
results of these experiments showed pronounced status-quo bias. In a recent experiment, Xiao et al. [33] replicated
four decision scenarios from the Samuelson and Zeckhauser [25] study. They found strong empirical support for status
quo bias in three decision scenarios: budget allocation, investment portfolios, and college jobs.
2.3 Cognitive Load
Cognitive load refers to the amount of mental effort required to process information [ 27]. Mental effort represents the
portion of cognitive capacity that is actively utilised to meet the demands of a task. Consequently, it can be seen as an
indicator of the actual cognitive load experienced [20].
Cognitive load is commonly divided into three categories: intrinsic, extraneous, and germane [20]. Intrinsic cognitive load arises from
the inherent complexity of the information. This complexity is determined by the number of element interactions that
must be understood simultaneously. Element interactions involve how different pieces of information relate to each
other within a task [28]. The cognitive load imposed by a piece of information can therefore be controlled through the number of
elements and the interactions between them. When element interactions are numerous and complex, intrinsic cognitive load
increases. For instance, understanding a complex mathematical equation involves multiple steps and relationships,
leading to high cognitive load. Conversely, simple tasks with few element interactions generate low intrinsic cognitive
load.
Cognitive load is inversely related to task performance. Brachten et al. [2] demonstrated that chatbots can reduce the
cognitive load needed to complete various tasks. One effective strategy to minimise cognitive load is to avoid presenting
long responses or requiring users to provide complex inputs. Similarly, Schmidhuber et al. [26] explored how
the use of a chatbot affects users’ mental effort when interacting with a new software product, and to what
extent it affects users’ productivity. The results showed that chatbot users experienced less
cognitive load. Fadhil [12] suggested that interactions should be concise and direct, utilising short texts or images
for responses and restricted input options, such as buttons, where feasible. This approach simplifies user interactions,
making it easier to process information and complete tasks efficiently. By implementing these methods, chatbots can
enhance user experience and improve task performance by reducing the cognitive demands placed on users. Therefore,
the cognitive load induced by a dialogue depends on the dialogue design: increasing the number of elements and the
interactions between them increases mental effort, which in turn leads to higher cognitive load.
In experimental settings, cognitive load is measured in various ways. Common approaches include physiological,
behavioural, performance, self-report, and subjective measures. One common way to measure cognitive load
is to record individuals’ responses to the NASA-TLX questionnaire at the end of the task. NASA’s Task Load Index (TLX),
initially developed for aerospace applications, is a prominent subjective measure within the Cognitive Load
Theory (CLT) framework. Despite its origins, the NASA-TLX remains one of the most widely recognised and established
cognitive load measures.
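To make the scoring concrete, the following is a minimal sketch of the unweighted ("raw") NASA-TLX score, assuming the six standard subscales are each rated from 0 to 100. The function name, dictionary keys, and sample ratings are illustrative only. The weighted variant based on pairwise comparisons is not shown, and this passage does not specify which variant a given study uses.

```python
from statistics import mean
from typing import Mapping

# The six NASA-TLX subscales; the key spellings are an assumption of this sketch.
TLX_SUBSCALES = (
    "mental_demand",
    "physical_demand",
    "temporal_demand",
    "performance",
    "effort",
    "frustration",
)


def raw_tlx(ratings: Mapping[str, float]) -> float:
    """Unweighted mean of the six subscale ratings (each assumed to be 0-100)."""
    missing = [name for name in TLX_SUBSCALES if name not in ratings]
    if missing:
        raise ValueError(f"missing subscale ratings: {missing}")
    for name in TLX_SUBSCALES:
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} must lie between 0 and 100")
    return float(mean(ratings[name] for name in TLX_SUBSCALES))


# Made-up ratings for one participant (not study data).
example_ratings = {
    "mental_demand": 70,
    "physical_demand": 10,
    "temporal_demand": 40,
    "performance": 30,
    "effort": 60,
    "frustration": 30,
}
example_score = raw_tlx(example_ratings)
```

Comparing such scores between a simple and a complex prior task is one way to check whether a task-complexity manipulation changed the reported load.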
3 Related Work
The study of cognitive biases in human decision-making has been a central focus in psychology, behavioural economics,
and human-computer interaction. With the increasing prevalence of conversational agents, researchers have explored
their potential in both leveraging and mitigating cognitive biases [3]. This section reviews previous work that has
examined how conversational agents interact with individuals’ cognitive biases, targeting behavioural influence,
nudging, or decision-making interventions.
An early contribution by Ali et al. [1] examined the susceptibility of children to nudging strategies employed by
conversational agents and robots. Their study incorporated a modified version of the Dictator Game to investigate
whether children’s generosity could be influenced by different interlocutors. The findings demonstrated that children
were more likely to be influenced by conversational agents and robots than by human interlocutors, suggesting the
presence of authority bias and social influence bias. These results provided early empirical evidence that chatbots and
social robots could effectively exploit cognitive biases in decision-making processes, particularly among vulnerable
populations.
Pilli et al. [23] investigated how conversational agents can serve as tools to measure cognitive biases, specifically
focusing on framing in language and loss aversion. The study employed chatbots to engage participants in structured
decision-making tasks and found that both framing effects and loss aversion, the tendency to prefer avoiding losses
over acquiring equivalent gains, were observed in user responses, demonstrating that conversational agents could
play a crucial role in understanding individual susceptibility to biases.
Dubiel et al. [ 9] explored the impact of voice fidelity in synthetic speech on decision-making, focusing on how auditory
cues could exploit cognitive biases and heuristics. Their study investigated whether variations in voice characteristics,
such as speech pace and pitch, could alter participants’ choices. The results confirmed that higher-fidelity voices could
influence decisions, highlighting the role of source credibility bias and affect heuristic, a mental shortcut in which
people rely on their emotions to make quick decisions rather than engaging in detailed analysis. Participants were more
likely to perceive and trust high-fidelity synthetic voices, raising concerns about how conversational agents could be
designed to subtly nudge user decisions. This research underscored the importance of voice modulation as a persuasive
tool in chatbot interactions, expanding the understanding of how non-verbal elements contribute to cognitive biases.
Building upon the work of Ali et al. [1], Kalashnikova et al. [17] examined linguistic nudges and their effectiveness
in promoting ecological behaviour change. Their study categorised nudges into those based on reflection and those
based on emotions, leveraging biases such as the status quo bias, regret aversion, and social conformity. The research
found that chatbots and robots were more effective than human interlocutors in influencing participants’ opinions
on environmental issues. These findings reinforced previous studies by demonstrating that cognitive biases could be
systematically leveraged in chatbot design to encourage behavioural shifts in various domains.
Most recently, Yamamoto [ 34] introduced the concept of “suggestive ending” in chatbot responses. Their study
built on the Ovsiankina effect, a cognitive bias where individuals feel compelled to complete unfinished tasks. By
designing chatbots that leave responses open-ended or with subtle prompts, users were encouraged to engage in deeper
decision-making and information-seeking behaviour. Their study found that users interacting with suggestive chatbots
asked more follow-up questions and engaged in prolonged decision-making processes, indicating increased cognitive
involvement. This work demonstrated that chatbots could not only influence decision outcomes but also shape the
depth of user engagement with a given topic, further expanding the role of conversational agents in cognitive bias
manipulation.
Ji et al. [15], in their latest study, explored cognitive biases in spoken conversational search (SCS), emphasising the
absence of visual cues and the complexity of unstructured dialogue. Their study investigates biases such as confirmation bias,
anchoring bias, and exposure effect in SCS interactions. While their work highlights the challenges of detecting cognitive
biases in voice-based searches and proposes a multimodal experimental framework, it remains largely theoretical.
Nonetheless, it provides a foundation for future research into mitigating biases in conversational AI systems.
These studies have explored various aspects of conversational agents, such as linguistic features, the affect caused
by voice modulation, and the length of the dialogue, and their role in influencing decision-making. However,
because these studies focus on individual decision points or specific nudging strategies, they do not account for how previous
interactions, or conversational context, influence subsequent decision-making. Conversational agents engage
users in multi-turn interactions, where the complexity of the dialogue itself imposes cognitive load that influences
later decisions. In our work, we investigate how the cognitive load induced by a conversational task affects subsequent
decision-making processes. By examining this dimension, we aim to contribute to a deeper understanding of how
conversational agents can influence cognitive biases and nudging strategies.
4 Hypothesis Development
The objective of this study is to examine how the cognitive load induced by the task complexity of a task-oriented
chatbot’s dialogue influences individuals’ susceptibility to cognitive biases in subsequent decision-making scenarios.
First, we investigate whether decision scenarios presented through a chatbot can replicate the findings of previous
studies on status quo bias. Establishing this baseline is crucial to validating the chatbot’s effectiveness in eliciting
behavioural patterns similar to those observed in traditional experiments. To achieve this, we design a simple task as the decision
context and subsequently introduce decision scenarios that are adapted from classic status-quo bias studies.
As an initial step, we hypothesise the following, which is adapted from the original hypothesis proposed in Samuelson
and Zeckhauser (1988) and its subsequent replications:
H1: Participants presented with an alternative framed as the status quo by a chatbot will be more likely to choose
that alternative than if it were framed as the non-status-quo alternative.
Next, we examine how increasing task complexity influences individuals’ susceptibility to status quo bias in sub-
sequent decision-making scenarios. Specifically, we replace the simple task used in the baseline experiment with a
more complex task, increasing cognitive load before the decision scenario. By doing so, we aim to explore whether
heightened cognitive effort makes individuals more reliant on heuristic-driven decision-making, thereby amplifying the
status quo effect. This leads to our second hypothesis:
H2: Participants who undergo a complex task before the decision scenario will exhibit a stronger status quo effect
compared to those who undergo a simple task.
By formulating these hypotheses, we aim to systematically investigate the relationship between task complexity in
chatbot interactions and status quo bias in decision-making. Establishing a baseline through H1 allows us to confirm
whether chatbot-based decision scenarios align with prior studies on status quo bias. Expanding on this, H2 enables us
to explore how increased cognitive load influences individuals’ susceptibility to this bias.
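One way to quantify the comparisons implied by H1 and H2 is through choice proportions and an effect size for their difference. The sketch below uses Cohen's h for this purpose. The counts are made-up placeholders rather than study data, and both the choice of Cohen's h and the function names are assumptions made for illustration.

```python
import math


def choice_share(chosen: int, total: int) -> float:
    """Proportion of participants in a condition who picked a given alternative."""
    if total <= 0 or not 0 <= chosen <= total:
        raise ValueError("need 0 <= chosen <= total and total > 0")
    return chosen / total


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def status_quo_effect(
    chosen_as_sq: int, n_sq: int, chosen_as_non_sq: int, n_non_sq: int
) -> float:
    """H1-style comparison for one alternative.

    Compares how often the alternative is chosen when it is framed as the
    status quo with how often it is chosen when the other alternative is.
    """
    return cohens_h(
        choice_share(chosen_as_sq, n_sq),
        choice_share(chosen_as_non_sq, n_non_sq),
    )


# Placeholder counts (NOT study data): 40 participants per condition.
simple_effect = status_quo_effect(28, 40, 20, 40)
complex_effect = status_quo_effect(32, 40, 20, 40)

# An H2-style check asks whether the effect is larger after the complex task.
h2_direction_holds = complex_effect > simple_effect
```

A positive value is consistent with the direction predicted by H1; comparing the value across the two prior-task conditions corresponds to the direction predicted by H2. A significance test would still be needed before drawing conclusions.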
The following section details the methodology used to test these hypotheses, including the decision scenarios, task
manipulations, experimental design, and data collection procedures.
5 Method
This section outlines the methodology used to investigate how task complexity in chatbot interactions influences status
quo bias in subsequent decision-making. Our approach is structured into multiple experimental conditions designed
to test our hypotheses systematically. We begin by detailing the decision scenarios, which serve as the core choice
problems in our chatbot-based interactions, and discuss how we integrate these scenarios into our chatbot
system while ensuring consistency with prior research. Next, we describe the decision context tasks, which introduce
either simple or complex preference elicitation tasks before the decision scenario. The tasks are described
in detail, along with the survey used to measure task-related aspects such as memory recall and cognitive
load. We then discuss our experimental design, including an a-priori power analysis to determine an adequate sample size,
data collection procedures, and quality control measures such as attention checks, seriousness screening, and Open
Science Framework registration. Our study is conducted as a between-subjects experiment with multiple conditions
across three decision scenarios and two decision context conditions.
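For readers unfamiliar with a-priori power analysis, the sketch below shows one textbook approximation for the sample size needed to detect a difference between two choice proportions. The formula (normal approximation with unpooled variance), the default significance level and power, and the example proportions are assumptions chosen for illustration and do not describe the analysis performed for this study.

```python
import math
from statistics import NormalDist


def n_per_group(
    p1: float, p2: float, alpha: float = 0.05, power: float = 0.80
) -> int:
    """Approximate participants per group for a two-sided two-proportion test."""
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("p1 and p2 must be distinct proportions in (0, 1)")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_power) ** 2 * variance / (p1 - p2) ** 2
    return math.ceil(n)


# Hypothetical effect: 50% vs 70% choosing the same alternative.
n_example = n_per_group(0.50, 0.70)
```

Smaller expected differences between conditions require larger groups, which is why the expected effect size has to be fixed before data collection.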
5.1 Decision Scenarios
The original study [25] experimented with eight scenarios and five conditions. Phase I of the replication study
tested two scenarios, while phase II tested four. These scenarios were tested in different choice-set
configurations. For instance, the budget allocation scenario from the original study was explored with two alternatives
(pairs), three alternatives (triples), and four alternatives (quads) in the choice set. Our experiments adapted three
scenarios: budget allocation, investment portfolios, and college jobs. The rationale behind
this decision is that these three scenarios have shown a pronounced status-quo effect in both the original and the
replication study, at least in the quads configuration. The chatbot is designed to begin with introductory utterances, such as
greetings, followed by the decision context and then by a decision scenario. We took utmost
care while integrating these decision scenarios into the conversational agent. The decision scenarios and the list of
alternatives are provided in the supplementary material.
As outlined in the original study, the neutral condition presents two alternatives without designating either as the
status-quo, while in the status-quo conditions, one of the alternatives is explicitly framed as the status-quo. Following
both the original and replication studies, we adopt the same three-condition design: a neutral condition, where no
option is framed as the status quo, and two status quo conditions, where each alternative is separately framed as the
status-quo in different trials.
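The resulting condition structure can be sketched as a grid of cells to which participants are randomly assigned. The sketch assumes that scenario, prior task, and framing are fully crossed and that assignment is balanced across cells; both are assumptions of this example, and the labels and function names are hypothetical.

```python
import random
from itertools import product
from typing import Dict, Hashable, Iterable, List, Tuple

Cell = Tuple[str, str, str]

SCENARIOS = ("budget_allocation", "investment", "college_jobs")
PRIOR_TASKS = ("simple", "complex")
FRAMINGS = ("neutral", "status_quo_A", "status_quo_B")


def condition_cells() -> List[Cell]:
    """All (scenario, prior task, framing) combinations, assuming full crossing."""
    return list(product(SCENARIOS, PRIOR_TASKS, FRAMINGS))


def assign(participant_ids: Iterable[Hashable], seed: int = 0) -> Dict[Hashable, Cell]:
    """Shuffle participants, then deal them out to cells in round-robin order."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    ids = list(participant_ids)
    rng.shuffle(ids)
    cells = condition_cells()
    return {pid: cells[i % len(cells)] for i, pid in enumerate(ids)}


cells = condition_cells()
assignment = assign(range(36), seed=1)
```

Because each participant lands in exactly one cell, every comparison between framings or prior tasks is a between-subjects comparison.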
As previously mentioned, we experimented with three different decision scenarios, each with its own set of
alternatives. The following is one of the decision scenarios and its conditions.
College Jobs Scenario Neutral Condition:
Having just completed your graduate degree, you have two offers of teaching jobs in hand. When evaluating
teaching job offers, people typically consider the salary, the reputation of the school, the location of the
school, and the likelihood of getting tenure (tenure is permanent job contract that can only be terminated
for cause or under extraordinary circumstances). Your choices are:
• College A: east coast, very prestigious school, high salary, fair chance of tenure.
• College B: west coast, low prestige school, high salary, good chance of tenure.
College Jobs Scenario Status-quo Condition (College A):
You are currently an assistant professor at College A in the east coast. Recently, you have been approached
by colleague at other university with job opportunity. When evaluating teaching job offers, people typically
consider the salary, the reputation of the school, the location of the school, and the likelihood of getting
tenure (tenure is permanent job contract that can only be terminated for cause or under extraordinary
circumstances). Your choices are:
• Remain at College A: east coast, very prestigious school, high salary, fair chance of tenure.
• Move to College B: west coast, low prestige school, high salary, good chance of tenure.
Similarly, the budget allocation decision scenario involves allocating the National Highway Safety Commission’s budget
between automobile safety and highway safety programs under different conditions. The alternatives are presented
either in a neutral condition, where both allocation options are presented equally, or in status quo conditions, where
the current allocation, either 60% auto safety / 40% highway safety (60A40H) or 50% auto safety / 50% highway safety
(50A50H), influences whether respondents choose to maintain the existing budget or shift funds between programs.
In the investment decision-making scenario, choices are presented between a moderate-risk investment (Mod. Risk/
Company A) and a high-risk investment (High Risk/ Company B). Under neutral conditions, both options are presented
without any prior commitments, while in status quo conditions, the investor inherits a portfolio already allocated
to either moderate-risk or high-risk investments, potentially influencing their preference to maintain the existing
investment or switch. The details of the choice problems are made available in the supplementary material.
5.1.1 Manipulations. We made a few adjustments to the decision scenarios to address gaps in the original study and
ensure consistency across different configurations. Below, we outline the key modifications made to the phrasing,
structure, and presentation of the scenarios.
The original study did not provide phrasing for the decision scenarios in the pairs configuration; we therefore derived
them from the triples and quads. In the neutral condition of the college jobs scenario, as shown above, the modification
to the original scenario involves reducing the number of teaching job offers from four to two. Similarly, in the
status quo condition of the college jobs scenario, the modification involves changing
“colleagues at other universities with job opportunities.” to “colleague at other university with job opportunity”. In all
the decision scenarios, we moved from an ordered-list presentation of the alternatives to bullet points.
The replication study by Xiao et al. [ 33] included a method to assess whether participants understood the decision
scenario. Participants were first shown the scenario and asked various related questions before being presented with
the decision alternatives. In our study, we adopted a different approach to maintain experimental validity. We asked the
comprehension questions later in the survey rather than during the conversational agent interaction. We assessed the
accuracy of each individual’s recall, referring to it as their Decision Scenario Recall Task Performance.
5.2 Prior Discourse
The prior discourse incorporates two different tasks. The first task (labelled Simple Task) is a preference
elicitation task in which five attributes are elicited from the user through closed yes-or-no questions. The design for
this condition follows a conservative dialogue strategy and is not cognitively demanding. The second task (labelled
Complex Task) is a preference elicitation task in which arithmetic comparison between the attributes and memorisation
of the outcomes require additional mental effort. Since the design of this task is cognitively demanding, we hypothesise
that it will result in increased cognitive load.
All conditions adopt a preference elicitation task in six different domains of the Schema Guided Dialogue (SGD)
dataset [24]: properties (real estate), music, movies, calendar, banks, and messaging apps.
These domains were carefully selected to have no influence on the subsequent decision scenario. This design decision
ensures that choices made in the decision scenarios are influenced only by the effect of cognitive load due to the
decision context, and not by the domain of the task.
The design of the simple preference elicitation task is straightforward. The related attributes of each domain and the
closed questions for each attribute are made available in the supplementary material. The complex task, in contrast,
has a more involved structure and is best explained with an example.
In the complex task, users are presented with a scenario where they select from various property recommendations.
Similar structures are used in other domains, including artist recommendations, streaming services, calendar apps,
and banking options. Each property is characterized by three attributes, such as number of bedrooms, square footage,
and star rating. For the first property, real values are explicitly provided (e.g., three bedrooms, 2000 sq. ft., and a 4-star
rating). However, in subsequent interactions, users must perform arithmetic calculations to determine the actual values
of the attributes for other properties. This approach follows the standard approach to increase cognitive load, which
involves a combination of arithmetic reasoning and memory recall tasks [ 8]. By progressively increasing the number
of required calculations and memory dependencies, the cognitive effort demanded by the task also increases, making
decision-making more cognitively taxing.
U1. In the following scenario choose from various property recommendations.
U2. The first property has three bedrooms, 2000 square feet, and a 4-star rating. The second property has twice the
number of bedrooms and with the same size and rating. Which one do you prefer, and why?
U3. The third property has the same number of bedrooms as the second one but is half the size of the first one, with
the same rating as the first. Which one do you prefer, and why?
U4. The fourth property has the same number of bedrooms as the second, the same size as the third, but one less star
rating than the first. Which one do you prefer, and why?
U5. Remember the details of the fourth property. Specific information related to the property will be requested later.
In the above example, when responding to the third dialogue utterance (U3), the user must expend mental
effort to work out the actual number of bedrooms, since this requires comparing the second property with
the first and carrying the result over to the third property. By definition, cognitive load arises from the
additional mental effort demanded by interactions between elements. Here the elements are the attributes, and
the interactions between them are the arithmetic operations performed by the individual. Refer to the supplementary
material for the other domains of the preference elicitation task. The number of recursive calculations increases with
each utterance, so the cognitive load, and with it the mental effort required, is expected to increase as the dialogue
progresses. Finally, the user is asked to memorise the deduced attribute information for the
property. In theory, this task design should substantially increase the individual’s cognitive load. To test this, the
standard NASA-TLX [22] survey was administered for all interactions.
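To make the chain of dependencies in utterances U2 to U4 concrete, the following minimal Python sketch resolves the actual attribute values that a participant has to derive mentally. The `Property` class and variable names are illustrative assumptions and are not taken from the study's source code.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Property:
    """Illustrative container for the three attributes used in the example."""
    bedrooms: int
    square_feet: int
    stars: int


# U2: the first property is stated explicitly.
first = Property(bedrooms=3, square_feet=2000, stars=4)

# U2: twice the bedrooms of the first, same size and rating.
second = replace(first, bedrooms=first.bedrooms * 2)

# U3: bedrooms of the second, half the size of the first, rating of the first.
third = Property(second.bedrooms, first.square_feet // 2, first.stars)

# U4: bedrooms of the second, size of the third, one star fewer than the first.
fourth = Property(second.bedrooms, third.square_feet, first.stars - 1)

for name, prop in [("second", second), ("third", third), ("fourth", fourth)]:
    print(name, prop)
```

Each later property depends on values derived in earlier steps, which is the source of the growing memory and arithmetic demand described above.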
5.3 A-priori Power Analysis
We conducted a power analysis using the software programme G*Power [13] to determine the required sample size. Our
objective was to achieve a power (1 − β) of 0.80 to detect a medium effect size (Cohen’s w) of 0.3, with a standard alpha
error probability (α) of 0.05 and 1 degree of freedom [21]. (Our study investigates the difference between the effect
sizes under various conditions rather than establishing a statistically significant effect for each condition; therefore,
any effect size from medium to large is desirable.) Based on the analysis, the target sample size for each
condition was calculated to be 42 participants. Given that our study includes two decision context conditions (simple
task and complex task), three decision scenarios (budget allocation, investment decision making, college jobs), and
three decision scenario conditions (one neutral and two status quo conditions), we require a sample of 756 participants for
a between-subjects experimental design.
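The total sample size follows from the number of cells in the design. The short Python sketch below enumerates the cells and multiplies by the per-condition target; the label strings for the status quo conditions are illustrative assumptions, while the factor levels and the figure of 42 come from the text above.

```python
from itertools import product

PER_CELL = 42  # target sample size per condition, as reported above

prior_tasks = ["simple task", "complex task"]
scenarios = ["budget allocation", "investment decision making", "college jobs"]
# Labels below are illustrative; the text specifies one neutral and two status quo conditions.
framings = ["neutral", "status quo: first alternative", "status quo: second alternative"]

cells = list(product(prior_tasks, scenarios, framings))
total_participants = len(cells) * PER_CELL

print(len(cells), total_participants)
```

With 2 × 3 × 3 = 18 cells and 42 participants per cell, the total is 756.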
5.4 Data Collection, Quality, and Integrity
To ensure high-quality data collection, we implemented rigorous compensation, integrity, and quality control mecha-
nisms throughout the study.
5.4.1 Compensation and Participant Recruitment. Participants were recruited through Prolific, a widely used platform
known for ensuring data quality and participant reliability. The estimated completion time for the survey was eight
minutes, and participants were compensated according to Prolific’s recommended minimum rate of $8 per hour,
receiving more than $0.80 after successful completion of the survey. A total of 756 participants were recruited, and
measures were put in place to prevent duplicate participation. Additionally, demographic information was obtained
from Prolific’s database, allowing for diversity verification and eligibility confirmation.
5.4.2 Data Quality. To maintain data quality, participants were asked whether they had previously encountered
the decision scenarios, in order to assess familiarity and ensure that prior exposure did not bias their responses. Similarly,
domain familiarity was elicited to verify participants’ understanding of the subject matter and to determine its relation to the
outcomes. To further ensure data reliability, participants were asked to rate their seriousness at the end of the survey. If
a participant indicated a lack of engagement or provided responses suggesting inattentiveness, their data was flagged
for review. In cases where engagement was deemed insufficient, a data recollection process was initiated to replace
unreliable responses with high-quality data. To verify attentiveness, a memory
recall task was included, where participants were asked to recall specific details from their interaction with the chatbot.
This helped confirm whether participants had actively engaged with the task or merely responded at random. To ensure
that participants processed cognitive load naturally, they were explicitly asked not to use external aids, such as pen and
paper, during cognitive tasks. This measure was necessary to maintain the intended cognitive load effects and prevent
external assistance from influencing the results.
5.4.3 Data Integrity. An automated data integrity mechanism was also incorporated into the system. Upon submission,
participant responses, captured as a JSON file, were automatically emailed to both the study authors and a publicly
accessible email address. This system was designed to enhance transparency and accountability, ensuring that data was
securely logged and accessible for verification. The dataset will be made available for read-only public access, with
access codes provided upon request.
5.4.4 Data Cleaning and Rejection Criteria. Participants were not automatically rejected based on a single failed
attention check. Instead, a dual-layer filtering approach was applied to distinguish between minor lapses and systematic
inattention. If a participant failed a Decision Context Attention Check (DCAC) but provided otherwise high-quality
responses, their data was retained. However, participants who failed both attention and quality checks were removed
from the dataset. Rejected responses were replaced through recollection procedures to maintain the required sample
size while ensuring the integrity of the dataset. By implementing these measures, we ensured that the final dataset was
robust, reliable, and free from low-quality responses, enhancing the validity of our findings.
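The dual-layer rule can be expressed as a simple predicate. The sketch below is a hypothetical illustration only: the field names and the grouping of the quality checks are assumptions, and the case of a passed attention check combined with failed quality checks is not specified in the text, so it is treated here as retained.

```python
from dataclasses import dataclass


@dataclass
class Response:
    participant_id: str
    passed_attention_check: bool  # Decision Context Attention Check (DCAC)
    passed_quality_checks: bool   # assumed to summarise the other quality measures


def should_reject(response: Response) -> bool:
    """Reject only when both the attention layer and the quality layer fail."""
    return not response.passed_attention_check and not response.passed_quality_checks


responses = [
    Response("p1", passed_attention_check=False, passed_quality_checks=False),
    Response("p2", passed_attention_check=False, passed_quality_checks=True),
    Response("p3", passed_attention_check=True, passed_quality_checks=True),
]
retained = [r.participant_id for r in responses if not should_reject(r)]
print(retained)
```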
5.4.5 OSF Pre-Registration. The hypotheses and the experimental design were preregistered in the Open Science
Framework (OSF) to ensure transparency and reproducibility. This preregistration includes detailed documentation of
our study’s objectives, methodology, decision scenarios, experimental conditions, and analysis plans. By publicly sharing
our hypotheses in advance, we mitigate the risks of confirmation bias and p-hacking, ensuring that our research follows
a predefined analytical framework. The preregistration also includes a-priori power analysis, justifications for sample
size selection, and all materials used in the study. The complete pre-registration can be accessed at https://osf.io/psxvf,
where all study materials, analysis scripts, and methodological disclosures are available for verification and replication.
5.4.6 IRB. The ethics of conducting research involving human participants is of utmost importance. This study was
approved by the Institutional Review Board (IRB) and adheres to the ethical guidelines established by the declaration.
Human Research Ethics Committee – Sciences (HREC-LS): LS-C-25-001-Pilli-Nallur.
5.5 Procedure
The entire experiment was developed as an interactive Streamlit application [ 14] and is hosted on Google Cloud
Platform (GCP). The conversational user interface (CUI) is built using the Streamlit chat interface. On the back end, the
OpenAI API powers the chatbot. Carefully crafted prompts are designed to simulate various experimental conditions.
Additionally, the source code and condition-specific prompts are provided in the supplementary materials.
Fig. 1. Chatbot interaction conditions, survey, and Cognitive load survey.
Participants began the interaction by reviewing an information sheet detailing the study’s purpose, procedures, and
ethical considerations. Upon completion, participants were given access to the experiment’s homepage, which displayed
their Prolific ID and the consent form. The checkbox for consent was initially unchecked, requiring participants to actively
select "Yes" to indicate their agreement before proceeding. Only after providing informed consent were participants
allowed to continue to the main experiment.
The chatbot initiates the conversation with a greeting, followed by a decision context task assigned according
to the participant’s experimental condition. Participants in the simple task condition engage in a straightforward
preference elicitation task, while those in the complex task condition undergo a more cognitively demanding task involving
arithmetic comparisons and memory recall, as described in Section 5.2. Upon completion, in the same
chat interaction, participants are presented with a randomly assigned decision scenario (discussed in Section 5.1), where they are
required to choose between a set of alternatives.
In the simple task condition, participants engage with both the simple task and the complex task; however, the
complex task is introduced after the decision scenario. Towards the end of the survey, participants are first presented
with the transcript of their interaction from the simple task, followed by the NASA-TLX survey to assess their cognitive
load. Subsequently, they are shown the transcript of the complex task interaction, after which they complete another
NASA-TLX survey. This sequential approach ensures that cognitive load is measured separately for each task, allowing
for a more precise evaluation of the effect of task complexity on decision-making. Conversely, in the complex task
condition, participants completed both the simple and complex tasks before being presented with the decision scenario,
as shown in Figure 1. This was a deliberate design choice, so that participants rate the complex task on the NASA-TLX survey
relative to the simple task. This approach ensures greater accuracy compared to administering the surveys independently, minimizing
recall bias and providing a more precise assessment of cognitive load.
After completing their chatbot interaction, participants were redirected to a survey page. This survey included a
memory recall task to assess attentiveness, along with attention check questions to ensure data integrity.
6 Results
This section presents the findings of the study, focusing on the impact of task complexity on cognitive load and
then on status-quo bias under the simple and complex tasks. We begin with descriptive statistics, providing an
overview of participant demographics and data quality assessments to ensure the validity of our dataset. Next, we
analyse task complexity and cognitive load, examining how different levels of cognitive effort influenced participants’
responses. Finally, we report the statistical significance and effect sizes of status-quo bias under simple and complex
tasks.
6.1 Descriptive Statistics
6.1.1 Demographics. The dataset comprises 756 individuals (mean age = 41.81 years, range 18–81) who took an average
of 10.20 minutes (with SD=4.82) to complete the study. The sample is slightly skewed toward female participants (53%
female, 47% male) and is predominantly White (82.1%), with most participants residing in either the United Kingdom
(71.2%) or the United States (27.3%). The majority reported English as their primary language (99.7%). Regarding student
status, 76.1% were non-students, while 11.8% were students, with some missing data. In terms of employment, 44.8%
were in full-time roles, 17.8% in part-time work, and 6.4% were unemployed and seeking jobs, while others fell into
various categories, including homemakers, retirees, or those due to start a new job soon.
6.1.2 Data Quality. To ensure the experimental validity, several measures were implemented. These checks aimed
to verify participant engagement, familiarity with the decision context, attentiveness, and authenticity of responses,
minimising potential biases that could affect the results.
The familiarity with the decision context domain was generally high, with a mean score significantly above the
midpoint (𝑡= 17.09,𝑝< 0.001), indicating that participants were well-acquainted with the subject matter. The majority
of participants (698 out of 756) reported that they had not seen the decision scenarios before, while others were unsure,
and the distribution was statistically significant ( 𝑝= 0.00). Additionally, none of the participants reported using external
tools or expressed a need for them, confirming that decision-making relied on internal cognitive resources. Seriousness
scores were high, with a mean score significantly above the midpoint ( 𝑡= 42.79,𝑝< 0.001), suggesting that participants
engaged seriously with the survey tasks. These findings, along with Decision Context Attention Checks (DCAC),
ensured that participants were attentive, engaged, and provided reliable data for hypothesis testing.
To identify and handle outliers in response times, we used the Interquartile Range (IQR) method. This involved
calculating the first quartile (Q1) and third quartile (Q3) and determining the IQR as the difference between them.
Response times falling below the lower bound (Q1 minus 1.5 times the IQR) or above the upper bound (Q3 plus 1.5
times the IQR) were flagged as potential outliers. Using this method, we initially identified 53 responses for exclusion.
However, some extreme outliers appeared due to technical issues, with unusually large response times (e.g., 1,329,910.313
s and 1,331,160.924 s; 𝑛= 22). To ensure these were not mistakenly excluded, we cross-checked them against recorded
start and completion times from Prolific and retained those that aligned with expected survey durations. Additionally,
some responses had implausible negative or zero values, indicating serious technical errors (𝑛 = 5). These cases
were excluded as well. Importantly, including or excluding these erroneous data points did not significantly
impact the overall findings. To maintain a conservative approach, we excluded these unreliable responses ( 𝑛= 31) from
the final analysis.
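The IQR rule described above can be sketched in a few lines of Python. The response times below are hypothetical values chosen for illustration, not study data, and the quartile convention (`method="inclusive"`) is an assumption, since the text does not state which convention was used.

```python
from statistics import quantiles


def iqr_bounds(values):
    """Return (lower, upper) fences at Q1 - 1.5 * IQR and Q3 + 1.5 * IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def flag_outliers(values):
    """Return the values that fall outside the IQR fences."""
    lower, upper = iqr_bounds(values)
    return [v for v in values if v < lower or v > upper]


# Hypothetical response times in seconds (illustrative only).
times = [30, 40, 45, 50, 55, 60, 65, 70, 1000]
print(iqr_bounds(times))
print(flag_outliers(times))
```

Flagged values are only candidates for exclusion; as described above, they were cross-checked against the start and completion times recorded by Prolific before a decision was made.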
NASA-TLX Dimensions   Simple Task vs Complex Task   SC1         SC3         SC4
Mental Demand         A                             -2.144***   -2.232***   -1.820***
                      B                             -2.255***   -2.429***   -2.570***
Effort                A                             -1.235***   -1.290***   -0.978**
                      B                             -0.962**    -1.512***   -1.514***
Performance           A                             -0.546*     -0.689**    -0.675**
                      B                             -0.389      -1.124***   -0.677**
Frustration           A                             -0.956**    -0.692**    -0.400
                      B                             -0.750**    -0.919**    -1.174***
Temporal Demand       A                             -0.593*     -0.455      -0.439
                      B                             -0.817**    -0.824**    -0.489
Physical Demand       A                             -0.610*     -0.324      -0.233
                      B                             -0.561      -0.429      -0.302
Table 1. NASA-TLX Test Results with Effect Sizes and p-value Significance. Significance levels: *** p < 0.001, ** p < 0.01, * p < 0.05, + p < 0.10
6.2 Task Complexity and Cognitive Load
To investigate the impact of task complexity on cognitive load, we analysed participants’ responses to the NASA-TLX
survey after engaging with both simple and complex tasks during the chatbot’s interactions. For more details, please
refer to Section 5.5.
The results show that Mental Demand had the largest differences between simple and complex tasks across all
scenarios (budget allocation, investment decision making, college jobs). The effect sizes, as tabulated in Table 1, were
strong (ranging from -0.57 to -1.161), and most results were highly significant (𝑝 < 0.001, 𝑝 < 0.01, or 𝑝 < 0.05). This
suggests that participants experienced much higher mental workload when performing complex tasks. Effort also
showed significant differences in multiple scenarios, especially in budget allocation and investment decision making,
with effect sizes up to -0.864 and p-values below 0.001, as shown in Figure 2. This indicates that participants had to put
in more effort for complex tasks. For Performance, Frustration, and Temporal Demand, some significant effects were
found, but the effect sizes were smaller compared to Mental Demand and Effort. Physical Demand showed the least
significant differences, suggesting that physical effort was not a major factor in these tasks. Overall, Mental Demand, as
shown in Figure 2, was the dimension most affected by task complexity, making it the most important factor.
6.3 Task Complexity and Status Quo Bias
Our study investigated the influence of task complexity on bias detection across three decision-making contexts: budget
allocation (BA), investment portfolio (IDM), and college jobs (CJ). We compared our findings to the original study
(Samuelson [25]).
6.3.1 Budget Allocation Scenario. In the Samuelson study, when 60A40H was the status quo alternative, 61% (11/18) of
participants chose it. However, when 50A50H was framed as the status quo, 76% (13/17) of participants preferred it.
This difference was statistically significant ( 𝑝= 0.025), with a moderate to large effect size ( 𝐶𝑜ℎ𝑒𝑛′𝑠ℎ= 0.78). However,
the confidence interval ( 𝐶𝐼:-0.26 to 1.82) was wide, suggesting some uncertainty about the true effect size.
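For reference, Cohen's h is the difference between two arcsine-transformed proportions. The sketch below assumes that the two proportions compared are the share choosing an alternative when it is the status quo and the share choosing it when the other alternative is the status quo; under that assumption the Samuelson counts quoted above (11/18, and 17 − 13 = 4 of 17) give the reported value. The function name is illustrative.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# 60A40H chosen by 11 of 18 when it was the status quo,
# and by 17 - 13 = 4 of 17 when 50A50H was the status quo.
h = cohens_h(11 / 18, 4 / 17)
print(round(h, 2))
```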
Fig. 2. Mean Effect Sizes for Each NASA-TLX Dimension. [Bar chart; x-axis: NASA-TLX dimension (Mental Demand, Effort, Frustration, Performance, Temporal Demand, Physical Demand); y-axis: mean absolute effect size.]
               Choice rates                                   Status quo framing vs. non-status quo framing
Alternatives   SQ             N              NSQ              𝜒2      p        Odds ratio (95% CI)    Cohen’s h (95% CI)
Scenario 1: Budget allocation ratios
60A40H         16/41 (0.39)   12/42 (0.29)   4/42 (0.1)       9.87    0.002*   6.08 [1.82, 20.31]     0.72 [-0.13, 1.58]
50A50H         38/42 (0.9)    30/42 (0.71)   25/41 (0.61)
Scenario 2: Investment portfolios
Mod. Risk      32/42 (0.76)   32/40 (0.8)    33/42 (0.79)     0.07    0.794    0.87 [0.31, 2.43]      -0.06 [-0.78, 0.67]
High Risk      9/42 (0.21)    8/40 (0.2)     10/42 (0.24)
Scenario 3: College jobs
College A      25/41 (0.61)   19/41 (0.46)   21/41 (0.51)     0.79    0.373    1.49 [0.62, 3.58]      0.2 [-0.42, 0.82]
College B      20/41 (0.49)   22/41 (0.54)   16/41 (0.39)
Table 2. Simple Task - Status quo framing vs. non-status quo framing
Similarly, in our simple task condition, 39% (16/41) of participants preferred 60A40H when it was the status quo, while
90% (38/42) chose 50A50H when it was the status quo. This difference was statistically significant (𝑝 = 0.002), with an
effect size of ℎ = 0.72, as shown in Table 2. While the confidence interval (𝐶𝐼: -0.13 to 1.58) remained wide, the effect
size closely matched that of Samuelson’s study, confirming that status quo bias was successfully detected in this condition.
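The test statistics for the simple task budget allocation row can be checked with the standard formulas for a 2×2 table. The sketch below uses the Pearson chi-square statistic without continuity correction and a Wald interval on the log odds ratio; these formulas reproduce the reported values, but whether the analysis scripts use exactly them is an assumption, and the function name is illustrative.

```python
import math


def two_by_two_stats(a: int, b: int, c: int, d: int):
    """Statistics for the table [[a, b], [c, d]].

    Returns (chi2, p, odds_ratio, ci_low, ci_high).
    """
    n = a + b + c + d
    # Pearson chi-square without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (a * d) / (b * c)
    # Wald 95% interval on the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci_low = math.exp(math.log(odds_ratio) - 1.96 * se)
    ci_high = math.exp(math.log(odds_ratio) + 1.96 * se)
    return chi2, p, odds_ratio, ci_low, ci_high


# Rows: 60A40H as status quo, 50A50H as status quo.
# Columns: chose 60A40H, chose 50A50H.
chi2, p, odds_ratio, ci_low, ci_high = two_by_two_stats(16, 25, 4, 38)
print(round(chi2, 2), round(p, 3), round(odds_ratio, 2))
```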
In the complex task condition, 54% (20/37) of participants preferred 60A40H, while 83% (34/41) selected 50A50H
as the status quo alternative, as shown in Table 3. This difference was statistically significant (𝑝 = 0.001), with the
largest effect size (ℎ = 0.80) observed in our study. Notably, the confidence interval (𝐶𝐼: 0.07 to 1.53) did not include
zero, providing stronger evidence that higher task complexity increases susceptibility to status quo bias.
               Choice rates                                   Status quo framing vs. non-status quo framing
Alternatives   SQ             N              NSQ              𝜒2      p        Odds ratio (95% CI)    Cohen’s h (95% CI)
Scenario 1: Budget allocation ratios
60A40H         20/37 (0.54)   9/39 (0.23)    7/41 (0.17)      11.75   0.001*   5.71 [2.02, 16.15]     0.8 [0.07, 1.53]
50A50H         34/41 (0.83)   30/39 (0.77)   17/37 (0.46)
Scenario 2: Investment portfolios
Mod. Risk      33/39 (0.85)   35/41 (0.85)   27/36 (0.75)     1.08    0.298    1.83 [0.58, 5.8]       0.24 [-0.57, 1.06]
High Risk      9/36 (0.25)    6/41 (0.15)    6/39 (0.15)
Scenario 3: College jobs
College A      24/40 (0.6)    20/39 (0.51)   19/40 (0.48)     1.26    0.262    1.66 [0.68, 4.02]      0.25 [-0.38, 0.88]
College B      21/40 (0.52)   19/39 (0.49)   16/40 (0.4)
Table 3. Complex Task - Status quo framing vs. non-status quo framing
Study                Status Quo First Alternative   Status Quo Second Alternative   p-value   Cohen's h
Samuelson − BA       11/18                          13/17                           0.025     0.78
Simple Task − BA     16/41                          38/42                           0.002     0.72
Complex Task − BA    20/37                          34/41                           0.001     0.8
Samuelson − IDM      27/43                          27/48                           0.069     0.38
Simple Task − IDM    32/42                          9/42                            0.794     −0.06
Complex Task − IDM   33/39                          9/36                            0.298     0.24
Samuelson − CJ       16/20                          30/38                           0         1.26
Simple Task − CJ     25/41                          20/41                           0.373     0.2
Complex Task − CJ    24/40                          21/40                           0.262     0.25
Fig. 3. Comparison of Effect-sizes across various studies. Size of the block indicates the size of the sample. [Forest plot of Cohen's h; x-axis from −0.5 to 2.]
The effect sizes in our study closely aligned with those in the Samuelson study, confirming the presence of status
quo bias, particularly in the simple task condition. However, while the confidence intervals in both the Samuelson
and simple task conditions included zero, indicating some uncertainty, the complex task condition demonstrated the
strongest effect size with a confidence interval that excluded zero, providing some evidence that higher task complexity
influences status-quo bias.
6.3.2 Investment Portfolio Scenario. In the Samuelson IDM condition, 62.8% (27/43) of participants chose the moderate-
risk alternative when framed as the status quo, while 56.3% (27/48) preferred the high-risk alternative when it was the
status quo alternative. The effect size was small to moderate ( ℎ= 0.38), and although the p-value = 0.069 indicated a
trend toward significance, the confidence interval ( 𝐶𝐼:-0.21 to 0.98) suggests some variability in the effect.
In our simple task condition, 76.2% (32/42) of participants chose the moderate-risk alternative when it was the status
quo, whereas 21.4% (9/42) selected the high-risk alternative when it was framed as the status-quo alternative. However,
the effect size was negligible ( ℎ= -0.06), and the p-value = 0.794, indicating no significant difference between conditions.
The confidence interval ( 𝐶𝐼:-0.78 to 0.67) was wide, suggesting a high level of uncertainty in the effect.
In contrast, the complex task condition showed a positive shift toward the baseline effect observed in the Samuelson
study. Here, 84.6% (33/39) of participants chose the moderate-risk alternative as the status quo, while 25.0% (9/36)
selected the high-risk alternative when framed as the default. The effect size was small but larger than in the simple
task condition (ℎ = 0.24), suggesting a trend toward increased bias under higher task complexity. Although the p-value
(0.298) did not reach statistical significance, the effect size moved closer to Samuelson’s baseline (ℎ = 0.38),
as shown in the forest plot in Figure 3. This provides some evidence that increased task complexity may strengthen
status-quo bias, supporting our hypothesis to a certain extent.
The effect sizes in our study suggest a directional trend toward status quo bias in investment decision-making, with
the complex task condition moving closer to the baseline effect observed in the Samuelson study. While the confidence
intervals in both the simple and complex task conditions remained wide and included zero, indicating uncertainty, the
increasing effect size under higher task complexity provides some evidence that cognitive load may enhance status-quo
bias in investment choices.
6.3.3 College Jobs Scenario. The Samuelson study demonstrated a large and highly significant status-quo bias effect,
with 80.0% (16/20) of participants selecting the East Coast (College A) alternative when framed as the status quo, and
78.9% (30/38) choosing the West Coast (College B) alternative when it was in the status-quo position. The effect was
highly significant ( 𝑝= 0.000,ℎ= 1.26), and the confidence interval ( 𝐶𝐼:0.31 to 2.21) confirmed a strong and reliable
bias effect, reinforcing the tendency to prefer the status-quo option.
In our Simple Task condition, 61.0% (25/41) of participants chose the East Coast alternative when presented as the
status quo, while 48.8% (20/41) preferred the West Coast alternative when framed as status-quo alternative. The effect
size was small ( ℎ= 0.2), and the result was not statistically significant ( 𝑝= 0.373). The confidence interval ( 𝐶𝐼:-0.42 to
0.82) was broad, indicating variability in the observed effect.
In the Complex Task condition, a negligible but directional trend toward our hypothesis was observed: 60.0% (24/40) of
participants chose the East Coast alternative when it was the status quo, while 52.5% (21/40) selected the West Coast
alternative when framed as the default. The effect size ( ℎ= 0.25) was slightly higher than in the Simple Task condition,
suggesting a minor increase in bias under higher task complexity. However, the p-value ( 𝑝= 0.262) did not reach
statistical significance, and the confidence interval ( 𝐶𝐼:-0.38 to 0.88) remained wide, indicating continued uncertainty
about the effect.
The effect sizes in our study suggest a much weaker status quo bias in the college jobs scenario compared to the
Samuelson study, where the effect was large and highly significant. In both the simple and complex task conditions, the
effect sizes were small, and the confidence intervals remained wide and included zero, indicating substantial uncertainty.
While the complex task condition showed a slight increase in effect size compared to the simple task, the absence
of statistical significance suggests that task complexity did not meaningfully amplify status quo bias in this decision
context.
One possible explanation for this outcome is the demographic composition of our sample: the majority of
participants were from the UK, while the decision scenario was framed around U.S. geography. Participants may therefore
not have had strong preferences for, or familiarity with, the locations, which could have caused this outcome.
6.4 Response Times and Recall Task Performances
We further investigated the average response times and recall task performance of the decision scenarios. Participants in
the complex task took significantly longer to respond compared to the simple task, reflecting increased cognitive effort.
Despite this, their recall performance remained high and unchanged across conditions, indicating full engagement with
the decision scenarios regardless of cognitive load.
The average response time for the simple task was 53.58 s, while participants in the complex task took significantly
longer, at 76.24 s. Across all decision scenarios, response times in the complex task were consistently higher than
in the simple task (𝑡 = 5.504, 𝑝 < 0.001). These response times also aligned with the cognitive load survey, indicating that
participants experienced greater cognitive effort under increased task complexity. Despite the longer response time and
higher cognitive load, decision scenario recall task performance remained high in both conditions, with no significant
difference between the simple task (𝑀 = 0.875, 𝑆𝐷 = 0.069) and the complex task (𝑀 = 0.858, 𝑆𝐷 = 0.063). This suggests
that participants were fully engaged with the decision scenarios regardless of cognitive load, though they required
more time to complete decisions under higher complexity.
6.5 Inherent Bias in Decision Scenario Alternatives
In the budget allocation scenario, there was a clear preference for the 50A50H alternative, even when 60A40H was set as
the status quo. Specifically, 61% of participants still preferred 50A50H, indicating an inherent bias toward equal budget
distribution as shown in Table 2. However, this preference diminished under the complex task, where the difference
between choices narrowed to 46%. The neutral condition revealed the strongest bias, with a 29% vs. 71% split, further
highlighting the natural inclination toward 50A50H (see Table 3). Importantly, in the complex task condition, this
inherent bias remained relatively stable (23% vs. 77%), suggesting that increasing task complexity did not significantly
alter this trend.
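As a rough illustration of why the neutral-condition shift from 71% to 77% reads as "relatively stable", the two proportions can be compared on an effect-size scale. The sketch below uses Cohen's h, which is our choice for illustration and not necessarily the measure used in the analysis reported here; only the two percentages come from the text, and the function and variable names are hypothetical.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between two proportions on the arcsine scale."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Share choosing 50A50H in the neutral condition, as reported in the text.
simple_share = 0.71   # simple task
complex_share = 0.77  # complex task

h = cohens_h(complex_share, simple_share)

# By Cohen's conventional benchmarks, |h| below 0.2 is smaller than a "small" effect.
print(f"Cohen's h = {h:.3f}")
```

Under these assumptions the shift falls below the conventional threshold for a small effect, which is consistent with the reading that task complexity did not materially change the baseline preference.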
In the investment decision-making scenario, participants showed a strong bias toward Moderate Risk investments.
When High Risk was set as the status quo, 75% still preferred Moderate Risk, indicating a strong tendency to avoid
higher-risk choices. This bias was even more pronounced in the neutral condition, where over 80% of participants chose
Moderate Risk in both the simple and complex task conditions. Moreover, increasing task complexity did not significantly
alter this preference, suggesting that the bias toward Moderate Risk investments is robust and largely unaffected by
task difficulty. In contrast, we did not observe a similar inherent bias in the college jobs scenario. Preferences between
College A and College B remained more balanced across conditions, with no strong tendency toward one alternative.
7 Conclusion
This study examined the influence of prior discourse complexity in conversational agents on subsequent decision-
making, particularly its influence on status quo bias. Across all decision scenarios, we observed a common trend: while
increasing task complexity consistently shifted the effect size towards our hypothesis, the change was not statistically
significant in most cases. However, in the budget allocation scenario, the complex task condition resulted in a reliable
detection of status quo bias, as its confidence intervals did not include zero. This suggests that cognitive load can
meaningfully reinforce biases under specific conditions. In the investment decision-making scenario, the shift from no
effect in the simple task to a small-to-medium effect size in the complex task highlights the need for further exploration.
While the effect did not reach statistical significance, the consistent directional movement suggests that heightened
cognitive load may play a role in exacerbating status-quo bias.
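The criterion used above, a confidence interval that does not include zero, can be sketched for a simple difference in proportions. This is a minimal illustration only: the Wald interval is an assumption on our part (the interval method used in the analysis is not restated in this section), the counts are hypothetical placeholders rather than data from the study, and the function names are invented for the example.

```python
import math


def diff_ci(x1: int, n1: int, x2: int, n2: int, z: float = 1.96):
    """Wald confidence interval for p1 - p2 (z = 1.96 gives roughly 95%)."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


def excludes_zero(ci) -> bool:
    """True when the whole interval lies on one side of zero."""
    low, high = ci
    return low > 0 or high < 0


# HYPOTHETICAL counts: participants choosing an option when it is the
# status quo versus when it is not. These are not the study's data.
clear_shift = diff_ci(70, 120, 50, 120)
weak_shift = diff_ci(62, 120, 58, 120)

print(clear_shift, excludes_zero(clear_shift))
print(weak_shift, excludes_zero(weak_shift))
```

With the placeholder counts, the first comparison yields an interval entirely above zero while the second straddles zero, mirroring the distinction drawn between a reliably detected bias and a directional but non-significant shift.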
These findings have implications for digital nudging and behavioural economics. Conversational agents are increas-
ingly used to guide users in various decision-making tasks, and our results suggest that dialogue complexity may shape
the way biases manifest. If cognitive load strengthens biases rather than mitigating them, designers of digital nudges
must be cautious in structuring interactions to avoid unintended manipulations.
Our findings highlight the presence of inherent biases in decision scenarios, where certain alternatives naturally
attract preference regardless of status quo framing. For example, in the budget allocation scenario, participants exhibited
a strong preference for equal distribution (50A50H), even when 60A40H was framed as the default. Similarly, in the
investment decision-making scenario, moderate-risk options were favoured regardless of framing. These inherent biases
can impact experimental outcomes by masking or exaggerating the effects, making it crucial for future experiments
on these decision scenarios to take baseline preferences into account.
Future research should investigate the threshold at which cognitive load meaningfully impacts bias expression and
explore additional cognitive biases such as framing, anchoring, and loss-aversion. More rigorous experimental designs,
including larger sample sizes, could provide further insights into the interaction between prior dialogue complexity and
decision-making.
Acknowledgments
This work was conducted with the financial support of the Research Ireland Centre for Research Training in Digitally-
Enhanced Reality (d-real) under Grant No. 18/CRT/6224. For the purpose of Open Access, the author has applied a CC
BY public copyright licence to any Author Accepted Manuscript version arising from this submission.
References
[1] Hugues Ali Mehenni, Sofiya Kobylyanskaya, Ioana Vasilescu, and Laurence Devillers. 2021. Nudges with Conversational Agents and Social Robots:
A First Experiment with Children at a Primary School. In Conversational Dialogue Systems for the Next Decade, Luis Fernando D’Haro, Zoraida
Callejas, and Satoshi Nakamura (Eds.). Springer, Singapore, 257–270. doi:10.1007/978-981-15-8395-7_19
[2] Florian Brachten, Felix Brünker, Nicholas RJ Frick, Björn Ross, and Stefan Stieglitz. 2020. On the ability of virtual agents to decrease cognitive load:
an experimental study. Information Systems and e-Business Management 18, 2 (2020), 187–207.
[3] Ana Caraban, Evangelos Karapanos, Daniel Gonçalves, and Pedro Campos. 2019. 23 Ways to Nudge: A Review of Technology-Mediated Nudging
in Human-Computer Interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19). Association for
Computing Machinery, New York, NY, USA, 1–15. doi:10.1145/3290605.3300733
[4] Austin Carr. 2013. How Square Register’s UI Guilts You Into Leaving Tips.
https://www.fastcompany.com/3022182/how-square-registers-ui-guilts-you-into-leaving-tips
[5] Craig R Carter, Lutz Kaufmann, and Alex Michel. 2007. Behavioral supply management: a taxonomy of judgment and decision-making biases.
International Journal of Physical Distribution & Logistics Management 37, 8 (2007), 631–669.
[6] Ana Paula Chaves and Marco Aurelio Gerosa. 2021. How should my chatbot interact? A survey on social characteristics in human–chatbot
interaction design. International Journal of Human–Computer Interaction 37, 8 (2021), 729–758.
[7] Miruna Conţ, Aurelia Ciupe, Bogdan Orza, Irina Cohuţ, and Georgiana Niţu. 2022. Career counseling chatbot using microsoft bot frameworks. In
2022 IEEE Global Engineering Education Conference (EDUCON). IEEE, 1387–1392.
[8] Cary Deck and Salar Jahedi. 2015. The effect of cognitive load on economic decision making: A survey and new experiments. European Economic
Review 78 (2015), 97–119.
[9] Mateusz Dubiel, Anastasia Sergeeva, and Luis A. Leiva. 2024. Impact of Voice Fidelity on Decision Making: A Potential Dark Pattern?. In Proceedings
of the 29th International Conference on Intelligent User Interfaces. ACM, Greenville SC USA, 181–194. doi:10.1145/3640543.3645202
[10] Godson D’Silva, Megh Jani, Vipul Jadhav, Amit Bhoir, and Prithvi Amin. 2020. Career counselling chatbot using cognitive science and artificial
intelligence. In Advanced Computing Technologies and Applications: Proceedings of 2nd International Conference on Advanced Computing Technologies
and Applications—ICACTA 2020 . Springer, 1–9.
[11] Scott Eidelman and Christian S. Crandall. 2012. Bias in Favor of the Status Quo. Social and Personality Psychology Compass 6, 3 (2012), 270–281.
doi:10.1111/j.1751-9004.2012.00427.x
[12] Ahmed Fadhil. 2018. Domain specific design patterns: designing for conversational user interfaces. arXiv preprint arXiv:1802.09055 (2018).
[13] Franz Faul, Edgar Erdfelder, Axel Buchner, and Albert-Georg Lang. 2009. Statistical power analyses using G* Power 3.1: Tests for correlation and
regression analyses. Behavior research methods 41, 4 (2009), 1149–1160.
[14] Streamlit Inc. 2019. Streamlit: A Faster Way to Build and Share Data Apps. Available at https://streamlit.io/. Accessed: 2025-02-27.
[15] Kaixin Ji, Sachin Pathiyan Cherumanal, Johanne R. Trippas, Danula Hettiachchi, Flora D. Salim, Falk Scholer, and Damiano Spina. 2024. Towards
Detecting and Mitigating Cognitive Bias in Spoken Conversational Search. In 26th International Conference on Mobile Human-Computer Interaction .
ACM, Melbourne VIC Australia, 1–10. doi:10.1145/3640471.3680245
[16] Daniel Kahneman. 2011. Thinking, fast and slow. Farrar, Straus and Giroux, New York, NY, US. 499 pages.
[17] Natalia Kalashnikova, Ioana Vasilescu, and Laurence Devillers. [n. d.]. Linguistic Nudges and Verbal Interaction with Robots, Smart-Speakers, and
Humans. ([n. d.]).
[18] Yusufcan Masatlioglu and Efe A. Ok. 2014. A Canonical Model of Choice with Initial Endowments. The Review of Economic Studies 81, 2 (April 2014),
851–883. doi:10.1093/restud/rdt037
[19] Rifqi Muhammad. 2023. Barriers and effectiveness to counselling careers with Artificial Intelligence: A systematic literature review. Ricerche di
Pedagogia e Didattica. Journal of Theories and Research in Education 18, 3 (2023), 143–164.
[20] Fred Paas, Juhani E. Tuovinen, Huib Tabbers, and Pascal W. M. Van Gerven. 2003. Cognitive Load Measurement as a Means to Advance
Cognitive Load Theory. Educational Psychologist 38, 1 (Jan. 2003), 63–71. doi:10.1207/S15326985EP3801_8
[21] Bhavna Pancholi, Mark Dunne, and Richard Armstrong. 2009. Sample size estimation and statistical power analyses. 16 (11 2009).
[22] Vinoth Pandian Sermuga Pandian and Sarah Suleri. 2020. NASA-TLX Web App: An Online Tool to Analyse Subjective Workload.
http://arxiv.org/abs/2001.09963 arXiv:2001.09963 [cs].
[23] Stephen Pilli. 2024. Exploring Conversational Agents as an Effective Tool for Measuring Cognitive Biases in Decision-Making.
doi:10.48550/arXiv.2401.06686 arXiv:2401.06686 [cs].
[24] Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, and Pranav Khaitan. 2020. Towards Scalable Multi-Domain Conversational
Agents: The Schema-Guided Dialogue Dataset. Proceedings of the AAAI Conference on Artificial Intelligence 34, 05 (April 2020), 8689–8696.
doi:10.1609/aaai.v34i05.6394
[25] William Samuelson and Richard Zeckhauser. 1988. Status quo bias in decision making. J Risk Uncertainty 1, 1 (March 1988), 7–59.
doi:10.1007/BF00055564
[26] Johanna Schmidhuber, Stephan Schlögl, and Christian Ploder. 2021. Cognitive Load and Productivity Implications in Human-Chatbot Interaction. In
2021 IEEE 2nd International Conference on Human-Machine Systems (ICHMS) . 1–6. doi:10.1109/ICHMS53169.2021.9582445
[27] John Sweller. 1988. Cognitive load during problem solving: Effects on learning. Cognitive science 12, 2 (1988), 257–285.
[28] John Sweller. 2011. CHAPTER TWO - Cognitive Load Theory. In Psychology of Learning and Motivation , Jose P. Mestre and Brian H. Ross (Eds.).
Vol. 55. Academic Press, 37–76. doi:10.1016/B978-0-12-387691-1.00002-8
[29] Richard H. Thaler, Cass R. Sunstein, and John P. Balz. 2010. Choice Architecture. doi:10.2139/ssrn.1583509
[30] Amos Tversky and Daniel Kahneman. 1974. Judgment under Uncertainty: Heuristics and Biases. 185 (1974).
[31] Hans Van Eyghen. 2022. Cognitive Bias: Phylogenesis or Ontogenesis? Frontiers in Psychology 13 (2022). doi:10.3389/fpsyg.2022.892829
[32] Chien-Sheng Wu, Steven C.H. Hoi, Richard Socher, and Caiming Xiong. 2020. TOD-BERT: Pre-trained Natural Language Understanding for
Task-Oriented Dialogue. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) . Association for
Computational Linguistics, Online, 917–929. doi:10.18653/v1/2020.emnlp-main.66
[33] Qinyu Xiao, Emma Lam, Muhrajan Piara, and Gilad Feldman. 2021. Revisiting status quo bias: Replication of Samuelson and Zeckhauser (1988).
Meta-Psychology 5 (Feb. 2021). doi:10.15626/MP.2020.2470
[34] Yusuke Yamamoto. 2024. Suggestive answers strategy in human-chatbot interaction: a route to engaged critical decision making. Frontiers in
Psychology 15 (March 2024), 1382234. doi:10.3389/fpsyg.2024.1382234
[35] Xiaoying Zhang, Baolin Peng, Kun Li, Jingyan Zhou, and Helen Meng. 2023. SGP-TOD: Building Task Bots Effortlessly via Schema-Guided
LLM Prompting. In Findings of the Association for Computational Linguistics: EMNLP 2023 . Association for Computational Linguistics, Singapore,
13348–13369. doi:10.18653/v1/2023.findings-emnlp.891
[36] Jeffrey Zhao, Yuan Cao, Raghav Gupta, Harrison Lee, Abhinav Rastogi, Mingqiu Wang, Hagen Soltau, Izhak Shafran, and Yonghui Wu. 2023.
AnyTOD: A Programmable Task-Oriented Dialog System. http://arxiv.org/abs/2212.09939 arXiv:2212.09939 [cs].
Received 20 February 2007; revised 12 March 2009; accepted 5 June 2009