Paper 2503.04692

The Influence of Prior Discourse on Conversational Agent-Driven Decision-Making

Authors: Stephen Pilli, Vivek Nallur

Published: 2025-03-06

Abstract:

Persuasion through conversation has been the focus of much research. Nudging is a popular strategy to influence decision-making in physical and digital settings. However, conversational agents employing "nudging" have not received significant attention. We explore the manifestation of cognitive biases (the underlying psychological mechanisms of nudging) and investigate how the complexity of prior dialogue tasks impacts decision-making facilitated by conversational agents. Our research used a between-group experimental design, involving 756 participants randomly assigned to either a simple or complex task before encountering a decision-making scenario. Three scenarios were adapted from Samuelson's classic experiments on status-quo bias, the underlying mechanism of default nudges. Our results aligned with previous studies in two out of three simple-task scenarios. Increasing task complexity consistently shifted effect sizes toward our hypothesis, though bias was significant in only one case. These findings inform conversational nudging strategies and highlight inherent biases relevant to behavioural economics.

Authors’ Contact Information: Stephen Pilli, stephen.pilli@ucdconnect.ie, University College Dublin, Dublin, Ireland; Vivek Nallur, vivek.nallur@ucd.ie, University College Dublin, Dublin, Ireland.

Manuscript submitted to ACM. arXiv:2503.04692v1 [cs.HC] 6 Mar 2025

1 Introduction

To say that computers permeate every aspect of our modern life is now trite. Digital interfaces mediate interactions with loved ones, transport, medical checkups, entertainment, and sometimes even food and sexual choices. The function and form of these interfaces influence not only our view of the world, but also how we influence the world. Our decisions on which action to take, which option to ignore, and what aspect of a problem to pay attention to are all affected by the contextual elements within which the decision scenario appears. When these contextual elements are created, enhanced, elided, magnified, or deleted by digital interfaces, it would be reasonable to say (we hope) that the digital interface is critically affecting the decision itself.

In its simplest form, decision-making requires a decision scenario and a set of alternatives to choose from. The presentation of alternatives invariably influences the decision-maker, and therefore even a random or ‘unthoughtful’ presentation creates an impact (whether intended or not). This has led to terms like ‘choice architecture’ being used to characterise the presentation of alternatives in decision scenarios [29]. An etiology of decision-making reveals that the alternatives we pay attention to are impacted by the cognitive biases that all human beings have [31]. These have often been called shortcuts in decision-making [16]. At this juncture, it is important to note that these shortcuts are not necessarily good or bad, in a traditional sense of evaluating decisions. Rather, they pertain to the speed with which certain pathways in the brain get activated in particular decision-making contexts. These shortcuts arise from human experiences, emotions, beliefs, and preferences, with different kinds of decisions being affected by different biases. Other common factors include cognitive limitations (such as memory), information limitation, emotions, feelings, and social norms [11].

Over a hundred cognitive biases have been catalogued over the years in the field of behavioural economics [5]. For example, status quo bias is the preference for the maintenance of one’s current or previous state of affairs, or a preference to not undertake any action to change this current or previous state [25]. An instance of where such a bias has been used in the real world is the mobile payment app Square, where customers are required to actively choose a “no tipping” option if they decide not to leave a tip. Such a choice architecture has resulted in increased tipping, particularly in situations where tipping has traditionally been minimal or uncommon [4]. Such an arrangement of alternatives exploits individuals’ resistance to change, thereby leading them to stick with the default suggested tip amount.
A particular form of digital interface that has seen a rapid rise in popularity is the AI-enhanced conversational agent, also called a chatbot. Chatbots have seen adoption in domains as diverse as financial decision-making, career counselling, and psychometric testing [10] [7] [19]. Fluency in language generation, using LLMs, has significantly advanced the acceptance of chatbots as a viable digital interface. In conversational choice environments like task-oriented conversational agents (i.e., a chatbot built for a specific task), decision-making occurs in the context of a dialogue. A user is cognitively involved in making a decision and depends on the chatbot, both for informational context as well as articulation of alternatives. In these contexts, it is trivial for the chatbot to use particular kinds of utterances to exploit the likelihood of a cognitive bias. We use ‘likelihood’ advisedly, since not all human beings are affected by the same set of biases, nor are those biases activated in the same situations. Again, we wish to reiterate that we make no value-judgement on these interactions; the chatbot could very well use utterances that diminish the likelihood of the bias, or use the bias to guide the human towards a good decision. An utterance, or a sequence of utterances, could use multiple factors such as emotional valence or dialogue complexity to activate (or diminish) the bias. Based on this, we pose the following research question.

How does cognitive load, induced by task complexity, influence susceptibility to cognitive biases in decision-making scenarios presented by a task-oriented chatbot?

To address this research question, we create a prototype of a task-oriented chatbot that can engage with individuals on a preference elicitation task. The chatbot is designed to present decision scenarios that are adapted from Samuelson’s classic experiments with status-quo bias [25].
First, to establish a baseline, we attempt to replicate status-quo bias using a chatbot. A simple preference elicitation task is employed for the prior dialogues. Subsequently, in our second set of experiments, we increase the task complexity of the prior dialogue and observe how it influences the status-quo bias against the baseline. We repeat the experiments with three distinct decision scenarios: budget allocation, investment decision-making, and college job selection. These scenarios have been previously replicated by Xiao et al. [33], establishing their suitability as test cases. Through these experiments, we establish a replicable relation between dialogue task complexity of a task-oriented chatbot and the manifestation of status-quo bias in subsequent decision scenarios.

The rest of this paper is structured as follows: the Background section introduces key concepts such as decision-making in conversational choice environments, status-quo bias, and cognitive load. The Related Work section reviews previous research and experiments at the intersection of cognitive biases and conversational agents, highlighting gaps in existing studies. The Hypothesis Development section presents hypotheses, focusing on the influence of task complexity on decision-making. The Methodology section explains the experimental design, including chatbot implementation, decision scenarios, and cognitive load measurement. It also describes data collection and quality control procedures. The Results section presents findings from the experiments, showing how task complexity affects cognitive load and decision-making. The Conclusion and Future Work section summarises key insights and suggests areas for further research, such as refining chatbot designs, testing different biases, and improving experimental methods.
2 Background

We begin by introducing the Conversational Choice Environment, where task-oriented dialogue systems facilitate decision-making through structured interactions. Next, we examine Status Quo Bias, a cognitive bias that leads individuals to favour existing states over change, even when better alternatives are available. We discuss foundational experiments demonstrating the presence of this bias and its implications in various decision-making contexts. Following this, we introduce Cognitive Load, which refers to the mental effort required to process information. We explore its impact on task performance and decision-making, particularly in chatbot interactions. We review studies on how cognitive load can be manipulated through dialogue design and how it affects user experience.

2.1 Conversational Choice Environment

Task-oriented dialogue systems are designed to have algorithmic yet natural language interaction with humans. Primarily, the dialogue is designed to accomplish a defined set of goals or tasks, with various factors contributing to its success. Many studies suggest that aspects of a chatbot dialogue such as anthropomorphism, personality, and trust, among many other factors, play a key role in improving human-like dialogue [6].

A Conversational Choice Environment (CCE) is a specific instance within a dialogue, facilitated by conversational decision-maker interface technologies such as chatbots, personal assistants, and virtual assistants, that requires people to make judgements or decisions. A CCE primarily consists of two parts: the first is the decision scenario¹, and the second is the dialogue that leads up to the decision scenario, which we call the decision context². A decision scenario comprises a real-world or hypothetical scenario with a finite list of alternatives. Decision scenarios naturally occur in any task-oriented dialogue such as flight booking, restaurant selection, and trip planning.
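The two-part structure of a CCE described above can be sketched as a small data structure. This is only an illustrative sketch, not the authors' implementation: the class and function names (`Utterance`, `ConversationalChoiceEnvironment`, `build_cce`), the restaurant dialogue, and the requirement of at least two alternatives are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Utterance:
    speaker: str  # "sys" for the chatbot, "usr" for the user
    text: str


@dataclass(frozen=True)
class ConversationalChoiceEnvironment:
    decision_context: Tuple[Utterance, ...]  # dialogue leading up to the scenario
    decision_scenario: Utterance             # system utterance that poses the choice
    alternatives: Tuple[str, ...]            # finite list of alternatives
    response: Optional[Utterance] = None     # user's reply, if present


def build_cce(dialogue: List[Utterance], scenario_index: int,
              alternatives: Tuple[str, ...]) -> ConversationalChoiceEnvironment:
    """Treat every utterance before `scenario_index` as the decision context."""
    if not 0 <= scenario_index < len(dialogue):
        raise IndexError("scenario_index is outside the dialogue")
    scenario = dialogue[scenario_index]
    if scenario.speaker != "sys":
        raise ValueError("the decision scenario must be a system utterance")
    # Assumption of this sketch: a choice needs at least two alternatives.
    if len(alternatives) < 2:
        raise ValueError("a decision scenario needs at least two alternatives")
    nxt = scenario_index + 1
    has_reply = nxt < len(dialogue) and dialogue[nxt].speaker == "usr"
    return ConversationalChoiceEnvironment(
        decision_context=tuple(dialogue[:scenario_index]),
        decision_scenario=scenario,
        alternatives=alternatives,
        response=dialogue[nxt] if has_reply else None,
    )


# Hypothetical restaurant-selection dialogue (wording invented for illustration).
dialogue = [
    Utterance("sys", "Do you prefer outdoor seating?"),
    Utterance("usr", "Yes."),
    Utterance("sys", "Would you like Restaurant A or Restaurant B?"),
    Utterance("usr", "Restaurant A."),
]
cce = build_cce(dialogue, scenario_index=2,
                alternatives=("Restaurant A", "Restaurant B"))
```

Keeping the decision context separate from the scenario makes it possible to vary the prior dialogue while holding the choice problem fixed, which is the kind of manipulation the study relies on.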
Implementation of the task-oriented dialogue is traditionally achieved by task-oriented dialogue systems (TODs). In recent times, TODs have evolved with advancements like ANYTOD [36] and TOD-BERT [32], enabling zero-shot learning and improved dialogue management. Large Language Models (LLMs) offer greater flexibility, as seen in SGP-TOD, achieving state-of-the-art performance [35].

The second part of a Conversational Choice Environment is the decision context. Decision context can be defined as the dialogue that leads to the decision scenario. Alternatively, it is the dialogue or the discourse prior to the decision scenario. A personality test for creating an e-portfolio, or a scenario-based question before suggesting an investment plan, are examples of a decision context. A formal representation of a dialogue with decision context and choice problem is given as follows:

$$\mathcal{D} = \{\, u^{sys}_{1},\ u^{usr}_{2},\ u^{sys}_{3},\ \ldots,\ \underbrace{u^{sys}_{t-k},\ \ldots,\ u^{usr}_{t-1}}_{\text{Decision context}},\ \underbrace{u^{sys}_{t},\ u^{usr}_{t+1}}_{\text{Decision scenario and response}},\ \ldots \,\}$$

A dialogue $\mathcal{D}$ consists of utterances of the dialogue system ($sys$) and the user ($usr$). The utterances between $t-k$ and $t$ act as the decision context to the decision scenario at $t$; $k-1$ is the length of the decision context.

¹ The words “decision scenario” and “choice problem” are used interchangeably in literature and in this paper.
² The words “decision context”, “prior dialogue”, and “prior discourse” are also used interchangeably.

2.2 Status Quo Bias

Status quo bias is present in decision-making when individuals favour maintaining current conditions over change, regardless of rationality. For example, in a restaurant recommendation scenario, users may prefer the chatbot’s default suggestion over exploring other options. Rational choice theory posits that decisions are made to maximise utility; however, status-quo bias contradicts this, as individuals opt for familiarity despite potential gains [18].
In decisions under uncertainty, status-quo bias manifests as a preference for familiar outcomes, even if alternative options promise superior outcomes [30]. Similarly, in decisions under certainty, individuals exhibit a reluctance to deviate from existing circumstances, even when objectively advantageous alternatives exist. Thus, status-quo bias impedes rational decision-making by anchoring individuals to current states, hindering exploration of potentially beneficial alternatives.

Over the last few decades, various experiments have shown strong evidence for status-quo bias. One of the foundational experiments for status-quo bias was conducted by Samuelson and Zeckhauser [25]. They performed controlled experiments using a questionnaire containing a series of eight choice problems, which they called decision scenarios. The participants in the experiments were students enrolled in economics classes at Boston University School of Management and at the Kennedy School of Government at Harvard University. A total of 486 students took part in the study. The results of these experiments showed pronounced status-quo bias. In a recent experiment, Xiao et al. [33] replicated four decision scenarios from the Samuelson and Zeckhauser [25] study. They found strong empirical support for the status quo bias in three decision scenarios: budget allocation, investment portfolios, and college jobs.

2.3 Cognitive Load

Cognitive load refers to the amount of mental effort required to process information [27]. Mental effort represents the portion of cognitive capacity that is actively utilised to meet the demands of a task. Consequently, it can be seen as an indicator of the actual cognitive load experienced [20]. Cognitive load is divided into three categories: intrinsic, extraneous, and germane [20]. Intrinsic cognitive load arises from the inherent complexity of the information.
This complexity is determined by the number of element interactions that must be understood simultaneously. Element interactions involve how different pieces of information relate to each other within a task [28]. Cognitive load in a piece of information can be controlled by the number of elements and the interactions between the elements. When element interactions are numerous and complex, intrinsic cognitive load increases. For instance, understanding a complex mathematical equation involves multiple steps and relationships, leading to high cognitive load. Conversely, simple tasks with few element interactions generate low intrinsic cognitive load.

Cognitive load is inversely related to task performance. Brachten et al. [2] demonstrated that chatbots can reduce the cognitive load needed to complete various tasks. One effective strategy to minimise cognitive load is to avoid presenting long responses or requiring users to provide complex inputs. Similarly, Schmidhuber et al. [26] explored how the use of a chatbot affects users’ mental effort when interacting with a new software product, and to what extent it affects users’ productivity. The results showed that chatbot users experienced less cognitive load. Fadhil [12] suggested that the interactions should be concise and direct, utilising short texts or images for responses and restricted input options, such as buttons, where feasible. This approach simplifies user interactions, making it easier to process information and complete tasks efficiently. By implementing these methods, chatbots can enhance user experience and improve task performance by reducing the cognitive demands placed on users. Therefore, the cognitive load induced by the dialogue is dependent upon the dialogue design.
Increasing the number of elements and the interactions between them increases mental effort, which in turn leads to higher cognitive load.

In experimental settings, cognitive load is measured in various ways. Common approaches include physiological, behavioural, performance, self-report, and subjective measures. One common way to measure cognitive load is to record individuals’ responses to the NASA-TLX questionnaire at the end of the task. NASA’s Task Load Index (TLX) scales, initially developed for aerospace applications, are a prominent subjective measure within the Cognitive Load Theory (CLT) framework. Despite its origins, the NASA-TLX remains one of the most widely recognised and established cognitive load measures.

3 Related Work

The study of cognitive biases in human decision-making has been a central focus in psychology, behavioural economics, and human-computer interaction. With the increasing prevalence of conversational agents, researchers have explored their potential in both leveraging and countering cognitive biases [3]. This section reviews previous work that has examined how conversational agents interact with individuals’ cognitive biases for the purposes of behavioural influence, nudging, or decision-making interventions.

An early contribution by Ali et al. [1] examined the susceptibility of children to nudging strategies employed by conversational agents and robots. Their study incorporated a modified version of the Dictator Game to investigate whether children’s generosity could be influenced by different interlocutors. The findings demonstrated that children were more likely to be influenced by conversational agents and robots than by human interlocutors, suggesting the presence of authority bias and social influence bias. These results provided early empirical evidence that chatbots and social robots could effectively exploit cognitive biases in decision-making processes, particularly among vulnerable populations.

Pilli et al.
[23] investigated how conversational agents can serve as tools to measure cognitive biases, specifically focusing on framing in language and loss aversion. The study employed chatbots to engage participants in structured decision-making tasks and found that both framing effects and loss aversion (the tendency to prefer avoiding losses over acquiring equivalent gains) were observed in user responses, demonstrating that conversational agents could play a crucial role in understanding individual susceptibility to biases.

Dubiel et al. [9] explored the impact of voice fidelity in synthetic speech on decision-making, focusing on how auditory cues could exploit cognitive biases and heuristics. Their study investigated whether variations in voice characteristics, such as speech pace and pitch, could alter participants’ choices. The results confirmed that higher-fidelity voices could influence decisions, highlighting the role of source credibility bias and the affect heuristic, a mental shortcut in which people rely on their emotions to make quick decisions rather than engaging in detailed analysis. Participants were more likely to perceive and trust high-fidelity synthetic voices, raising concerns about how conversational agents could be designed to subtly nudge user decisions. This research underscored the importance of voice modulation as a persuasive tool in chatbot interactions, expanding the understanding of how non-verbal elements contribute to cognitive biases.

Building upon the work of Ali et al. [1], Kalashnikova et al. [17] examined linguistic nudges and their effectiveness in promoting ecological behaviour change. Their study categorised nudges into those based on reflection and those based on emotions, leveraging biases such as the status quo bias, regret aversion, and social conformity.
The research found that chatbots and robots were more effective than human interlocutors in influencing participants’ opinions on environmental issues. These findings reinforced previous studies by demonstrating that cognitive biases could be systematically leveraged in chatbot design to encourage behavioural shifts in various domains.

Most recently, Yamamoto [34] introduced the concept of “suggestive ending” in chatbot responses. Their study built on the Ovsiankina effect, a cognitive bias where individuals feel compelled to complete unfinished tasks. By designing chatbots that leave responses open-ended or with subtle prompts, users were encouraged to engage in deeper decision-making and information-seeking behaviour. Their study found that users interacting with suggestive chatbots asked more follow-up questions and engaged in prolonged decision-making processes, indicating increased cognitive involvement. This work demonstrated that chatbots could not only influence decision outcomes but also shape the depth of user engagement with a given topic, further expanding the role of conversational agents in cognitive bias manipulation.

Ji et al. [15], in their latest study, explored cognitive biases in spoken conversational search (SCS), emphasising the absence of visual cues and the complexity of unstructured dialogue. Their study investigates biases such as confirmation bias, anchoring bias, and the exposure effect in SCS interactions. While their work highlights the challenges of detecting cognitive biases in voice-based searches and proposes a multimodal experimental framework, it remains largely theoretical. Nonetheless, it provides a foundation for future research into mitigating biases in conversational AI systems.

These studies have explored various aspects of conversational agents, such as linguistic features, the effect of voice modulation, and the length of dialogue, and their role in influencing decision-making.
However, because these studies focus on individual decision points or specific nudging strategies, they do not account for how previous interactions, or conversational context, influence subsequent decision-making. Conversational agents engage users in multi-turn interactions, where the complexity of the dialogue itself imposes cognitive load that influences later decisions. In our work, we investigate how the cognitive load induced by a conversational task affects subsequent decision-making processes. By examining this dimension, we aim to contribute to a deeper understanding of how conversational agents can influence cognitive biases and nudging strategies.

4 Hypothesis Development

The objective of this study is to examine how the cognitive load induced by the task complexity of a task-oriented chatbot’s dialogue influences individuals’ susceptibility to cognitive biases in subsequent decision-making scenarios.

First, we investigate whether decision scenarios presented through a chatbot can replicate the findings of previous studies on status quo bias. Establishing this baseline is crucial to validating the chatbot’s effectiveness in eliciting similar behavioural patterns observed in traditional experiments. To achieve this, we design a simple task as the decision context and subsequently introduce decision scenarios that are adapted from classic status-quo bias studies. As an initial step, we hypothesise the following, which is adapted from the original hypothesis proposed in Samuelson and Zeckhauser (1988) and its subsequent replications:

H1: Participants presented with an alternative framed as the status quo by a chatbot will be more likely to choose that alternative than if it were framed as the non-status-quo alternative.
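A hypothesis of this form can be checked by comparing how often an alternative is chosen when it is framed as the status quo with how often it is chosen when the other alternative is. The sketch below uses a Pearson chi-square test on a 2x2 table, which is one conventional choice; the text up to this point does not say which test the authors used, and the function name and all counts are hypothetical.

```python
import math


def status_quo_effect(chosen_when_sq: int, n_sq: int,
                      chosen_when_non_sq: int, n_non_sq: int):
    """Pearson chi-square test (1 df, no continuity correction) on a 2x2 table.

    Rows: the alternative was framed as status quo / as non-status quo.
    Columns: the alternative was chosen / not chosen.
    Returns (chi2, p_value, difference_in_choice_proportions).
    """
    a, b = chosen_when_sq, n_sq - chosen_when_sq
    c, d = chosen_when_non_sq, n_non_sq - chosen_when_non_sq
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For one degree of freedom, the chi-square survival function equals
    # erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    diff = a / n_sq - c / n_non_sq
    return chi2, p_value, diff


# Hypothetical counts, NOT taken from the paper: the alternative is chosen by
# 30 of 42 participants when framed as status quo and by 18 of 42 otherwise.
chi2, p_value, diff = status_quo_effect(30, 42, 18, 42)
print(f"chi2={chi2:.2f}, p={p_value:.4f}, difference={diff:.3f}")
```

A positive difference with a small p-value would be consistent with H1 for that scenario; a negative or negligible difference would not.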
Next, we examine how increasing task complexity influences individuals’ susceptibility to status quo bias in subsequent decision-making scenarios. Specifically, we replace the simple task used in the baseline experiment with a more complex task, increasing cognitive load before the decision scenario. By doing so, we aim to explore whether heightened cognitive effort makes individuals more reliant on heuristic-driven decision-making, thereby amplifying the status quo effect. This leads to our second hypothesis:

H2: Participants who undergo a complex task before the decision scenario will exhibit a stronger status quo effect compared to those who undergo a simple task.

By formulating these hypotheses, we aim to systematically investigate the relationship between task complexity in chatbot interactions and status quo bias in decision-making. Establishing a baseline through H1 allows us to confirm whether chatbot-based decision scenarios align with prior studies on status quo bias. Expanding on this, H2 enables us to explore how increased cognitive load influences individuals’ susceptibility to this bias. The following section details the methodology used to test these hypotheses, including the decision scenarios, task manipulations, experimental design, and data collection procedures.

5 Method

This section outlines the methodology used to investigate how task complexity in chatbot interactions influences status quo bias in subsequent decision-making. Our approach is structured into multiple experimental conditions designed to test our hypotheses systematically. We begin by detailing the decision scenarios, which serve as the core choice problems in our chatbot-based interactions, and discuss how we integrate these scenarios into our chatbot system while ensuring consistency with prior research. Next, we describe the decision context tasks, which introduce either simple or complex preference elicitation tasks before the decision scenario.
The tasks are described in detail, along with the survey used to measure relevant aspects of the tasks such as memory recall and cognitive load. We then discuss our experimental design, including an a-priori power analysis to determine an adequate sample size, data collection procedures, and quality control measures such as attention checks, seriousness screening, and Open Science Framework registration. Our study is conducted as a between-subjects experiment with multiple conditions across three decision scenarios and two decision context conditions.

5.1 Decision Scenarios

The original study [25] experimented with eight scenarios and five conditions. The replication study tested two scenarios in phase I and four in phase II. These scenarios were tested in different choice-set configurations. For instance, the budget allocation scenario from the original study was explored with two alternatives (pairs), three alternatives (triples), and four alternatives (quads) in the choice set. Our experiments adapted three scenarios: budget allocation, investment portfolios, and college jobs. The rationale behind this decision is that these three scenarios have been shown to produce a pronounced status-quo effect in both the original and the replication study, at least in the quads configuration. The chatbot is designed to open with introductory utterances such as greetings, followed by the decision context and then by a decision scenario. We took utmost care while integrating these decision scenarios into the conversational agent. The decision scenarios and the list of alternatives are provided in the supplementary material.

As outlined in the original study, the neutral condition presents two alternatives without designating either as the status-quo, while in the status-quo conditions, one of the alternatives is explicitly framed as the status-quo.
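The factors described above (three decision scenarios, two decision context tasks, and a neutral framing plus one status-quo framing per alternative) can be laid out as a grid of between-subjects cells. The sketch below assumes the factors are fully crossed and that assignment is balanced across cells; the text does not state either point explicitly, and the labels and the `assign_participants` helper are hypothetical.

```python
import itertools
import random

# Labels are illustrative; only the factor levels themselves come from the text.
SCENARIOS = ("budget_allocation", "investment_portfolios", "college_jobs")
CONTEXT_TASKS = ("simple", "complex")
FRAMINGS = ("neutral", "status_quo_alternative_1", "status_quo_alternative_2")

# Assumption: every scenario is crossed with every task and every framing.
CELLS = list(itertools.product(SCENARIOS, CONTEXT_TASKS, FRAMINGS))


def assign_participants(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into the cells.

    Each participant lands in exactly one cell (between-subjects), and cell
    sizes differ by at most one.
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    return {pid: CELLS[i % len(CELLS)] for i, pid in enumerate(ids)}


assignment = assign_participants(range(36), seed=1)
```

Shuffling before dealing keeps the assignment random while guaranteeing balance, which simple independent draws per participant would not.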
Following both the original and replication studies, we adopt the same three-condition design: a neutral condition, where no option is framed as the status quo, and two status quo conditions, where each alternative is separately framed as the status-quo in different trials. The following is one of the decision scenarios and its conditions.

College Jobs Scenario Neutral Condition:
Having just completed your graduate degree, you have two offers of teaching jobs in hand. When evaluating teaching job offers, people typically consider the salary, the reputation of the school, the location of the school, and the likelihood of getting tenure (tenure is permanent job contract that can only be terminated for cause or under extraordinary circumstances). Your choices are:
• College A: east coast, very prestigious school, high salary, fair chance of tenure.
• College B: west coast, low prestige school, high salary, good chance of tenure.

College Jobs Scenario Status-quo Condition (College A):
You are currently an assistant professor at College A in the east coast. Recently, you have been approached by colleague at other university with job opportunity. When evaluating teaching job offers, people typically consider the salary, the reputation of the school, the location of the school, and the likelihood of getting tenure (tenure is permanent job contract that can only be terminated for cause or under extraordinary circumstances). Your choices are:
• Remain at College A: east coast, very prestigious school, high salary, fair chance of tenure.
• Move to College B: west coast, low prestige school, high salary, good chance of tenure.

As previously mentioned, we experimented with three different decision scenarios. The alternatives of each scenario are as follows. Similarly, budget allocation decision scenarios involve allocating the National Highway Safety Commission’s budget between automobile safety and highway safety programs, with different conditions.
The alternatives presented vary between a neutral condition, where both allocation options are presented equally, and status-quo conditions, where the current allocation is either 60% auto safety / 40% highway safety (60A40H) or 50% auto safety / 50% highway safety (50A50H) and respondents choose whether to maintain the existing budget or shift funds between programs. In the investment decision-making scenario, choices are presented between a moderate-risk investment (Mod. Risk / Company A) and a high-risk investment (High Risk / Company B). Under neutral conditions, both options are presented without any prior commitments, while in status-quo conditions, the investor inherits a portfolio already allocated to either moderate-risk or high-risk investments, potentially influencing their preference to maintain the existing investment or switch. The details of the choice problems are available in the supplementary material.

5.1.1 Manipulations. We made a few adjustments to the decision scenarios to address gaps in the original study and ensure consistency across different configurations. Below, we outline the key modifications made to the phrasing, structure, and presentation of the scenarios.

The original study did not provide phrasing for the decision scenarios in the pairs configuration; we therefore derived it from the phrasing used for triples and quads. In the neutral condition of the college jobs scenario, as shown above, the modification reduces the number of teaching job offers from four to two. Similarly, in the status-quo condition of the college jobs scenario, the modification changes “colleagues at other universities with job opportunities.” to “colleague at other university with job opportunity”.
In all the decision scenarios, we moved from an ordered-list presentation of the alternatives to bullet points. The replication study by Xiao et al. [33] included a method to assess whether participants understood the decision scenario. Participants were first shown the scenario and asked various related questions before being presented with the decision alternatives. In our study, we adopted a different approach to maintain experimental validity: we asked the comprehension questions later in the survey rather than during the interaction with the conversational agent. We assessed the accuracy of each individual’s recall, referring to it as their Decision Scenario Recall Task Performance.

5.2 Prior Discourse

The prior discourse incorporates two different tasks. The first task (labelled Simple Task) is a preference elicitation task in which five attributes are elicited from the user through closed yes-or-no questions. The design for this condition follows a conservative dialogue strategy and is not cognitively demanding. The second task (labelled Complex Task) is a preference elicitation task in which arithmetic comparison between the attributes and memorisation of the outcomes require additional mental effort. Since this task is designed to be cognitively demanding, we hypothesise that it results in increased cognitive load. All conditions adopt a preference elicitation task in six domains of the Schema-Guided Dialogue (SGD) dataset [24]: properties (real estate), music, movies, calendar, banks, and messaging apps. These domains were carefully selected to have no influence on the subsequent decision scenario. This design decision ensures that choices made in the decision scenarios are influenced only by the effect of cognitive load due to the decision context, and not by the domain of the task. The design of the Simple preference elicitation task is straightforward.
The related attributes of each domain and the closed questions for each attribute are available in the supplementary material. The complex task, in contrast, has a more involved structure and is best explained with an example. In the complex task, users are presented with a scenario where they select from various property recommendations. Similar structures are used in other domains, including artist recommendations, streaming services, calendar apps, and banking options. Each property is characterized by three attributes, such as number of bedrooms, square footage, and star rating. For the first property, real values are explicitly provided (e.g., three bedrooms, 2000 sq. ft., and a 4-star rating). However, in subsequent interactions, users must perform arithmetic calculations to determine the actual values of the attributes for other properties. This follows the standard approach to increasing cognitive load, which combines arithmetic reasoning and memory recall tasks [8]. By progressively increasing the number of required calculations and memory dependencies, the cognitive effort demanded by the task also increases, making decision-making more cognitively taxing.

U1. In the following scenario choose from various property recommendations.
U2. The first property has three bedrooms, 2000 square feet, and a 4-star rating. The second property has twice the number of bedrooms and with the same size and rating. Which one do you prefer, and why?
U3. The third property has the same number of bedrooms as the second one but is half the size of the first one, with the same rating as the first. Which one do you prefer, and why?
U4. The fourth property has the same number of bedrooms as the second, the same size as the third, but one less star rating than the first. Which one do you prefer, and why?
U5. Remember the details of the fourth property. Specific information related to the property will be requested later.

In the above example, while responding to the third dialogue utterance (U3), the user must expend mental effort to calculate the real value of the bedroom attribute, because this requires comparing the second property with the first and deducing a real value for the third property. By definition, interactions between elements are what demand additional mental effort, resulting in cognitive load. Here, the elements are the attributes, and the interactions between these elements are the arithmetic performed by the individual. Refer to the supplementary material for the other domains of the preference elicitation task. The number of recursive calculations increases with each utterance, so the cognitive load is expected to increase after each utterance and the mental effort required grows as the dialogue progresses. Finally, the user is asked to memorise the deduced attribute information related to the property. In theory, the task design should substantially increase the cognitive load of the individual. To test the cognitive load, the standard NASA-TLX [22] survey was administered for all the interactions.

5.3 A-priori Power Analysis

We conducted a power analysis using the software programme G*Power [13] to determine the required sample size. Our objective was to achieve a power (1 − β) of 0.80 to detect a medium effect size (Cohen's w) of 0.3 with a standard alpha error probability α of 0.05 and 1 degree of freedom [21]. (Our study investigates the difference between effect sizes under various conditions rather than establishing a statistically significant effect for each condition; therefore, anywhere between a medium and a large effect size is desirable.) Based on the analysis, the target sample size for each condition was calculated to be 42 participants.
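As a rough cross-check of the kind of calculation described above, the power of a chi-square test with 1 degree of freedom can be computed in closed form, because a noncentral chi-square variable with 1 degree of freedom is the square of a shifted standard normal variable. The sketch below is not the authors' G*Power procedure: the function names are hypothetical, and it works with the total sample size of a single test without assuming how that total is divided among conditions.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def chi2_power_df1(n_total: int, w: float, alpha: float = 0.05) -> float:
    """Power of a 1-df chi-square test for Cohen's w at total sample size n_total."""
    # Noncentrality parameter is n * w^2; with 1 df the test statistic
    # behaves like (Z + delta)^2 where delta = sqrt(n) * w.
    delta = sqrt(n_total) * w
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability that |Z + delta| exceeds the two-sided critical value.
    upper_tail = 1 - _STD_NORMAL.cdf(z_crit - delta)
    lower_tail = _STD_NORMAL.cdf(-z_crit - delta)
    return upper_tail + lower_tail


def min_total_n(w: float, target_power: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest total N whose power reaches target_power (simple linear search)."""
    n = 2
    while chi2_power_df1(n, w, alpha) < target_power:
        n += 1
    return n


if __name__ == "__main__":
    n_required = min_total_n(w=0.3)
    print("total N:", n_required)
    print("power at that N:", round(chi2_power_df1(n_required, 0.3), 3))
```

The search returns a total N for one test; how that total maps onto per-condition targets depends on which cells each test compares, which this sketch does not model.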
Given that our study includes two decision context conditions (simple task and complex task), three decision scenarios (budget allocation, investment decision making, college jobs), and three decision scenario conditions (one neutral and two status-quo conditions), we require a sample of 756 participants (2 × 3 × 3 × 42) for a between-subjects experimental design.

5.4 Data Collection, Quality, and Integrity

To ensure high-quality data collection, we implemented rigorous compensation, integrity, and quality control mechanisms throughout the study.

5.4.1 Compensation and Participant Recruitment. Participants were recruited through Prolific, a widely used platform known for ensuring data quality and participant reliability. The estimated completion time for the survey was eight minutes, and participants were compensated according to Prolific’s recommended minimum rate of $8 per hour, receiving more than $0.80 after successful completion of the survey. A total of 756 participants were recruited, and measures were put in place to prevent duplicate participation. Additionally, demographic information was obtained from Prolific’s database, allowing for diversity verification and eligibility confirmation.

5.4.2 Data Quality. To maintain data quality, participants were asked whether they had previously encountered the decision scenarios, in order to assess familiarity and ensure that prior exposure did not bias their responses. Similarly, domain familiarity was elicited to verify participants’ understanding of the subject matter and to examine its relation to the outcomes. To further ensure data reliability, participants were asked to rate their seriousness at the end of the survey. If a participant indicated a lack of engagement or provided responses suggesting inattentiveness, their data was flagged for review.
In cases where engagement was deemed insufficient, a data re-collection process was initiated to replace unreliable responses with high-quality data. To verify attentiveness, a memory recall task was included, where participants were asked to recall specific details from their interaction with the chatbot. This helped confirm whether participants had actively engaged with the task or merely responded at random. To ensure that participants experienced the cognitive load naturally, they were explicitly asked not to use external aids, such as pen and paper, during cognitive tasks. This measure was necessary to maintain the intended cognitive load effects and prevent external assistance from influencing the results.

5.4.3 Data Integrity. An automated data integrity mechanism was also incorporated into the system. Upon submission, participant responses, captured as a JSON file, were automatically emailed to both the study authors and a publicly accessible email address. This system was designed to enhance transparency and accountability, ensuring that data was securely logged and accessible for verification. The dataset will be made available for read-only public access, with access codes provided upon request.

5.4.4 Data Cleaning and Rejection Criteria. Participants were not automatically rejected based on a single failed attention check. Instead, a dual-layer filtering approach was applied to distinguish between minor lapses and systematic inattention. If a participant failed a Decision Context Attention Check (DCAC) but provided otherwise high-quality responses, their data was retained. However, participants who failed both attention and quality checks were removed from the dataset. Rejected responses were replaced through re-collection procedures to maintain the required sample size while ensuring the integrity of the dataset.
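The dual-layer rule described above can be sketched as a small predicate. This is only an illustration of the stated logic: the field names `passed_dcac` and `passed_quality` are hypothetical, and how the quality check itself is scored is not specified here and is left as an assumption.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Response:
    participant_id: str
    passed_dcac: bool      # Decision Context Attention Check (hypothetical field)
    passed_quality: bool   # summary of the other quality checks (hypothetical field)


def should_reject(response: Response) -> bool:
    """Reject only when both the attention check and the quality check fail."""
    return (not response.passed_dcac) and (not response.passed_quality)


def split_responses(responses: List[Response]) -> Tuple[List[Response], List[Response]]:
    """Return (kept, rejected); rejected slots would need re-collection."""
    kept = [r for r in responses if not should_reject(r)]
    rejected = [r for r in responses if should_reject(r)]
    return kept, rejected


if __name__ == "__main__":
    sample = [
        Response("p1", passed_dcac=True, passed_quality=True),
        Response("p2", passed_dcac=False, passed_quality=True),   # minor lapse: kept
        Response("p3", passed_dcac=False, passed_quality=False),  # systematic: rejected
    ]
    kept, rejected = split_responses(sample)
    print([r.participant_id for r in kept], [r.participant_id for r in rejected])
```

A single failed DCAC therefore never removes a participant on its own; removal requires both layers to fail.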
By implementing these measures, we ensured that the final dataset was robust, reliable, and free from low-quality responses, enhancing the validity of our findings.

5.4.5 OSF Pre-Registration. The hypotheses and the experimental design were preregistered in the Open Science Framework (OSF) to ensure transparency and reproducibility. This preregistration includes detailed documentation of our study’s objectives, methodology, decision scenarios, experimental conditions, and analysis plans. By publicly sharing our hypotheses in advance, we mitigate the risks of confirmation bias and p-hacking, ensuring that our research follows a predefined analytical framework. The preregistration also includes the a-priori power analysis, justifications for sample size selection, and all materials used in the study. The complete pre-registration can be accessed at https://osf.io/psxvf, where all study materials, analysis scripts, and methodological disclosures are available for verification and replication.

5.4.6 IRB. The ethics of conducting research involving human participants is of utmost importance. This study was approved by the Institutional Review Board (IRB) and adheres to the ethical guidelines established by the declaration. Human Research Ethics Committee – Sciences (HREC-LS): LS-C-25-001-Pilli-Nallur.

5.5 Procedure

The entire experiment was developed as an interactive Streamlit application [14] and is hosted on Google Cloud Platform (GCP). The conversational user interface (CUI) is built using the Streamlit chat interface. On the back end, the OpenAI API powers the chatbot. Carefully crafted prompts are designed to simulate the various experimental conditions. Additionally, the source code and condition-specific prompts are provided in the supplementary materials.

Fig. 1. Chatbot interaction conditions, survey, and Cognitive load survey.
Participants began the interaction by reviewing an information sheet detailing the study’s purpose, procedures, and ethical considerations. Upon completion, participants could access the experiment’s homepage, which displayed their Prolific ID and the consent form. The checkbox for consent was initially unchecked, requiring participants to actively select "Yes" to indicate their agreement before proceeding. Only after providing informed consent were participants allowed to continue to the main experiment.

The chatbot initiated the conversation with a greeting, followed by a decision context task assigned according to the participant’s experimental condition. Participants in the simple task condition engaged in a straightforward preference elicitation task, while those in the complex task condition underwent a more cognitively demanding task involving arithmetic comparisons and memory recall, as described in Section 5.2. Subsequently, in the same chat interaction, participants were given a randomly assigned decision scenario (discussed in Section 5.1), where they were required to choose between a set of alternatives.

In the simple task condition, participants engaged with both the simple task and the complex task; however, the complex task was introduced after the decision scenario. Towards the end of the survey, participants were first presented with the transcript of their interaction from the simple task, followed by the NASA-TLX survey to assess their cognitive load. Subsequently, they were shown the transcript of the complex task interaction, after which they completed another NASA-TLX survey. This sequential approach ensures that cognitive load is measured separately for each task, allowing for a more precise evaluation of the effect of task complexity on decision-making. Conversely, in the complex task condition, participants completed both the simple and complex tasks before being presented with the decision scenario, as shown in Figure 1.
This was a deliberate design choice, so that participants completed the complex task NASA-TLX survey in relation to the simple task. This approach ensures greater accuracy compared to administering the surveys independently, minimizing recall bias and providing a more precise assessment of cognitive load. After completing their chatbot interaction, participants were redirected to a survey page. This survey included a memory recall task to assess attentiveness, along with attention check questions to ensure data integrity.

6 Results

This section presents the findings of the study, focusing on the impact of task complexity on cognitive load, followed by the influence of task complexity on status-quo bias under the simple and complex tasks. We begin with descriptive statistics, providing an overview of participant demographics and data quality assessments to ensure the validity of our dataset. Next, we analyse task complexity and cognitive load, examining how different levels of cognitive effort influenced participants’ responses. Finally, we report the statistical significance and effect sizes of status-quo bias under simple and complex tasks.

6.1 Descriptive Statistics

6.1.1 Demographics. The dataset comprises 756 individuals (mean age = 41.81 years, range 18–81) who took an average of 10.20 minutes (SD = 4.82) to complete the study. The sample is slightly skewed toward female participants (53% female, 47% male) and is predominantly White (82.1%), with most participants residing in either the United Kingdom (71.2%) or the United States (27.3%). The majority reported English as their primary language (99.7%). Regarding student status, 76.1% were non-students, while 11.8% were students, with some missing data.
In terms of employment, 44.8% were in full-time roles, 17.8% in part-time work, and 6.4% were unemployed and seeking jobs, while others fell into various categories, including homemakers, retirees, or those due to start a new job soon.

6.1.2 Data Quality. To ensure experimental validity, several measures were implemented. These checks aimed to verify participant engagement, familiarity with the decision context, attentiveness, and authenticity of responses, minimising potential biases that could affect the results. Familiarity with the decision context domain was generally high, with a mean score significantly above the midpoint (t = 17.09, p < 0.001), indicating that participants were well acquainted with the subject matter. The majority of participants (698 out of 756) reported that they had not seen the decision scenarios before, while others were unsure, and the distribution was statistically significant (p < 0.01). Additionally, none of the participants reported using external tools or expressed a need for them, confirming that decision-making relied on internal cognitive resources. Seriousness scores were high, with a mean score significantly above the midpoint (t = 42.79, p < 0.001), suggesting that participants engaged seriously with the survey tasks. These findings, along with the Decision Context Attention Checks (DCAC), ensured that participants were attentive, engaged, and provided reliable data for hypothesis testing.

To identify and handle outliers in response times, we used the Interquartile Range (IQR) method. This involved calculating the first quartile (Q1) and third quartile (Q3) and determining the IQR as the difference between them. Response times falling below the lower bound (Q1 minus 1.5 times the IQR) or above the upper bound (Q3 plus 1.5 times the IQR) were flagged as potential outliers. Using this method, we initially identified 53 responses for exclusion.
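The IQR rule described above can be sketched in a few lines. This is a minimal illustration on made-up response times, not the study's analysis script; the helper names are hypothetical, and the quartile interpolation method (`method="inclusive"`) is an assumption, since the text does not say which quartile definition was used.

```python
from statistics import quantiles
from typing import List, Sequence, Tuple


def iqr_bounds(values: Sequence[float], k: float = 1.5) -> Tuple[float, float]:
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_outliers(values: Sequence[float], k: float = 1.5) -> List[float]:
    """Return the values that fall outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [v for v in values if v < lower or v > upper]


if __name__ == "__main__":
    # Synthetic response times in seconds (illustrative only).
    times = [50, 52, 55, 58, 60, 61, 63, 65, 70, 900]
    print("fences:", iqr_bounds(times))
    print("flagged:", flag_outliers(times))
```

Flagged values are only candidates for exclusion; as the following text explains, they still need to be checked against independently recorded timings before a decision is made.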
However, some extreme outliers arose from technical issues, with unusually large response times (e.g., 1,329,910.313 s and 1,331,160.924 s; n = 22). To ensure these were not mistakenly excluded, we cross-checked them against the start and completion times recorded by Prolific and retained those that aligned with expected survey durations. Additionally, some responses had implausible negative or zero values, indicating serious technical errors in the system (n = 5). These cases were excluded as well. Importantly, including or excluding these erroneous data points did not significantly impact the overall findings. To maintain a conservative approach, we excluded these unreliable responses (n = 31) from the final analysis.

Simple Task vs Complex Task
NASA-TLX Dimension        SC1         SC3         SC4
Mental Demand      A      -2.144***   -2.232***   -1.820***
                   B      -2.255***   -2.429***   -2.570***
Effort             A      -1.235***   -1.290***   -0.978**
                   B      -0.962**    -1.512***   -1.514***
Performance        A      -0.546*     -0.689**    -0.675**
                   B      -0.389      -1.124***   -0.677**
Frustration        A      -0.956**    -0.692**    -0.400
                   B      -0.750**    -0.919**    -1.174***
Temporal Demand    A      -0.593*     -0.455      -0.439
                   B      -0.817**    -0.824**    -0.489
Physical Demand    A      -0.610*     -0.324      -0.233
                   B      -0.561      -0.429      -0.302

Table 1. NASA-TLX Test Results with Effect Sizes and p-value Significance. Significance levels: *** p < 0.001, ** p < 0.01, * p < 0.05, + p < 0.10

6.2 Task Complexity and Cognitive Load

To investigate the impact of task complexity on cognitive load, we analysed participants’ responses to the NASA-TLX survey after they had engaged with both simple and complex tasks during the chatbot interactions. For more details, please refer to Section 5.5. The results show that Mental Demand had the largest differences between simple and complex tasks across all scenarios (budget allocation, investment decision making, college jobs).
The effect sizes, as tabulated in Table 1, were strong (ranging from -0.57 to -1.161), and most results were statistically significant (p < 0.001, p < 0.01, or p < 0.05). This suggests that participants experienced a much higher mental workload when performing complex tasks. Effort also showed significant differences in multiple scenarios, especially in budget allocation and investment decision making, with effect sizes up to -0.864 and p-values below 0.001, as shown in Figure 2. This indicates that participants had to put in more effort for complex tasks. For Performance, Frustration, and Temporal Demand, some significant effects were found, but the effect sizes were smaller than for Mental Demand and Effort. Physical Demand showed the fewest significant differences, suggesting that physical effort was not a major factor in these tasks. Overall, Mental Demand, as shown in Figure 2, was the dimension most affected by task complexity, making it the most important factor.

6.3 Task Complexity and Status Quo Bias

Our study investigated the influence of task complexity on bias detection across three decision-making contexts: budget allocation (BA), investment portfolio (IDM), and college jobs (CJ). We compared our findings to the original study (Samuelson [25]).

6.3.1 Budget Allocation Scenario. In the Samuelson study, when 60A40H was the status-quo alternative, 61% (11/18) of participants chose it. However, when 50A50H was framed as the status quo, 76% (13/17) of participants preferred it. This difference was statistically significant (p = 0.025), with a moderate to large effect size (Cohen's h = 0.78). However, the confidence interval (CI: -0.26 to 1.82) was wide, suggesting some uncertainty about the true effect size.
Fig. 2. Mean Effect Sizes for Each NASA-TLX Dimension (bar chart; x-axis: NASA-TLX Dimension, in the order Mental Demand, Effort, Frustration, Performance, Temporal Demand, Physical Demand; y-axis: Mean Absolute Effect Size, 0.0 to 2.0).

                Choice rates                                   Status quo framing vs. non-status quo framing
Alternatives    SQ             N              NSQ              χ²      p        Odds ratio (95% CI)    Cohen's h (95% CI)
Scenario 1: Budget allocation ratios
60A40H          16/41 (0.39)   12/42 (0.29)   4/42 (0.1)       9.87    0.002*   6.08 [1.82, 20.31]     0.72 [-0.13, 1.58]
50A50H          38/42 (0.9)    30/42 (0.71)   25/41 (0.61)
Scenario 2: Investment portfolios
Mod. Risk       32/42 (0.76)   32/40 (0.8)    33/42 (0.79)     0.07    0.794    0.87 [0.31, 2.43]      -0.06 [-0.78, 0.67]
High Risk       9/42 (0.21)    8/40 (0.2)     10/42 (0.24)
Scenario 3: College jobs
College A       25/41 (0.61)   19/41 (0.46)   21/41 (0.51)     0.79    0.373    1.49 [0.62, 3.58]      0.2 [-0.42, 0.82]
College B       20/41 (0.49)   22/41 (0.54)   16/41 (0.39)

Table 2. Simple Task - Status quo framing vs. non-status quo framing

Similarly, in our simple task condition, 39% (16/41) of participants preferred 60A40H, while 90% (38/42) chose 50A50H when it was the status quo. This difference was statistically significant (p = 0.002), with an effect size of h = 0.72, as shown in Table 2. While the confidence interval (CI: -0.13 to 1.58) remained wide, the effect size closely matched that of Samuelson’s study, confirming that status-quo bias was successfully detected in this condition.

                Choice rates                                   Status quo framing vs. non-status quo framing
Alternatives    SQ             N              NSQ              χ²      p        Odds ratio (95% CI)    Cohen's h (95% CI)
Scenario 1: Budget allocation ratios
60A40H          20/37 (0.54)   9/39 (0.23)    7/41 (0.17)      11.75   0.001*   5.71 [2.02, 16.15]     0.8 [0.07, 1.53]
50A50H          34/41 (0.83)   30/39 (0.77)   17/37 (0.46)
Scenario 2: Investment portfolios
Mod. Risk       33/39 (0.85)   35/41 (0.85)   27/36 (0.75)     1.08    0.298    1.83 [0.58, 5.8]       0.24 [-0.57, 1.06]
High Risk       9/36 (0.25)    6/41 (0.15)    6/39 (0.15)
Scenario 3: College jobs
College A       24/40 (0.6)    20/39 (0.51)   19/40 (0.48)     1.26    0.262    1.66 [0.68, 4.02]      0.25 [-0.38, 0.88]
College B       21/40 (0.52)   19/39 (0.49)   16/40 (0.4)

Table 3. Complex Task - Status quo framing vs. non-status quo framing

In the complex task condition, 54% (20/37) of participants preferred 60A40H, while 83% (34/41) selected 50A50H as the status-quo alternative, as shown in Table 3. This difference was statistically significant (p = 0.001), with the largest effect size (h = 0.80) observed in our study. Notably, the confidence interval (CI: 0.07 to 1.53) did not include zero, providing stronger evidence that higher task complexity increases susceptibility to status-quo bias.

The effect sizes in our study closely aligned with those in the Samuelson study, confirming the presence of status-quo bias, particularly in the simple task condition. However, while the confidence intervals in both the Samuelson and simple task conditions included zero, indicating some uncertainty, the complex task condition demonstrated the strongest effect size with a confidence interval that excluded zero, providing some evidence that higher task complexity influences status-quo bias.

Study                Status Quo First Alternative   Status Quo Second Alternative   p-value   Cohen's h
Samuelson - BA       11/18                          13/17                           0.025     0.78
Simple Task - BA     16/41                          38/42                           0.002     0.72
Complex Task - BA    20/37                          34/41                           0.001     0.8
Samuelson - IDM      27/43                          27/48                           0.069     0.38
Simple Task - IDM    32/42                          9/42                            0.794     -0.06
Complex Task - IDM   33/39                          9/36                            0.298     0.24
Samuelson - CJ       16/20                          30/38                           0         1.26
Simple Task - CJ     25/41                          20/41                           0.373     0.2
Complex Task - CJ    24/40                          21/40                           0.262     0.25

Fig. 3. Comparison of Effect-sizes across various studies (forest plot; x-axis: Cohen's h, -0.5 to 2). Size of the block indicates the size of the sample.
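To make the tabulated quantities concrete, the sketch below recomputes a chi-square statistic, an odds ratio with a 95% interval, and Cohen's h from the budget allocation counts (60A40H chosen when it was the status quo versus when it was not). The formulas are assumptions about how such entries could be obtained, not a confirmed description of the authors' analysis scripts: an uncorrected Pearson chi-square, a Woolf (log-scale) interval for the odds ratio, and the arcsine-difference definition of h. The function name is hypothetical, and the interval for h is deliberately not computed because its method is not stated in the text.

```python
from math import asin, exp, log, sqrt
from statistics import NormalDist


def two_by_two_stats(sq_chosen, sq_total, nsq_chosen, nsq_total, conf=0.95):
    """Compare the choice rate of one alternative under SQ vs. non-SQ framing."""
    a, b = sq_chosen, sq_total - sq_chosen        # SQ framing: chosen / not chosen
    c, d = nsq_chosen, nsq_total - nsq_chosen     # non-SQ framing: chosen / not chosen
    n = a + b + c + d

    # Pearson chi-square for a 2x2 table, without continuity correction (assumed).
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

    # Odds ratio with a Woolf interval on the log scale (assumed method).
    odds_ratio = (a * d) / (b * c)
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + conf / 2)
    or_ci = (exp(log(odds_ratio) - z * se_log_or),
             exp(log(odds_ratio) + z * se_log_or))

    # Cohen's h: difference of arcsine-transformed proportions.
    h = 2 * asin(sqrt(a / (a + b))) - 2 * asin(sqrt(c / (c + d)))

    return {"chi2": chi2, "odds_ratio": odds_ratio, "or_ci": or_ci, "h": h}


if __name__ == "__main__":
    # Counts taken from the 60A40H rows of Table 2 (simple) and Table 3 (complex).
    for label, counts in {"simple": (16, 41, 4, 42), "complex": (20, 37, 7, 41)}.items():
        stats = two_by_two_stats(*counts)
        print(label, {k: (tuple(round(x, 2) for x in v) if isinstance(v, tuple)
                          else round(v, 2)) for k, v in stats.items()})
```

Under these assumptions the recomputed χ², odds ratios, odds-ratio intervals, and h values for the budget allocation rows come out close to the tabulated ones, which suggests, but does not prove, that the tables were produced this way.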
6.3.2 Investment Portfolio Scenario. In the Samuelson IDM condition, 62.8% (27/43) of participants chose the moderate-risk alternative when it was framed as the status quo, while 56.3% (27/48) preferred the high-risk alternative when it was the status-quo alternative. The effect size was small to moderate (h = 0.38), and although the p-value of 0.069 indicated a trend toward significance, the confidence interval (CI: -0.21 to 0.98) suggests some variability in the effect.

In our simple task condition, 76.2% (32/42) of participants chose the moderate-risk alternative when it was the status quo, whereas 21.4% (9/42) selected the high-risk alternative when it was framed as the status-quo alternative. However, the effect size was negligible (h = -0.06), and the p-value of 0.794 indicated no significant difference between conditions. The confidence interval (CI: -0.78 to 0.67) was wide, suggesting a high level of uncertainty in the effect.

In contrast, the complex task condition showed a positive shift toward the baseline effect observed in the Samuelson study. Here, 84.6% (33/39) of participants chose the moderate-risk alternative when it was the status quo, while 25.0% (9/36) selected the high-risk alternative when it was framed as the default. The effect size was small but larger than in the simple task condition (h = 0.24), suggesting a trend toward increased bias under higher task complexity. Although the p-value of 0.298 did not reach statistical significance, the effect size moved closer to Samuelson’s baseline (h = 0.38), as shown in the forest plot in Figure 3. This provides some evidence that increased task complexity may strengthen status-quo bias, supporting our hypothesis to a certain extent.
The effect sizes in our study suggest a directional trend toward status-quo bias in investment decision-making, with the complex task condition moving closer to the baseline effect observed in the Samuelson study. While the confidence intervals in both the simple and complex task conditions remained wide and included zero, indicating uncertainty, the increasing effect size under higher task complexity provides some evidence that cognitive load may enhance status-quo bias in investment choices.

6.3.3 College Jobs Scenario. The Samuelson study demonstrated a large and highly significant status-quo bias effect, with 80.0% (16/20) of participants selecting the East Coast (College A) alternative when it was framed as the status quo, and 78.9% (30/38) choosing the West Coast (College B) alternative when it was in the status-quo position. The effect was highly significant (p < 0.001, h = 1.26), and the confidence interval (CI: 0.31 to 2.21) confirmed a strong and reliable bias effect, reinforcing the tendency to prefer the status-quo option.

In our Simple Task condition, 61.0% (25/41) of participants chose the East Coast alternative when it was presented as the status quo, while 48.8% (20/41) preferred the West Coast alternative when it was framed as the status-quo alternative. The effect size was small (h = 0.2), and the result was not statistically significant (p = 0.373). The confidence interval (CI: -0.42 to 0.82) was broad, indicating variability in the observed effect.

In the Complex Task condition, a negligible but directional trend towards our hypothesis was observed: 60.0% (24/40) of participants chose the East Coast alternative when it was the status quo, while 52.5% (21/40) selected the West Coast alternative when it was framed as the default. The effect size (h = 0.25) was slightly higher than in the Simple Task condition, suggesting a minor increase in bias under higher task complexity.
However, the p-value (p = 0.262) did not reach statistical significance, and the confidence interval (CI: -0.38 to 0.88) remained wide, indicating continued uncertainty about the effect.

The effect sizes in our study suggest a much weaker status-quo bias in the college jobs scenario compared to the Samuelson study, where the effect was large and highly significant. In both the simple and complex task conditions, the effect sizes were small, and the confidence intervals remained wide and included zero, indicating substantial uncertainty. While the complex task condition showed a slight increase in effect size compared to the simple task, the absence of statistical significance suggests that task complexity did not meaningfully amplify status-quo bias in this decision context. One possible explanation for this outcome is the demographic composition of our sample, as the majority of participants were from the UK, while the decision scenario was framed for U.S. geography. Participants may not have had strong preferences for, or familiarity with, the locations, which could have weakened the effect.

6.4 Response Times and Recall Task Performances

We further investigated the average response times and recall task performance for the decision scenarios. Participants in the complex task took significantly longer to respond than those in the simple task, reflecting increased cognitive effort. Despite this, their recall performance remained high and unchanged across conditions, indicating full engagement with the decision scenarios regardless of cognitive load. The average response time for the simple task was 53.58 s, while participants in the complex task took significantly longer, at 76.24 s. Across all decision scenarios, response times in the complex task were consistently higher than in the simple task (t = 5.504, p < 0.001).
These response times also aligned with the cognitive load survey, indicating that participants experienced greater cognitive effort under increased task complexity. Despite the longer response times and higher cognitive load, recall task performance for the decision scenarios remained high in both conditions, with no significant difference between the simple task (M = 0.875, SD = 0.069) and the complex task (M = 0.858, SD = 0.063). This suggests that participants were fully engaged with the decision scenarios regardless of cognitive load, though they required more time to complete decisions under higher complexity.

6.5 Inherent Bias in Decision Scenario Alternatives

In the budget allocation scenario, there was a clear preference for the 50A50H alternative, even when 60A40H was set as the status quo. Specifically, 61% of participants still preferred 50A50H, indicating an inherent bias toward equal budget distribution, as shown in Table 2. However, this preference diminished under the complex task, where the difference between choices narrowed to 46%. The neutral condition revealed the strongest bias, with a 29% vs. 71% split, further highlighting the natural inclination toward 50A50H (see Table 3). Importantly, in the complex task condition, this inherent bias remained relatively stable (23% vs. 77%), suggesting that increasing task complexity did not significantly alter this trend.

In the investment decision-making scenario, participants showed a strong bias toward Moderate Risk investments. When High Risk was set as the status quo, 75% still preferred Moderate Risk, indicating a strong tendency to avoid higher-risk choices. This bias was even more pronounced in the neutral condition, where over 80% of participants chose Moderate Risk in both the simple and complex task conditions. Moreover, increasing task complexity did not significantly alter this preference, suggesting that the bias toward Moderate Risk investments is robust and largely unaffected by task difficulty.
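As a rough illustration of how the quantities discussed in Sections 6.3 and 6.5 relate to raw choice counts, the following sketch takes the college jobs counts reported in Section 6.3.3 and computes two numbers per condition: the status-quo effect size and the pooled share choosing College A regardless of framing. It assumes that h denotes Cohen's h (the difference of arcsine-transformed proportions). The function names, the dictionary layout, and the pooled-share measure are illustrative choices, not taken from the study's analysis code.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h: difference between two arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


def status_quo_summary(a_when_a_sq: int, n_a_sq: int,
                       b_when_b_sq: int, n_b_sq: int) -> tuple[float, float]:
    """Return (effect size h, pooled share choosing A) for one condition.

    a_when_a_sq / n_a_sq: participants choosing A when A is the status quo.
    b_when_b_sq / n_b_sq: participants choosing B when B is the status quo.
    """
    # Participants who still chose A although B was the status quo.
    a_when_b_sq = n_b_sq - b_when_b_sq
    # Status-quo effect: share choosing A under each framing, compared via h.
    h = cohens_h(a_when_a_sq / n_a_sq, a_when_b_sq / n_b_sq)
    # Hypothetical baseline-preference measure: share choosing A overall.
    pooled_a = (a_when_a_sq + a_when_b_sq) / (n_a_sq + n_b_sq)
    return h, pooled_a


# Counts as reported in Section 6.3.3 (College A = East Coast).
COLLEGE_JOBS = {
    "Samuelson study": (16, 20, 30, 38),
    "Simple task": (25, 41, 20, 41),
    "Complex task": (24, 40, 21, 40),
}

for name, counts in COLLEGE_JOBS.items():
    h, pooled_a = status_quo_summary(*counts)
    print(f"{name}: h = {h:.2f}, pooled share choosing A = {pooled_a:.3f}")
```

Under these assumptions, the computed h values round to the reported 1.26, 0.2, and 0.25, and the pooled shares for both of our task conditions stay close to one half.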
In contrast, we did not observe a similar inherent bias in the college jobs scenario. Preferences between College A and College B remained more balanced across conditions, with no strong tendency toward one alternative.

7 Conclusion

This study examined the influence of prior discourse complexity in conversational agents on subsequent decision-making, particularly its effect on status-quo bias. Across all decision scenarios, we observed a common trend: while increasing task complexity consistently shifted the effect size towards our hypothesis, the change was not statistically significant in most cases. However, in the budget allocation scenario, the complex task condition resulted in a reliable detection of status-quo bias, as its confidence interval did not include zero. This suggests that cognitive load can meaningfully reinforce biases under specific conditions. In the investment decision-making scenario, the shift from no effect in the simple task to a small-to-medium effect size in the complex task highlights the need for further exploration. While the effect did not reach statistical significance, the consistent directional movement suggests that heightened cognitive load may play a role in exacerbating status-quo bias.

These findings have implications for digital nudging and behavioural economics. Conversational agents are increasingly used to guide users in various decision-making tasks, and our results suggest that dialogue complexity may shape the way biases manifest. If cognitive load strengthens biases rather than mitigating them, designers of digital nudges must be cautious in structuring interactions to avoid unintended manipulation.

Our findings also highlight the presence of inherent biases in decision scenarios, where certain alternatives naturally attract preference regardless of status-quo framing.
For example, in the budget allocation scenario, participants exhibited a strong preference for equal distribution (50A50H), even when 60A40H was framed as the default. Similarly, in the investment decision-making scenario, moderate-risk options were favoured regardless of framing. These inherent biases can affect experimental outcomes by masking or exaggerating effects, making it crucial that future experiments on these decision scenarios are designed to take baseline preferences into account.

Future research should investigate the threshold at which cognitive load meaningfully impacts bias expression and explore additional cognitive biases such as framing, anchoring, and loss-aversion. More rigorous experimental designs, including larger sample sizes, could provide further insights into the interaction between prior dialogue complexity and decision-making.

Acknowledgments

This work was conducted with the financial support of the Research Ireland Centre for Research Training in Digitally-Enhanced Reality (d-real) under Grant No. 18/CRT/6224. For the purpose of Open Access, the author has applied a CC BY public copyright licence to any Author Accepted Manuscript version arising from this submission.

References

[1] Hugues Ali Mehenni, Sofiya Kobylyanskaya, Ioana Vasilescu, and Laurence Devillers. 2021. Nudges with Conversational Agents and Social Robots: A First Experiment with Children at a Primary School. In Conversational Dialogue Systems for the Next Decade, Luis Fernando D’Haro, Zoraida Callejas, and Satoshi Nakamura (Eds.). Springer, Singapore, 257–270. doi:10.1007/978-981-15-8395-7_19
[2] Florian Brachten, Felix Brünker, Nicholas RJ Frick, Björn Ross, and Stefan Stieglitz. 2020. On the ability of virtual agents to decrease cognitive load: an experimental study. Information Systems and e-Business Management 18, 2 (2020), 187–207.
[3] Ana Caraban, Evangelos Karapanos, Daniel Gonçalves, and Pedro Campos. 2019.
23 Ways to Nudge: A Review of Technology-Mediated Nudging in Human-Computer Interaction. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (CHI ’19). Association for Computing Machinery, New York, NY, USA, 1–15. doi:10.1145/3290605.3300733
[4] Austin Carr. 2013. How Square Register’s UI Guilts You Into Leaving Tips. https://www.fastcompany.com/3022182/how-square-registers-ui-guilts-you-into-leaving-tips
[5] Craig R Carter, Lutz Kaufmann, and Alex Michel. 2007. Behavioral supply management: a taxonomy of judgment and decision-making biases. International Journal of Physical Distribution & Logistics Management 37, 8 (2007), 631–669.
[6] Ana Paula Chaves and Marco Aurelio Gerosa. 2021. How should my chatbot interact? A survey on social characteristics in human–chatbot interaction design. International Journal of Human–Computer Interaction 37, 8 (2021), 729–758.
[7] Miruna Conţ, Aurelia Ciupe, Bogdan Orza, Irina Cohuţ, and Georgiana Niţu. 2022. Career counseling chatbot using microsoft bot frameworks. In 2022 IEEE Global Engineering Education Conference (EDUCON). IEEE, 1387–1392.
[8] Cary Deck and Salar Jahedi. 2015. The effect of cognitive load on economic decision making: A survey and new experiments. European Economic Review 78 (2015), 97–119.
[9] Mateusz Dubiel, Anastasia Sergeeva, and Luis A. Leiva. 2024. Impact of Voice Fidelity on Decision Making: A Potential Dark Pattern?. In Proceedings of the 29th International Conference on Intelligent User Interfaces. ACM, Greenville SC USA, 181–194. doi:10.1145/3640543.3645202
[10] Godson D’Silva, Megh Jani, Vipul Jadhav, Amit Bhoir, and Prithvi Amin. 2020. Career counselling chatbot using cognitive science and artificial intelligence. In Advanced Computing Technologies and Applications: Proceedings of 2nd International Conference on Advanced Computing Technologies and Applications—ICACTA 2020. Springer, 1–9.
[11] Scott Eidelman and Christian S. Crandall. 2012. Bias in Favor of the Status Quo. Social and Personality Psychology Compass 6, 3 (2012), 270–281. doi:10.1111/j.1751-9004.2012.00427.x
[12] Ahmed Fadhil. 2018. Domain specific design patterns: designing for conversational user interfaces. arXiv preprint arXiv:1802.09055 (2018).
[13] Franz Faul, Edgar Erdfelder, Axel Buchner, and Albert-Georg Lang. 2009. Statistical power analyses using G*Power 3.1: Tests for correlation and regression analyses. Behavior research methods 41, 4 (2009), 1149–1160.
[14] Streamlit Inc. 2019. Streamlit: A Faster Way to Build and Share Data Apps. Available at https://streamlit.io/. Accessed: 2025-02-27.
[15] Kaixin Ji, Sachin Pathiyan Cherumanal, Johanne R. Trippas, Danula Hettiachchi, Flora D. Salim, Falk Scholer, and Damiano Spina. 2024. Towards Detecting and Mitigating Cognitive Bias in Spoken Conversational Search. In 26th International Conference on Mobile Human-Computer Interaction. ACM, Melbourne VIC Australia, 1–10. doi:10.1145/3640471.3680245
[16] Daniel Kahneman. 2011. Thinking, fast and slow. Farrar, Straus and Giroux, New York, NY, US.
[17] Natalia Kalashnikova, Ioana Vasilescu, and Laurence Devillers. [n. d.]. Linguistic Nudges and Verbal Interaction with Robots, Smart-Speakers, and Humans. ([n. d.]).
[18] Yusufcan Masatlioglu and Efe A. Ok. 2014. A Canonical Model of Choice with Initial Endowments. The Review of Economic Studies 81, 2 (April 2014), 851–883. doi:10.1093/restud/rdt037
[19] Rifqi Muhammad. 2023. Barriers and effectiveness to counselling careers with Artificial Intelligence: A systematic literature review. Ricerche di Pedagogia e Didattica. Journal of Theories and Research in Education 18, 3 (2023), 143–164.
[20] Fred Paas, Juhani E. Tuovinen, Huib Tabbers, and Pascal W. M. Van Gerven. 2003. Cognitive Load Measurement as a Means to Advance Cognitive Load Theory.
Educational Psychologist 38, 1 (Jan. 2003), 63–71. doi:10.1207/S15326985EP3801_8
[21] Bhavna Pancholi, Mark Dunne, and Richard Armstrong. 2009. Sample size estimation and statistical power analyses. 16 (11 2009).
[22] Vinoth Pandian Sermuga Pandian and Sarah Suleri. 2020. NASA-TLX Web App: An Online Tool to Analyse Subjective Workload. http://arxiv.org/abs/2001.09963 arXiv:2001.09963 [cs].
[23] Stephen Pilli. 2024. Exploring Conversational Agents as an Effective Tool for Measuring Cognitive Biases in Decision-Making. doi:10.48550/arXiv.2401.06686 arXiv:2401.06686 [cs].
[24] Abhinav Rastogi, Xiaoxue Zang, Srinivas Sunkara, Raghav Gupta, and Pranav Khaitan. 2020. Towards Scalable Multi-Domain Conversational Agents: The Schema-Guided Dialogue Dataset. Proceedings of the AAAI Conference on Artificial Intelligence 34, 05 (April 2020), 8689–8696. doi:10.1609/aaai.v34i05.6394
[25] William Samuelson and Richard Zeckhauser. 1988. Status quo bias in decision making. J Risk Uncertainty 1, 1 (March 1988), 7–59. doi:10.1007/BF00055564
[26] Johanna Schmidhuber, Stephan Schlögl, and Christian Ploder. 2021. Cognitive Load and Productivity Implications in Human-Chatbot Interaction. In 2021 IEEE 2nd International Conference on Human-Machine Systems (ICHMS). 1–6. doi:10.1109/ICHMS53169.2021.9582445
[27] John Sweller. 1988. Cognitive load during problem solving: Effects on learning. Cognitive science 12, 2 (1988), 257–285.
[28] John Sweller. 2011. CHAPTER TWO - Cognitive Load Theory. In Psychology of Learning and Motivation, Jose P. Mestre and Brian H. Ross (Eds.). Vol. 55. Academic Press, 37–76. doi:10.1016/B978-0-12-387691-1.00002-8
[29] Richard H. Thaler, Cass R. Sunstein, and John P. Balz. 2010. Choice Architecture. doi:10.2139/ssrn.1583509
[30] Amos Tversky and Daniel Kahneman. 1974. Judgment under Uncertainty: Heuristics and Biases. 185 (1974).
[31] Hans Van Eyghen. 2022.
Cognitive Bias: Phylogenesis or Ontogenesis? Frontiers in Psychology 13 (2022). doi:10.3389/fpsyg.2022.892829
[32] Chien-Sheng Wu, Steven C.H. Hoi, Richard Socher, and Caiming Xiong. 2020. TOD-BERT: Pre-trained Natural Language Understanding for Task-Oriented Dialogue. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, Online, 917–929. doi:10.18653/v1/2020.emnlp-main.66
[33] Qinyu Xiao, Emma Lam, Muhrajan Piara, and Gilad Feldman. 2021. Revisiting status quo bias: Replication of Samuelson and Zeckhauser (1988). Meta-Psychology 5 (Feb. 2021). doi:10.15626/MP.2020.2470
[34] Yusuke Yamamoto. 2024. Suggestive answers strategy in human-chatbot interaction: a route to engaged critical decision making. Frontiers in Psychology 15 (March 2024), 1382234. doi:10.3389/fpsyg.2024.1382234
[35] Xiaoying Zhang, Baolin Peng, Kun Li, Jingyan Zhou, and Helen Meng. 2023. SGP-TOD: Building Task Bots Effortlessly via Schema-Guided LLM Prompting. In Findings of the Association for Computational Linguistics: EMNLP 2023. Association for Computational Linguistics, Singapore, 13348–13369. doi:10.18653/v1/2023.findings-emnlp.891
[36] Jeffrey Zhao, Yuan Cao, Raghav Gupta, Harrison Lee, Abhinav Rastogi, Mingqiu Wang, Hagen Soltau, Izhak Shafran, and Yonghui Wu. 2023. AnyTOD: A Programmable Task-Oriented Dialog System. http://arxiv.org/abs/2212.09939 arXiv:2212.09939 [cs].

Received 20 February 2007; revised 12 March 2009; accepted 5 June 2009

---