Open AGI Codes | Your Codes Reflect!

Updates

Generating audio...

arxiv

Paper 2305.14909

Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning

Authors: Lin Guan, Karthik Valmeekam, Sarath Sreedharan, Subbarao Kambhampati

Published: 2023-05-24

Abstract:

There is a growing interest in applying pre-trained large language models (LLMs) to planning problems. However, methods that use LLMs directly as planners are currently impractical due to several factors, including limited correctness of plans, strong reliance on feedback from interactions with simulators or even the actual environment, and the inefficiency in utilizing human feedback. In this work, we introduce a novel alternative paradigm that constructs an explicit world (domain) model in planning domain definition language (PDDL) and then uses it to plan with sound domain-independent planners. To address the fact that LLMs may not generate a fully functional PDDL model initially, we employ LLMs as an interface between PDDL and sources of corrective feedback, such as PDDL validators and humans. For users who lack a background in PDDL, we show that LLMs can translate PDDL into natural language and effectively encode corrective feedback back to the underlying domain model. Our framework not only enjoys the correctness guarantee offered by the external planners but also reduces human involvement by allowing users to correct domain models at the beginning, rather than inspecting and correcting (through interactive prompting) every generated plan as in previous work. On two IPC domains and a Household domain that is more complicated than commonly used benchmarks such as ALFWorld, we demonstrate that GPT-4 can be leveraged to produce high-quality PDDL models for over 40 actions, and the corrected PDDL models are then used to successfully solve 48 challenging planning tasks. Resources, including the source code, are released at: https://guansuns.github.io/pages/llm-dm.

Paper Content:

Page 1: Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning Lin Guan∗ School of Computing & AI Arizona State University Tempe, AZ 85281 lguan9@asu.eduKarthik Valmeekam∗ School of Computing & AI Arizona State University Tempe, AZ 85281 kvalmeek@asu.edu Sarath Sreedharan Department of Computer Science Colorado State University Fort Collins, CO 80523 sarath.sreedharan@colostate.eduSubbarao Kambhampati School of Computing & AI Arizona State University Tempe, AZ 85281 rao@asu.edu Abstract There is a growing interest in applying pre-trained large language models (LLMs) to planning problems. However, methods that use LLMs directly as planners are currently impractical due to several factors, including limited correctness of plans, strong reliance on feedback from interactions with simulators or even the actual environment, and the inefficiency in utilizing human feedback. In this work, we introduce a novel alternative paradigm that constructs an explicit world (domain) model in planning domain definition language (PDDL) and then uses it to plan with sound domain-independent planners. To address the fact that LLMs may not generate a fully functional PDDL model initially, we employ LLMs as an interface between PDDL and sources of corrective feedback, such as PDDL validators and humans. For users who lack a background in PDDL, we show that LLMs can translate PDDL into natural language and effectively encode corrective feedback back to the underlying domain model. Our framework not only enjoys the correctness guarantee offered by the external planners but also reduces human involvement by allowing users to correct domain models at the beginning, rather than inspecting and correcting (through interactive prompting) every generated plan as in previous work. On two IPC domains and a Household domain that is more complicated than commonly used benchmarks such as ALFWorld, we demonstrate that GPT-4 can be leveraged to produce high-quality PDDL models for over 40 actions, and the corrected PDDL models are then used to successfully solve 48 challenging planning tasks. Resources, including the source code, are released at: https://guansuns.github.io/pages/llm-dm . 1 Introduction The field of artificial intelligence has been revolutionized with the advent of large pre-trained models. Of particular significance are transformer-based large language models (LLMs) which have showcased remarkable performance in natural language processing tasks. Along with these tasks, LLMs have been tested to perform another widely-studied crucial aspect of AI agents, namely, sequential decision- ∗Equal contribution 37th Conference on Neural Information Processing Systems (NeurIPS 2023).arXiv:2305.14909v2 [cs.AI] 2 Nov 2023 Page 2: making or planning. Preliminary studies suggest that, in some everyday domains, LLMs are capable of suggesting sensible action plans [ 19,1]. However, the correctness and executability of these plans are often limited. For instance, LLMs may regularly overlook the physical plausibility of actions in certain states and may not effectively handle long-term dependencies across multiple actions. Several approaches have been proposed to improve the planning capabilities of LLMs. One promising approach involves collecting feedback from the environment during plan execution and subsequently refining the plans. By incorporating various forms of feedback, such as sensory information [ 20], human corrections [ 60], or information of unmet preconditions [ 42,56], the planners can re-plan and produce plans that are closer to a satisficing plan. Despite the improvements in planning performance, LLMs are still far from being a usable and reliable planner due to various factors: (a)LLMs have not yet demonstrated sufficient capabilities in reasoning and planning [ 24,55, 53,54,31]. Recent investigations show that even when provided with detailed descriptions of actions, such as a PDDL domain model [ 33] or a natural-language version of a PDDL model, LLMs still struggle to produce correct and executable plans [48, 55]. (b)Existing LLMs-planning paradigms only allow for feedback collection in a fully online manner, meaning that the feedback signals are only available after the agent has started executing the plan. However, when a faithful simulator is not available or is expensive to use, collecting feedback through actual plan execution can be costly and may not fully exploit the advantages of provably sound planning, as seen in classical-planning literature [ 11,13]. (c)LLMs exhibit complex behaviors that are not yet fully understood, particularly with respect to error occurrences. LLM planners are prone to repeating the same mistakes in slightly different scenarios. Repeatedly providing the same feedback can lead to frustration for end users. To overcome these limitations, rather than using LLMs directly as planners, we advocate a model- based paradigm, wherein a PDDL world model is teased out of LLMs. We follow the identical problem setup as existing approaches, which involves providing the planner with a set of actions and their brief natural language descriptions. However, instead of directly mapping user commands to plans, we utilize LLMs to extract a symbolic representation of the actions in the form of PDDL action models. This intermediate output can be used with an external domain-independent planner to reliably search for feasible plans, or it can be used to validate and correct "heuristic" plans generated by an LLM planner. Additionally, our modular method essentially divides the planning process into two distinct parts, namely modeling the causal dependencies of actions and determining the appropriate sequence of actions to accomplish the goals. LLMs, which have been trained on extensive web-scale knowledge, exhibit greater proficiency in the former task rather than the latter. Nevertheless, we still take into account the fact that the LLMs may not be able to generate error-free PDDL models at the outset. To address this, we show that LLMs can also serve as an interface between PDDL and any feedback sources that can provide corrective feedback in natural language, such as humans and the PDDL validator in V AL [ 18]. The LLM middle layer translates PDDL representation to natural language and presents it to users for inspection. The acquired feedback is then incorporated and archived back to the PDDL models. This conceals the complexity of PDDL from users who do not have prior knowledge of PDDL, and enables seamless inclusion of feedback. We conducted an extensive evaluation of our methodology on two IPC domains [ 22] from classical planning literature and a household domain that has a more diverse set of actions and constraints than commonly used benchmarks such as ALFWORLD [ 47]. We assess the quality of the generated PDDL models through manual evaluation. Results show that GPT-4 [ 37] generates high-quality PDDL domain models with over 400 literals for 41 actions in total. Then, by replaying and continuing the PDDL-construction dialogue, we show that GPT-4 can readily correct all the errors according to natural language feedback from PDDL validators and humans. We consider two use cases of the generated PDDL action models for downstream planning tasks. For one, by utilizing an LLM to translate user instructions into goal specifications in PDDL [ 58,30], we can use any standard domain-independent planner to search for a plan. On the other hand, the extracted PDDL model can be used to validate plans suggested by an LLM planner and to provide corrective feedback in the form of unmet preconditions or goal conditions. In this case, the PDDL model is essentially serving as an inexpensive high-level simulator or a human proxy to ensure plan correctness. 2 Page 3: This reduces the reliance on faithful simulators or extensive manual inspection of plans by domain experts. Compared to the first approach, the second approach potentially offers better flexibility in incorporating both explicit and implicit user constraints in common-sense domains because of the LLM planner. For instance, the LLM planner can directly incorporate ordering constraints such as "heat the potato first before mashing it" and "bring me a fork first, then a plate." On the contrary, an approach purely based on classical planners would require extra steps, such as introducing extra state variables in the PDDL models, in order to accommodate such constraints. However, as demonstrated in our experiments, although the validation feedback significantly improves the plan correctness on average, the performance of the second approach is still limited by the "planning capability" of LLMs. 2 Related Work LLMs and planning. The growing interest in evaluating the emergent abilities of LLMs paved way into exploring their abilities in sequential decision-making tasks. Preliminary studies [ 24,55] have shown that off-the-shelf LLMs are currently incapable of producing accurate plans. But their plans can be used as heuristics or seeds to either an external planner or a human in the loop [ 55,48]. SayCan [ 1] and Text2Motion [ 29] employ an LLM as a heuristic by utilizing it to score high-level actions, followed by a low-level planner that grounds these actions to determine the executability in the physical world. In a similar vein, [ 28,50] use LLMs to generate plans represented in Python-style code. Other works have aimed to improve the planning performance of LLMs through prompt engineering [ 60] or collecting various forms of feedback such as sensory information [ 51,20,34], human corrections [60], self-corrections [46] or information of unmet preconditions [42, 56]. Training transformers for sequential decision-making tasks. Along with using off-the-shelf LLMs, there are works that either fine-tune LLMs [ 55,38] or train sequence models [ 62,27,7,43] for sequential decision making tasks. Experiments in [ 26] have shown that training sequence models on a specific task gives rise to an internal world representation within the model. In this work, we use off-the-shelf LLMs to construct symbolic world models without performing any extra training. Learning/acquiring symbolic domain models. In classical planning, the community has explored numerous learning-based methods [ 59,61,9,25,4] and interactive editor-based methods [ 49] for acquiring symbolic domain models. For a more comprehensive survey, we refer the reader to [ 2,6]. Here, we are interested in leveraging the common-world knowledge embedded in LLMs and their in-context learning ability for constructing domain models. Recent studies have shown the efficacy of LLMs in translating natural language to formal descriptions [ 35] or constructing PDDL goals from natural-language instructions [ 58,32]. Moreover, a contemporary work [ 15] considers the use of LLM as a parametric world model and plan critic. However, unlike a symbolic model that can simulate plan outcomes with guaranteed correctness, using LLMs directly as a world model actually adds another layer of errors. There is evidence that autoregressive models lack reliable capacity for reasoning about action effects [3, 31] and capturing errors in candidate plans [53, 54]. Language models with access to external tools. Since LLMs are approximately omniscient, they may not always outperform specialized models or tools in specific downstream tasks. To address this limitation, frameworks have been developed to enable LLMs to utilize external tools for performing sub-tasks like arithmetic [ 45] and logical reasoning [ 39,57]. In this context, our work can be regarded as an exercise in employing external sound planners to augment the capacity of LLMs for more reliable plan generation. 3 Problem Setting and Background Our work focuses on a scenario where an intelligent agent receives high-level instructions or tasks, denoted as i, from a user. The agent is capable of only executing skills or operations that are part of a skill library Π, where each skill khas a short language description lk. We assume that the agent is equipped with the low-level control policies corresponding to these high-level skills. In order to achieve the goal conditions specified in i, a planner, which can be either an LLM or an external planner [ 16,12,17], needs to come up with a sequence of high-level skills that the agent can execute. This type of problem is referred to as a sequential decision-making or planning problem. Similar to previous works such as [ 60,20], we also allow for human-in-the-loop feedback during both the domain-model construction and plan execution stages. In the next subsections, we describe the formalism behind planning problems and a standard way in the literature to specify them. 3 Page 4: 3.1 Classical planning problems The most fundamental planning formalism is goal-directed deterministic planning problem, referred to as a classical planning problem in the planning literature. A classical planning problem [ 44] can be formally represented with a tuple P=⟨D,I,G⟩.Dis referred to as the domain, Iis the initial state, and Gis the goal specification. The state space of a planning problem consists of the truth assignments for predicates. The domain Dis further defined by the tuple D=⟨F,A⟩.F corresponds to the set of fluents, i.e., the state variables used to define the state space with each fluent corresponding to a predicate with some arity. Acorresponds to the set of actions that can be performed. Each action ai[V]∈ A (where Vis the set of variables used by the operator aiand each variable could be mapped to an object) can be further defined by two components, the precondition prec[V]which describes when an action can be executed, and the effects eff[V]which defines what happens when an action is executed. We assume that prec[V]consists of a set of predicates defined over the variables V. An action is assumed to be executable only if its preconditions are met, i.e, the predicates in the precondition hold in the given state. The effect set eff[V]is further defined by the tuple⟨add[V],del[V]⟩, where add[V]is the set of predicates that will be set true by the action and del[V]is the set of predicates that will be set false by the action. An action is said to be grounded if we replace each of the variables with an object, else it is referred to as a lifted action model. A solution to a planning problem is called a plan, and it is a sequence of actions that once executed in the initial state would lead to a state where the goal specification holds. Classical planning problems are one of the simpler classes in planning and there are multiple extensions with more complex forms of preconditions, conditional effects, and also support for richer planning formalisms. 3.2 PDDL Planning Definition and Domain Language (PDDL) [ 33], is the standard encoding language for classical planning problems. Here is an example of a lifted action in PDDL which corresponds to putting a block onto the table in the classical Blocksworld domain: (:action PutDownBlock :parameters (?x - block) :precondition (and (robot-holding ?x)) :effect (and (not (robot-holding ?x)) (block-clear ?x) (robot-hand-empty) (block-on-table ?x))) The parameters line provides the possible variable(s), and in this case, ?xrepresents the block to put down. The precondition states that the robot must be holding the block in its gripper. The effects line describes the expected outcome of this action. 4 Methodology PDDL provides a succinct and standardized way to represent a world model. Once a PDDL model is constructed, it can be seamlessly used by any domain-independent planner developed in the automated planning community to search for a plan given the initial state and goal conditions. In this section, we will introduce our solution for constructing PDDL models using LLMs. We then discuss techniques for correcting errors in the generated PDDL models. Finally, we present the full pipeline for utilizing the generated PDDL models to solve planning problems. 4.1 Constructing PDDL models with LLMs Our approach involves prompting pre-trained LLMs with the following information: (a) detailed instructions for the PDDL generation task, outlining components of upcoming inputs and desired outputs; (b) one or two examples from other domains (e.g., the classical Blocksworld domain) for illustrating the input and output formats; (c) a description of the current domain, including contextual information about the agent’s tasks and physical constraints due to the specific embodiment of the agent; (d) a description of the agent’s action; and (e) a dynamically updated list of predicates that the LLM can reuse to maintain consistent use of symbols across multiple actions. Note that the predicate list is initialized to an empty list , and thus all predicates are introduced by the LLM. The structure of the prompt is illustrated in Fig. 2, and a complete prompt for the household-robot domain can be found at Appx. A.6.1. 4 Page 5: Figure 1: An overview of our framework and existing methods that use LLMs directly as planners. Depending on the information included in the action description or the domain context, users may gain varying levels of control over the extracted PDDL or receive differing levels of support from the LLMs. On one hand, when the user provides only a minimal description of the action, such as "this action enables the robot to use a microwave to heat food," we not only use the LLM as a PDDL constructor but also leverage the common world knowledge encoded within the model for knowledge acquisition. This is particularly useful when expanding the set of actions for an AI agent. For example, a robot engineer could set up a training environment for skill learning by following the suggested preconditions and effects. On the other hand, when some preconditions or effects are explicitly mentioned in the prompt, we rely more on the LLM’s ability to parse the knowledge provided in natural language and to precisely represent it by devising a collection of predicates. This capability is useful when there could be different initial setups of a skill, and the engineers have already made some assumptions on the preconditions at the time of designing the skill. This capability is also crucial when constructing PDDL for specialized domains. For instance, robots such as Fetch and Spot Robot have only one robot arm, which is less flexible than a human arm, and are therefore subject to many uncommon physical constraints. The desired output comprises the following elements: (a) the list of arguments for the action; (b) the preconditions and effects expressed in PDDL; and (c) a list of any newly defined predicates and their descriptions in natural language, if applicable. An example output is shown in Fig. 2. Our algorithm generates PDDL models for each action separately, one at a time, by iterating over the set of actions. Any newly defined predicates will be added to an actively maintained predicate list, such that the LLM can reuse existing predicates in subsequent actions without creating redundant ones. Once we obtain the initial PDDL models and the full predicate list, we repeat the entire process but with all of the extracted predicates presented to the LLM. Running the generation process twice is useful because the LLMs may be unaware of some precondition(s) during the first iteration, especially if the precondition(s) are not explicitly mentioned. For instance, the LLM may overlook the fact that a furniture piece can be openable, but a predicate created in the "open a furniture piece or appliance" skill can inform the LLM of this fact. One alternative to this action-by-action generation could be to include descriptions of all the actions in the prompt and require the LLM to construct the entire domain model in a single dialogue. An additional discussion on this can be found at Sec. A.2 in Appendix. It is worth noting that every time a new predicate is defined, the LLM is required to give the natural language description of it. As we will see in the following sections, this is crucial for enabling any user to easily understand and inspect the generated PDDL models without having to delve into the low-level symbolic representation. Additionally, natural language descriptions allow the predicate values of the initial state to be automatically grounded by using LLMs to translate environment description in natural language to PDDL [ 30], or leveraging pre-trained vision-language models 5 Page 6: Instructions for the PDDL generation task You are defining the preconditions and effects (represented in PDDL format) of an AI agent 's actions. Information about the AI agent will be provided in the domain description ... ,→ One or two examples from other domains for illustrating the input and output formats Here are two examples from the classical BlocksWorld domain for demonstrating the output format. ... Here is the task. A natural language description of the domain Domain information: The AI agent here is a household robot that can navigate to various large and normally immovable furniture pieces or appliances in the house to carry out household tasks ...,→ ,→ A natural language description of the action Action: This action enables the robot to toggle small appliances (like humidifiers and light bulbs) which are toggleable to switch them on ... ,→ The dynamically updated list of predicates You can create and define new predicates, but you may also reuse the following predicates: 1. (robot-at ?r - robot ?f - furnitureAppliance): true if the robot ?r is at the furniture or appliance ?f ,→ 2. (object-in-on ?o - householdObject ?f - furnitureAppliance): true if the object ?o is in or on the furniture or appliance ?f ,→ ... Parameters: --------------------------------------- The LLM: ... 2. ?o - householdObject: the small appliance to be toggled on ... Preconditions: (and ... (not (appliance-on ?o)) ) Effects: (and (appliance-on ?o) ) New Predicates: 1. (appliance-on ?o - householdObject): true if the small appliance ?o is switched on Figure 2: The prompt template for PDDL construction and an example of the LLM output for the household domain. [41,37,10] and querying them in a question-answering manner, based on observations from the environment. 4.2 Correcting errors in the initial PDDL models As with any use case involving LLMs, there is no guarantee that the output is completely error-free. Therefore, it is essential to incorporate error-correction mechanisms. While it may be easy for PDDL experts to directly inspect and correct the generated PDDL models, we cannot assume that all end users possess this level of expertise. Our solution is to use the LLM as a middle layer or interface between the underlying PDDL model and any feedback source that can provide corrective feedback in natural language. We consider two feedback sources in this work, namely the PDDL model validation tools (e.g., the one in V AL [ 18]) and human domain experts. The former is used to detect basic syntax errors, while the latter is mainly responsible for catching factual errors, such as missing effects. It is worth noting that the feedback sources are not limited to those mentioned above, and we leave the investigation of other sources for future research. For corrective feedback from PDDL validators, a generated PDDL model is directly presented to the validator to obtain brief but readable error messages. Examples of feedback messages for syntax errors are shown in Appx. A.3. For corrective feedback from users, a PDDL model is translated into its natural-language version based on the natural language descriptions of the predicates and parameters (Sec. 4.1). The user can then examine potentially erroneous action models. Human corrections can occur both during the construction of PDDL models and after the models have been used for planning. Although there are techniques available to assist users to locate errors in the models (as discussed in 6 Page 7: Appx. A.4), this is beyond the scope of this work, since the focus here is to investigate the feasibility of using LLMs to correct PDDL models based on feedback. We also note that correcting action models is not more cognitively demanding than correcting plans or the "reasoning traces" of an LLM planner [ 60]. In fact, when correcting plans, humans must also maintain the action models and their causal chains in mind in order to validate the plans. More importantly, once the action models are corrected, users no longer need to provide similar feedback repeatedly. Finally, corrective feedback is integrated by replaying and continuing the PDDL-construction dialogue. Examples of such dialogues can be found in Sec. A.7, Sec. A.9, and Sec. A.11 in Appendix. 4.3 Generating plans with the extracted PDDL models Recall that given the set of extracted predicates and their natural language descriptions, we can get the grounded initial state by using LLMs to translate descriptions of the environment to PDDL, or by observing the environment and querying pre-trained vision-language models. Besides, the goal specification can be obtained by using an LLM to parse the user’s command and convert it into a symbolic form, as done previously in [ 30,58,32]. With this setup, the following two methods can be used to generate the final plans. Classical planner with LLM-acquired PDDL model. One straightforward approach is to employ a standard domain-independent planner to reliably find a satisficing or even optimal plan for the specified goal. In common-sense domains where LLMs may generate meaningful "heuristics", the LLM plans may also be used as seed plans for a local-search planner such as LPG [ 12] to accelerate the plan searching. This is similar to the approach suggested in [ 55], but with a higher degree of automation. LLM modulo planner backprompted by V AL using LLM-acquired PDDL model. As outlined in Sec. 1, we can also use the extracted PDDL as a symbolic simulator or human proxy to provide corrective feedback based on validation information to an LLM planner. With this setup, the planner can iteratively refine the plans through re-prompting [42]. It is worth noting that depending on the specific problem settings, the extracted PDDL model can also be used for tasks other than task planning. For instance, in cases where reinforcement learning is permissible, the domain model can be used to guide skill learning [ 21,8] or exploration even if the model is not fully situated [14]. 5 Empirical Evaluation We conduct our experiments2on an everyday household-robot domain and two more specialized IPC domains (i.e., Tyreworld and Logistics). The Household domain is similar to other commonly used benchmarks like ALFWORLD [ 47] and VirtualHome [ 40]. However, in our household domain, a single-arm robot is equipped with a more diverse and extended set of 22 mobile and manipulation skills. In addition, we apply more rigorous physical-plausibility constraints to each skill. A detailed description of this domain can be found at Appx. A.5. In our experiments, we first evaluate the quality of PDDL models generated by the LLMs. Next, we assess the ability of the LLMs to incorporate corrective feedback from both PDDL validators and users in order to obtain error-free PDDL models. Lastly, we showcase multiple ways to use the corrected PDDL model for downstream planning tasks. We present the results of GPT-4 [ 37] and GPT-3.5-Turbo [ 36] for PDDL construction (we also conducted experiments with GPT-3 [ 5], and observe that its performance is comparable to that of GPT-3.5-Turbo). 5.1 Constructing PDDL In PDDL construction tasks, we aim to investigate the extent to which LLMs can construct accurate PDDL models before getting corrective feedback from domain experts. For all the domains, two actions from the classical Blocksworld domain are used as demonstrations in the prompt so that the end user is not required to come up with any domain-specific example. To evaluate the degree of correctness, we recruit multiple graduate students who possess expertise in PDDL. These experts are responsible for annotating and correcting any errors present in the generated PDDL models. As an evaluation metric, we count and report the total number of annotations, which may include the removal of irrelevant preconditions, the addition of missing preconditions, the replacement of incorrect predicates, the inclusion of missing parameters, and other commonly made corrections. Note 2Experiments were done in May 2023. 7 Page 8: that the number of annotations can be viewed as the approximate distance between a generated PDDL model and its corrected version. In order to provide the reader with a comprehensive understanding of the quality of the generated models, we also list all the models and collected annotations in Appendix. In each of the figures, errors that affect the functionality of the PDDL model are highlighted in yellow, while minor issues are highlighted in green. One example of a minor issue is the redundant inclusion of(pickupable ?o) in preconditions when (robot-holding ?o) has already been listed. The former is unnecessary because it can be implied by the latter, but this only affects conciseness rather than functionality. Domain # of actions # of params and literals # of GPT-4 errors # of GPT-3.5-Turbo errors Household 22 271 53 218+ Logistics 6 54 2 38 Tyreworld 13 108 4 94+ Table 1: The number of errors in the domain models produced by the LLMs for each of the domains. A "+" mark indicates that the generated model is excessively noisy, making it challenging to determine an exact number of errors. We first evaluate the PDDL models generated when partial constraint information is given, as this is closer to most of the practical use cases where constraints on skills in the library Πare often pre-specified. In this setting, our evaluation focuses on the LLMs’ ability to accurately recover a "ground truth PDDL" that captures the mentioned constraints and underlying dependencies among skills. Our results indicate that GPT-4 can produce high-quality PDDL models with significantly fewer errors when compared to GPT-3.5-Turbo. Table 1 presents the number of errors in the generated domain models for each domain. To help the readers understand the complexities of the action models, we additionally report the total number of parameters and literals in the final corrected domain models produced by GPT-4. Out of the total 59 errors made by GPT-4, three of them are syntax errors and the rest are factual errors such as missing preconditions and effects. This observation suggests that while GPT-4 demonstrates proficiency in adhering to the grammar of PDDL, it may still have an inaccurate understanding of the actions. By examining the set of predicates (listed in the Appendix), we also find that GPT-4 can devise a set of intuitively-named predicates that can concisely and precisely describe the states of objects and events in the domain. In contrast, GPT-3.5-Turbo produces highly noisy outputs with over 350 errors. This suggests that our framework relies heavily on GPT-4’s improved capability in understanding symbols, and future work may investigate how to enable the use of more lightweight models (e.g., by fine-tuning on some PDDL datasets). Furthermore, recall that when the action description contains minimal information, LLMs could also be utilized to propose preconditions and effects to assist with knowledge acquisition. To verify this hypothesis, we conduct additional experiments on the Household domain that can have a more open-ended action design. In this setting, the correctness of the action models is determined based on whether the preconditions and effects establish correct connections among the actions. Our results show that GPT-4 can suggest meaningful action models, and the generated PDDL models have only around 45 errors. Although GPT-4 has shown improved performance in the PDDL construction task, our experiments still uncover some limitations. Firstly, GPT-4 still exhibits a shallow understanding of the causal relationships between actions, particularly when it comes to tasks involving reasoning skills such as spatial reasoning. For instance, when constructing the model of action "pick up an object from a furniture piece," GPT-4 fails to consider that there could be other objects stacked on top of the target object, even if relevant predicates are provided (which were created in the action "stack objects"). In addition, although it occurs rarely, GPT-4 may output contradictory effects. For instance, in the action of mashing food with a blender, GPT-4 lists both (not (object-in-receptacle ...)) and(object-in-receptacle ...) as effects at the same time. 5.2 Correcting PDDL with domain experts We proceed with the PDDL models generated by GPT-4 when the constraint information is partially given. Our objective is to demonstrate the feasibility of using GPT-4 as a middle layer to incorporate natural-language feedback and correct the PDDL models. As discussed in Sec. 4.2, we use PDDL validators to capture basic syntax errors. In the Household domain, there are two syntax errors 8 Page 9: associated with improper usage of relevant predicates due to issues with the object types of parameters 3. As shown in Appx. A.7.1, by continuing the PDDL-construction dialogue with a feedback message " the second parameter of object-on should be a furnitureAppliance but a householdObject was given ," GPT-4 can locate the inaccurate PDDL snippet and replace it with a correct one. For the other factual errors, GPT-4 successfully corrects all of them based on the natural language feedback. An example feedback message on factual errors is " there is a missing effect: the item is no longer pickupable after being mashed ." More PDDL-correction conversations can be found in Appendix. We also experiment with feedback written in various ways, and GPT-4 is able to understand all the messages and successfully correct the models. To quantify how effectively GPT-4 utilizes feedback from domain experts, we count the number of feedback messages concerning factual errors. Our result shows that GPT-4 required 59 feedback messages to address a total of 56 factual errors. There are three instances where additional feedback was needed. One case involved the user reiterating the error, while the other two cases involved GPT-4 introducing new errors. Furthermore, we attempt to correct the same errors using GPT-3.5-Turbo. Results show that GPT-3.5-Turbo not only fails to correct all the errors but also occasionally introduces new errors, again confirming its lack of ability to manipulate symbols. Some examples can be found in Appendix starting from Sec. A.7.3. 5.3 Generating plans with the extracted PDDL models For planning tasks (i.e., user instructions and initial states), we use the Household domain and Logistics domain, where state-of-the-art LLM planners struggle to find valid plans. We sampled 27 tasks for Household and 21 for Logistics. For the initial states, we assume the grounding is provided, and for the goals, we leverage GPT-4 to translate user instructions into PDDL goal specifications in terms of the extracted predicates (an example prompt can be found at Appx. A.13), and send it over to a standard STRIPS planner which already has access to the domain model acquired through LLMs . With this setup, a classical planner Fast Downward [ 16] can effectively find valid plans in 95% of the cases (the failures were only due to goal translation errors). Note that in contrast to earlier methods such as [ 30] that use LLMs only as a mechanism for translating user goals to PDDL format, and throw that over to external sound planners with hand-crafted correct PDDL domain models, our approach uses LLMs themselves to develop the PDDL world model driving the external planner. Planner Type Household Logistics Only LLM Planner 15% 0% Fast Downward with LLM-acquired PDDL model95% 100% LLM backprompted by V AL using LLM-acquired PDDL model48% 33% Table 2: Success rates of different planning approaches in the Household domain and the Logistics domain.On the other hand, for the approach that utilizes PDDL models to validate LLM plans (i.e., LLM modulo planner back-prompted by V AL using LLM- acquired domain model), we employ the state-of-the-art algorithm ReAct [60] with GPT-4 as the underlying LLM planner. However, we made two modifications to the prompt design. Firstly, we provide a detailed descrip- tion of all actions in natural language, including parameters, preconditions, and effects. These descriptions are ob- tained by using another LLM to translate the generated PDDL domain model into natural language. Secondly, we use only two fixed examples for each domain because end users might not always be able to provide a large pool of examples, and the planner should rely on the action model information. The LLM plans, symbolic goal specifications, initial states and domain models are passed to a plan validation system (i.e., V AL) to check for unmet precondition(s) or goal condition(s). The validation results (given in PDDL) are then translated into natural language with GPT-4 and provided to the LLM planner by continuing the planning dialogue (see Appx. A.12.1 for examples). In our experiments, we limit the number of feedbacks per task to 8 due to the restricted access to GPT-4. Table 2 provides a summary of the average success rates of all approaches. Not surprisingly, the vanilla LLM planner constantly overlooks action preconditions and achieves an extremely low success rate. With the integration of validation feedback, we observe a notable improvement in plan correctness. Despite this improvement, the overall performance is still not satisfactory, as the success rate remains below 3At present, our approach only permits a fixed set of object types that are specified in the prompt. Future extensions may explore ways of enabling LLMs to create object-type hierarchy or expand the set of object types as needed. 9 Page 10: 50%. Furthermore, we have observed that GPT-4 fails to effectively utilize the feedback, often getting stuck in a loop by repeatedly generating the same plan. In some cases, it may also introduce new errors while attempting to rectify the plans. Beyond the notion of correctness, the experiments also uncover intriguing properties of the LLM planner. In the Household domain, we intentionally introduce ordering constraints in some instructions that cannot be expressed using existing predicates (refer to Appx. A.12 for examples). Remarkably, upon manual examination of the generated plans, we observe that all LLM plans adhere to the specified ordering, despite not being entirely correct or executable. Furthermore, also in the Household domain, we observe that classical planners occasionally generate physically plausible but unconventional actions, such as placing a knife on a toaster when the knife is not being used. In contrast, the LLM planner rarely exhibits such actions, suggesting that LLMs possess knowledge of implicit human preferences. It would be meaningful to explore methods that more effectively combine the strengths of LLM planners and the correctness guarantee provided by symbolic domain models, particularly in determining which information from LLM plans should be preserved. 6 Conclusion We introduce a new paradigm for leveraging LLMs in planning tasks, which involves maintaining an explicit world model instead of directly mapping user prompts to plans. This is motivated by the insight that LLMs, while incapable of the combinatorial search needed to produce correct plans, may be better suited as the source of world models. We present a complete pipeline that begins with generating high-quality PDDL models using GPT-4, then corrects the PDDL models with natural- language feedback, and finally utilizes the extracted domain models to reliably plan in multiple ways. Our experiments demonstrate that pairing LLMs with an external planner significantly outperforms existing methods when applied to two IPC domains and a household-robot domain that has more action-wise constraints than commonly used benchmarks such as ALFWorld. Apart from directions for further research that we have previously mentioned, there are several exciting opportunities for extending this work. Firstly, the complexity of our evaluation domains is still lower than that of many domains used in the classical planning literature. It remains to be seen whether LLMs can effectively scale to write PDDL models that express more intricate logic. Secondly, our framework assumes full observability, meaning that the agent must fully explore the environment to acquire object states at the beginning. It would be useful to support partial observability. Finally, our experiments assume the grounding of predicate values is done perfectly. However, it would be useful to take into account that perception can be noisy in practice. Acknowledgement This research was supported by ONR grants N00014-18-1-2442, N00014-18-1-2840, N00014- 19-1-2119 and N00014-23-1-2409, AFOSR grant FA9550-18-1-0067, DARPA SAIL-ON grant W911NF-19-2-0006, and a JP Morgan AI Faculty Research Grant to Kambhampati. Sreedharan was supported in part by NSF grant 2303019. References [1]Michael Ahn, Anthony Brohan, Noah Brown, Yevgen Chebotar, Omar Cortes, Byron David, Chelsea Finn, Keerthana Gopalakrishnan, Karol Hausman, Alex Herzog, et al. Do as i can, not as i say: Grounding language in robotic affordances. arXiv preprint arXiv:2204.01691 , 2022. [2]Ankuj Arora, Humbert Fiorino, Damien Pellier, Marc Métivier, and Sylvie Pesty. A review of learning planning action models. The Knowledge Engineering Review , 33:e20, 2018. [3] Pratyay Banerjee, Chitta Baral, Man Luo, Arindam Mitra, Kuntal Pal, Tran C Son, and Neeraj Varshney. Can transformers reason about effects of actions? arXiv preprint arXiv:2012.09938 , 2020. [4]Blai Bonet and Hector Geffner. Learning first-order symbolic representations for planning from the structure of the state space. arXiv preprint arXiv:1909.05546 , 2019. [5]Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems , 33:1877–1901, 2020. 10 Page 11: [6]Ethan Callanan, Rebecca De Venezia, Victoria Armstrong, Alison Paredes, Tathagata Chakraborti, and Christian Muise. Macq: A holistic view of model acquisition techniques. In The ICAPS Workshop on Knowledge Engineering for Planning and Scheduling (KEPS) , 2022. [7]Hongyi Chen, Yilun Du, Yiye Chen, Joshua B. Tenenbaum, and Patricio A. Vela. Planning with sequence models through iterative energy minimization. In The Eleventh International Conference on Learning Representations , 2023. [8]Shuo Cheng and Danfei Xu. Guided skill learning and abstraction for long-horizon manipulation. arXiv preprint arXiv:2210.12631 , 2022. [9]Stephen N Cresswell, Thomas L McCluskey, and Margaret M West. Acquiring planning domain models using locm. The Knowledge Engineering Review , 28(2):195–213, 2013. [10] Danny Driess, Fei Xia, Mehdi SM Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, et al. Palm-e: An embodied multimodal language model. arXiv preprint arXiv:2303.03378 , 2023. [11] Richard E Fikes and Nils J Nilsson. Strips: A new approach to the application of theorem proving to problem solving. Artificial intelligence , 2(3-4):189–208, 1971. [12] Alfonso Gerevini and Ivan Serina. Lpg: A planner based on local search for planning graphs with action costs. In AIPS , volume 2, pages 281–290, 2002. [13] Malik Ghallab, Dana Nau, and Paolo Traverso. Automated Planning: theory and practice . Elsevier, 2004. [14] Lin Guan, Sarath Sreedharan, and Subbarao Kambhampati. Leveraging approximate symbolic models for reinforcement learning via skill diversity. In International Conference on Machine Learning , pages 7949–7967. PMLR, 2022. [15] Shibo Hao, Yi Gu, Haodi Ma, Joshua Jiahua Hong, Zhen Wang, Daisy Zhe Wang, and Zhit- ing Hu. Reasoning with language model is planning with world model. arXiv preprint arXiv:2305.14992 , 2023. [16] Malte Helmert. The fast downward planning system. Journal of Artificial Intelligence Research , 26:191–246, 2006. [17] Jörg Hoffmann and Bernhard Nebel. The ff planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research , 14:253–302, 2001. [18] Richard Howey, Derek Long, and Maria Fox. Val: Automatic plan validation, continuous effects and mixed initiative planning using pddl. In 16th IEEE International Conference on Tools with Artificial Intelligence , pages 294–301. IEEE, 2004. [19] Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In International Conference on Machine Learning , pages 9118–9147. PMLR, 2022. [20] Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, et al. Inner monologue: Embodied reasoning through planning with language models. arXiv preprint arXiv:2207.05608 , 2022. [21] León Illanes, Xi Yan, Rodrigo Toro Icarte, and Sheila A McIlraith. Symbolic plans as high-level instructions for reinforcement learning. In Proceedings of the international conference on automated planning and scheduling , volume 30, pages 540–550, 2020. [22] IPC. International planning competition, 1998. [23] Subbarao Kambhampati, Sarath Sreedharan, Mudit Verma, Yantian Zha, and Lin Guan. Symbols as a lingua franca for bridging human-ai chasm for explainable and advisable ai systems. In Proceedings of the AAAI Conference on Artificial Intelligence , volume 36, pages 12262–12267, 2022. 11 Page 12: [24] Subbarao Kambhampati, Karthik Valmeekam, Matthew Marquez, and Lin Guan. On the role of large language models in planning, July 2023. Tutorial presented at the International Conference on Automated Planning and Scheduling (ICAPS), Prague. https://yochan-lab.github. io/tutorial/ICAPS-2023/ . [25] George Konidaris, Leslie Pack Kaelbling, and Tomas Lozano-Perez. From skills to sym- bols: Learning symbolic representations for abstract high-level planning. Journal of Artificial Intelligence Research , 61:215–289, 2018. [26] Kenneth Li, Aspen K Hopkins, David Bau, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. Emergent world representations: Exploring a sequence model trained on a synthetic task. In The Eleventh International Conference on Learning Representations , 2023. [27] Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, et al. Pre-trained language models for interactive decision-making. Advances in Neural Information Processing Systems , 35:31199–31212, 2022. [28] Jacky Liang, Wenlong Huang, Fei Xia, Peng Xu, Karol Hausman, Brian Ichter, Pete Florence, and Andy Zeng. Code as policies: Language model programs for embodied control. arXiv preprint arXiv:2209.07753 , 2022. [29] Kevin Lin, Christopher Agia, Toki Migimatsu, Marco Pavone, and Jeannette Bohg. Text2motion: From natural language instructions to feasible plans. arXiv preprint arXiv:2303.12153 , 2023. [30] Bo Liu, Yuqian Jiang, Xiaohan Zhang, Qiang Liu, Shiqi Zhang, Joydeep Biswas, and Peter Stone. Llm+ p: Empowering large language models with optimal planning proficiency. arXiv preprint arXiv:2304.11477 , 2023. [31] Man Luo, Shrinidhi Kumbhar, Mihir Parmar, Neeraj Varshney, Pratyay Banerjee, Somak Aditya, Chitta Baral, et al. Towards logiglue: A brief survey and a benchmark for analyzing logical reasoning capabilities of language models. arXiv preprint arXiv:2310.00836 , 2023. [32] Qing Lyu, Shreya Havaldar, Adam Stein, Li Zhang, Delip Rao, Eric Wong, Marianna Apid- ianaki, and Chris Callison-Burch. Faithful chain-of-thought reasoning. arXiv preprint arXiv:2301.13379 , 2023. [33] Drew McDermott, Malik Ghallab, Adele E. Howe, Craig A. Knoblock, Ashwin Ram, Manuela M. Veloso, Daniel S. Weld, and David E. Wilkins. Pddl-the planning domain definition language. 1998. [34] Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hannaneh Hajishirzi, Sameer Singh, and Roy Fox. Do embodied agents dream of pixelated sheep?: Embodied decision making using language guided world modelling. arXiv preprint arXiv:2301.12050 , 2023. [35] Alberto Olmo, Sarath Sreedharan, and Subbarao Kambhampati. Gpt3-to-plan: Extracting plans from text using gpt-3. arXiv preprint arXiv:2106.07131 , 2021. [36] OpenAI. Introducing chatgpt by openai, 2022. [37] OpenAI. Gpt-4 technical report, 2023. [38] Vishal Pallagani, Bharath Muppasani, Keerthiram Murugesan, Francesca Rossi, Lior Horesh, Biplav Srivastava, Francesco Fabiano, and Andrea Loreggia. Plansformer: Generating symbolic plans using transformers. arXiv preprint arXiv:2212.08681 , 2022. [39] Liangming Pan, Alon Albalak, Xinyi Wang, and William Yang Wang. Logic-lm: Empowering large language models with symbolic solvers for faithful logical reasoning. arXiv preprint arXiv:2305.12295 , 2023. [40] Xavier Puig, Kevin Ra, Marko Boben, Jiaman Li, Tingwu Wang, Sanja Fidler, and Antonio Torralba. Virtualhome: Simulating household activities via programs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , pages 8494–8502, 2018. 12 Page 13: [41] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning , pages 8748–8763. PMLR, 2021. [42] Shreyas Sundara Raman, Vanya Cohen, Eric Rosen, Ifrah Idrees, David Paulius, and Stefanie Tellex. Planning with large language models via corrective re-prompting. arXiv preprint arXiv:2211.09935 , 2022. [43] Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo, Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky, Jackie Kay, Jost Tobias Springenberg, et al. A generalist agent. arXiv preprint arXiv:2205.06175 , 2022. [44] Stuart Jonathan Russell. Norvig (2003). Artificial intelligence: a modern approach , 25:26, 2003. [45] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettle- moyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761 , 2023. [46] Noah Shinn, Beck Labash, and Ashwin Gopinath. Reflexion: an autonomous agent with dynamic memory and self-reflection. arXiv preprint arXiv:2303.11366 , 2023. [47] Mohit Shridhar, Xingdi Yuan, Marc-Alexandre Côté, Yonatan Bisk, Adam Trischler, and Matthew Hausknecht. ALFWorld: Aligning Text and Embodied Environments for Interactive Learning. In Proceedings of the International Conference on Learning Representations (ICLR) , 2021. [48] Tom Silver, Varun Hariprasad, Reece S Shuttleworth, Nishanth Kumar, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Pddl planning with pretrained large language models. In NeurIPS 2022 Foundation Models for Decision Making Workshop , 2022. [49] Ron M Simpson, Diane E Kitchin, and Thomas Leo McCluskey. Planning domain definition using gipo. The Knowledge Engineering Review , 22(2):117–134, 2007. [50] Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, and Animesh Garg. Progprompt: Generating situated robot task plans using large language models. arXiv preprint arXiv:2209.11302 , 2022. [51] Chan Hee Song, Jiaman Wu, Clayton Washington, Brian M Sadler, Wei-Lun Chao, and Yu Su. Llm-planner: Few-shot grounded planning for embodied agents with large language models. arXiv preprint arXiv:2212.04088 , 2022. [52] Sarath Sreedharan, Tathagata Chakraborti, Christian Muise, Yasaman Khazaeni, and Subbarao Kambhampati. –d3wa+–a case study of xaip in a model acquisition task for dialogue planning. In Proceedings of the International Conference on Automated Planning and Scheduling , volume 30, pages 488–497, 2020. [53] Kaya Stechly, Matthew Marquez, and Subbarao Kambhampati. Gpt-4 doesn’t know it’s wrong: An analysis of iterative prompting for reasoning problems. arXiv preprint arXiv:2310.12397 , 2023. [54] Karthik Valmeekam, Matthew Marquez, and Subbarao Kambhampati. Can large language models really improve by self-critiquing their own plans? arXiv preprint arXiv:2310.08118 , 2023. [55] Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, and Subbarao Kambhampati. On the planning abilities of large language models–a critical investigation. arXiv preprint arXiv:2305.15771 , 2023. [56] Zihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, and Yitao Liang. Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. arXiv preprint arXiv:2302.01560 , 2023. 13 Page 14: [57] Lionel Wong, Gabriel Grand, Alexander K Lew, Noah D Goodman, Vikash K Mansinghka, Jacob Andreas, and Joshua B Tenenbaum. From word models to world models: Translating from natural language to the probabilistic language of thought. arXiv preprint arXiv:2306.12672 , 2023. [58] Yaqi Xie, Chen Yu, Tongyao Zhu, Jinbin Bai, Ze Gong, and Harold Soh. Translating natural language to planning goals with large-language models. arXiv preprint arXiv:2302.05128 , 2023. [59] Qiang Yang, Kangheng Wu, and Yunfei Jiang. Learning action models from plan examples using weighted max-sat. Artificial Intelligence , 171(2-3):107–143, 2007. [60] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik R Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. In The Eleventh International Conference on Learning Representations , 2023. [61] Hankz Hankui Zhuo, Qiang Yang, Derek Hao Hu, and Lei Li. Learning complex action models with quantifiers and logical implications. Artificial Intelligence , 174(18):1540–1569, 2010. [62] Hankz Hankui Zhuo, Yantian Zha, Subbarao Kambhampati, and Xin Tian. Discovering underly- ing plans based on shallow models. ACM Transactions on Intelligent Systems and Technology (TIST) , 11(2):1–30, 2020. 14 Page 15: A Appendix Contents 1 Introduction 1 2 Related Work 3 3 Problem Setting and Background 3 3.1 Classical planning problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 3.2 PDDL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 4 Methodology 4 4.1 Constructing PDDL models with LLMs . . . . . . . . . . . . . . . . . . . . . . . 4 4.2 Correcting errors in the initial PDDL models . . . . . . . . . . . . . . . . . . . . . 6 4.3 Generating plans with the extracted PDDL models . . . . . . . . . . . . . . . . . 7 5 Empirical Evaluation 7 5.1 Constructing PDDL . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 5.2 Correcting PDDL with domain experts . . . . . . . . . . . . . . . . . . . . . . . . 8 5.3 Generating plans with the extracted PDDL models . . . . . . . . . . . . . . . . . 9 6 Conclusion 10 A Appendix 15 A.1 Broader impact on using LLMs . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 A.2 Additional discussion on alternative to action-by-action PDDL construction . . . . 16 A.3 Examples of feedback messages that capture syntax errors . . . . . . . . . . . . . 17 A.4 Techniques that assist users to locate errors in PDDL models . . . . . . . . . . . . 17 A.5 Detailed description of the Household domain . . . . . . . . . . . . . . . . . . . . 18 A.6 Household: Constructing PDDL Models . . . . . . . . . . . . . . . . . . . . . . . 18 A.6.1 An example prompt for constructing PDDL models of the action "close a small receptacle" . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 A.6.2 Navigate to a furniture piece or an appliance . . . . . . . . . . . . . . . . 20 A.6.3 Pick up an object on or in a furniture piece or an appliance . . . . . . . . . . 21 A.6.4 Put an object on or in a furniture piece or an appliance . . . . . . . . . . . . 21 A.6.5 Stack Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 A.6.6 Unstack Objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 A.6.7 Open a furniture piece or an appliance . . . . . . . . . . . . . . . . . . . . 23 A.6.8 Close a furniture piece or an appliance . . . . . . . . . . . . . . . . . . . . 23 A.6.9 Toggle a small appliance on . . . . . . . . . . . . . . . . . . . . . . . . . 24 A.6.10 Toggle a small appliance off . . . . . . . . . . . . . . . . . . . . . . . . . 24 A.6.11 Slice an object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 A.6.12 Heat food with a microwave . . . . . . . . . . . . . . . . . . . . . . . . . 25 A.6.13 Heat food with pan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 A.6.14 Transfer food from one small receptacle to another . . . . . . . . . . . . . 27 A.6.15 Put an object onto or into a small receptacle like a bowl and plate . . . . . 27 A.6.16 Pick up an object on or in a small receptacle like a bowl and plate . . . . . 28 A.6.17 Open a small receptacle such as a lunch box with a lid . . . . . . . . . . . 29 A.6.18 Close a small receptacle such as a lunch box with a lid . . . . . . . . . . . 30 A.6.19 Mash food with a blender . . . . . . . . . . . . . . . . . . . . . . . . . . 30 A.6.20 Wash an object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 A.6.21 Wipe a surface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 A.6.22 Vacuum a carpet . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 A.6.23 Empty a vacuum cleaner . . . . . . . . . . . . . . . . . . . . . . . . . . . 32 A.6.24 Examples of PDDL action models constructed by GPT-3.5-Turbo . . . . . 33 A.6.25 The initial set of predicates extracted by GPT-4 . . . . . . . . . . . . . . . 34 A.7 Household: Correcting PDDL Models . . . . . . . . . . . . . . . . . . . . . . . . 35 A.7.1 Heat food with a pan . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35 15 Page 16: A.7.2 Slice an object . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37 A.7.3 GPT-3.5-Turbo trying to correct the action model of "slice an object" . . . 38 A.8 Logistics: Constructing PDDL Models . . . . . . . . . . . . . . . . . . . . . . . . 39 A.8.1 Load a package into a truck . . . . . . . . . . . . . . . . . . . . . . . . . 39 A.8.2 Unload a package from a truck . . . . . . . . . . . . . . . . . . . . . . . . 39 A.8.3 Load a package into an airplane . . . . . . . . . . . . . . . . . . . . . . . 40 A.8.4 Unload a package from an airplane . . . . . . . . . . . . . . . . . . . . . 40 A.8.5 Drive a truck from one location to another in a city . . . . . . . . . . . . . 40 A.8.6 Fly an airplane from one city to another . . . . . . . . . . . . . . . . . . . . 41 A.8.7 Examples of PDDL action models constructed by GPT-3.5-Turbo . . . . . . 41 A.8.8 The initial set of predicates extracted by GPT-4 . . . . . . . . . . . . . . . 42 A.9 Logistics: Correcting PDDL Models . . . . . . . . . . . . . . . . . . . . . . . . . 42 A.9.1 Fly an airplane from one city to another . . . . . . . . . . . . . . . . . . . 42 A.9.2 GPT-3.5-Turbo trying to correct the action model of "fly an airplane from one city to another" . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 A.10 Tyreworld: Constructing PDDL Models . . . . . . . . . . . . . . . . . . . . . . . 44 A.10.1 Open a container . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44 A.10.2 Close a container . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45 A.10.3 Fetch an object from a container . . . . . . . . . . . . . . . . . . . . . . . 45 A.10.4 Put an object into a container . . . . . . . . . . . . . . . . . . . . . . . . . 45 A.10.5 Loosen a nut in a hub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 A.10.6 Tighten a nut in a hub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46 A.10.7 Jack up a hub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 A.10.8 Jack down a hub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 A.10.9 Unfasten a hub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 A.10.10 Fasten a hub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48 A.10.11 Remove wheel from hub . . . . . . . . . . . . . . . . . . . . . . . . . . 48 A.10.12 Put wheel on hub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 A.10.13 Inflate wheel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49 A.10.14 Examples of PDDL action models constructed by GPT-3.5-Turbo . . . . . 50 A.10.15 The initial set of predicates extracted by GPT-4 . . . . . . . . . . . . . . 50 A.11 Tyreworld: Correcting PDDL Models . . . . . . . . . . . . . . . . . . . . . . . . . 51 A.11.1 Unfasten a hub . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 A.11.2 Inflate wheel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52 A.12 LLM planners back-prompted by V AL using LLM-acquired PDDL model . . . . . 53 A.12.1 Examples of prompts for LLM planners . . . . . . . . . . . . . . . . . . . 53 A.12.2 Translation to admissible actions . . . . . . . . . . . . . . . . . . . . . . . 55 A.12.3 Back-prompting LLM planners with validation feedback by V AL . . . . . 56 A.12.4 Examples of validation feedback and instructions with ordering constraints 57 A.13 Translating user instructions into PDDL goal specifications . . . . . . . . . . . . . 59 A.1 Broader impact on using LLMs There is a general temptation to use LLMs for a variety of tasks, including plan generation. Given the fact that LLMs cannot guarantee the generation of correct plans, this can lead to safety and security issues downstream. Our approach of teasing a domain model from LLMs, and using it in conjunction with external sound planners aims to mitigate these safety concerns. Nevertheless, given that humans are still in charge of verifying the correctness of the domain models extracted from LLMs, there is still the possibility that an incorrect or undesirable domain model is inadvertently certified correct, leading to undesirable plans and agent behaviors down the line. Improved explainability is another advantage of extracting explicit domain models. As opposed to just directly querying LLMs for plans, generating behaviors with (intermediate) symbolic models offers additional opportunities for explanation, drawing upon existing works in explainable AI [ 23]. In cases where the user’s understanding does not align with the transcribed model, debugging and model reconciliation techniques such as D3wa+ [52] can also be directly applied. A.2 Additional discussion on alternative to action-by-action PDDL construction One alternative to our action-by-action generation could be to include descriptions of all the actions in the prompt and require the LLM to construct the entire domain model in a single dialogue. This 16 Page 17: approach may allow the LLM to better establish a global view of all the actions. However, we do not pursue this alternative here for the following reasons: (a) the inclusion of all actions might result in a lengthy prompt, potentially exceeding the context window size of an LLM. This could pose practical issues for utilizing smaller language models (e.g., GPT-3.5-Turbo [ 36]) or attempting to train smaller specialized models; (b) our integration of corrective feedback relies on continuing the construction dialogue (Sec. 4.2), which necessitates a shorter initial prompt to fit within the context window; (c) our experiments indicate that the action-by-action construction approach already achieves satisfactory results A.3 Examples of feedback messages that capture syntax errors Recall that we rely on the PDDL validator in V AL to identify syntax errors. However, several "simpler" syntax errors can be easily detected using simple Python scripts. In our experiments, we wrote our scripts to capture such syntax errors. Also note that since these errors can be detected at a minimal cost, the corresponding feedback messages are directly provided to the LLMs, and they are not counted in the results reported in Table 1. These "simpler" syntax errors include: 1.In this work, we only consider standard base-level PDDL. However, it’s possible that the LLMs have seen various extensions of PDDL and might use them in the constructed domain models. Hence, we provide a feedback message to the LLM whenever we detect an unsupported keyword. One example feedback is: " The precondition or effect contain the keyword ‘forall’ that is not supported in a standard STRIPS style model. Please express the same logic in a simplified way. You can come up with new predicates if needed (but note that you should use existing predicates as much as possible). " 2.Newly created predicates might have the same names as existing object types, which is not allowed in PDDL. In such cases, a feedback message is provided to notify the LLM about the name clash. For instance, a message might state: " The following predicate(s) have the same name(s) as existing object types: 1. ‘smallReceptacle’. Please rename these predicates. " 3.Newly created predicates might have the same names as existing predicates, which is not allowed in PDDL. Also, LLMs often mistakenly list existing predicates under the ‘New Predicates’ section. In such cases, a feedback message is provided to notify the LLM about the name clash or mistake. For instance, a message might state: " The following predicate(s) have the same name(s) as existing predicate(s): 1. (cutting-board ?z - smallReceptacle), true if the small receptacle ?z is a cutting board | existing predicate with the same name: (cutting-board ?z - householdObject), true if the object ?z is a cutting board. You should reuse existing predicates whenever possible. If you are reusing existing predicate(s), you shouldn’t list them under ‘New Predicates’. If existing predicates are not enough and you are devising new predicate(s), please use names that are different from existing ones. Please revise the PDDL model to fix this error. " This is the most common syntax error made by GPT-4 in our experiments. 4.The LLMs might fail to only use the object types given in the prompt. An example feedback can be: " There is an invalid object type ‘pump’ for the parameter ?p. " In our experiments, the above error types encompass the majority of syntax errors made by GPT-4. In some less-common cases, GPT-4 may have problems with predicate usage, usually caused by mis-matched object types. This kind of error can be captured by V AL and an example feedback message can be: " There is a syntax error, the second parameter of ‘object-on’ should be a furnitureAppliance, but a householdObject was given. Please use the correct predicate or devise new one(s) if needed. " A.4 Techniques that assist users to locate errors in PDDL models Several well-established techniques and tools are available for locating errors in PDDL models. For example, graphical tools like GIPO [ 49] can effectively visualize the causal dependencies of actions. However, these advanced tools or techniques are beyond the scope of this work. Here, we outline a viable solution as a starting point for users who are unfamiliar with these tools. 17 Page 18: There are two stages in which corrective feedback can be obtained: during the construction of the PDDL model, and when the domain model is used to generate a plan. In the first stage, end users can direclty review the domain model and identify potential factual errors. Since all the predicates and parameters are accompanied by natural language descriptions, we can easily convert the PDDL model into natural language and present it to the users. This allows users to pre-screen the preconditions and effects. Note that we do not expect all factual errors to be caught in this stage, because the users may not be aware of certain constraints until they review the final plans. In the second stage, the PDDL model is used to solve downstream planning problems by following the procedure outlined in Sec. 4.3. Two possible cases may arise here: (a) no plan can be found for the given goal specification, or (b) at least one plan is found but it either gets rejected by the users or results in execution failure in the actual environment. To address the first case, we can request the users to suggest a goal-satisficing plan, which is supposed to be executable (but not necessarily optimal). We then use the generated PDDL model to "validate" the suggested plan. This allows us to find the first step in the plan that has an unsatisfied precondition(s). The models of all actions up to this step, along with the unmet precondition(s), are then converted to natural language and presented to the user for inspection. As an example, in the model of "slice an object" extracted by GPT-4 (Sec. A.6.11 in Appendix), the model requires the object to be placed on a cutting board and a furniture piece at the same time, which is not physically possible. By leveraging a user-suggested plan, we can identify the potentially erroneous model(s) and flag incorrect precondition(s). In the second case where an invalid plan is afforded by the PDDL model, there are typically missing preconditions or effects in the actions taken, both during and prior to the execution failure. The users can bring attention to these actions. A.5 Detailed description of the Household domain We consider a single-arm robot model that closely resembles the SPOT robot and the Fetch Mobile Manipulator. Consequently, the robot is incapable of grasping multiple objects simultaneously or executing manipulation actions while holding irrelevant items (e.g., opening a fridge door while holding a mug). We also ensure the constraints align with real-world robotic capabilities. For example, we recognize that robot arms may be significantly less flexible than human arms, and therefore, we require certain manipulation tasks to be performed on furniture pieces with open and flexible surfaces (e.g., the robot can only pick up food items from a lunch box when the lunch box is placed on a kitchen countertop instead of inside a fridge). The list of actions and their descriptions can be found in Appendix starting from Sec. A.6.2. The PDDL-construction prompt for this domain includes a general description of the domain, which outlines the tasks to be performed by the robot, the types of objects involved, and details of the robot’s morphology. An example of a complete prompt can be found in Fig. A.6.1. A.6 Household: Constructing PDDL Models The AI agent here is a household robot that can navigate to various large and normally immovable furniture pieces or appliances in the house to carry out household tasks. Note that the robot has only one gripper, so (a) it can only hold one object; (b) it shouldn’t hold any other irrelevant objects in its gripper while performing some manipulation tasks (e.g., opening a drawer or closing a window); (c) operations on small household items should be carried out on furniture with a flat surface to get enough space for manipulation. There are three major types of objects in this domain: robot, furnitureAppliance, and householdObject. The object type furnitureAppliance covers large and normally immovable furniture pieces or appliances, such as stove burners, side tables, dining tables, drawer, cabinets, or microwaves. The object type householdObject covers all other small household items, such as handheld vacuum cleaners, cloth, apples, bananas, and small receptacles like bowls and lunch boxes. There is a subtype of householdObject called smallReceptacle that covers small receptacles like bowls, lunch boxes, plates, etc. In this domain, the locations of the robot and small household items (e.g., apples, oranges, bowls, lunch boxes or lamps) are determined by large and normally immovable furniture pieces or appliances. 18 Page 19: A.6.1 An example prompt for constructing PDDL models of the action "close a small receptacle" An example prompt for constructing PDDL models of the action "close a small receptacle" Instructions for the PDDL generation task You are defining the preconditions and effects (represented in PDDL format) of an AI agent 's actions. Information about the AI agent will be provided in the domain description. Note that individual conditions in preconditions and effects should be listed separately. For example, “object_1 is washed and heated” should be considered as two separate conditions “object_1 is washed” and “object_1 is heated”. Also, in PDDL, two predicates cannot have the same name even if they have different parameters. Each predicate in PDDL must have a unique name, and its parameters must be explicitly defined in the predicate definition. It is recommended to define predicate names in an intuitive and readable way.,→ ,→ ,→ ,→ ,→ ,→ ,→ One or two examples from other domains for illustrating the input and output formats Here are two examples from the classical BlocksWorld domain for demonstrating the output format. Domain information: BlocksWorld is a planning domain in artificial intelligence. The AI agent here is a mechanical robot arm that can pick and place the blocks. Only one block may be moved at a time: it may either be placed on the table or placed atop another block. Because of this, any blocks that are, at a given time, under another block cannot be moved. There is only one type of object in this domain, and that is the block.,→ ,→ ,→ ,→ Example 1 Action: This action enables the robot to put a block onto the table. For example, the robot puts block_1 onto the table. ,→ You can create and define new predicates, but you may also reuse the following predicates: No predicate has been defined yet Parameters: 1. ?x - block: the block to put down Preconditions: ``` (and (robot-holding ?x) ) ``` Effects: ``` (and (not (robot-holding ?x)) (block-clear ?x) (robot-hand-empty) (block-on-table ?x) ) ``` New Predicates: 1. (robot-holding ?x - block): true if the robot arm is holding the block ?x 2. (block-clear ?x - block): true if the block ?x is not under any another block 3. (robot-hand-empty): true if the robot arm is not holding any block 4. (block-on-table ?x - block): true if the block ?x is placed on the table Example 2 Action: This action enables the robot to pick up a block on the table. You can create and define new predicates, but you may also reuse the following predicates: 1. (robot-holding ?x - block): true if the robot arm is holding the block ?x 2. (block-clear ?x - block): true if the block ?x is not under any another block 3. (robot-hand-empty): true if the robot arm is not holding any block 4. (block-on-table ?x - block): true if the block ?x is placed on the table Parameters: 1. ?x - block: the block to pick up Preconditions: ``` (and (block-clear ?x) (block-on-table ?x) (robot-hand-empty) ) ``` 19 Page 20: Effects: ``` (and (not (block-on-table ?x)) (not (block-clear ?x)) (not (robot-hand-empty)) (robot-holding ?x) ) ``` New Predicates: No newly defined predicate Here is the task. A natural language description of the domain Domain information: The AI agent here is a household robot that can navigate to various large and normally immovable furniture pieces or appliances in the house to carry out household tasks. Note that the robot has only one gripper, so (a) it can only hold one object; (b) it shouldn 't hold any other irrelevant objects in its gripper while performing some manipulation tasks (e.g., opening a drawer or closing a window); (c) operations on small household items should be carried out on furniture with a flat surface to get enough space for manipulation. There are three types of objects in this domain: robot, furnitureAppliance, and householdObject. The object type furnitureAppliance covers large and normally immovable furniture pieces or appliances, such as stove burners, side tables, dining tables, drawer, cabinets, or microwaves. The object type householdObject covers all other small household items, such as handheld vacuum cleaners, cloth, apples, bananas, and small receptacles like bowls and lunch boxes. In this domain, the locations of the robot and small household items (e.g., apples, oranges, bowls, lunch boxes or lamps) are determined by large and normally immovable furniture pieces or appliances.,→ ,→ ,→ ,→ ,→ ,→ ,→ ,→ ,→ ,→ ,→ ,→ ,→ A natural language description of the action Action: This action enables the robot to close a small receptacle receptacle_1 that is openable (e.g. small storage boxes or lunch boxes with lids). For example, the robot closes lunch_box_2, or the robot closes storage_box_1.,→ ,→ The dynamically updated list of predicates You can create and define new predicates, but you may also reuse the following predicates: 1. (robot-at ?x - furnitureAppliance): true if the robot is at the furniture or appliance ?x 2. (object-on ?x - householdObject ?y - furnitureAppliance): true if the object ?x is on or in the furniture or appliance ?y ,→ 3. (pickupable ?x - householdObject): true if the object ?x can be picked up by the robot 4. (closed-receptacle ?x - smallReceptacle): true if the small receptacle ?x is closed ... Parameters: A.6.2 Navigate to a furniture piece or an appliance Action description This action enables the robot to navigate from one normally immovable piece of furniture to another (e.g., dining tables, side tables, cabinets, and sinks) or an appliance (e.g., refrigerators, coffee makers, microwaves, and washers).,→ ,→ GPT-4: Navigate to a furniture piece or an appliance Parameters: 1. ?x - furnitureAppliance: the furniture or appliance the robot is currently at 2. ?y - furnitureAppliance: the furniture or appliance the robot wants to navigate to Preconditions: (and (robot-at ?x) (not (= ?x ?y)) ) Effects: (and (not (robot-at ?x)) (robot-at ?y) ) 20 Page 21: A.6.3 Pick up an object on or in a furniture piece or an appliance Action description This action enables the robot to pick up an object object_1 in/on a large piece of furniture (e.g., dining tables, drawers, cabinets) or an appliance (e.g., dishwashers and refrigerators). For example, the robot pick up coffee_mug_1 on dining_table_1, the robot pick up fork_1 in drawer_1, or the robot pick up apple_1 in refrigerator_1.,→ ,→ ,→ Additional information from the user (a) This action is only applicable for objects that are pickupable by the robot. (b) The furniture piece or appliance should be opened if it is openable. (c) The object to pick up should not be stacked on top of other household items. GPT-4: Pick up an object on or in a furniture piece or an appliance Parameters: 1. ?x - householdObject: the object to pick up 2. ?y - furnitureAppliance: the furniture or appliance where the object is located Preconditions: (and (robot-at ?y) (object-on ?x ?y) (pickupable ?x) Inaccurate precondition: "?x is not stacked on top of other object" should only apply when ?x is a stackable object (not (object-stacked ?x)) (robot-hand-empty) (or (not (openable ?y)) (opened ?y) ) Missing precondition: there should be no other object stacked on top of ?x if ?x is a stackable object ) Effects: (and (not (object-on ?x ?y)) (robot-holding ?x) (not (robot-hand-empty)) ) A.6.4 Put an object on or in a furniture piece or an appliance Action description This action enables the robot to put an object object_1 in/on a large piece of furniture (e.g., dining tables, drawers, cabinets) or an appliance (e.g., dishwashers and refrigerators). For example, the robot puts coffee_mug_1 on dining_table_1, the robot puts fork_1 in drawer_1, or the robot puts apple_1 in refrigerator_1.,→ ,→ ,→ Additional information from the user (a) The furniture piece or appliance should be opened if it is openable. GPT-4: Put an object on or in a furniture piece or an appliance Parameters: 1. ?x - householdObject: the object to put in/on the furniture or appliance 2. ?y - furnitureAppliance: the furniture or appliance to put the object in/on Preconditions: (and (robot-at ?y) (robot-holding ?x) (pickupable ?x) 21 Page 22: Unnecessary precondition "no other object stacked on object ?x" (object-clear ?x) (or (not (openable ?y)) (opened ?y) ) ) Effects: (and (not (robot-holding ?x)) (robot-hand-empty) (object-on ?x ?y) Syntax error, there is an illegal keyword "if" (if (openable ?y) (closed ?y)) ) A.6.5 Stack Objects Action description This action enables the robot to stack one object object_1 on top of another object object_2 that is on the same piece of furniture. For example, the robot stacks book_1 on top of book_2, or the robot stacks lunch_box_1 on top of lunch_box_2.,→ ,→ Additional information from the user (a) The furniture piece has to have an open and flat surface. (b) Note that both object_1 and object_2 must be stackable objects. (c) You may assume the robot is holding object_1 at the beginning. (d) Also, there should be no other object on top of object_2. GPT-4: Stack objects Parameters: 1. ?x - householdObject: the object to stack on top of another object 2. ?y - householdObject: the object on which ?x will be stacked 3. ?z - furnitureAppliance: the furniture or appliance on which both objects are placed Preconditions: (and (robot-holding ?x) (object-on ?y ?z) (object-clear ?y) (flat-surface ?z) Incorrect precondition: should be stackable (pickupable ?x) Incorrect precondition: should be stackable (pickupable ?y) (robot-at ?z) ) Effects: (and (not (robot-holding ?x)) (not (object-clear ?y)) (robot-hand-empty) (object-on ?x ?z) (object-stacked ?x) Missing effect: object ?x should be stacked on top of object ?y ) A.6.6 Unstack Objects Action description This action enables the robot to unstack one object object_1 that is on top of another object object_2. For example, the robot stacks book_1 on top of book_2, or the robot stacks lunch_box_1 on top of lunch_box_2. The robot will hold the object after unstacking it.,→ ,→ 22 Page 23: GPT-4: Unstack objects Parameters: 1. ?x - householdObject: the object to unstack from another object 2. ?y - householdObject: the object that ?x is stacked on 3. ?z - furnitureAppliance: the furniture or appliance where the objects are stacked Preconditions: (and (robot-at ?z) (object-on ?x ?z) (object-on ?y ?z) (pickupable ?x) (object-stacked ?x) (object-clear ?x) (robot-hand-empty) Missing precondition: object ?x should be stacked on top of object ?y ) Effects: (and (not (object-on ?x ?z)) (not (object-stacked ?x)) Incorrect effect, this action shouldn 't change whether or not object ?x is under any another household object (not (object-clear ?x)) (not (robot-hand-empty)) (robot-holding ?x) (object-clear ?y) Missing effect: object ?x is no longer stacked on top of object ?y ) A.6.7 Open a furniture piece or an appliance Action description This action enables the robot to open a large piece of furniture (e.g., cabinets and drawers) or an appliance (e.g., dishwashers and refrigerators) that is openable. For example, the robot opens cabinet_1, or the robot opens drawer_2.,→ ,→ GPT-4: Open a furniture piece or an appliance Parameters: 1. ?y - furnitureAppliance: the furniture or appliance to be opened Preconditions: (and (robot-at ?y) (openable ?y) (closed ?y) (robot-hand-empty) ) Effects: (and (not (closed ?y)) (opened ?y) ) A.6.8 Close a furniture piece or an appliance Action description This action enables the robot to close a large piece of furniture (e.g., cabinets and drawers) or an appliance (e.g., dishwashers and refrigerators) that is openable. For example, the robot closes cabinet_1, or the robot closes drawer_2.,→ ,→ 23 Page 24: GPT-4: Close a furniture piece or an appliance Parameters: 1. ?y - furnitureAppliance: the furniture or appliance to close Preconditions: (and (robot-at ?y) (openable ?y) (opened ?y) (robot-hand-empty) ) Effects: (and (not (opened ?y)) (closed ?y) ) A.6.9 Toggle a small appliance on Action description This action enables the robot to toggle a small appliances (like humidifiers and light bulbs) to switch them on. For example, the robot toggles humidifier_2 on, or the robot toggle light_bulb_1 on.,→ ,→ GPT-4: Toggle a small appliance on Parameters: 1. ?x - householdObject: the small appliance to toggle on 2. ?y - furnitureAppliance: the furniture or appliance where the small appliance is located Preconditions: (and (robot-at ?y) (object-on ?x ?y) (robot-hand-empty) (not (appliance-on ?x)) Missing precondition: the appliance to toggle should be a toggleable appliance ) Effects: (and (appliance-on ?x) ) A.6.10 Toggle a small appliance off Action description This action enables the robot to toggle a small appliances (like humidifiers and light bulbs) to switch them off. For example, the robot toggles humidifier_2 off, or the robot toggle light_bulb_1 off.,→ ,→ GPT4: Toggle a small appliance off Parameters: 1. ?r - robot: the robot performing the action 2. ?o - householdObject: the small appliance to be toggled off 3. ?f - furnitureAppliance: the furniture or appliance where the small appliance is located Preconditions: (and (robot-at ?r ?f) (object-in-on-furniture ?o ?f) (robot-gripper-empty ?r) (appliance-on ?o) ) Effects: 24 Page 25: (and (not (appliance-on ?o)) ) A.6.11 Slice an object Action description This action enables the robot to slice objects (like fruits and vegetables) with a knife. For example, the robot slices orange_1 with knife_2, the robot slices potato_3 with knife_1. ,→ Additional information from the user (a) Note that the object to slice needs to be placed on a cutting board. A cutting board can be viewed as a special kind of receptacle. ,→ (b) You may assume the robot is holding the knife in its gripper at the beginning. (c) The object to slice should be sliceable. (d) The furniture piece needs to have an open and flat surface to allow enough space for manipulation (e.g., countertops or tables). ,→ GPT-4: Slice an object Parameters: 1. ?x - householdObject: the object to slice 2. ?k - householdObject: the knife to use for slicing 3. ?y - furnitureAppliance: the furniture or appliance with a flat surface for manipulation Incorrect object type, a cutting board should be viewed as a small receptacle as mentioned in the prompt 4. ?z - householdObject: the cutting board on which the object is placed Preconditions: (and (robot-at ?y) Incorrect precondition: ?x should be on the cutting board not directly on the furniture ?y (object-on ?x ?y) (object-on ?z ?y) (robot-holding ?k) (sliceable ?x) (cutting-board ?z) (knife ?k) (flat-surface ?y) (object-whole ?x) ) Effects: (and (not (object-whole ?x)) (object-sliced ?x) ) A.6.12 Heat food with a microwave Action description This action enables the robot to start a microwave and heat up the food inside. For example, the robot heats pizza_1 with microwave_2, the robot heats pasta_2 with microwave_1. ,→ Additional information from the user (a) The food to heat should be placed in a small receptacle (e.g., a plate or a bowl). (b) The robot also needs to close the door of the microwave before taking this action. (c) Note that the food is no longer pickupable after it has been heated. GPT-4: Heat food with a microwave Parameters: 1. ?x - householdObject: the food to heat up 25 Page 26: 2. ?y - furnitureAppliance: the microwave to use 3. ?z - smallReceptacle: the small receptacle containing the food Preconditions: (and (robot-at ?y) Incorrect precondition: the object should be in the receptacle not directly in the microwave (object-on ?x ?y) (object-in ?x ?z) (is-small-receptacle ?z) (object-on ?z ?y) Incorrect precondition: ?x needs not be pickupable (pickupable ?x) (closed ?y) (not (food-heated ?x)) Missing precondition: ?y should be a microwave ) Effects: (and (not (pickupable ?x)) (food-heated ?x) ) A.6.13 Heat food with pan Action description This action enables the robot to heat food which is heatable with a pan. For example, the robot heats pizza_1 with pan_2, the robot heats pasta_2 with pan_1. ,→ Additional information from the user (a) Note that the food should be placed on the pan, and the pan needs to be placed on a stove burner before executing this action. ,→ (b) A stove burner can be viewed as an immovable appliance, and a pan can be viewed as a small receptacle. ,→ (c) Note that the food is no longer pickupable after it has been heated. GPT-4: Heat food with pan Parameters: 1. ?f - householdObject: the food to be heated Inaccurate object type, a pan ?p can be viewed as a small receptacle 2. ?p - householdObject: the pan to be used for heating the food 3. ?s - furnitureAppliance: the stove burner on which the pan is placed Preconditions: (and (robot-at ?s) Syntax error: the second parameter of `object-on `should be a furnitureAppliance, but a householdObject was given (object-on ?f ?p) (object-on ?p ?s) Incorrect precondition: ?f needs not be pickupable (pickupable ?f) (is-small-receptacle ?p) Incorrect precondition: a stove or burner ?s doesn 't need to have a flat surface (flat-surface ?s) (robot-hand-empty) Missing precondition: ?p has to be a pan Missing precondition: ?s has to be a stove burner ) Effects: (and (not (pickupable ?f)) (food-heated-pan ?f) ) 26 Page 27: A.6.14 Transfer food from one small receptacle to another Action description This action enables the robot to transfer food food_1 from one small receptacle receptacle_1 to another small receptacle receptacle_2. This skill is usually used to transfer food that can not be easily picked up by the robot. For example, the robot can pour the milk from milk_box_1 to mug_1, or the robot can transfer heated pizza_1 from pan_2 to plate_1.,→ ,→ ,→ Additional information from the user (a) The furniture piece needs to have an open and flat surface to allow enough space for manipulation (e.g., countertops or tables). ,→ (b) A receptacle should be opened if it is openable. (c) Both receptacles should not be stacked on top of other objects if they are stackable. GPT-4: Transfer food from one small receptacle to another Parameters: 1. ?food - householdObject: the food to be transferred 2. ?receptacle1 - smallReceptacle: the small receptacle from which the food will be transferred 3. ?receptacle2 - smallReceptacle: the small receptacle to which the food will be transferred 4. ?furniture - furnitureAppliance: the furniture piece with a flat surface where the transfer will take place ,→ Preconditions: (and (robot-at ?furniture) (object-on ?receptacle1 ?furniture) (object-on ?receptacle2 ?furniture) (object-in ?food ?receptacle1) (is-small-receptacle ?receptacle1) (is-small-receptacle ?receptacle2) (flat-surface ?furniture) (robot-hand-empty) Inaccurate precondition: "?receptacle1 is not stacked on other objects" should only apply if ?receptacle1 is a stackable object (not (object-stacked ?receptacle1)) Inaccurate precondition: "?receptacle2 is not stacked on other objects" should only apply if ?receptacle2 is a stackable object (not (object-stacked ?receptacle2)) (or (not (openable-receptacle ?receptacle1)) (opened-receptacle ?receptacle1)) (or (not (openable-receptacle ?receptacle2)) (opened-receptacle ?receptacle2)) Missing precondition: there should be no other object stacked on top of ?receptacle1 if ?receptacle1 is stackable Missing precondition: there should be no other object stacked on top of ?receptacle2 if ?receptacle2 is stackable ) Effects: (and (not (object-in ?food ?receptacle1)) (object-in ?food ?receptacle2) ) A.6.15 Put an object onto or into a small receptacle like a bowl and plate Action description This action enables the robot to put an object object_1 into/onto a small receptacle receptacle_1 (e.g. storage boxes, bowls, plates, or pans). For example, the robot puts apple_1 into blow_1, or the robot puts apple_2 into lunch_box_2.,→ ,→ Additional information from the user (a) The furniture piece needs to have an open and flat surface to allow enough space for manipulation (e.g., countertops or tables). ,→ (b) The receptacle should be opened if it is openable. (c) The receptacle should not be stacked on top of other objects if it is stackable. 27 Page 28: GPT-4: Put an object onto or into a small receptacle like a bowl and plate Parameters: 1. ?x - householdObject: the object to put into/onto the small receptacle 2. ?y - furnitureAppliance: the furniture or appliance with a flat surface for manipulation 3. ?z - smallReceptacle: the small receptacle to put the object into/onto Preconditions: (and (robot-at ?y) Inaccurate precondition: object ?x can 't be on the furnitureAppliance ?y when ?x is held by the robot (object-on ?x ?y) (pickupable ?x) Inaccurate precondition: this should only apply if ?z is stackable (not (object-stacked ?z)) (flat-surface ?y) (robot-holding ?x) (is-small-receptacle ?z) (object-on ?z ?y) Inaccurate precondition: this should only apply if ?z is stackable (object-clear ?z) Unnecessary precondition: the robot holding the object ?x already implies ?x is not in the receptacle ?z (not (object-in ?x ?z)) (or (not (openable-receptacle ?z)) (opened-receptacle ?z) ) ) Effects: (and (not (robot-holding ?x)) (not (object-on ?x ?y)) (robot-hand-empty) (object-in ?x ?z) ) A.6.16 Pick up an object on or in a small receptacle like a bowl and plate Action description This action enables the robot to pick up an object object_1 in some small receptacle receptacle_1 (e.g. storage boxes, lunch boxes, bowls, plates). For example, the robot picks up apple_1 in blow_1, or the robot picks up apple_2 in lunch_box_2.,→ ,→ Additional information from the user (a) The furniture piece needs to have an open and flat surface to allow enough space for manipulation (e.g., countertops or tables). ,→ (b) The receptacle should be opened if it is openable. GPT-4: Pick up an object on or in a small receptacle like a bowl and plate Parameters: 1. ?x - householdObject: the object to pick up 2. ?y - furnitureAppliance: the furniture or appliance where the object is located 3. ?z - smallReceptacle: the small receptacle containing the object Preconditions: (and (robot-at ?y) Incorrect precondition: the object ?x is in the receptacle ?z so ?x is not directly on the furniture or appliance ?y (object-on ?x ?y) (pickupable ?x) (object-in ?x ?z) (is-small-receptacle ?z) Inaccurate precondition: "?z is not stacked on other objects" should only apply if ?z is a stackable object (not (object-stacked ?z)) 28 Page 29: (flat-surface ?y) (robot-hand-empty) Incorrect precondition: ?x can 't be stacked inside a receptacle so the precondition `?x is not under other household object `is unnecessary (object-clear ?x) Incorrect precondition: ?z can be an openable receptacle (not (openable-receptacle ?z)) (or Syntax error, the first parameter of `openable `should be a furnitureAppliance but here a smallReceptacle is given (not (openable ?z)) (opened-receptacle ?z) ) Missing precondition: there should be no other object stacked on top of ?z if ?z is stackable ) Effects: (and (not (object-on ?x ?y)) (not (object-in ?x ?z)) (not (robot-hand-empty)) (robot-holding ?x) ) A.6.17 Open a small receptacle such as a lunch box with a lid Action description This action enables the robot to open a small receptacle receptacle_1 (e.g. small storage boxes or lunch boxes with lids). For example, the robot opens lunch_box_2, or the robot opens storage_box_1.,→ ,→ Additional information from the user (a) This action is only applicable for receptacles that are openable. (b) The receptacle needs to be placed on a furniture piece that has an open and flat surface to allow enough space for manipulation (e.g., countertops or tables). ,→ (c) The receptacle should not be stacked on top of other objects if it is stackable. GPT-4: Open a small receptacle such as a lunch box with a lid Parameters: 1. ?x - smallReceptacle: the small receptacle to be opened 2. ?y - furnitureAppliance: the furniture or appliance on which the small receptacle is placed Preconditions: (and (robot-at ?y) (object-on ?x ?y) (openable-receptacle ?x) (flat-surface ?y) (robot-hand-empty) Inaccurate precondition: should only apply if ?x is stackable (not (object-stacked ?x)) (closed-receptacle ?x) Missing precondition: there should be no other object stacked on top of ?x ) Effects: (and (not (closed-receptacle ?x)) (opened-receptacle ?x) ) 29 Page 30: A.6.18 Close a small receptacle such as a lunch box with a lid Action description This action enables the robot to close a small receptacle receptacle_1 that is openable (e.g. small storage boxes or lunch boxes with lids). For example, the robot closes lunch_box_2, or the robot closes storage_box_1.,→ ,→ Additional information from the user (a) This action is only applicable for receptacles that are openable. (b) The receptacle needs to be placed on a furniture piece that has an open and flat surface to allow enough space for manipulation (e.g., countertops or tables). ,→ (c) The receptacle should not be stacked on top of other objects if it is stackable. GPT-4: Close a small receptacle such as a lunch box with a lid Parameters: 1. ?x - smallReceptacle: the small receptacle to close 2. ?y - furnitureAppliance: the furniture or appliance on which the small receptacle is placed Preconditions: (and (robot-at ?y) (object-on ?x ?y) (openable-receptacle ?x) (opened-receptacle ?x) (flat-surface ?y) (robot-hand-empty) Inaccurate precondition: "?x is not stacked on other objects" should only apply if ?x is a stackable object (not (object-stacked ?x)) Missing precondition: there should be no other object stacked on top of ?x ) Effects: (and (not (opened-receptacle ?x)) (closed-receptacle ?x) ) A.6.19 Mash food with a blender Action description This action enables the robot to use a blender to mash some food in it. For example, the robot gets mashed apple_2 with blender_1, or the robot gets mashed banana_1 with blender_2. ,→ Additional information from the user (a) The food needs to be sliced beforehand and the food should be placed inside the blender. (b) Note that the food remains in the blender after this action is performed. (c) You may also assume the blender is turned off before and after mashing the food. GPT-4: Mash food with a blender Parameters: Incorrect object type: a blender ?b should be a smallReceptacle 1. ?b - householdObject: the blender to be used for mashing the food 2. ?f - householdObject: the food to be mashed 3. ?y - furnitureAppliance: the furniture or appliance where the blender is located Preconditions: (and (robot-at ?y) (object-on ?b ?y) Incorrect precondition: the food to mash should be in the blender ?b rather than directly on the furniture piece ?y (object-on ?f ?y) 30 Page 31: (object-in ?f ?b) (object-sliced ?f) (is-small-receptacle ?b) Incorrect precondition: as mentioned in the prompt, the blender should be turned off at the beginning (appliance-on ?b) (robot-hand-empty) Missing precondition: ?b should be a blender ) Effects: (and (not (appliance-on ?b)) (object-mashed ?f) Missing effect: ?f is no longer pickupable ) A.6.20 Wash an object Action Description This action enables the robot to wash an object (e.g., fruits and cloths) in a sink or basin. For example, the robot washed apple_1, or the robot washed romaine_lettuce_1. ,→ Additional information from the user (a) The object has to be something washable. (b) The robot should hold the object when washing it. GPT-4: Wash an object Parameters: 1. ?x - householdObject: the object to be washed 2. ?y - furnitureAppliance: the sink or basin where the object will be washed Preconditions: (and (robot-at ?y) (washable ?x) (robot-holding ?x) Missing precondition: ?y should be a sink or basin ) Effects: (and (not (object-dirty ?x)) (object-washed ?x) ) A.6.21 Wipe a surface Action description This action enables the robot to wipe and clean the surface of a piece of furniture or an appliance, such as a dining table, a mirror, a sink, or a bathtub, with a cloth. For example, the robot wipes dining_table_1 with cloth_1, the robot wipes mirror_1 with cloth_2.,→ ,→ Additional information from the user (a) You may assume the robot is holding the cloth before executing this action. (b) The cloth will be dirty after executing the action. (c) The robot should also use a clean cloth. 31 Page 32: GPT-4: Wipe a surface Parameters: 1. ?x - furnitureAppliance: the furniture or appliance to be wiped and cleaned 2. ?c - householdObject: the cloth to be used for wiping and cleaning Preconditions: (and (robot-at ?x) (robot-holding ?c) (washable ?c) (object-washed ?c) Incorrect precondition: ?x doesn 't need to have a flat surface (flat-surface ?x) Missing precondition: ?c should be a cloth ) Effects: (and (not (object-washed ?c)) (object-dirty ?c) (furniture-cleaned ?x) ) A.6.22 Vacuum a carpet Action description This action enables the robot to vacuum a carpet with a handheld vacuum cleaner. For example, the robot uses vacuum_cleaner_1 to vacuum carpet_2, the robot uses vacuum_cleaner_2 to vacuum carpet_1.,→ ,→ Additional information from the user (a) You need to make sure the dust bin of the vacuum cleaner is not full before executing this action. ,→ (b) You may assume the robot is holding the vacuum cleaner at the beginning. (c) The dust bin of the vacuum cleaner will be full of dust. GPT4: Vacuum a carpet Parameters: 1. ?v - householdObject: the handheld vacuum cleaner to use 2. ?c - furnitureAppliance: the carpet to vacuum Preconditions: (and (robot-holding ?v) (robot-at ?c) (pickupable ?v) (not (dustbin-full ?v)) (not (carpet-clean ?c)) Missing precondition: ?v should be a handheld vacuum cleaner Missing precondition: ?c should be a carpet ) Effects: (and (carpet-clean ?c) (dustbin-full ?v) ) A.6.23 Empty a vacuum cleaner Action description This action enables the robot to empty a vacuum cleaner’s dust bin and dump the dust into a trash can. For example, the robot empties vacuum_cleaner_1 and dumps the dust into trash_can_2, or the robot empties vacuum_cleaner_2 and dumps the dust into trash_can 1.,→ ,→ 32 Page 33: Additional information from the user (a) Note that the robot should hold the vacuum cleaner and stand by the trash can before executing this action. ,→ (b) After executing this action, the robot is still holding the vacuum cleaner. (c) The trash can should be opened if it 's openable. (d) The dust bin will be empty after executing this action. GPT-4: Empty a vacuum cleaner Parameters: 1. ?v - householdObject: the vacuum cleaner with a full dust bin 2. ?t - furnitureAppliance: the trash can where the dust will be dumped Preconditions: (and (robot-at ?t) (robot-holding ?v) (dustbin-full ?v) (pickupable ?v) Inaccurate precondition: ?t should be open only if it is openable (openable ?t) (opened ?t) Missing precondition: ?t should be a trash can Missing precondition: ?v should be a vacuum cleaner ) Effects: (and (not (dustbin-full ?v)) ) A.6.24 Examples of PDDL action models constructed by GPT-3.5-Turbo GPT-3.5-Turbo: Navigate to a furniture piece or an appliance Parameters: 1. ?f1 - furnitureAppliance: the furniture/appliance to navigate from 2. ?f2 - furnitureAppliance: the furniture/appliance to navigate to Preconditions: (and (robot-at ?f1) (reachable ?f1 ?f2) Irrelevant preconditions (not (robot-holding ?o)) Irrelevant preconditions (forall (?o - householdObject) (not (object-on ?o ?f1)) ) ) Effects: (and (robot-at ?f2) Missing effect: (not (robot-at ?f1)) ) GPT-3.5-Turbo: Pick up an object on or in a furniture piece or an appliance Parameters: 1. ?o - householdObject: the object to pick up 2. ?f - furnitureAppliance: the furniture/appliance on which the object is located Preconditions: (and (robot-at ?f) Incorrect precondition and unknown parameter: ?f1 (reachable ?f1 ?f) (object-on ?o ?f) (object-pickupable ?o) Inaccurate precondition: "?x is not stacked on top of other object" should 33 Page 34: only apply when ?x is a stackable object (not (object-stacked ?o)) Inaccurate precondition: the gripper should be empty (not (robot-holding ?o)) (or (not (furniture-openable ?f)) (furniture-open ?f)) Missing precondition: there should be no other object stacked on top of ?x if ?x is a stackable object ) Effects: (and (not (object-on ?o ?f)) Incorrect effect and unknown parameter: ?f1 (not (object-in ?o ?c)) Incorrect effect and unknown parameter: ?o2 (not (object-stacked-on ?o ?o2)) Incorrect effect (not (object-washed ?o)) (robot-holding ?o) ) GPT-3.5-Turbo: Stack objects 1. ?object_1 - householdObject: the object to be stacked on top of another object 2. ?object_2 - householdObject: the object on which ?object_1 will be stacked Missing parameter: ?f, the flat furniture piece Preconditions: (and Syntax error (robot-at ?f - furnitureAppliance) Syntax error and incorrect precondition (reachable ?f - furnitureAppliance ?f2 - furnitureAppliance) Incorrect precondition: should be stackable (object-pickupable ?object_1) Syntax error (object-stacked-on ?object_2 ?o - householdObject) Unnecessary precondition (not (object-stacked ?object_1)) Unknown parameter (not (object-stacked-on ?o ?object_1)) (object-on ?object_2 ?f) Incorrect precondition: the furniture piece should be flat not openable (furniture-openable ?f) Incorrect precondition (furniture-open ?f) (robot-holding ?object_1) Missing precondition: there should be no other objects on top of ?object_2 Missing precondition: ?object_2 should be stackable ) Effects: (and (not (robot-holding ?object_1)) (object-stacked-on ?object_1 ?object_2) Incorrect effect (not (object-on ?object_1 ?f)) ) A.6.25 The initial set of predicates extracted by GPT-4 The initial set of predicates extracted by GPT-4 1. (robot-at ?x - furnitureAppliance): true if the robot is at the furniture or appliance ?x 2. (object-on ?x - householdObject ?y - furnitureAppliance): true if the object ?x is on or in the furniture or appliance ?y ,→ 3. (pickupable ?x - householdObject): true if the object ?x can be picked up by the robot 4. (object-stacked ?x - householdObject): true if the object ?x is stacked on top of other household items ,→ 5. (openable ?y - furnitureAppliance): true if the furniture or appliance ?y can be opened 6. (opened ?y - furnitureAppliance): true if the furniture or appliance ?y is opened 7. (robot-holding ?x - householdObject): true if the robot is holding the object ?x 8. (robot-hand-empty): true if the robot 's gripper is empty 34 Page 35: 9. (object-clear ?x - householdObject): true if the object ?x is not under any another household object ,→ 10. (appliance-on ?x - householdObject): true if the small appliance ?x is switched on 11. (sliceable ?x - householdObject): true if the object ?x can be sliced 12. (cutting-board ?z - householdObject): true if the object ?z is a cutting board 13. (knife ?k - householdObject): true if the object ?k is a knife 14. (flat-surface ?y - furnitureAppliance): true if the furniture or appliance ?y has a flat surface for manipulation ,→ 15. (object-whole ?x - householdObject): true if the object ?x is whole and not sliced 16. (object-sliced ?x - householdObject): true if the object ?x has been sliced 17. (food-heated ?x - householdObject): true if the food ?x has been heated in the microwave 18. (food-heated-pan ?x - householdObject): true if the food ?x has been heated with a pan 19. (is-small-receptacle ?x - householdObject): true if the object ?x is a small receptacle like bowls, lunch boxes, plates, etc. ,→ 20. (object-in ?x - householdObject ?z - smallReceptacle): true if the object ?x is in the small receptacle ?z ,→ 21. (openable-receptacle ?x - smallReceptacle): true if the small receptacle ?x can be opened 22. (opened-receptacle ?x - smallReceptacle): true if the small receptacle ?x is opened 23. (object-mashed ?x - householdObject): true if the object ?x has been mashed in the blender 24. (washable ?x - householdObject): true if the object ?x can be washed 25. (object-washed ?x - householdObject): true if the object ?x has been washed 26. (object-dirty ?x - householdObject): true if the object ?x is dirty 27. (dustbin-full ?v - householdObject): true if the dust bin of the vacuum cleaner ?v is full 28. (carpet-clean ?c - furnitureAppliance): true if the carpet ?c is clean 29. (closed ?y - furnitureAppliance): true if the furniture or appliance ?y is closed 30. (closed-receptacle ?x - smallReceptacle): true if the small receptacle ?x is closed 31. (furniture-cleaned ?x - furnitureAppliance): true if the furniture or appliance ?x has been wiped and cleaned ,→ A.7 Household: Correcting PDDL Models A.7.1 Heat food with a pan GPT-4: Heat food with a pan User: ... The original prompt for PDDL construction ... Parameters: ------------------------------------- GPT-4: Parameters: 1. ?f - householdObject: the food to be heated There is an inaccurate object type: a pan ?p can be viewed as a small receptacle 2. ?p - householdObject: the pan to be used for heating the food 3. ?s - furnitureAppliance: the stove burner on which the pan is placed Preconditions: (and (robot-at ?s) Syntax error: the second parameter of `object-on `should be a furnitureAppliance, but a householdObject was given (object-on ?f ?p) (object-on ?p ?s) Incorrect precondition: ?f needs not be pickupable (pickupable ?f) (is-small-receptacle ?p) Incorrect precondition: a stove or burner ?s doesn 't need to have a flat surface (flat-surface ?s) (robot-hand-empty) Missing precondition: ?p has to be a pan Missing precondition: ?s has to be a stove burner ) Effects: (and (not (pickupable ?f)) (food-heated-pan ?f) ) ---------------------------------------- Corrective feedback: There are some errors in the PDDL model: 1. There is an inaccurate object type, a pan ?p can be viewed as a small receptacle 35 Page 36: 2. There is a syntax error: the second parameter of `object-on `should be a furnitureAppliance, but a householdObject was given. Please use the correct predicate or devise new one(s) if needed,→ ,→ 3. Incorrect precondition: ?f needs not be pickupable 4. There is a missing precondition: ?p has to be a pan 5. There is a missing precondition: ?s has to be a stove burner 6. There is an incorrect precondition: a stove or burner ?s doesn 't need to have a flat surface Please revise the PDDL model (and the list of predicates if needed) to fix the above errors (and other potentially similar errors). ,→ Parameters: ---------------------------------------- GPT-4: 1. ?f - householdObject: the food to be heated 2. ?p - smallReceptacle: the pan to be used for heating the food 3. ?s - furnitureAppliance: the stove burner on which the pan is placed Preconditions: (and (robot-at ?s) (object-in ?f ?p) (object-on ?p ?s) (pan ?p) (stove-burner ?s) (robot-hand-empty) ) Effects: (and (not (pickupable ?f)) (food-heated-pan ?f) ) New Predicates: 1. (pan ?x - smallReceptacle): true if the small receptacle ?x is a pan 2. (stove-burner ?x - furnitureAppliance): true if the furniture or appliance ?x is a stove burner 3. (object-in ?x - householdObject ?y - smallReceptacle): true if the object ?x is in the small receptacle ?y ,→ ---------------------------------------- Automated corrective feedback: The following predicate(s) have the same name(s) as existing predicate(s): 1. (object-in ?x - householdObject ?y - smallReceptacle), true if the object ?x is in the small receptacle ?y; existing predicate with the same name: (object-in ?x - householdObject ?z - smallReceptacle), true if the object ?x is in the small receptacle ?z,→ ,→ You should reuse existing predicates whenever possible. If you are reusing existing predicate(s), you shouldn 't list them under 'New Predicates '. If existing predicates are not enough and you are devising new predicate(s), please use names that are different from existing ones.,→ ,→ Please revise the PDDL model to fix this error. Parameters: ---------------------------------------- GPT-4: 1. ?f - householdObject: the food to be heated 2. ?p - smallReceptacle: the pan to be used for heating the food 3. ?s - furnitureAppliance: the stove burner on which the pan is placed Preconditions: (and (robot-at ?s) (object-in ?f ?p) (object-on ?p ?s) (pan ?p) (stove-burner ?s) (robot-hand-empty) ) Effects: (and (not (pickupable ?f)) (food-heated-pan ?f) ) New Predicates: 1. (pan ?x - smallReceptacle): true if the small receptacle ?x is a pan 2. (stove-burner ?x - furnitureAppliance): true if the furniture or appliance ?x is a stove burner 36 Page 37: A.7.2 Slice an object GPT-4: Slice an object User: ... The original prompt for PDDL construction ... Parameters: ------------------------------------- GPT-4: Parameters: 1. ?x - householdObject: the object to slice 2. ?k - householdObject: the knife to use for slicing 3. ?y - furnitureAppliance: the furniture or appliance with a flat surface for manipulation Incorrect object type, a cutting board should be viewed as a small receptacle as mentioned in the prompt 4. ?z - householdObject: the cutting board on which the object is placed Preconditions: (and (robot-at ?y) Incorrect precondition: ?x should be on the cutting board not directly on the furniture ?y (object-on ?x ?y) (object-on ?z ?y) (robot-holding ?k) (sliceable ?x) (cutting-board ?z) (knife ?k) (flat-surface ?y) (object-whole ?x) ) Effects: (and (not (object-whole ?x)) (object-sliced ?x) ) ---------------------------------------- Corrective feedback: There are some errors in the PDDL model: 1. There is an incorrect object type: a cutting board should be viewed as a small receptacle as mentioned in the prompt ,→ 2. There is an incorrect precondition: ?x should be on the cutting board not directly on the furniture ?y ,→ Please revise the PDDL model (and the list of predicates if needed) to fix the above errors (and other potentially similar errors). ,→ Parameters: ---------------------------------------- GPT-4: 1. ?x - householdObject: the object to slice 2. ?k - householdObject: the knife to use for slicing 3. ?y - furnitureAppliance: the furniture or appliance with a flat surface for manipulation 4. ?z - smallReceptacle: the cutting board on which the object is placed Preconditions: (and (robot-at ?y) (object-in ?x ?z) (object-on ?z ?y) (robot-holding ?k) (sliceable ?x) (cutting-board ?z) (knife ?k) (flat-surface ?y) (object-whole ?x) ) Effects: (and (not (object-whole ?x)) (object-sliced ?x) ) New Predicates: No newly defined predicate 37 Page 38: A.7.3 GPT-3.5-Turbo trying to correct the action model of "slice an object" GPT-3.5-Turbo trying to correct the action model of "slice an object" User: ... The original prompt for PDDL construction ... Parameters: ------------------------------------- GPT-4: Parameters: 1. ?x - householdObject: the object to slice 2. ?k - householdObject: the knife to use for slicing 3. ?y - furnitureAppliance: the furniture or appliance with a flat surface for manipulation Incorrect object type, a cutting board should be viewed as a small receptacle as mentioned in the prompt 4. ?z - householdObject: the cutting board on which the object is placed Preconditions: (and (robot-at ?y) Incorrect precondition: ?x should be on the cutting board not directly on the furniture ?y (object-on ?x ?y) (object-on ?z ?y) (robot-holding ?k) (sliceable ?x) (cutting-board ?z) (knife ?k) (flat-surface ?y) (object-whole ?x) ) Effects: (and (not (object-whole ?x)) (object-sliced ?x) ) ---------------------------------------- Corrective feedback: There are some errors in the PDDL model: 1. There is an incorrect object type: a cutting board should be viewed as a small receptacle as mentioned in the prompt ,→ 2. There is an incorrect precondition: ?x should be on the cutting board not directly on the furniture ?y ,→ Please revise the PDDL model (and the list of predicates if needed) to fix the above errors (and other potentially similar errors). ,→ Parameters: ---------------------------------------- GPT-3.5-Turbo: 1. ?x - householdObject: the object to slice 2. ?k - householdObject: the knife to use for slicing 3. ?y - furnitureAppliance: the furniture or appliance with a flat surface for manipulation 4. ?z - smallReceptacle: the cutting board on which the object is placed Preconditions: (and (robot-at ?y) (object-on ?z ?y) (object-in ?x ?z) (robot-holding ?k) (sliceable ?x) Should use the predicate (cutting-board ?z) (is-small-receptacle ?z) (knife ?k) (flat-surface ?y) (object-whole ?x) ) Effects: (and (not (object-whole ?x)) (object-sliced ?x) ) 38 Page 39: A.8 Logistics: Constructing PDDL Models The AI agent here is a logistics planner that has to plan to transport packages within the locations in a city through a truck and between cities through an airplane. Within a city, the locations are directly linked, allowing trucks to travel between any two of these locations. Similarly, cities are directly connected to each other allowing airplanes to travel between any two cities. Each city is equipped with one truck and has a designated location that functions as an airport. There are five types of objects: package, truck, plane, location, and city. There are multiple cities and each city can have multiple locations. Also, there is no limit to how many packages a truck or plane can carry (so in theory a truck or plane can carry an infinite number of packages). A.8.1 Load a package into a truck Action description This action enables the agent to load a package into a truck. For example, load a package_1 into a truck_1. ,→ GPT-4: Load a package into a truck Parameters: 1. ?p - package: the package to load 2. ?t - truck: the truck to load the package into 3. ?l - location: the location where the package and truck are Preconditions: (and (package-at ?p ?l) (truck-at ?t ?l) ) Effects: (and (not (package-at ?p ?l)) (package-in-truck ?p ?t) ) A.8.2 Unload a package from a truck Action description This action enables the agent to unload a package from a truck. For example, unload a package_1 from a truck_1. ,→ GPT-4: Unload a package from a truck Parameters: 1. ?p - package: the package to unload 2. ?t - truck: the truck from which the package will be unloaded 3. ?l - location: the location where the package will be unloaded Preconditions: (and (package-in-truck ?p ?t) (truck-at ?t ?l) ) Effects: (and (not (package-in-truck ?p ?t)) (package-at ?p ?l) ) 39 Page 40: A.8.3 Load a package into an airplane Action description This action enables the agent to load a package into an airplane. For example, load a package_1 into an airplane_1. ,→ GPT-4: Load a package into an airplane Parameters: 1. ?p - package: the package to load into the airplane 2. ?a - plane: the airplane to load the package into 3. ?l - location: the location where the package and airplane are present Preconditions: (and (package-at ?p ?l) (plane-at ?a ?l) (not (package-in-plane ?p ?a)) ) Effects: (and (not (package-at ?p ?l)) (package-in-plane ?p ?a) ) A.8.4 Unload a package from an airplane Action description This action enables the agent to unload a package from an airplane. For example, unload a package_1 from an airplane_1. ,→ GPT-4: Unload a package from an airplane Parameters: 1. ?p - package: the package to unload 2. ?a - plane: the airplane from which the package will be unloaded 3. ?l - location: the location where the airplane is and the package will be unloaded Preconditions: (and (package-in-plane ?p ?a) (plane-at ?a ?l) ) Effects: (and (not (package-in-plane ?p ?a)) (package-at ?p ?l) ) A.8.5 Drive a truck from one location to another in a city Action description This action enables the agent to drive a truck from one location to another in a city. For example, drive a truck_1 from location_1 to location_2 in city_1. ,→ GPT-4: Drive a truck from one location to another in a city Parameters: 1. ?t - truck: the truck to be driven 2. ?from - location: the starting location of the truck 3. ?to - location: the destination location of the truck 4. ?c - city: the city in which the truck is being driven Preconditions: 40 Page 41: (and (truck-at ?t ?from) (location-in-city ?from ?c) (location-in-city ?to ?c) (not (= ?from ?to)) ) Effects: (and (not (truck-at ?t ?from)) (truck-at ?t ?to) (move-packages ?t ?from ?to) ) A.8.6 Fly an airplane from one city to another Action description This action enables the agent to fly an airplane from one city 's airport to another. The airports are locations in the city. For example, fly an airplane_1 from location_0 to location_1. ,→ GPT-4: Fly an airplane from one city to another Parameters: 1. ?a - plane: the airplane to fly 2. ?from - location: the departure location (airport) in the origin city 3. ?to - location: the destination location (airport) in the destination city 4. ?c1 - city: the origin city 5. ?c2 - city: the destination city Preconditions: (and (plane-at ?a ?from) (location-in-city ?from ?c1) (location-in-city ?to ?c2) (not (= ?c1 ?c2)) Missing precondition: ?from should be an airport Missing precondition: ?to should be an airport ) Effects: (and (not (plane-at ?a ?from)) (plane-at ?a ?to) (move-packages-plane ?a ?from ?to) ) A.8.7 Examples of PDDL action models constructed by GPT-3.5-Turbo GPT-3.5-Turbo: Fly an airplane from one city to another Parameters: 1. ?a - plane: the airplane to fly 2. ?c1 - city: the city to fly from 3. ?c2 - city: the city to fly to 4. ?l1 - location: the airport location in ?c1 5. ?l2 - location: the airport location in ?c2 Preconditions: (and Incorrect predicate design (at ?a ?l1) (in-city ?l1 ?c1) (in-city ?l2 ?c2) (not (= ?c1 ?c2)) Missing precondition: ?l1 should be an airport Missing precondition: ?l2 should be an airport ) Effects: (and Incorrect predicate design 41 Page 42: (not (at ?a ?l1)) Incorrect predicate design (at ?a ?l2) ) GPT-3.5-Turbo: Unload a package from a truck Parameters: 1. ?p - package: the package to unload 2. ?t - truck: the truck to unload from Missing parameter: ?l, the location of the truck Preconditions: (and Syntax error and incorrect predicate design (at ?t - truck ?l - location) Syntax error and incorrect predicate design (at ?p - package ?t - truck) ) Effects: (and Syntax error (at ?p - package ?l - location) Syntax error (not (at ?p - package ?t - truck)) Syntax error and incorrect effect (not (truck-empty ?t - truck)) ) A.8.8 The initial set of predicates extracted by GPT-4 The initial set of predicates extracted by GPT-4 1. (package-at ?p - package ?l - location): true if the package ?p is at the location ?l 2. (truck-at ?t - truck ?l - location): true if the truck ?t is at the location ?l 3. (package-in-truck ?p - package ?t - truck): true if the package ?p is in the truck ?t 4. (plane-at ?a - plane ?l - location): true if the airplane ?a is at the location ?l 5. (location-in-city ?l - location ?c - city): true if the location ?l is in the city ?c 6. (package-in-plane ?p - package ?a - plane): true if the package ?p is in the airplane ?a 7. (move-packages ?t - truck ?from - location ?to - location): true if all packages in the truck ?t are moved from location ?from to location ?to. This predicate is used to represent the effect of moving all packages in the truck when the truck is driven from one location to another.,→ ,→ ,→ 8. (move-packages-plane ?a - plane ?from - location ?to - location): true if all packages in the airplane ?a are moved from location ?from to location ?to. This predicate is used to represent the effect of moving all packages in the airplane when the airplane is flown from one location to another,→ ,→ ,→ A.9 Logistics: Correcting PDDL Models A.9.1 Fly an airplane from one city to another GPT-4: Fly an airplane from one city to another User: ... The original prompt for PDDL construction ... Parameters: ------------------------------------- GPT-4: Parameters: 1. ?a - plane: the airplane to fly 2. ?from - location: the departure location (airport) in the origin city 3. ?to - location: the destination location (airport) in the destination city 4. ?c1 - city: the origin city 5. ?c2 - city: the destination city Preconditions: (and (plane-at ?a ?from) 42 Page 43: (location-in-city ?from ?c1) (location-in-city ?to ?c2) (not (= ?c1 ?c2)) Missing precondition: ?from should be an airport Missing precondition: ?to should be an airport ) Effects: (and (not (plane-at ?a ?from)) (plane-at ?a ?to) (move-packages-plane ?a ?from ?to) ) ---------------------------------------- Corrective feedback: There is an error in the PDDL model: 1. The current action model allows the plane to fly from any location to another location in a different city, but a plane can only land at airports ,→ Please revise the PDDL model (and the list of predicates if needed) to fix the above errors (and other potentially similar errors). ,→ Parameters: ---------------------------------------- GPT-4: 1. ?a - plane: the airplane to fly 2. ?from - location: the departure location (airport) in the origin city 3. ?to - location: the destination location (airport) in the destination city 4. ?c1 - city: the origin city 5. ?c2 - city: the destination city Preconditions: (and (plane-at ?a ?from) (location-in-city ?from ?c1) (location-in-city ?to ?c2) (not (= ?c1 ?c2)) (airport ?from) (airport ?to) ) Effects: (and (not (plane-at ?a ?from)) (plane-at ?a ?to) (move-packages-plane ?a ?from ?to) ) New Predicates: 1. (airport ?l - location): true if the location ?l is an airport A.9.2 GPT-3.5-Turbo trying to correct the action model of "fly an airplane from one city to another" GPT-3.5-Turbo trying to correct the action model of "fly an airplane from one city to another" User: ... The original prompt for PDDL construction ... Parameters: ------------------------------------- GPT-4: Parameters: 1. ?a - plane: the airplane to fly 2. ?from - location: the departure location (airport) in the origin city 3. ?to - location: the destination location (airport) in the destination city 4. ?c1 - city: the origin city 5. ?c2 - city: the destination city Preconditions: (and (plane-at ?a ?from) (location-in-city ?from ?c1) (location-in-city ?to ?c2) (not (= ?c1 ?c2)) 43 Page 44: Missing precondition: ?from should be an airport Missing precondition: ?to should be an airport ) Effects: (and (not (plane-at ?a ?from)) (plane-at ?a ?to) (move-packages-plane ?a ?from ?to) ) ---------------------------------------- Corrective feedback: There is an error in the PDDL model: 1. The current action model allows the plane to fly from any location to another location in a different city, but a plane can only land at airports ,→ Please revise the PDDL model (and the list of predicates if needed) to fix the above errors (and other potentially similar errors). ,→ Parameters: ---------------------------------------- GPT-3.5-Turbo: 1. ?a - plane: the airplane to fly 2. ?from - location: the departure location (airport) in the origin city 3. ?to - location: the destination location (airport) in the destination city 4. ?c1 - city: the origin city 5. ?c2 - city: the destination city Preconditions: (and (plane-at ?a ?from) (location-in-city ?from ?c1) (location-in-city ?to ?c2) (not (= ?c1 ?c2)) (airport ?to) Missing precondition: ?from should be an airport ) Effects: (and (not (plane-at ?a ?from)) (plane-at ?a ?to) (move-packages-plane ?a ?from ?to) ) New Predicates: 1. (airport ?l - location): true if the location ?l is an airport. A.10 Tyreworld: Constructing PDDL Models The AI agent here is a robot that has to replace a flat tyre with a spare one. This involves fetching the tools (i.e., wrench, jack, pump) from the boot, undoing the nuts on the flat tyre, jacking up the (appropriate) hub(s), removing the tyre, doing up the spare one, etc. There are three major object types: ‘small_object’, ‘container’ and ‘hub’. The object type ‘small_object’ covers tools, wheels and nuts. The ‘small_object’ object type has three subtypes: ‘tool’, ‘wheel’ and ‘nut’. The subtype ‘tool’ covers tools like wrenches, jacks, pumps etc. The subtype ‘wheel’ covers various wheels. The subtype ‘nut’ covers various nuts. The object type ‘container’ covers storage spaces like the boot in a car. The object type ‘hub’ covers the hubs in the wheels of the car. Note that there is no restriction on how many objects the AI agent (i.e., the robot) can carry. Also note that each hub has only one nut. A.10.1 Open a container Action description This action enables the robot to open a container. For example, the robot can open a boot. GPT-4: Open a container Parameters: 1. ?c - container: the container to open 44 Page 45: Preconditions: (and (container-closed ?c) ) Effects: (and (not (container-closed ?c)) (container-open ?c) ) A.10.2 Close a container Action description This action enables the robot to close a container. For example, the robot can close a boot. GPT-4: Close a container Parameters: 1. ?c - container: the container to close Preconditions: (and (container-open ?c) ) Effects: (and (not (container-open ?c)) (container-closed ?c) ) A.10.3 Fetch an object from a container Action description This action enables the robot to take an object from a container. For example, the robot can fetch jack from the boot or the robot can fetch wheel_1 from the boot. ,→ GPT-4: Fetch an object from a container Parameters: 1. ?x - small_object: the object to fetch from the container 2. ?c - container: the container from which the object is fetched Preconditions: (and (container-open ?c) (object-in-container ?x ?c) ) Effects: (and (not (object-in-container ?x ?c)) (robot-holding ?x) ) A.10.4 Put an object into a container Action description This action enables the robot to put an object into a container. For example, the robot can put away jack into the boot or the robot can put away wheel_1 into the boot. ,→ 45 Page 46: GPT-4: Put an object into a container Parameters: 1. ?x - small_object: the object to put into the container 2. ?c - container: the container to put the object into Preconditions: (and (robot-holding ?x) (container-open ?c) ) Effects: (and (not (robot-holding ?x)) (object-in-container ?x ?c) ) A.10.5 Loosen a nut in a hub Action description This action enables the robot to loosen a nut in a hub. For example, the robot can loosen nut_1 in hub_1. To safely perform this action, the hub needs to be on the ground (i.e., not jacked up). ,→ GPT-4: Loosen a nut in a hub Parameters: 1. ?n - nut: the nut to be loosened 2. ?h - hub: the hub where the nut is located 3. ?w - tool: the wrench to be used for loosening the nut Preconditions: (and (is-nut ?n) (is-hub ?h) (is-wrench ?w) (nut-in-hub ?n ?h) (hub-on-ground ?h) (robot-holding ?w) ) Effects: (and (nut-loose ?n) ) A.10.6 Tighten a nut in a hub Action description This action enables the robot to tighten a nut in a hub. For example, the robot can tighten nut_1 in hub_1. To safely perform this action, the hub needs to be on the ground (i.e., not jacked up).,→ ,→ GPT-4: Tighten a nut in a hub Parameters: 1. ?n - nut: the nut to tighten 2. ?h - hub: the hub where the nut is to be tightened 3. ?w - tool: the wrench to use for tightening the nut Preconditions: (and (is-nut ?n) (is-hub ?h) (is-wrench ?w) (robot-holding ?w) (nut-in-hub ?n ?h) (hub-on-ground ?h) (nut-loose ?n) 46 Page 47: ) Effects: (and (not (nut-loose ?n)) ) A.10.7 Jack up a hub Action description This action enables the robot to jack up a hub. For example, the robot can jack up hub_1. GPT-4: Jack up a hub Parameters: 1. ?h - hub: the hub to jack up 2. ?j - small_object: the jack to use for jacking up the hub Preconditions: (and (is-hub ?h) (hub-on-ground ?h) (is-jack ?j) (robot-holding ?j) ) Effects: (and (not (hub-on-ground ?h)) (hub-jacked-up ?h) ) A.10.8 Jack down a hub Action description This action enables the robot to jack down a hub. For example, the robot can jack down hub_1. GPT-4: Jack down a hub Parameters: 1. ?h - hub: the hub to jack down 2. ?j - small_object: the jack used to jack down the hub Preconditions: (and (is-hub ?h) (is-jack ?j) (robot-holding ?j) (hub-jacked-up ?h) ) Effects: (and (not (hub-jacked-up ?h)) (hub-on-ground ?h) ) A.10.9 Unfasten a hub Action description This action enables the robot to unfasten a hub and remove the nut on it. For example, the robot can unfasten hub_1 that has nut_1. At the end of this action, the hub should be unfastened, and the nut should be detached from the hub and held by the robot. To safely perform this action, the hub needs to be jacked up.,→ ,→ ,→ 47 Page 48: GPT-4: Unfasten a hub Parameters: 1. ?h - hub: the hub to unfasten 2. ?n - nut: the nut to detach from the hub Preconditions: (and (is-hub ?h) (is-nut ?n) (nut-in-hub ?n ?h) (nut-loose ?n) (hub-jacked-up ?h) Incorrect precondition: the robot should not hold the nut if the nut is in the hub (robot-holding ?n) ) Effects: (and (not (nut-in-hub ?n ?h)) (hub-unfastened ?h) Missing effect: the robot should hold the nut at the end ) A.10.10 Fasten a hub Action description This action enables the robot to fasten a hub by installing a nut on it. For example, the robot can fasten hub_1 that has nut_1. Note that the robot needs to install a nut into the hub, but the robot is not required to fasten the nut in this operation. To safely perform this action, the hub needs to be jacked up.,→ ,→ ,→ GPT-4: Fasten a hub Parameters: 1. ?h - hub: the hub to fasten 2. ?n - nut: the nut to install in the hub Preconditions: (and (is-hub ?h) (is-nut ?n) (robot-holding ?n) (hub-jacked-up ?h) (hub-unfastened ?h) Unnecessary precondition (not (nut-in-hub ?n ?h)) ) Effects: (and (not (robot-holding ?n)) (nut-in-hub ?n ?h) (not (hub-unfastened ?h)) Missing effect: the nut should be loose ) A.10.11 Remove wheel from hub Action description This action enables the robot to remove a wheel from a hub. For example, the robot can remove wheel_1 from hub_1. Note that the hub needs to be unfastened in order to remove the wheel. To safely perform this action, the hub needs to be jacked up.,→ ,→ GPT-4: Remove wheel from hub Parameters: 1. ?w - wheel: the wheel to remove 2. ?h - hub: the hub from which the wheel is to be removed 48 Page 49: Preconditions: (and (is-wheel ?w) (is-hub ?h) (wheel-on-hub ?w ?h) (hub-jacked-up ?h) (hub-unfastened ?h) ) Effects: (and (not (wheel-on-hub ?w ?h)) (robot-holding ?w) ) A.10.12 Put wheel on hub Action description This action enables the robot to put a wheel on an unfastened hub. For example, the robot can put wheel_1 on hub_1. Note that the hub needs to be unfastened in order to put on the wheel. To safely perform this action, the hub needs to be jacked up.,→ ,→ GPT-4: Put wheel on hub Parameters: 1. ?w - wheel: the wheel to put on the hub 2. ?h - hub: the hub to put the wheel on Preconditions: (and (robot-holding ?w) (is-wheel ?w) (is-hub ?h) (hub-jacked-up ?h) (hub-unfastened ?h) (not (wheel-on-hub ?w ?h)) ) Effects: (and (not (robot-holding ?w)) (wheel-on-hub ?w ?h) ) A.10.13 Inflate wheel Action description This action enables the robot to inflate a wheel. For example, the robot can inflate wheel_1. Note that you need to make sure the wheel is intact. ,→ GPT-4: Inflate wheel Parameters: 1. ?w - wheel: the wheel to inflate Preconditions: (and (is-wheel ?w) (robot-holding ?w) (wheel-deflated ?w) (wheel-intact ?w) Missing precondition: the robot should have a pump ) Effects: (and (not (wheel-deflated ?w)) (wheel-inflated ?w) ) 49 Page 50: A.10.14 Examples of PDDL action models constructed by GPT-3.5-Turbo GPT-3.5-Turbo: Open a containe Parameters: 1. ?x - container: the container to open Preconditions: (and (not (container-open ?x)) Incorrect precondition and unknown parameter (robot-holding ?y) Incorrect precondition and syntax error (tool-type ?y "key") Incorrect precondition and syntax error (not (hub-jacked-up ?z)) ) Effects: (and (container-open ?x) Incorrect effect (not (robot-holding ?y)) ) GPT-3.5-Turbo: Loosen a nut in a hub Parameters: 1. ?x - nut: the nut to loosen 2. ?y - hub: the hub containing the nut Missing parameter: ?z, the wrench Preconditions: (and (not (hub-jacked-up ?y)) (robot-holding ?z) Syntax error (tool-type ?z wrench) (nut-in-hub ?x ?y) (not (nut-loosened ?x)) Irrelevant precondition (not (wheel-detached ?w)) Irrelevant precondition (wheel-attached ?w ?y) Irrelevant precondition (wheel-intact ?w) Irrelevant precondition (not (wheel-deflated ?w)) Irrelevant precondition (object-in-container ?z ?c) Irrelevant precondition (container-open ?c) ) Effects: (and (nut-loosened ?x) ) A.10.15 The initial set of predicates extracted by GPT-4 The initial set of predicates extracted by GPT-4 1. (container-closed ?c - container): true if the container ?c is closed 2. (container-open ?c - container): true if the container ?c is open 3. (object-in-container ?x - small_object ?c - container): true if the small_object ?x is inside the container ?c ,→ 4. (robot-holding ?x - small_object): true if the robot is holding the small_object ?x 5. (is-wrench ?x - small_object): true if the small_object ?x is a wrench 6. (is-nut ?x - small_object): true if the small_object ?x is a nut 7. (is-hub ?x - hub): true if the object ?x is a hub 8. (nut-in-hub ?n - nut ?h - hub): true if the nut ?n is in the hub ?h 9. (hub-on-ground ?h - hub): true if the hub ?h is on the ground (not jacked-up) 10. (nut-loose ?n - nut): true if the nut ?n is loosened 50 Page 51: 11. (is-jack ?x - small_object): true if the small_object ?x is a jack 12. (hub-jacked-up ?h - hub): true if the hub ?h is jacked up (not on the ground) 13. (hub-unfastened ?h - hub): true if the hub ?h is unfastened 14. (is-wheel ?x - small_object): true if the small_object ?x is a wheel 15. (wheel-on-hub ?w - wheel ?h - hub): true if the wheel ?w is on the hub ?h 16. (is-pump ?x - small_object): true if the small_object ?x is a pump 17. (wheel-intact ?w - wheel): true if the wheel ?w is intact 18. (wheel-deflated ?w - wheel): true if the wheel ?w is deflated 19. (wheel-inflated ?w - wheel): true if the wheel ?w is inflated A.11 Tyreworld: Correcting PDDL Models A.11.1 Unfasten a hub GPT-4: Unfasten a hub User: ... The original prompt for PDDL construction ... Parameters: ------------------------------------- GPT-4: Parameters: 1. ?h - hub: the hub to unfasten 2. ?n - nut: the nut to detach from the hub Preconditions: (and (is-hub ?h) (is-nut ?n) (nut-in-hub ?n ?h) (nut-loose ?n) (hub-jacked-up ?h) Incorrect precondition: the robot should not hold the nut if the nut is in the hub (robot-holding ?n) ) Effects: (and (not (nut-in-hub ?n ?h)) (hub-unfastened ?h) Missing effect: the robot should hold the nut at the end ) ---------------------------------------- Corrective feedback: There are some errors in the PDDL model: 1. There is an incorrect precondition: the robot is not holding the nut because the nut is in the hub ,→ 2. There is a missing effect: the robot should hold the nut at the end Please revise the PDDL model (and the list of predicates if needed) to fix the above errors (and other potentially similar errors). ,→ Parameters: ---------------------------------------- GPT-4: 1. ?h - hub: the hub to unfasten 2. ?n - nut: the nut to detach from the hub Preconditions: (and (is-hub ?h) (is-nut ?n) (nut-in-hub ?n ?h) (nut-loose ?n) (hub-jacked-up ?h) ) Effects: (and (not (nut-in-hub ?n ?h)) (hub-unfastened ?h) (robot-holding ?n) ) 51 Page 52: A.11.2 Inflate wheel GPT-4: Inflate wheel User: ... The original prompt for PDDL construction ... Parameters: ------------------------------------- GPT-4: Parameters: 1. ?w - wheel: the wheel to inflate Preconditions: (and (is-wheel ?w) (robot-holding ?w) (wheel-deflated ?w) (wheel-intact ?w) Missing precondition: the robot should have a pump ) Effects: (and (not (wheel-deflated ?w)) (wheel-inflated ?w) ) ---------------------------------------- Corrective feedback: There is an error in the PDDL model: 1. You allow the robot to inflate a wheel without having a pump, this is incorrect. Please revise the PDDL model (and the list of predicates if needed) to fix the above errors (and other potentially similar errors). ,→ Parameters: ---------------------------------------- GPT-4: 1. ?w - wheel: the wheel to inflate 2. ?p - pump: the pump to inflate the wheel Preconditions: (and (is-wheel ?w) (is-pump ?p) (robot-holding ?w) (robot-holding ?p) (wheel-deflated ?w) (wheel-intact ?w) ) Effects: (and (not (wheel-deflated ?w)) (wheel-inflated ?w) ) New Predicates: No newly defined predicate ---------------------------------------- Automated corrective feedback: There is an error in the PDDL model: 1. There is an unknown object type `pump `for the parameter ?p Please revise the PDDL model (and the list of predicates if needed) to fix the above errors (and other potentially similar errors). ,→ Parameters: ---------------------------------------- GPT-4: 1. ?w - wheel: the wheel to inflate 2. ?p - small_object: the pump to inflate the wheel Preconditions: (and (is-wheel ?w) (is-pump ?p) 52 Page 53: (robot-holding ?w) (robot-holding ?p) (wheel-deflated ?w) (wheel-intact ?w) ) Effects: (and (not (wheel-deflated ?w)) (wheel-inflated ?w) ) A.12 LLM planners back-prompted by V AL using LLM-acquired PDDL model A.12.1 Examples of prompts for LLM planners An example prompt for the Household domain You are in a household to complete a certain task for the owners. You are a household robot that can navigate to various large and normally immovable furniture pieces or appliances in the house to carry out household tasks. Note that you have only one gripper, so (a) it can only hold one object, (b) it shouldn 't hold any other irrelevant objects in its gripper while performing some manipulation tasks (e.g., opening a drawer or closing a window), (c) operations on small household items should be carried out on furniture with a flat surface to get enough space for manipulation.,→ ,→ ,→ ,→ ,→ ,→ The following actions are available to you, and you should strictly follow the output format (demonstrated in the "example output") of each action: ,→ 1. navigate from a furniture piece or appliance ?x to another ?y (example output: go from dining_table_1 to fridge_1). ,→ Parameters: (a) ?x: the furniture or appliance the robot is currently at; (b) ?y: the furniture or appliance the robot wants to navigate to. ,→ Preconditions: (a) the robot is at the furniture or appliance ?x; (b) the furniture or appliance ?x is not equal to ?y. ,→ Effects: (a) the robot is not at the furniture or appliance ?x; (b) the robot is at the furniture or appliance ?y. ,→ 2. pick up an object ?x in/on a furniture piece or appliance ?y (example output: pick up apple_1 in/on side_table_1). ,→ Parameters: (a) ?x: the object to pick up; (b) ?y: the furniture or appliance where the object is located. ,→ Preconditions: (a) the robot is at the furniture or appliance ?y; (b) the object ?x is on or in the furniture or appliance ?y; (c) the object ?x can be picked up by the robot; (d) either the object ?x cannot be stacked on top of other household items, or if it can be stacked, it is not stacked on top of and not under any another household object; (e) the robot 's gripper is empty; (f) either the furniture or appliance ?y cannot be opened, or it is already opened.,→ ,→ ,→ ,→ Effects: (a) the object ?x is not on or in the furniture or appliance ?y; (b) the robot is holding the object ?x; (c) the robot 's gripper is not empty. ,→ ... Here are two examples. Example 1 The house contains the following large furniture pieces and appliances: drawer_1, fridge_1, dining_table_1, countertop_1, stove_burner_1, cabinet_1, microwave_1, sink_basin_1, dish_washer_1, light_switch_1, garbage_can_1.,→ ,→ The current states of the household objects are as follows: 1. drawer_1 is closed 2. fridge_1 is closed ... 17. robot is not holding anything Your task is to: put an orange in a fridge. Your plan is: Step 1: > think: First, I need to open the fridge. > action: go from dining_table_1 to fridge_1 Step 2: > action: open furniture/appliance fridge_1 Step 3: > think: Now I need to find an orange. In the current state, orange_1 is in drawer_1 and drawer_1 is closed. I need to get the orange and then put it in the fridge. ,→ > action: go from fridge_1 to drawer_1 Step 4: 53 Page 54: > action: open furniture/appliance drawer_1 Step 5: > action: pick up orange_1 in/on drawer_1 Step 6: > action: go from drawer_1 to fridge_1 Step 7: > action: put orange_1 in/on fridge_1 Step 8: > action: close furniture/appliance fridge_1 Example 2 ... Here is the task. The house contains the following large furniture pieces and appliances: drawer_1, drawer_2, ..., dish_washer_1, garbage_can_1. Note that cutting boards are viewed as small receptacles, and blenders are small appliances (and thus small receptacles).,→ ,→ The current states of the household objects are as follows: 1. drawer_1 is opened 2. drawer_2 is opened ... 71. robot is holding mug_1 Your task is to: put apple_2 on side_table_2. Your plan is: An example prompt for the Logistics domain You are a logistics planner that has to plan to transport packages within the locations in a city through a truck and between cities through an airplane. Within a city, the locations are directly linked, allowing trucks to travel between any two of these locations. Similarly, cities are directly connected to each other allowing airplanes to travel between any two cities. Each city is equipped with one truck and has a designated location that functions as an airport. There are five types of objects: package, truck, plane, location, and city. There are multiple cities and each city can have multiple locations. Also, there is no limit to how many packages a truck or plane can carry (so in theory a truck or plane can carry an infinite number of packages).,→ ,→ ,→ ,→ ,→ ,→ ,→ ,→ The following actions are available to you, and you should strictly follow the output format (demonstrated in the "example output") of each action: ,→ 1. load a package ?p into a truck ?t at location ?l (example output: load package_1 into a truck truck_1 at location_0). ,→ Parameters: (a) ?p: the package to load; (b) ?t: the truck to load the package into; (c) ?l: the location where the package and truck are. ,→ Preconditions: (a) the package ?p is at the location ?l; (b) the truck ?t is at the location ?l. Effects: (a) the package ?p is not at the location ?l; (b) the package ?p is in the truck ?t. 2. unload a package ?p from a truck ?t at location ?l (example output: unload package_1 from a truck truck_1 at location_1). ,→ Parameters: (a) ?p: the package to unload; (b) ?t: the truck from which the package will be unloaded; (c) ?l: the location where the package will be unloaded. ,→ Preconditions: (a) the package ?p is in the truck ?t; (b) the truck ?t is at the location ?l. Effects: (a) the package ?p is not in the truck ?t; (b) the package ?p is at the location ?l. ... Here are two examples. Example 1 The environment contains the following objects: plane_0 (type: plane), city_0 (type: city), truck_0 (type: truck), location_0 (type: location), location_1 (type: location), package_0 (type: package).,→ ,→ The current states of the objects are as follows: 1. location_0 is in city city_0 2. location_1 is in city city_0 3. truck_0 is at location location_0 4. plane_0 is at location location_0 5. package_0 is at location location_1 6. location_0 is an airport Your task is to: Transport package package_0 to location location_0 Your plan is: 54 Page 55: Step 1: > think: First, I need to drive the truck to the location of the package and load the package. > action: drive truck_0 from location_0 to location_1 in city_0 Step 2: > action: load package_0 into a truck truck_0 at location_1 Step 3: > think: Now I need to drive the truck to the destination and unload the package. > action: drive truck_0 from location_1 to location_0 in city_0 Step 4: > action: unload package_0 from a truck truck_0 at location_0 Example 2 ... Here is the task. The environment contains the following objects: plane_0 (type: plane), city_0 (type: city), city_1 (type: city), truck_0 (type: truck), truck_1 (type: truck), location_0 (type: location), location_1 (type: location), location_2 (type: location), package_0 (type: package),→ ,→ The current states of the objects are as follows: 1. plane_0 is at location location_1 2. truck_0 is at location location_2 3. truck_1 is at location location_0 4. location_0 is in city city_1 5. location_0 is an airport 6. location_1 is in city city_0 7. location_1 is an airport 8. location_2 is in city city_0 9. package_0 is at location location_0 Your task is to: Transport package package_0 to location location_2. Your plan is: A.12.2 Translation to admissible actions We extract the plan from the LLM planner as a completion to the prompt. Although the prompt explicitly states that the planner should strictly adhere to the demonstrated output format of each action, there are still cases where GPT-4 fails to follow these formats. As a result, extra post-processing is needed to translate the generated actions into admissible actions. To do this, we utilize another LLM (i.e., GPT-4) to perform the translation. In cases where certain actions have missing parameters, we present an error message to the LLM planner and ask it to regenerate the plan. The error message is structured as follows: " There is an invalid output at step x. Please strictly follow the output format provided in the example output of each action. Your revised plan: ". We use a fixed prompt template for action translation in all the domains: Template of the prompt for action translation Your task is to translate free-form action description to its corresponding structural format based on its semantic meaning. The structural output format is given in the example output of each candidate action. You should strictly follow the structural output format. When there is missing information in the free-form action description, you should output "Not able to translate due to missing information".,→ ,→ ,→ ,→ Here are 4 examples. Example 1: Action description: put bowl_1 with sliced orange_1 in/on dining_table_1 Candidate actions: 1. put an object on or in a furniture piece or an appliance, example output: "put apple_1 in/on side_table_1" ,→ 2. puts an object onto or into a small receptacle, example output: "put apple_1 in/on plate_1 at countertop_1" ,→ Translated action: put bowl_1 in/on side_table_1 Example 2: Action description: put sliced orange_1 in/on bowl_1 at drawer_1 Candidate actions: 1. put an object on or in a furniture piece or an appliance, example output: "put apple_1 in/on side_table_1" ,→ 2. puts an object onto or into a small receptacle, example output: "put apple_1 in/on plate_1 at countertop_1" ,→ 55 Page 56: Translated action: put orange_1 in/on bowl_1 at drawer_1 Example 3: Action description: pick up bowl_1 with sliced orange_1 in/on countertop_1 Candidate actions: 1. pick up an object on or in a furniture piece or an appliance, example output: "pick up apple_1 in/on side_table_1" ,→ 2. pick up an object on or in a small receptacle, example output: "pick up apple_1 in/on plate_1 at countertop_1" ,→ Translated action: pick up bowl_1 in/on countertop_1 Example 4: Action description: put mashed apple_1 in/on plate_1 Candidate actions: 1. put an object on or in a furniture piece or an appliance, example output: "put apple_1 in/on side_table_1" ,→ 2. puts an object onto or into a small receptacle, example output: "put apple_1 in/on plate_1 at countertop_1" ,→ Translated action: Not able to translate due to missing information Here is the task. A.12.3 Back-prompting LLM planners with validation feedback by V AL We consider three types of validation feedback that can be obtained from V AL [ 18]. One is unsat- isfied precondition. V AL highlights unsatisfied precondition(s) in PDDL, which can be translated into natural language and provided as corrective feedback to the LLM planner. The feedback mes- sage is structured as follows: " The action at step x is not executable due to unmet precondition(s). Here are the unsatisfied precondition(s): ". The second type of validation feedback is related to unmet goal condition(s), and the third type is related to the usage of invalid or illegal parameter(s) in some action(s). For translating PDDL into natural language, we use a fixed prompt template in all the domains: Template of the prompt for translating PDDL into natural language Your task is to translate planning domain definition language (PDDL) to natural language. Here is an example: Predicates: 1. (robot-holding ?x - block): true if the robot arm is holding the block ?x 2. (block-clear ?x - block): true if the block ?x is not under any another block 3. (robot-hand-empty): true if the robot arm is not holding any block 4. (block-on-table ?x - block): true if the block ?x is placed on the table Contextual information: N/A PDDL: ``` (and (not (robot-holding x1)) (block-clear x1) (robot-hand-empty) (block-on-table x1) ) ``` Translated PDDL: (and The robot arm is not holding the block x1; The block x1 is not under any another block; The robot arm is not holding any block; The block x1 is placed on the table; ) Here is the task: 56 Page 57: A.12.4 Examples of validation feedback and instructions with ordering constraints Make heated mashed potato_1 & Remember to mash the potato at the end after heating it User: ... The original prompt for the LLM planner ... The current states of the household objects are as follows: ... 70. robot at dining_table_1 71. robot is holding mug_1 Your task is to: make heated and mashed potato_1, remember to mash the potato at the end after heating it. ,→ Your plan is: ------------------------------------- GPT-4: Step 1: > think: First, I need to find a cutting board and put it on a flat surface (countertop_1 or countertop_2). In the current state, cutting_board_1 is in/on countertop_2. ,→ > action: go from dining_table_1 to countertop_2 Step 2: > think: Now I need to find potato_1. In the current state, potato_1 is in fridge_1 and fridge_1 is closed. I need to get the potato and then put it on the cutting board. ,→ > action: go from countertop_2 to fridge_1 Step 3: > action: open furniture/appliance fridge_1 Step 4: > action: pick up potato_1 in/on fridge_1 Step 5: > action: go from fridge_1 to countertop_2 Step 6: > action: put potato_1 in/on cutting_board_1 at countertop_2 Step 7: > think: Now I need to find a knife. In the current state, knife_1 is in cabinet_3 and cabinet_3 is closed. I need to get the knife and then slice the potato. ,→ > action: go from countertop_2 to cabinet_3 Step 8: > action: open furniture/appliance cabinet_3 Step 9: > action: pick up knife_1 in/on cabinet_3 Step 10: > action: go from cabinet_3 to countertop_2 Step 11: > action: slice potato_1 with knife_1 and cutting_board_1 at countertop_2 Step 12: > think: Now I need to heat the sliced potato. In the current state, microwave_1 is closed. I need to put the sliced potato in a microwave-safe container and then heat it in the microwave. ,→ > action: go from countertop_2 to cabinet_4 Step 13: > action: open furniture/appliance cabinet_4 Step 14: > action: pick up bowl_2 in/on dish_washer_1 Step 15: > action: go from cabinet_4 to countertop_2 Step 16: > action: transfer potato_1 from cutting_board_1 to bowl_2 at countertop_2 Step 17: > action: go from countertop_2 to microwave_1 Step 18: > action: open furniture/appliance microwave_1 Step 19: > action: put bowl_2 in/on microwave_1 Step 20: > action: close furniture/appliance microwave_1 Step 21: > action: heat potato_1 in bowl_2 with microwave_1 Step 22: > action: open furniture/appliance microwave_1 Step 23: > action: pick up bowl_2 in/on microwave_1 Step 24: > action: go from microwave_1 to countertop_1 Step 25: > think: Now I need to find a blender to mash the heated potato. In the current state, blender_1 is in/on countertop_1 and blender_1 is turned off. I need to put the heated potato in the blender and then mash it.,→ ,→ 57 Page 58: > action: transfer potato_1 from bowl_2 to blender_1 at countertop_1 Step 26: > action: mash potato_1 with blender_1 at countertop_1 Step 27: > think: Task complete. The heated mashed potato_1 is now in the blender. ------------------------------------- Validation feedback: The action at step 3 is not executable due to unmet precondition(s). Here are the unsatisfied precondition(s): ,→ The robot 's gripper is not holding any object. Your revised plan is: ------------------------------------- GPT-4: ... The revised plan ... Place fork_1 spoon_1 and knife_1 on dining_table_1 & Please take the fork first and the knife last User: ... The original prompt for the LLM planner ... The current states of the household objects are as follows: ... 24. lunch_box_2 in/on fridge_1 25. lunch_box_2 is not opened ... 59. spoon_1 in/on lunch_box_2 ... Your task is to: place fork_1, spoon_1, and knife_1 on dining_table_1, please take the fork first and the knife last. ,→ Your plan is: ------------------------------------- GPT-4: Step 1: > think: First, I need to put down the mug_1 I am currently holding. > action: go from dining_table_1 to side_table_1 Step 2: > action: put mug_1 in/on side_table_1 Step 3: > think: Now, I need to pick up fork_1 from dish_washer_1. > action: go from side_table_1 to dish_washer_1 Step 4: > action: open furniture/appliance dish_washer_1 Step 5: > action: pick up fork_1 in/on dish_washer_1 Step 6: > think: I have the fork_1, now I need to place it on dining_table_1. > action: go from dish_washer_1 to dining_table_1 Step 7: > action: put fork_1 in/on dining_table_1 Step 8: > think: Next, I need to pick up spoon_1 from lunch_box_2 in fridge_1. > action: go from dining_table_1 to fridge_1 Step 9: > action: open furniture/appliance fridge_1 Step 10: > action: pick up spoon_1 in/on lunch_box_2 at fridge_1 Step 11: > think: I have the spoon_1, now I need to place it on dining_table_1. > action: go from fridge_1 to dining_table_1 Step 12: > action: put spoon_1 in/on dining_table_1 Step 13: > think: Finally, I need to pick up knife_1 from cabinet_1. > action: go from dining_table_1 to cabinet_1 Step 14: > action: pick up knife_1 in/on cabinet_1 Step 15: > think: I have the knife_1, now I need to place it on dining_table_1. > action: go from cabinet_1 to dining_table_1 Step 16: > action: put knife_1 in/on dining_table_1 Step 17: 58 Page 59: > think: All utensils are placed on the dining table. Now, I need to close the opened appliances. > action: go from dining_table_1 to fridge_1 Step 18: > action: close furniture/appliance fridge_1 Step 19: > action: go from fridge_1 to dish_washer_1 Step 20: > action: close furniture/appliance dish_washer_1 ------------------------------------- Validation feedback: The action at step 10 is not executable due to unmet precondition(s). Here are the unsatisfied precondition(s) described in Pseudo-PDDL format: ,→ (and The furniture or appliance fridge_1 has a flat surface for manipulation; (or The small receptacle lunch_box_2 cannot be opened; The small receptacle lunch_box_2 is open; ) ) Your revised plan is: ------------------------------------- GPT-4: ... The revised plan ... A.13 Translating user instructions into PDDL goal specifications Template of the prompt for translating user instructions into PDDL goal specifications Your task is to translate natural language instructions to planning goals in planning domain definition language (PDDL). You should only use the given predicates to express the goal specification.,→ ,→ Here is an example from the classical BlocksWorld domain for demonstrating the output format. Domain information: BlocksWorld is a planning domain in artificial intelligence. The AI agent here is a mechanical robot arm that can pick and place the blocks. Only one block may be moved at a time: it may either be placed on the table or placed atop another block. Because of this, any blocks that are, at a given time, under another block cannot be moved. There is only one type of object in this domain, and that is the block.,→ ,→ ,→ ,→ The domain has the following objects: block_1 (type: block), block_2 (type: block), block_3 (type: block), block_4 (type: block), block_5 (type: block). ,→ Predicates: 1. (robot-holding ?x - block): true if the robot arm is holding the block ?x 2. (block-clear ?x - block): true if the block ?x is not under any another block 3. (robot-hand-empty): true if the robot arm is not holding any block 4. (block-on-table ?x - block): true if the block ?x is placed on the table 5. (on ?x - block ?y - block): true if block ?x is on top of block ?y Instruction: rearrange the blocks such that block_1 is on top of block_2, block_2 is on top of block_3, and block_5 is not on the table. ,→ Goal in PDDL: ``` (:goal (and (on block_1 block_2) (on block_2 block_3) (not (block-on-table block_5)) ) ) ``` Here is the task. Domain information: xxx. 59