Open AGI Codes | Your Codes Reflect!

Updates

Generating audio...

arxiv

Paper 2503.10416

Super-Linear Speedup by Generalizing Runtime Repeated Recursion Unfolding in Prolog

Authors: Thom Fruehwirth

Published: 2025-03-13

Abstract:

Runtime repeated recursion unfolding was recently introduced as a just-in-time program transformation strategy that can achieve super-linear speedup. So far, the method was restricted to single linear direct recursive rules in the programming language Constraint Handling Rules (CHR). In this companion paper, we generalize the technique to multiple recursion and to multiple recursive rules and provide an implementation of the generalized method in the logic programming language Prolog. The basic idea of the approach is as follows: When a recursive call is encountered at runtime, the recursive rule is unfolded with itself and this process is repeated with each resulting unfolded rule as long as it is applicable to the current call. In this way, more and more recursive steps are combined into one recursive step. Then an interpreter applies these rules to the call starting from the most unfolded rule. For recursions which have sufficiently simplifyable unfoldings, a super-linear can be achieved, i.e. the time complexity is reduced. We implement an unfolder, a generalized meta-interpreter and a novel round-robin rule processor for our generalization of runtime repeated recursion unfolding with just ten clauses in Prolog. We illustrate the feasibility of our technique with worst-case time complexity estimates and benchmarks for some basic classical algorithms that achieve a super-linear speedup.

Paper Content:

Page 1: arXiv:2503.10416v1 [cs.PL] 13 Mar 2025Super-Linear Speedup by Generalizing Runtime Repeated Recursion Unfolding in Prolog Thom Fr¨ uhwirth Ulm University, 89069 Ulm, Germany thom.fruehwirth@uni-ulm.de March 14, 2025 Abstract Runtime repeated recursion unfolding was recently introdu ced in [Fr¨ u24] as a just-in-timeprogramtransformationstrategythatcanach ievesuper-linearspeedup. So far, the method was restricted to single linear direct rec ursive rules in the pro- gramming language Constraint Handling Rules (CHR). In this companion paper, we generalize the technique to multiple recursion and to mul tiple recursive rules and provide an implementation of the generalized method in t he logic program- ming language Prolog. Thebasicideaoftheapproachisasfollows: Whenarecursive call isencountered at runtime, the recursive rule is unfolded with itself and th is process is repeated with each resulting unfolded rule as long as it is applicable to the current call. In this way, more and more recursive steps are combined into one recursive step. Then an interpreter applies these rules to the call starting from the most unfolded rule. For recursions which have suﬃciently simplifyable unfoldi ngs, a super-linear can be achieved, i.e. the time complexity is reduced. We implement an unfolder, a generalized meta-interpreter a nd a novel round- robin rule processor for our generalization of runtime repe ated recursion unfolding with just ten clauses in Prolog. We illustrate the feasibili ty of our technique with worst-case time complexity estimates and benchmarks for so me basic classical al- gorithms that achieve a super-linear speedup. Keywords: Logic Programming, Prolog, Rule-Based Programming, Just-In-Tim e Program Transformation, Runtime Program Optimization, Online Pro gram Specializa- tion, RepeatedRecursion Unfolding, Super-Linear Speedup, Recu rsion, Meta-Interpreter, Time Complexity. Contents 1 Introduction 2 1 Page 2: 2 Repeated Recursion Unfolding in Prolog 4 2.1 Semantics of Prolog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.2 Rule Unfolding with Simpliﬁcation . . . . . . . . . . . . . . . . . . . . . 5 2.3 Repeated Recursion Unfolding . . . . . . . . . . . . . . . . . . . . . . . . 6 2.4 Optimal Rule Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3 Implementation of Runtime Repeated Recursion Unfolding i n Prolog 7 3.1 Unfolder Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.2 Generalized Meta-Interpreter Implementation for Multiple Recu rsion . . 9 3.3 Recursive Predicate Implementation . . . . . . . . . . . . . . . . . . . . . 10 4 Generalization to Multiple Recursive Rules 10 4.1 Meta-Interpreter with Continuation Goals . . . . . . . . . . . . . . . . . 11 4.2 Round-Robin Rule Processor . . . . . . . . . . . . . . . . . . . . . . . . . 11 4.3 Recursive Predicate Implementation . . . . . . . . . . . . . . . . . . . . . 12 5 Experimental Evaluation with Benchmarks 12 5.1 Summation Example, Contd. . . . . . . . . . . . . . . . . . . . . . . . . . 13 5.2 Fibonacci Number Example . . . . . . . . . . . . . . . . . . . . . . . . . 14 5.3 Greatest Common Divisor Example . . . . . . . . . . . . . . . . . . . . . 16 6 Related Work 19 7 Conclusions 20 A Fibonacci Numbers, Derivation of Rule Scheme for Unfoldin g 22 B Further Examples 24 B.1 List Reversal Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 B.2 Sorting Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27 1 Introduction In this companion paper, we generalize our online program optimizatio n strategy of run- time repeated recursion unfolding from the main paper [Fr¨ u24] to e nable super-linear speedups for more classes of recursion. We generalize the approa ch to multiple recur- sion and to multiple recursive rules and we provide an implementation of the generalized method in the logic programming language Prolog. Unfolding isaprogramtransformationthatbasically replacesacallin thebody(right- hand side) of a rule with the body of a rule whose head (left-hand side ) ﬁts the call. Runtime Repeated recursion unfolding [Fr¨ u20, Fr¨ u24] ﬁrst unfolds a given recursive rule with itself andsimpliﬁes it. This results in a specialized recursive rule tha t, when applied, covers two combined recursive steps instead ofone. The techniqu e continues to unfoldthe last unfolded recursive rule with itself until the resulting rule is no long er applicable to the given call. Each unfolding doubles the number of recursive steps covered. Then, the rules are applied to the given recursive call, with the most unfolded ru le being tried ﬁrst, continuing until the original rule and ﬁnally a base case are reached. If the unfolded rules 2 Page 3: admitted suﬃcient simpliﬁcation of the combined recursive steps, th is approach results in a proven super-linear speedup [Fr¨ u24]. In this paper, we assume some familiarity with Prolog [SS94]. For eﬃcie ncy, we assume that each recursive clause has a green cut . To ease the explanation of the method in this paper, we use the term guardfor the goals before the cut and the term bodyonly for the goals after the cut. Example 1.1 (Summation) Consider the following introductory example adapted from [Fr¨ u24], a simple recursive program written here in the abs tract syntax of Prolog. It recursively adds all numbers from 1ton. Rulebcovers the base case and rule rcovers the recursive case. The headsum(N,S),guard(e.g.N=1) andbodyof a rule are separated by the symbols ←and cut!, respectively. b=sum(N,S)←N= 1,!,S= 1 r=sum(N,S)←N >1,!,sum(N−1,S1),S:=N+S1 Unfolding the recursive rule with a copy of itself and simpli fying the resulting rule gives r1=sum(N,S)←N >2,!,sum(N−2,S1′),S:= 2∗N−1+S1′ Note that this rule r1behaves like applying the original rule rtwice. With rule r1we only need about half as many recursive steps as with the origi nal rule alone. Because the arithmetic computation is simpliﬁed, we can also expect to h alve the runtime. We proceed with unfolding rule r1with itself: r2=sum(N,S)←N >4,!,sum(N−4,S1),S:= 4∗N−6+S1 This rule results in fourfold speedup. We can continue this p rocess, doubling the speed each time1. This kind of simpliﬁcation of the combined recursive steps in the unfolded rules is suﬃcient to achieve a super-linear speedup. The num ber of unfoldings depends on the given call. The most unfolded rule should cover as many recursive steps of the call as possible. For example, for N=4we will unfold till rule r1with guard N>2, forN=5 till ruler2withN>4, forN=50till ruler5withN>32. Repeated recursion unfolding requires unfolding on-the-ﬂy becau se the number of unfoldings depends on the current call. We do not want to modify the given program at runtime. Therefore, the method also introduces an interprete r for the unfolded rules. Thismeta-interpreter2tries and applies each unfolded rule at most once starting with the given call and the most unfolded rule. Overview of the Paper. Asacompanionpaper, thisworkisnotfullyself-contained. To make it more accessible, we reuse some parts from the main paper [Fr¨ u24]. After this introduction, Section 2 deﬁnes the program transformation method of run- time repeated recursion unfolding with simpliﬁcation and an optimal ru le application strategy for the logic programming language Prolog. The exposition follows the one in the main paper, but replaces CHR with Prolog. 1Clearly there is a closed-form solution for this problem, S=N∗(N+1)/2, but this is not the point of the example. 2A meta-interpreter interprets a program written in its own implemen tation language (Chapter 8 in [HR99]). 3 Page 4: Section 3 presents our lean implementation of the unfolder and meta-interpr eter to perform repeated recursion unfolding with optimal rule applications at runtime, again following the main paper. Section 4 generalizes our implementation of runtime repeated recursion unfolding for multiple recursive rules using a new program la yer called the round-robin rule processor. This is the main contribution of this com panion paper. The complete implementation consists of just ten clauses written in Prolo g. It has little overhead, the runtime mainly depends on the given recursive rule an d its unfoldings. Section 5 contains the experimental evaluation of our technique on some exa mples, summation, Fibonacci numbers and greatest common divisor (GCD) . Fibonacci features multiple recursion and GCD features multiple recursive rules. We deriv e the necessary simpliﬁcations, discuss the time complexity and compare it with the res ult of benchmarks to verify the super-linear speedup. Section 6 discusses related work. Finally, the conclusion discusses achieveme nts, lim- itations and future work of generalized runtime repeated recursio n unfolding. In the appendix we give two additional examples, list reversal and sorting f rom [Fr¨ u24]. 2 Repeated Recursion Unfolding in Prolog We give a short deﬁnition of the semantics of Prolog and of rule unfold ing in Prolog [SS94]. Then we introduce repeated recursion unfolding based on [F r¨ u24] and deﬁne an optimal rule application strategy. 2.1 Semantics of Prolog Logic Programming [SS94, Kow14] is a programming and knowledge rep resentation paradigm based on ﬁrst-order logic. We assume some familiarity with t he syntax and semantics of the logic programming language Prolog: Goalsare possibly empty conjunc- tions of atoms. They are denoted by upper case letters in deﬁnition s. To avoid clutter, we use simple commas to denote logical conjunction. Prolog built-in pr edicates include trueandfalse, the syntactic equality = /2 between terms and arithmetic operations using the arithmetic equality := /2. Syntactic equality tries to unify its arguments, i.e. making them syntactically identical by instantiating their variables ap propriately. To deﬁne the operational semantics of Prolog, we assume a transit ion system with a transition relation /ma√sto→between states that are goals. An initial state is any goal, a ﬁnal state is one that contains only syntactic equalities (including the spe cial cases trueand false). Deﬁnition 2.1 (Prolog Semantics) A Prolog program Pconsists of clauses of the formH←B, whereHis an atom and Bis a goal. If Bis empty, it is equivalent to trueand the clause is called a fact, otherwise it is called a rule. LetSbe a function that simpliﬁes the syntactic equalities in a goal. Given a goal, let Gbe an atom chosen from the goal and Dbe the remainder of the goal. •Choose a clause from Pand take a copy of the clause with new variables, H←B. IfGandHare uniﬁable, then G,D/ma√sto→S(G=H,B,D). 4 Page 5: •Otherwise, if Gis a call to a built-in predicate, the transition replaces it by the result of executing the built-in, where the result is a conjunction of syntactic equalities E, G,D/ma√sto→S(E,D). •Otherwise, Ggives rise to a failure transition into the failed state deno ted by false, G,D/ma√sto→false. In implementations of Prolog, all clauses are tried for the current g oal in a systematic manner using chronological backtracking. In the context of this p aper, we assume green cuts for eﬃciency [SS94] in the recursive clauses. This means that w e commit to the chosen clause once the cut is reached. 2.2 Rule Unfolding with Simpliﬁcation We now deﬁne unfolding with simpliﬁcation for Prolog rules. Deﬁnition 2.2 (Unfolding) LetPbe a Prolog program and let randvbe copies of two (not necessarily diﬀerent) clauses from Psuch that they do not share variables r=H←D,G,B v=H′←B′, whereH,H′andGare atoms, and D,BandB′are goals. Theunfolding of rule rwith rule v,unfold(r,v),is the rule r′=H←D,G=H′,B′,BifGandH′are uniﬁable . If the goal Gin the body of rule runiﬁes with the head H′of a rule v, unfolding replaces GbyG=H′and by the body of rule vto obtain the unfolded rule r′. In the presence of a cut operator in the body of a rule, the correc t unfolding in general becomes more complicated [Pre93]. Here we assume green cuts in the recursive rules. In this case, it suﬃces to remove the cut coming from the rule rin the unfolded rule r′. The cut from rule vwill stay. Speedup crucially depends on the amount of simpliﬁcation of the combined recursive steps in the unfolded rules. We want to replace goals in the recursive step of the rules by semantically equivalent ones that can be executed more eﬃciently. T his includes moving the green cut to the left as far as semantically possible. For simpliﬁca tion, we can use any available program transformation technique. Example 2.1 (Summation, contd.) We unfold the recursive rule for summation with (a copy of) itself: r=sum(N,S)←N >1,!,sum(N−1,S1),S:=N+S1 v=sum(N′,S′)←N′>1,!,sum(N′−1,S1′),S′:=N′+S1′ Then the unfolded rule is r1=sum(N,S)←N >1,sum(N−1,S1)=sum(N′,S′),N′>1,!, sum(N′−1,S1′),S′:=N′+S1′,S:=N+S1 The unfolded rule can be simpliﬁed into the rule r1=sum(N,S)←N>2,!,sum(N−2,S1′),S:=2∗N−1+S1′ 5 Page 6: 2.3 Repeated Recursion Unfolding We can now deﬁne the program optimization strategy of repeated r ecursion unfolding for Prolog based on rule unfolding with simpliﬁcation. In our method, we st art from a call (query) for a Prolog predicate deﬁned by a recursive rule. We unfo ld the recursive rule with itself and simplify it. Then we unfold the resulting rule. We repeat t his process as long as the resulting rules are applicable to the query. Deﬁnition 2.3 (Repeated Recursion Unfolding) Letrbe a recursive rule and Gbe a goal. Let unfold(r) =unfold(r,r). Therepeated recursion unfolding of a recursive rule rwith goal Gand with rule simpli- ﬁcation is a maximal sequence of rules r0,r1,...where rule rihas head Hiand guard Ci and where r0=r ri+1=unfold(ri)as long as G=Hi,Ci/negationslash/ma√sto→false(i≥0) We unfold and simplify the current unfolded rule riif the guard of the rule is successfully executable with the query G. We add the new rule to the sequence and continue with it. Note that unfolding may fail to produce a rule. In that case, the un folding process stops. Example 2.2 (Summation, contd.) Consider a query sum(10,R). Recall the un- folded simpliﬁed rule r1=sum(N,S)←N>2,!,sum(N−2,S1),S:= 2∗N−1+S1 The query sum(10,R)means that the guard N>2succeeds since 10 =N, so we repeat the unfolding: unfold(r1) =sum(N,S)←N>2,sum(N−2,S1)=sum(N′,S′),N′>2,!, sum(N′−2,S1′),S′:= 2∗N′−1+S1′,S:= 2∗N−1+S1 The unfolded rule can be simpliﬁed into the rule r2=sum(N,S)←N>4,!,sum(N−4,S1′),S:= 4∗N−6+S1′ The rule r2is applicable to the goal. Further recursion unfolding resu lts in rules with guardsN >8and then N >16. To the latter rule, the goal sum(10,R)is not applicable anymore. Hence repeated recursion unfolding stops. The rul es for the goal sum(10,R) are therefore (more unfolded rules come ﬁrst): r3=sum(N,S)←N >8,!,sum(N−8,S1),S:= 8∗N−28+S1 r2=sum(N,S)←N >4,!,sum(N−4,S1),S:= 4∗N−6+S1 r1=sum(N,S)←N >2,!,sum(N−2,S1),S:= 2∗N−1+S1 r=r0=sum(N,S)←N >1,!,sum(N−1,S1),S:=N+S1 b=sum(N,S)←N= 1,!,S= 1. Note that to the goal sum(10,R)we can apply any of the recursive rules. The most eﬃcient way is to start with the ﬁrst, most unfolded rule. It c overs more recursive steps of the original recursive rule than any other rule. 6 Page 7: 2.4 Optimal Rule Applications Anunfoldedrulecoverstwiceasmanyrecursionstepsthanthegive nrule. Whenweapply a more unfolded rule, we cover more recursive steps with a single rule application. Based on this observation we introduce a rule application strategy where w e try to apply more unfolded rules ﬁrst. Furthermore each unfolded rule is tried only on ce and is applied at most once. In [Fr¨ u24], we proved this optimal rule application str ategy sound and complete in the case of CHR. Here we apply it to Prolog. Example 2.3 (Summation, contd.) A computation with optimal rule applications for the goalsum(10,R)is as follows (where we subscript single transitions with th e rule that was used): sum(10,R)/ma√sto→r3 S(sum(10,R)=sum(N,S),N >8,sum(N−8,S1),S:= 8∗N−28+S1)/ma√sto→+ sum(2,S1),R:= 52+S1/ma√sto→r0 S(sum(2,S1)=sum(N′,S′),N′>1,sum(N′−1,S1′),S′:=N′+S1′,R:= 52+S1)/ma√sto→+ sum(1,S1′),R:= 54+S1′/ma√sto→b R= 55 3 Implementation of Runtime Repeated Recursion Unfolding in Prolog At compile-time, the rules for the given recursive predicate are replaced by a call to the unfolder that contains these rules and then to the meta-inter preter that interprets the unfolded rules. At runtime, the unfolder repeatedly unfolds a r ecursive rule as long as it is applicable to a given goal using a predeﬁned unfolding scheme that speciﬁes the simpliﬁcation of the rule. Then the meta-interpreter applies the resulting unfolded simpliﬁed rules according to the optimal rule application strategy. In the following code in concrete Prolog syntax, =/2,is/2,copyterm/2andcall/1 are standard built-in predicates of Prolog. The syntactic equality =/2tries to unify its arguments, i.e. makingthemsyntactically identical byinstantiatingt heirvariablesappro- priately. The arithmetic equality is/2tries to unify its ﬁrst argument with the result of evaluating the arithmetic expression in its second argument. The bu ilt-incopyterm/2 produces a copy of the given term with new variables. The Prolog met a-callcall/1 executes its argument as a goal. Our implementation in SWI Prolog [WSTL12] together with the examples is fully listed in this paper. Together with the benchmarking code it is also ava ilable from the author on request. 3.1 Unfolder Implementation The unfolder is implemented as a recursive Prolog predicate unf/3. It is the result of a straightforward translation from the unfolder written in CHR in [Fr ¨ u24]. It repeat- edly unfolds and simpliﬁes a recursive rule as long as it is applicable to a go al. In 7 Page 8: unf(G,Rs,URs) , the ﬁrst argument Gis the goal and Rsis a list of given original clauses. URsis the resulting list of unfolded and original clauses. We assume that inthe goal Gthe input arguments are given andtheoutput arguments are variables. Initially, the list Rsconsists of a recursive rule followed by one or more clauses for the base cases of the recursion. To simplify the implemen tation, the body of the rules in the lists syntactically always consists of three conjunct s of goals: the atoms before the recursive goal, the recursive goal and the atoms afte r the recursive goal. We use the built-in trueto denote an empty conjunct. % unf(+RecursiveGoal, +RuleList, -UnfoldedRuleList) unf(G, [R|Rs], URs) :- % recursive step R = (H :- Co ,!, _), % get head H and guard Co of rule R copy_term((H :- Co),(G :- C)), % copy them, unify head copy wi th goal G call(C) % call instantiated guard copy C ,!, simp_unf(R, UR), % unfold, simplify rule R into rule UR unf(G, [UR,R|Rs], URs). % add new rule UR and recurse unf(_G, [_R|Rs], URs) :- URs=Rs. % otherwise return rules Rs in URs In the recursive clause for unf/3, we check if the rule Rin the list is applicable to the query (call, goal) G: If the guard Csucceeds, we unfold the current rule Rwith itself and and simplify it using the predicate simpunf/2(discussed below) and add the resulting ruleURto the rule list in the recursive call of unf/3. When the guard has failed, the base case clause of unf/3returns the rules that have been accumulated in the rule list as the result list in the third argument (with the exception of the ﬁrst rule t o which the goal was not applicable). The Prolog predicate simpunf(R,UR) takes the current rule and computes its simpli- ﬁed unfolding. It is deﬁned for each recursive rule. In the head of a clause for simpunf/2 we userule templates for the arguments to ease the implementation. The rules are then instances of the template. Certain variables in the template repres ent theparameters for the instance. These parameters will be bound at runtime. In the bo dy of a clause for simpunf/2, the parameters for the unfolded rule will be computed from the pa rameters of the current rule. The following example clariﬁes the above remarks on the implementatio n. Example 3.1 (Summation, contd.) We show how we implement simpunf/2for the summation example. We abbreviate sumto its ﬁrst letter sto avoid clutter in the code. The rule template for sumis s(A,C) :- A>V ,!, B is A-V, s(B,D), C is V*A-W+D % rule template where the variables VandWare parameters that stand for integers. Its instance for the original recursive rule is s(A,C) :- A>1 ,!, B is A-1, s(B,D), C is 1*A-0+D % rule instance V=1, W=0 The implementation of the unfolding scheme for summation is accomplished by the fol- lowing Prolog clause for simpunf/2. 8 Page 9: simp_unf( (s(A,C) :- A>V ,!, B is A-V, s(B,D), C is V*A-W+D), (s(Al,Cl) :- Al>Vl ,!, Bl is Al-Vl, s(Bl,Dl), Cl is Vl*Al-Wl+ Dl) ) :- Vl is 2*V, Wl is 2*W+V*V. For a goal s(100,S) the unfolder is called with unf(s(100,S), [ (s(A,C):-A>1 ,!, B is A-1, s(B,D), C is 1*A-0+D), % orig. recu rsion (s(A,B):-A=1 ,!, B=1, true, true) % base case ], URs). It will return the following rules in the list URs: s(A,C) :- A>64 ,!, B is A-64, s(B, D), C is 64*A-2016+D s(A,C) :- A>32 ,!, B is A-32, s(B, D), C is 32*A-496+D s(A,C) :- A>16 ,!, B is A-16, s(B, D), C is 16*A-120+D s(A,C) :- A>8 ,!, B is A-8, s(B, D), C is 8*A-28+D s(A,C) :- A>4 ,!, B is A-4, s(B, D), C is 4*A-6+D s(A,C) :- A>2 ,!, B is A-2, s(B, D), C is 2*A-1+D s(A,C) :- A>1 ,!, B is A-1, s(B, D), C is 1*A-0+D % orig. recursi on s(A,C) :- A=1 ,!, C=1, true, true % base case 3.2 Generalized Meta-Interpreter Implementation for Mult iple Recursion We implement the optimal rule application strategy with the help of a me ta-interpreter for Prolog. Again it closely follows the implementation for CHR in [Fr¨ u24 ]. The meta- interpreter handles the recursive calls, any other goal will be hand led by the underlying Prolog implementation. To a recursive goal, the meta-interpreter t ries to apply the unfolded rules produced by the unfolder and applies each of them at most once. The meta-interpreter is called with mip(G,Rs) , whereGis the given recursive goal and Rsis the list of rules from the unfolder unf/3. % Meta-Interpreter for Multiple Recursion % mip(+RecursiveGoal, +RuleList) mip(true,_Rs). % base case, no more recursive goal mip((G1,G2),Rs) :- % handle conjunction for multiple recur sion mip(G1,Rs), mip(G2,Rs). mip(G,[R|Rs]) :- % current rule is applicable to goal G copy_term(R, (G :- C ,!, B,G1,D)), % copy rule, unify head cop y with G call(C) % check guard ,!, call(B), % execute atoms before recursive call 9 Page 10: mip(G1,Rs), % recurse with recursive goal and remaining rul es call(D). % execute atoms after recursive call mip(G,[_R|Rs]) :- % current rule is not applicable mip(G,Rs). % try remaining rules on G We now discuss the four clauses of our meta-interpreter. In the ﬁ rst clause, the base case is reached since the recursive goal has been reduced to true. The second clause handles a conjunction of recursive goals by interpreting each conjunct on its own with the rule listRs. This case can handle multiple recursion and is new, it does not appear in the main paper [Fr¨ u24]. The third clause tries to apply the rule Rin the rule list to the current goal G. If the guardCholds, the rule is applied. The conjunct before the recursive goal Bis directly executed with a meta-call. Next, the recursive goals G1are handled with a recursive call to the meta-interpreter using the remainder of the rule list. Finally t he conjunct after the recursive goal Dis directly executed with a meta-call. Otherwise the ﬁrst rule from the rule list was not applicable, and so th e last meta- interpreter clause recursively continues with the remaining rules in t he list. 3.3 Recursive Predicate Implementation In order to enable runtime repeated recursion unfolding, at compile -time, the clauses for thegivenrecursive predicate c/karereplacedbyacalltotheunfolder unf/3thatcontains these clauses and then to the meta-interpreter mip/2that interprets the unfolded rules. The replacement ﬁts the following rule template where X1,...,Xk are diﬀerent variables. % rule template for a recursive predicate c/k c(X1,...,Xk) :- unf(c(X1,...,Xk), OriginalRules, UnfoldedRules), mip(c(X1,...,Xk), UnfoldedRules). Example 3.2 (Summation, contd.) For the summation example, the corresponding rule instance is as follows: sum(N,S) :- unf(s(N,S), [ (s(A,C) :- A>1 ,!, B is A-1, s(B, D), C is 1*A-0+D), (s(A,C) :- A=1 ,!, C=1, true, true) ], URs), mip(s(N,S), URs). 4 Generalization to Multiple Recursive Rules We handle multiple recursive rules by handling each of the rules in separ ation. This is the main contribution of this companion paper. Each recursive rule is subjected to runtime repeated recursion unfolding in a round-robin manner by a n ovel round-robin rule processor. 10 Page 11: 4.1 Meta-Interpreter with Continuation Goals To enable this approach, we ﬁrst generalize the original meta-inter preter by adding an additional argument that will contain the goal that remains after t rying and applying the unfolded rules of one speciﬁc original recursive rule, we call it the continuation goal . % Meta-Interpreter with Continuation Goal % mip(+RecursiveCall, +RuleList, -ContinuationGoal) mip(true,_LR,true) :- !. % base case, no more goal mip((R1,R2),LR,(G1,G2)) :- !, % for multiple recursion mip(R1,LR,G1), mip(R2,LR,G2). mip(R,[RR|LR],G) :- % apply current rule copy_term(RR, (R :- C ,!, B,Rl,RCl)), call(C) ,!, call(B), mip(Rl,LR,G), call(RCl). mip(R,[_RR|LR],G) :- % current rule not applicable mip(R,LR,G). mip(R,[],R). % no more rule applicable, return current goal The last clause is new, it is the additional base case mip(R,[],R) . It applies when the rule list has become empty. Then the remaining unprocessed goal Rin the ﬁrst argument is returned as continuation goal in the third argument. 4.2 Round-Robin Rule Processor To handle the remaining goal, we then continue with generating, tryin g and applying the unfolded rules stemming from another recursive rule. We repea t this process with all recursive rules until no goal remains. This behavior is implemented by around-robin rule processor with a recursive predicate umr(R,RS) , whereRis a recursive call and RSis a list of lists of unfolded rules. We keep and reuse unfolded rules betw een rounds. In the comments in the code below, rlstands for rule list and rlsfor rule lists. % Round-Robin Rule Processor % umr(+RecursiveCall, +RuleLists) % unfolding and interpretation rounds with several unfolde d rule lists umr(true,_RS) :- !. % base case, no more recursive call to pro cess umr(R,[goal(G)|RS]) :- !, % this clause ensures terminatio n R \= G, % if goals unchanged, no rule was applied 11 Page 12: append(RS,[goal(R)],RS1), % add current goal at end of rema ining rls umr(R,RS). % continue with goal and rls umr(R,[RL|RS]) :- unf(R,RL,RL1), % unfold with current rl, return extended rl mip(R,RL1,G), % interpret with this rl, return contin. goal append(RS,[RL1],RS1), % add expanded rl at end of remaining rls umr(G,RS1). % recurse with contin. goal and updated rls In the ﬁrst clause, the base case is reached since the goal is empty (which is represented bytrue). The second clause ensures termination in case no rule was applicable. The list element goal(G) is reached after each round and holds the goal from the start of t he round. When it is still equivalent to the current goal R, no rule was applied to the goal and the computation fails due to the requirement R \= G. In the last, main clause, umr/2tries to further unfold rules in the current rule list RL with respect to the current recursive call Rusing the unfolder unf/3and then interprets Rwith that rule list using mip/3. We add the extended rule list RL1at the end of the rule listsRSso that we can reuse it and can go cyclically through all rule lists in a fair manner. In the recursive call, the resulting continuation goal Gis further processed together with the updated rule lists RS1. 4.3 Recursive Predicate Implementation At compile-time, the rules for the given recursive predicate c/kare replaced by a call toumr/2, whereX1,...,Xk are diﬀerent variables and OriginalRuleLists is the list of lists of the given recursive rules of c/k. We also add the clauses for the base cases at the end of each recursive rule list. Finally, add the end of the lists of lists w e add the list elementgoal(c(X1,...,Xk)) to ensure termination by recording the current call. % rule template for a recursive predicate c/k with multiple r ecursion c(X1,...,Xk) :- umr(c(X1,...,Xk), OriginalRuleLists). 5 Experimental Evaluation with Benchmarks Our examples will demonstrate that super-linear speedups are inde ed possible. With suﬃcient simpliﬁcation, the time complexity is eﬀectively reduced when applying our generalized runtime repeated recursion unfolding. For the worst-case time complexity of our implementation of runtime repeated re- cursion unfolding, we have to consider the recursions in the original rules of the given program, in the unfolder, generalized meta-interpreter and roun d-robin rule processor. We parametrize the time complexity by the number of recursive step s with the original rule. From the time complexity of the recursive steps we can derive t he time complexity of the recursion using recurrence equations. For details, see the main paper [Fr¨ u24]. In the unfolder, the complexity of a recursive step mainly depends o n the time for copying head and guard, for guard checking, and for unfolding and simpliﬁcation of the 12 Page 13: current rule. In the meta-interpreter, the time complexity is domin ated by the third clause that applies a rule from the list to the current goal. The comple xity of a recursive step in that clause is determined by the time needed for copying the r ule and for the meta-calls of the guard and of the two body conjuncts of the rule. In the round-robin rule processor, the third clause calls the unfolder and the generaliz ed meta-interpreter for each rule list. Note that rule lists are reused in later rounds. Our complexity estimates are based on the following assumptions tha t hold in SWI Prolog. Uniﬁcation and copying take constant time for given terms a nd (near-)linear time in the size of the involved terms in general. A Prolog meta-call has the same time complexity as directly executing its goal argument. SWI Prolog uses the GNU multiple precision arithmetic library (GMP), where integer arithmetic is unbou nded. Comparison and addition have logarithmic worst-case time complexity in the numbe rs involved. A variety of multiplication algorithms are used in GMP to get near logarith mic complexity. If one multiplies with a power of 2, the complexity is reduced to logarith mic. In our experiments, we used SWI Prolog 9.2.3-1 running on an Apple Ma c mini M1 2020 MacOS Monterey 12.7.4 with 16 GB RAM. We use default settings f or SWI Prolog except for the command line option -Owhich compiles arithmetic expressions. During multiple runs of the benchmarks we observed a jitter in timings of at m ost 7%. Because the runtime improvement is so dramatic, we can only benchmark small inputs with the original recursion and have to benchmark larger inputs with runtime recursion unfolding. 5.1 Summation Example, Contd. Wehave alreadyunfolded andsimpliﬁed the recursive rule forsummat ion inExample 2.1. We introduced the implementation in concrete Prolog syntax in Examp le 3.1. In [Fr¨ u24], we derived estimates for the time complexities for our summation exa mple written in CHR that we can reuse here when we compare them to benchmark re sults. 5.1.1 Benchmarks Table 1 shows benchmarks results for the summation example. Times are given in mil- liseconds. Since the example features only one recursive rule, ther e is no need to use the round-robin rule processor umr/2. The benchmarks conﬁrm the super-linear speedup. Original Recursion. In each subsequent table entry, we double the input number. The runtime roughly doubles. So the runtime is at least linear. This is in lin e with the expected log-linear time complexity O(nlog(n)): since the numbers are so small, addition is fast, almost constant time, and the runtime is dominated by the line ar time overhead of the recursion itself. Unfolder and Meta-Interpreter. For runtime repeated recursion unfolding of our summation example, we give the time needed for the unfolding, the tim e needed for the execution with the meta-interpreter, and the sum of these timings (column ’Total Time’). Because our method has lower time complexity and is quickly several o rders of magnitude faster, we start from 225and in each subsequent table entry, we square the input number instead of just doubling it. The runtimes of the unfolder and meta-interpreter are similar. For each squaring of the input number, the their runtimes more than double. The benc hmarks results 13 Page 14: Original Summation Inputn Time 2164 2178 21816 21931 22062 221125 222249Runtime Repeated Recursion Unfolding InputnUnfolder Interpreter Total Time 2250.07 0.02 0.09 2500.07 0.04 0.11 21000.14 0.08 0.23 22000.29 0.17 0.46 24000.58 0.34 0.92 28001.17 0.70 1.88 216002.55 1.53 4.08 225+1 0.04 0.02 0.06 250+1 0.07 0.03 0.10 2100+1 0.15 0.06 0.21 2200+1 0.29 0.12 0.41 2400+1 0.58 0.25 0.83 2800+1 1.20 0.50 1.70 21600+1 2.45 1.00 3.44 Table 1: Benchmarks for Summation Example (times in milliseconds) obtained are consistent with the expected complexities of O(log(n)2) of the unfolder and meta-interpreter. Comparing Recursion Depths 2iand2i+1.In the meta-interpreter, each of the unfolded rules will be tried by unifying its head and checking its guard, but not all rules will be necessarily applied. This may lead to the seemingly counterintuit ive behavior that a larger query where only one unfolded rule is applied runs faste r than a smaller one where several less unfolded rules are applied. To see how pronounced this phenomena is, we compare timings for va lues ofnof the form 2iand 2i+ 1. For input numbers of the form 2i, all unfolded rules are applied. Input numbers of the form 2i+1, however, will need exactly one application of the most unfolded rule rito reach the base case. As a consequence the runtime of the meta - interpreter decreases by a linear factor of about 0 .7. The timings for the unfolder stay about the same, because only one more rule is generated for 2i+ 1 (e.g. n= 21600+ 1 generates 1601 rules). 5.2 Fibonacci Number Example This classical program generates the n-th Fibonacci number in a na ive ineﬃcient way using double recursion. fib(N, F) :- N > 1 ,!, N1 is N-1, N2 is N1-1, fib(N1, F1), fib(N2, F2), F is F1+F2. fib(N, F) :- N =< 1 ,!, F=N. The program features double recursion and exponential complexit y and therefore cannot 14 Page 15: be handled by runtime repeated recursion unfolding as deﬁned in [Fr¨ u24]. With our generalization in the meta-interpreter, it becomes possible. 5.2.1 Implementation of Runtime Repeated Recursion Unfold ing Rule Template. This time we start with the desired rule template. It is a generaliza- tion of the original recursive rule by keeping the two recursive calls a nd by introducing parameters in the sum. f(N, F) :- N > A ,!, (N1 is N-A, N2 is N1-1), (f(N1, F1), f(N2, F2)), F is P*F1+Q*F2. The parameters of the rule template that change during unfolding a reA, PandQ. For the original recursive rule, we have that A=P=Q=1. Unfolding Scheme. To ﬁnd the rule template and its unfolding scheme, the idea was to unfold both recursive calls and to simplify and to merge them so that again only two recursive calls remain. The derivation of this unfolding scheme is g iven in appendix A. The implementation is accomplished by the following clause for simpunf. We assume that the input is always a natural number (nonnegative integer). simp_unf((f(N, F) :- N > A ,!, (N1 is N-A, N2 is N1-1), (f(N1, F1), f(N2, F2)), F is P*F1+Q*F2), (f(Nc, Fc) :- Nc > Ac ,!, (N1c is Nc-Ac, N2c is N1c-1), (f(N1c, F1c), f(N2c, F2c)), Fc is Pc*F1c+Qc*F2c) ):- Ac is 2*A, QQ is Q*Q, Pc is P*P+QQ, Qc is 2*P*Q-QQ. Recursive Predicate. For the Fibonacci example, its rules are replaced by: fib(I,O) :- unf(f(I,O), [ (f(N, F) :- N > 1 ,!, (N1 is N-1, N2 is N1-1), (f(N1, F1),f(N2, F2)), F is 1*F1+1*F2 ), (f(N, F) :- N=<1 ,!, F=1, true, true)], URs), mip(f(I,O), URs). Unfolded Rules. The ﬁrst few rules that are returned by the unfolder are f(A,D):-A>16,!, (B is A-16,C is B-1), (f(B,E), f(C,F)), D is 1597*E+987*F. f(A,D):-A>8 ,!, (B is A-8, C is B-1), (f(B,E), f(C,F)), D is 34 *E+21*F. f(A,D):-A>4 ,!, (B is A-4, C is B-1), (f(B,E), f(C,F)), D is 5* E+3*F. f(A,D):-A>2 ,!, (B is A-2, C is B-1), (f(B,E), f(C,F)), D is 2* E+1*F. f(A,D):-A>1 ,!, (B is A-1, C is B-1), (f(B,E), f(C,F)), D is 1* E+1*F. 15 Page 16: 5.2.2 Complexity Original Recursion. The recursion depth is determined by the input number n. The original recursive rule for Fibonacci gives raise to the recurre nce equation f(n) = f(n−1) +f(n−2) +O(n).O(n) is the complexity of adding the Fibonacci numbers. These numbers grow exponentially with the input, thus their number of bits is linear to the input number. The solution of the recurrence results is the well- known exponential complexity in O(2n). Unfolder and Meta-Interpreter. In the unfolder, when unfolding rule ri, the input number nis checked to be larger than parameter Ain the guard, where Acan be shown to be 2i. Insimpunf/2the multiplications of the parameters of the rule template dominate the complexity. This leads to the recurrence u(n) =u(n/2) +M(n), where theM(n) is the time complexity of eﬃcient multiplication involving a number of nbits. The solution of the recurrence is in M(n). The complexity M(n) can be assumed to be O(nlog(n)log(log(n))). In the meta-interpreter, the applications of the unfolded rules de termine the complex- ity. According to the rule template, we have two recursive calls and t he multiplications with Fibonacci numbers dominate the complexity of a recursive step . This leads to the recurrence f(n) =f(n/2) +f(n/2) +M(n). The solution of the recurrence is in O(log(n)M(n)), i.e.O(nlog(n)2log(log(n))). Since the example features only one recursive rule, there is no need to use the round- robin rule processor umr/2. 5.2.3 Benchmarks Table 2 shows benchmarks results for the Fibonacci example. Times are in seconds. A time measurement of 0.0n means that it was below 0.01 but more than z ero. The timings for the original version show that theruntime is exponential as exp ected. The runtimes of theunfolderandinterpreter roughlydoublewitheachdoublingofth einputnumber which is consistent with the estimated complexities that are somewhat wor se than log-linear in the input number n. Comparing Recursion Depths 2iand2i+ 1.Fornof the form 2iall unfolded rules are applied in the meta-interpreter. The meta-interpreter is two orders of mag- nitude slower than the unfolder. The sum of runtimes is therefore d ominated by the meta-interpreter. For nof the form 2i+1 on the other hand, only the most unfolded rule is applied in the meta-interpreter. Its runtime is neglectable becaus e the multiplications involve only the Fibonacci number 1. The sum of runtimes is therefor e very much domi- nated by the unfolder. The total time is almost two orders of magnit ude faster for 2i+1 than for 2i. 5.3 Greatest Common Divisor Example The Euclidean algorithm is a method for ﬁnding the GCD of two numbers by repeatedly subtracting the smaller number from the larger number until both a re the same. The implementation naturally lends itself to using two recursive rules. g(M, N, X) :- M<N ,!, L is N-M, g(M, L, X). 16 Page 17: Original Fibonacci InputnTime 250n 25+1 1 25+2 1 25+3 2 25+4 3 25+5 5 25+6 8 25+7 13 25+8 22Runtime Repeated Recursion Unfolding InputnUnfolder Interpreter Total Time 2180,0n 0.18 0.18 2190,0n 0.34 0.34 2200,0n 0.66 0.66 2210.01 1.28 1.28 2220.02 2.48 2.50 2230.03 4.84 4.88 2240.07 9.42 9.49 218+1 0,0n 0,000n 0,0n 219+1 0,0n 0,000n 0,0n 220+1 0.01 0.0001 0.01 221+1 0.02 0.0001 0.02 222+1 0.03 0.0001 0.04 223+1 0.07 0.0002 0.07 224+1 0.15 0.0003 0.15 Table 2: Benchmarks for Fibonacci Example g(M, N, X) :- M>N ,!, L is M-N, g(L, N, X). g(M, M, M). Since we have multiple recursive rules, we use ourextended implement ation ofgeneralized runtime repeated recursion unfolding from Section 4 that relies on t he predicate umr/2. 5.3.1 Implementation of Runtime Repeated Recursion Unfold ing Rule Template. Notethat the two recursive rules have a similar structure and ther efore can be unfolded in an analogous way. Unfolding the rules once with the mselves results in the two rules: g(M,N,X) :- 2*M<N ,!, L is N-2*M, g(M,L,X). g(M,N,X) :- M>2*N ,!, L is M-2*N, g(L,N,X). The generalization is apparent. It is the introduction of a paramete rAin place of the number2with which we multiply the smaller number. This results in the following template for the two rules: g(M,N,X) :- A*M<N ,!, L is N-A*M, g(M,L,X). g(M,N,X) :- M>A*N ,!, L is M-A*N, g(L,N,X) Unfolding Scheme. From unfolding the rule templates we derive the following two clauses for simpunf, where parameter Ais doubled with each unfolding. simp_unf((g(M,N,X) :- A*M<N ,!, L is N-A*M, g(M,L,X), true) , (g(M1,N1,X1) :- A1*M1<N1 ,!, L1 is N1-A1*M1, g(M1,L1,X1), t rue)) :- A1 is 2*A. simp_unf((g(M,N,X) :- M>A*N ,!, L is M-A*N, g(L,N,X), true) , 17 Page 18: (g(M1,N1,X1) :- M1>A1*N1 ,!, L1 is M1-A1*N1, g(L1,N1,X1), t rue)) :- A1 is 2*A. The call is as follows: umr(g(A,B,_),[ [(g(M,N,Z) :- 1*M<N ,!, L is N-1*M, g(M,L,Z), true), (g(M,M,M) :- true ,!, true, true, true)], [(g(M,N,Z) :- M>1*N ,!, L is M-1*N, g(L,N,Z), true), (g(M,M,M) :- true ,!, true, true, true)] ]) The ﬁrst few unfolded rules for both recursive clauses are: g(M,N,Z) :- 1*M<N ,!, L is N-1*M, g(M,L,Z), true. g(M,N,Z) :- 2*M<N ,!, L is N-2*M, g(M,L,Z), true. g(M,N,Z) :- 4*M<N ,!, L is N-4*M, g(M,L,Z), true. g(M,N,Z) :- 8*M<N ,!, L is N-8*M, g(M,L,Z), true. g(M,N,Z) :- M>1*N ,!, L is M-1*N, g(L,N,Z), true. g(M,N,Z) :- M>2*N ,!, L is M-2*N, g(L,N,Z), true. g(M,N,Z) :- M>4*N ,!, L is M-4*N, g(L,N,Z), true. g(M,N,Z) :- M>8*N ,!, L is M-8*N, g(L,N,Z), true. 5.3.2 Complexity Original Recursion. The worst case for the original GCD program is when the input numbers are nand 1. Then the repeated subtraction results in a recursion depth ofn. Subtraction like addition has logarithmic complexity in the number. The n the recurrence g(n) =O(log(n))+g(n−1) leads to the worst-case time complexity of O(n log(n)). Unfolder, Meta-Interpreter and Round-Robin Processor. In the unfolder and meta-interpreter, the worst-case time complexity is dominated by the generation and application of the unfolded rules. In the unfolder, the multiplications of the parameters in the rule templates of simpunf/2dominate the complexity. These multiplications are by a power of 2 and in this special case have the same complexity as ad ditions. The recurrence is of the form g(n) =O(log(n)) +g(n/2) with a resulting worst-case time complexity O(log(n)2). Similarly, in the meta-interpreter, the multiplications, again by a power of 2, in the unfolded rules lead to the same complexity. In the round-robin rule processor umr/2, we apply the unfolder and meta-interpreter for several rounds (recursive steps). Since each round applies a t least one rule in the meta-interpreter, a crude upper bound for the number of round s isnwhen we alternate between applying the two original rules because no unfolding is possib le. For a more precise measure, assume a pair of numbers nandmfor GCD with n > m. One round withumrwill apply one recursive rule and its unfoldings to exhaustion and thus will lead to a pair of numbers n′andmwheren′< m. Now it holds that n′< n/2, because we keep subtracting mfromnuntiln′< mand we can subtract at least once, so n′≤n−m. Adding these two inequations gives 2 n′< nand thus n′< n/2. Hence each round, i.e. each change in rules, at least halves the larger number. Hence the n umber of rounds is in 18 Page 19: O(log(n)). Thus the overall worst-case time complexity is in O(log(n))O(log(n)2) which isO(log(n)3). This is suﬃcient for a super-linear speedup compared to O(n log(n)). 5.3.3 Benchmarks Times are in seconds. We discuss two sets of benchmarks. Table 3 sh ows some bench- marks results for the GCD example. We use powers of 2 for the large r input number n and a 37 for the second input number. For the original rules we use s uccessive exponents up to 33, the runtime of the original rules is consistent with the estim ated worst-case complexity O(n log(n)). With our generalized runtime repeated recursion unfolding, the runtime is so fast that we start from an exponent 5000 and increas e by a factor of about√ 2 up to 40000. The runtime is in the estimated complexity of O(log(n)3), but also consistent with the better O(log(n)2). Since the second input number is ﬁxed to 37, after one round in umrthe ﬁrst number will be smaller than 37 and the remaining runtime will be neglectable. Original GCD Inputn,37Time 2270.35 2280.64 2291.27 2302.52 2315.04 23210.09 23320.40Runtime Repeated Recursion Unfolding Inputn,37 Total Time 250000.015 270000.021 2100000.032 2140000.045 2200000.067 2280000.094 2400000.148 Table 3: Benchmarks for GCD Example 1 Table 4 shows results for another pair of inputs, n= 2kandm= 2k//2+2k//4−1, i.e. mis somewhat larger than the square root of n. For the original rules we use successive exponents up to 46, the runtime is consistent with the estimated wo rst-case complexity O(n log(n)). If we multiply the inputs by about 4, the runtime roughly doubles. So the actualcomplexityforthistypeofnumberpairsisin O(√n). Withourgeneralizedruntime repeated recursion unfolding, the runtime is in the estimated comple xity ofO(log(n)3), but also again consistent with the better O(log(n)2). 6 Related Work Program transformation to improve eﬃciency is usually concerned with a strategy for combining unfolding [SS94, Pre93] and folding to replace code (for an overview see e.g. [Vis05, PPFDA24]). The transformations are typically performed oﬄ ine, at compile- time. Program transformation for speciﬁc aims and applications is ab undant in logic programming in general [PP99]. Partial evaluation (partial deduction, program specializ ation)[LB02] is a program transformation to execute programs with partially known input to s pecialize it, typi- cally at compile-time. Compile-time partial evaluation alone cannot ach ieve super-linear 19 Page 20: Original GCD InputnTime 2400.07 2410.13 2420.13 2430.27 2440.27 2450.53 2460.52Runtime Repeated Recursion Unfolding Inputn Total Time 250000.015 270000.023 2100000.039 2140000.072 2200000.138 2280000.277 2400000.611 Table 4: Benchmarks for GCD Example 2 speedup up (Chapter 6 in [JGS93]). This result does not apply to our a pproach, because we use a runtime transformation and strongly rely on simpliﬁcation. I n [BLR10] just-in- time partial evaluation of Prolog is introduced consisting of 1500 lines of Prolog code. A linear speedup of up to a factor of 5 was reported in benchmarks. In general, super-linear speedups by program transformation are rare and mostly con- cern parallel programs. Our technique applies to sequential progr ams. In a sequential setting, super-linear speedups can sometimes be achieved with memoization , where the results of recursive calls are cached and reused if the same recurs ive call reappears later on.Tupling[HITT97] applies when several recursions operate on the same dat a struc- ture. Then tupling tries to merge these recursions into a single one. Then there is work based on supercompilation for functional programming languages like Refal and Haskell. In advanced forms of this oﬄine program transformation such as d istillation [Ham09] and equality indices [GKN16], sophisticated generalization while unfoldin g increases the chance for folding and can achieve super-linear speedup. In contrast, our technique relies solely on unfolding and simplifying th e combined recursive steps again and again. We add redundant rules this way bu t never remove any. Further related work is discussed in the main paper [Fr¨ u24]. 7 Conclusions In this companion paper to [Fr¨ u24], we generalized runtime repeate d recursion unfolding for multiple recursion and for multiple recursive rules and implemented it in the logic programming language Prolog. We provided a lean implementation of ou r approach in ten clauses, comprising the unfolder, a generalized meta-interpre ter and a novel round- robin rule processor to handle multiple recursive rules. We showed wit h benchmarks for several classical algorithms that the super-linear speedup ind eed holds in practice. We have considered recursive clauses that have green cuts for eﬃ ciency. Preliminary experiments show that our implementation also works without such c uts, but the eﬀects on runtime have to be further investigated. We have not yet cover ed mutual recursion, but do not see problems in doing so. Runtime repeated recursion unfolding hinges on ﬁnding the appropr iate rule template for unfolding for the given problem at compile-time. This requires insig ht into the given problem and cannot be fully automated. In the main paper [Fr¨ u24], w e discuss these and 20 Page 21: other issues and limitations of runtime repeated recursion unfolding and suggest some possible improvements as well. In the main paper we proved runtime repeated recursion unfolding f or single direct recursions correct for CHR (Constraint Handling Rules) by showing the redundancy of unfolded recursive rules and their termination. The proof ideas are expected to carry over to our generalization in Prolog. In the main paper we also proved a suﬃcient and necessary condition for super-linear speedup. For a given recurs ion, then one tries to ﬁnd an unfolding with an improved time complexity that satisﬁes the condit ion. If it can be found, a super-linear speedup is guaranteed. It is left for future work to extend this result to multiple recursion and to Prolog. We think our approach can also be applied to functional programming languages. For mainstream programming languages it should be possible to adapt the technique to loops as well. Overall, our generalized runtime repeated recursion unfoldin g provides a promis- ing strategy for online optimization of recursions in which the suﬃcien t user-deﬁnable simpliﬁcation of combined successive recursive steps leads to predic table speedups that can be super-linear. Acknowledgements. This research was performed during the sabbatical of the author in the winter semester of 2024/25. We thank Jesper Larss on Tr¨ aﬀ for inspiring discussions. References [BLR10] CarlFriedrichBolz, Michael Leuschel, andArminRigo. Towar dsjust-in-time partialevaluationofprolog. In Logic-Based Program Synthesis and Transfor- mation: 19th International Symposium, LOPSTR 2009, Coimbr a, Portugal, September 2009, Revised Selected Papers 19 , pages 158–172. Springer, 2010. [Fr¨ u20] Thom Fr¨ uhwirth. Repeated recursion unfolding for supe r-linear speedup within bounds. Pre-Proceedings of the 30th International Symposium on Logic-Based Program Synthesis and Transformation (LOPSTR 2020), full version arXiv preprint arXiv:2009.05314 , 2020. [Fr¨ u24] Thom Fr¨ uhwirth. Runtime repeated recursion unfolding: A just-in-time on- line program optimization that can achieve super-linear speedup. submitted to Fundamenta Informaticae, arXiv preprint arXiv:2307.02 180, 2024. [Fr¨ u25] ThomFr¨ uhwirth. Principlesof Rule-BasedProgramming .BooksonDemand, 2025. [GKN16] Robert Gl¨ uck, Andrei Klimov, and Antonina Nepeivoda. No nlinear con- ﬁgurations for superlinear speedup by supercompilation. In Fifth Interna- tional Valentin Turchin Workshop on Metacomputation , page 32. University of Pereslavl, 2016. [Ham09] Geoﬀ W Hamilton. Extracting the essence of distillation. In International Andrei Ershov Memorial Conference on Perspectives of Syste m Informatics , pages 151–164. Springer, 2009. 21 Page 22: [HITT97] Zhenjiang Hu, Hideya Iwasaki, Masato Takeichi, and Akihiko Takano. Tu- pling calculation eliminates multiple data traversals. ACM Sigplan Notices , 32(8):164–175, 1997. [HR99] Matthew M Huntbach and Graem A Ringwood. Agent-oriented program- ming: from Prolog to guarded deﬁnite clauses . Springer Science & Business Media, 1999. [JGS93] Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. Partial evaluation and automatic program generation . Prentice Hallinternational series incomputer science. Prentice Hall, 1993. [Kow14] R. Kowalski. Logic for Problem Solving, Revisited . Computer Science Es- sentials. Books on Demand, 2014. [LB02] Michael Leuschel and Maurice Bruynooghe. Logic program s pecialisation through partial deduction: Control issues. Theory and Practice of Logic Programming , 2(4-5):461–515, 2002. [PP99] Alberto Pettorossi and Maurizio Proietti. Synthesis and tra nsformation of logic programs using unfold/fold proofs. The Journal of Logic Programming , 41(2-3):197–230, 1999. [PPFDA24] Alberto Pettorossi, Maurizio Proietti, Fabio Fioravanti, a nd Emanuele De Angelis. A historical perspective on program transformation an d re- cent developments (invited contribution). In Proceedings of the 2024 ACM SIGPLAN International Workshop on Partial Evaluation and P rogram Ma- nipulation , pages 16–38, 2024. [Pre93] Steven David Prestwich. An unfold rule for full prolog. Logic-based Program Synthesis and Transformation , pages 199–213, 1 1993. [SS94] LeonSterlingandEhudYShapiro. The art of Prolog: advancedprogramming techniques . MIT press, 1994. [Vis05] Eelco Visser. A survey of strategies in rule-based program t ransformation systems. Journal of symbolic computation , 40(1):831–873, 2005. [WSTL12] Jan Wielemaker, Tom Schrijvers, Markus Triska, and Torb j¨ orn Lager. Swi- prolog.Theory and Practice of Logic Programming , 12(1-2):67–96, 2012. A Fibonacci Numbers, Derivation of Rule Scheme for Unfolding To ﬁnd the unfolding scheme for simpunf/2, the idea is to unfold both recursive calls, to simplify and merge them so that again only two recursive calls remain . Recall the rule template for Fibonacci numbers 22 Page 23: f(N, F) :- N > A ,!, (N1 is N-A, N2 is N1-1), (f(N1, F1), f(N2, F2)), F is P*F1+Q*F2. In the rule template, we ﬁrst unfold both recursive calls and collect t he guards together with the relevant arithmetic computations. f(N, F) :- N > A, (N1 is N-A, N2 is N1-1), N1 > A, N2 > A ,!, % remaining unfolding of f(N1, F1) (N11 is N1-A, N21 is N11-1), (f(N11, F11), f(N21, F21)), F1 is P*F11+Q*F21, % remaining unfolding of f(N2, F2), (N12 is N2-A, N22 is N12-1), (f(N12, F12), f(N22, F22)), F2 is P*F12+Q*F22, % F is P*F1+Q*F2. According to the original rule template, we want to have just two re cursive calls. Consider the variables N12andN21. They depend on N1in the following way: N12 is N2-A, N2 is N1-1 andN21 is N11-1, N11 is N1-A . We observe that this implies that N21=N12 and therefore F21=F12, since Fibonacci is a function. Hence we do not need to compute f(N12, F12) and can remove it. f(N, F) :- N > A, (N1 is N-A, N2 is N1-1), N1 > A, N2 > A ,!, % remaining unfolding of f(N1, F1) (N11 is N1-A, N21 is N11-1), (f(N11, F11), f(N21, F21)), F1 is P*F11+Q*F21, % remaining unfolding of f(N2, F2), (N12 is N2-A, N22 is N12-1), (F12=F21, f(N22, F22)), F2 is P*F12+Q*F22, % F is P*F1+Q*F2. Thisleavesthreerecursivecallswithsuccessive inputnumbersindec reasingorder, f(N11, F11), f(N21, F21), f(N22, F22) . By deﬁnition of Fibonacci numbers, we have that F11 is F21+F22 . We can use this equation to compute F22byF22 is F11-F21 instead of computing it with the third recursive call f(N22, F22) . f(N, F) :- N > A, (N1 is N-A, N2 is N1-1), N1 > A, N2 > A ,!, % remaining unfolding of f(N1, F1) (N11 is N1-A, N21 is N11-1), (f(N11, F11), f(N21, F21)), F1 is P*F11+Q*F21, 23 Page 24: % remaining unfolding of f(N2, F2), (N12 is N2-A, N22 is N12-1), (F12=F21, F22 is F11-F21), F2 is P*F12+Q*F22, % F is P*F1+Q*F2. We have successfully removed the last two recursive calls from the u nfolding of f(N2,F2) . Now we simplify the arithmetic expressions, ﬁrst by removing variable s that have be- comesuperﬂuous. Thevariables N12, N22 occurringin N12 is N2-A, N22 is N12-1 are no longer needed. The condition N2 > Athat remained from the unfolding of f(N2,F2) is no longer needed and thus also N2can be removed. f(N, F) :- N > A, (N1 is N-A), N1 > A ,!, % remaining unfolding of f(N1, F1) (N11 is N1-A, N21 is N11-1), (f(N11, F11), f(N21, F21)), F1 is P*F11+Q*F21, % remaining unfolding of f(N2, F2), (F12=F21, F22 is F11-F21), F2 is P*F12+Q*F22, % F is P*F1+Q*F2. We continue simpliﬁcation by removing the intermediate variables N1, F1, F2, F12 and F22. f(N, F) :- N > A, N-A > A ,!, % remaining unfolding of f(N1, F1) (N11 is N-A-A, N21 is N11-1), (f(N11, F11), f(N21, F21)), % F is P*(P*F11+Q*F21)+Q*(P*F21+Q*(F11-F21)). We further simplify the remaining arithmetic expressions to arrive ﬁn ally at: f(N, F) :- N > 2*A ,!, % remaining unfolding of f(N1, F1) (N11 is N-2*A, N21 is N11-1), (f(N11, F11), f(N21, F21)), % F is (P*P+Q*Q)*F11+(2*P*Q-Q*Q)*F21. This rule ﬁts the rule template that we have initially devised. B Further Examples The examples for list reversal and sorting are taken from the main p aper [Fr¨ u24], where the details can be found such as the derivation of the unfolding sche me and of the time 24 Page 25: complexity results. We did a straightforward translation from CHR [F r¨ u25] to Prolog and performed the benchmarks. They show the expected super- linear speedups. B.1 List Reversal Example The classical program reverses a given list in a naive way. It takes th e ﬁrst element of the list, reverses its remainder and adds the element to the end of t he reversed list. The Prolog predicate r(A,B)holds if list Bis the reversal of list A. r(E, D) :- E=[C|A], !, r(A, B), append(B, [C], D). r(E, D) :- E=[], !, D=[]. The predicate append(X,Y,Z) concatenates two lists XandYinto a third list Z. Its runtime is linear in the length (number of elements) of the ﬁrst list. B.1.1 Implementation of Runtime Repeated Recursion Unfold ing Unfolding with Simpliﬁcation. The unfolding scheme for list reversal is implemented with the following clause for simpunf/2. simp_unf( (r(A,B) :- A=E ,!, true, r(C,D), append(D,F,B)), % given rul e template (r(Al,Bl) :- Al=El ,!, true, r(Cl,Dl), append(Dl,Fl,Bl)) % unfolded r. ) :- copy_term((E,C,F),(El,Cc,Fc1)), copy_term((E,C,F),(Ec,Cl,Fc2)), Cc=Ec, append(Fc2,Fc1,Fl). During unfolding, in the given rule template, the variable Ein the guard will be instan- tiated with an open list ending in the variable C. The list Finappend/3 then consists of the element variables of Ein reversed order. In the unfolded rule template, the number of elements in these two lists is doubled and their relationship of rever sal is maintained. Recursive Predicate. For list reversal, its rules are replaced by: rev(I,O) :- unf(r(I,O), [ (r(A, E) :- A=[D|B] ,!, true, r(B, C), append(C, [D], E), (r(A, B) :- A=[] ,!, B=[], true, true) ], URs), mip(r(I,O), URs). The list in the second argument of unf/3contains the original recursive rule and the rule for the base case in appropriate template form. Unfolded Rules. The rules that are returned by the unfolder unf/3for a query with 17 list elements are 25 Page 26: r(A, T) :- A=[S,R,Q,P,O,N,M,L,K,J,I,H,G,F,E,D|B] ,!, true, r(B, C), append(C, [D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R ,S], T). r(A, L) :- A=[K,J,I,H,G,F,E,D|B] ,!, true, r(B, C), append(C, [D,E,F,G,H,I,J,K], L). r(A, H) :- A=[G,F,E,D|B] ,!, true, r(B, C), append(C, [D,E,F ,G], H). r(A, F) :- A=[E,D|B] ,!, true, r(B, C), append(C, [D,E], F). r(A, E) :- A=[D|B] ,!, true, r(B, C), append(C, [D], E). r(A, B) :- A=[] ,!, B=[], true, true. The number of element variables in the lists doubles with each unfolding but will never exceedn, the length of the input list in the query. Hence the unfolded rules do not increase overall space complexity. B.1.2 Benchmarks Table 5 shows benchmarks results for the list reversal example. Th e list sizes (lengths) n are powers of 2. Times are in seconds. The experiments conﬁrm the super-linear speedup using runtime repeated recursion unfolding. Original list reversal Inputn Time 290.02 2100.04 2110.09 2120.31 2131.21 2144.81 21519.31Runtime Repeated Recursion Unfolding InputnUnfolder Interpreter Total Time 213−10.0n 0.0n 0.0n 214−10.0n 0.0n 0.01 215−1 0.01 0.01 0.02 216−1 0.02 0.01 0.03 217−1 0.03 0.02 0.06 218−1 0.07 0.05 0.12 219−1 0.13 0.10 0.23 2130.0n 0.0n 0.01 2140.01 0.0n 0.01 2150.02 0.01 0.03 2160.03 0.02 0.05 2170.07 0.04 0.11 2180.13 0.08 0.21 2190.26 0.15 0.42 Table 5: Benchmarks for List Reversal Example Original Recursion. For the original recursion, the benchmarks indicate a com- plexity consistent with the expected O(n2). From 212on, doubling the list size increases the runtime by a factor of about four. Unfolder and Meta-Interpreter. All measured runtimes are consistent with a linear complexity O(n). For list size n= 213, runtime repeated recursion unfolding is already two orders of magnitude faster than the original recursio n. Comparing Recursion Depths 2i−1and2i.We give timings for list lengths n of the form 2iand their predecessor numbers 2i−1. In the meta-interpreter, the runtime of applying allunfolded rules (case of n= 2i−1) is less than of applying just the next 26 Page 27: larger unfolded rule (which has twice the size) (case of n= 2i). The unfolder takes longer than the meta-interpreter. Going from 2i−1to 2i, the unfolder generates one more rule and the time spent roughly doubles. B.2 Sorting Example The classical insertion sort program sorts the numbers given in a list in ascending order: s(L,S) :- L=[A|L1] ,!, s(L1,S1), i(A,S1,S). s([],S):- !, S=[]. The predicate i(A, S1, S) inserts a number Ainto the sorted list S1such that the resulting list Sis sorted. B.2.1 Implementation of Runtime Repeated Recursion Unfold ing In the rule template that we derive in [Fr¨ u24], we use a built-in for me rgingm(S1,S2,S3) instead of insertions. It merges two sorted lists into a third sorted list. Unfolding with Simpliﬁcation. Relyingontheruletemplate, theunfoldingscheme is deﬁned the following clause for simpunf. simp_unf( Rule, % given rule (s(L,S) :- L=AL ,!, MG, s(L2,S2), m(S0,S2,S)) % unfolded rul e template ):- copy_term(Rule, (s(L,S) :- L=AL ,!, MG1, s(L1,S1), m(S3,S1 ,S))), copy_term(Rule, (s(L1,S1) :- L1=AL1 ,!, MG2, s(L2,S2), m(S 4,S2,S1))), clean((MG1, MG2, m(S3,S4,S0)), MG), L1=AL1. We copy the input rule twice onto instances of the rule template to sim ulate the unfolding of the recursive call. The predicate clean/2 removes superﬂuous truebuilt-ins in the resulting mergings. Finally, executing L1=AL1will double the size of the open list AL which ends in L1. Recursive Predicate. For the sorting example, its rules are replaced by: sort(I,O) :- unf(s(I,O), [ (s(A,E) :- A=[C|B] ,!, true, s(B,D), m([C],D,E)), (s([],A):- true ,!, A=[], true, true) ], URs), mip(s(I,O), URs). We write the original recursive clause according to the rule template using merge m/3 instead of insert i/3. Unfolded Rules. The ﬁrst few rules that are returned by the unfolder are 27 Page 28: s(A,S) :- A=[C,B,E,D,I,H,K,J|P] ,!, ((m([B],[C],G), m([D],[E],F), m(F,G,O)), (m([H],[I],M), m([J],[K],L), m(L,M,N)), m(N,O,Q)), s(P,R), m(Q,R,S). s(A,K) :- A=[C,B,E,D|H] ,!, (m([B],[C],G), m([D],[E],F), m(F,G,I)), s(H,J), m(I,J,K). s(A,G) :- A=[C,B|D] ,!, m([B],[C],E), s(D,F), m(E,F,G). s(A,E) :- A=[C|B] ,!, true, s(B,D), m([C],D,E). s([],A):- true ,!, A=[], true, true. As with list reversal, the rule size roughly doubles with each unfolding, but again this does not increase the space complexity. B.2.2 Benchmarks Table 6 shows benchmarks results for the sorting example. Times ar e in seconds. The benchmarks are performed with random permutations of integers from 1 to n. The experiments conﬁrm the super-linear speedup. Original Sorting InputnTime 280.0n 290.02 2100.05 2110.13 2120.48 2131.93 2147.77Runtime Repeated Recursion Unfolding InputnUnfolder Interpreter Total Time 212−1 0.01 0.02 0.03 213−1 0.01 0.03 0.03 214−1 0.01 0.04 0.04 215−1 0.01 0.07 0.08 216−1 0.02 0.15 0.17 217−1 0.05 0.30 0.34 218−1 0.09 0.64 0.74 2120.0n 0.01 0.01 2130.01 0.02 0.02 2140.01 0.03 0.05 2150.02 0.07 0.09 2160.05 0.14 0.18 2170.09 0.29 0.38 2180.18 0.60 0.78 Table 6: Benchmarks for Sorting Example Original Recursion. The experiments for the original version of insertion sort indicate a complexity that is indeed quadratic O(n2). Doubling the list length increases the runtime by a factor of four from n= 211on. Unfolder and Meta-Interpreter. All timings are linear in nor slightly worse. The runtimes of the unfolder are consistent with the expected linea r complexity O(n). The meta-interpreter timings are consistent with the expected log -linear complexity O(nlog(n)). The generation of all rules in the unfolder takes a fraction of th e time of applying one or more rules in the meta-interpreter. 28 Page 29: Comparing Recursion Depths 2i−1and2i.Going from input list length 2i−1 to 2i, the unfolder generates one more rule. It has twice the size of the previous rule. And indeed the runtime for the unfolder roughly doubles. Going from list length 2i−1 to 2i, the meta-interpreter applies all unfolded rules in the ﬁrst case bu t only the next more unfolded rule in the second case. Its runtime decreases some what. But overall, the total runtime increases somewhat. 29