Paper Content:
Page 1:
arXiv:2503.10416v1 [cs.PL] 13 Mar 2025Super-Linear Speedup by Generalizing Runtime
Repeated Recursion Unfolding in Prolog
Thom Fr¨ uhwirth
Ulm University, 89069 Ulm, Germany
thom.fruehwirth@uni-ulm.de
March 14, 2025
Abstract
Runtime repeated recursion unfolding was recently introdu ced in [Fr¨ u24] as a
just-in-timeprogramtransformationstrategythatcanach ievesuper-linearspeedup.
So far, the method was restricted to single linear direct rec ursive rules in the pro-
gramming language Constraint Handling Rules (CHR). In this companion paper,
we generalize the technique to multiple recursion and to mul tiple recursive rules
and provide an implementation of the generalized method in t he logic program-
ming language Prolog.
Thebasicideaoftheapproachisasfollows: Whenarecursive call isencountered
at runtime, the recursive rule is unfolded with itself and th is process is repeated
with each resulting unfolded rule as long as it is applicable to the current call. In
this way, more and more recursive steps are combined into one recursive step. Then
an interpreter applies these rules to the call starting from the most unfolded rule.
For recursions which have sufficiently simplifyable unfoldi ngs, a super-linear can be
achieved, i.e. the time complexity is reduced.
We implement an unfolder, a generalized meta-interpreter a nd a novel round-
robin rule processor for our generalization of runtime repe ated recursion unfolding
with just ten clauses in Prolog. We illustrate the feasibili ty of our technique with
worst-case time complexity estimates and benchmarks for so me basic classical al-
gorithms that achieve a super-linear speedup.
Keywords: Logic Programming, Prolog, Rule-Based Programming, Just-In-Tim e
Program Transformation, Runtime Program Optimization, Online Pro gram Specializa-
tion, RepeatedRecursion Unfolding, Super-Linear Speedup, Recu rsion, Meta-Interpreter,
Time Complexity.
Contents
1 Introduction 2
1
Page 2:
2 Repeated Recursion Unfolding in Prolog 4
2.1 Semantics of Prolog . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Rule Unfolding with Simplification . . . . . . . . . . . . . . . . . . . . . 5
2.3 Repeated Recursion Unfolding . . . . . . . . . . . . . . . . . . . . . . . . 6
2.4 Optimal Rule Applications . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3 Implementation of Runtime Repeated Recursion Unfolding i n Prolog 7
3.1 Unfolder Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Generalized Meta-Interpreter Implementation for Multiple Recu rsion . . 9
3.3 Recursive Predicate Implementation . . . . . . . . . . . . . . . . . . . . . 10
4 Generalization to Multiple Recursive Rules 10
4.1 Meta-Interpreter with Continuation Goals . . . . . . . . . . . . . . . . . 11
4.2 Round-Robin Rule Processor . . . . . . . . . . . . . . . . . . . . . . . . . 11
4.3 Recursive Predicate Implementation . . . . . . . . . . . . . . . . . . . . . 12
5 Experimental Evaluation with Benchmarks 12
5.1 Summation Example, Contd. . . . . . . . . . . . . . . . . . . . . . . . . . 13
5.2 Fibonacci Number Example . . . . . . . . . . . . . . . . . . . . . . . . . 14
5.3 Greatest Common Divisor Example . . . . . . . . . . . . . . . . . . . . . 16
6 Related Work 19
7 Conclusions 20
A Fibonacci Numbers, Derivation of Rule Scheme for Unfoldin g 22
B Further Examples 24
B.1 List Reversal Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
B.2 Sorting Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1 Introduction
In this companion paper, we generalize our online program optimizatio n strategy of run-
time repeated recursion unfolding from the main paper [Fr¨ u24] to e nable super-linear
speedups for more classes of recursion. We generalize the approa ch to multiple recur-
sion and to multiple recursive rules and we provide an implementation of the generalized
method in the logic programming language Prolog.
Unfolding isaprogramtransformationthatbasically replacesacallin thebody(right-
hand side) of a rule with the body of a rule whose head (left-hand side ) fits the call.
Runtime Repeated recursion unfolding [Fr¨ u20, Fr¨ u24] first unfolds a given recursive rule
with itself andsimplifies it. This results in a specialized recursive rule tha t, when applied,
covers two combined recursive steps instead ofone. The techniqu e continues to unfoldthe
last unfolded recursive rule with itself until the resulting rule is no long er applicable to
the given call. Each unfolding doubles the number of recursive steps covered. Then, the
rules are applied to the given recursive call, with the most unfolded ru le being tried first,
continuing until the original rule and finally a base case are reached. If the unfolded rules
2
Page 3:
admitted sufficient simplification of the combined recursive steps, th is approach results
in a proven super-linear speedup [Fr¨ u24].
In this paper, we assume some familiarity with Prolog [SS94]. For efficie ncy, we
assume that each recursive clause has a green cut . To ease the explanation of the method
in this paper, we use the term guardfor the goals before the cut and the term bodyonly
for the goals after the cut.
Example 1.1 (Summation) Consider the following introductory example adapted from
[Fr¨ u24], a simple recursive program written here in the abs tract syntax of Prolog. It
recursively adds all numbers from 1ton. Rulebcovers the base case and rule rcovers
the recursive case. The headsum(N,S),guard(e.g.N=1) andbodyof a rule are
separated by the symbols ←and cut!, respectively.
b=sum(N,S)←N= 1,!,S= 1
r=sum(N,S)←N >1,!,sum(N−1,S1),S:=N+S1
Unfolding the recursive rule with a copy of itself and simpli fying the resulting rule gives
r1=sum(N,S)←N >2,!,sum(N−2,S1′),S:= 2∗N−1+S1′
Note that this rule r1behaves like applying the original rule rtwice. With rule r1we
only need about half as many recursive steps as with the origi nal rule alone. Because the
arithmetic computation is simplified, we can also expect to h alve the runtime.
We proceed with unfolding rule r1with itself:
r2=sum(N,S)←N >4,!,sum(N−4,S1),S:= 4∗N−6+S1
This rule results in fourfold speedup. We can continue this p rocess, doubling the speed
each time1. This kind of simplification of the combined recursive steps in the unfolded
rules is sufficient to achieve a super-linear speedup. The num ber of unfoldings depends
on the given call. The most unfolded rule should cover as many recursive steps of the call
as possible. For example, for N=4we will unfold till rule r1with guard N>2, forN=5
till ruler2withN>4, forN=50till ruler5withN>32.
Repeated recursion unfolding requires unfolding on-the-fly becau se the number of
unfoldings depends on the current call. We do not want to modify the given program
at runtime. Therefore, the method also introduces an interprete r for the unfolded rules.
Thismeta-interpreter2tries and applies each unfolded rule at most once starting with
the given call and the most unfolded rule.
Overview of the Paper. Asacompanionpaper, thisworkisnotfullyself-contained.
To make it more accessible, we reuse some parts from the main paper [Fr¨ u24].
After this introduction, Section 2 defines the program transformation method of run-
time repeated recursion unfolding with simplification and an optimal ru le application
strategy for the logic programming language Prolog. The exposition follows the one in
the main paper, but replaces CHR with Prolog.
1Clearly there is a closed-form solution for this problem, S=N∗(N+1)/2, but this is not the point
of the example.
2A meta-interpreter interprets a program written in its own implemen tation language (Chapter 8 in
[HR99]).
3
Page 4:
Section 3 presents our lean implementation of the unfolder and meta-interpr eter to
perform repeated recursion unfolding with optimal rule applications at runtime, again
following the main paper. Section 4 generalizes our implementation of runtime repeated
recursion unfolding for multiple recursive rules using a new program la yer called the
round-robin rule processor. This is the main contribution of this com panion paper. The
complete implementation consists of just ten clauses written in Prolo g. It has little
overhead, the runtime mainly depends on the given recursive rule an d its unfoldings.
Section 5 contains the experimental evaluation of our technique on some exa mples,
summation, Fibonacci numbers and greatest common divisor (GCD) . Fibonacci features
multiple recursion and GCD features multiple recursive rules. We deriv e the necessary
simplifications, discuss the time complexity and compare it with the res ult of benchmarks
to verify the super-linear speedup.
Section 6 discusses related work. Finally, the conclusion discusses achieveme nts, lim-
itations and future work of generalized runtime repeated recursio n unfolding. In the
appendix we give two additional examples, list reversal and sorting f rom [Fr¨ u24].
2 Repeated Recursion Unfolding in Prolog
We give a short definition of the semantics of Prolog and of rule unfold ing in Prolog
[SS94]. Then we introduce repeated recursion unfolding based on [F r¨ u24] and define an
optimal rule application strategy.
2.1 Semantics of Prolog
Logic Programming [SS94, Kow14] is a programming and knowledge rep resentation
paradigm based on first-order logic. We assume some familiarity with t he syntax and
semantics of the logic programming language Prolog: Goalsare possibly empty conjunc-
tions of atoms. They are denoted by upper case letters in definition s. To avoid clutter,
we use simple commas to denote logical conjunction. Prolog built-in pr edicates include
trueandfalse, the syntactic equality = /2 between terms and arithmetic operations
using the arithmetic equality := /2. Syntactic equality tries to unify its arguments, i.e.
making them syntactically identical by instantiating their variables ap propriately.
To define the operational semantics of Prolog, we assume a transit ion system with a
transition relation /ma√sto→between states that are goals. An initial state is any goal, a final
state is one that contains only syntactic equalities (including the spe cial cases trueand
false).
Definition 2.1 (Prolog Semantics) A Prolog program Pconsists of clauses of the
formH←B, whereHis an atom and Bis a goal. If Bis empty, it is equivalent
to trueand the clause is called a fact, otherwise it is called a rule. LetSbe a function
that simplifies the syntactic equalities in a goal.
Given a goal, let Gbe an atom chosen from the goal and Dbe the remainder of the
goal.
•Choose a clause from Pand take a copy of the clause with new variables, H←B.
IfGandHare unifiable, then
G,D/ma√sto→S(G=H,B,D).
4
Page 5:
•Otherwise, if Gis a call to a built-in predicate, the transition replaces it by the result
of executing the built-in, where the result is a conjunction of syntactic equalities E,
G,D/ma√sto→S(E,D).
•Otherwise, Ggives rise to a failure transition into the failed state deno ted by false,
G,D/ma√sto→false.
In implementations of Prolog, all clauses are tried for the current g oal in a systematic
manner using chronological backtracking. In the context of this p aper, we assume green
cuts for efficiency [SS94] in the recursive clauses. This means that w e commit to the
chosen clause once the cut is reached.
2.2 Rule Unfolding with Simplification
We now define unfolding with simplification for Prolog rules.
Definition 2.2 (Unfolding) LetPbe a Prolog program and let randvbe copies of two
(not necessarily different) clauses from Psuch that they do not share variables
r=H←D,G,B
v=H′←B′,
whereH,H′andGare atoms, and D,BandB′are goals.
Theunfolding of rule rwith rule v,unfold(r,v),is the rule
r′=H←D,G=H′,B′,BifGandH′are unifiable .
If the goal Gin the body of rule runifies with the head H′of a rule v, unfolding replaces
GbyG=H′and by the body of rule vto obtain the unfolded rule r′.
In the presence of a cut operator in the body of a rule, the correc t unfolding in general
becomes more complicated [Pre93]. Here we assume green cuts in the recursive rules. In
this case, it suffices to remove the cut coming from the rule rin the unfolded rule r′. The
cut from rule vwill stay.
Speedup crucially depends on the amount of simplification of the combined recursive
steps in the unfolded rules. We want to replace goals in the recursive step of the rules by
semantically equivalent ones that can be executed more efficiently. T his includes moving
the green cut to the left as far as semantically possible. For simplifica tion, we can use
any available program transformation technique.
Example 2.1 (Summation, contd.) We unfold the recursive rule for summation with
(a copy of) itself:
r=sum(N,S)←N >1,!,sum(N−1,S1),S:=N+S1
v=sum(N′,S′)←N′>1,!,sum(N′−1,S1′),S′:=N′+S1′
Then the unfolded rule is
r1=sum(N,S)←N >1,sum(N−1,S1)=sum(N′,S′),N′>1,!,
sum(N′−1,S1′),S′:=N′+S1′,S:=N+S1
The unfolded rule can be simplified into the rule
r1=sum(N,S)←N>2,!,sum(N−2,S1′),S:=2∗N−1+S1′
5
Page 6:
2.3 Repeated Recursion Unfolding
We can now define the program optimization strategy of repeated r ecursion unfolding for
Prolog based on rule unfolding with simplification. In our method, we st art from a call
(query) for a Prolog predicate defined by a recursive rule. We unfo ld the recursive rule
with itself and simplify it. Then we unfold the resulting rule. We repeat t his process as
long as the resulting rules are applicable to the query.
Definition 2.3 (Repeated Recursion Unfolding) Letrbe a recursive rule and Gbe
a goal. Let
unfold(r) =unfold(r,r).
Therepeated recursion unfolding of a recursive rule rwith goal Gand with rule simpli-
fication is a maximal sequence of rules r0,r1,...where rule rihas head Hiand guard Ci
and where
r0=r
ri+1=unfold(ri)as long as G=Hi,Ci/negationslash/ma√sto→false(i≥0)
We unfold and simplify the current unfolded rule riif the guard of the rule is successfully
executable with the query G. We add the new rule to the sequence and continue with it.
Note that unfolding may fail to produce a rule. In that case, the un folding process stops.
Example 2.2 (Summation, contd.) Consider a query sum(10,R). Recall the un-
folded simplified rule
r1=sum(N,S)←N>2,!,sum(N−2,S1),S:= 2∗N−1+S1
The query sum(10,R)means that the guard N>2succeeds since 10 =N, so we repeat
the unfolding:
unfold(r1) =sum(N,S)←N>2,sum(N−2,S1)=sum(N′,S′),N′>2,!,
sum(N′−2,S1′),S′:= 2∗N′−1+S1′,S:= 2∗N−1+S1
The unfolded rule can be simplified into the rule
r2=sum(N,S)←N>4,!,sum(N−4,S1′),S:= 4∗N−6+S1′
The rule r2is applicable to the goal. Further recursion unfolding resu lts in rules with
guardsN >8and then N >16. To the latter rule, the goal sum(10,R)is not applicable
anymore. Hence repeated recursion unfolding stops. The rul es for the goal sum(10,R)
are therefore (more unfolded rules come first):
r3=sum(N,S)←N >8,!,sum(N−8,S1),S:= 8∗N−28+S1
r2=sum(N,S)←N >4,!,sum(N−4,S1),S:= 4∗N−6+S1
r1=sum(N,S)←N >2,!,sum(N−2,S1),S:= 2∗N−1+S1
r=r0=sum(N,S)←N >1,!,sum(N−1,S1),S:=N+S1
b=sum(N,S)←N= 1,!,S= 1.
Note that to the goal sum(10,R)we can apply any of the recursive rules. The most
efficient way is to start with the first, most unfolded rule. It c overs more recursive steps
of the original recursive rule than any other rule.
6
Page 7:
2.4 Optimal Rule Applications
Anunfoldedrulecoverstwiceasmanyrecursionstepsthanthegive nrule. Whenweapply
a more unfolded rule, we cover more recursive steps with a single rule application. Based
on this observation we introduce a rule application strategy where w e try to apply more
unfolded rules first. Furthermore each unfolded rule is tried only on ce and is applied
at most once. In [Fr¨ u24], we proved this optimal rule application str ategy sound and
complete in the case of CHR. Here we apply it to Prolog.
Example 2.3 (Summation, contd.) A computation with optimal rule applications for
the goalsum(10,R)is as follows (where we subscript single transitions with th e rule that
was used):
sum(10,R)/ma√sto→r3
S(sum(10,R)=sum(N,S),N >8,sum(N−8,S1),S:= 8∗N−28+S1)/ma√sto→+
sum(2,S1),R:= 52+S1/ma√sto→r0
S(sum(2,S1)=sum(N′,S′),N′>1,sum(N′−1,S1′),S′:=N′+S1′,R:= 52+S1)/ma√sto→+
sum(1,S1′),R:= 54+S1′/ma√sto→b
R= 55
3 Implementation of Runtime Repeated Recursion
Unfolding in Prolog
At compile-time, the rules for the given recursive predicate are replaced by a call to
the unfolder that contains these rules and then to the meta-inter preter that interprets
the unfolded rules. At runtime, the unfolder repeatedly unfolds a r ecursive rule as long
as it is applicable to a given goal using a predefined unfolding scheme that specifies
the simplification of the rule. Then the meta-interpreter applies the resulting unfolded
simplified rules according to the optimal rule application strategy.
In the following code in concrete Prolog syntax, =/2,is/2,copyterm/2andcall/1
are standard built-in predicates of Prolog. The syntactic equality =/2tries to unify its
arguments, i.e. makingthemsyntactically identical byinstantiatingt heirvariablesappro-
priately. The arithmetic equality is/2tries to unify its first argument with the result of
evaluating the arithmetic expression in its second argument. The bu ilt-incopyterm/2
produces a copy of the given term with new variables. The Prolog met a-callcall/1
executes its argument as a goal.
Our implementation in SWI Prolog [WSTL12] together with the examples is fully
listed in this paper. Together with the benchmarking code it is also ava ilable from the
author on request.
3.1 Unfolder Implementation
The unfolder is implemented as a recursive Prolog predicate unf/3. It is the result of
a straightforward translation from the unfolder written in CHR in [Fr ¨ u24]. It repeat-
edly unfolds and simplifies a recursive rule as long as it is applicable to a go al. In
7
Page 8:
unf(G,Rs,URs) , the first argument Gis the goal and Rsis a list of given original clauses.
URsis the resulting list of unfolded and original clauses.
We assume that inthe goal Gthe input arguments are given andtheoutput arguments
are variables. Initially, the list Rsconsists of a recursive rule followed by one or more
clauses for the base cases of the recursion. To simplify the implemen tation, the body of
the rules in the lists syntactically always consists of three conjunct s of goals: the atoms
before the recursive goal, the recursive goal and the atoms afte r the recursive goal. We
use the built-in trueto denote an empty conjunct.
% unf(+RecursiveGoal, +RuleList, -UnfoldedRuleList)
unf(G, [R|Rs], URs) :- % recursive step
R = (H :- Co ,!, _), % get head H and guard Co of rule R
copy_term((H :- Co),(G :- C)), % copy them, unify head copy wi th goal G
call(C) % call instantiated guard copy C
,!,
simp_unf(R, UR), % unfold, simplify rule R into rule UR
unf(G, [UR,R|Rs], URs). % add new rule UR and recurse
unf(_G, [_R|Rs], URs) :- URs=Rs. % otherwise return rules Rs in URs
In the recursive clause for unf/3, we check if the rule Rin the list is applicable to the
query (call, goal) G: If the guard Csucceeds, we unfold the current rule Rwith itself and
and simplify it using the predicate simpunf/2(discussed below) and add the resulting
ruleURto the rule list in the recursive call of unf/3. When the guard has failed, the base
case clause of unf/3returns the rules that have been accumulated in the rule list as the
result list in the third argument (with the exception of the first rule t o which the goal
was not applicable).
The Prolog predicate simpunf(R,UR) takes the current rule and computes its simpli-
fied unfolding. It is defined for each recursive rule. In the head of a clause for simpunf/2
we userule templates for the arguments to ease the implementation. The rules are then
instances of the template. Certain variables in the template repres ent theparameters for
the instance. These parameters will be bound at runtime. In the bo dy of a clause for
simpunf/2, the parameters for the unfolded rule will be computed from the pa rameters
of the current rule.
The following example clarifies the above remarks on the implementatio n.
Example 3.1 (Summation, contd.) We show how we implement simpunf/2for the
summation example. We abbreviate sumto its first letter sto avoid clutter in the code.
The rule template for sumis
s(A,C) :- A>V ,!, B is A-V, s(B,D), C is V*A-W+D % rule template
where the variables VandWare parameters that stand for integers. Its instance for the
original recursive rule is
s(A,C) :- A>1 ,!, B is A-1, s(B,D), C is 1*A-0+D % rule instance V=1, W=0
The implementation of the unfolding scheme for summation is accomplished by the fol-
lowing Prolog clause for simpunf/2.
8
Page 9:
simp_unf(
(s(A,C) :- A>V ,!, B is A-V, s(B,D), C is V*A-W+D),
(s(Al,Cl) :- Al>Vl ,!, Bl is Al-Vl, s(Bl,Dl), Cl is Vl*Al-Wl+ Dl)
) :-
Vl is 2*V, Wl is 2*W+V*V.
For a goal s(100,S) the unfolder is called with
unf(s(100,S), [
(s(A,C):-A>1 ,!, B is A-1, s(B,D), C is 1*A-0+D), % orig. recu rsion
(s(A,B):-A=1 ,!, B=1, true, true) % base case
], URs).
It will return the following rules in the list URs:
s(A,C) :- A>64 ,!, B is A-64, s(B, D), C is 64*A-2016+D
s(A,C) :- A>32 ,!, B is A-32, s(B, D), C is 32*A-496+D
s(A,C) :- A>16 ,!, B is A-16, s(B, D), C is 16*A-120+D
s(A,C) :- A>8 ,!, B is A-8, s(B, D), C is 8*A-28+D
s(A,C) :- A>4 ,!, B is A-4, s(B, D), C is 4*A-6+D
s(A,C) :- A>2 ,!, B is A-2, s(B, D), C is 2*A-1+D
s(A,C) :- A>1 ,!, B is A-1, s(B, D), C is 1*A-0+D % orig. recursi on
s(A,C) :- A=1 ,!, C=1, true, true % base case
3.2 Generalized Meta-Interpreter Implementation for Mult iple
Recursion
We implement the optimal rule application strategy with the help of a me ta-interpreter
for Prolog. Again it closely follows the implementation for CHR in [Fr¨ u24 ]. The meta-
interpreter handles the recursive calls, any other goal will be hand led by the underlying
Prolog implementation. To a recursive goal, the meta-interpreter t ries to apply the
unfolded rules produced by the unfolder and applies each of them at most once. The
meta-interpreter is called with mip(G,Rs) , whereGis the given recursive goal and Rsis
the list of rules from the unfolder unf/3.
% Meta-Interpreter for Multiple Recursion
% mip(+RecursiveGoal, +RuleList)
mip(true,_Rs). % base case, no more recursive goal
mip((G1,G2),Rs) :- % handle conjunction for multiple recur sion
mip(G1,Rs),
mip(G2,Rs).
mip(G,[R|Rs]) :- % current rule is applicable to goal G
copy_term(R, (G :- C ,!, B,G1,D)), % copy rule, unify head cop y with G
call(C) % check guard
,!,
call(B), % execute atoms before recursive call
9
Page 10:
mip(G1,Rs), % recurse with recursive goal and remaining rul es
call(D). % execute atoms after recursive call
mip(G,[_R|Rs]) :- % current rule is not applicable
mip(G,Rs). % try remaining rules on G
We now discuss the four clauses of our meta-interpreter. In the fi rst clause, the base case
is reached since the recursive goal has been reduced to true. The second clause handles
a conjunction of recursive goals by interpreting each conjunct on its own with the rule
listRs. This case can handle multiple recursion and is new, it does not appear in the
main paper [Fr¨ u24].
The third clause tries to apply the rule Rin the rule list to the current goal G. If the
guardCholds, the rule is applied. The conjunct before the recursive goal Bis directly
executed with a meta-call. Next, the recursive goals G1are handled with a recursive call
to the meta-interpreter using the remainder of the rule list. Finally t he conjunct after
the recursive goal Dis directly executed with a meta-call.
Otherwise the first rule from the rule list was not applicable, and so th e last meta-
interpreter clause recursively continues with the remaining rules in t he list.
3.3 Recursive Predicate Implementation
In order to enable runtime repeated recursion unfolding, at compile -time, the clauses for
thegivenrecursive predicate c/karereplacedbyacalltotheunfolder unf/3thatcontains
these clauses and then to the meta-interpreter mip/2that interprets the unfolded rules.
The replacement fits the following rule template where X1,...,Xk are different variables.
% rule template for a recursive predicate c/k
c(X1,...,Xk) :-
unf(c(X1,...,Xk), OriginalRules, UnfoldedRules),
mip(c(X1,...,Xk), UnfoldedRules).
Example 3.2 (Summation, contd.) For the summation example, the corresponding
rule instance is as follows:
sum(N,S) :-
unf(s(N,S), [
(s(A,C) :- A>1 ,!, B is A-1, s(B, D), C is 1*A-0+D),
(s(A,C) :- A=1 ,!, C=1, true, true)
], URs),
mip(s(N,S), URs).
4 Generalization to Multiple Recursive Rules
We handle multiple recursive rules by handling each of the rules in separ ation. This
is the main contribution of this companion paper. Each recursive rule is subjected to
runtime repeated recursion unfolding in a round-robin manner by a n ovel round-robin
rule processor.
10
Page 11:
4.1 Meta-Interpreter with Continuation Goals
To enable this approach, we first generalize the original meta-inter preter by adding an
additional argument that will contain the goal that remains after t rying and applying the
unfolded rules of one specific original recursive rule, we call it the continuation goal .
% Meta-Interpreter with Continuation Goal
% mip(+RecursiveCall, +RuleList, -ContinuationGoal)
mip(true,_LR,true) :- !. % base case, no more goal
mip((R1,R2),LR,(G1,G2)) :- !, % for multiple recursion
mip(R1,LR,G1),
mip(R2,LR,G2).
mip(R,[RR|LR],G) :- % apply current rule
copy_term(RR, (R :- C ,!, B,Rl,RCl)),
call(C)
,!,
call(B),
mip(Rl,LR,G),
call(RCl).
mip(R,[_RR|LR],G) :- % current rule not applicable
mip(R,LR,G).
mip(R,[],R). % no more rule applicable, return current goal
The last clause is new, it is the additional base case mip(R,[],R) . It applies when the
rule list has become empty. Then the remaining unprocessed goal Rin the first argument
is returned as continuation goal in the third argument.
4.2 Round-Robin Rule Processor
To handle the remaining goal, we then continue with generating, tryin g and applying
the unfolded rules stemming from another recursive rule. We repea t this process with
all recursive rules until no goal remains. This behavior is implemented by around-robin
rule processor with a recursive predicate umr(R,RS) , whereRis a recursive call and RSis
a list of lists of unfolded rules. We keep and reuse unfolded rules betw een rounds. In the
comments in the code below, rlstands for rule list and rlsfor rule lists.
% Round-Robin Rule Processor
% umr(+RecursiveCall, +RuleLists)
% unfolding and interpretation rounds with several unfolde d rule lists
umr(true,_RS) :- !. % base case, no more recursive call to pro cess
umr(R,[goal(G)|RS]) :- !, % this clause ensures terminatio n
R \= G, % if goals unchanged, no rule was applied
11
Page 12:
append(RS,[goal(R)],RS1), % add current goal at end of rema ining rls
umr(R,RS). % continue with goal and rls
umr(R,[RL|RS]) :-
unf(R,RL,RL1), % unfold with current rl, return extended rl
mip(R,RL1,G), % interpret with this rl, return contin. goal
append(RS,[RL1],RS1), % add expanded rl at end of remaining rls
umr(G,RS1). % recurse with contin. goal and updated rls
In the first clause, the base case is reached since the goal is empty (which is represented
bytrue).
The second clause ensures termination in case no rule was applicable. The list element
goal(G) is reached after each round and holds the goal from the start of t he round.
When it is still equivalent to the current goal R, no rule was applied to the goal and the
computation fails due to the requirement R \= G.
In the last, main clause, umr/2tries to further unfold rules in the current rule list RL
with respect to the current recursive call Rusing the unfolder unf/3and then interprets
Rwith that rule list using mip/3. We add the extended rule list RL1at the end of the rule
listsRSso that we can reuse it and can go cyclically through all rule lists in a fair manner.
In the recursive call, the resulting continuation goal Gis further processed together with
the updated rule lists RS1.
4.3 Recursive Predicate Implementation
At compile-time, the rules for the given recursive predicate c/kare replaced by a call
toumr/2, whereX1,...,Xk are different variables and OriginalRuleLists is the list of
lists of the given recursive rules of c/k. We also add the clauses for the base cases at the
end of each recursive rule list. Finally, add the end of the lists of lists w e add the list
elementgoal(c(X1,...,Xk)) to ensure termination by recording the current call.
% rule template for a recursive predicate c/k with multiple r ecursion
c(X1,...,Xk) :- umr(c(X1,...,Xk), OriginalRuleLists).
5 Experimental Evaluation with Benchmarks
Our examples will demonstrate that super-linear speedups are inde ed possible. With
sufficient simplification, the time complexity is effectively reduced when applying our
generalized runtime repeated recursion unfolding.
For the worst-case time complexity of our implementation of runtime repeated re-
cursion unfolding, we have to consider the recursions in the original rules of the given
program, in the unfolder, generalized meta-interpreter and roun d-robin rule processor.
We parametrize the time complexity by the number of recursive step s with the original
rule. From the time complexity of the recursive steps we can derive t he time complexity
of the recursion using recurrence equations. For details, see the main paper [Fr¨ u24].
In the unfolder, the complexity of a recursive step mainly depends o n the time for
copying head and guard, for guard checking, and for unfolding and simplification of the
12
Page 13:
current rule. In the meta-interpreter, the time complexity is domin ated by the third
clause that applies a rule from the list to the current goal. The comple xity of a recursive
step in that clause is determined by the time needed for copying the r ule and for the
meta-calls of the guard and of the two body conjuncts of the rule. In the round-robin
rule processor, the third clause calls the unfolder and the generaliz ed meta-interpreter
for each rule list. Note that rule lists are reused in later rounds.
Our complexity estimates are based on the following assumptions tha t hold in SWI
Prolog. Unification and copying take constant time for given terms a nd (near-)linear
time in the size of the involved terms in general. A Prolog meta-call has the same time
complexity as directly executing its goal argument. SWI Prolog uses the GNU multiple
precision arithmetic library (GMP), where integer arithmetic is unbou nded. Comparison
and addition have logarithmic worst-case time complexity in the numbe rs involved. A
variety of multiplication algorithms are used in GMP to get near logarith mic complexity.
If one multiplies with a power of 2, the complexity is reduced to logarith mic.
In our experiments, we used SWI Prolog 9.2.3-1 running on an Apple Ma c mini M1
2020 MacOS Monterey 12.7.4 with 16 GB RAM. We use default settings f or SWI Prolog
except for the command line option -Owhich compiles arithmetic expressions. During
multiple runs of the benchmarks we observed a jitter in timings of at m ost 7%. Because
the runtime improvement is so dramatic, we can only benchmark small inputs with the
original recursion and have to benchmark larger inputs with runtime recursion unfolding.
5.1 Summation Example, Contd.
Wehave alreadyunfolded andsimplified the recursive rule forsummat ion inExample 2.1.
We introduced the implementation in concrete Prolog syntax in Examp le 3.1. In [Fr¨ u24],
we derived estimates for the time complexities for our summation exa mple written in
CHR that we can reuse here when we compare them to benchmark re sults.
5.1.1 Benchmarks
Table 1 shows benchmarks results for the summation example. Times are given in mil-
liseconds. Since the example features only one recursive rule, ther e is no need to use the
round-robin rule processor umr/2. The benchmarks confirm the super-linear speedup.
Original Recursion. In each subsequent table entry, we double the input number.
The runtime roughly doubles. So the runtime is at least linear. This is in lin e with the
expected log-linear time complexity O(nlog(n)): since the numbers are so small, addition
is fast, almost constant time, and the runtime is dominated by the line ar time overhead
of the recursion itself.
Unfolder and Meta-Interpreter. For runtime repeated recursion unfolding of our
summation example, we give the time needed for the unfolding, the tim e needed for the
execution with the meta-interpreter, and the sum of these timings (column ’Total Time’).
Because our method has lower time complexity and is quickly several o rders of magnitude
faster, we start from 225and in each subsequent table entry, we square the input number
instead of just doubling it.
The runtimes of the unfolder and meta-interpreter are similar. For each squaring
of the input number, the their runtimes more than double. The benc hmarks results
13
Page 14:
Original Summation
Inputn Time
2164
2178
21816
21931
22062
221125
222249Runtime Repeated Recursion Unfolding
InputnUnfolder Interpreter Total Time
2250.07 0.02 0.09
2500.07 0.04 0.11
21000.14 0.08 0.23
22000.29 0.17 0.46
24000.58 0.34 0.92
28001.17 0.70 1.88
216002.55 1.53 4.08
225+1 0.04 0.02 0.06
250+1 0.07 0.03 0.10
2100+1 0.15 0.06 0.21
2200+1 0.29 0.12 0.41
2400+1 0.58 0.25 0.83
2800+1 1.20 0.50 1.70
21600+1 2.45 1.00 3.44
Table 1: Benchmarks for Summation Example (times in milliseconds)
obtained are consistent with the expected complexities of O(log(n)2) of the unfolder and
meta-interpreter.
Comparing Recursion Depths 2iand2i+1.In the meta-interpreter, each of the
unfolded rules will be tried by unifying its head and checking its guard, but not all rules
will be necessarily applied. This may lead to the seemingly counterintuit ive behavior
that a larger query where only one unfolded rule is applied runs faste r than a smaller one
where several less unfolded rules are applied.
To see how pronounced this phenomena is, we compare timings for va lues ofnof the
form 2iand 2i+ 1. For input numbers of the form 2i, all unfolded rules are applied.
Input numbers of the form 2i+1, however, will need exactly one application of the most
unfolded rule rito reach the base case. As a consequence the runtime of the meta -
interpreter decreases by a linear factor of about 0 .7. The timings for the unfolder stay
about the same, because only one more rule is generated for 2i+ 1 (e.g. n= 21600+ 1
generates 1601 rules).
5.2 Fibonacci Number Example
This classical program generates the n-th Fibonacci number in a na ive inefficient way
using double recursion.
fib(N, F) :- N > 1 ,!,
N1 is N-1, N2 is N1-1,
fib(N1, F1), fib(N2, F2),
F is F1+F2.
fib(N, F) :- N =< 1 ,!, F=N.
The program features double recursion and exponential complexit y and therefore cannot
14
Page 15:
be handled by runtime repeated recursion unfolding as defined in [Fr¨ u24]. With our
generalization in the meta-interpreter, it becomes possible.
5.2.1 Implementation of Runtime Repeated Recursion Unfold ing
Rule Template. This time we start with the desired rule template. It is a generaliza-
tion of the original recursive rule by keeping the two recursive calls a nd by introducing
parameters in the sum.
f(N, F) :- N > A ,!,
(N1 is N-A, N2 is N1-1),
(f(N1, F1), f(N2, F2)),
F is P*F1+Q*F2.
The parameters of the rule template that change during unfolding a reA, PandQ. For
the original recursive rule, we have that A=P=Q=1.
Unfolding Scheme. To find the rule template and its unfolding scheme, the idea
was to unfold both recursive calls and to simplify and to merge them so that again only
two recursive calls remain. The derivation of this unfolding scheme is g iven in appendix
A. The implementation is accomplished by the following clause for simpunf. We assume
that the input is always a natural number (nonnegative integer).
simp_unf((f(N, F) :- N > A ,!,
(N1 is N-A, N2 is N1-1),
(f(N1, F1), f(N2, F2)),
F is P*F1+Q*F2),
(f(Nc, Fc) :- Nc > Ac ,!,
(N1c is Nc-Ac, N2c is N1c-1),
(f(N1c, F1c), f(N2c, F2c)),
Fc is Pc*F1c+Qc*F2c)
):-
Ac is 2*A, QQ is Q*Q, Pc is P*P+QQ, Qc is 2*P*Q-QQ.
Recursive Predicate. For the Fibonacci example, its rules are replaced by:
fib(I,O) :-
unf(f(I,O), [
(f(N, F) :- N > 1 ,!,
(N1 is N-1, N2 is N1-1), (f(N1, F1),f(N2, F2)), F is 1*F1+1*F2 ),
(f(N, F) :- N=<1 ,!, F=1, true, true)], URs),
mip(f(I,O), URs).
Unfolded Rules. The first few rules that are returned by the unfolder are
f(A,D):-A>16,!, (B is A-16,C is B-1), (f(B,E), f(C,F)), D is 1597*E+987*F.
f(A,D):-A>8 ,!, (B is A-8, C is B-1), (f(B,E), f(C,F)), D is 34 *E+21*F.
f(A,D):-A>4 ,!, (B is A-4, C is B-1), (f(B,E), f(C,F)), D is 5* E+3*F.
f(A,D):-A>2 ,!, (B is A-2, C is B-1), (f(B,E), f(C,F)), D is 2* E+1*F.
f(A,D):-A>1 ,!, (B is A-1, C is B-1), (f(B,E), f(C,F)), D is 1* E+1*F.
15
Page 16:
5.2.2 Complexity
Original Recursion. The recursion depth is determined by the input number n.
The original recursive rule for Fibonacci gives raise to the recurre nce equation f(n) =
f(n−1) +f(n−2) +O(n).O(n) is the complexity of adding the Fibonacci numbers.
These numbers grow exponentially with the input, thus their number of bits is linear to
the input number. The solution of the recurrence results is the well- known exponential
complexity in O(2n).
Unfolder and Meta-Interpreter. In the unfolder, when unfolding rule ri, the
input number nis checked to be larger than parameter Ain the guard, where Acan be
shown to be 2i. Insimpunf/2the multiplications of the parameters of the rule template
dominate the complexity. This leads to the recurrence u(n) =u(n/2) +M(n), where
theM(n) is the time complexity of efficient multiplication involving a number of nbits.
The solution of the recurrence is in M(n). The complexity M(n) can be assumed to be
O(nlog(n)log(log(n))).
In the meta-interpreter, the applications of the unfolded rules de termine the complex-
ity. According to the rule template, we have two recursive calls and t he multiplications
with Fibonacci numbers dominate the complexity of a recursive step . This leads to
the recurrence f(n) =f(n/2) +f(n/2) +M(n). The solution of the recurrence is in
O(log(n)M(n)), i.e.O(nlog(n)2log(log(n))).
Since the example features only one recursive rule, there is no need to use the round-
robin rule processor umr/2.
5.2.3 Benchmarks
Table 2 shows benchmarks results for the Fibonacci example. Times are in seconds. A
time measurement of 0.0n means that it was below 0.01 but more than z ero. The timings
for the original version show that theruntime is exponential as exp ected. The runtimes of
theunfolderandinterpreter roughlydoublewitheachdoublingofth einputnumber which
is consistent with the estimated complexities that are somewhat wor se than log-linear in
the input number n.
Comparing Recursion Depths 2iand2i+ 1.Fornof the form 2iall unfolded
rules are applied in the meta-interpreter. The meta-interpreter is two orders of mag-
nitude slower than the unfolder. The sum of runtimes is therefore d ominated by the
meta-interpreter. For nof the form 2i+1 on the other hand, only the most unfolded rule
is applied in the meta-interpreter. Its runtime is neglectable becaus e the multiplications
involve only the Fibonacci number 1. The sum of runtimes is therefor e very much domi-
nated by the unfolder. The total time is almost two orders of magnit ude faster for 2i+1
than for 2i.
5.3 Greatest Common Divisor Example
The Euclidean algorithm is a method for finding the GCD of two numbers by repeatedly
subtracting the smaller number from the larger number until both a re the same. The
implementation naturally lends itself to using two recursive rules.
g(M, N, X) :- M<N ,!, L is N-M, g(M, L, X).
16
Page 17:
Original Fibonacci
InputnTime
250n
25+1 1
25+2 1
25+3 2
25+4 3
25+5 5
25+6 8
25+7 13
25+8 22Runtime Repeated Recursion Unfolding
InputnUnfolder Interpreter Total Time
2180,0n 0.18 0.18
2190,0n 0.34 0.34
2200,0n 0.66 0.66
2210.01 1.28 1.28
2220.02 2.48 2.50
2230.03 4.84 4.88
2240.07 9.42 9.49
218+1 0,0n 0,000n 0,0n
219+1 0,0n 0,000n 0,0n
220+1 0.01 0.0001 0.01
221+1 0.02 0.0001 0.02
222+1 0.03 0.0001 0.04
223+1 0.07 0.0002 0.07
224+1 0.15 0.0003 0.15
Table 2: Benchmarks for Fibonacci Example
g(M, N, X) :- M>N ,!, L is M-N, g(L, N, X).
g(M, M, M).
Since we have multiple recursive rules, we use ourextended implement ation ofgeneralized
runtime repeated recursion unfolding from Section 4 that relies on t he predicate umr/2.
5.3.1 Implementation of Runtime Repeated Recursion Unfold ing
Rule Template. Notethat the two recursive rules have a similar structure and ther efore
can be unfolded in an analogous way. Unfolding the rules once with the mselves results
in the two rules:
g(M,N,X) :- 2*M<N ,!, L is N-2*M, g(M,L,X).
g(M,N,X) :- M>2*N ,!, L is M-2*N, g(L,N,X).
The generalization is apparent. It is the introduction of a paramete rAin place of the
number2with which we multiply the smaller number. This results in the following
template for the two rules:
g(M,N,X) :- A*M<N ,!, L is N-A*M, g(M,L,X).
g(M,N,X) :- M>A*N ,!, L is M-A*N, g(L,N,X)
Unfolding Scheme. From unfolding the rule templates we derive the following two
clauses for simpunf, where parameter Ais doubled with each unfolding.
simp_unf((g(M,N,X) :- A*M<N ,!, L is N-A*M, g(M,L,X), true) ,
(g(M1,N1,X1) :- A1*M1<N1 ,!, L1 is N1-A1*M1, g(M1,L1,X1), t rue)) :-
A1 is 2*A.
simp_unf((g(M,N,X) :- M>A*N ,!, L is M-A*N, g(L,N,X), true) ,
17
Page 18:
(g(M1,N1,X1) :- M1>A1*N1 ,!, L1 is M1-A1*N1, g(L1,N1,X1), t rue)) :-
A1 is 2*A.
The call is as follows:
umr(g(A,B,_),[
[(g(M,N,Z) :- 1*M<N ,!, L is N-1*M, g(M,L,Z), true),
(g(M,M,M) :- true ,!, true, true, true)],
[(g(M,N,Z) :- M>1*N ,!, L is M-1*N, g(L,N,Z), true),
(g(M,M,M) :- true ,!, true, true, true)]
])
The first few unfolded rules for both recursive clauses are:
g(M,N,Z) :- 1*M<N ,!, L is N-1*M, g(M,L,Z), true.
g(M,N,Z) :- 2*M<N ,!, L is N-2*M, g(M,L,Z), true.
g(M,N,Z) :- 4*M<N ,!, L is N-4*M, g(M,L,Z), true.
g(M,N,Z) :- 8*M<N ,!, L is N-8*M, g(M,L,Z), true.
g(M,N,Z) :- M>1*N ,!, L is M-1*N, g(L,N,Z), true.
g(M,N,Z) :- M>2*N ,!, L is M-2*N, g(L,N,Z), true.
g(M,N,Z) :- M>4*N ,!, L is M-4*N, g(L,N,Z), true.
g(M,N,Z) :- M>8*N ,!, L is M-8*N, g(L,N,Z), true.
5.3.2 Complexity
Original Recursion. The worst case for the original GCD program is when the input
numbers are nand 1. Then the repeated subtraction results in a recursion depth ofn.
Subtraction like addition has logarithmic complexity in the number. The n the recurrence
g(n) =O(log(n))+g(n−1) leads to the worst-case time complexity of O(n log(n)).
Unfolder, Meta-Interpreter and Round-Robin Processor. In the unfolder and
meta-interpreter, the worst-case time complexity is dominated by the generation and
application of the unfolded rules. In the unfolder, the multiplications of the parameters
in the rule templates of simpunf/2dominate the complexity. These multiplications are
by a power of 2 and in this special case have the same complexity as ad ditions. The
recurrence is of the form g(n) =O(log(n)) +g(n/2) with a resulting worst-case time
complexity O(log(n)2). Similarly, in the meta-interpreter, the multiplications, again by
a power of 2, in the unfolded rules lead to the same complexity.
In the round-robin rule processor umr/2, we apply the unfolder and meta-interpreter
for several rounds (recursive steps). Since each round applies a t least one rule in the
meta-interpreter, a crude upper bound for the number of round s isnwhen we alternate
between applying the two original rules because no unfolding is possib le. For a more
precise measure, assume a pair of numbers nandmfor GCD with n > m. One round
withumrwill apply one recursive rule and its unfoldings to exhaustion and thus will lead
to a pair of numbers n′andmwheren′< m. Now it holds that n′< n/2, because we
keep subtracting mfromnuntiln′< mand we can subtract at least once, so n′≤n−m.
Adding these two inequations gives 2 n′< nand thus n′< n/2. Hence each round, i.e.
each change in rules, at least halves the larger number. Hence the n umber of rounds is in
18
Page 19:
O(log(n)). Thus the overall worst-case time complexity is in O(log(n))O(log(n)2) which
isO(log(n)3). This is sufficient for a super-linear speedup compared to O(n log(n)).
5.3.3 Benchmarks
Times are in seconds. We discuss two sets of benchmarks. Table 3 sh ows some bench-
marks results for the GCD example. We use powers of 2 for the large r input number n
and a 37 for the second input number. For the original rules we use s uccessive exponents
up to 33, the runtime of the original rules is consistent with the estim ated worst-case
complexity O(n log(n)). With our generalized runtime repeated recursion unfolding, the
runtime is so fast that we start from an exponent 5000 and increas e by a factor of about√
2 up to 40000. The runtime is in the estimated complexity of O(log(n)3), but also
consistent with the better O(log(n)2). Since the second input number is fixed to 37, after
one round in umrthe first number will be smaller than 37 and the remaining runtime will
be neglectable.
Original GCD
Inputn,37Time
2270.35
2280.64
2291.27
2302.52
2315.04
23210.09
23320.40Runtime Repeated Recursion Unfolding
Inputn,37 Total Time
250000.015
270000.021
2100000.032
2140000.045
2200000.067
2280000.094
2400000.148
Table 3: Benchmarks for GCD Example 1
Table 4 shows results for another pair of inputs, n= 2kandm= 2k//2+2k//4−1, i.e.
mis somewhat larger than the square root of n. For the original rules we use successive
exponents up to 46, the runtime is consistent with the estimated wo rst-case complexity
O(n log(n)). If we multiply the inputs by about 4, the runtime roughly doubles. So the
actualcomplexityforthistypeofnumberpairsisin O(√n). Withourgeneralizedruntime
repeated recursion unfolding, the runtime is in the estimated comple xity ofO(log(n)3),
but also again consistent with the better O(log(n)2).
6 Related Work
Program transformation to improve efficiency is usually concerned with a strategy for
combining unfolding [SS94, Pre93] and folding to replace code (for an overview see e.g.
[Vis05, PPFDA24]). The transformations are typically performed offl ine, at compile-
time. Program transformation for specific aims and applications is ab undant in logic
programming in general [PP99].
Partial evaluation (partial deduction, program specializ ation)[LB02] is a program
transformation to execute programs with partially known input to s pecialize it, typi-
cally at compile-time. Compile-time partial evaluation alone cannot ach ieve super-linear
19
Page 20:
Original GCD
InputnTime
2400.07
2410.13
2420.13
2430.27
2440.27
2450.53
2460.52Runtime Repeated Recursion Unfolding
Inputn Total Time
250000.015
270000.023
2100000.039
2140000.072
2200000.138
2280000.277
2400000.611
Table 4: Benchmarks for GCD Example 2
speedup up (Chapter 6 in [JGS93]). This result does not apply to our a pproach, because
we use a runtime transformation and strongly rely on simplification. I n [BLR10] just-in-
time partial evaluation of Prolog is introduced consisting of 1500 lines of Prolog code. A
linear speedup of up to a factor of 5 was reported in benchmarks.
In general, super-linear speedups by program transformation are rare and mostly con-
cern parallel programs. Our technique applies to sequential progr ams. In a sequential
setting, super-linear speedups can sometimes be achieved with memoization , where the
results of recursive calls are cached and reused if the same recurs ive call reappears later
on.Tupling[HITT97] applies when several recursions operate on the same dat a struc-
ture. Then tupling tries to merge these recursions into a single one. Then there is work
based on supercompilation for functional programming languages like Refal and Haskell.
In advanced forms of this offline program transformation such as d istillation [Ham09]
and equality indices [GKN16], sophisticated generalization while unfoldin g increases the
chance for folding and can achieve super-linear speedup.
In contrast, our technique relies solely on unfolding and simplifying th e combined
recursive steps again and again. We add redundant rules this way bu t never remove any.
Further related work is discussed in the main paper [Fr¨ u24].
7 Conclusions
In this companion paper to [Fr¨ u24], we generalized runtime repeate d recursion unfolding
for multiple recursion and for multiple recursive rules and implemented it in the logic
programming language Prolog. We provided a lean implementation of ou r approach in
ten clauses, comprising the unfolder, a generalized meta-interpre ter and a novel round-
robin rule processor to handle multiple recursive rules. We showed wit h benchmarks
for several classical algorithms that the super-linear speedup ind eed holds in practice.
We have considered recursive clauses that have green cuts for effi ciency. Preliminary
experiments show that our implementation also works without such c uts, but the effects
on runtime have to be further investigated. We have not yet cover ed mutual recursion,
but do not see problems in doing so.
Runtime repeated recursion unfolding hinges on finding the appropr iate rule template
for unfolding for the given problem at compile-time. This requires insig ht into the given
problem and cannot be fully automated. In the main paper [Fr¨ u24], w e discuss these and
20
Page 21:
other issues and limitations of runtime repeated recursion unfolding and suggest some
possible improvements as well.
In the main paper we proved runtime repeated recursion unfolding f or single direct
recursions correct for CHR (Constraint Handling Rules) by showing the redundancy of
unfolded recursive rules and their termination. The proof ideas are expected to carry
over to our generalization in Prolog. In the main paper we also proved a sufficient and
necessary condition for super-linear speedup. For a given recurs ion, then one tries to find
an unfolding with an improved time complexity that satisfies the condit ion. If it can be
found, a super-linear speedup is guaranteed. It is left for future work to extend this result
to multiple recursion and to Prolog.
We think our approach can also be applied to functional programming languages. For
mainstream programming languages it should be possible to adapt the technique to loops
as well. Overall, our generalized runtime repeated recursion unfoldin g provides a promis-
ing strategy for online optimization of recursions in which the sufficien t user-definable
simplification of combined successive recursive steps leads to predic table speedups that
can be super-linear.
Acknowledgements. This research was performed during the sabbatical of the
author in the winter semester of 2024/25. We thank Jesper Larss on Tr¨ aff for inspiring
discussions.
References
[BLR10] CarlFriedrichBolz, Michael Leuschel, andArminRigo. Towar dsjust-in-time
partialevaluationofprolog. In Logic-Based Program Synthesis and Transfor-
mation: 19th International Symposium, LOPSTR 2009, Coimbr a, Portugal,
September 2009, Revised Selected Papers 19 , pages 158–172. Springer, 2010.
[Fr¨ u20] Thom Fr¨ uhwirth. Repeated recursion unfolding for supe r-linear speedup
within bounds. Pre-Proceedings of the 30th International Symposium on
Logic-Based Program Synthesis and Transformation (LOPSTR 2020), full
version arXiv preprint arXiv:2009.05314 , 2020.
[Fr¨ u24] Thom Fr¨ uhwirth. Runtime repeated recursion unfolding: A just-in-time on-
line program optimization that can achieve super-linear speedup. submitted
to Fundamenta Informaticae, arXiv preprint arXiv:2307.02 180, 2024.
[Fr¨ u25] ThomFr¨ uhwirth. Principlesof Rule-BasedProgramming .BooksonDemand,
2025.
[GKN16] Robert Gl¨ uck, Andrei Klimov, and Antonina Nepeivoda. No nlinear con-
figurations for superlinear speedup by supercompilation. In Fifth Interna-
tional Valentin Turchin Workshop on Metacomputation , page 32. University
of Pereslavl, 2016.
[Ham09] Geoff W Hamilton. Extracting the essence of distillation. In International
Andrei Ershov Memorial Conference on Perspectives of Syste m Informatics ,
pages 151–164. Springer, 2009.
21
Page 22:
[HITT97] Zhenjiang Hu, Hideya Iwasaki, Masato Takeichi, and Akihiko Takano. Tu-
pling calculation eliminates multiple data traversals. ACM Sigplan Notices ,
32(8):164–175, 1997.
[HR99] Matthew M Huntbach and Graem A Ringwood. Agent-oriented program-
ming: from Prolog to guarded definite clauses . Springer Science & Business
Media, 1999.
[JGS93] Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. Partial evaluation and
automatic program generation . Prentice Hallinternational series incomputer
science. Prentice Hall, 1993.
[Kow14] R. Kowalski. Logic for Problem Solving, Revisited . Computer Science Es-
sentials. Books on Demand, 2014.
[LB02] Michael Leuschel and Maurice Bruynooghe. Logic program s pecialisation
through partial deduction: Control issues. Theory and Practice of Logic
Programming , 2(4-5):461–515, 2002.
[PP99] Alberto Pettorossi and Maurizio Proietti. Synthesis and tra nsformation of
logic programs using unfold/fold proofs. The Journal of Logic Programming ,
41(2-3):197–230, 1999.
[PPFDA24] Alberto Pettorossi, Maurizio Proietti, Fabio Fioravanti, a nd Emanuele
De Angelis. A historical perspective on program transformation an d re-
cent developments (invited contribution). In Proceedings of the 2024 ACM
SIGPLAN International Workshop on Partial Evaluation and P rogram Ma-
nipulation , pages 16–38, 2024.
[Pre93] Steven David Prestwich. An unfold rule for full prolog. Logic-based Program
Synthesis and Transformation , pages 199–213, 1 1993.
[SS94] LeonSterlingandEhudYShapiro. The art of Prolog: advancedprogramming
techniques . MIT press, 1994.
[Vis05] Eelco Visser. A survey of strategies in rule-based program t ransformation
systems. Journal of symbolic computation , 40(1):831–873, 2005.
[WSTL12] Jan Wielemaker, Tom Schrijvers, Markus Triska, and Torb j¨ orn Lager. Swi-
prolog.Theory and Practice of Logic Programming , 12(1-2):67–96, 2012.
A Fibonacci Numbers, Derivation of Rule Scheme
for Unfolding
To find the unfolding scheme for simpunf/2, the idea is to unfold both recursive calls,
to simplify and merge them so that again only two recursive calls remain . Recall the rule
template for Fibonacci numbers
22
Page 23:
f(N, F) :- N > A ,!,
(N1 is N-A, N2 is N1-1),
(f(N1, F1), f(N2, F2)),
F is P*F1+Q*F2.
In the rule template, we first unfold both recursive calls and collect t he guards together
with the relevant arithmetic computations.
f(N, F) :- N > A, (N1 is N-A, N2 is N1-1), N1 > A, N2 > A ,!,
% remaining unfolding of f(N1, F1)
(N11 is N1-A, N21 is N11-1),
(f(N11, F11), f(N21, F21)),
F1 is P*F11+Q*F21,
% remaining unfolding of f(N2, F2),
(N12 is N2-A, N22 is N12-1),
(f(N12, F12), f(N22, F22)),
F2 is P*F12+Q*F22,
%
F is P*F1+Q*F2.
According to the original rule template, we want to have just two re cursive calls.
Consider the variables N12andN21. They depend on N1in the following way: N12 is
N2-A, N2 is N1-1 andN21 is N11-1, N11 is N1-A . We observe that this implies that
N21=N12 and therefore F21=F12, since Fibonacci is a function. Hence we do not need to
compute f(N12, F12) and can remove it.
f(N, F) :- N > A, (N1 is N-A, N2 is N1-1), N1 > A, N2 > A ,!,
% remaining unfolding of f(N1, F1)
(N11 is N1-A, N21 is N11-1),
(f(N11, F11), f(N21, F21)),
F1 is P*F11+Q*F21,
% remaining unfolding of f(N2, F2),
(N12 is N2-A, N22 is N12-1),
(F12=F21, f(N22, F22)),
F2 is P*F12+Q*F22,
%
F is P*F1+Q*F2.
Thisleavesthreerecursivecallswithsuccessive inputnumbersindec reasingorder, f(N11,
F11), f(N21, F21), f(N22, F22) . By definition of Fibonacci numbers, we have that
F11 is F21+F22 . We can use this equation to compute F22byF22 is F11-F21 instead
of computing it with the third recursive call f(N22, F22) .
f(N, F) :- N > A, (N1 is N-A, N2 is N1-1), N1 > A, N2 > A ,!,
% remaining unfolding of f(N1, F1)
(N11 is N1-A, N21 is N11-1),
(f(N11, F11), f(N21, F21)),
F1 is P*F11+Q*F21,
23
Page 24:
% remaining unfolding of f(N2, F2),
(N12 is N2-A, N22 is N12-1),
(F12=F21, F22 is F11-F21),
F2 is P*F12+Q*F22,
%
F is P*F1+Q*F2.
We have successfully removed the last two recursive calls from the u nfolding of f(N2,F2) .
Now we simplify the arithmetic expressions, first by removing variable s that have be-
comesuperfluous. Thevariables N12, N22 occurringin N12 is N2-A, N22 is N12-1 are
no longer needed. The condition N2 > Athat remained from the unfolding of f(N2,F2)
is no longer needed and thus also N2can be removed.
f(N, F) :- N > A, (N1 is N-A), N1 > A ,!,
% remaining unfolding of f(N1, F1)
(N11 is N1-A, N21 is N11-1),
(f(N11, F11), f(N21, F21)),
F1 is P*F11+Q*F21,
% remaining unfolding of f(N2, F2),
(F12=F21, F22 is F11-F21),
F2 is P*F12+Q*F22,
%
F is P*F1+Q*F2.
We continue simplification by removing the intermediate variables N1, F1, F2, F12 and
F22.
f(N, F) :- N > A, N-A > A ,!,
% remaining unfolding of f(N1, F1)
(N11 is N-A-A, N21 is N11-1),
(f(N11, F11), f(N21, F21)),
%
F is P*(P*F11+Q*F21)+Q*(P*F21+Q*(F11-F21)).
We further simplify the remaining arithmetic expressions to arrive fin ally at:
f(N, F) :- N > 2*A ,!,
% remaining unfolding of f(N1, F1)
(N11 is N-2*A, N21 is N11-1),
(f(N11, F11), f(N21, F21)),
%
F is (P*P+Q*Q)*F11+(2*P*Q-Q*Q)*F21.
This rule fits the rule template that we have initially devised.
B Further Examples
The examples for list reversal and sorting are taken from the main p aper [Fr¨ u24], where
the details can be found such as the derivation of the unfolding sche me and of the time
24
Page 25:
complexity results. We did a straightforward translation from CHR [F r¨ u25] to Prolog
and performed the benchmarks. They show the expected super- linear speedups.
B.1 List Reversal Example
The classical program reverses a given list in a naive way. It takes th e first element of
the list, reverses its remainder and adds the element to the end of t he reversed list. The
Prolog predicate r(A,B)holds if list Bis the reversal of list A.
r(E, D) :- E=[C|A], !, r(A, B), append(B, [C], D).
r(E, D) :- E=[], !, D=[].
The predicate append(X,Y,Z) concatenates two lists XandYinto a third list Z. Its
runtime is linear in the length (number of elements) of the first list.
B.1.1 Implementation of Runtime Repeated Recursion Unfold ing
Unfolding with Simplification. The unfolding scheme for list reversal is implemented
with the following clause for simpunf/2.
simp_unf(
(r(A,B) :- A=E ,!, true, r(C,D), append(D,F,B)), % given rul e template
(r(Al,Bl) :- Al=El ,!, true, r(Cl,Dl), append(Dl,Fl,Bl)) % unfolded r.
) :-
copy_term((E,C,F),(El,Cc,Fc1)),
copy_term((E,C,F),(Ec,Cl,Fc2)),
Cc=Ec,
append(Fc2,Fc1,Fl).
During unfolding, in the given rule template, the variable Ein the guard will be instan-
tiated with an open list ending in the variable C. The list Finappend/3 then consists of
the element variables of Ein reversed order. In the unfolded rule template, the number
of elements in these two lists is doubled and their relationship of rever sal is maintained.
Recursive Predicate. For list reversal, its rules are replaced by:
rev(I,O) :-
unf(r(I,O), [
(r(A, E) :- A=[D|B] ,!, true, r(B, C), append(C, [D], E),
(r(A, B) :- A=[] ,!, B=[], true, true)
], URs),
mip(r(I,O), URs).
The list in the second argument of unf/3contains the original recursive rule and the rule
for the base case in appropriate template form.
Unfolded Rules. The rules that are returned by the unfolder unf/3for a query
with 17 list elements are
25
Page 26:
r(A, T) :- A=[S,R,Q,P,O,N,M,L,K,J,I,H,G,F,E,D|B] ,!,
true, r(B, C), append(C, [D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R ,S], T).
r(A, L) :- A=[K,J,I,H,G,F,E,D|B] ,!,
true, r(B, C), append(C, [D,E,F,G,H,I,J,K], L).
r(A, H) :- A=[G,F,E,D|B] ,!, true, r(B, C), append(C, [D,E,F ,G], H).
r(A, F) :- A=[E,D|B] ,!, true, r(B, C), append(C, [D,E], F).
r(A, E) :- A=[D|B] ,!, true, r(B, C), append(C, [D], E).
r(A, B) :- A=[] ,!, B=[], true, true.
The number of element variables in the lists doubles with each unfolding but will never
exceedn, the length of the input list in the query. Hence the unfolded rules do not
increase overall space complexity.
B.1.2 Benchmarks
Table 5 shows benchmarks results for the list reversal example. Th e list sizes (lengths) n
are powers of 2. Times are in seconds. The experiments confirm the super-linear speedup
using runtime repeated recursion unfolding.
Original list reversal
Inputn Time
290.02
2100.04
2110.09
2120.31
2131.21
2144.81
21519.31Runtime Repeated Recursion Unfolding
InputnUnfolder Interpreter Total Time
213−10.0n 0.0n 0.0n
214−10.0n 0.0n 0.01
215−1 0.01 0.01 0.02
216−1 0.02 0.01 0.03
217−1 0.03 0.02 0.06
218−1 0.07 0.05 0.12
219−1 0.13 0.10 0.23
2130.0n 0.0n 0.01
2140.01 0.0n 0.01
2150.02 0.01 0.03
2160.03 0.02 0.05
2170.07 0.04 0.11
2180.13 0.08 0.21
2190.26 0.15 0.42
Table 5: Benchmarks for List Reversal Example
Original Recursion. For the original recursion, the benchmarks indicate a com-
plexity consistent with the expected O(n2). From 212on, doubling the list size increases
the runtime by a factor of about four.
Unfolder and Meta-Interpreter. All measured runtimes are consistent with a
linear complexity O(n). For list size n= 213, runtime repeated recursion unfolding is
already two orders of magnitude faster than the original recursio n.
Comparing Recursion Depths 2i−1and2i.We give timings for list lengths n
of the form 2iand their predecessor numbers 2i−1. In the meta-interpreter, the runtime
of applying allunfolded rules (case of n= 2i−1) is less than of applying just the next
26
Page 27:
larger unfolded rule (which has twice the size) (case of n= 2i). The unfolder takes longer
than the meta-interpreter. Going from 2i−1to 2i, the unfolder generates one more rule
and the time spent roughly doubles.
B.2 Sorting Example
The classical insertion sort program sorts the numbers given in a list in ascending order:
s(L,S) :- L=[A|L1] ,!, s(L1,S1), i(A,S1,S).
s([],S):- !, S=[].
The predicate i(A, S1, S) inserts a number Ainto the sorted list S1such that the
resulting list Sis sorted.
B.2.1 Implementation of Runtime Repeated Recursion Unfold ing
In the rule template that we derive in [Fr¨ u24], we use a built-in for me rgingm(S1,S2,S3)
instead of insertions. It merges two sorted lists into a third sorted list.
Unfolding with Simplification. Relyingontheruletemplate, theunfoldingscheme
is defined the following clause for simpunf.
simp_unf(
Rule, % given rule
(s(L,S) :- L=AL ,!, MG, s(L2,S2), m(S0,S2,S)) % unfolded rul e template
):-
copy_term(Rule, (s(L,S) :- L=AL ,!, MG1, s(L1,S1), m(S3,S1 ,S))),
copy_term(Rule, (s(L1,S1) :- L1=AL1 ,!, MG2, s(L2,S2), m(S 4,S2,S1))),
clean((MG1, MG2, m(S3,S4,S0)), MG),
L1=AL1.
We copy the input rule twice onto instances of the rule template to sim ulate the unfolding
of the recursive call. The predicate clean/2 removes superfluous truebuilt-ins in the
resulting mergings. Finally, executing L1=AL1will double the size of the open list AL
which ends in L1.
Recursive Predicate. For the sorting example, its rules are replaced by:
sort(I,O) :-
unf(s(I,O), [
(s(A,E) :- A=[C|B] ,!, true, s(B,D), m([C],D,E)),
(s([],A):- true ,!, A=[], true, true)
], URs),
mip(s(I,O), URs).
We write the original recursive clause according to the rule template using merge m/3
instead of insert i/3.
Unfolded Rules. The first few rules that are returned by the unfolder are
27
Page 28:
s(A,S) :- A=[C,B,E,D,I,H,K,J|P] ,!,
((m([B],[C],G), m([D],[E],F), m(F,G,O)),
(m([H],[I],M), m([J],[K],L), m(L,M,N)), m(N,O,Q)),
s(P,R), m(Q,R,S).
s(A,K) :- A=[C,B,E,D|H] ,!,
(m([B],[C],G), m([D],[E],F), m(F,G,I)),
s(H,J), m(I,J,K).
s(A,G) :- A=[C,B|D] ,!, m([B],[C],E), s(D,F), m(E,F,G).
s(A,E) :- A=[C|B] ,!, true, s(B,D), m([C],D,E).
s([],A):- true ,!, A=[], true, true.
As with list reversal, the rule size roughly doubles with each unfolding, but again this
does not increase the space complexity.
B.2.2 Benchmarks
Table 6 shows benchmarks results for the sorting example. Times ar e in seconds. The
benchmarks are performed with random permutations of integers from 1 to n. The
experiments confirm the super-linear speedup.
Original Sorting
InputnTime
280.0n
290.02
2100.05
2110.13
2120.48
2131.93
2147.77Runtime Repeated Recursion Unfolding
InputnUnfolder Interpreter Total Time
212−1 0.01 0.02 0.03
213−1 0.01 0.03 0.03
214−1 0.01 0.04 0.04
215−1 0.01 0.07 0.08
216−1 0.02 0.15 0.17
217−1 0.05 0.30 0.34
218−1 0.09 0.64 0.74
2120.0n 0.01 0.01
2130.01 0.02 0.02
2140.01 0.03 0.05
2150.02 0.07 0.09
2160.05 0.14 0.18
2170.09 0.29 0.38
2180.18 0.60 0.78
Table 6: Benchmarks for Sorting Example
Original Recursion. The experiments for the original version of insertion sort
indicate a complexity that is indeed quadratic O(n2). Doubling the list length increases
the runtime by a factor of four from n= 211on.
Unfolder and Meta-Interpreter. All timings are linear in nor slightly worse.
The runtimes of the unfolder are consistent with the expected linea r complexity O(n).
The meta-interpreter timings are consistent with the expected log -linear complexity
O(nlog(n)). The generation of all rules in the unfolder takes a fraction of th e time
of applying one or more rules in the meta-interpreter.
28
Page 29:
Comparing Recursion Depths 2i−1and2i.Going from input list length 2i−1
to 2i, the unfolder generates one more rule. It has twice the size of the previous rule.
And indeed the runtime for the unfolder roughly doubles. Going from list length 2i−1
to 2i, the meta-interpreter applies all unfolded rules in the first case bu t only the next
more unfolded rule in the second case. Its runtime decreases some what. But overall, the
total runtime increases somewhat.
29