Authors: Jiachi Chen, Zhenzhe Shao, Shuo Yang, Yiming Shen, Yanlin Wang, Ting Chen, Zhenyu Shan, Zibin Zheng
NumScout: Unveiling Numerical Defects in Smart
Contracts using LLM-Pruning Symbolic Execution
Jiachi Chen, Zhenzhe Shao, Shuo Yang, Yiming Shen, Yanlin Wang, Ting Chen, Zhenyu Shan,
Zibin Zheng, Fellow, IEEE
Abstract—In recent years, the Ethereum platform has witnessed a proliferation of smart contracts, accompanied by exponential growth in total value locked (TVL). High-TVL smart contracts often require complex numerical computations, particularly in the mathematical financial models used by many decentralized applications (DApps). Improper calculations can introduce numerical defects, posing potential security risks. Existing research primarily focuses on traditional numerical defects such as integer overflow, and systematic research and effective detection methods targeting new types of numerical defects are still lacking. In this paper, we identify five new types of numerical defects by analyzing 1,199 audit reports with the open card sorting method. We define each defect and illustrate it with a code example that highlights its features and potential consequences. We also propose NumScout, a symbolic execution-based tool designed to detect these five defects. Specifically, the tool combines information from source code and bytecode, analyzes key operations such as comparisons and transfers, and locates and reports defects based on predefined detection patterns. Furthermore, NumScout uses a large language model (LLM) to prune functions that are unrelated to numerical operations. This step allows symbolic execution to reach the target functions quickly and improves runtime speed by 28.4%. We run NumScout on 6,617 real-world contracts and evaluate its performance against manually labeled results. We find that 1,774 contracts contain at least one of the five defects, and the tool achieves an overall precision of 89.7%.
Index Terms —Smart Contracts, Numerical Defects, LLM,
Symbolic Execution
I. INTRODUCTION
Since the launch of Ethereum [1] in 2015, smart contracts
have emerged as a key technology in the blockchain space.
Smart contracts are computer programs that automatically
enforce predefined agreements on the blockchain, executing
transactions without requiring intermediaries. With the rapid
development of the Ethereum ecosystem, the number of smart
contracts on Ethereum and other blockchain platforms has grown significantly, giving rise to numerous token contracts and decentralized applications (DApps) [2]. Meanwhile, the digital assets involved in these contracts and applications have grown exponentially.

Jiachi Chen, Zhenzhe Shao, Shuo Yang, Yiming Shen, Yanlin Wang, Zibin Zheng are with the School of Software Engineering, Sun Yat-sen University, Zhuhai 519082, China (e-mail: chenjch86@mail.sysu.edu.cn; shaozhzh3@mail2.sysu.edu.cn; yangsh233@mail2.sysu.edu.cn; shenym7@mail2.sysu.edu.cn; wangylin36@mail.sysu.edu.cn; zhzibin@mail.sysu.edu.cn)
Ting Chen is with the School of Computer Science and Engineering (School of Cyber Security), University of Electronic Science and Technology of China, Chengdu 611731, China, and also with Kashi Institute of Electronics and Information Industry, Kashi, 844000, China (e-mail: brokendragon@uestc.edu.cn)
Zhenyu Shan is with the Intelligent Transportation and Information Security Laboratory, Hangzhou Normal University, Hangzhou 311121, China (e-mail: 20100119@hznu.edu.cn)
Zhenyu Shan is the corresponding author.
During the design and development of smart contracts, de-
velopers frequently handle various numerical computations. In
particular, many DApps rely on mathematical financial models
that require highly complex computations [3]. However, owing
to the characteristics of the Solidity programming language [4]
and the inherent limitations of blockchain platforms, smart
contracts are susceptible to various numerical defects. In this paper, we define numerical defects as all numerics-related errors, vulnerabilities, or flaws that can lead to unexpected outcomes or deviate from the original intent of the code [5]. Notably, numerical defects involve not only security issues but
also design flaws, which can increase the long-term risk of the
smart contracts.
Numerous real-world hacking incidents caused by numerical
defects have already resulted in severe financial losses for
both project teams and users. Although common numerical
security defects, such as integer overflow and type conversion
errors [6], have been identified and mitigated through solu-
tions like the SafeMath library [7] and the introduction of
new security mechanisms in Solidity v0.8 [8], new types of
numerical defects continue to emerge in practice. For example,
over $2.12 million in assets were stolen from Balancer [9] due
to a precision-related issue [10]. These numerical defects pose
significant threats to the security and reliability of contracts.
However, a systematic study that classifies new types of nu-
merical defects and provides corresponding detection methods
and tools is still lacking.
To fill this gap, we first conducted an empirical study to define new types of numerical defects by analyzing 1,199 audit reports using the open card sorting method [11]. Based on this analysis, we identified five categories of new numerical defects, i.e., Div In Path, Operator Order Issue, Minor Amount Retention, Exchange Problem, and Precision Loss Trend. We
present examples for each defect type and propose correspond-
ing mitigation strategies to enhance the quality and robustness
of smart contracts.
Then, we developed a tool named NumScout to detect the five new types of numerical defects in real-world contracts. NumScout leverages the reasoning capabilities of Large Language Models (LLMs) [12] and combines source-code-level information with bytecode analysis to improve detection efficiency in complex contracts. Specifically, NumScout first uses LLM-based pruning to exclude functions unrelated to numerical operations or token transfers. This step is designed to mitigate the path explosion problem in symbolic execution and accelerate the analysis. Because of the complex semantics and call relationships of contracts, static pruning methods based on simple rule matching are insufficient. LLMs can reason about high-level semantics and across multi-level calls, and a multi-role collaboration strategy reduces the randomness and errors of their responses, allowing the pruning task to be accomplished effectively. Then, based on predefined patterns and a symbolic execution framework, the tool performs symbolic execution at the bytecode level, incorporating features from the source code for further analysis. It focuses on key operations such as comparisons and transfers, and identifies defects through various methods, including constructing and analyzing expression operator order trees, extracting comparison statements from bytecode, and analyzing token flows.

arXiv:2503.10041v1 [cs.SE] 13 Mar 2025
To demonstrate the prevalence of the five defined numerical defects and evaluate the efficacy of NumScout, we select 6,617 real-world smart contracts that are frequently used on Ethereum [13], ensuring that our experimental dataset consists of contracts with actual value rather than toy contracts. We apply NumScout to these 6,617 smart contracts
and find that 1,774 contracts contain at least one of the five
defined defects. Then, we randomly sample contracts with
a 95% confidence level and a 10% confidence interval for
manual labeling. The results show that the tool achieves an
overall precision of 89.7%. In addition, we conduct ablation
experiments to verify the effectiveness of GPT-based pruning.
The experiments demonstrate that pruning enables symbolic
execution to quickly enter the target functions, improving
runtime speed by 28.4% and detecting more defects.
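The sample size implied by a 95% confidence level and a 10% interval can be checked with Cochran's formula. The sketch below rests on assumptions: the paper does not say which formula it uses, so we read the 10% interval as a ±10% margin of error, take the worst-case proportion p = 0.5, and optionally apply a finite population correction, using the 1,774 flagged contracts only as an illustrative population.

```python
import math
from typing import Optional


def sample_size(population: Optional[int] = None, z: float = 1.96,
                margin: float = 0.10, p: float = 0.5) -> int:
    """Cochran's sample size; z = 1.96 corresponds to a 95% confidence level."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2  # about 96.04 for the defaults
    if population is None:
        return math.ceil(n0)
    # Finite population correction shrinks n0 for small populations.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


print(sample_size())      # 97 (infinite-population estimate)
print(sample_size(1774))  # 92 (illustrative population of 1,774 contracts)
```

If labeling were done separately per defect type, the population argument would be the number of contracts flagged for that type instead.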
The main contributions of our work are as follows:
•We summarize and define five new types of numerical
defects based on analyzing 1,199 audit reports. For each
defect, we provide its definition with a code example
for better illustration. Furthermore, we outline possible
solutions to enhance development security.
•We develop NumScout, the first tool designed to detect the defined numerical defects. NumScout uses an LLM to prune irrelevant functions and recovers source-level features from bytecode during symbolic execution to identify the predefined defect patterns more efficiently.
•We evaluate NumScout’s performance on 6,617 real-
world smart contracts and discover that 1,774 contracts
contain at least one defined defect. Moreover, in a man-
ually labeled dataset created through random sampling,
our approach achieves an overall precision of 89.7%.
•We make the source code of NumScout, all experimental data, and analysis results publicly available, along with detailed Markdown files, at https://github.com/NumScout/NumScout.
II. BACKGROUND
A. Numerical Operations in Solidity and Integer Overflow
Solidity is the most popular programming language for smart contracts on Ethereum. Computations in Solidity smart contracts are carried out by arithmetic opcodes, e.g., ADD and MUL [14]. Because of the inherent characteristics of the language and the limitations of the blockchain platform (e.g., the need to keep the public ledger consistent and to reduce computational resource consumption), Solidity supports only integers and does not support floating-point numbers, which can introduce certain numerical issues. In traditional
numerical detection, integer overflow is one of the most
common defects in smart contracts [15], [16], [17], [18]. An
integer overflow defect occurs when the result of an arithmetic
operation exceeds the range of its data type, producing an
outcome that deviates from expectations. Since smart contracts
typically use integers to represent asset amounts and other
numerical values, calculations involving these numbers may
experience overflow or underflow under malicious input from
attackers, resulting in asset loss. Several notable attacks have
occurred due to this defect, including BeautyChain token
(BEC) [19] attack, SmartMesh token (SMT) [20] attack, and
UselessEthereumToken token (UET) [21] attack.
The developer community has built security libraries to
prevent overflows, such as the widely adopted SafeMath
library [7], developed by the well-known blockchain security
team OpenZeppelin [22], which ensures the correctness of calculations through boundary checks. Starting from v0.8.0, the Solidity compiler introduces arithmetic checking mechanisms [8], which embed overflow detection into the compiled bytecode. If an overflow occurs during a transaction, execution fails and the EVM [1] reverts the transaction. However, although
traditional integer overflows have been largely mitigated, in-
creasingly complex contract scenarios are giving rise to new
types of numerical defects that are easily overlooked.
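To make the difference concrete, the following Python sketch models uint256 addition and multiplication with and without overflow checks. It is a simplification, not compiler output: wrapping modulo 2^256 stands in for pre-0.8.0 unchecked arithmetic, and the hypothetical `Revert` exception stands in for the revert triggered by SafeMath or by the v0.8.0+ built-in checks. The multiplication case mirrors the widely reported BEC batchTransfer pattern, in which a product of 2 and 2^255 wraps to 0.

```python
UINT256_MOD = 2 ** 256
UINT256_MAX = UINT256_MOD - 1


class Revert(Exception):
    """Hypothetical stand-in for an EVM revert."""


def add_unchecked(a: int, b: int) -> int:
    # Pre-0.8.0 semantics: the result silently wraps modulo 2**256.
    return (a + b) % UINT256_MOD


def mul_unchecked(a: int, b: int) -> int:
    return (a * b) % UINT256_MOD


def add_checked(a: int, b: int) -> int:
    # SafeMath / v0.8.0+ semantics: abort instead of wrapping.
    c = a + b
    if c > UINT256_MAX:
        raise Revert("arithmetic overflow")
    return c


def mul_checked(a: int, b: int) -> int:
    c = a * b
    if c > UINT256_MAX:
        raise Revert("arithmetic overflow")
    return c


print(add_unchecked(UINT256_MAX, 1))  # 0
print(mul_unchecked(2, 2 ** 255))     # 0: the BEC-style total amount
try:
    mul_checked(2, 2 ** 255)
except Revert as err:
    print("reverted:", err)
```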
B. Smart Contract Audit Report
Smart contract auditing is an important process in the
blockchain ecosystem, focusing on identifying vulnerabilities
and defects in smart contract code. Auditors from professional
auditing teams assess the code to identify potential defects,
ensuring that the contract operates as intended and adheres to
best practices. Audit reports provide a comprehensive analysis
of smart contracts, detailing all identified defects and their
impacts, assigning severity levels, and offering recommended
remediation strategies. These reports serve as essential docu-
mentation for developers, investors, and users, enhancing the
transparency and trustworthiness of the project’s contracts.
Given the irreversibility of blockchain transactions, thorough
auditing is vital to prevent financial losses and maintain the
integrity of DApps.
C. Large Language Models
Large Language Models (LLMs) [23], [24] are deep
learning-based natural language processing models that pos-
sess powerful language understanding and generation capabil-
ities. The GPT (Generative Pre-trained Transformer) series,
developed by OpenAI [25], is a prominent representative
of LLMs. GPT utilizes the Transformer [26] architecture
and is trained on extensive corpora, including source code
descriptions of various programming languages and known
defects. With this knowledge, GPT can understand and inter-
pret source code, enabling zero-shot learning [27]. The latest
version, GPT-4o [28], supports a 128k context length, making
it suitable for complex and multi-step tasks. While LLMs and
GPT have shown significant potential in fields such as smart
contract analysis, trustworthiness and accuracy remain critical
research challenges [29], [30].
Multiple studies have demonstrated that LLMs exhibit excellent code understanding capabilities. They can understand code syntax and semantics, including Abstract Syntax Trees (ASTs) and Control Flow Graphs (CFGs) [31].
LLMs have been applied in multiple fields that require code
understanding [32]. For example, they are used for analyzing
inconsistencies in code comments [33]. They also serve as the
foundation for developer assistance tools [34]. Furthermore,
in the field of smart contract vulnerability detection, LLMs
act as code understanding tools to identify logical vulnerabil-
ities [35], [36], [37], [38], [39].
D. Symbolic Execution
Symbolic execution-based defect detection for smart con-
tracts primarily involves symbolizing the storage variables
and external inputs within the contract. Smart contracts are
typically executed on the Ethereum Virtual Machine (EVM).
The EVM features a stack-based architecture and is responsi-
ble for interpreting and executing the opcodes of contracts.
To describe the execution flow of contracts more clearly,
the Control Flow Graph (CFG) is often utilized. The CFG
represents the program’s basic blocks and their control flow
relationships, aiding the analysis of the reachability of different
execution paths. During symbolic execution, a set of path constraints is maintained for each explored execution path. These constraints consist of conditions on symbolic variables that describe the current execution state of the contract. A satisfiability modulo theories (SMT) solver is used to evaluate these constraints and determine whether specific conditions can be satisfied, for example, to find inputs that may trigger vulnerabilities or to check whether the constraints remain satisfiable after new conditions are added. The typical workflow of
symbolic execution tools is as follows: Execute the program’s
opcodes sequentially, symbolize variables and external inputs
as they are encountered, update the program context state and
add new conditions to path constraints during the process.
While exploring all the executable paths of the program, the
tool assesses the satisfiability of security-related conditions to
detect potential security issues.
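The workflow above can be illustrated with a deliberately tiny Python sketch. It is not how EVM-level tools are built: the "program" is a hypothetical tree of branch conditions over a single input x, and a brute-force search over a bounded domain stands in for the SMT solver. It does show the core loop of forking at each branch, extending the path constraints, and discarding paths whose constraints are unsatisfiable.

```python
def solve(constraints, domain):
    """Brute-force stand-in for an SMT solver: return a witness input or None."""
    for x in domain:
        if all(cond(x) for cond in constraints):
            return x
    return None


def explore(node, domain, path=None, results=None):
    """Collect (leaf label, witness) for every feasible path.

    Nodes are ("leaf", label) or ("branch", cond, then_node, else_node).
    """
    path = [] if path is None else path
    results = [] if results is None else results
    if node[0] == "leaf":
        results.append((node[1], solve(path, domain)))
        return results
    _, cond, then_node, else_node = node
    negated = lambda x, c=cond: not c(x)
    for branch_cond, child in ((cond, then_node), (negated, else_node)):
        new_path = path + [branch_cond]
        if solve(new_path, domain) is None:
            continue  # unsatisfiable path constraints: prune this path
        explore(child, domain, new_path, results)
    return results


# Hypothetical program: if (x / 10 > 3) { if (x < 40) dead(); else buy(); } else reject();
program = ("branch", lambda x: x // 10 > 3,
           ("branch", lambda x: x < 40, ("leaf", "dead"), ("leaf", "buy")),
           ("leaf", "reject"))
print(explore(program, range(100)))  # [('buy', 40), ('reject', 0)]
```

The "dead" leaf is never reported because x // 10 > 3 already implies x >= 40, so that path's constraints are unsatisfiable.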
Traditional symbolic execution methods often encounter the
path explosion problem, which can prevent the completion of
detection within a reasonable timeframe. To address this issue,
pruning methods are necessary to mitigate path explosion
and accelerate the analysis process. As the complexity of
smart contracts has increased in recent years, particularly in
terms of semantics and call relationships, traditional pattern-
based pruning methods tend to be less effective. In contrast,
LLMs can recognize high-level semantics and multi-level
calls, making them well-suited for completing the pruning
tasks of complex contracts.

III. NEW NUMERICAL DEFECTS
In this section, we explain how the five new types of
numerical defects are identified and provide definitions and
examples for each defect.
A. Data Source
To identify and define new types of numerical defects,
we analyze 1,199 audit reports collected by DAppScan [40].
DAppScan is a public dataset containing audit reports collected
from the official websites, social media, and Web3 sites of
29 well-known blockchain security teams, such as OpenZeppelin [22] and ConsenSys [41]. These audit reports serve as a rich resource, revealing numerous numerical defects found in real-world projects. We adopt a keyword matching approach to filter report content related to numerical defects, and employ a snowball sampling [42] strategy to ensure the completeness of the keyword list. Initially, we filter the audit reports by matching the keywords “precision” and “rounding”.
During the review of the report content, we record new
keywords related to numerical defects and add them to the
keyword list for filtering new reports. Ultimately, we filter
a total of 194 audit reports using 25 keywords for further
analysis. For the complete keyword list, please refer to our
online repository.
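The iterative keyword expansion can be sketched as a fixed-point loop. This is only an illustration under assumptions: in the study, new keywords are recorded manually while reviewing reports, so the hypothetical `extract_keywords` callback below stands in for that human step, and matching is a plain case-insensitive substring test.

```python
from typing import Callable, Dict, Iterable, Set, Tuple


def snowball_filter(reports: Dict[str, str], seeds: Iterable[str],
                    extract_keywords: Callable[[str], Set[str]]) -> Tuple[Set[str], Set[str]]:
    keywords = {k.lower() for k in seeds}
    selected: Set[str] = set()
    while True:
        # Reports matched by the current keyword list that were not selected yet.
        newly = {rid for rid, text in reports.items()
                 if rid not in selected and any(k in text.lower() for k in keywords)}
        if not newly:
            return selected, keywords  # fixed point: no new reports, stop
        selected |= newly
        for rid in newly:
            # Reviewing a report may reveal further defect-related keywords.
            keywords |= {k.lower() for k in extract_keywords(reports[rid])}


# Hypothetical reports and a vocabulary standing in for the reviewer's judgment.
reports = {"r1": "Rounding of fees uses truncation",
           "r2": "Truncation leaves dust in the pool",
           "r3": "Reentrancy in withdraw"}
vocab = {"truncation", "dust"}
selected, keywords = snowball_filter(
    reports, ["precision", "rounding"],
    lambda text: {w for w in vocab if w in text.lower()})
print(sorted(selected))  # ['r1', 'r2']
```

Report r2 is reached only through the keyword "truncation" discovered in r1, which is the point of snowballing the keyword list.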
B. Audit Report Analysis
1) Manual Filtering: In the previous subsection, we describe the collection of 194 publicly available audit reports from renowned blockchain security teams. However, some of these reports are not directly related to numerical defects. For example, certain reports mention “precision loss” but only discuss its risk or offer general advice to users, instead of detailing specific defects in the code. Therefore, we manually remove reports that lack specific defect descriptions. After filtering, we find 109 reports directly related to numerical defects from the initial set of 194 security reports.

QSP-6 Truncation of fixed-point could result in sensitive collateral liquidation calculation
Severity: Medium Risk
Status: Fixed
Description: contracts/oracle/ProxyOracle.sol: multiplication is performed after a truncation division in a series of integer calculations. This leads to miscalculation and will lead to a financial loss over time or cause unexpected results. For instance, contracts/oracle/ProxyOracle.sol: L77, L89-L90, and L96-L97. In addition, taking function asETHCollateral() as an example:
1. getETHPx() = 100.5 and amount = 0.05.
2. Before truncation = 50.25
3. After truncation = 50.00
4. collateralFactor = 10,000
5. Final value = 50.000 and value lost close to 0.5% from original 50.25
2020-12-18 update: Alpha team stated that it is intended. The deviation is bounded by borrowFactor/10000 (in wei). The maximum value for borrowFactor value will be in the order of 10^6, bounding the error by ~100 wei, which will be less than a block’s interest accrued.
Recommendation: Examine the influence of precision loss to the position health check carefully. Make sure to perform multiplications before the divisions. In addition, could make use of standard fixed-point libraries to enlarge the precision as much as possible. There is no native or favorite standard implementation yet. OpenZeppelin has future plans to include one but there are a few current widely-used libraries. Reference:
(Card sections annotated in the figure: Title; Label; Description & Root Cause; Recommendation)
Fig. 1: Example of a card of audit reports
2) Open Card Sorting: To ensure accuracy, we use the open
card sorting [11] approach to analyze and categorize the fil-
tered audit reports related to numerical defects. In this process,
we consider two aspects to ensure the representativeness and
significance of the defects, i.e., the reproducibility of the code
issue and severity as assessed by the security teams. Some
issues may be tightly coupled with specific applications and
not reproducible; we do not classify these as representative
defects. Additionally, we focus on the labels assigned by
security teams in the reports to assess the severity of the
identified defects.
For each numerical defect mentioned in the audit reports, we
create a card comprising four sections to organize the content.
Following the detailed steps outlined in [43], we begin by
randomly selecting 40% of the cards for the first round of
classification. First, we read the titles and descriptions of the
reports to understand the relevant defects. Next, we inspect the
problematic code to identify the root cause and cross-reference
it with the audit reports. Finally, we review the recommended
solutions suggested by the security teams to understand how
to address the defects and record the severity level assigned
by the team.
In the second round of classification, two authors indepen-
dently categorize the remaining 60% of the cards following the
same steps described in the first round. We then compare their
results and discuss discrepancies. Next, we remove uncommon
defects and ultimately classify the remaining defects into five
types. Among the classified reports, 12 are labeled as high,
33 as medium, and 75 as low severity.
Figure 1 shows a card example of an audit report describing
a numerical defect. The card contains a title, assigned label,
description, root cause, and recommended solution. From the
report, we learn that the contract contains a defect where
division is incorrectly performed before multiplication. We then locate the referenced code (i.e., contracts/oracle/ProxyOracle.sol: L77, L89-L90, and L96-L97) to further confirm the root cause and verify the presence of this defect. The report also provides an example of miscalculations caused by this defect, demonstrating its exploitability and potential consequences. Because this defect is reproducible and occurs frequently in audit reports, we classify it as a distinct defect type named “Operator Order Issue”.
C. Defects Definition
Based on the analysis of the audit reports mentioned in
the previous section, we have summarized five new types of
numerical defects. Table I provides a brief definition of each
defect, followed by a detailed definition and code example for
each defect pattern.
(1) Div In Path. The Solidity programming language does not support floating-point numbers, so all division operations are integer divisions [44]. When the result is not a whole number, only the integer part is retained, leading to precision loss. If division is used within a conditional statement, this inherent precision loss can alter the program’s execution path and cause unexpected results. Consequently, users may be misled by this defect and pass incorrect values. The blockchain security team ChainSecurity [45] issues a warning about this defect in their audit report for the Angle Protocol Borrowing Module project.

TABLE I: Definitions of the Five Defects

| Contract Defect | Definition |
|---|---|
| Div In Path | The use of division in comparison conditions affects the execution path. |
| Operator Order Issue | Dividing before multiplying amplifies precision loss. |
| Minor Amount Retention | When multiple parties share profits, indivisible amounts remain trapped in the contract and cannot be withdrawn. |
| Exchange Problem | Errors in token amount calculations during token exchanges create rounding issues or profit opportunities. |
| Precision Loss Trend | Incorrect rounding methods lead to an unreasonable allocation of precision loss. |
Example: As shown in Figure 2, users can purchase tokens
by sending ether through the getTokens function. The internal
conditional check restricts the minimum amount of ether,
with minAmount set to 3. However, users who are not familiar with Solidity may assume that sending more than 3 ether will satisfy the condition and that the program will enter the expected execution branch for token purchase. In reality, msg.value must be at least 4 ether. For amounts between 3 ether and 4 ether, e.g., 3.5 ether, the expression evaluates to 3.5 ether / 1 ether = 3 instead of 3.5, so the condition is still not met, preventing users from buying tokens. If the contract does not handle such situations, users may lose their funds without receiving any tokens. Malicious contracts can exploit this defect to scam inexperienced users.
```solidity
function getTokens(address _to, uint256 _amount) public payable returns (bool) {
    if (msg.value / 1 ether > minAmount) {
        /* buy tokens */
    }
}
```
Fig. 2: An example of Div In Path defect.
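The effect can be reproduced by modeling the check in Fig. 2 with Python's floor division `//`, which matches Solidity's truncating division for unsigned integers. The second predicate is one possible division-free rewrite that compares in wei; it assumes the intended rule is "more than minAmount ether", which the example implies but the contract does not state.

```python
ETHER = 10 ** 18  # 1 ether in wei
MIN_AMOUNT = 3    # minAmount in Fig. 2


def passes_div_check(msg_value: int) -> bool:
    # Mirrors `msg.value / 1 ether > minAmount`: the quotient is truncated.
    return msg_value // ETHER > MIN_AMOUNT


def passes_wei_check(msg_value: int) -> bool:
    # Division-free alternative: `msg.value > minAmount * 1 ether`.
    return msg_value > MIN_AMOUNT * ETHER


value = 3 * ETHER + ETHER // 2  # 3.5 ether
print(passes_div_check(value), passes_wei_check(value))  # False True
print(passes_div_check(4 * ETHER - 1))                   # False: 1 wei short of 4 ether
print(passes_div_check(4 * ETHER))                       # True
```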
(2) Operator Order Issue. The most common defect regard-
ing calculation order is performing division before multiplica-
tion. This defect leads to incorrect calculation results because
multiplication can amplify the precision loss introduced by
division. Therefore, in programming practices, when both mul-
tiplication and division appear in an expression, it is generally
recommended to perform multiplication first and then division
to minimize precision loss. However, in today’s increasingly
complex contracts, developers often overlook this principle,
and cases where division is done before multiplication fre-
quently occur. Operator Order Issue is also the most common
defect in audit reports. The security team QuillAudits [46] includes a warning about this defect in their audit report for the Alium Finance Smart Contract project.
Example: Figure 3 shows the updatePool function, which
handles the logic for retrieving staking rewards almReward ,
with 10% of the rewards allocated to the developer, i.e.,
devReward. Because almReward is divided by 100 before being multiplied by 10, its last two digits are discarded and the resulting error is then amplified tenfold. If almReward = 199, then devReward = (199/100) * 10 = 1 * 10 = 10. However, 10% of almReward should be 19 (199 * 10 / 100 = 19.9, truncated to 19). If devReward = almReward.mul(10).div(100) is used instead, the result is correct. This defect can lead to financial losses for developers over time.
```solidity
function updatePool(uint256 _pid) public {
    // deduct 10% for the developers
    uint256 devReward = almReward.div(100).mul(10);
    _safeAlmTransfer(devaddr, devReward);
}
```
Fig. 3: An example of Operator Order Issue defect.
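Both operation orders can be compared directly, and the shape "a division feeds a multiplication" can be spotted on an expression tree. The sketch below is illustrative only: NumScout builds operator order trees during bytecode-level symbolic execution, whereas here Python's `ast` module parses Solidity-like infix expressions as a stand-in, and SafeMath call chains such as `.div(100).mul(10)` are not handled.

```python
import ast


def div_then_mul(x: int, d: int, m: int) -> int:
    return x // d * m  # like almReward.div(100).mul(10)


def mul_then_div(x: int, d: int, m: int) -> int:
    return x * m // d  # like almReward.mul(10).div(100)


def has_div_before_mul(expr: str) -> bool:
    """Flag any multiplication whose operand subtree contains a division."""
    tree = ast.parse(expr, mode="eval")
    for node in ast.walk(tree):
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
            for operand in (node.left, node.right):
                if any(isinstance(sub, ast.BinOp) and isinstance(sub.op, (ast.Div, ast.FloorDiv))
                       for sub in ast.walk(operand)):
                    return True
    return False


print(div_then_mul(199, 100, 10), mul_then_div(199, 100, 10))  # 10 19
print(has_div_before_mul("almReward / 100 * 10"))  # True
print(has_div_before_mul("almReward * 10 / 100"))  # False
```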
(3) Minor Amount Retention. This defect typically arises
in scenarios where multiple participants share rewards or
withdraw funds. On blockchain platforms, numerous game
contracts involve players investing funds to participate, with
winners dividing the rewards. During the distribution of re-
wards, if the total amount is not divisible by the number of
users, a small portion of the funds will remain stuck in the
contract, unable to be withdrawn. If the withdrawn tokens are
tied to a liquidity pool, leftover tokens could affect the ratio,
leading to economic losses. The security team Dedaub [47]
issues a warning about this defect in their audit report for the
GoodGhosting project.
Example: The code shown in Figure 4 comes from an
investment game contract that incentivizes players to participate in the game and maintain their investment plans. Players can withdraw their funds and claim the interest rewards generated within the game through the withdraw function. These interest rewards are evenly distributed among all winning players.
There is a defect where the totalGameInterest may not be
divisible by winners.length , resulting in a minor amount
of funds remaining in the contract and being unable to be
withdrawn. The retained amount can affect the contract’s state,
making actions that reference that state unsafe. Specifically, if
there are other contracts associated with the daiToken balance
of this investment pool, such as trading pairs composed of
daiToken and other tokens, the retained amount can impact
the ratio between the two, leading to financial security issues.
```solidity
function withdraw() external virtual {
    // calc interest reward shared by all winners
    payout = payout.add(totalGameInterest.div(winners.length));
    require(IERC20(daiToken).transfer(msg.sender, payout), "Fail to transfer");
}
```
Fig. 4: An example of Minor Amount Retention defect.
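The size of the stuck remainder follows directly from truncating division: each winner receives floor(total / n), so total mod n base units stay in the contract. The second function is only one hypothetical mitigation (handing the remainder to the last payee) and is not taken from the audited project.

```python
from typing import List, Tuple


def split_evenly(total_interest: int, n_winners: int) -> Tuple[int, int]:
    share = total_interest // n_winners            # totalGameInterest.div(winners.length)
    retained = total_interest - share * n_winners  # never withdrawn by anyone
    return share, retained


def split_with_remainder(total_interest: int, n_winners: int) -> List[int]:
    share = total_interest // n_winners
    payouts = [share] * n_winners
    payouts[-1] += total_interest - share * n_winners  # last payee absorbs the dust
    return payouts


print(split_evenly(1000, 3))          # (333, 1)
print(split_evenly(10 ** 18, 7))      # (142857142857142857, 1)
print(split_with_remainder(1000, 3))  # [333, 333, 334]
```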
(4) Exchange Problem. Token exchanges are fundamental in various scenarios, such as purchasing tokens, providing liquidity, and trading tokens. If the numerical operations involved in the exchange process are not handled properly, issues may arise, including exchange rounding and zero-cost profit opportunities. The former results in users losing their input tokens while receiving zero output tokens. The latter allows users to obtain output tokens without providing any input tokens. The security team Trail of Bits [48] reports this defect in their audit of the Balancer Finance project.
Example: As shown in Figure 5, the function joinPool allows users to inject assets into the liquidity pool and receive corresponding pool shares (which are themselves ERC20 [49] tokens, referred to as pool tokens). Therefore, a token exchange process is involved. The user inputs poolAmountOut to indicate the amount of pool tokens they want to receive. Internally, the function calculates the exchange ratio from the desired amount of pool tokens and the total supply of pool tokens. It then calculates the number of liquidity tokens that the user needs to contribute based on the current balance of liquidity tokens held by the pool. During this process, users may receive pool tokens without having to contribute any liquidity tokens. The calculation method for bmul, using integer (truncating) division, is as follows:

$$c = \frac{a \cdot b + \frac{BONE}{2}}{BONE}$$

Thus, the final expression for tokenAmountIn is:

$$tokenAmountIn = \frac{bal \cdot \frac{poolAmountOut}{poolTotal} + \frac{BONE}{2}}{BONE}$$

where the ratio $\frac{poolAmountOut}{poolTotal}$ is the BONE-scaled fixed-point value returned by bdiv, and BONE is set to $10^{18}$. If the condition $bal \cdot \frac{poolAmountOut}{poolTotal} < 5 \times 10^{17}$ is satisfied, poolAmountOut pool tokens will be minted while the user contributes no liquidity tokens, i.e., tokenAmountIn = 0. This situation occurs when the token has low liquidity or fewer than 18 decimals. For example, USDT [50], USDC [51], and XRP [52], all of which hold high market values, have only 6 decimals. According to data from Etherscan, these three tokens all rank in the top 5 by market capitalization [53].
```solidity
function joinPool(uint poolAmountOut, uint[] calldata maxAmountsIn) external _logs_ _lock_ {
    // calc swap ratio with input and poolTotal
    uint poolTotal = totalSupply();
    uint ratio = bdiv(poolAmountOut, poolTotal);
    // (loop over the pool's tokens elided; t is the current token)
    uint bal = _records[t].balance;
    // calc amount user should contribute with ratio
    uint tokenAmountIn = bmul(ratio, bal);
}
```
Fig. 5: An example of Exchange Problem defect.
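The zero-cost join can be reproduced numerically with a simplified model of the fixed-point helpers. `bdiv` and `bmul` below follow the round-half-up formulas of Balancer's BNum library but omit its overflow checks, and the pool size, balances, and `pool_amount_out` are hypothetical values chosen so that the condition above holds for a 6-decimal token.

```python
BONE = 10 ** 18


def bmul(a: int, b: int) -> int:
    # Fixed-point multiply, rounding half up: (a*b + BONE/2) / BONE.
    return (a * b + BONE // 2) // BONE


def bdiv(a: int, b: int) -> int:
    # Fixed-point divide, rounding half up: (a*BONE + b/2) / b.
    return (a * BONE + b // 2) // b


def token_amount_in(pool_amount_out: int, pool_total: int, bal: int) -> int:
    ratio = bdiv(pool_amount_out, pool_total)
    return bmul(ratio, bal)


pool_total = 100 * 10 ** 18      # 100 pool tokens (18 decimals)
pool_amount_out = 4 * 10 ** 10   # a tiny requested amount of pool tokens
usdc_like_bal = 1_000 * 10 ** 6  # 1,000 units of a 6-decimal token
dai_like_bal = 1_000 * 10 ** 18  # 1,000 units of an 18-decimal token

print(token_amount_in(pool_amount_out, pool_total, usdc_like_bal))  # 0: pool tokens for free
print(token_amount_in(pool_amount_out, pool_total, dai_like_bal))   # 400000000000
```

Each such join mints only a tiny amount, so whether repeating it is profitable also depends on gas costs, which this sketch ignores.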
(5) Precision Loss Trend. There are three rounding methods for division: rounding down, rounding to the nearest integer, and rounding up. In Solidity, division rounds down by default, returning the largest integer less than or equal to the exact quotient, denoted as floor(x). Rounding to the nearest integer adjusts the quotient up or down based on its fractional part; in formulas, it is expressed by adding (denominator/2) to the numerator, denoted as round(x). Rounding up returns the smallest integer greater than or equal to the exact quotient, expressed as adding (denominator − 1) to the numerator, denoted as ceil(x). Different rounding methods bias the calculation results in different directions, and an incorrect bias can lead to unexpected consequences. The security team PeckShield’s [54] audit report on OneSwap includes this defect.
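For non-negative numerators and positive denominators, the three rounding methods reduce to the integer formulas described above. A minimal Python sketch, relying on the fact that Python's `//` rounds down for non-negative operands, matching Solidity's unsigned division:

```python
def div_floor(a: int, b: int) -> int:
    return a // b             # Solidity default


def div_round(a: int, b: int) -> int:
    return (a + b // 2) // b  # add denominator/2 to the numerator


def div_ceil(a: int, b: int) -> int:
    return (a + b - 1) // b   # add denominator - 1 to the numerator


for a, b in [(7, 2), (5, 3), (4, 3), (6, 3)]:
    print(a, b, div_floor(a, b), div_round(a, b), div_ceil(a, b))
```

When the division is exact, as in 6 / 3, all three methods agree; they differ only in where the fractional part goes.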
Example: In Figure 6, the function _dealWithPoolAndCollectFee handles transactions within the trading pool and collects fees. Here, the fee is computed with the default floor(x) method, which rounds the result down. The amount of tokens the user receives is the total amount minus the fee, so the user obtains the small fractional amount discarded when the fee is rounded down. This means the calculation favors users, allowing them to receive more tokens. However, in AMM-based DEX scenarios [55], the calculation should favor the liquidity pool to protect the interests of liquidity providers. Therefore, the fee should be rounded up, ensuring that more tokens remain in the liquidity pool. To do so, replace the original code with fee = (amountToTaker * feeBPS + 9999) / 10000.
function _dealWithPoolAndCollectFee(Context memory ctx, bool isBuy) internal returns (uint) {
    // excerpt: amountToTaker, feeBPS and token are defined outside this figure
    // calc transaction fee (rounded down by default)
    uint fee = amountToTaker * feeBPS / 10000;
    // calc amount user gets after deducting fee
    amountToTaker -= fee;
    _transferToken(token, ctx.order.sender, amountToTaker, ctx.isLastSwap);
    return amountToTaker;
}
Fig. 6: An example of Precision Loss Trend defect.
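To make the direction of the loss in Figure 6 concrete, the Python sketch below replays the fee calculation with floor rounding and with the suggested ceiling rounding. The inputs (12,345 token units and a fee of 30) are hypothetical, and treating feeBPS as basis points is an assumption based on the divisor 10000.

```python
BPS_DENOMINATOR = 10_000  # divisor used in Fig. 6; feeBPS assumed to be basis points


def split_floor_fee(amount_to_taker: int, fee_bps: int) -> tuple[int, int]:
    """Original logic: the fee is rounded down, so the user keeps the discarded fraction."""
    fee = amount_to_taker * fee_bps // BPS_DENOMINATOR
    return fee, amount_to_taker - fee


def split_ceil_fee(amount_to_taker: int, fee_bps: int) -> tuple[int, int]:
    """Suggested fix: (amountToTaker * feeBPS + 9999) / 10000 rounds the fee up."""
    fee = (amount_to_taker * fee_bps + BPS_DENOMINATOR - 1) // BPS_DENOMINATOR
    return fee, amount_to_taker - fee


if __name__ == "__main__":
    print("floor fee:", split_floor_fee(12_345, 30))  # exact fee would be 37.035
    print("ceil fee: ", split_ceil_fee(12_345, 30))
```

With these inputs the original code charges a fee of 37 and pays the user 12,308, whereas the fixed code charges 38 and pays 12,307, keeping the fractional unit in the pool.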
Another scenario affected by this defect is unfair distribution of benefits, as shown in Figure 7. Here, AAGain is rounded down, while BBGain, computed as the remainder gain − AAGain, effectively receives the rounded-up value. It is therefore crucial to carefully consider the direction of precision loss and apply the most suitable rounding method for each scenario.
function _updatePrices() internal {
    // calc AA’s earnings based on ratio (rounded down)
    AAGain = gain * trancheAPRSplitRatio / FULL_ALLOC;
    // sub AA’s earnings from total to obtain BB’s (effectively rounded up)
    BBGain = gain - AAGain;
}
Fig. 7: Another example of Precision Loss Trend defect.
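The Python sketch below reproduces the split in Figure 7 with hypothetical numbers (gain = 1,000, trancheAPRSplitRatio = 33,333, FULL_ALLOC = 100,000). Writing z for AA’s exact share, gain − ⌊z⌋ = ⌈gain − z⌉ holds for any integer gain, so BB’s remainder always equals its exact share rounded up and every rounding unit goes to BB.

```python
import math
from fractions import Fraction


def split_gain(gain: int, ratio: int, full_alloc: int) -> tuple[int, int]:
    """Mirror Fig. 7: AA's share is rounded down and BB receives the remainder."""
    aa_gain = gain * ratio // full_alloc
    bb_gain = gain - aa_gain
    return aa_gain, bb_gain


def exact_shares(gain: int, ratio: int, full_alloc: int) -> tuple[Fraction, Fraction]:
    """The unrounded shares, for comparison."""
    aa_exact = Fraction(gain * ratio, full_alloc)
    return aa_exact, gain - aa_exact


if __name__ == "__main__":
    aa, bb = split_gain(1_000, 33_333, 100_000)
    aa_exact, bb_exact = exact_shares(1_000, 33_333, 100_000)
    print(f"AA: {aa} (exact {float(aa_exact)}), BB: {bb} (exact {float(bb_exact)})")
```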
IV. METHODOLOGY
In this section, we introduce the methods for detecting
the aforementioned defects. We first provide an overview of
our approach, followed by detailed explanations of two main
components: GPT-based pruning and symbolic execution. For
the latter, we further elaborate on instruction-level details and
operational features.
A. Overview
Figure 8 presents an overview of NumScout, which consists of two main components: GPT-based function pruning and the symbolic execution detection tool. Users provide Solidity source code as input. If the code is too long to fit within GPT’s input limit, we perform subgraph segmentation: we analyze the contract’s abstract syntax tree (AST), construct a call graph starting from each entry function based on the internal call relationships, and use the call graph to break the large contract into several smaller contracts, each with a complete function call chain. The segmented code is then passed to GPT for pruning.
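Segmentation can be pictured as collecting, for each entry function, every function reachable through internal calls. The Python sketch below assumes the call graph has already been extracted from the AST as an adjacency dictionary; the function names are hypothetical, and how NumScout packages each reachable set into a compilable sub-contract is not shown.

```python
from collections import deque


def reachable(call_graph: dict[str, list[str]], entry: str) -> set[str]:
    """Breadth-first walk over internal call edges starting at one entry function."""
    seen = {entry}
    queue = deque([entry])
    while queue:
        fn = queue.popleft()
        for callee in call_graph.get(fn, []):
            if callee not in seen:  # also guards against recursive call cycles
                seen.add(callee)
                queue.append(callee)
    return seen


def segment(call_graph: dict[str, list[str]], entries: list[str]) -> dict[str, set[str]]:
    """One segment per entry function, each holding its complete call chain."""
    return {entry: reachable(call_graph, entry) for entry in entries}


if __name__ == "__main__":
    graph = {
        "swap": ["_calcFee", "_transferToken"],
        "_calcFee": ["_mulDiv"],
        "setOwner": ["_checkOwner"],
    }
    for entry, funcs in segment(graph, ["swap", "setOwner"]).items():
        print(entry, sorted(funcs))
```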
The GPT-based pruning component involves four roles: a Classifier, two Verifiers, and a Combiner, which work collaboratively to enhance the accuracy of pruning. The Classifier generates preliminary relevance judgments based on whether the
functions in the contract involve numerical operations or fund
transfers, classifying them as either “related” or “unrelated”
and sending the results to the two Verifiers. Each Verifier
independently verifies a specific subset of the classification
results. One Verifier focuses on the “related” part, ensuring that
all functions classified as related are indeed associated with
computations or fund transfers. The other Verifier focuses on
the “unrelated” part to prevent mistakenly discarding functions
that may have implicit relevance. After the verification is
completed, the two Verifiers send their respective results to
the Combiner. The Combiner integrates the feedback from
both Verifiers and makes final adjustments to the classification
results to ensure the accuracy of pruning. The pruning results
are subsequently passed to the symbolic execution detection
tool for further defect analysis.
The symbolic execution detection tool contains four main components [56]: the Inputter, the Feature Detector, the CFG Builder, and the Defect Identifier. The Inputter accepts user-provided Solidity source code and GPT’s pruning results as input. It compiles the source code with the Solidity compiler (supporting various compiler versions) to obtain the bytecode and AST, and uses the API provided by Geth [57] to disassemble the bytecode into opcodes. The AST is analyzed to extract source mappings [58] for further analysis by other components. The CFG Builder performs symbolic execution and dynamically constructs the CFG while skipping the pruned function paths. It records key events (i.e., stack events, memory events, and call events) to detect defect features. During CFG construction, the Feature Detector identifies feature operations and maintains the data structures required for detection (i.e., expression information, conditional comparisons, token flows, and internal & external function calls). For each defect, the Defect Identifier produces the final detection results based on the predefined features, with the help of a satisfiability modulo theories (SMT) solver [59].
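For reference, the Solidity compiler emits source mappings as a compressed, semicolon-separated list with one s:l:f:j entry per instruction (start offset, length, source file index, jump type), where an empty or missing field repeats the previous entry’s value; newer compilers append a modifier-depth field. The Python sketch below decompresses such a string. It illustrates the compiler’s format and is not NumScout’s own parser.

```python
def decompress_srcmap(srcmap: str) -> list[tuple[int, int, int, str]]:
    """Expand a compressed solc source map into one (s, l, f, j) tuple per instruction.

    Missing or empty fields inherit the previous entry's value; any fields
    beyond j (e.g. modifier depth) are ignored in this sketch.
    """
    entries = []
    prev = ["", "", "", ""]
    for item in srcmap.split(";"):
        fields = item.split(":")
        cur = [fields[i] if i < len(fields) and fields[i] != "" else prev[i]
               for i in range(4)]
        entries.append((int(cur[0]), int(cur[1]), int(cur[2]), cur[3]))
        prev = cur
    return entries


if __name__ == "__main__":
    # compressed form of "1:2:1;1:9:1;2:1:2;2:1:2;2:1:2"
    for entry in decompress_srcmap("1:2:1;:9;2:1:2;;"):
        print(entry)
```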
B. Functions Pruning with GPT
Motivations for Pruning. Traditional symbolic execution methods suffer from path explosion. To mitigate this problem and speed up the analysis, we use LLMs for pruning by discarding entry function paths that are unrelated to numerical or transfer operations in the contract. This allows the symbolic execution framework to reach target functions more quickly.
[Figure 8 diagram. Left panel, Functions Pruning with GPT: Source Code → Code Graph Partitioning → The Split Contracts → Classifier (classify function selectors by relevance to numerical operations; outputs {"related": […], "unrelated": […]}) → Verifier_1 (verify related functions part) and Verifier_2 (verify unrelated functions part), each outputting {"correct": […], "wrong": […]} → Combiner (merge results from partners; outputs {"related": […], "unrelated": […]}) → Pruning results. Right panel, Symbolic Execution Detection Tool: Inputter (AST Analysis & Disassembly) → CFG Builder (Symbolic Execution, CFG) → Feature Detector (Conditional Comparison, Expression Analysis, Rounding mode, Internal & external calls, Token Flow, Operators Order Tree) → Defect Identifier (Div In Path, Exchange Problem, …) → Defect Results.]
Fig. 8: An overview of the approach of NumScout.
Existing pattern-based pruning methods or manually de-
signed heuristics struggle to handle scenarios involving com-
plex semantics and intricate call relationships [60], [38], [37].
Take transfer operations as an example: their implementation in smart contracts is diverse; they may use different variables for accounting, be hidden in multi-level function calls, or be executed conditionally based on complex logic. These complexities make it difficult for pattern-based methods to accurately capture the semantics of such operations. In contrast, as stated in Section II-C, multiple studies have demonstrated that LLMs possess powerful code understanding and reasoning capabilities and perform well in various software engineering tasks across multiple fields [32]. LLMs can recognize complex semantics, analyze intricate calls, and determine from the source code whether a function is related to numerical or transfer operations. Therefore, we use LLMs to accomplish the pruning task. To mitigate the possible inaccuracy and randomness of LLM responses, we employ a multi-role collaboration strategy [36] that assigns distinct roles to different LLMs, each focusing on a specific aspect of the analysis; the roles cross-verify one another’s outputs to enhance overall accuracy and reliability.
The pruning process. Although LLMs are capable of understanding and analyzing complex semantics, their responses may exhibit hallucinations [61] and uncertainty [62]. A multi-role collaboration strategy can reduce errors and randomness in the output [36] and improve the effectiveness of pruning. The pruning process involves four roles: a Classifier, two Verifiers, and a Combiner. Considering both performance and cost, we select GPT-4o [28] to implement these roles. As shown in Figure 9, to further enhance the reliability of the answers and minimize the impact of randomness in GPT’s output, we adopt the “mimic-in-the-background” prompting method, inspired by Sun et al. [35]. With this method, GPT is prompted to simulate answering the same question five times in the background, and the most frequently occurring answer is selected to ensure higher consistency.
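The “mimic-in-the-background” prompt asks GPT to perform the vote internally. An explicit, external counterpart (sampling several responses and keeping the most frequent one, in the style of a self-consistency vote) can be sketched in Python as follows. This is an illustrative alternative, not the mechanism NumScout uses, and the selectors in the example are hypothetical.

```python
import json
from collections import Counter


def canonical(raw: str) -> str:
    """Normalize a JSON answer so that list order does not affect the vote."""
    data = json.loads(raw)
    return json.dumps({key: sorted(values) for key, values in data.items()}, sort_keys=True)


def most_frequent_answer(answers: list[str]) -> dict:
    """Return the answer that occurs most often among the sampled responses."""
    counts = Counter(canonical(a) for a in answers)
    winner, _ = counts.most_common(1)[0]
    return json.loads(winner)


if __name__ == "__main__":
    samples = [
        '{"related": ["0xa9059cbb", "0x23b872dd"], "unrelated": ["0x8da5cb5b"]}',
        '{"related": ["0x23b872dd", "0xa9059cbb"], "unrelated": ["0x8da5cb5b"]}',
        '{"related": ["0xa9059cbb"], "unrelated": ["0x23b872dd", "0x8da5cb5b"]}',
        '{"related": ["0xa9059cbb", "0x23b872dd"], "unrelated": ["0x8da5cb5b"]}',
        '{"related": ["0xa9059cbb", "0x23b872dd", "0x8da5cb5b"], "unrelated": []}',
    ]
    print(most_frequent_answer(samples))
```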
1) Rough Classification by Classifier: The Classifier is responsible for the preliminary analysis and classification of function selectors, using the prompt template shown in Figure 9. The prompt follows the zero-shot chain-of-thought (CoT) [27] method. CoT guides LLMs to break complex tasks down into step-by-step logical reasoning sequences when answering questions, which enhances their reasoning capabilities and improves response reliability. [%Classification Requirements%] specify that the Classifier’s task is to roughly categorize function selectors by whether they are related to token numerical operations or balance changes, dividing them into two categories: “related” and “unrelated”. A function selector is the 4-byte identifier that Solidity uses to identify functions. [%CoT Steps%] provides specific reasoning steps to guide GPT’s inference. First, it identifies which functions perform numerical operations on token or ether amounts. Then, it examines the call relationships to determine which public functions from [%Function Selectors List%] call those numerical operation functions. Finally, it classifies them as either “related” or “unrelated”. The [%Constraints%] section highlights errors that GPT must avoid, e.g., “Do not add functions that are not in the list”.
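A function selector is the first four bytes of the Keccak-256 hash of the function’s canonical signature, e.g., transfer(address,uint256). Ethereum uses the original Keccak padding, so Python’s hashlib.sha3_256 (FIPS 202 SHA3-256) produces different digests; the self-contained sketch below therefore includes a compact pure-Python Keccak-256. It is written for readability rather than speed and is not part of NumScout.

```python
MASK = (1 << 64) - 1
RATE = 136  # bytes absorbed per block for Keccak-256 (1600 - 2 * 256 bits)

ROUND_CONSTANTS = [
    0x0000000000000001, 0x0000000000008082, 0x800000000000808A, 0x8000000080008000,
    0x000000000000808B, 0x0000000080000001, 0x8000000080008081, 0x8000000000008009,
    0x000000000000008A, 0x0000000000000088, 0x0000000080008009, 0x000000008000000A,
    0x000000008000808B, 0x800000000000008B, 0x8000000000008089, 0x8000000000008003,
    0x8000000000008002, 0x8000000000000080, 0x000000000000800A, 0x800000008000000A,
    0x8000000080008081, 0x8000000000008080, 0x0000000080000001, 0x8000000080008008,
]
# ROTATIONS[x][y]: rho offset for the lane at column x, row y
ROTATIONS = [
    [0, 36, 3, 41, 18],
    [1, 44, 10, 45, 2],
    [62, 6, 43, 15, 61],
    [28, 55, 25, 21, 56],
    [27, 20, 39, 8, 14],
]


def _rotl(value: int, shift: int) -> int:
    return ((value << shift) | (value >> (64 - shift))) & MASK


def _keccak_f(a: list[int]) -> list[int]:
    """Keccak-f[1600] permutation on 25 lanes indexed as a[x + 5 * y]."""
    for rc in ROUND_CONSTANTS:
        # theta: mix each column parity into neighboring columns
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x - 1) % 5] ^ _rotl(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]
        # rho and pi: rotate each lane and move it to its new position
        b = [0] * 25
        for x in range(5):
            for y in range(5):
                b[y + 5 * ((2 * x + 3 * y) % 5)] = _rotl(a[x + 5 * y], ROTATIONS[x][y])
        # chi: non-linear mixing along each row
        a = [b[x + 5 * y] ^ (~b[(x + 1) % 5 + 5 * y] & b[(x + 2) % 5 + 5 * y])
             for y in range(5) for x in range(5)]
        # iota: break symmetry with the round constant
        a[0] ^= rc
    return a


def keccak256(data: bytes) -> bytes:
    padded = bytearray(data) + b"\x01"  # Keccak padding (SHA3-256 would use 0x06)
    padded += b"\x00" * (-len(padded) % RATE)
    padded[-1] |= 0x80
    state = [0] * 25
    for offset in range(0, len(padded), RATE):
        for i in range(RATE // 8):
            lane = padded[offset + 8 * i: offset + 8 * i + 8]
            state[i] ^= int.from_bytes(lane, "little")
        state = _keccak_f(state)
    return b"".join(state[i].to_bytes(8, "little") for i in range(4))


def function_selector(signature: str) -> str:
    """First four bytes of keccak256(signature), formatted as a hex selector."""
    return "0x" + keccak256(signature.encode()).hex()[:8]


if __name__ == "__main__":
    print(function_selector("transfer(address,uint256)"))
```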
2) Secondary verification by the Verifiers: Verifier 1 verifies the “related” part of the Classifier’s results, while Verifier 2 verifies the “unrelated” part. Each Verifier sorts the functions it examines into two categories: “correct” and “wrong”. The prompt templates used by both are shown in Figure 9. Following the approach in [36], we define more detailed abilities, responsibilities, and constraints for GPT. [%Abilities%] require GPT to act as an excellent smart contract code reviewer familiar with function calls. [%Responsibilities%] describe the task of examining the Classifier’s results and making judgments on them. [%Constraints%] define the required output format and certain types of errors that are not permissible.
Classifier prompt Template
  You need to answer [%Classification Requirements%]
  Think step by step: [%CoT Steps%]
  Your response: [%Output JSON Format%]
  Do not: [%Constraints%]
  All public functions: [%Function Selectors List%]
  The code I provide is:
  [%Code%]
Verifier prompt Template
  Your responsibility is to [%Verification Requirements%]
  # [%Abilities%]
  # [%Responsibilities%]
  # [%Constraints%]
  # Classifier’s result:
  The code I provide is:
  [%Code%]
Combiner prompt Template
  Your responsibility is to [%Combination Requirements%]
  # [%Constraints%]
  # Classifier’s result:
  # Verifiers_1’s result:
  # Verifiers_2’s result:
  The code I provide is:
  [%Code%]
System prompt
  You are a smart contract auditor. You will be asked questions related to code properties. You can mimic answering them in the background five times and provide me with the most frequently appearing answer. Furthermore, please strictly adhere to the output format specified in the question; there is no need to explain your answer.
Fig. 9: The Prompt Template Used by Roles.
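The bracketed [%…%] slots in Figure 9 are filled before each query. A minimal Python sketch of that substitution step follows; the slot values are hypothetical, and the exact wording NumScout places into each slot is not reproduced here.

```python
import re

SLOT = re.compile(r"\[%\s*(.+?)\s*%\]")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace every [%Name%] slot with values[Name]; raise if a slot is left unfilled."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value for slot [%{name}%]")
        return values[name]

    return SLOT.sub(substitute, template)


if __name__ == "__main__":
    classifier_tail = "Do not: [%Constraints%]\nThe code I provide is:\n[%Code%]"
    print(fill_template(classifier_tail, {
        "Constraints": "add functions that are not in the list",
        "Code": "contract Token { /* ... */ }",
    }))
```

Raising on an unfilled slot is a design choice: a silently empty slot would send GPT an incomplete instruction.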
3) Merging Results by Combiner: The Combiner synthesizes the reports from the other roles to derive the final pruning results. [%Constraints%] emphasize that the Combiner cannot simply merge the results but must exercise its own judgment. In particular, when making the final determination, GPT must focus on functions where the Classifier and the Verifiers contradict each other.
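In NumScout the Combiner is an LLM that exercises its own judgment. As a point of comparison, the Python sketch below is a deterministic stand-in (not the paper’s method) that uses the JSON shapes shown in Figure 8: it collects the functions a Verifier marked “wrong” and, as a conservative design choice, keeps all of them as “related”, so that a disagreement can cost analysis time but never silently prunes a function that one role considered relevant. Function names are hypothetical.

```python
def disputed(verifier_related: dict[str, list[str]],
             verifier_unrelated: dict[str, list[str]]) -> set[str]:
    """Functions on which a Verifier contradicts the Classifier."""
    return set(verifier_related.get("wrong", [])) | set(verifier_unrelated.get("wrong", []))


def conservative_combine(classifier: dict[str, list[str]],
                         verifier_related: dict[str, list[str]],
                         verifier_unrelated: dict[str, list[str]]) -> dict[str, list[str]]:
    """Keep every disputed function as "related"; only undisputed "unrelated" ones are pruned."""
    conflicts = disputed(verifier_related, verifier_unrelated)
    related = set(classifier["related"]) | conflicts
    unrelated = set(classifier["unrelated"]) - conflicts
    return {"related": sorted(related), "unrelated": sorted(unrelated)}


if __name__ == "__main__":
    classifier = {"related": ["swap", "name"], "unrelated": ["setOwner", "claim"]}
    verifier_1 = {"correct": ["swap"], "wrong": ["name"]}
    verifier_2 = {"correct": ["setOwner"], "wrong": ["claim"]}
    print(conservative_combine(classifier, verifier_1, verifier_2))
```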
C. Symbolic Execution Detection Tool
1) Operational Semantics Modeling: In this subsection, we model the semantics of several basic instructions, variables, and functions that form the foundation of the core analysis module of our detection tool.
We first present the operational semantics of two key instructions. CALL(f, o, l): calls the target function f, with parameters loaded from memory starting at address o with length l. JUMPI(c, t): if the jump condition c is satisfied, jumps to location t in the program.
During symbolic execution, certain key data structures are updated according to each executed EVM instruction. Specifically, S represents the operand stack, M denotes the simulated temporary memory space, and GS represents the storage values of state variables. During path exploration, the constraints accumulated in the SMT solver are denoted by the variable cons. Our method uses source-level information provided by source mappings to help identify defective code locations during symbolic execution.
Expression Information Recovery. During symbolic execution, expressions are simplified, which loses expression information, i.e., operands and operators. Therefore, we maintain a structure v that stores this information before simplification. To use it, we define a function e(a, v) that recursively retrieves and recovers the information of a given expression a from structure v, written x, op, y := e(a, v). This allows us to analyze the original logical structure and meaning of expressions during symbolic execution.
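A minimal Python sketch of the idea follows, assuming the engine gives every intermediate result a name (t1, t2 here, hypothetical) and records its (x, op, y) triple in v before simplifying it; e(a, v) then rebuilds the original expression recursively.

```python
def recover(a, v):
    """Return a leaf (input symbol or constant) unchanged, otherwise expand it via e()."""
    return e(a, v) if a in v else a


def e(a, v):
    """x, op, y := e(a, v), with x and y themselves recovered recursively."""
    x, op, y = v[a]
    return (recover(x, v), op, recover(y, v))


def to_infix(node) -> str:
    """Render a recovered (x, op, y) tree with explicit parentheses."""
    if isinstance(node, tuple):
        x, op, y = node
        return f"({to_infix(x)} {op} {to_infix(y)})"
    return str(node)


if __name__ == "__main__":
    # amount = balance / total * reward, recorded step by step before simplification
    v = {"t1": ("balance", "/", "total"), "t2": ("t1", "*", "reward")}
    print(to_infix(e("t2", v)))  # ((balance / total) * reward)
```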
Comparison Semantic Recognition. In Solidity, there are three types of conditional statements: if, require, and assert. An if statement enters the else branch when its condition is not satisfied, whereas both require and assert revert the transaction and throw an exception. Because require and assert roll the transaction back and thus minimize the impact on the user in case of errors, if statements pose a greater risk: they alter the execution path. Therefore, when analyzing patterns involving division within paths, we focus primarily on if statements, which makes identifying them crucial.
The key to recognizing if statements lies in identifying the comparison operator and retaining the two elements being compared. Comparison operators are recognized by matching source code with opcode sequences. Before executing the opcodes in a basic block, we first run a matching process. If a corresponding opcode sequence and source code are matched, a trigger is set and the relevant comparison operator is recorded. When the corresponding comparison opcode (e.g., GT or LT) is executed, the two top values on the stack are captured; these are the two expressions being compared, which are used for subsequent detection. We define the function cs(S) to retrieve the comparison elements x and y and the comparison operator cop, where seq denotes the opcode sequence of the comparison operation. Once these comparison elements are identified, we can further analyze the logic behind conditional statements and track the execution flow during symbolic execution. This is crucial for defect detection, especially where conditional logic leads to different execution paths.
$$\frac{x,\ cop,\ y = cs(S),\quad S := \langle x, y \rangle,\quad cop := match(seq)}{GT \mid LT,\quad JUMPI(jc, *)}$$
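The capture step of cs(S) can be sketched in Python as below. The pattern table is a hypothetical simplification: real compiler output varies with the Solidity version and optimizer (e.g., operands may be swapped and LT used to implement >), and NumScout instead matches opcode sequences against the source code. The stack is modeled as a list whose last element is the top.

```python
from typing import Optional

# Hypothetical opcode-sequence patterns for unsigned comparisons x <cop> y,
# assuming x is on top of the stack when the comparison opcode executes.
PATTERNS = {
    ("GT",): ">",
    ("LT",): "<",
    ("GT", "ISZERO"): "<=",
    ("LT", "ISZERO"): ">=",
}


def match(seq: tuple[str, ...]) -> Optional[str]:
    return PATTERNS.get(seq)


def cs(stack: list, seq: tuple[str, ...]) -> Optional[tuple]:
    """Return (x, cop, y) from the two topmost stack items, or None if seq is not a comparison."""
    cop = match(seq)
    if cop is None or len(stack) < 2:
        return None
    x, y = stack[-1], stack[-2]  # GT/LT compare the top item against the second item
    return x, cop, y


if __name__ == "__main__":
    print(cs(["300", "a / 100"], ("GT",)))  # ('a / 100', '>', '300')
```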
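$$\begin{gathered} amount = ea(S, M): \\ \frac{S := \langle *, *, amount, *, *, *, * \rangle}{ctx(transfer),\ CALL(f, *, *)} \\ \frac{amount := M[o + x],\ x > 4 \wedge x < l}{ctx(transfer),\ CALL(f, o, l),\ S := \langle *, *, *, o, l, *, * \rangle} \\ \frac{S := \langle t, amount, arg_2, \ldots \rangle}{ctx(transfer),\ JUMP(t)} \end{gathered} \tag{1}$$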
External & Internal Function Calls. We focus on two types of transfer function calls: one where the token contract calls its own transfer function and another where it calls an external token contract’s transfer function. For the former, the key opcode is JUMP, with parameters obtained from the stack. For example, when calling its own transfer function transferFrom(from, to, amount), the elements on the stack are arranged from top to bottom as follows: jump destination, amount, to, and from.
For the latter, the key opcode is CALL, which first needs to retrieve the parameters’ memory locations from the stack and then access the parameters from memory. For example, when calling an external token contract’s transfer function token.transfer(to, amount), the seven elements on the stack, from top to bottom, are: gas, token contract address, amount of ether to transfer, starting memory position of the parameters (o), length of the parameters (l), and the starting memory position and length for storing the return data. Our tool reads the memory values to extract the parameters based on the fourth and fifth elements. Specifically, the first four bytes at the starting memory position are the function selector, i.e., memory[o : o + 4], followed by a 32-byte word for to and another 32-byte word for amount, in the ABI encoding order of the arguments.
Recognizing internal and external calls is crucial for NumScout to acquire token transfer parameters, which are essential for detecting defects related to transfer amounts. Thus, we define the function ea in Eq. 1 to retrieve the expression for the transfer amount from the stack and memory.
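The three cases of ea in Eq. 1 can be sketched in Python with concrete values standing in for symbolic ones. The stack is written top-first to match the ⟨…⟩ notation, memory is a plain byte string, and the arguments of transfer(address,uint256) are read in ABI order (recipient first, then amount); the addresses and amounts in the example are hypothetical.

```python
WORD = 32  # ABI words are 32 bytes


def ea_ether(stack: list) -> int:
    """CALL sending ether: stack (top first) is gas, addr, value, ...; the amount is value."""
    return stack[2]


def ea_external(stack: list, memory: bytes) -> tuple[str, int, int]:
    """CALL to token.transfer(to, amount): arguments sit at memory[o : o + l]."""
    o, l = stack[3], stack[4]
    args = memory[o:o + l]
    selector = "0x" + args[:4].hex()
    to = int.from_bytes(args[4:4 + WORD], "big")
    amount = int.from_bytes(args[4 + WORD:4 + 2 * WORD], "big")
    return selector, to, amount


def ea_internal(stack: list):
    """JUMP into transferFrom(from, to, amount): stack (top first) is t, amount, to, from."""
    return stack[1]


if __name__ == "__main__":
    o = 0x80
    calldata = (bytes.fromhex("a9059cbb")
                + (0xBEEF).to_bytes(WORD, "big")      # to
                + (10 ** 18).to_bytes(WORD, "big"))   # amount
    memory = bytes(o) + calldata
    stack = [50_000, 0xC0FFEE, 0, o, len(calldata), 0, WORD]
    print(ea_external(stack, memory))
```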
2) Defect Detection: To detect the five new types of numerical defects in smart contracts, NumScout uses a symbolic execution framework to explore contract paths. A predefined semantic model helps identify execution states to capture key features and locate defects. In the following paragraphs, we explain in detail how these defects are discovered in smart contracts.
(1) Div In Path: The tool first needs to locate if comparison statements. Using the opcode sequence matching method described above, it applies the function cs to extract the comparison operator and the two elements. The presence of this defect requires three conditions. First, the division operation must be on the left side of the > operator; if subtraction is involved, the sides must be switched. Second, the division must be inexact, i.e., leave a non-zero remainder. Third, the two expressions being compared must contain user input values, meaning that user input can influence the comparison result and thus the program’s execution path. The first condition is necessary because if the division appears on the left side of <, e.g., if(a/100 < 3), users may believe that the condition is satisfied when a < 3 ∗ 100, which is indeed correct. The case where the division is on the left side of >, e.g., if(a/100 > 3), is discussed in Section III-C; there, inexperienced users are more likely to be misled, because the condition actually requires a ≥ 400 rather than a > 300. The ≤ operator can be seen as the alternative branch of >, so both represent the same situation; similarly, ≥ and < represent another equivalent situation.
Function E extracts the symbolic variables within the expressions. Function i(a, b) checks whether the division a ÷ b can leave a non-zero remainder (i.e., whether a is not exactly divisible by b), which causes precision loss. The SMT solver adds this condition to the constraints and checks satisfiability. The detection rule for this defect is shown in Eq. 2.
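The Python sketch below checks the first two conditions by brute force over a small input range, standing in for the SMT query NumScout issues. It compares how a reader might interpret (a / 100) op 3, namely a op 300, with what integer division actually computes; the divisor, bound, and range are arbitrary illustration choices.

```python
def i(a: int, b: int) -> bool:
    """True when the division a / b is inexact, i.e. leaves a non-zero remainder."""
    return a % b != 0


def misread_inputs(op: str, divisor: int = 100, bound: int = 3, limit: int = 1_000) -> list[int]:
    """Inputs a for which (a / divisor) op bound differs from a op bound * divisor."""
    compare = {">": lambda lhs, rhs: lhs > rhs, "<": lambda lhs, rhs: lhs < rhs}[op]
    return [a for a in range(limit)
            if compare(a // divisor, bound) != compare(a, bound * divisor)]


if __name__ == "__main__":
    wrong_gt = misread_inputs(">")
    print(">:", wrong_gt[0], "...", wrong_gt[-1],
          "all inexact:", all(i(a, 100) for a in wrong_gt))
    print("<:", misread_inputs("<"))
```

Every misread input for > (301 to 399) is an inexact division, while < produces no mismatches, which matches the first two conditions above.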
(2) Operator Order Issue: Detecting this defect requires recovering the information of the transfer amount expression via e(amount, v) and constructing an operator order tree with the function bt. This function organizes all operators hierarchically, preserving their precedence as defined in the original expression. Then, the dfs function performs a depth-first traversal of the tree returned by bt, starting from the root node and inspecting the operator types along each path. If division occurs before multiplication along any path (i.e., the result of a division is subsequently multiplied), the expression exhibits the Operator Order Issue.
$$\frac{tree := bt(e(amount, v)),\ dfs(tree)}{ctx(transfer),\ amount = ea(S, M)}$$
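A minimal Python sketch of bt and dfs, assuming the recovered expression is given as nested (x, op, y) triples (the shape returned by e(amount, v)). In this tree, a multiplication node above a division node means the division is evaluated first, which is the pattern the rule reports; the variable names are hypothetical.

```python
def bt(expr) -> dict:
    """Build an operator order tree; nesting in the triples already encodes precedence."""
    if isinstance(expr, tuple):
        x, op, y = expr
        return {"op": op, "children": [bt(x), bt(y)]}
    return {"leaf": expr}


def dfs(tree: dict, below_mul: bool = False) -> bool:
    """Depth-first search for a '/' node underneath a '*' node."""
    if "leaf" in tree:
        return False
    if tree["op"] == "/" and below_mul:
        return True
    return any(dfs(child, below_mul or tree["op"] == "*") for child in tree["children"])


if __name__ == "__main__":
    divide_first = (("balance", "/", "total"), "*", "reward")
    multiply_first = (("balance", "*", "reward"), "/", "total")
    print(dfs(bt(divide_first)), dfs(bt(multiply_first)))  # True False
```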
(3) Minor Amount Retention: Detecting Minor Amount Retention requires not only that the transfer amount expression may be inexact, but also that no other path exists to transfer the total ether or tokens held by the contract; otherwise, the retention defect does not exist. The expressions below illustrate our detection logic for Minor Amount Retention, where path denotes the set of all paths that transfer ether or tokens, and P denotes a path that transfers all of the contract’s ether or tokens.
$$\frac{(x, \div, y) := e(amount, v),\ i(x, y),\ \nexists P \in path}{ctx(transfer),\ amount = ea(S, M)}$$
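A concrete Python sketch of the rule: an equal split with floor division leaves a remainder whenever the division is inexact, and the defect is reported only if no path transfers the full balance. The has_sweep_path flag is a hypothetical stand-in for the ∄P ∈ path check, which NumScout performs over the explored paths.

```python
def split_evenly(balance: int, players: int) -> tuple[int, int]:
    """Solidity-style equal split: returns (share per player, amount left behind)."""
    share = balance // players
    return share, balance - share * players


def minor_amount_retention(balance: int, players: int, has_sweep_path: bool) -> bool:
    """Dust remains after the split and no path can move the whole balance out."""
    _, dust = split_evenly(balance, players)
    return dust != 0 and not has_sweep_path


if __name__ == "__main__":
    print(split_evenly(100, 3))                   # (33, 1)
    print(minor_amount_retention(100, 3, False))  # True
    print(minor_amount_retention(100, 3, True))   # False
```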
$$\frac{T := E(x) \cup E(y),\ \exists s \in T,\ s \in Input,\ i(a, b)}{ctx(if),\ amount = ea(S, M),\ (x, > \mid \le, y) := cs(S),\ (a, \div, b) = e(amount, v)} \tag{2}$$
(4) Exchange Problem: When the tool identifies a transfer operation during symbolic execution, it records the token flow in a structure t, which is used to detect token exchange defects. Each flow consists of three elements: from, to, and amount. The function f finds the two types of tokens involved in the exchange and their corresponding exchange amounts from structure t. From the contract’s perspective, in = 0 ∧ out ≠ 0 indicates that the user may gain profit for free. Conversely, in ≠ 0 ∧ out = 0 indicates that rounding errors may occur during the exchange, so the user pays but receives nothing. Both conditions are passed to the SMT solver; if either is satisfiable, the Exchange Problem defect is confirmed.
$$\frac{(in, out) := f(t),\ (in = 0 \wedge out \neq 0) \mid (in \neq 0 \wedge out = 0)}{ctx(transfer)}$$
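The following Python sketch evaluates the two conditions on concrete token flows rather than with an SMT solver. Flows are (from, to, amount) triples as in structure t; the swap functions, the 10^18 price scale, and all amounts are hypothetical.

```python
SCALE = 10 ** 18  # hypothetical fixed-point price scale


def f(flows: list[tuple[str, str, int]], contract: str) -> tuple[int, int]:
    """(in, out) from the contract's perspective."""
    amount_in = sum(amt for src, dst, amt in flows if dst == contract)
    amount_out = sum(amt for src, dst, amt in flows if src == contract)
    return amount_in, amount_out


def exchange_problem(flows: list[tuple[str, str, int]], contract: str) -> bool:
    amount_in, amount_out = f(flows, contract)
    return (amount_in == 0 and amount_out != 0) or (amount_in != 0 and amount_out == 0)


def sell(amount_in: int, price: int) -> list[tuple[str, str, int]]:
    """User pays amount_in; the payout is rounded down and may become 0."""
    return [("user", "pool", amount_in), ("pool", "user", amount_in * price // SCALE)]


def buy(amount_out: int, price: int) -> list[tuple[str, str, int]]:
    """User receives amount_out; the required payment is rounded down and may become 0."""
    return [("user", "pool", amount_out * price // SCALE), ("pool", "user", amount_out)]


if __name__ == "__main__":
    half = SCALE // 2
    print(exchange_problem(sell(1, half), "pool"))      # True: user pays 1, gets 0
    print(exchange_problem(buy(1, half), "pool"))       # True: user pays 0, gets 1
    print(exchange_problem(sell(SCALE, half), "pool"))  # False
```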
(5) Precision Loss Trend: To detect this defect, the function rt first parses the expression and determines the rounding method by analyzing the operands and operators restored from the expression. If the numerator has been incremented by (denominator − 1) before the division, the rounding type is identified as rounding up, i.e., ceil(x). Then, the function ct analyzes the token flow’s from, to, and rounding method together. The contract contains this defect if the rounding method is ceil(x) and the flow is outgoing from the contract, or if there are two or more flows with the same from but different to and different rounding methods. The former indicates that the direction of precision loss does not meet the requirement of protecting the liquidity pool, while the latter suggests unfair reward distribution.
$$\frac{r := rt(e(amount, v)),\ ct(from, to, r)}{ctx(transfer),\ amount = ea(S, M)}$$
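A simplified Python sketch of rt and ct follows. It assumes the recovered division has the form (numerator, "/", constant) and recognizes rounding up by the + (denominator − 1) adjustment described above (and rounding to nearest by + denominator / 2); flows are (from, to, rounding) triples, and all names are hypothetical.

```python
def rt(expr) -> str:
    """Rounding mode of a recovered division (x, "/", y) with a constant denominator y."""
    x, op, y = expr
    if op != "/":
        raise ValueError("not a division")
    if isinstance(x, tuple) and x[1] == "+" and isinstance(y, int):
        if x[2] == y - 1:
            return "ceil"
        if x[2] == y // 2:
            return "round"
    return "floor"


def ct(flows: list[tuple[str, str, str]], contract: str) -> bool:
    """Report an outgoing ceil flow, or one payer paying different payees with mixed rounding."""
    if any(rounding == "ceil" and src == contract for src, _, rounding in flows):
        return True
    for k, (src1, dst1, r1) in enumerate(flows):
        for src2, dst2, r2 in flows[k + 1:]:
            if src1 == src2 and dst1 != dst2 and r1 != r2:
                return True
    return False


if __name__ == "__main__":
    fee_up = (("amountFee", "+", 9999), "/", 10000)
    print(rt(fee_up), rt(("amountFee", "/", 10000)))                        # ceil floor
    print(ct([("pool", "user", "ceil")], "pool"))                           # True
    print(ct([("vault", "AA", "floor"), ("vault", "BB", "ceil")], "pool"))  # True
```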
V. EXPERIMENT
In this section, based on an open-source dataset, we first conduct a small-scale experiment to evaluate the effectiveness of NumScout. We then run detection on the large-scale dataset to assess the prevalence of numerical defects in real-world contracts. Finally, we conduct ablation experiments to demonstrate the effectiveness of the GPT-based pruning component.
A. Experimental Setup
The experiment is conducted on a server running Ubuntu
22.04.2 LTS, with a configuration of 20 Intel Xeon Platinum
8360H CPUs and 200GB of memory.
Dataset. To determine the prevalence of the defined defects
in real-world Ethereum smart contracts, we utilize an open-
source dataset from a GitHub repository [13], which stores
the source code of all verified smart contracts on Etherscan up
to July 13th, 2023. We downloaded this dataset on Septem-
ber 20th, 2024, and selected contract files deployed on the
Ethereum mainnet, totaling 331,382 mainnet contracts. The dataset provides a summary file with basic information about each contract, e.g., the contract’s address, ether balance, compiler version, and total number of transactions. To filter for valuable contracts, we apply two criteria: total transactions > 100 and ether balance > 0. We further group contracts by compiler version and remove those that cannot be compiled. The two filtering criteria ensure that the selected contracts are actively used in real-world scenarios rather than toy contracts, so the experimental results better reflect the tool’s performance on widely used real-world contracts. Ultimately, we obtain 6,617 contracts.
We compare our dataset with SmartBugs [63], a widely used dataset. Table II highlights key characteristics of both datasets. Our dataset contains more complex smart contracts than SmartBugs: the average lines of code (LOC) and number of instructions per contract are 11.5X and 5.5X higher, respectively, and the average numbers of public/external functions and state variables are approximately 3X and 4X higher. Moreover, 83.4% of the contracts in our dataset require a Solidity compiler version higher than v0.8.0, whereas 99.4% of the contracts in SmartBugs rely on versions below v0.5.0.
TABLE II: Features of Our Dataset vs. SmartBugs.
Dataset     Avg. LOC   Avg. # of Instrs   Avg. # of Funs   Avg. # of State Vars
Ours        1155.9     8505.4             35.5             25.4
SmartBugs   99.9       1545.5             12.5             6.6
Evaluation Metrics. We outline the following research ques-
tions (RQs) to assess the effectiveness of NumScout.
• RQ1: What is the efficacy of NumScout in detecting the five new types of defined numerical defects?
• RQ2: How effective is NumScout in detecting defects within our large-scale dataset?
• RQ3: How effective is the pruning component based on GPT?
B. Answer to RQ1: Evaluation of NumScout
To answer RQ1, we randomly sample a subset of the large-scale dataset for a small-scale experiment in which all samples are checked and labeled manually. To determine the sample size, we follow a sampling method based on confidence intervals [64] so that the detection results generalize from the sample to the overall dataset. With a confidence interval (margin of error) of 10 and a 95% confidence level, the required sample size is 95. We randomly select a sample of this size and run NumScout on it. Two of the authors carefully label the results of all samples manually. We first jointly discuss and label 30% of the sample results to establish and confirm the labeling criteria. Then, we independently label the remaining 70% and compare and integrate the final results. During labeling, we distinguish true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) to analyze the performance of NumScout. This method is also employed in other related works [65], [66], [67].
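The paper relies on [64] for the sample size. One standard way to reproduce the figure is Cochran’s formula with a finite population correction, sketched in Python below under the usual assumptions (z = 1.96 for 95% confidence and the most conservative proportion p = 0.5); with N = 6,617 and a ±10% margin it yields 95.

```python
import math


def sample_size(population: int, margin: float, z: float = 1.96, p: float = 0.5) -> int:
    """Cochran's sample size with finite population correction, rounded up."""
    n0 = z * z * p * (1 - p) / (margin * margin)  # infinite-population size (about 96.04 here)
    n = n0 / (1 + (n0 - 1) / population)          # correction for the finite dataset
    return math.ceil(n)


if __name__ == "__main__":
    print(sample_size(6_617, 0.10))
```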
TABLE III: Defects in Samples and Evaluation of NumScout.
Defect                   All (TP+FP)   TP   FP   FN   Precision(%)   Recall(%)   F1-score(%)
Div In Path              7             7    0    1    100.0          87.5        93.3
Operator Order Issue     7             7    0    3    100.0          70.0        82.4
Minor Amount Retention   19            15   4    5    78.9           75.0        76.9
Exchange Problem         3             3    0    1    100.0          75.0        85.7
Precision Loss Trend     3             3    0    1    100.0          75.0        85.7
Table III displays the performance of NumScout on the labeled samples. The TP, FP, and FN columns give the numbers of true positives, false positives, and false negatives in the samples, respectively. We use precision, recall, and F1-score to measure the detection performance for each type of defect:
$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F1 = \frac{2 \cdot P \cdot R}{P + R}$$
Additionally, we calculate the overall precision to demonstrate the effectiveness of NumScout:
$$P_{overall} = \frac{\sum_{i=1}^{n} p_{c_i} \times |c_i|}{\sum_{i=1}^{n} |c_i|}$$
where $p_{c_i}$ is the precision of detecting defect i and $|c_i|$ is the number of reported instances of defect i. NumScout achieves 100% precision in detecting Div In Path, Operator Order Issue, Exchange Problem, and Precision Loss Trend. For Minor Amount Retention, it achieves 78.9% precision. Overall, the comprehensive precision reaches 89.7%.
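As a check on the formulas, the Python sketch below recomputes the per-type scores and the overall precision from the counts in Table III. With |c_i| taken as the number of reports per type (TP + FP), the weighted average reduces to total TP over total reports, 35/39 ≈ 89.7%.

```python
# (TP, FP, FN) per defect type, copied from Table III
TABLE_III = {
    "Div In Path": (7, 0, 1),
    "Operator Order Issue": (7, 0, 3),
    "Minor Amount Retention": (15, 4, 5),
    "Exchange Problem": (3, 0, 1),
    "Precision Loss Trend": (3, 0, 1),
}


def scores(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)


def overall_precision(table: dict[str, tuple[int, int, int]]) -> float:
    """Precision of each type weighted by its number of reports |c_i| = TP + FP."""
    weighted = sum(scores(*counts)[0] * (counts[0] + counts[1]) for counts in table.values())
    return weighted / sum(tp + fp for tp, fp, _ in table.values())


if __name__ == "__main__":
    for name, counts in TABLE_III.items():
        p, r, f1 = scores(*counts)
        print(f"{name:24} P={p:.1%} R={r:.1%} F1={f1:.1%}")
    print(f"overall precision {overall_precision(TABLE_III):.1%}")
```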
False Positives. Our experimental results contain some false positives in the detection of Minor Amount Retention because the tool cannot recognize specific transfer paths. In certain contracts, the caller inputs a percentage to withdraw funds from the contract, as shown in Figure 10. The contract provides a clearStuckBalance function, whose input parameter amountPercentage represents the percentage of the total balance that the owner intends to withdraw. The function first ensures that amountPercentage does not exceed 100. It then calculates the withdrawal amount by multiplying the contract’s total balance amountBNB by the input percentage and dividing by 100. Finally, it transfers the computed amount to the designated wallet _marketingWallet. The owner can withdraw funds at a 100% ratio, which creates a path that allows all funds to be transferred out. However, our tool fails to identify this special path for proportional fund withdrawal. Because it cannot track and analyze such dynamic withdrawal conditions, it interprets the contract as potentially retaining a minor amount of funds, which leads to false positives for the Minor Amount Retention defect.
function clearStuckBalance(uint256 amountPercentage) external onlyOwner {
    require(amountPercentage <= 100);
    uint256 amountBNB = address(this).balance;
    // mul/div presumably come from SafeMath (using SafeMath for uint256)
    payable(_marketingWallet).transfer(amountBNB.mul(amountPercentage).div(100));
}
Fig. 10: A FP case of Minor Amount Retention defect.
False Negatives. Among the 95 samples, we find 11 false negatives, all caused by path explosion. These contracts contain many branches in their CFGs, leading to a huge search space. To avoid path explosion, we limit the tool’s maximum loop iterations, path exploration depth, and execution time; as a result, NumScout fails to reach the locations of these defects. Note that the main purpose of GPT-based pruning is to discard unrelated entry function paths so that the symbolic execution framework reaches the target function more quickly; it does not address overly deep search paths within a function.
For example, in one of the missed contracts¹, seven require statements and five if conditional statements (lines 903-929) precede the defective code, making the search paths extremely complex and causing NumScout to miss the defect.
To mitigate false negatives caused by path explosion, the following optimization strategies could be considered. One approach is to introduce a heuristic search strategy: LLMs could rank all functions by their relevance to numerical operations and the risk level of their fund transfers, so that paths more likely to contain defects are explored first. LLMs could also be integrated with dynamic symbolic execution to adjust subsequent search directions based on previously explored paths. Additionally, preprocessing complex control flow structures, e.g., flattening excessively nested loops and conditional branches where appropriate, would help simplify the search space.
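The proposed heuristic search is not implemented in NumScout. A minimal Python sketch of the idea, assuming an LLM has already assigned each entry function a relevance/risk score and that an exploration cost can be estimated for each, explores the highest-scoring functions first within a fixed time budget; all scores, costs, and names are hypothetical.

```python
def explore_by_priority(scores: dict[str, float], costs: dict[str, float],
                        budget: float) -> list[str]:
    """Greedily pick functions in descending score order while the budget allows."""
    explored, spent = [], 0.0
    for name in sorted(scores, key=lambda n: (-scores[n], n)):
        if spent + costs[name] <= budget:
            explored.append(name)
            spent += costs[name]
    return explored


if __name__ == "__main__":
    llm_scores = {"sell": 0.9, "swap": 0.8, "setFee": 0.2, "rename": 0.0}
    est_seconds = {"sell": 600, "swap": 500, "setFee": 300, "rename": 100}
    print(explore_by_priority(llm_scores, est_seconds, budget=1_200))
```

A greedy order is the simplest choice; the dynamic variant described above would update the scores after each explored path.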
C. Answer to RQ2: Defects Detection in a Large-Scale
Dataset
To address RQ2, we run NumScout on the source code of
all the collected 6,617 verified smart contracts, which includes
the 95 samples in RQ1. Table IV provides the numbers and
frequency of each new type of numerical defect in contracts on
Ethereum. NumScout only identifies whether a defect exists
in the contract, so if the same type of defect appears multiple
times, we count it only once.
Minor Amount Retention is the most common defect in our dataset, present in approximately 15.1% of the smart contracts. About 8.6%, 4.7%, and 1.7% of contracts contain the Div In Path, Operator Order Issue, and Precision Loss Trend defects, respectively. The proportion of Exchange Problem is below 1%, with 39 (0.60%) smart contracts containing this defect.
Additionally, the experimental results indicate that 45 smart
contracts contain 3 types of the 5 defined defects, while 194
smart contracts include 2 types of defects. Overall, as reported
by NumScout, there are 1,774 smart contracts that have at least
one type of defect in our dataset, which accounts for 26.8%
of all contracts.
¹ aaf740FD71093520C457642eb9219A4F6dA22190
TABLE IV: Defects in Large-Scale Dataset.
Defect # Defects Percentage(%)
Div In Path 561 8.6
Operator Order Issue 306 4.7
Minor Amount Retention 983 15.1
Exchange Problem 39 0.60
Precision Loss Trend 114 1.7
Contracts with Minor Amount Retention Defects. The
large-scale experiment reveals that the number of contracts
containing Minor Amount Retention defects is significantly
higher than that of other defects. We find that many projects encounter cases where profits cannot be divided evenly during distribution. Although the retained amount remains only a small portion in the long term (it behaves like a random variable parameterized by the number of players), these small retained balances might be referenced by other contracts, and attackers may exploit this situation to inflict potentially substantial losses on those contracts.
D. Answer to RQ3: Ablation Experiment Results
In RQ3, we evaluate the effectiveness of the GPT-based pruning component. Specifically, we conduct an ablation experiment on the selected samples by removing the GPT-based pruning component. In this setup, the tool does not receive the list of functions unrelated to numerical operations or transfers and is therefore forced to explore all execution paths. The results show that the tool with pruning runs 28.4% faster than the version without pruning and identifies two additional Operator Order Issue defects. To give the tool more opportunity to detect defects, we set a time limit of 1,800 seconds and a search depth limit of 500 during the experiment. We also extend the SMT solver's satisfiability-checking timeout to 600 seconds at critical verification points. Without the GPT-based pruning component, the tool's average runtime per contract reaches 1,518.56 seconds; with it, the average runtime drops to 1,182.19 seconds. The average GPT-4o cost of the entire pruning process for a single contract is only $0.008. Figure 11 illustrates a defect detected in the RQ1 experiment but missed in the ablation experiment. The tool must first execute the sell function, following specific paths that modify certain variables, before the defect condition at line 6 is satisfied. The sell function contains 9 function calls, 7 conditional statements, and about 40 numerical operations, resulting in a huge search space for symbolic execution.
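As a quick check of the reported averages, the Python arithmetic below (used only as a calculator) relates the two runtimes. Read as a speed ratio, 1,518.56 / 1,182.19 is about 1.28, consistent with the reported 28.4% speedup; expressed as a saving in wall-clock time, the same numbers correspond to roughly a 22% shorter average runtime per contract.

```python
# Average per-contract runtimes reported for the ablation experiment (seconds).
no_pruning = 1518.56
with_pruning = 1182.19

# Speed ratio: how much faster the pruned tool processes a contract.
speedup = no_pruning / with_pruning - 1
# Relative reduction of the average wall-clock runtime.
time_reduction = (no_pruning - with_pruning) / no_pruning

print(f"speedup: {speedup:.2%}, runtime reduction: {time_reduction:.2%}")
# speedup: 28.45%, runtime reduction: 22.15%
```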
Notably, our tool not only identifies the code location where the defect is triggered but also provides the entire call path, helping developers trace the defect's origin. The results show that the pruned version of the tool reaches the defect trigger point twice, through different paths, within the time limit. In contrast, without pruning, the tool spends execution time on unrelated functions, which prevents symbolic execution from reaching the critical path within the time limit. The ablation experiment confirms that pruning enables the tool to enter target functions more quickly, improving detection speed and uncovering more defects.
1 function exit() public {
2     if(_tokens > 0) sell(_tokens);
3     withdraw();}
4 function withdraw() onlyStronghands() public {
5     uint256 _dividends = myDividends(false);
6     _customerAddress.transfer(_dividends);}
Fig. 11: A case of an undetected Operator Order Issue defect in the ablation experiment.
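The effect of pruning described in this ablation can be pictured with a deliberately simplified model. In the hypothetical Python sketch below, each function is assigned an assumed exploration cost, and an explorer visits functions in a fixed order until the 1,800-second budget runs out; with pruning, functions judged unrelated to numerical operations or transfers are skipped. The functions approve and transferOwnership, all costs, and the in-order strategy are illustrative assumptions; NumScout's actual engine explores bytecode paths and is not modeled here.

```python
from dataclasses import dataclass


@dataclass
class Function:
    name: str
    cost: int       # hypothetical seconds needed to explore all of its paths
    numeric: bool   # hypothetical LLM verdict: related to numerical ops or transfers?


def explore(functions: list[Function], budget: int, prune: bool) -> list[str]:
    """Explore functions in order until the time budget is exhausted.

    Returns the names of the functions whose paths were fully explored.
    With prune=True, functions judged unrelated are skipped entirely.
    """
    explored = []
    remaining = budget
    for f in functions:
        if prune and not f.numeric:
            continue  # pruned: no time is spent on this function
        if f.cost > remaining:
            break     # time limit reached before this function is finished
        remaining -= f.cost
        explored.append(f.name)
    return explored


contract = [
    Function("approve", 600, numeric=False),            # hypothetical
    Function("transferOwnership", 700, numeric=False),  # hypothetical
    Function("sell", 500, numeric=True),
    Function("exit", 300, numeric=True),  # target: reaches the defect via withdraw()
]

print(explore(contract, budget=1800, prune=False))  # ['approve', 'transferOwnership', 'sell']
print(explore(contract, budget=1800, prune=True))   # ['sell', 'exit']
```

In this toy setting, the unpruned explorer exhausts its budget before reaching exit, mirroring the qualitative behavior reported for Figure 11.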
VI. DISCUSSION
A. Case Study
We present a real-world case² from the tool's report to illustrate how a user can lose funds because of the new types of numerical defects, and to demonstrate the importance of detecting the defects reported by NumScout. Figure 12 displays a simplified code snippet from the affected contract.
A user can purchase tokens by sending ether when calling the sale function, but the token amount is calculated in divide-then-multiply order. Given that the current on-chain value of cloudsPerEth is 800,000, inexperienced users may assume that 1,000,000,000,000,000 wei (i.e., 0.001 ether) is equivalent to 800,000 tokens, which means that 1,250,000,000 wei is sufficient to buy 1 token. However, if the user sends less than 0.001 ether, the integer division yields amount = 0. Since the contract does not check the exchanged token amount, the transaction does not revert but continues to execute, leaving the user without any tokens and causing the loss of the ether sent. Our tool flags this defect under two categories: Operator Order Issue and Exchange Problem. The former is detected by analyzing the expression operator tree, while the latter is detected through token flow analysis.
1 function sale() payable {
2     uint256 amount = (msg.value / 1000000000000000) * cloudsPerEth;
3     balances[msg.sender] += amount;
4     balances[owner] -= amount;
5     Transfer(owner, msg.sender, amount);}
Fig. 12: Code snippet of the Operator Order Issue and Exchange Problem case.
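To make the arithmetic of Figure 12 concrete, the Python sketch below reproduces the token calculation with EVM-style truncating division. It assumes cloudsPerEth = 800,000 as stated in the case study, and the helper names are hypothetical. It shows that any msg.value below 10^15 wei yields zero tokens, that the part of msg.value that is not a multiple of 10^15 wei is silently discarded, and that multiplying before dividing returns the amount the user expects (overflow is not modeled here).

```python
WEI_PER_UNIT = 10**15      # divisor hard-coded in sale(): 0.001 ether
CLOUDS_PER_ETH = 800_000   # cloudsPerEth value stated in the case study


def amount_divide_first(msg_value: int) -> int:
    """Token amount as written in Figure 12: divide, then multiply."""
    return (msg_value // WEI_PER_UNIT) * CLOUDS_PER_ETH


def amount_multiply_first(msg_value: int) -> int:
    """Reordered calculation: multiply, then divide (overflow not modeled)."""
    return msg_value * CLOUDS_PER_ETH // WEI_PER_UNIT


half_milli = 5 * 10**14    # 0.0005 ether, as in the local test
print(amount_divide_first(half_milli))    # 0: ether is paid, no tokens are credited
print(amount_multiply_first(half_milli))  # 400000

one_and_half_milli = 15 * 10**14  # 0.0015 ether
print(amount_divide_first(one_and_half_milli))    # 800000: the extra 0.0005 ether buys nothing
print(amount_multiply_first(one_and_half_milli))  # 1200000
```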
We verify the loss of funds on a local test network with two test accounts. The first account deploys the contract, thereby holding the total token supply (i.e., owner), sets cloudsPerEth to match its current value on the mainnet, and enables the trading switch. The second account (i.e., msg.sender) calls the sale function, sending 0.0005 ether and expecting to receive 400,000 tokens. The result shows that while the ether balance of msg.sender decreases, its token balance remains 0. Meanwhile, the ether balance held by the contract increases by 0.0005 ether, and the owner's token balance does not decrease. The verification script is available in our online repository.
² 0x3c07b3f4a6e253915d83c86707f0af07521d1cd8
B. Implications
For Researchers. Other blockchains that are based on the EVM and support smart contract development in Solidity may exhibit similar numerical defects, possibly with different patterns owing to differences in blockchain characteristics. Investigating these defects on other platforms is a promising direction for future research.
For Practitioners. For developers, the defined defects provide a deeper understanding of the numerical operations involved in smart contracts, particularly issues related to rounding and precision loss. They remind developers to pay attention to minor precision losses and to strengthen testing. These numerical defects can also serve as coding guidance during contract development to improve robustness. For auditors, the defects raise awareness of the security risks of numerical operations, encouraging more comprehensive auditing strategies.
For Investors and Users. Investors and users should be cautious about potential numerical defects in contracts, which are often hidden within complex mathematical operations and can be difficult to detect. Additionally, our tool can help identify losses that may arise from numerical defects and flag contracts that might exploit these defects for fraudulent purposes.
For Educators. In smart contract development courses, educators should share best practices and known cases for avoiding numerical defects. This helps students recognize these defects and the serious consequences they may cause.
C. Threats to Validity
Internal Threats. One potential internal threat is that we did not analyze all available audit reports, which may have led to the omission of some numerical defects. However, we mitigated this risk with an iterative information retrieval strategy that extracts as many audit reports related to numerical defects as possible. Collecting reports through this keyword-based approach improves coverage and reduces the risk of missing relevant defects. Another internal threat arises from the high complexity of the smart contracts in our dataset, which makes symbolic execution highly time-consuming. In addition, the new types of numerical defects often involve division operations, which are computationally difficult for SMT solvers and require significant time to process. We mitigate the execution-time issue with GPT-based pruning, and the ablation results confirm the effectiveness of this approach.
External Threats. Our dataset is filtered based on specified criteria, which may have excluded numerical defects present in other contracts. However, because we select contracts with more than 100 transactions and non-zero balances, our dataset reflects the numerical defects found in frequently used real-world contracts rather than in test or toy contracts, providing a better evaluation of our tool's effectiveness. During manual labeling, annotators may confuse false negatives with true negatives. To address this, we adopt a double-check mechanism and promptly update the labeled dataset to ensure accuracy.
D. Possible Solutions for the Five Numerical Defects
In this subsection, we provide recommendations to help developers avoid introducing the five defined types of numerical defects in contracts. Section III presents defect code examples from audit reports, along with suggestions provided by security teams. We summarize the recommended fixes from the remaining audit reports and list brief solutions for each type of defect in Table V. Note that for the Operator Order Issue defect, changing the code from division-before-multiplication to multiplication-before-division requires careful consideration of overflow risks. For example, if line 2 of the defective code in Figure 12 is changed to uint256 amount = msg.value * cloudsPerEth / 1000000000000000, the intermediate product msg.value * cloudsPerEth might exceed the maximum value of uint256 (2^256 - 1). Solidity arithmetic before v0.8.0 is unchecked, so the overflowing product silently wraps around modulo 2^256 and becomes much smaller than the true value. This situation could occur when a user transfers a large amount of ETH to exchange for tokens, potentially causing losses of user funds. Starting with Solidity v0.8.0, the compiler automatically inserts overflow checks into the bytecode [8], so an overflow outside an unchecked block reverts the transaction instead of wrapping around, and developers do not need to handle overflow manually. For versions lower than v0.8.0, developers should use the SafeMath library [7] to prevent potential overflow.
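The overflow caveat can be sketched by modeling uint256 arithmetic in Python. In the illustrative helpers below (names are hypothetical), mul_div_unchecked mimics pre-v0.8.0 behavior, where the product wraps modulo 2^256, while mul_div_checked mimics v0.8.x checked arithmetic or SafeMath, with the revert modeled as a Python exception. The operands are arbitrary values chosen only to force an overflow, not realistic ether amounts.

```python
UINT256_MAX = 2**256 - 1


def mul_div_unchecked(a: int, b: int, d: int) -> int:
    """a * b / d with pre-v0.8.0 semantics: the product wraps modulo 2**256."""
    product = (a * b) % 2**256
    return product // d


def mul_div_checked(a: int, b: int, d: int) -> int:
    """a * b / d with v0.8.x or SafeMath semantics: overflow aborts the call."""
    product = a * b
    if product > UINT256_MAX:
        raise OverflowError("multiplication overflow: the transaction would revert")
    return product // d


a, b, d = 2**200, 2**60, 10**15      # arbitrary operands whose product is 2**260
print(mul_div_unchecked(a, b, d))    # 0, because 2**260 wraps around to 0
try:
    mul_div_checked(a, b, d)
except OverflowError as err:
    print(err)                       # multiplication overflow: the transaction would revert

# Without overflow, both versions agree (0.0005 ether at 800,000 cloudsPerEth).
print(mul_div_checked(5 * 10**14, 800_000, 10**15))  # 400000
```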
TABLE V: Possible Solutions for the Five Defects.
Defect | Possible Solution
Div In Path | Use multiplication instead of division in conditional statements.
Operator Order Issue | Multiply first and divide later, but be cautious of overflow.
Minor Amount Retention | Implement a function to withdraw all funds.
Exchange Problem | Check the calculation results before the transfer.
Precision Loss Trend | Consider who bears the loss of precision.
General Advice | Conduct thorough rounding tests. Avoid letting the liquidity pool bear precision loss. Ensure consistent precision between both tokens in the swap.
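A minimal illustration of the Div In Path recommendation in Table V follows. The Python sketch compares a condition written with integer division, a / b > c, against the multiplication form a > b * c; the function names and values are hypothetical. Because integer division truncates, the two conditions disagree whenever b * c < a < b * (c + 1): with b = 2 and c = 3, a = 7 (a true ratio of 3.5) passes the multiplication form but fails the division form.

```python
def passes_with_division(a: int, b: int, c: int) -> bool:
    """Condition written as `a / b > c`; Solidity integer division truncates."""
    return a // b > c


def passes_with_multiplication(a: int, b: int, c: int) -> bool:
    """Same intended condition rewritten as `a > b * c` (exact for b > 0)."""
    return a > b * c


# Hypothetical check: is the ratio a / 2 strictly greater than 3?
for a in range(6, 9):
    print(a, passes_with_division(a, 2, 3), passes_with_multiplication(a, 2, 3))
# 6 False False
# 7 False True   <- 7 / 2 = 3.5 > 3, but truncation turns it into 3
# 8 True True
```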
Several additional suggestions for numerical operations are as follows. Developers need to thoroughly test rounding boundaries and rounding effects before deploying the contract. If a trading pool or lending pool has a precision rounding issue, it is best not to let the liquidity pool bear the precision loss; instead, calculations should round in favor of the liquidity pool to keep the pool balanced. Developers should also check the precision of the two assets being exchanged to prevent unexpected results caused by differing precisions. For example, most ERC20 tokens have 18 decimals [49], while tokens like USDT [50] have only 6 decimals. Some contracts do not handle this situation, leading to security problems. If calculations do involve assets with different precisions, prioritize using the asset with fewer decimals for calculations, and then derive the amount of the asset with more decimals through multiplication. In this way, all mathematical operations rely on multiplication, avoiding fractional units that integer division would truncate. To validate the effectiveness of these solutions, we randomly select 10 contracts for each type of defect and apply the recommended fixes. We then analyze the revised contracts with NumScout, which reports no defects in them.
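The decimals advice can be illustrated with a hypothetical conversion between a 6-decimal token such as USDT and an 18-decimal token, assuming a 1:1 price for simplicity (the helper names are illustrative). Scaling the lower-decimal amount up is a multiplication and is exact, whereas scaling down is a division that truncates, so performing calculations on the 6-decimal amount and deriving the 18-decimal amount by multiplication avoids losing fractional units.

```python
LOW_DECIMALS = 6     # e.g., USDT
HIGH_DECIMALS = 18   # common ERC-20 choice
SCALE = 10 ** (HIGH_DECIMALS - LOW_DECIMALS)  # 10**12


def to_high(amount_low: int) -> int:
    """6-decimal base units -> 18-decimal base units: multiplication, always exact."""
    return amount_low * SCALE


def to_low(amount_high: int) -> int:
    """18-decimal base units -> 6-decimal base units: division, truncates."""
    return amount_high // SCALE


print(to_low(to_high(1_234_567)))  # 1234567: starting from 6 decimals, the round trip is exact
print(to_low(999_999_999_999))     # 0: just under 10**-6 token is truncated away
```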
VII. RELATED WORK
A. Smart Contract Defects
Chen et al. [43] conduct the first study that defines smart contract defects from the developers' perspective. They collect posts from StackExchange and use an open card sorting method to discover and categorize 20 types of contract defects. Additionally, they design a survey to gather developers' feedback and concerns regarding these defects. In another work, they introduce a tool named DefectChecker [68], which detects these defects by analyzing contract bytecode. However, their research does not cover the new types of numerical defects that arise in smart contracts. Specifically, they define UnmatchedTypeAssignment, which focuses on mismatches between assigned values and variable types that can lead to integer overflow. This differs from the new numerical defects we focus on, which can result in transaction execution errors.
B. Tools for Smart Contract Defect Detection
Many program analysis tools focus on detecting traditional numerical defects. Luu et al. proposed the first symbolic execution-based tool, Oyente [65], which simulates EVM instruction execution and explores different execution paths to construct a control-flow graph (CFG). It uses the Z3 SMT solver [59] to determine whether vulnerability conditions are satisfied, enabling the detection of overflow vulnerabilities. Torres et al. introduced a framework called Osiris [69] that combines symbolic execution and taint analysis to identify three types of integer-related defects in Ethereum smart contracts: Arithmetic Bugs, Truncation Bugs, and Signedness Bugs. Additionally, other static analysis tools such as MAIAN [70], Securify [71], Ethainter [72], Sailfish [73], Mythril [74], and Slither [75] have been developed to detect defects in Solidity smart contracts. Meanwhile, tools such as ContractFuzzer [66], sFuzz [76], Smartian [77], and Echidna [78] are based on dynamic testing and analysis.
C. Accounting Errors
Another type of smart contract defect involving numerical operations is Accounting Errors. These errors are specific to the financial logic of a contract and involve incorrect financial operations, such as adding fees to a user's balance instead of deducting them, or directly summing tokens of different units. The tool ScType [79] models financial operations and high-level information in DeFi, e.g., token units, scaling factors, and financial types, and leverages type propagation and checking to detect Accounting Errors. ScType relies on specific business contexts and requires manual completion of initial type annotations. Our work complements ScType by focusing on issues arising from the nature of numerical calculations themselves, such as precision loss and improper operator order, which may lead to unexpected behaviors during smart contract execution.
D. LLMs in Smart Contract Defect Detection
Currently, LLMs are widely used in smart contract defect detection. Sun et al. propose GPTScan [35], the first tool that integrates GPT with static analysis to detect logic vulnerabilities in smart contracts. This tool uses GPT to identify key variables and statements, followed by static analysis to verify potential vulnerabilities. Ma et al. introduce the iAudit framework [36], which combines LLM fine-tuning with a multi-role strategy to audit contracts through iterative debates. Ding et al. propose SmartGuard [37], a framework that retrieves semantically similar code, generates chain-of-thought (CoT) reasoning, and then uses LLMs to identify vulnerabilities. Wang et al. present ContractTinker [38], which also employs CoT and static program analysis to guide LLMs in repairing real-world smart contract vulnerabilities. Wu et al. develop AdvSCanner [39], which uses static analysis to extract attack flows related to reentrancy vulnerabilities and uses them to guide LLMs in generating attack contracts that exploit reentrancy vulnerabilities in victim contracts. Our work uses LLMs for pruning and combines them with symbolic execution to confirm the new types of numerical defects, extending and complementing these existing works.
VIII. CONCLUSION
This paper has two main parts: the definition of defects and their detection. We summarize five new types of numerical defect patterns from the audit reports in the DAppScan dataset [40], which were collected from multiple renowned blockchain security teams. These issues are considered high-risk and affect the execution results of programs. For each defect, we provide code examples and possible solutions. To identify these defects in real-world smart contracts, we develop a tool called NumScout, which uses GPT-based pruning and symbolic execution to detect the five defined defects.
NumScout uses GPT-4o for pruning, removing functions unrelated to numerical operations and transfers, thus improving the efficiency of the subsequent symbolic execution. The tool performs symbolic execution at the bytecode level, combined with source code features. Specifically, it constructs and analyzes expression operator trees, extracts comparison conditions from the bytecode, and analyzes token flows, among other techniques, to extract key features. It reports defects based on predefined defect patterns combined with source code mapping. Moreover, NumScout supports all compiler versions and is extensible, allowing developers to write additional detection patterns to identify more defects. Experimental results show that NumScout identifies 1,774 smart contracts in the dataset containing at least one defined defect, with an overall detection precision of 89.7%.
ACKNOWLEDGMENT
This work is partially supported by the Zhejiang Provincial Key Project of Undergraduate Education and Teaching Reform (JGZD2024060), the Zhejiang Provincial Higher Education Research Project & Special Research Project on Artificial Intelligence Empowering Education and Teaching Applications (KT2024007), the Sichuan Provincial Natural Science Foundation for Distinguished Young Scholars (2023NSFSC1963), and the National Natural Science Foundation of China (62332004).
REFERENCES
[1] V. Buterin et al., “A next-generation smart contract and decentralized application platform,” white paper, vol. 3, no. 37, pp. 2–1, 2014.
[2] Z. Zheng, S. Xie, H.-N. Dai, X. Chen, and H. Wang, “Blockchain challenges and opportunities: A survey,” International Journal of Web and Grid Services, vol. 14, no. 4, pp. 352–375, 2018.
[3] Z. Zheng, S. Xie, H. Dai, X. Chen, and H. Wang, “An overview of blockchain technology: Architecture, consensus, and future trends,” in 2017 IEEE International Congress on Big Data (BigData Congress). IEEE, 2017, pp. 557–564.
[4] “Solidity programming language,” 2024. [Online]. Available: https://soliditylang.org/
[5] M. R. Lyu et al., Handbook of Software Reliability Engineering. IEEE Computer Society Press, Los Alamitos, 1996, vol. 222.
[6] XBlock, “Smart contract defects-arithmetic issue,” 2024. [Online]. Available: https://xblock.pro/#/article/55
[7] OpenZeppelin, “The safemath library for solidity smart contracts,” 2020. [Online]. Available: https://docs.openzeppelin.com/contracts/3.x/api/math
[8] Ethereum Foundation, “Solidity v0.8.0 breaking changes, Solidity 0.8.0 documentation,” 2020. [Online]. Available: https://docs.soliditylang.org/en/v0.8.0/080-breaking-changes.html
[9] “Balancer,” 2024. [Online]. Available: https://docs.balancer.fi/
[10] BlockSec, “Tiny rounding down, big fund losses: An in-depth analysis of the recent balancer incident,” 2023. [Online]. Available: https://blocksec.com/blog/tiny-rounding-down-big-fund-losses-an-in-depth-analysis-of-the-recent-balancer-incident
[11] D. Spencer, Card Sorting: Designing Usable Categories. Rosenfeld Media, 2009.
[12] J. Wei, Y. Tay, R. Bommasani, C. Raffel, B. Zoph, S. Borgeaud, D. Yogatama, M. Bosma, D. Zhou, D. Metzler et al., “Emergent abilities of large language models,” arXiv preprint arXiv:2206.07682, 2022.
[13] M. Ortner and S. Eskandari, “Smart contract sanctuary.” [Online]. Available: https://github.com/tintinweb/smart-contract-sanctuary
[14] smlXL, “Evm codes - an ethereum virtual machine opcodes interactive reference,” 2024. [Online]. Available: https://www.evm.codes/
[15] N. Atzei, M. Bartoletti, and T. Cimoli, “A survey of attacks on ethereum smart contracts (sok),” in Principles of Security and Trust: 6th International Conference, POST 2017, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2017, Uppsala, Sweden, April 22-29, 2017, Proceedings 6. Springer, 2017, pp. 164–186.
[16] Z. Wang, H. Jin, W. Dai, K.-K. R. Choo, and D. Zou, “Ethereum smart contract security research: survey and future research opportunities,” Frontiers of Computer Science, vol. 15, pp. 1–18, 2021.
[17] Z. A. Khan and A. S. Namin, “A survey on vulnerabilities of ethereum smart contracts,” arXiv preprint arXiv:2012.14481, 2020.
[18] N. F. Samreen and M. H. Alalfi, “A survey of security vulnerabilities in ethereum smart contracts,” arXiv preprint arXiv:2105.06974, 2021.
[19] “Bec token,” 2018. [Online]. Available: https://etherscan.io/address/0xc5d105e63711398af9bbff092d4b6769c82f793d#code
[20] “Smartmesh,” 2018. [Online]. Available: https://etherscan.io/address/0x55f93985431fc9304077687a35a1ba103dc1e081#code
[21] “Uselessethereumtoken,” 2018. [Online]. Available: https://etherscan.io/address/0x27f706edde3ad952ef647dd67e24e38cd0803dd6#code
[22] “Openzeppelin,” 2024. [Online]. Available: https://www.openzeppelin.com/
[23] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong et al., “A survey of large language models,” arXiv preprint arXiv:2303.18223, 2023.
[24] Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang et al., “A survey on evaluation of large language models,” ACM Transactions on Intelligent Systems and Technology, vol. 15, no. 3, pp. 1–45, 2024.
[25] “OpenAI,” 2024. [Online]. Available: https://openai.com/
[26] A. Vaswani, “Attention is all you need,” Advances in Neural Information Processing Systems, 2017.
[27] T. Kojima, S. S. Gu, M. Reid, Y. Matsuo, and Y. Iwasawa, “Large language models are zero-shot reasoners,” Advances in Neural Information Processing Systems, vol. 35, pp. 22199–22213, 2022.
[28] OpenAI, “gpt-4o,” 2024. [Online]. Available: https://platform.openai.com/docs/models
[29] C. Chen, J. Su, J. Chen, Y. Wang, T. Bi, J. Yu, Y. Wang, X. Lin, T. Chen, and Z. Zheng, “When chatgpt meets smart contract vulnerability detection: How far are we?” arXiv preprint arXiv:2309.05520, 2023.
[30] J. Chen, C. Chen, J. Hu, J. Grundy, Y. Wang, T. Chen, and Z. Zheng, “Identifying smart contract security issues in code snippets from stack overflow,” in Proceedings of the 33rd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2024, pp. 1198–1210.
[31] W. Ma, S. Liu, M. Zhao, X. Xie, W. Wang, Q. Hu, J. Zhang, and Y. Liu, “Unveiling code pre-trained models: Investigating syntax and semantics capacities,” ACM Transactions on Software Engineering and Methodology, vol. 33, no. 7, pp. 1–29, 2024.
[32] Z. Zheng, K. Ning, Q. Zhong, J. Chen, W. Chen, L. Guo, W. Wang, and Y. Wang, “Towards an understanding of large language models in software engineering tasks,” Empirical Software Engineering, vol. 30, no. 2, p. 50, 2025.
[33] D. Nam, A. Macvean, V. Hellendoorn, B. Vasilescu, and B. Myers, “Using an llm to help with code understanding,” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 2024, pp. 1–13.
[34] Y. Zhang, “Detecting code comment inconsistencies using llm and program analysis,” in Companion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, 2024, pp. 683–685.
[35] Y. Sun, D. Wu, Y. Xue, H. Liu, H. Wang, Z. Xu, X. Xie, and Y. Liu, “Gptscan: Detecting logic vulnerabilities in smart contracts by combining gpt with program analysis,” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 2024, pp. 1–13.
[36] W. Ma, D. Wu, Y. Sun, T. Wang, S. Liu, J. Zhang, Y. Xue, and Y. Liu, “Combining fine-tuning and llm-based agents for intuitive smart contract auditing with justifications,” arXiv preprint arXiv:2403.16073, 2024.
[37] H. Ding, Y. Liu, X. Piao, H. Song, and Z. Ji, “Smartguard: An llm-enhanced framework for smart contract vulnerability detection,” Expert Systems with Applications, vol. 269, p. 126479, 2025.
[38] C. Wang, J. Zhang, J. Gao, L. Xia, Z. Guan, and Z. Chen, “Contracttinker: Llm-empowered vulnerability repair for real-world smart contracts,” in Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024, pp. 2350–2353.
[39] Y. Wu, X. Xie, C. Peng, D. Liu, H. Wu, M. Fan, T. Liu, and H. Wang, “Advscanner: Generating adversarial smart contracts to exploit reentrancy vulnerabilities using llm and static analysis,” in Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, 2024, pp. 1019–1031.
[40] Z. Zheng, J. Su, J. Chen, D. Lo, Z. Zhong, and M. Ye, “Dappscan: building large-scale datasets for smart contract weaknesses in dapp projects,” IEEE Transactions on Software Engineering, 2024.
[41] “Consensys,” 2024. [Online]. Available: https://consensys.io/
[42] L. A. Goodman, “Snowball sampling,” The Annals of Mathematical Statistics, pp. 148–170, 1961.
[43] J. Chen, X. Xia, D. Lo, J. Grundy, X. Luo, and T. Chen, “Defining smart contract defects on ethereum,” IEEE Transactions on Software Engineering, vol. 48, no. 1, pp. 327–345, 2020.
[44] Solidity, “Division — solidity 0.8.29 documentation,” 2024. [Online].
Available: https://docs.soliditylang.org/en/latest/types.html#division
[45] “Chainsecurity,” 2024. [Online]. Available: https://www.chainsecurity.com/
[46] “Quillaudits,” 2024. [Online]. Available: https://www.quillaudits.com/
[47] “Dedaub,” 2024. [Online]. Available: https://dedaub.com/
[48] “Trail of bits,” 2024. [Online]. Available: https://www.trailofbits.com/
[49] F. Vogelsteller and V. Buterin, “Erc-20: Token standard,” 2015. [Online]. Available: https://eips.ethereum.org/EIPS/eip-20
[50] “Usdt,” 2024. [Online]. Available: https://tether.to/
[51] “Usdc,” 2024. [Online]. Available: https://www.circle.com/en/usdc
[52] “Xrp,” 2024. [Online]. Available: https://www.ibcprotocol.dev/
[53] Etherscan.io, “Token tracker,” 2024. [Online]. Available: https://etherscan.io/tokens
[54] “Peckshield,” 2024. [Online]. Available: https://peckshield.com/
[55] “Uniswap,” 2024. [Online]. Available: https://uniswap.org
[56] S. Yang, J. Chen, and Z. Zheng, “Definition and detection of defects in nft smart contracts,” in Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis, 2023, pp. 373–384.
[57] “ethereum/go-ethereum,” 2024. [Online]. Available: https://github.com/ethereum/go-ethereum
[58] Solidity, “Source mappings — solidity 0.8.29 documentation,” 2024. [Online]. Available: https://docs.soliditylang.org/en/latest/internals/source_mappings.html
[59] L. De Moura and N. Bjørner, “Z3: An efficient smt solver,” in International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, 2008, pp. 337–340.
[60] J. He, G. Sivanrupan, P. Tsankov, and M. Vechev, “Learning to explore paths for symbolic execution,” in Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security, 2021, pp. 2526–2540.
[61] L. Huang, W. Yu, W. Ma, W. Zhong, Z. Feng, H. Wang, Q. Chen, W. Peng, X. Feng, B. Qin et al., “A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions,” ACM Transactions on Information Systems, 2023.
[62] S. Ouyang, J. M. Zhang, M. Harman, and M. Wang, “Llm is like a box of chocolates: the non-determinism of chatgpt in code generation,” arXiv preprint arXiv:2308.02828, 2023.
[63] J. F. Ferreira, P. Cruz, T. Durieux, and R. Abreu, “Smartbugs: A framework to analyze solidity smart contracts,” in Proceedings of the 35th IEEE/ACM International Conference on Automated Software Engineering, 2020, pp. 1349–1352.
[64] Wikipedia, “Confidence interval,” 2024. [Online]. Available: https://en.wikipedia.org/wiki/Confidence_interval
[65] L. Luu, D.-H. Chu, H. Olickel, P. Saxena, and A. Hobor, “Making smart contracts smarter,” in Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, 2016, pp. 254–269.
[66] B. Jiang, Y. Liu, and W. K. Chan, “Contractfuzzer: Fuzzing smart contracts for vulnerability detection,” in Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering, 2018, pp. 259–269.
[67] S. Kalra, S. Goel, M. Dhawan, and S. Sharma, “Zeus: analyzing safety of smart contracts.” in NDSS, 2018, pp. 1–12.
[68] J. Chen, X. Xia, D. Lo, J. Grundy, X. Luo, and T. Chen, “Defectchecker: Automated smart contract defect detection by analyzing evm bytecode,” IEEE Transactions on Software Engineering, vol. 48, no. 7, pp. 2189–2207, 2021.
[69] C. F. Torres, J. Schütte, and R. State, “Osiris: Hunting for integer bugs in ethereum smart contracts,” in Proceedings of the 34th Annual Computer Security Applications Conference, 2018, pp. 664–676.
[70] I. Nikolić, A. Kolluri, I. Sergey, P. Saxena, and A. Hobor, “Finding the greedy, prodigal, and suicidal contracts at scale,” in Proceedings of the 34th Annual Computer Security Applications Conference, 2018, pp. 653–663.
[71] P. Tsankov, A. Dan, D. Drachsler-Cohen, A. Gervais, F. Buenzli, and M. Vechev, “Securify: Practical security analysis of smart contracts,” in Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, 2018, pp. 67–82.
[72] L. Brent, N. Grech, S. Lagouvardos, B. Scholz, and Y. Smaragdakis, “Ethainter: a smart contract security analyzer for composite vulnerabilities,” in Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, 2020, pp. 454–469.
[73] P. Bose, D. Das, Y. Chen, Y. Feng, C. Kruegel, and G. Vigna, “Sailfish: Vetting smart contract state-inconsistency bugs in seconds,” in 2022 IEEE Symposium on Security and Privacy (SP). IEEE, 2022.
[74] Mythril, “Mythril,” 2023. [Online]. Available: https://mythril-classic.readthedocs.io/en/master/module-list.html
[75] J. Feist, G. Grieco, and A. Groce, “Slither: a static analysis framework for smart contracts,” in 2019 IEEE/ACM 2nd International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB). IEEE, 2019, pp. 8–15.
[76] T. D. Nguyen, L. H. Pham, J. Sun, Y. Lin, and Q. T. Minh, “sfuzz: An efficient adaptive fuzzer for solidity smart contracts,” in Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, 2020, pp. 778–788.
[77] J. Choi, D. Kim, S. Kim, G. Grieco, A. Groce, and S. K. Cha, “Smartian: Enhancing smart contract fuzzing with static and dynamic data-flow analyses,” in 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2021, pp. 227–239.
[78] G. Grieco, W. Song, A. Cygan, J. Feist, and A. Groce, “Echidna: effective, usable, and fast fuzzing for smart contracts,” in Proceedings of the 29th ACM SIGSOFT International Symposium on Software Testing and Analysis, 2020, pp. 557–560.
[79] B. Zhang, “Towards finding accounting errors in smart contracts,” in Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, 2024, pp. 1–13.