Authors: Peipei Liu, Jian Sun, Li Chen, Zhaoteng Yan, Peizheng Zhang, Dapeng Sun, Dawei Wang, Dan Li
Control Flow-Augmented Decompiler based on Large Language Model
Peipei Liu¹, Jian Sun¹, Li Chen¹, Zhaoteng Yan¹, Peizheng Zhang¹,
Dapeng Sun¹, Dawei Wang¹, Dan Li²
¹Zhongguancun Laboratory, ²Tsinghua University
{liupp, sunjian, lichen}@zgclab.edu.cn,
Abstract
Binary decompilation plays a crucial role in various tasks related to security threat analysis and software engineering, such as binary vulnerability detection and software supply chain analysis. Currently prevalent binary decompilation methods rely primarily on large language models (LLMs) and can be broadly classified into two main approaches: prompt-based decompilation and end-to-end decompilation. Prompt-based methods typically require significant effort to analyze and summarize the binaries to be decompiled in order to extract aspect-specific expert knowledge, which is then fed into a general-purpose large language model to address specific decompilation tasks. End-to-end methods, on the other hand, carefully construct training datasets or neural networks to post-train general-purpose large language models, thereby obtaining domain-specific large language models for decompiling the target binaries. However, both approaches still face significant challenges, including the absence of rich semantic representations of the input code and the neglect of control flow information, which is crucial for accurate decompilation. Furthermore, most current decompilation techniques are tailored specifically to the x86 architecture, making it difficult to adapt and generalize them efficiently to other bit widths or instruction set architectures. To address these limitations, we propose CFADecLLM, a novel end-to-end decompilation LLM that enhances existing end-to-end decompilation methods with control flow information. We conduct extensive experiments on the public datasets HumanEval and ExeBench across four optimization levels, and the results demonstrate that our approach outperforms existing methods on multiple metrics, validating its effectiveness and superiority.
1 Introduction
Binary reverse engineering is the technical process of extracting a program's structure, logic, and functional information from compiled binary code [Kruegel et al., 2004]. It plays a critical role in software security, with primary applications spanning several areas: vulnerability analysis and exploitation [Szekeres et al., 2013], malware analysis [Egele et al., 2008], software intellectual property protection and copyright disputes, software security assessment, protocol reverse engineering, and compatibility analysis [Bossert et al., 2014].
Decompilation, as a core technique of binary reverse engineering, aims to transform machine-readable binary code into human-readable high-level language code [Fu et al., 2019]. By analyzing the control flow and data flow within the binary code, decompilation reconstructs program structures, infers variable names and types, and interprets function semantics, ultimately producing a high-level representation that closely approximates the original source code [David et al., 2020]. Advanced decompilation techniques enable researchers to quickly grasp program logic, thereby significantly improving the efficiency of software security research.
Existing decompilation techniques can be broadly categorized into three types. (1) Tool- and expert-knowledge-based methods. These approaches rely on specialized reverse engineering tools and on experts' in-depth knowledge of system architecture, compiler behavior, and program semantics, and they achieve structured analysis of the program through control flow graph (CFG) construction, data flow analysis, and symbolic execution [Yakdan et al., 2015]. (2) Statistical machine learning and deep neural network-based methods. By modeling and learning the contextual and structural features of code, these approaches predict patterns within binary code, significantly raising the level of automation in decompilation. However, their generalization ability remains limited when dealing with complex control flows or heavily obfuscated code [Zhu et al., 2024; Liang et al., 2021]. (3) Large language model-based methods. The success of large language models (LLMs) in natural language processing has inspired research into their application to decompilation, substantially improving decompilation efficiency and performance. Two main technical routes exist for LLM-based decompilation: the prompt-based route [Wong et al., 2023; Hu et al., 2024; Feng et al., 2024] and the end-to-end route [Tan et al., 2024; Jiang et al., 2024; Feng et al., 2024]. In the prompt-based route, traditional decompilation tools (such as Ghidra and IDA) are first used to generate initial pseudo-code from binary code. Expert knowledge is then applied to check the decompiled pseudo-code and design prompts. Finally, these prompts, along with the pseudo-code, are input into a general-purpose large language model to generate more precise decompilation results. In contrast, end-to-end methods first disassemble binary code and treat the resulting assembly code as a domain-specific language (DSL). A training dataset of 〈assembly code, source code〉 pairs is then constructed and used to post-train a general-purpose large model into a decompilation-specific large language model (e.g., CodeBERT, LLM4Decompile). During inference, the model directly captures the semantics of assembly code and generates the corresponding high-level language representation, thereby enhancing the accuracy and generalization capability of decompilation.
arXiv:2503.07215v1 [cs.SE] 10 Mar 2025
Currently, large language model-based decompilation methods are gradually emerging as the mainstream approach. Despite notable advances in automating decompilation and handling complex code, current LLM-based decompilation methods face a key limitation: inadequate consideration and utilization of the global control flow structure. This structure is important for capturing a program's execution logic and understanding the semantics of complex functions, as evidenced by its critical role in previous research and in many applications such as binary code similarity detection, binary code search, and third-party library identification [Yang et al., 2022; Jin et al., 2022]. Moreover, existing methods are primarily designed for the x86 architecture and are difficult to extend to other instruction set architectures.
Based on these observations, we propose integrating the control flow structure information of binary code with large language models to build a Control Flow-Augmented Decompiler based on LLM (CFADecLLM), which can also be applied across different instruction set architectures, enabling efficient cross-architecture decompilation.
Specifically, building CFADecLLM involves four steps. First, we compile the source code into binary code across different instruction set architectures and bit widths. Second, we disassemble the binary code using widely used tools to obtain control flow graphs and assembly code. To combine the complementary strengths of different tools, we use objdump to generate assembly code and IDA to produce control flow graphs. Third, we represent decompilation-related information, including the control flow graph and assembly code, in a structured format and transform it into a form that large language models can effectively interpret. Finally, we carefully design precise natural language prompts that incorporate as many instruction requirements and assembly details as possible to guide the decompilation process during both training and inference.
After training, we conduct experiments and analysis on two publicly available datasets, ExeBench and HumanEval. The results demonstrate the superiority of our model over previous research.
In summary, our contributions are as follows:
1. We propose integrating control flow graphs into large models for decompilation, enabling us to fully leverage control flow information to recover program logic. To the best of our knowledge, this is the first work to incorporate control flow information into LLMs for decompilation.
2. We meticulously construct a dataset in which each sample incorporates control flow information and instruction set architecture details through rich and flexible natural language descriptions. Based on this dataset, we post-train a general-purpose large model to obtain a domain-specific large model tailored for decompilation.
3. Through extensive experiments and analyses, we demonstrate the competitive performance of our model compared with the current top-performing models.
2 Related Works
2.1 Decompilation
The origins of decompilation can be traced back to Cristina Cifuentes' groundbreaking work [Cifuentes, 1994], which introduced interval theory-based techniques.
Traditional decompilation tools analyze a program's CFG for reverse engineering. The commercial Hex-Rays extension for IDA Pro [Hex-Rays, 2025] is an industry standard, offering a balance of efficiency and usability for structural analysis. Open-source alternatives like Ghidra [Ghidra, 2025] improve accessibility and transparency but often fail to effectively reverse compiler optimizations, reflecting the assembly code structure instead of reconstructing high-level source code.
Phoenix [Brumley et al., 2013] enhances control flow analysis through iterative techniques. DREAM [Yakdan et al., 2015] removes goto statements to generate more structured code, though sometimes at the expense of readability. Revng [Federico et al., 2018] employs the Control Flow Combing algorithm to minimize goto usage, which can result in code bloat, while Revng-c [Gussoni et al., 2020] generates goto-free code to reduce complexity. In contrast, SAILR [Basque et al., 2024] argues that decompilers should preserve the structure of the original source code, aiming to reverse the transformations that induce goto statements for better alignment with the original code.
Building on DREAM, DREAM++ [Yakdan et al., 2016] introduces function inlining transformations but is limited by its predefined handling of library functions, which restricts scalability. ERASE [Zhang et al., 2024a] addresses these limitations by optimizing function inlining and improving data flow handling, particularly in recursive inlining scenarios.
2.2 Neural Network-Based Decompilation
Drawing inspiration from neural machine translation (NMT), decompilation has been reformulated as a translation problem between two programming languages. Early efforts used recurrent neural networks to decompile binary code into higher-level code, marking a significant departure from traditional rule-based decompilers [Katz et al., 2018a]. While these approaches demonstrated the feasibility of applying NMT to decompilation, they were limited both in accuracy and in handling complex binary structures. Building on this foundation, Katz et al. [Katz et al., 2019] identified the core challenge as the information asymmetry between high-level and low-level programming languages (PLs) and proposed TraFix, which improves sequence prediction by enhancing the alignment between low-level instructions and high-level constructs. However, its automation and precision were still limited.

Figure 1: Architecture of our proposed CFADecLLM. In the illustrated example (imc_fits_dtype_string), the source code is compiled, disassembly tools produce the assembly code (objdump) and the CFG (IDA), both are organized into a structured data instance and embedded in a prompt, and the LLM generates the high-level source code.
Coda [Fu et al., 2019] introduced an end-to-end neural framework for decompilation that employs multiple models for specific statement types, yet it struggled with complex binary code. Neutron [Liang et al., 2021], an attention-based approach, improved accuracy by better mapping assembly to high-level constructs, but a gap remained. NeurDP [Cao et al., 2022] used graph neural networks to bridge this gap for compiler-optimized binaries, introducing intermediate representations and Optimized Translation Units (OTU) to enhance the decompilation of optimized binary code.
Recent advances in decompilation have been motivated by the success of large language models and explore two primary approaches: prompt-based decompilation and end-to-end decompilation. Prompt-based decompilation uses LLMs to enhance traditional decompilers such as Ghidra and IDA Pro, improving readability and re-executability. For instance, DeGPT [Hu et al., 2024] reduced the cognitive load of Ghidra output by 24.4%, and DecGPT [Wong et al., 2023] boosted IDA Pro's re-executability rate to over 75% by integrating error messages into the refinement process.
End-to-end decompilation fine-tunes LLMs to directly generate high-level code from binaries. Early efforts, like BTC [Hosseini and Dolan-Gavitt, 2022], demonstrated the potential of transformers but struggled with complex binaries. Slade [Armengol-Estapé et al., 2024] introduced a smaller, efficient model optimized for assembly, while Nova [Jiang et al., 2024] scaled the approach with a 1-billion-parameter model, improving accuracy. However, open-source models are still limited to around 200 million parameters. Feng et al. [Feng et al., 2024] proposed the sc2dec method, which improves performance without fine-tuning by recompiling decompilation results for in-context learning and aligning assembly code with source code using debugging information. LLM4Decompile [Tan et al., 2024], built on DeepSeek-Coder, introduced models for both direct binary decompilation and refinement of Ghidra's outputs. It achieved notable gains in readability and executability and set a new performance standard for decompilation.
2.3 Binary Comprehension
Focusing on recovering semantic information to improve binary comprehension, previous studies have proposed various methods. These approaches aim to recover details such as function names, variable types, and data structures from stripped binaries. For instance, Nero [David et al., 2020] and NFRE [Gao et al., 2021] used encoder-decoder frameworks, with Nero leveraging a graph neural network (GNN) and long short-term memory (LSTM) for function name prediction, and NFRE incorporating instruction-level control flow information to reduce analysis costs. DEBIN [He et al., 2018] employed dependency graphs based on Conditional Random Fields (CRFs) to predict both function names and variable information, while SYMLM [Jin et al., 2022] integrated calling context into function embeddings for enhanced name prediction.
OSPREY [Zhang et al., 2021] focused on variable type recovery, combining deterministic rules with probabilistic inference for improved accuracy. TYGR [Zhu et al., 2024] used graph-based data-flow analysis with GNNs to enhance type inference, and Epitome [Zhang et al., 2024b] applied multi-task learning and vote-based tokenization to improve function name prediction across different optimization levels, using pre-trained assembly models and fine-grained CFGs for better semantic understanding.
3 Method
As shown in Figure 1, this section presents the detailed imple-
mentation of our CFADecLLM . Section 3.1 presents the data
construction process and provides a detailed explanation of
its specific format. Section 3.2 explains how the control flow
graph is transformed into a natural language representation,
and Section 3.3 demonstrates the construction of instructions
for guiding the large language model in the decompilation
task.
{
    "arch": ...,        # arm/mips/x86
    "mode": ...,        # 32/64
    "opts": ...,        # O0/O1/O2/O3
    "comassem": ...,    # the entire (compilable) assembly code of the function body, lines joined with "\n"
    "sourcecode": ...,  # the target high-level C code, lines joined with "\n"
    "assemgraph": {
        "nodes": [
            {
                "bytecode": ["55", "89", "e5", "83", "e4", "f0"],  # bytecode of all instructions in a CFG block
                "addr": (0x000001, 0x200001),  # starting and ending addresses of the instructions in a CFG block
                "assemblock": ["push rbp", "mov ebp, esp", "and esp, 0xfffffff0"],  # assembly instructions in a CFG block
            },
            ...,
        ],
        "edges": [(1, 3), (0, 4), (4, 6), (1, 5), ...],  # list of tuples; each holds the indices of two connected CFG blocks
        "nodeNum": ...,  # number of CFG blocks in the function body
    },
}
Figure 2: Data instance in dictionary format
3.1 Dataset
We first present the process of dataset collection and organization, and then describe the representation of each data instance, including the CFG, assembly code, and source code generated through multi-optimization-level, dual-bit-width compilation¹.
(1) Data Collection
We build our dataset on top of an existing source code dataset, ExeBench [Armengol-Estapé et al., 2022], an ML-scale, function-level dataset that pairs real-world C code sourced from GitHub with input-output (IO) examples that enable these programs to be executed. Using 0.8 million C functions from ExeBench, we generate our dataset through the following steps:
Step 1: Resolve Data Type Conflicts. To generate 32-bit executables of C functions, we resolve data type conflicts between the 32-bit standard libraries and the ExeBench-generated .c files. For instance, we replace the 64-bit-specific definition typedef unsigned long size_t; used in ExeBench with typedef unsigned int size_t; to comply with 32-bit program standards.
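Step 1 is essentially a textual rewrite of the conflicting typedef before compilation. The following is a minimal sketch, assuming the only conflict is the size_t definition quoted above; the pattern and the helper name make_size_t_32bit are illustrative, and any other conflict would need its own rule.

```python
import re

# Matches the 64-bit definition "typedef unsigned long size_t;" with flexible
# whitespace. "unsigned long long size_t" does not match, because "size_t"
# must directly follow the single "long".
_SIZE_T_64 = re.compile(r"typedef\s+unsigned\s+long\s+size_t\s*;")


def make_size_t_32bit(c_source: str) -> str:
    """Rewrite the 64-bit size_t typedef into its 32-bit form (hypothetical helper)."""
    return _SIZE_T_64.sub("typedef unsigned int size_t;", c_source)
```

For example, applying it to "typedef unsigned long size_t;\nsize_t n = 0;\n" changes only the typedef line and leaves uses of size_t untouched.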
Step 2: Compile. The .c files containing the source code of the target functions are compiled with GCC into executables at four optimization levels (O0, O1, O2, and O3) and two bit widths (32-bit and 64-bit). After compilation, the executables are stripped to remove debugging information before further processing.
¹ Here, we use dual bit widths as an example to illustrate cross-X data processing, including processing across different architectures.
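The compilation matrix of Step 2 (four optimization levels times two bit widths, each followed by stripping) can be enumerated mechanically. This is a sketch under stated assumptions: build_commands and the output naming scheme are hypothetical, -m32 requires a multilib-capable GCC, each .c file is assumed to link into a standalone executable, and strip --strip-debug is our guess at how debugging information is removed (it keeps the symbol table, so the target function can still be found by name later).

```python
from itertools import product
from pathlib import Path

OPT_LEVELS = ["O0", "O1", "O2", "O3"]
BIT_WIDTHS = [32, 64]


def build_commands(c_file: str, out_dir: str = "build"):
    """Return one (compile_cmd, strip_cmd) pair per optimization level and bit width."""
    stem = Path(c_file).stem
    jobs = []
    for opt, bits in product(OPT_LEVELS, BIT_WIDTHS):
        # Hypothetical naming scheme: <stem>_<bits>bit_<opt>
        exe = str(Path(out_dir) / f"{stem}_{bits}bit_{opt}")
        compile_cmd = ["gcc", f"-{opt}", f"-m{bits}", c_file, "-o", exe]
        # Remove debug sections only; the symbol table stays for locating functions.
        strip_cmd = ["strip", "--strip-debug", exe]
        jobs.append((compile_cmd, strip_cmd))
    return jobs
```

Each pair can then be run in order with subprocess.run(cmd, check=True), compiling first and stripping second.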
Step 3: Disassemble. To ensure variety and accuracy in the disassembled assembly code, we use both Objdump [Binutils, 2025] and IDA Pro [Hex-Rays, 2025] for disassembly, and we develop a custom IDA plugin to facilitate this process. When IDA cannot correctly identify the boundaries of a target function, we use the function boundaries identified by Objdump to augment the IDA-generated assembly code, ensuring complete coverage of the target function.
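The Objdump half of Step 3, including the boundary fallback, amounts to locating the target symbol in objdump -d output and reading off its first and last instruction addresses. A minimal sketch, assuming GNU objdump's default text layout (address, tab, raw bytes, tab, instruction) and a symbol table that survived stripping; extract_function is a hypothetical helper, not the authors' IDA plugin.

```python
import re

# objdump prints a header such as "0000000000001129 <add>:" before each function.
_HEADER = re.compile(r"^([0-9a-f]+) <(.+)>:$")


def _parse_insn(line: str):
    """Parse one instruction line into (addr, [hex bytes], asm text or None)."""
    parts = line.split("\t")
    head = parts[0].strip()
    if len(parts) < 2 or not head.endswith(":"):
        return None  # headers, section titles, blank lines
    try:
        addr = int(head[:-1], 16)
    except ValueError:
        return None
    asm = "\t".join(parts[2:]).strip() or None  # None for byte-only continuation lines
    return addr, parts[1].split(), asm


def extract_function(objdump_text: str, name: str):
    """Return (start_addr, last_insn_addr, instructions) for `name`, or None if absent."""
    insns, inside = [], False
    for line in objdump_text.splitlines():
        header = _HEADER.match(line.strip())
        if header:
            if inside:
                break  # the next function begins, so the target body is complete
            inside = header.group(2) == name
            continue
        if not inside:
            continue
        parsed = _parse_insn(line)
        if parsed is None:
            continue
        addr, raw, asm = parsed
        if asm is None and insns:
            insns[-1][1].extend(raw)  # long encodings wrap extra bytes onto a second line
        else:
            insns.append([addr, raw, asm])
    if not insns:
        return None
    return insns[0][0], insns[-1][0], [tuple(i) for i in insns]
```

The returned start address and last-instruction address are what a boundary fallback would hand to the CFG stage; note that the end value is the address of the last instruction, not one past it.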
Step 4: Generate CFG. Recognizing the importance of semantically accurate CFGs, we develop an IDA plugin to construct CFGs from the target function's assembly code. Basic blocks are formed by grouping the instructions between branch instructions. Using the disassembled code, IDA Pro identifies transitions between blocks by analyzing jump and branch instructions, thereby creating a precise representation of the target function's control flow.
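The grouping described in Step 4 corresponds to the classic leader-based basic-block construction. The sketch below is a simplified stand-in for the IDA plugin, assuming AT&T-style x86 mnemonics (anything starting with "j" is a jump, "jmp*" is unconditional, "ret*" ends a path) and direct jump targets that are already resolved; calls do not split blocks, and jumps leaving the function are ignored. It emits directed (source, successor) pairs, whereas the stored "edges" field is described as undirected.

```python
from typing import List, Optional, Tuple

Insn = Tuple[int, str, Optional[int]]  # (address, mnemonic, direct jump target or None)


def is_jump(m: str) -> bool:
    return m.startswith("j")


def is_unconditional(m: str) -> bool:
    return m.startswith("jmp")


def is_return(m: str) -> bool:
    return m.startswith("ret")


def build_cfg(insns: List[Insn]):
    """Split a function's instructions into basic blocks and connect them with edges."""
    addrs = {a for a, _, _ in insns}
    # Leaders: the entry, every in-function jump target, and every instruction after a jump/return.
    leaders = {insns[0][0]}
    for i, (_, mnem, target) in enumerate(insns):
        if is_jump(mnem) and target in addrs:
            leaders.add(target)
        if (is_jump(mnem) or is_return(mnem)) and i + 1 < len(insns):
            leaders.add(insns[i + 1][0])
    # Each block runs from one leader up to (not including) the next leader.
    blocks = []
    for insn in insns:
        if insn[0] in leaders:
            blocks.append([])
        blocks[-1].append(insn)
    index_of = {block[0][0]: i for i, block in enumerate(blocks)}
    edges = []
    for i, block in enumerate(blocks):
        _, mnem, target = block[-1]
        if is_jump(mnem) and target in index_of:
            edges.append((i, index_of[target]))  # taken branch
        if not (is_unconditional(mnem) or is_return(mnem)) and i + 1 < len(blocks):
            edges.append((i, i + 1))  # fall-through to the next block
    return blocks, edges
```

For a function "test; je L1; mov; jmp L2; L1: xor; L2: ret", this yields four blocks with edges [(0, 2), (0, 1), (1, 3), (2, 3)].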
(2) Structured Data Representation
After all the processing steps, we represent each data instance in a structured dictionary format that contains various functional and attribute information, making it easy to use in downstream tasks. The data representation is shown in Figure 2. This data structure comprises six main features: the instruction set architecture, bit width, optimization level, assembly code generated by Objdump, target source code, and the CFG. The CFG feature encodes information about all basic blocks ("nodes"), the number of basic blocks ("nodeNum"), and the transition edges between basic blocks ("edges"). The "edges" represent an undirected graph, where each value corresponds to the index of a basic block in "nodes". Additionally, each basic block includes its bytecode ("bytecode"), boundary addresses ("addr"), and assembly code ("assemblock"). This unified representation provides not only low-level details about the function's implementation logic but also rich contextual data, enabling more accurate and interpretable model training for decompilation.

import json


def convert_cfg(datapoint):
    # Assembly-level context: architecture, bit width, optimization level, full assembly
    assembly_prompt = {
        "instruction set architecture": datapoint["arch"],
        "bit width": datapoint["mode"],
        "compiler optimization level": datapoint["opts"],
        "assembly code": datapoint["comassem"],
    }
    # CFG context: basic blocks, edges, and block count (keys follow Figure 2)
    cfg_prompt = {
        "a list of cfg_blocks": datapoint["assemgraph"]["nodes"],
        "edges between two connected cfg_blocks": datapoint["assemgraph"]["edges"],
        "cfg_block count": datapoint["assemgraph"]["nodeNum"],
    }
    # Serialize both dictionaries to JSON strings for embedding in the prompt
    assembly_prompt = json.dumps(assembly_prompt)
    cfg_prompt = json.dumps(cfg_prompt)

    prompt4input = (
        "The assembly code is obtained by disassembling the binary file (compiled by GCC) "
        "using objdump, and its content is represented as a JSON dictionary: "
        f"\n{assembly_prompt}\n\n"
        "- The control flow graph (CFG) information is extracted by IDA from the binary file "
        f"and is represented as another JSON dictionary: \n{cfg_prompt}\n\n"
        "Each 'cfg_block' in 'a list of cfg_blocks' contains: \n"
        " - 'assemblock': The assembly instructions within the block.\n"
        " - 'bytecode': The hexadecimal bytecode corresponding to the block. \n"
        " - 'addr': The tuple of start and end addresses of the instructions.\n\n"
        "The high-level source code is: "
    )
    return prompt4input
Figure 3: Python function for converting the CFG into a natural language description
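Because every instance is assembled automatically, a cheap consistency check over the Figure 2 fields can catch extraction errors before training. This is a minimal sketch; check_instance and the specific checks (all six top-level fields present, nodeNum equal to the number of nodes, no block starting after it ends, edge indices in range) are our own assumptions rather than part of the described pipeline.

```python
REQUIRED_TOP = {"arch", "mode", "opts", "comassem", "sourcecode", "assemgraph"}
REQUIRED_NODE = {"bytecode", "addr", "assemblock"}


def check_instance(inst: dict) -> list:
    """Return human-readable problems with a data instance (empty list if none found)."""
    missing = REQUIRED_TOP - inst.keys()
    if missing:
        return [f"missing fields: {sorted(missing)}"]
    problems = []
    graph = inst["assemgraph"]
    nodes = graph.get("nodes", [])
    if graph.get("nodeNum") != len(nodes):
        problems.append("nodeNum does not match the number of nodes")
    for i, node in enumerate(nodes):
        lacking = REQUIRED_NODE - node.keys()
        if lacking:
            problems.append(f"node {i} lacks {sorted(lacking)}")
            continue
        start, end = node["addr"]
        if start > end:
            problems.append(f"node {i} starts after it ends")
    # Edges are index pairs into "nodes"; both endpoints must exist.
    for a, b in graph.get("edges", []):
        if not (0 <= a < len(nodes) and 0 <= b < len(nodes)):
            problems.append(f"edge ({a}, {b}) refers to a missing node")
    return problems
```

An empty list means the instance passed these structural checks; it says nothing about whether the CFG itself is semantically correct.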
3.2 Natural Language Transformation of Structured CFG
In this section, we transform assembly code and the struc-
tured CFG information into human-readable natural language
descriptions to facilitate processing and understanding by
large language models. The details are shown in Figure 3.
We first refine the initial data format by using key-value pairs that clearly describe the specific role of each piece of assembly information in the decompilation process, ensuring that each field has a clear and easily interpretable semantic meaning.
Then, to enable the large model to better understand and process this structured data, we convert the dictionary into a JSON string accompanied by natural language descriptions. This conversion not only preserves the hierarchical structure of the original data but also makes it more compatible with the model's input requirements, enhancing its ability to comprehend instruction attributes. Additionally, this approach reduces the constraints that structured data imposes on input formatting, allowing the model to reason in a more natural textual environment and thereby improving its performance on complex tasks.
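Two properties of this JSON conversion are easy to verify: nesting survives serialization, but Python tuples (such as "addr" and the edge pairs) come back as JSON arrays. The sketch below uses a toy dictionary with keys modeled on the prompt keys of Figure 3 (the values are illustrative). The compact separators option is an assumption we add: the text does not say whether prompts are serialized compactly, but doing so shortens the prompt, which matters under the 4,096-token maximum sequence length used for fine-tuning.

```python
import json

# Toy CFG dictionary; keys mirror the prompt keys, values are illustrative only.
cfg_prompt = {
    "a list of cfg_blocks": [
        {"assemblock": ["test rdi, rdi", "je 0x10"],
         "bytecode": ["48", "85", "ff", "74", "0b"], "addr": (0x0, 0x3)},
        {"assemblock": ["ret"], "bytecode": ["c3"], "addr": (0x10, 0x10)},
    ],
    "edges between two connected cfg_blocks": [(0, 1)],
    "cfg_block count": 2,
}

default_text = json.dumps(cfg_prompt)                         # ", " and ": " separators
compact_text = json.dumps(cfg_prompt, separators=(",", ":"))  # no padding spaces

restored = json.loads(default_text)
# The hierarchy is preserved, but tuples are restored as lists.
assert restored["a list of cfg_blocks"][1]["addr"] == [16, 16]
assert restored["edges between two connected cfg_blocks"] == [[0, 1]]
print(len(default_text) - len(compact_text), "characters saved by compact separators")
```

Both strings decode to the same structure; only whitespace differs.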
3.3 Instruction Prompt
In this section, we concatenate prompt4input with the training instruction "You are a professional decompilation assistant. Your task is to analyze the input assembly code and control flow graph (CFG) information, understand the function's logic, and derive the corresponding high-level source code. Please ensure that the output source code is well-structured, syntactically correct, and accurately reflects the logic and functionality of the assembly code.", which specifies the LLM's role and task objectives, to form an optimized prompt. The prompt aims to enrich the multidimensional features of the input information, enabling the large model to effectively parse the syntax, semantics, and control flow structure of the assembly code during decompilation, thereby improving the model's performance on source code generation.
Finally, the prompt is input into the large model for end-
to-end fine-tuning with source code as the target output.
Through this process, we expect the large model to accurately
recover the corresponding high-level source code while pre-
serving the original assembly code features.
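Put together, each fine-tuning example pairs the instruction plus prompt4input (the model input) with the ground-truth source code (the target). A minimal sketch, assuming a plain "input"/"output" record and a blank-line separator; neither the field names nor the separator are specified in the text, and chat-template formatting, tokenization, and loss masking are deliberately left out.

```python
# Instruction text quoted from Section 3.3.
INSTRUCTION = (
    "You are a professional decompilation assistant. Your task is to analyze the input "
    "assembly code and control flow graph (CFG) information, understand the function's "
    "logic, and derive the corresponding high-level source code. Please ensure that the "
    "output source code is well-structured, syntactically correct, and accurately reflects "
    "the logic and functionality of the assembly code."
)


def make_training_example(prompt4input: str, source_code: str, sep: str = "\n\n") -> dict:
    """Pair the model input (instruction + prompt4input) with the target source code.

    The record layout and the separator are assumptions for illustration.
    """
    return {"input": INSTRUCTION + sep + prompt4input, "output": source_code}
```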
Benchmark: HumanEval 64-bit (all values in %)
Model    GPT-4o          DeepSeek-Coder  LLM4Decompile   DeepSeek-V3     Ours (SF)
Metric   Re-exe  ES      Re-exe  ES      Re-exe  ES      Re-exe  ES      Re-exe  ES
O0       42.68   20.72   9.15    20.75   47.20   28.21   45.73   41.01   59.76   46.43
O1       14.02   18.54   3.66    14.03   20.61   14.94   16.46   28.33   42.07   38.49
O2       14.63   16.93   6.10    12.01   21.22   7.01    17.68   27.56   33.54   38.47
O3       11.59   14.39   3.05    9.45    20.24   9.47    10.37   25.16   28.66   38.3
AVG      20.73   17.65   5.49    14.06   27.32   14.91   22.56   30.02   41.51   40.92
Table 1: Main comparison between our method and the baseline models at O0, O1, O2, and O3 on HumanEval 64-bit

Benchmark: HumanEval 32-bit (all values in %)
Model    GPT-4o          DeepSeek-Coder  LLM4Decompile   DeepSeek-V3     Ours (SF)
Metric   Re-exe  ES      Re-exe  ES      Re-exe  ES      Re-exe  ES      Re-exe  ES
O0       31.1    35.07   3.66    28.1    0       13.48   39.63   35.99   54.88   45.64
O1       14.63   29.72   2.44    27.42   0       21.33   17.68   30.46   29.88   38.54
O2       14.63   29.13   3.05    25.66   0       19.38   11.59   30.72   32.93   38.74
O3       14.63   27.88   4.27    25.04   0       19.02   13.41   29.63   29.27   37.77
AVG      18.25   30.95   3.36    26.56   0.00    18.80   20.08   31.20   36.74   40.67
Table 2: Main comparison between our method and the baseline models at O0, O1, O2, and O3 on HumanEval 32-bit
4 Experiments
4.1 Experimental Setups
Benchmarks. We use HumanEval-Decompile [Tan et al., 2024] and ExeBench [Armengol-Estapé et al., 2022] as our benchmarks. HumanEval-Decompile includes 164 C functions derived from the Python solutions and assertions in HumanEval [Chen et al., 2021], a widely used benchmark for code generation evaluation. Since HumanEval-Decompile only supports compilation for x86-64, we extended it to include 32-bit compilations as well. The original ExeBench test set comprises 5,000 C programs extracted from real-world GitHub repositories. These programs include not only complete C function definitions but also input-output (IO) examples and external functions/header files, ensuring that each function is executable. However, because unresolved conflicts prevent some functions from compiling into 32-bit executables, we randomly selected 300 compilable functions as the benchmark actually used in our experiments.
Metrics. Following prior research, we adopt the Re-executability Rate (%) and Edit Similarity (%) as metrics [Tan et al., 2024; Armengol-Estapé et al., 2024] to provide a comprehensive evaluation of the decompilation methods' performance.
The Re-executability Rate evaluates whether the decom-
piled code can be executed to produce the same behavior as
the original program. This is achieved by running the decom-
piled C code of a test function alongside its corresponding
assertions to verify the correctness of the decompilation re-
sults.
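A re-executability check of this kind can be scripted as "compile, run, inspect the exit status". This is a hedged sketch, assuming GCC is on the PATH, that the assertions come as a C main that uses assert() (a failed assertion aborts with a non-zero status), and that compilation failures and timeouts count as failures; re_executes and re_executability_rate are hypothetical names, and this is not the benchmark's official harness.

```python
import os
import subprocess
import tempfile


def re_executes(decompiled_c: str, test_main: str, timeout: float = 10.0) -> bool:
    """Compile decompiled code with its assertion driver; True if it builds and exits with 0."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "candidate.c")
        exe = os.path.join(tmp, "candidate")
        with open(src, "w") as f:
            f.write(decompiled_c + "\n" + test_main + "\n")
        try:
            build = subprocess.run(["gcc", src, "-o", exe, "-lm"],
                                   capture_output=True, timeout=timeout)
            if build.returncode != 0:
                return False  # code that does not compile cannot re-execute
            run = subprocess.run([exe], capture_output=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False  # hanging builds or runs are treated as failures
        return run.returncode == 0  # a failed assert() aborts with a non-zero status


def re_executability_rate(outcomes) -> float:
    """Percentage of test functions whose decompiled code re-executes successfully."""
    outcomes = list(outcomes)
    return 100.0 * sum(outcomes) / len(outcomes) if outcomes else 0.0
```

For instance, a correct add function paired with a driver containing assert(add(2, 3) == 5) should pass, while a version that subtracts should fail at run time.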
Edit Similarity is a key metric for evaluating the readability of decompiled code; it quantifies the similarity between the decompiled code and the original code. Following [Armengol-Estapé et al., 2024], we use edit distance, a standard metric employed in other neural approaches [Katz et al., 2018b; Hosseini and Dolan-Gavitt, 2022], to define edit similarity as

$\mathrm{ES} = 1 - \frac{\text{edit distance}}{\text{sequence length}}$,

where the edit distance is normalized by the length of the ground-truth sequence. A higher edit similarity indicates better readability of the decompiled code.
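The formula can be computed directly with a dynamic-programming Levenshtein distance. A minimal sketch, assuming character-level distance over the raw source strings (the text does not say whether characters or tokens are compared). Because the normalization uses only the reference length, a much longer prediction can produce a negative score; some implementations clamp at zero, which this sketch does not.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = curr
    return prev[-1]


def edit_similarity(prediction: str, reference: str) -> float:
    """ES = 1 - edit_distance / len(reference), normalized by the ground truth length."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return 1.0 - edit_distance(prediction, reference) / len(reference)
```

For example, "int g;" against the reference "int f;" differs by one substitution over six characters, giving an ES of 5/6.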
Baselines. We compare CFADecLLM with recent prominent LLMs in the field of decompilation and several well-known general-purpose LLMs, specifically:
• DeepSeek-Coder [Guo et al., 2024], the current state-of-the-art open-source code LLM, representing the leading publicly available model tailored for coding tasks.
• LLM4Decompile [Tan et al., 2024], the latest and most advanced LLM for end-to-end decompilation, built on DeepSeek-Coder. It supports both direct binary decompilation and the refinement of Ghidra's outputs, delivering significant improvements in readability and executability.
• GPT-4o [OpenAI et al., 2024], a top-performing general-purpose LLM renowned for its outstanding capabilities in understanding and generating high-level programming languages.
• DeepSeek-V3 [DeepSeek-AI et al., 2024], an advanced open-source language model designed for high-performance natural language processing and optimized for multilingual understanding and generation. It emphasizes efficiency and scalability, supporting diverse applications such as code generation, mathematical reasoning, and context-aware dialogue.
Benchmark: ExeBench 64-bit (all values in %)
Model    GPT-4o          DeepSeek-Coder  LLM4Decompile   DeepSeek-V3     Ours (SF)
Metric   Re-exe  ES      Re-exe  ES      Re-exe  ES      Re-exe  ES      Re-exe  ES
O0       1.33    20.49   5.00    22.58   2.67    12.94   3.67    22.74   28.33   64.93
O1       4.67    12.98   9.00    12.32   3.00    11.53   6.67    11.87   8.33    33.14
O2       6.67    13.49   8.00    12.27   2.33    10.75   8.00    11.88   7.33    30.74
O3       5.67    13.03   8.00    12.35   2.67    10.41   6.67    11.50   7.33    30.35
AVG      4.58    14.99   7.50    14.88   2.67    11.41   6.25    14.00   12.83   39.79
Table 3: Main comparison between our method and the baseline models at O0, O1, O2, and O3 on ExeBench 64-bit

Benchmark: ExeBench 32-bit (all values in %)
Model    GPT-4o          DeepSeek-Coder  LLM4Decompile   DeepSeek-V3     Ours (SF)
Metric   Re-exe  ES      Re-exe  ES      Re-exe  ES      Re-exe  ES      Re-exe  ES
O0       0.67    18.69   3.00    21.00   0.00    18.79   3.67    21.72   30.33   65.32
O1       4.00    13.49   4.33    13.33   0.67    18.76   6.67    12.44   15.67   38.69
O2       4.33    13.91   5.67    13.28   1.33    17.56   8.00    13.28   14.00   37.52
O3       3.67    13.85   5.33    13.14   1.33    18.03   6.67    13.02   13.33   36.52
AVG      4.17    15.74   4.83    15.19   0.83    18.04   6.17    15.12   18.08   44.51
Table 4: Main comparison between our method and the baseline models at O0, O1, O2, and O3 on ExeBench 32-bit

Implementation. We use the LLM4Decompile-1.3B model [Tan et al., 2024] as the base model. For fine-tuning, we set the batch size to 16, the learning rate to 2e-5, and the maximum sequence length to 4096, and we train for one epoch. All experiments are run on NVIDIA H100-80GB GPU clusters. During inference, we use greedy search to generate the decompiled high-level source code.
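Greedy search simply appends the single highest-scoring next token until an end-of-sequence token or a length limit is reached. The sketch below illustrates that loop with a hypothetical next_token_scores callback standing in for the fine-tuned model; the default max_new_tokens value is illustrative, and in practice a library decoding routine with sampling disabled plays this role.

```python
from typing import Callable, Dict, List


def greedy_decode(
    next_token_scores: Callable[[List[str]], Dict[str, float]],
    prompt: List[str],
    eos: str,
    max_new_tokens: int = 512,  # illustrative limit, not taken from the paper
) -> List[str]:
    """Repeatedly take the highest-scoring token (ties go to the first one seen)."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        scores = next_token_scores(tokens)
        best = max(scores, key=scores.get)
        if best == eos:
            break
        tokens.append(best)
    return tokens[len(prompt):]  # only the generated continuation
```

Because no sampling is involved, the same prompt always yields the same output, which keeps the evaluation deterministic.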
4.2 Main Results
The main experimental results are shown in Tables 1, 2, 3, and 4. Our method achieves the best average Re-exe and ES among all compared methods (GPT-4o, DeepSeek-Coder, LLM4Decompile, DeepSeek-V3) on both datasets and both bit widths. A detailed analysis of the results follows:
(1) HumanEval 64-bit Results Analysis
• Re-exe: Our method outperforms all other methods in Re-exe at every optimization level (O0, O1, O2, O3), with an average of 41.51%. In comparison, GPT-4o achieves 20.73%, DeepSeek-Coder 5.49%, LLM4Decompile 27.32%, and DeepSeek-V3 22.56%.
• ES: Our method also achieves a significantly higher ES of 40.92%, surpassing GPT-4o (17.65%) and DeepSeek-V3 (30.02%).
Conclusion: On HumanEval 64-bit, our method demonstrates superior performance in both Re-exe and ES, showing stronger decompilation capability than existing methods.
(2) Human-eval 32-bit Results Analysis
• Re-exe: Our method achieves a Re-exe of 36.74%, substantially outperforming gpt-4o (18.25%), Deepseek Coder (3.36%), LLM4Decompile (0%), and Deepseek V3 (20.08%).
• ES: Our method achieves an ES of 40.67%, higher than Deepseek V3 (31.20%) and gpt-4o (30.95%).
Conclusion: On Human-eval 32-bit, our method leads in both Re-exe and ES, with a notable improvement in executable rate: its Re-exe is more than 10 times that of Deepseek Coder and roughly double that of gpt-4o.
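The multipliers in this comparison follow directly from the average Re-exe values listed above for Human-eval 32-bit:

$$
\frac{36.74}{3.36} \approx 10.93, \qquad \frac{36.74}{18.25} \approx 2.01
$$

That is, our method's Re-exe is roughly eleven times that of Deepseek Coder and just over twice that of gpt-4o.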
(3) Exebench 64-bit Results Analysis
• Re-exe: Our method has an average Re-exe of 12.83%. It is competitive but slightly lower than the best baseline at O0, and it outperforms the other methods at O1, O2, and O3.
• ES: Our method achieves an ES of 39.79%, far exceeding Deepseek V3 (14.00%).
Conclusion: On Exebench 64-bit, our method shows a clear advantage in ES and competitive Re-exe, leading at the O1, O2, and O3 optimization levels.
(4) Exebench 32-bit Results Analysis
• Re-exe: Our method achieves a Re-exe of 18.08% on Exebench 32-bit, outperforming all other methods, including gpt-4o (4.17%) and Deepseek V3 (6.17%).
• ES: Our method also achieves an ES of 44.51%, significantly higher than Deepseek V3 (15.12%) and gpt-4o (15.74%).
Conclusion: On Exebench 32-bit, our method outperforms the others in both Re-exe and ES, demonstrating its advantage on 32-bit decompilation tasks.
Overall Experimental Conclusion
• Our method achieves the best Re-exe in nearly all settings, indicating that it reliably generates executable decompiled code.
1  int archHcubDomTerm(const ArchHcub * const archptr, ArchHcubDom * const domnptr, const ArchDomNum domnnum)
2  {
3      if (domnnum < (1 << archptr->dimnnbr))
4      {
5          domnptr->dimncur = 0;
6          domnptr->bitsset = domnnum;
7          return (0);
8      }
9      return (1);
10 }
(a) Source code

1  int archHcubDomTerm(const ArchHcub *archptr, ArchHcubDom *domnptr, int termnum)
2  {
3      if (termnum >= (1 << archptr->dimnnbr))
4      {
5          return (1);
6      }
7
8      domnptr->bitsset = 0;
9      domnptr->dimncur = termnum;
10     return (0);
11 }
(b) CFADecLLM

1  int archHcubDomTerm(const ArchHcub *arch, ArchHcubDom *term, int nbVars)
2  {
3      if (nbVars < (1 << arch->dimnnbr))
4      {
5          term->bitsset = 0;
6          term->dimncur = nbVars;
7
8          return 1;
9      }
10
11     return 0;
12 }
(c) LLM4Decompile
1  int archHcubDomTerm(int *param1, int param2, int *param3)
2  {
3      int cl = *param1;
4      int eax = 1;
5      int r8d = 1;
6
7      eax <<= cl;
8      if (param2 <= eax)
9      {
10         param3[1] = 0;
11     }
12     *param3 = param2;
13     return r8d;
14 }
(d) gpt-4o

1  int archHcubDomTerm(int *rdi, int *rsi, int edx)
2  {
3      int ecx = *rdi;
4      int eax = 1;
5      int r8d = 1;
6      eax = eax << ecx;
7      if (eax <= edx)
8      {
9          *(rsi + 1) = 0;
10         r8d = 0;
11     }
12     *rsi = edx;
13     return r8d;
14 }
(e) Deepseek-v3

1  int archHcubDomTerm(int *rdi, int *rsi, int edx)
2  {
3      int ecx = *rdi;
4      int eax = 1;
5      int r8d = 1;
6      eax = eax << ecx;
7      if (eax <= edx)
8      {
9          *(rsi + 1) = 0;
10         r8d = 0;
11     }
12     *rsi = edx;
13     return r8d;
14 }
(f) Deepseek Coder
Figure 4: Case study on Correct Control Flow. Presented are the source code (a) of the case sample alongside the decompilations of
CFADecLLM (b), LLM4Decompile (c), GPT-4o (d), DeepSeek-v3 (e), and DeepSeek-Coder (f).
• Our method also leads in ES, which indicates that the
generated code is closer to the ground truth both in struc-
ture and semantics.
• On both Human-eval (64-bit & 32-bit) datasets, our
method shows significant improvements, outperforming
existing methods.
• On Exebench (64-bit & 32-bit), our method still holds a
clear advantage, especially in ES.
• The consistent performance of our model across two bitwidths (32-bit and 64-bit) demonstrates strong cross-bitwidth decompilation capability and suggests that the approach may extend to other architectures. This highlights the robustness of our approach in handling diverse and complex decompilation tasks.
Final Conclusion: Our model demonstrates overall supe-
rior performance in decompilation tasks, excelling in both ex-
ecutable rate and code similarity, outperforming current state-
of-the-art methods.
4.3 Case Study
In this section, we present two cases that highlight the advan-
tages of CFADecLLM in terms of decompilation correctness
and readability compared to four other baseline models.
(1) Correctness of control flow
Figure 4 presents a case study based on a test sample from ExeBench, where only CFADecLLM recovers the correct control flow during decompilation. The figure compares the decompilations produced by CFADecLLM and four baseline models for the '-O2' compiled 64-bit binary of the sample's source code. In the LLM4Decompile decompilation, the return values (Fig.4(c), Lines 8 and 11) in the two branches of the if condition (Fig.4(c), Line 3) are swapped relative to the source code's control flow (Fig.4(a), Lines 3, 7 and 9). The same issue is observed in the GPT-4o, DeepSeek-v3, and DeepSeek-Coder decompilations, indicating that the baseline models cannot correctly infer branch transitions directly from the assembly code. Additionally, in the LLM4Decompile decompilation, the two assignments to fields of the second input parameter (Fig.4(c), Lines 5 and 6) are incorrectly executed before return 1 (Fig.4(c), Line 8), whereas in the source code they precede return 0 (Fig.4(a), Lines 5, 6, and 7), together forming a basic block in the control flow. Similarly, one of the assignments is misplaced before both returns in the GPT-4o (Fig.4(d), Lines 12 and 13), DeepSeek-v3 (Fig.4(e), Lines 12 and 13), and DeepSeek-Coder (Fig.4(f), Lines 12 and 13) decompilations. This suggests that the baseline models fail to form correct basic blocks directly from the assembly code. In contrast, although the if condition in CFADecLLM's decompilation (Fig.4(b), Line 3) is the inverse of the source code's, the overall control flow is correctly reconstructed. We attribute this to CFADecLLM's ability to learn and use control flow graph (CFG) knowledge. Moreover, inverting the if condition results in a more compact true-branch block enclosed in curly braces { }, improving the readability of the decompilation.
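The branch inversion can be checked mechanically. The C sketch below transcribes panels (a), (b) and (c) of Figure 4 and compares their return codes for every domain number from 0 to 2^d inclusive, which covers both sides of the if condition. The type definitions for ArchHcub, ArchHcubDom and ArchDomNum are hypothetical minimal stand-ins, since the original headers are not shown, and the names src_term, cfadec_term, llm4dec_term and same_return_codes are illustrative. Only return codes are compared, because that is the aspect of control flow discussed above; the fields written to the domain structure are not checked.

```c
#include <stdbool.h>

/* Hypothetical minimal type definitions: the real ArchHcub, ArchHcubDom and
 * ArchDomNum types belong to the sample's original project and are not shown
 * in Figure 4. Only the fields used by the function are modeled. */
typedef int ArchDomNum;
typedef struct { int dimnnbr; } ArchHcub;
typedef struct { int dimncur; ArchDomNum bitsset; } ArchHcubDom;

/* Figure 4(a), reference source: returns 0 if in range, 1 otherwise. */
int src_term(const ArchHcub *archptr, ArchHcubDom *domnptr, ArchDomNum domnnum) {
    if (domnnum < (1 << archptr->dimnnbr)) {
        domnptr->dimncur = 0;
        domnptr->bitsset = domnnum;
        return 0;
    }
    return 1;
}

/* Figure 4(b), CFADecLLM output: inverted condition, same return codes. */
int cfadec_term(const ArchHcub *archptr, ArchHcubDom *domnptr, int termnum) {
    if (termnum >= (1 << archptr->dimnnbr))
        return 1;
    domnptr->bitsset = 0;
    domnptr->dimncur = termnum;
    return 0;
}

/* Figure 4(c), LLM4Decompile output: return codes swapped between branches. */
int llm4dec_term(const ArchHcub *arch, ArchHcubDom *term, int nbVars) {
    if (nbVars < (1 << arch->dimnnbr)) {
        term->bitsset = 0;
        term->dimncur = nbVars;
        return 1;
    }
    return 0;
}

typedef int (*term_fn)(const ArchHcub *, ArchHcubDom *, int);

/* Compares only the return codes of candidate and reference for inputs
 * 0 .. 2^dims inclusive, i.e. both sides of the branch. The fields written
 * to the domain struct are deliberately not compared. */
bool same_return_codes(term_fn candidate, int dims) {
    ArchHcub arch = { dims };
    for (int n = 0; n <= (1 << dims); n++) {
        ArchHcubDom a = {0, 0}, b = {0, 0};
        if (src_term(&arch, &a, n) != candidate(&arch, &b, n))
            return false;
    }
    return true;
}
```

With d = 3, same_return_codes(cfadec_term, 3) returns true, whereas same_return_codes(llm4dec_term, 3) returns false at the very first input (n = 0), where the source returns 0 and the LLM4Decompile output returns 1.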
(2) Readability of decompilation
Figure 5 presents a case study based on a test sample from ExeBench, demonstrating how CFADecLLM improves the readability of its decompilation, in this case even beyond that of the source code. The figure compares the decompilations produced by CFADecLLM and four baseline models for the '-O1' compiled 64-bit binary of the sample's source code. While maintaining correctness, CFADecLLM eliminates unnecessary if-else nesting by avoiding explicit else branches, which are present in both the source code (Fig.5(a), Lines 5 and 8) and the LLM4Decompile decompilation (Fig.5(c), Line 8). Reducing nested conditionals simplifies the code structure and improves readability. Additionally, CFADecLLM generates more concise code and avoids goto statements, unlike DeepSeek-v3 (Fig.5(e), Lines 13 and 20) and DeepSeek-Coder (Fig.5(f), Lines 13 and 20). The CFADecLLM decompilation consists of only 10 lines of valid code, whereas DeepSeek-v3 and DeepSeek-Coder produce significantly longer code with 32 and 31 lines, respectively. Furthermore, compared with the source code and the four baseline decompilations, CFADecLLM follows the "early return" style common in modern C code (Fig.5(b), Lines 6 and 10), reducing complexity and enhancing readability. The model also generates meaningful variable names and avoids unnecessary intermediate variables, aligning more closely with human-written code. In contrast, DeepSeek-v3 and DeepSeek-Coder (Fig.5(e) and Fig.5(f)) produce decompilations that resemble assembly code, using register names as variable names and explicitly rendering each register operation, which makes them far less comprehensible to humans. Overall, CFADecLLM produces more streamlined and readable decompilation than the source code and the four baselines. We attribute this improvement to its ability to leverage control flow graph (CFG) knowledge effectively.
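The claim that CFADecLLM's flattened structure preserves behavior can be exercised with a small differential test. The C sketch below transcribes Figure 5(a) and Figure 5(b) and runs both against stubbed versions of the driver environment, comparing the return value, the stored file descriptor, and the number of logprintf and irlink_deinit calls on every path. The stubs, the LOG_ERR value, the device name "ir0" and the helper names (Outcome, run, same_behavior) are hypothetical, since the real definitions are not shown in the figure; passing this test under stubs is evidence of equivalence on these three paths, not a proof.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the environment of Figure 5: the real hw
 * structure, LOG_ERR, logprintf(), irlink_open(), irlink_detect() and
 * irlink_deinit() come from the original project and are not shown. */
enum { LOG_ERR = 3 };
static struct { int fd; const char *device; } hw = { -1, "ir0" };

static int open_result;   /* value the stubbed irlink_open() returns */
static int detect_result; /* value the stubbed irlink_detect() returns */
static int log_calls;     /* number of logprintf() calls observed */
static int deinit_calls;  /* number of irlink_deinit() calls observed */

static void logprintf(int prio, const char *fmt, ...) { (void)prio; (void)fmt; log_calls++; }
static int irlink_open(const char *dev) { (void)dev; return open_result; }
static int irlink_detect(int fd) { (void)fd; return detect_result; }
static void irlink_deinit(void) { deinit_calls++; }

/* Figure 5(a): nested if-else structure of the original source. */
int init_source(void) {
    hw.fd = irlink_open(hw.device);
    if (hw.fd < 0) {
        logprintf(LOG_ERR, "Could not open the '%s' device", hw.device);
    } else {
        if (irlink_detect(hw.fd) == 0) {
            return 1;
        } else {
            logprintf(LOG_ERR, "Failed to detect IRLink on '%s' device", hw.device);
            irlink_deinit();
        }
    }
    return 0;
}

/* Figure 5(b): CFADecLLM's early-return structure. */
int init_early_return(void) {
    hw.fd = irlink_open(hw.device);
    if (hw.fd < 0) {
        logprintf(LOG_ERR, "Could not open the '%s' device", hw.device);
        return 0;
    }
    if (irlink_detect(hw.fd) == 0)
        return 1;
    logprintf(LOG_ERR, "Failed to detect IRLink on '%s' device", hw.device);
    irlink_deinit();
    return 0;
}

/* Observable effects of one call: return value, stored fd, side-effect counts. */
typedef struct { int ret, fd, logs, deinits; } Outcome;

static Outcome run(int (*init)(void), int open_res, int detect_res) {
    open_result = open_res;
    detect_result = detect_res;
    log_calls = deinit_calls = 0;
    hw.fd = -1;
    Outcome o;
    o.ret = init();
    o.fd = hw.fd;
    o.logs = log_calls;
    o.deinits = deinit_calls;
    return o;
}

/* True if both versions produce identical observable effects for one path. */
bool same_behavior(int open_res, int detect_res) {
    Outcome a = run(init_source, open_res, detect_res);
    Outcome b = run(init_early_return, open_res, detect_res);
    return a.ret == b.ret && a.fd == b.fd && a.logs == b.logs && a.deinits == b.deinits;
}
```

same_behavior(-1, 0), same_behavior(4, 0) and same_behavior(4, 1) cover the open-failure, success and detection-failure paths respectively; all three return true for this pair of functions.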
5 Conclusion
In this paper, we propose CFADecLLM, a novel control
flow-augmented decompilation framework based on large
language models. Unlike existing LLM-based decompila-
tion approaches, our method explicitly integrates control flow
graph (CFG) information into the decompilation process, en-
abling a more comprehensive understanding of program exe-
cution logic and improving the accuracy of high-level code
reconstruction. To achieve this, we construct a structured
dataset that combines assembly code and control flow infor-
mation, allowing for more effective learning and generaliza-
tion. Furthermore, we enhance decompilation efficiency by
designing natural language prompts that dynamically guide
the model during both training and inference.
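As a rough illustration of the kind of control-flow information that can be paired with assembly, the sketch below applies the textbook leader rules to split a linear instruction list into basic blocks and then derives CFG edges from each block's last instruction. The instruction model (Ins, InsKind), the MAX_INS limit and all function names are hypothetical simplifications; the actual dataset construction in this work, including how instructions are parsed from disassembly, is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction model: a real pipeline would parse
 * disassembly text; here each instruction only records whether it falls
 * through, branches conditionally, jumps or returns, plus a target index. */
typedef enum { INS_PLAIN, INS_COND_JUMP, INS_JUMP, INS_RET } InsKind;
typedef struct { InsKind kind; int target; } Ins;

#define MAX_INS 64

/* Marks leaders (first instructions of basic blocks) using the classic rules:
 * the entry instruction, every branch target, and every instruction that
 * follows a branch, jump or return. Returns the number of basic blocks. */
int find_leaders(const Ins *code, int n, bool leader[MAX_INS]) {
    if (n <= 0 || n > MAX_INS)
        return 0;
    for (int i = 0; i < n; i++)
        leader[i] = false;
    leader[0] = true;
    for (int i = 0; i < n; i++) {
        bool jumps = code[i].kind == INS_COND_JUMP || code[i].kind == INS_JUMP;
        if (jumps && code[i].target >= 0 && code[i].target < n)
            leader[code[i].target] = true;
        if (code[i].kind != INS_PLAIN && i + 1 < n)
            leader[i + 1] = true;
    }
    int blocks = 0;
    for (int i = 0; i < n; i++)
        blocks += leader[i];
    return blocks;
}

/* Block id of each instruction: number of leaders at or before it, minus 1. */
static void block_ids(const bool leader[MAX_INS], int n, int id[MAX_INS]) {
    int cur = -1;
    for (int i = 0; i < n; i++) {
        if (leader[i])
            cur++;
        id[i] = cur;
    }
}

/* Writes CFG edges (from block, to block) into edges; returns the edge count.
 * A block's successors depend only on its last instruction: the branch target
 * for jumps, and the next block for plain instructions and conditional jumps. */
int cfg_edges(const Ins *code, int n, int edges[][2], int max_edges) {
    bool leader[MAX_INS];
    int id[MAX_INS];
    int count = 0;
    if (find_leaders(code, n, leader) == 0)
        return 0;
    block_ids(leader, n, id);
    for (int i = 0; i < n; i++) {
        bool is_last = (i + 1 == n) || leader[i + 1];
        if (!is_last)
            continue;
        int succ[2];
        int k = 0;
        bool jumps = code[i].kind == INS_COND_JUMP || code[i].kind == INS_JUMP;
        if (jumps && code[i].target >= 0 && code[i].target < n)
            succ[k++] = id[code[i].target];
        bool falls = code[i].kind == INS_PLAIN || code[i].kind == INS_COND_JUMP;
        if (falls && i + 1 < n)
            succ[k++] = id[i + 1];
        for (int s = 0; s < k && count < max_edges; s++) {
            edges[count][0] = id[i];
            edges[count][1] = succ[s];
            count++;
        }
    }
    return count;
}
```

For a six-instruction sequence loosely modeled on the archHcubDomTerm example (two plain instructions, a conditional jump to instruction 5, a store, and two returns), the leaders are instructions 0, 3 and 5, giving three blocks and two edges: 0→2 (taken branch) and 0→1 (fall-through).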
Extensive experiments on two publicly available benchmarks, HumanEval and ExeBench, across multiple bitwidths (32-bit and 64-bit), demonstrate the superior performance of our approach. Our model significantly outperforms existing state-of-the-art methods in terms of both re-executability (Re-exe) and semantic similarity (ES), highlighting its effectiveness in real-world decompilation tasks. The results further validate that incorporating global control flow information enhances decompilation quality, particularly in scenarios involving complex control structures.
Additionally, our findings underscore the potential of large language models in decompilation tasks and highlight the importance of cross-architecture adaptability. Unlike prior approaches designed primarily for x86, our approach already handles multiple bitwidths and may generalize to other instruction set architectures, paving the way for more versatile and robust decompilation solutions.
In future work, we aim to explore the integration of addi-
tional program analysis techniques, such as data dependency
analysis and symbolic execution, to further refine the decom-
pilation process. Moreover, we plan to investigate methods
for improving model interpretability, ensuring that decom-
piled results not only achieve high execution accuracy but
also maintain human readability and logical consistency.
Overall, our work advances the state of binary decom-
pilation by bridging the gap between control flow analysis
and large language models, setting a new benchmark for
future research in the field.
References
[Armengol-Estapé et al., 2022] Jordi Armengol-Estapé, Jackson Woodruff, et al. ExeBench: An ML-scale dataset of executable C functions. In Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, MAPS 2022, pages 50–59, New York, NY, USA, 2022. Association for Computing Machinery.
[Armengol-Estapé et al., 2024] Jordi Armengol-Estapé, Jackson Woodruff, et al. SLaDe: A portable small language model decompiler for optimized assembly. In Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization, CGO '24, pages 67–80. IEEE Press, 2024.
[Basque et al., 2024] Zion Leonahenahe Basque, Ati Priya Bajaj, et al. Ahoy SAILR! There is no need to DREAM of C: A compiler-aware structuring algorithm for binary decompilation. In 33rd USENIX Security Symposium (USENIX Security 24), pages 361–378, Philadelphia, PA, August 2024. USENIX Association.
[Binutils, 2025] Binutils. objdump, 2025. [Online].
[Bossert et al., 2014] Georges Bossert, Frédéric Guihéry, et al. Towards automated protocol reverse engineering using semantic information. In Proceedings of the 9th ACM Symposium on Information, Computer and Communications Security, ASIA CCS '14, pages 51–62, New York, NY, USA, 2014.
[Brumley et al., 2013] David Brumley, JongHyup Lee, et al. Native x86 decompilation using semantics-preserving structural analysis and iterative control-flow structuring. In 22nd USENIX Security Symposium (USENIX Security 13), pages 353–368, Washington, D.C., August 2013. USENIX Association.
[Cao et al., 2022] Ying Cao, Ruigang Liang, et al. Boosting neural networks to decompile optimized binaries. In Proceedings of the 38th Annual Computer Security Applications Conference, ACSAC '22, pages 508–518, New York, NY, USA, 2022.
[Chen et al., 2021] Mark Chen, Jerry Tworek, et al. Evaluating large language models trained on code, 2021.
[Cifuentes, 1994] Cristina Garcia Cifuentes. Reverse compilation techniques. 1994.
[David et al., 2020] Yaniv David, Uri Alon, et al. Neural reverse engineering of stripped binaries using augmented control flow graphs. Proc. ACM Program. Lang., 4(OOPSLA), November 2020.
[DeepSeek-AI et al., 2024] DeepSeek-AI, Aixin Liu, Bei Feng, et al. DeepSeek-V3 technical report. arXiv, abs/2412.19437, 2024.
[Egele et al., 2008] Manuel Egele, Theodoor Scholte, et al. A survey on automated dynamic malware-analysis techniques and tools. ACM Comput. Surv., 44(2), March 2008.
[Federico et al., 2018] Alessandro Di Federico, Pietro Fezzardi, et al. rev.ng: A multi-architecture framework for reverse engineering and vulnerability discovery. In 2018 International Carnahan Conference on Security Technology (ICCST), pages 1–5, 2018.
[Feng et al., 2024] Yunlong Feng, Dechuan Teng, et al. Self-constructed context decompilation with fined-grained alignment enhancement, 2024.
[Fu et al., 2019] Cheng Fu, Huili Chen, et al. Coda: An end-to-end neural program decompiler. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2019. Curran Associates Inc.
[Gao et al., 2021] Han Gao, Shaoyin Cheng, et al. A lightweight framework for function name reassignment based on large-scale stripped binaries. In Proceedings of the 30th ACM SIGSOFT International Symposium on Software Testing and Analysis, ISSTA 2021, pages 607–619, New York, NY, USA, 2021.
[Ghidra, 2025] Ghidra. Ghidra software reverse engineering framework, 2025. Accessed: 2025-01-14.
[Guo et al., 2024] Daya Guo, Qihao Zhu, et al. DeepSeek-Coder: When the large language model meets programming – the rise of code intelligence, 2024.
[Gussoni et al., 2020] Andrea Gussoni, Alessandro Di Federico, et al. A comb for decompiled C code. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, ASIA CCS '20, pages 637–651, New York, NY, USA, 2020.
[He et al., 2018] Jingxuan He, Pesho Ivanov, et al. Debin: Predicting debug information in stripped binaries. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS '18, pages 1667–1680, New York, NY, USA, 2018.
[Hex-Rays, 2025] Hex-Rays. Hex-Rays decompiler, 2025. Accessed: 2025-01-14.
[Hosseini and Dolan-Gavitt, 2022] Iman Hosseini and Brendan Dolan-Gavitt. Beyond the C: Retargetable decompilation using neural machine translation. In Proceedings 2022 Workshop on Binary Analysis Research, BAR 2022. Internet Society, 2022.
[Hu et al., 2024] Peiwei Hu, Ruigang Liang, et al. DeGPT: Optimizing decompiler output with LLM. In Proceedings 2024 Network and Distributed System Security Symposium, 2024. https://api.semanticscholar.org/CorpusID:267622140.
[Jiang et al., 2024] Nan Jiang, Chengxiao Wang, et al. Nova: Generative language models for assembly code with hierarchical attention and contrastive learning. Submitted to The Thirteenth International Conference on Learning Representations, 2024. Under review.
[Jin et al., 2022] Xin Jin, Kexin Pei, et al. SymLM: Predicting function names in stripped binaries via context-sensitive execution-aware code embeddings. Proceedings of the 2022 ACM SIGSAC Conference on Computer and Communications Security, 2022.
[Katz et al., 2018a] Deborah S. Katz, Jason Ruchti, et al. Using recurrent neural networks for decompilation. In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 346–356, 2018.
[Katz et al., 2018b] Deborah S. Katz, Jason Ruchti, et al. Using recurrent neural networks for decompilation. In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 346–356, 2018.
[Katz et al., 2019] Omer Katz, Yuval Olshaker, Yoav Goldberg, and Eran Yahav. Towards neural decompilation, 2019.
[Kruegel et al., 2004] Christopher Kruegel, William Robertson, et al. Detecting kernel-level rootkits through binary analysis. In Proceedings of the 20th Annual Computer Security Applications Conference, ACSAC '04, pages 91–100, USA, 2004. IEEE Computer Society.
[Liang et al., 2021] Ruigang Liang, Ying Cao, et al. Neutron: An attention-based neural decompiler. Cybersecurity, 4:5, March 2021.
[OpenAI et al., 2024] OpenAI, Josh Achiam, Steven Adler, et al. GPT-4 technical report, 2024.
[Szekeres et al., 2013] László Szekeres, Mathias Payer, et al. SoK: Eternal war in memory. In 2013 IEEE Symposium on Security and Privacy, pages 48–62, 2013.
[Tan et al., 2024] Hanzhuo Tan, Qi Luo, et al. LLM4Decompile: Decompiling binary code with large language models, 2024.
[Wong et al., 2023] Wai Kin Wong, Huaijin Wang, et al. Refining decompiled C code with large language models, 2023.
[Yakdan et al., 2015] Khaled Yakdan, Sebastian Eschweiler, et al. No more gotos: Decompilation using pattern-independent control-flow structuring and semantic-preserving transformations. In Network and Distributed System Security Symposium, 2015.
[Yakdan et al., 2016] Khaled Yakdan, Sergej Dechand, et al. Helping Johnny to analyze malware: A usability-optimized decompiler and malware analysis user study. In 2016 IEEE Symposium on Security and Privacy (SP), pages 158–177, 2016.
[Yang et al., 2022] Jia Yang, Cai Fu, et al. Codee: A tensor embedding scheme for binary code search. IEEE Transactions on Software Engineering, 48(7):2224–2244, 2022.
[Zhang et al., 2021] Zhuo Zhang, Yapeng Ye, et al. Osprey: Recovery of variable and data structure via probabilistic analysis for stripped binary. In 2021 IEEE Symposium on Security and Privacy (SP), pages 813–832, 2021.
[Zhang et al., 2024a] Runze Zhang, Ying Cao, et al. Optimizing decompiler output by eliminating redundant data flow in self-recursive inlining. In 2024 IEEE International Conference on Software Maintenance and Evolution (ICSME), pages 38–49, Los Alamitos, CA, USA, October 2024. IEEE Computer Society.
[Zhang et al., 2024b] Xiaoling Zhang, Zhengzi Xu, et al. Enhancing function name prediction using votes-based name tokenization and multi-task learning. Proc. ACM Softw. Eng., 1(FSE), July 2024.
[Zhu et al., 2024] Chang Zhu, Ziyang Li, et al. Tygr: Type inference on stripped binaries using graph neural networks. In USENIX Security Symposium, 2024.
1  int irlink_init(void) {
2      hw.fd = irlink_open(hw.device);
3      if (hw.fd < 0) {
4          logprintf(LOG_ERR, "Could not open the '%s' device", hw.device);
5      } else {
6          if (irlink_detect(hw.fd) == 0) {
7              return 1;
8          } else {
9              logprintf(LOG_ERR, "Failed to detect IRLink on '%s' device", hw.device);
10             irlink_deinit();
11         }
12     }
13     return 0;
14 }
(a) Source code

1  int irlink_init(void) {
2      hw.fd = irlink_open(hw.device);
3      if (hw.fd < 0) {
4          logprintf(LOG_ERR, "Could not open the '%s' device", hw.device);
5
6          return 0;
7      }
8
9      if (irlink_detect(hw.fd) == 0)
10         return 1;
11
12     logprintf(LOG_ERR, "Failed to detect IRLink on '%s' device", hw.device);
13     irlink_deinit();
14
15     return 0;
16 }
(b) CFADecLLM

1  int irlink_init(void) {
2      hw.fd = irlink_open(hw.device);
3
4      if (hw.fd < 0) {
5
6          logprintf(LOG_ERR, "irlink_open", errno);
7
8      } else {
9
10         if (irlink_detect(hw.fd) == 0)
11             return 1;
12
13         logprintf(LOG_ERR, "irlink_detect", errno);
14         irlink_deinit();
15     }
16
17     return 0;
18 }
(c) LLM4Decompile
1  int irlink_init() {
2      int result;
3      int status = irlink_open(hw.fd);
4      hw.fd = status;
5      if (hw.fd >= 0) {
6          status = irlink_detect(hw.fd);
7          if (status != 0) {
8              return 1;
9          }
10     } else {
11
12         // Some initialization logic in case of negative status
13         irlink_deinit();
14     }
15
16     // Return 0 if everything is initialized correctly
17     return 0;
18 }
(d) gpt-4o

1  int irlink_init() {
2      int eax = *(int*)0x0;
3      int edi = eax;
4      void *rax = (void *)96;
5      *(void **)0x0 = rax;
6      rax = *(void **)0x0;
7      if ((long)rax < 0) {
8          int edx = *(int*)0x0;
9          eax = *(int*)0x0;
10         void *rsi = (void *)0x0;
11         edi = eax;
12         rax = (void *)c3;
13         goto label104;
14     } else {
15         rax = *(void **)0x0;
16         void *rdi = rax;
17         rax = (void *)d4;
18         if (rax != 0) {
19             eax = 1;
20             goto label109;
21         } else {
22             edx = *(int*)0x0;
23             eax = *(int*)0x0;
24             rsi = (void *)0x0;
25             edi = eax;
26             rax = (void *)fa;
27             eax = 0;
28         }
29     }
30 label104:
31     eax = 0;
32 label109:
33     return eax;
34 }
(e) Deepseek-v3

1  int irlink_init() {
2      int eax = *(int*)0x0;
3      int edi = eax;
4      void *rax = (void *)96;
5      *(void **)0x0 = rax;
6      rax = *(void **)0x0;
7      if ((long)rax < 0) {
8          int edx = *(int*)0x0;
9          eax = *(int*)0x0;
10         void *rsi = (void *)0x0;
11         edi = eax;
12         rax = (void *)c3;
13         goto label_104;
14     }
15     rax = *(void **)0x0;
16     void *rdi = rax;
17     rax = (void *)d4;
18     if (rax != 0) {
19         eax = 1;
20         goto label_109;
21     }
22     int edx = *(int*)0x0;
23     eax = *(int*)0x0;
24     void *rsi = (void *)0x0;
25     edi = eax;
26     rax = (void *)fa;
27 label_104:
28     eax = 0;
29     rax = (void *)104;
30 label_109:
31     eax = 0;
32     return eax;
33 }
(f) Deepseek Coder
Figure 5: Case Study on Readable Decompilation. Presented are the source code (a) of the case sample alongside the decompilations of
CFADecLLM (b), LLM4Decompile (c), GPT-4o (d), DeepSeek-v3 (e), and DeepSeek-Coder (f).