We first construct a validation set of 200 functions and generate 1,600 samples from it (200 functions × 2 bit widths × 4 compiler optimization levels), which we use to test and refine prompts. We refine the prompts according to the average re-execution performance of three general-purpose LLMs on these samples. The design process and findings are as follows:
**Initial Prompt Design:** The prompt includes assembly code, control flow graph (CFG) information, explanations of CFG components (e.g., “edges between two connected cfg_blocks”), Data Mapping information and its component explanations (e.g., “stack variables (size and relative offset)”), with the corresponding source code provided as labels.
**Content Simplification:** We find that the initial prompt contains redundant information: providing the CFG and Data Mapping information alone is sufficient to support decompilation.
**Role Definition of Assembly Code:** We attempt to explicitly define the properties of assembly code in the prompt (e.g., “instruction set architecture,” “compiler optimization level”), but experiments show that LLMs can leverage prior knowledge to infer these properties without explicit specification.
**Model Role Setting:** Defining the model’s role remains important, but it should be concise (e.g., “You are a professional decompilation assistant.”).
**Explanation of Information Function:** Clearly indicating the role of CFG and Data Mapping in the prompt can improve generation quality (e.g., “defines the correspondence between data labels and their actual values”).
**Output Constraints:** The prompt specifies that the generated code must be structurally complete, syntactically correct, and consistent with the assembly code in both logic and data.
**Special Case Handling:** When the CFG has only one node, the input is effectively a sequence of assembly instructions rather than a graph, and the prompt needs to account for this distinction.
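The findings above can be combined into a small prompt builder. The following is a minimal sketch, not the prompt actually used: the `CfgBlock` and `DecompileInput` types, the `build_prompt` function, and most of the wording are hypothetical. Only the role sentence and the phrase describing Data Mapping are taken from the examples quoted above.

```python
from dataclasses import dataclass, field


@dataclass
class CfgBlock:
    """One CFG node: a label plus its assembly instructions (hypothetical layout)."""
    label: str
    instructions: list[str]


@dataclass
class DecompileInput:
    """Hypothetical container for everything the prompt needs."""
    blocks: list[CfgBlock]
    edges: list[tuple[str, str]] = field(default_factory=list)  # (source, target) labels
    data_mapping: dict[str, str] = field(default_factory=dict)  # data label -> actual value


# Concise role definition, quoted from the example in the text.
ROLE = "You are a professional decompilation assistant."

# Output constraints; the exact wording here is an assumption.
CONSTRAINTS = (
    "The generated code must be structurally complete, syntactically correct, "
    "and consistent with the assembly code in both logic and data."
)


def build_prompt(inp: DecompileInput) -> str:
    """Assemble a prompt from role, CFG, Data Mapping, and output constraints."""
    parts = [ROLE]
    if len(inp.blocks) == 1:
        # Special case: a single node is a plain instruction sequence, not a graph.
        parts.append("The input is a sequence of assembly instructions:")
        parts.extend(inp.blocks[0].instructions)
    else:
        # State what the CFG is for before listing its blocks and edges.
        parts.append("The CFG lists the blocks of the function and the edges "
                     "between two connected blocks:")
        for block in inp.blocks:
            parts.append(f"[{block.label}]")
            parts.extend(block.instructions)
        parts.append("Edges:")
        parts.extend(f"{src} -> {dst}" for src, dst in inp.edges)
    if inp.data_mapping:
        # State what the Data Mapping is for, as recommended in the text.
        parts.append("The Data Mapping defines the correspondence between "
                     "data labels and their actual values:")
        parts.extend(f"{label} = {value}" for label, value in inp.data_mapping.items())
    parts.append(CONSTRAINTS)
    return "\n".join(parts)
```

Calling `build_prompt` on an input with one block yields a prompt with no edge list, while an input with several blocks includes both the block listing and the edges.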
Through this series of optimizations, we obtain a concise, efficient prompt design that generalizes well across diverse LLMs.
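The validation setup and the selection criterion described at the start of this section can be sketched as follows. This is an illustration under stated assumptions: the concrete bit widths (32 and 64) and optimization levels (O0 to O3) are guesses, since the text gives only their counts, and averaging per-model pass rates is one possible reading of "average re-execution performance". All function names are hypothetical.

```python
from itertools import product
from statistics import mean

NUM_FUNCTIONS = 200
BIT_WIDTHS = (32, 64)                  # assumed values; the text only says "two"
OPT_LEVELS = ("O0", "O1", "O2", "O3")  # assumed values; the text only says "four"


def build_samples():
    """Enumerate one sample per (function, bit width, optimization level)."""
    return list(product(range(NUM_FUNCTIONS), BIT_WIDTHS, OPT_LEVELS))


def average_reexecution_rate(results_per_model):
    """Average the per-model re-execution pass rates.

    results_per_model maps a model name to a list of booleans, one per sample,
    where True means the decompiled code re-executed correctly.
    """
    return mean(sum(passed) / len(passed) for passed in results_per_model.values())


def pick_best_prompt(results_per_prompt):
    """Return the prompt variant with the highest average re-execution rate."""
    return max(
        results_per_prompt,
        key=lambda name: average_reexecution_rate(results_per_prompt[name]),
    )


samples = build_samples()
print(len(samples))  # 200 * 2 * 4 = 1600
```

In this sketch, each prompt variant is scored on the same 1,600 samples with each of the three models, and the variant with the best average is carried forward to the next refinement step.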