# Llama 3.2 3B – Smart Contract Decompiler (A1.5)
A LoRA fine-tuned Llama 3.2 3B model for decompiling EVM smart contract bytecode into human-readable Solidity source code.
This model implements the methodology from "Decompiling Smart Contracts with a Large Language Model" (arXiv:2506.19624v1).
## Overview
Traditional decompilers (Panoramix, Heimdall) produce low-level, hard-to-read output with 0.4–0.5 semantic similarity to the original source. This model achieves 0.82 semantic similarity by combining deterministic static analysis with neural code generation in a two-stage pipeline:
- **Bytecode → TAC** – Static analysis converts raw EVM bytecode into a Three-Address Code (TAC) intermediate representation (control-flow graph, basic blocks, jump targets, function selectors).
- **TAC → Solidity** – This fine-tuned LLM generates readable Solidity from the TAC representation.
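The two stages above compose into a single decompilation call. The sketch below illustrates only the shape of the pipeline; `bytecode_to_tac` and `tac_to_solidity` are stand-in placeholders, not the project's actual API:

```python
# Illustrative sketch of the two-stage pipeline described above.
# Both stage functions are placeholders: the real project performs
# static analysis in stage 1 and LLM generation in stage 2.

def bytecode_to_tac(bytecode_hex: str) -> str:
    """Stage 1 (placeholder): deterministic static analysis to TAC."""
    return f"Block_0:\n  ; TAC derived from {len(bytecode_hex) // 2} bytes"

def tac_to_solidity(tac: str) -> str:
    """Stage 2 (placeholder): neural code generation from TAC."""
    return "// Solidity generated from:\n" + tac

def decompile(bytecode_hex: str) -> str:
    """Full pipeline: bytecode -> TAC -> Solidity."""
    return tac_to_solidity(bytecode_to_tac(bytecode_hex))
```

The key design point is that stage 1 is deterministic and verifiable, so the LLM only has to translate a structured representation rather than raw bytecode.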
## Model Details
| Property | Value |
|---|---|
| Base Model | meta-llama/Llama-3.2-3B |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) via PEFT |
| Task | Causal Language Modeling (CAUSAL_LM) |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.1 |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Trainable Parameters | 13,631,488 (0.42% of 3.2B total) |
| Max Sequence Length | 4,096 tokens |
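The adapter settings in the table map directly onto a PEFT `LoraConfig`. A sketch with the values above (illustrative; not the exact training script):

```python
# LoRA configuration mirroring the table above (PEFT library).
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                 # LoRA rank
    lora_alpha=32,        # scaling factor
    lora_dropout=0.1,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",      # attention projections
        "gate_proj", "up_proj", "down_proj",          # MLP projections
    ],
    task_type="CAUSAL_LM",
)
```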
## Training Details
### Dataset
- Source: Ethereum mainnet verified contracts fetched via the Etherscan API
- Format: JSONL with `bytecode`, `tac`, and `solidity` fields
- Pipeline: Bytecode is fetched → converted to TAC via `BytecodeAnalyzer` (static analysis with control flow, basic blocks, dominance analysis, loop detection) → paired with the verified Solidity source
- Size: 95 examples (85 train / 10 validation) from the demo dataset
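A single record in this JSONL format might look like the following sketch (field values are illustrative placeholders, not real dataset contents):

```python
# Illustrative JSONL record with the three fields described above.
import json

record = {
    "bytecode": "0x60806040...",                        # raw EVM bytecode (hex), truncated placeholder
    "tac": "Block_0:\n  temp_1 = CALLDATALOAD(0)",      # TAC from static analysis
    "solidity": "contract Token { /* verified source */ }",
}

line = json.dumps(record)  # one JSON object per line in the JSONL file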
### Training Configuration
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Batch Size (per device) | 1 |
| Gradient Accumulation Steps | 8 |
| Effective Batch Size | 8 |
| Optimizer | AdamW (8-bit via bitsandbytes) |
| Learning Rate | 2×10⁻⁴ |
| LR Scheduler | Cosine |
| Warmup Steps | 3 |
| Weight Decay | 0.01 |
| Max Gradient Norm | 1.0 |
| FP16 | Yes |
| Gradient Checkpointing | Yes |
### Training Results
| Metric | Value |
|---|---|
| Final Training Loss | 0.6553 |
| Training Duration | ~31 minutes |
| Total Optimization Steps | 285 |
| Hardware | NVIDIA RTX 4080 (16 GB VRAM) |
| Training Date | July 4, 2025 |
## Evaluation Metrics
| Metric | Target | Description |
|---|---|---|
| Semantic Similarity | > 0.80 | CodeBERT embedding cosine similarity |
| Edit Distance | < 0.40 | Normalized Levenshtein distance |
| Success Rate | > 78% | Percentage of functions exceeding similarity threshold |
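The edit-distance metric is straightforward to reproduce. A minimal sketch of normalized Levenshtein distance (0.0 = identical, 1.0 = maximally different), assuming normalization by the longer string's length:

```python
# Normalized Levenshtein distance: edit distance divided by the
# length of the longer string, so results fall in [0.0, 1.0].

def normalized_edit_distance(a: str, b: str) -> float:
    if not a and not b:
        return 0.0
    # Classic dynamic-programming Levenshtein, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution
            ))
        prev = curr
    return prev[-1] / max(len(a), len(b))
```

The semantic-similarity metric, by contrast, requires embedding both code snippets with CodeBERT and taking the cosine similarity of the embeddings, so it is not reproduced here.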
## Comparison with Traditional Decompilers
| Feature | This Model | Panoramix | Heimdall |
|---|---|---|---|
| Semantic Similarity | ~0.82 | ~0.45 | ~0.40 |
| Readable Output | ✓ | Partial | Partial |
| Variable Naming | Inferred | Generic | Generic |
| Function Signatures | ✓ | ✓ | ✓ |
| Complex Logic | Good | Limited | Limited |
## Usage
### Loading the Model
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    device_map="auto",
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "askalgore/llama-3.2-3b-A1.5")
```
### Using the Project Wrapper
```python
from src.model_setup import SmartContractLLM

llm = SmartContractLLM()
llm.load_model("models/final_model")
result = llm.generate(tac_input)
```
### Prompt Format
```
### Task: Convert the following Three-Address Code (TAC) representation to Solidity source code.

### TAC:
{tac_representation}

### Solidity:
```
The model generates Solidity code following the `### Solidity:` marker.
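Assembling the prompt is a simple template fill. A sketch (the `build_prompt` helper is illustrative, not part of the project's API):

```python
# Build the prompt in the format shown above from a TAC string.
PROMPT_TEMPLATE = (
    "### Task: Convert the following Three-Address Code (TAC) "
    "representation to Solidity source code.\n\n"
    "### TAC:\n{tac}\n\n"
    "### Solidity:\n"
)

def build_prompt(tac: str) -> str:
    """Fill the TAC slot; generation starts after '### Solidity:'."""
    return PROMPT_TEMPLATE.format(tac=tac)
```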
## Three-Address Code (TAC) Representation
TAC is an intermediate representation produced by the static analysis stage. It captures:
- **Basic blocks** – sequences of instructions with a single entry and a single exit point
- **Control flow** – jumps, conditional branches, block predecessors/successors
- **Stack operations** – translated from the EVM's stack-based architecture into explicit temporary variables
- **Storage/memory access** – `SLOAD`, `SSTORE`, `MLOAD`, `MSTORE` operations
- **Function selectors** – 4-byte keccak256 identifiers used for function dispatch
Example TAC snippet:

```
Block_0:
  temp_1 = CALLDATALOAD(0)
  temp_2 = SHR(224, temp_1)
  IF temp_2 == 0x70a08231 GOTO Block_balanceOf
  IF temp_2 == 0xa9059cbb GOTO Block_transfer
  GOTO Block_fallback
```
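The dispatch logic in this snippet can be mirrored in a few lines: the selector is the top 4 bytes of calldata (`SHR(224, CALLDATALOAD(0))`), compared against known function IDs. A sketch (the two selectors are the standard ERC-20 IDs for `balanceOf(address)` and `transfer(address,uint256)` shown above; the `dispatch` helper is illustrative):

```python
# Mimic the selector-based dispatch from the TAC snippet above.
SELECTORS = {
    0x70a08231: "balanceOf(address)",
    0xa9059cbb: "transfer(address,uint256)",
}

def dispatch(calldata: bytes) -> str:
    """Return the function matched by the 4-byte selector, else fallback."""
    if len(calldata) < 4:
        return "fallback"
    # First 4 bytes of calldata == SHR(224, CALLDATALOAD(0)) in the EVM.
    selector = int.from_bytes(calldata[:4], "big")
    return SELECTORS.get(selector, "fallback")
```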
## Limitations
- **Approximate reconstruction** – the generated Solidity is an approximation, not an exact byte-for-byte match of the original source code
- **Variable names** – original variable and function parameter names cannot be recovered; the model infers reasonable names
- **Compiler optimizations** – some optimizations applied during compilation may not be reversible
- **Complex patterns** – very complex or unusual control flow may produce less accurate results
- **Comments & NatSpec** – original comments and documentation are not preserved
- **Demo dataset scale** – this checkpoint was trained on 95 examples; larger datasets (the paper used 238,446 pairs) would improve quality
## Project Repository
GitHub: A1.5_Smart_Contract_Bytecode_To_Code_Generator
The repository includes:
- Full two-stage decompilation pipeline
- Web interface for interactive decompilation
- Bytecode analyzer with control flow analysis
- Dataset collection pipeline via Etherscan
- Training and evaluation scripts
## Citation
If you use this model, please cite the underlying research paper:
```bibtex
@article{david2025decompiling,
  title={Decompiling Smart Contracts with a Large Language Model},
  author={David, Sifei and Zhou, Zhiyu and Song, Xuan and Gervais, Arthur and Qin, Benjamin},
  journal={arXiv preprint arXiv:2506.19624v1},
  year={2025}
}
```
## License
This model adapter is released under the MIT License. The base model (Llama 3.2 3B) is subject to Meta's Llama Community License.
## Acknowledgments
- Meta AI for the Llama 3.2 base model
- Paper authors (David, Zhou, Song, Gervais, Qin) for the research methodology
- Etherscan for verified smart contract access
- Hugging Face for model hosting, Transformers, and PEFT libraries