# Llama 3.2 3B – Smart Contract Decompiler (A1.5)

A LoRA fine-tuned Llama 3.2 3B model for decompiling EVM smart contract bytecode into human-readable Solidity source code.

This model implements the methodology from "Decompiling Smart Contracts with a Large Language Model" (arXiv:2506.19624v1).

## Overview

Traditional decompilers (Panoramix, Heimdall) produce low-level, hard-to-read output with 0.4–0.5 semantic similarity to the original source. This model reaches 0.82 semantic similarity by combining deterministic static analysis with neural code generation in a two-stage pipeline:

1. **Bytecode → TAC**: Static analysis converts raw EVM bytecode into a Three-Address Code (TAC) intermediate representation (control-flow graph, basic blocks, jump targets, function selectors).
2. **TAC → Solidity**: This fine-tuned LLM generates readable Solidity from the TAC representation.
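
A minimal sketch of how the two stages compose. The `analyzer` and `llm` objects stand in for the project's `BytecodeAnalyzer` and `SmartContractLLM`; the method names here are illustrative assumptions, not the repository's exact API:

```python
def decompile(bytecode_hex: str, analyzer, llm) -> str:
    """Two-stage decompilation: bytecode -> TAC -> Solidity."""
    tac = analyzer.analyze(bytecode_hex)  # stage 1: deterministic static analysis
    return llm.generate(tac)              # stage 2: neural code generation
```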

## Model Details

| Property | Value |
|---|---|
| Base Model | meta-llama/Llama-3.2-3B |
| Fine-tuning Method | LoRA (Low-Rank Adaptation) via PEFT |
| Task | Causal Language Modeling (`CAUSAL_LM`) |
| LoRA Rank (r) | 16 |
| LoRA Alpha | 32 |
| LoRA Dropout | 0.1 |
| Target Modules | `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj` |
| Trainable Parameters | 13,631,488 (0.42% of 3.2B total) |
| Max Sequence Length | 4,096 tokens |
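
The adapter hyperparameters in the table map directly onto a PEFT `LoraConfig`; a sketch of the corresponding configuration (assuming the `peft` package):

```python
from peft import LoraConfig

# Configuration matching the values in the table above
lora_config = LoraConfig(
    r=16,                   # LoRA rank
    lora_alpha=32,          # scaling factor
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",  # causal language modeling
)
```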

## Training Details

### Dataset

- **Source:** Ethereum mainnet verified contracts fetched via the Etherscan API
- **Format:** JSONL with `bytecode`, `tac`, and `solidity` fields
- **Pipeline:** Bytecode is fetched, converted to TAC via `BytecodeAnalyzer` (static analysis with control flow, basic blocks, dominance analysis, and loop detection), then paired with the verified Solidity source
- **Size:** 95 examples (85 train / 10 validation) from the demo dataset
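
The JSONL format stores one training pair per line; a sketch of reading such a file (the field names come from the card, the file path is hypothetical):

```python
import json

def load_pairs(path: str):
    """Read (tac, solidity) training pairs from a JSONL dataset file."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            # each record carries the raw bytecode, its TAC form,
            # and the verified Solidity source
            pairs.append((record["tac"], record["solidity"]))
    return pairs
```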

Training Configuration

Parameter Value
Epochs 3
Batch Size (per device) 1
Gradient Accumulation Steps 8
Effective Batch Size 8
Optimizer AdamW (8-bit via bitsandbytes)
Learning Rate 2Γ—10⁻⁴
LR Scheduler Cosine
Warmup Steps 3
Weight Decay 0.01
Max Gradient Norm 1.0
FP16 Yes
Gradient Checkpointing Yes

### Training Results

| Metric | Value |
|---|---|
| Final Training Loss | 0.6553 |
| Training Duration | ~31 minutes |
| Total Optimization Steps | 285 |
| Hardware | NVIDIA RTX 4080 (16 GB VRAM) |
| Training Date | July 4, 2025 |

## Evaluation Metrics

| Metric | Target | Description |
|---|---|---|
| Semantic Similarity | > 0.80 | CodeBERT embedding cosine similarity |
| Edit Distance | < 0.40 | Normalized Levenshtein distance |
| Success Rate | > 78% | Percentage of functions exceeding the similarity threshold |
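
The edit-distance metric is a normalized Levenshtein distance, which can be sketched in pure Python as below (the CodeBERT-based semantic similarity additionally requires an embedding model, so it is omitted here):

```python
def normalized_levenshtein(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer string's length (0.0 = identical)."""
    if not a and not b:
        return 0.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1] / max(len(a), len(b))
```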

## Comparison with Traditional Decompilers

| Feature | This Model | Panoramix | Heimdall |
|---|---|---|---|
| Semantic Similarity | ~0.82 | ~0.45 | ~0.40 |
| Readable Output | ✅ | Partial | Partial |
| Variable Naming | Inferred | Generic | Generic |
| Function Signatures | ✅ | ✅ | ✅ |
| Complex Logic | Good | Limited | Limited |

## Usage

### Loading the Model

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model in 8-bit to fit consumer GPUs
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-3B",
    device_map="auto",
    load_in_8bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-3B")

# Attach the fine-tuned LoRA adapter
model = PeftModel.from_pretrained(base_model, "askalgore/llama-3.2-3b-A1.5")
```

### Using the Project Wrapper

```python
from src.model_setup import SmartContractLLM

llm = SmartContractLLM()
llm.load_model("models/final_model")
result = llm.generate(tac_input)
```

### Prompt Format

```text
### Task: Convert the following Three-Address Code (TAC) representation to Solidity source code.

### TAC:
{tac_representation}

### Solidity:
```

The model generates Solidity code following the `### Solidity:` marker.
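
The prompt template can be assembled, and the completion extracted, with a couple of string helpers (the function names here are illustrative, not the project's API):

```python
PROMPT_TEMPLATE = (
    "### Task: Convert the following Three-Address Code (TAC) "
    "representation to Solidity source code.\n\n"
    "### TAC:\n{tac}\n\n"
    "### Solidity:\n"
)

def build_prompt(tac: str) -> str:
    """Fill the TAC representation into the prompt template."""
    return PROMPT_TEMPLATE.format(tac=tac)

def extract_solidity(generated: str) -> str:
    """Keep only the text the model produced after the Solidity marker."""
    marker = "### Solidity:"
    return generated.split(marker, 1)[-1].strip()
```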

## Three-Address Code (TAC) Representation

TAC is an intermediate representation produced by the static analysis stage. It captures:

- **Basic blocks**: sequences of instructions with a single entry and exit point
- **Control flow**: jumps, conditional branches, block predecessors/successors
- **Stack operations**: translated from the EVM's stack-based architecture into explicit temporary variables
- **Storage/memory access**: `SLOAD`, `SSTORE`, `MLOAD`, `MSTORE` operations
- **Function selectors**: 4-byte keccak256 identifiers used for function dispatch

Example TAC snippet:

```text
Block_0:
  temp_1 = CALLDATALOAD(0)
  temp_2 = SHR(224, temp_1)
  IF temp_2 == 0x70a08231 GOTO Block_balanceOf
  IF temp_2 == 0xa9059cbb GOTO Block_transfer
  GOTO Block_fallback
```
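
The first two TAC lines implement EVM function dispatch: `CALLDATALOAD(0)` reads the first 32 bytes of calldata, and `SHR(224, temp_1)` shifts out everything except the leading 4-byte selector. The same arithmetic in plain Python:

```python
def extract_selector(calldata: bytes) -> int:
    """Mirror temp_2 = SHR(224, CALLDATALOAD(0)): keep the top 4 bytes."""
    # CALLDATALOAD(0) reads a 32-byte word, zero-padded on the right
    word = int.from_bytes(calldata[:32].ljust(32, b"\x00"), "big")
    return word >> 224  # drop the low 28 bytes, leaving the selector
```

For example, calldata for `balanceOf(address)` starts with the selector bytes `70a08231`, which this function recovers as `0x70a08231`.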

## Limitations

1. **Approximate reconstruction**: the generated Solidity is an approximation, not an exact byte-for-byte match of the original source code
2. **Variable names**: original variable and function parameter names cannot be recovered; the model infers plausible names instead
3. **Compiler optimizations**: some optimizations applied during compilation may not be reversible
4. **Complex patterns**: very complex or unusual control flow may produce less accurate results
5. **Comments and NatSpec**: original comments and documentation are not preserved
6. **Demo dataset scale**: this checkpoint was trained on only 95 examples; larger datasets (the paper used 238,446 pairs) would improve quality

## Project Repository

🔗 GitHub: A1.5_Smart_Contract_Bytecode_To_Code_Generator

The repository includes:

- Full two-stage decompilation pipeline
- Web interface for interactive decompilation
- Bytecode analyzer with control-flow analysis
- Dataset collection pipeline via Etherscan
- Training and evaluation scripts

## Citation

If you use this model, please cite the underlying research paper:

```bibtex
@article{david2025decompiling,
  title={Decompiling Smart Contracts with a Large Language Model},
  author={David, Sifei and Zhou, Zhiyu and Song, Xuan and Gervais, Arthur and Qin, Benjamin},
  journal={arXiv preprint arXiv:2506.19624v1},
  year={2025}
}
```

## License

This model adapter is released under the MIT License. The base model (Llama 3.2 3B) is subject to Meta's Llama Community License.

## Acknowledgments

- Meta AI for the Llama 3.2 base model
- Paper authors (David, Zhou, Song, Gervais, Qin) for the research methodology
- Etherscan for verified smart contract access
- Hugging Face for model hosting, Transformers, and PEFT libraries