Fine-tuning AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored to translate EVM Three-Address Code (TAC) into Huff assembly and Solidity source code.
Trained model → demeleww/evm-tac-decompiler (private)
Pre-built dataset → demeleww/evm-decompiler-dataset (public)
The model is trained to never guess code. Every training example is verified:
- The function selector (keccak256(canonical_sig)[:4]) appears in the bytecode dispatch table (PUSH4 <sel> EQ PUSH2 <dest> JUMPI)

| Detail | Value |
|---|---|
| Base model | AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored |
| Architecture | qwen3_5, hybrid: 48 GatedDeltaNet + 16 full attention layers (64 total) |
| Method | QLoRA (8-bit LLM.int8 or 4-bit NF4) |
| LoRA rank | r=32, α=64 |
| LoRA targets | q_proj, k_proj, v_proj, o_proj + in_proj_qkv, in_proj_z, out_proj + gate_proj, up_proj, down_proj |
| Dataset | demeleww/evm-decompiler-dataset (selector-aligned, 4 task types) |
| Sequence length | 4096 tokens |
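The selector check described above can be illustrated with a small scanner. This is a minimal sketch, not the code in build_dataset.py: the helper names `split_instructions` and `dispatch_table` are hypothetical, and it assumes the dispatcher uses exactly the PUSH4 <sel> EQ PUSH2 <dest> JUMPI sequence. It walks the bytecode instruction by instruction so that bytes inside PUSH immediates are never mistaken for opcodes.

```python
from typing import List, Tuple

# EVM opcodes relevant to the dispatcher pattern.
PUSH1, PUSH32 = 0x60, 0x7F
PUSH2, PUSH4, EQ, JUMPI = 0x61, 0x63, 0x14, 0x57


def split_instructions(code: bytes) -> List[Tuple[int, bytes]]:
    """Walk the bytecode linearly and return (opcode, immediate) pairs."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        # PUSH1..PUSH32 carry 1..32 immediate bytes; everything else carries none.
        width = op - PUSH1 + 1 if PUSH1 <= op <= PUSH32 else 0
        out.append((op, code[pc + 1 : pc + 1 + width]))
        pc += 1 + width
    return out


def dispatch_table(bytecode_hex: str) -> List[Tuple[str, int]]:
    """Return (selector_hex, jump_destination) for each dispatcher entry found."""
    clean = bytecode_hex.strip()
    if clean.startswith("0x"):
        clean = clean[2:]
    ins = split_instructions(bytes.fromhex(clean))
    found = []
    for i in range(len(ins) - 3):
        (op0, sel), (op1, _), (op2, dest), (op3, _) = ins[i : i + 4]
        if (op0, op1, op2, op3) == (PUSH4, EQ, PUSH2, JUMPI) and len(sel) == 4 and len(dest) == 2:
            found.append((sel.hex(), int.from_bytes(dest, "big")))
    return found


# Dispatcher fragment taken from the inference example in this README.
SAMPLE = (
    "0x608060405234801561001057600080fd5b506004361061004c5760003560e01c"
    "806306fdde0314610051578063095ea7b31461006f57806318160ddd1461009f57"
    "806323b872dd146100bd575b600080fd5b"
)
table = dispatch_table(SAMPLE)
for selector, dest in table:
    print(f"0x{selector} -> pc {dest:#x}")
```

A function's source would only count as verified if its selector shows up in this table. Compilers may emit other dispatcher shapes (for example binary-search dispatch on large contracts), which this sketch does not cover.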
| File | Description |
|---|---|
| train.py | Core training script: model loading, dataset handling, SFT config |
| run_full.py | Colab wrapper: BestLossUploader callback, orchestrates training |
| build_dataset.py | v2 dataset builder: selector-aligned, refusal training, grounded prompt |
| requirements.txt | Python dependencies |
Requirements: Google Colab with ≥95GB GPU (A100 95GB). Runtime → Change runtime type → A100.
import torch

if torch.cuda.is_available():
    gpu = torch.cuda.get_device_properties(0)
    print(f"✅ GPU: {gpu.name}")
    print(f"✅ VRAM: {gpu.total_memory / 1e9:.1f} GB")
    assert gpu.total_memory / 1e9 > 90, f"Need ≥95GB GPU, got {gpu.total_memory / 1e9:.1f}GB"
    print("✅ Ready for training!")
else:
    print("❌ No GPU! Go to Runtime → Change runtime type → A100")
⚠️ You must install transformers from source: the PyPI version doesn't support the qwen3_5 architecture.
!pip install -q git+https://github.com/huggingface/transformers.git
!pip install -q trl peft datasets accelerate bitsandbytes pyevmasm sentencepiece huggingface_hub liger-kernel pycryptodome
from huggingface_hub import login
login() # paste your token when prompted
Skip if dataset already exists at demeleww/evm-decompiler-dataset.
Run once. The script processes contracts with selector alignment and adds refusal examples (~15-30 min).
from huggingface_hub import hf_hub_download
hf_hub_download("demeleww/evm-bytecode-to-solidity-qwen3.6-27b", "build_dataset.py", local_dir=".", force_download=True)
%run build_dataset.py
What it does:
- Scans the bytecode for PUSH4 <selector> EQ PUSH2 <dest> JUMPI patterns
- Computes keccak256 selectors for each Solidity function in the source code
- Uploads the result to demeleww/evm-decompiler-dataset

from huggingface_hub import hf_hub_download
hf_hub_download("demeleww/evm-bytecode-to-solidity-qwen3.6-27b", "train.py", local_dir=".", force_download=True)
hf_hub_download("demeleww/evm-bytecode-to-solidity-qwen3.6-27b", "run_full.py", local_dir=".", force_download=True)
print("✅ Scripts downloaded")
%run run_full.py
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
bnb_config = BitsAndBytesConfig(load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained("demeleww/evm-tac-decompiler")
base = AutoModelForCausalLM.from_pretrained(
    "AEON-7/Qwen3.6-27B-AEON-Ultimate-Uncensored",
    quantization_config=bnb_config,
    device_map="auto",
    attn_implementation="eager",
)
model = PeftModel.from_pretrained(base, "demeleww/evm-tac-decompiler")
model.eval()
print("✅ Model loaded")
# Sample TAC with visible selectors
tac = """Block_0:
v1 = 0x80
v2 = 0x40
MSTORE(stack[-2], stack[-1])
v3 = CALLVALUE()
DUP1
v4 = ISZERO(stack[-1])
v5 = 0x10
IF(stack[-1]) GOTO(stack[-2])
Block_1: // @pc=16
v6 = 0x0
DUP1
REVERT
Block_2: // @pc=20
POP
v7 = 0x4
v8 = CALLDATASIZE()
v9 = LT(stack[-2], stack[-1])
v10 = 0x4c
IF(stack[-1]) GOTO(stack[-2])"""
prompt = f"### Task: Convert the following EVM Three-Address Code (TAC) to Solidity source code.\n\n### TAC:\n{tac}\n\n### Solidity:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print("=== TAC → Solidity ===")
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
from pyevmasm import disassemble_all

def bytecode_to_tac(bytecode_hex, max_chars=5000):
    """Disassemble hex bytecode and emit a simplified, stack-relative TAC listing."""
    clean = bytecode_hex.strip()
    if clean.startswith("0x"): clean = clean[2:]
    # Drop a trailing half-byte so bytes.fromhex() does not raise.
    if len(clean) % 2 != 0: clean = clean[:-1]
    instructions = list(disassemble_all(bytes.fromhex(clean)))
    binary_ops = {"ADD","SUB","MUL","DIV","SDIV","MOD","SMOD","EXP","AND","OR","XOR","SHL","SHR","SAR","LT","GT","SLT","SGT","EQ","BYTE","SIGNEXTEND"}
    unary_ops = {"NOT","ISZERO"}
    nullary_ops = {"CALLER","CALLVALUE","CALLDATASIZE","ADDRESS","ORIGIN","GASPRICE","TIMESTAMP","NUMBER","CHAINID","SELFBALANCE","CODESIZE","RETURNDATASIZE","GAS","COINBASE"}
    # bid = current block id, vc = counter for the next v<N> temporary
    lines, bid, vc = ["Block_0:"], 0, 0
    for inst in instructions:
        n = inst.name
        # Every JUMPDEST starts a new basic block.
        if n == "JUMPDEST": bid += 1; lines.append(f"\nBlock_{bid}: // @pc={inst.pc}")
        elif n.startswith("PUSH"): vc += 1; lines.append(f" v{vc} = {inst.operand or 0:#x}")
        elif n in binary_ops: vc += 1; lines.append(f" v{vc} = {n}(stack[-2], stack[-1])")
        elif n in unary_ops: vc += 1; lines.append(f" v{vc} = {n}(stack[-1])")
        elif n in nullary_ops: vc += 1; lines.append(f" v{vc} = {n}()")
        elif n == "CALLDATALOAD": vc += 1; lines.append(f" v{vc} = CALLDATALOAD(stack[-1])")
        elif n in ("SLOAD","MLOAD"): vc += 1; lines.append(f" v{vc} = {n}(stack[-1])")
        elif n in ("SSTORE","MSTORE"): lines.append(f" {n}(stack[-2], stack[-1])")
        elif n == "JUMP": lines.append(" GOTO(stack[-1])")
        elif n == "JUMPI": lines.append(" IF(stack[-1]) GOTO(stack[-2])")
        elif n in ("RETURN","REVERT","STOP","SELFDESTRUCT","INVALID"): lines.append(f" {n}")
        elif n in ("CALL","STATICCALL","DELEGATECALL"): vc += 1; lines.append(f" v{vc} = {n}(...)")
        elif n == "KECCAK256": vc += 1; lines.append(f" v{vc} = KECCAK256(stack[-2], stack[-1])")
        # Anything else (DUP, SWAP, POP, LOG, ...) is passed through by name.
        else: lines.append(f" {n}")
        # Stop once the listing exceeds the character budget for the prompt.
        if len("\n".join(lines)) > max_chars: break
    return "\n".join(lines)
bytecode = "0x608060405234801561001057600080fd5b506004361061004c5760003560e01c806306fdde0314610051578063095ea7b31461006f57806318160ddd1461009f57806323b872dd146100bd575b600080fd5b"
tac = bytecode_to_tac(bytecode)
print("=== Generated TAC ===")
print(tac[:2000])
print("\n=== Decompiled Solidity ===")
prompt = f"### Task: Convert the following EVM Three-Address Code (TAC) to Solidity source code.\n\n### TAC:\n{tac}\n\n### Solidity:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
OLD (v1), ungrounded:
  Contract has 20 functions, TAC covers first 200 opcodes (dispatcher only)
  → All 20 functions paired with same TAC
  → Model learns: "when I see ANY TAC, generate transfer(), approve(), etc."
  → Result: HALLUCINATED code
NEW (v2), grounded:
  Contract has 20 functions, bytecode dispatch table has selectors for all 20
  → Only functions whose selector appears in PUSH4...EQ...JUMPI get paired
  → Functions NOT in dispatch → become refusal training examples
  → Model learns: "only output code for selectors I can SEE in the TAC"
  → Result: GROUNDED code or honest refusal
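The v2 pairing rule can be sketched in a few lines. This is an illustration under stated assumptions, not the logic of build_dataset.py: the function name `build_examples`, the prompt layout, the record fields, and the refusal wording are all hypothetical. It only shows the split between grounded examples and refusal examples based on selector visibility.

```python
from typing import Dict, List, Set

# Hypothetical refusal text; the wording used in the real dataset is not given here.
REFUSAL = "// Selector not present in the provided TAC; cannot decompile."


def build_examples(tac: str, functions: Dict[str, str], visible: Set[str]) -> List[dict]:
    """Pair each function with the TAC only if its selector is visible in the dispatch table.

    functions maps a 4-byte selector (hex, no 0x) to that function's Solidity source.
    visible is the set of selectors found in PUSH4...EQ...JUMPI entries of the bytecode.
    """
    examples = []
    for selector, source in functions.items():
        grounded = selector in visible
        examples.append({
            "selector": selector,
            "prompt": f"### TAC:\n{tac}\n\n### Selector: 0x{selector}\n",  # hypothetical layout
            "completion": source if grounded else REFUSAL,
            "kind": "decompile" if grounded else "refusal",
        })
    return examples


# Selectors visible in the sample dispatcher: name(), approve(), totalSupply(), transferFrom().
visible = {"06fdde03", "095ea7b3", "18160ddd", "23b872dd"}
functions = {
    "095ea7b3": "function approve(address spender, uint256 amount) external returns (bool) { ... }",
    "a9059cbb": "function transfer(address to, uint256 amount) external returns (bool) { ... }",
}
examples = build_examples("Block_0: ...", functions, visible)
for ex in examples:
    print(ex["selector"], ex["kind"])
```

Here approve() is paired with the TAC because 0x095ea7b3 is in the dispatch table, while transfer() (0xa9059cbb) is not visible and becomes a refusal example.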
| Parameter | Value | Why |
|---|---|---|
| batch_size | 2 | Conservative for 95GB with 8-bit weights (~30GB) |
| grad_accum | 4 | Effective batch = 8 |
| lr | 2e-4 | Standard for QLoRA |
| scheduler | cosine | Smooth decay |
| warmup | 100 steps | |
| lora_r | 32 | Higher rank = more expressive |
| lora_alpha | 64 | 2× rank |
| max_length | 4096 | Fits most contract TAC |
| optimizer | paged_adamw_8bit | Spills to CPU if GPU full |
| gradient_checkpointing | True | Required for 27B model |
| liger_kernel | True | Up to 60% less activation memory |
| packing | False | Preserves prompt/completion boundaries |
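To make the batch and scheduler rows concrete, the sketch below computes the effective batch size and a warmup-plus-cosine learning-rate curve from the values in the table. It assumes linear warmup followed by cosine decay to zero, which is the common shape for a "cosine" scheduler; the exact schedule used by train.py is not shown here, and `total_steps` is a hypothetical run length chosen only for illustration.

```python
import math

BATCH_SIZE = 2      # per-device batch size from the table
GRAD_ACCUM = 4      # gradient accumulation steps from the table
BASE_LR = 2e-4
WARMUP_STEPS = 100

effective_batch = BATCH_SIZE * GRAD_ACCUM  # sequences per optimizer step


def lr_at(step: int, total_steps: int, base_lr: float = BASE_LR, warmup: int = WARMUP_STEPS) -> float:
    """Linear warmup to base_lr, then cosine decay to zero (assumed schedule shape)."""
    if step < warmup:
        return base_lr * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 1000  # hypothetical; depends on dataset size and epochs
print("effective batch:", effective_batch)
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

With these values the rate peaks at 2e-4 when warmup ends at step 100 and falls to half of that at the midpoint of the decay phase.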
| Mode | VRAM for weights | Quality | Flag |
|---|---|---|---|
| 8-bit (LLM.int8) | ~30 GB | Higher | USE_8BIT = True (default) |
| 4-bit (NF4) | ~14 GB | Good | USE_8BIT = False |
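The weight-memory figures in the table follow from simple arithmetic on the parameter count. The sketch below is a rough lower bound that assumes exactly 27 billion parameters and counts only the quantized weights; layers kept in higher precision and quantization metadata add overhead, which is consistent with the rounded ~30 GB and ~14 GB figures above. The function name `weight_gb` is hypothetical.

```python
def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return params_billion * bits_per_weight / 8


int8_gb = weight_gb(27, 8)  # 8-bit LLM.int8
nf4_gb = weight_gb(27, 4)   # 4-bit NF4
print(f"8-bit: {int8_gb:.1f} GB, 4-bit: {nf4_gb:.1f} GB")
```

The gap between the two modes is what frees room for activations, LoRA gradients, and optimizer state when switching to 4-bit.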
- Set USE_8BIT = False in train.py
- Set BATCH_SIZE = 1, GRAD_ACCUM = 8
- Keep use_liger_kernel=True and optim="paged_adamw_8bit" always
- pip install git+https://github.com/huggingface/transformers.git
- .pyc import errors