# HeapTRM: Tiny Recursive Models for Security Primitives
Applying the Tiny Recursive Model architecture (arXiv 2510.04871) to security tasks: heap exploit detection, deserialization attack classification, and guided exploit generation on real binaries.
A 304K-parameter model (~0.3MB) that reasons about exploit-relevant structure through recursive processing of grid-encoded state.
## Results Summary
### Classification (proven, strong)
| Task | F1 | Precision | Recall | Test Set |
|---|---|---|---|---|
| Heap exploit detection (script-level) | 0.958 | 97.1% | 94.4% | CTF binary, noisy exploits vs benign |
| Heap exploit detection (state-level) | 0.842 | 84.2% | 84.2% | Per-operation heap state classification |
| Pickle deserialization detection | 0.818 | 73.1% | 92.9% | Held-out attack families (shutil, socket, BUILD) |
### Action Prediction (experimental)
| Task | Result | Notes |
|---|---|---|
| Simulator (tcache poison) | 100% best-of-10 | Positional memorization; does not transfer |
| Real binary (single technique) | 74% greedy, 100% best-of-10 | Hybrid: TRM for malloc/free, rule for UAF write |
| Real binary (multi-technique) | 42% tcache + 18% off-by-one | TRM selects technique, rules trigger writes |
| Ablation: write_UAF with chunks | 98% accuracy | vs 74% without chunk data; structure matters for trigger timing |
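The best-of-10 figures above follow from the greedy rate if attempts are treated as independent (an assumption; real attempts may be correlated):

```python
def best_of_k(p: float, k: int) -> float:
    """P(at least one success in k independent tries with per-try rate p)."""
    return 1.0 - (1.0 - p) ** k

# A 74% greedy policy virtually always succeeds within 10 attempts:
print(round(best_of_k(0.74, 10), 4))  # -> 1.0
```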
## Ablation Study
| Grid Variant | Val Acc | Write_UAF Acc | Interpretation |
|---|---|---|---|
| Full grid (chunks + history + summary) | 0.864 | 0.98 | Best overall |
| No chunks (counters only) | 0.833 | 0.74 | Counters sufficient for M/F phase |
| Chunks only (no history) | 0.842 | 0.73 | Structure alone is comparable |
| History only | 0.807 | 0.49 | Weakest; needs state context |
**Key finding:** Chunk structure adds 24 percentage points specifically on the exploit-critical write-trigger decision, even though it raises overall accuracy by only about 3 points. The model uses spatial heap reasoning where it matters most.
## Architecture
- **Input:** 32x16 integer grid (vocab_size=64)
  - Rows 0-23: chunk metadata (state, size, adjacency, fd/bk, coalesce potential)
  - Rows 24-27: action history (last 4 operations)
  - Rows 28-31: heap summary statistics
- **Model:** TRM (2 recursive blocks, 6 inner iterations)
  - Token embedding (64 -> 128 dim) + positional embedding
  - Recursive update: `z = z + block_z(x + y + z); y = y + block_y(y + z)`
  - Output: mean pool -> linear head
- **Parameters:** 304,260 (~0.3 MB)
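The recursive update can be sketched as below. The two-layer ReLU stand-ins for `block_z` and `block_y` and the random weights are illustrative assumptions; the real blocks follow the TRM paper (arXiv 2510.04871).

```python
import numpy as np

DIM, SEQ = 128, 32 * 16          # embedding dim and number of grid positions
rng = np.random.default_rng(0)

def make_block(dim: int):
    """Stand-in for a TRM block: two random linear layers with ReLU."""
    w1 = rng.normal(0, 0.02, (dim, dim))
    w2 = rng.normal(0, 0.02, (dim, dim))
    return lambda h: np.maximum(h @ w1, 0.0) @ w2

block_z, block_y = make_block(DIM), make_block(DIM)

x = rng.normal(size=(SEQ, DIM))  # embedded input grid (token + positional)
y = np.zeros((SEQ, DIM))         # answer state
z = np.zeros((SEQ, DIM))         # latent reasoning state

for _ in range(6):               # 6 inner iterations
    z = z + block_z(x + y + z)   # refine latent state against input + answer
    y = y + block_y(y + z)       # refine answer from latent state

pooled = y.mean(axis=0)          # mean pool over positions -> linear head input
print(pooled.shape)              # (128,)
```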
## Components
### Heap Instrumentation (harness/)
- `heapgrid_harness.c`: LD_PRELOAD library that hooks malloc/free/calloc/realloc
- Dumps heap chunk metadata (size, flags, fd/bk, state) as JSONL after every operation
- Works with any dynamically-linked binary on Linux
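Consuming a per-operation JSONL dump looks roughly like this; the field names (`op`, `chunks`, `size`, `state`) are illustrative assumptions, not the harness's exact schema:

```python
import json

# Hypothetical two-operation dump: one JSON object per malloc/free,
# each listing the live chunks after the operation.
dump = """\
{"op": "malloc", "ret": "0x55e0", "chunks": [{"addr": "0x55e0", "size": 32, "state": "allocated"}]}
{"op": "free", "arg": "0x55e0", "chunks": [{"addr": "0x55e0", "size": 32, "state": "tcache"}]}
"""

events = [json.loads(line) for line in dump.splitlines()]
freed = [c for e in events for c in e["chunks"] if c["state"] == "tcache"]
print(len(events), len(freed))  # 2 1
```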
### Classifiers (model/, dataset/)
- `trm_heap.py`: TRM model with deep supervision, focal loss, training/eval loops
- `dataset_gen.py`: Converts harness JSONL dumps to 32x16 grid arrays
- Validated on 23 how2heap techniques across glibc 2.35-2.39
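Focal loss down-weights easy examples so training focuses on hard, rare positives. A minimal scalar binary version for intuition (the gamma/alpha defaults are common choices, not necessarily those used in `trm_heap.py`):

```python
import math

def focal_loss(p: float, target: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss for one example: -alpha_t * (1 - p_t)^gamma * log(p_t)."""
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# A confident correct prediction contributes far less than an uncertain one:
easy = focal_loss(0.95, 1)
hard = focal_loss(0.55, 1)
print(easy < hard)  # True
```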
### Pickle Deserialization (pickle_deser/)
- `dumper.py`: Instrumented pickle unpickler using sentinels (safe, no code execution)
- `grid_encoder.py`: Encodes pickle VM stack/memo state as 32x16 grids
- `gen_payloads.py`: Generates benign + malicious + noisy pickle payloads
- Trained on os.system/subprocess/eval/exec; generalizes to shutil/socket/BUILD
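The "no code execution" property can be illustrated with the stdlib's `pickletools.genops`, which walks pickle opcodes without ever unpickling (this is a standalone sketch of the idea, not the `dumper.py` approach):

```python
import os
import pickle
import pickletools

def suspicious_opcodes(payload: bytes) -> list[str]:
    """Statically scan a pickle for code-execution opcodes (no unpickling)."""
    risky = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "BUILD", "INST", "OBJ"}
    return [op.name for op, _, _ in pickletools.genops(payload) if op.name in risky]

class Evil:
    def __reduce__(self):                 # runs the callable only on unpickling
        return (os.system, ("true",))

print(suspicious_opcodes(pickle.dumps({"a": [1, 2]})))  # []
print(suspicious_opcodes(pickle.dumps(Evil())))         # contains REDUCE
```

Pickling `Evil()` is safe here because the payload is only serialized, never loaded; the dangerous call would fire at unpickle time.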
### CTF Challenge (ctf/)
- `vuln_heap.c`: Menu-driven heap challenge with UAF + off-by-one bugs
- `drive_ctf.py`: Generates exploit + benign interaction scripts
- `run_ctf_validation.py`: End-to-end: instrument, collect, train, evaluate per-script
### Action Agent (agent/)
- `universal_grid.py`: Allocator-agnostic grid encoding (relationships, not internals)
- `simple_agent.py`: TRM policy for operation type prediction (4-class)
- `train_enhanced.py`: Training on real binary dumps with history tracking
- `multi_technique.py`: Multi-technique agent (tcache poison + off-by-one + coalesce)
- `train_universal.py`: GPU training with universal grid
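The "relationships, not internals" idea amounts to encoding each chunk relative to its neighbors (adjacency, relative size, free/allocated) instead of raw addresses or glibc-specific bin labels. The feature layout below is an illustrative assumption, not the `universal_grid.py` schema:

```python
def encode_relational(chunks: list[dict]) -> list[list[int]]:
    """Per-chunk relational features: [is_free, adjacent_to_prev, bigger_than_prev]."""
    chunks = sorted(chunks, key=lambda c: c["addr"])
    rows = []
    for i, c in enumerate(chunks):
        adjacent = int(i > 0 and chunks[i - 1]["addr"] + chunks[i - 1]["size"] == c["addr"])
        bigger = int(i > 0 and c["size"] > chunks[i - 1]["size"])
        rows.append([int(c["free"]), adjacent, bigger])
    return rows

heap = [{"addr": 0x100, "size": 0x20, "free": True},
        {"addr": 0x120, "size": 0x40, "free": False}]
print(encode_relational(heap))  # [[1, 0, 0], [0, 1, 1]]
```

Because no raw pointer values or bin names survive the encoding, the same grid applies to any allocator that exposes chunk addresses and sizes.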
### Heap Simulator (simulator/)
- `heap_sim.py`: Lightweight ptmalloc2 simulator (tcache, fastbins, coalescing, top chunk)
- Used for self-play experiments; simulator-trained policies did not transfer to real binaries without matching grid encoders
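The core mechanic the simulator reproduces, tcache poisoning, fits in a toy model: a LIFO free list per size class where a use-after-free write to a freed chunk's fd pointer makes a later malloc hand back an arbitrary address. This is a sketch, not the `heap_sim.py` API:

```python
class TinyTcache:
    """LIFO free list per size class, as in glibc tcache (heavily simplified)."""

    def __init__(self):
        self.bins: dict[int, list[int]] = {}  # size class -> stack of freed addrs
        self.fd: dict[int, int] = {}          # freed chunk -> its forward pointer

    def free(self, addr: int, size: int) -> None:
        bin_ = self.bins.setdefault(size, [])
        if bin_:
            self.fd[addr] = bin_[-1]          # link to the previous bin head
        bin_.append(addr)

    def malloc(self, size: int):
        bin_ = self.bins.get(size)
        if not bin_:
            return None
        addr = bin_.pop()
        nxt = self.fd.pop(addr, None)
        if nxt is not None:
            bin_.append(nxt)                  # trust the chunk's fd blindly
        return addr

t = TinyTcache()
t.free(0x1000, 0x20)
t.fd[0x1000] = 0xDEADBEEF                     # UAF write corrupts fd: the poison
print(hex(t.malloc(0x20)), hex(t.malloc(0x20)))  # 0x1000 0xdeadbeef
```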
## Quickstart
```bash
# Setup
uv venv .venv --python 3.12
uv pip install torch --index-url https://download.pytorch.org/whl/cu124 -p .venv
uv pip install numpy -p .venv

# Build heap harness
make -C harness/

# Run heap classifier validation (how2heap + CTF)
python3 runner/run_poc.py

# Run pickle deserialization classifier
python3 pickle_deser/run_poc.py

# Run CTF exploit detection
python3 ctf/run_ctf_validation.py

# Train action agent on real binary (GPU)
.venv/bin/python3 agent/train_universal.py

# Run multi-technique agent
.venv/bin/python3 agent/multi_technique.py

# Run ablation study
.venv/bin/python3 agent/ablation.py  # or inline script from experiments
```
## Honest Assessment
**What works:** Classification. TRM learns exploit-relevant structure from grid-encoded state and generalizes across technique families, with zero false positives on CTF detection. The LD_PRELOAD harness is a useful standalone tool.

**What partially works:** Action prediction with the hybrid architecture (TRM for phase, rules for trigger). It achieves 74% single-technique success on real binaries but relies on hand-crafted rules for the critical write step.

**What doesn't work:** Pure end-to-end action prediction from grid to operation. The 128-action space overwhelms the model, simulator-trained policies don't transfer to real binaries without matching grid encoders, and the model takes counter shortcuts over spatial reasoning for malloc/free decisions.
**Key insight:** TRM's recursive processing genuinely helps with exploit trigger timing (98% vs 74% per the ablation) but not with phase sequencing, where counters suffice. The architecture is best suited as a classifier/oracle rather than a standalone agent.
## Checkpoints
- `data/checkpoints_focal/best_model.pt`: Heap classifier (focal loss; best F1=0.636 on held-out techniques)
- `ctf/checkpoints/best_model.pt`: CTF exploit detector (F1=0.842 state-level, 0.958 script-level)
- `pickle_deser/checkpoints/best_model.pt`: Pickle deserialization classifier (F1=0.818)
- `agent/checkpoints/`: Action prediction models
## Citation
Based on:

```bibtex
@article{jolicoeur2025less,
  title={Less is More: Recursive Reasoning with Tiny Networks},
  author={Jolicoeur-Martineau, Alexia},
  journal={arXiv preprint arXiv:2510.04871},
  year={2025}
}
```