HeapTRM — Goals & Roadmap

Current State (March 2026)

304K-parameter TRM classifier detecting 90-97% of heap exploit techniques across glibc 2.27-2.39. V2 harness catches metadata corruption and double-frees with zero false positives on CVE pattern tests. Pwntools integration and CLI built. Action agent achieves a 74% use-after-free (UAF) success rate on a real binary (hybrid: TRM for strategy, rules for trigger).

Immediate (next session)

1. Quarantine UAF detection (#4)

Implement quarantine zone in v2 harness: delay real free(), fill with canary, detect overwrites
Closes the UAF gap (test 3 in CVE sims)
Goal: lightweight ASAN-style checking delivered via LD_PRELOAD, so it works on unmodified binaries
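
The quarantine idea above can be modeled in a few lines. This is only a toy Python sketch of the logic, not the v2 harness itself (which would hook free() natively); the class name, the canary byte, and the capacity are all assumptions made for illustration.

```python
from collections import deque

CANARY = 0xA5  # assumed fill byte; any fixed pattern would do


class Quarantine:
    """Toy model of a free() quarantine: freed buffers are held back,
    filled with a canary, and checked for writes before real release."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self.held = deque()    # (chunk_id, buffer) pairs in free order
        self.violations = []   # chunk ids that were written to after free

    def free(self, chunk_id, buf):
        # Poison the buffer instead of releasing it immediately.
        buf[:] = bytes([CANARY]) * len(buf)
        self.held.append((chunk_id, buf))
        # Once the quarantine is full, release the oldest entries.
        while len(self.held) > self.capacity:
            self._release(*self.held.popleft())

    def _release(self, chunk_id, buf):
        # Any non-canary byte means a write went through a dangling pointer.
        if any(b != CANARY for b in buf):
            self.violations.append(chunk_id)

    def flush(self):
        # Release everything still held and report all violations seen.
        while self.held:
            self._release(*self.held.popleft())
        return self.violations


q = Quarantine(capacity=2)
a, b = bytearray(16), bytearray(16)
q.free("a", a)
q.free("b", b)
a[0] = 0x41          # write-after-free through a stale reference
print(q.flush())     # ['a']
```

A canary check of this kind only sees writes to freed memory. Reads through a dangling pointer leave the canary intact and would need a different mechanism.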

2. Retrain classifier on v2 harness data (#2)

V2 dumps include is_corrupted, corruption types, metadata change info
Train TRM to predict corruption BEFORE it happens (pre-overflow heap layouts)
Use cross-glibc Docker data with v2 harness

Short-term (1-2 weeks)

3. AFL/libFuzzer oracle integration (#1)

AFL++ custom mutator (the custom mutator API is an AFL++ feature, not part of original AFL) that scores heap states via heaptrm scan --json
Fitness = code coverage + exploit-reachability score
Directed fuzzing toward exploitable heap states
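
As a rough sketch of the fitness formula above: the JSON key `score` and the weights `alpha` and `beta` are assumptions, since the output schema of heaptrm scan --json and the relative weighting are not specified here.

```python
import json


def reachability_from_scan(scan_json: str, key: str = "score") -> float:
    """Extract a reachability score from `heaptrm scan --json` output.
    The key name is hypothetical; adjust it to the real schema."""
    return float(json.loads(scan_json).get(key, 0.0))


def fitness(new_edges: int, reachability: float,
            alpha: float = 1.0, beta: float = 10.0) -> float:
    """Fitness = coverage term + exploit-reachability term.
    alpha and beta are assumed weights, to be tuned empirically."""
    return alpha * new_edges + beta * reachability


print(fitness(3, reachability_from_scan('{"score": 0.5}')))  # 8.0
```

A linear combination is the simplest choice. Whether coverage and reachability should instead be traded off some other way is left open.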

4. Real-world CVE benchmark (#5)

CI pipeline: download CVE PoC binaries, Docker with matching glibc, run heaptrm, compare ground truth
Start with 10 CVEs: glibc syslog (CVE-2023-6246), iconv (CVE-2024-2961), sudo heap overflow, polkit pkexec, etc.
Publish as a benchmark for heap exploit detection
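
The "compare ground truth" step could look roughly like the following. The case ids are placeholders and the function name is hypothetical; a real pipeline would fill both dictionaries from the Docker runs.

```python
def score_benchmark(ground_truth: dict, detected: dict) -> dict:
    """Compare per-case detections with ground truth.
    Both dicts map a case id to True (exploit present / flagged) or False."""
    tp = fp = fn = tn = 0
    for case, truth in ground_truth.items():
        flagged = detected.get(case, False)  # a missing result counts as not flagged
        if truth and flagged:
            tp += 1
        elif truth and not flagged:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Placeholder cases, not real results.
truth = {"poc-a": True, "poc-b": True, "benign-a": False}
found = {"poc-a": True, "benign-a": True}
print(score_benchmark(truth, found))
```

Including benign runs alongside the PoCs keeps the false-positive rate measurable, which a PoC-only benchmark would not.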

5. Temporal sequence model (#3)

Replace per-state classification with sequence model (window of K states)
LSTM or Transformer over TRM state embeddings
Detect multi-step patterns: spray → free → realloc
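
The "window of K states" input could be prepared along these lines. This is a minimal sketch; in practice each state would be a TRM embedding vector rather than a string, and the function name is an assumption.

```python
from typing import List, Sequence


def sliding_windows(states: Sequence, k: int) -> List[list]:
    """Return every run of k consecutive states, in trace order.
    Traces shorter than k yield no windows."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [list(states[i:i + k]) for i in range(len(states) - k + 1)]


trace = ["spray", "free", "realloc", "write"]
for window in sliding_windows(trace, 3):
    print(window)
# ['spray', 'free', 'realloc']
# ['free', 'realloc', 'write']
```

Each window would then be one input sequence for the LSTM or Transformer.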

Medium-term (1-2 months)

6. Java deserialization extension

Instrument JVM ObjectInputStream
Encode object graphs as grids
Detect gadget chains (ysoserial patterns)
Far bigger attack surface than pickle

7. Windows heap support

API hooking via Detours or ETW tracing
NT heap / segment heap / LFH metadata encoding
Test against Windows exploit techniques
Same universal grid, different harness

8. PyPI package release

pip install heaptrm
Pre-trained weights bundled
Auto-compile harness on first use
Documentation + examples

Long-term (research direction)

9. TRM as fuzzer-in-the-loop

Not just scoring states, but guiding input generation
TRM predicts which mutations are most likely to trigger corruption
Combine with symbolic execution for targeted constraint solving

10. Allocator-agnostic generalization

Test on jemalloc (Firefox, FreeBSD), tcmalloc (formerly used by Chrome, which has since moved to PartitionAlloc), mimalloc
The universal grid should transfer — validate this claim
Per-allocator fine-tuning vs zero-shot

11. Pre-corruption prediction

Train on temporal sequences ending in corruption
Predict "this heap layout is N steps from exploitable" before overflow happens
Runtime defense: alert/kill process before corruption materializes
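
One possible way to derive "N steps from exploitable" training targets from the per-state is_corrupted flag in v2 dumps is sketched below. It assumes each trace is an ordered list of those flags; the function name is hypothetical.

```python
from typing import List, Optional


def steps_to_corruption(is_corrupted: List[bool]) -> List[Optional[int]]:
    """For each state, count the steps remaining until the first corrupted state.
    States at or after the first corruption get 0; a trace with no
    corruption gets None for every state."""
    if True not in is_corrupted:
        return [None] * len(is_corrupted)
    first = is_corrupted.index(True)
    return [max(first - i, 0) for i in range(len(is_corrupted))]


print(steps_to_corruption([False, False, False, True, True]))  # [3, 2, 1, 0, 0]
print(steps_to_corruption([False, False]))                     # [None, None]
```

A runtime defense could then alert when the predicted count falls below some threshold. That threshold would have to be chosen against the false-positive budget.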

Open Questions

Is the TRM architecture actually necessary, or would a simple MLP on the summary row work just as well? (Ablation showed 3% overall contribution from chunk data, but 24% on trigger timing)
Can the model generalize to binaries with 100K+ heap objects? Current grid is 32 rows.
Is metadata corruption detection sufficient without UAF? Real-world exploits often chain UAF → tcache poison → arbitrary write.
Should we pursue kernel heap exploitation (SLUB/SLAB)? Different allocator but same grid concept; it is currently listed under Non-goals because it needs a different harness.

Non-goals

Replacing ASAN/MSAN for development — those require recompilation but are more thorough
Kernel exploit detection — different domain, different harness needed
Closed-source binary analysis without execution — we need runtime instrumentation