HeapTRM: Goals & Roadmap
Current State (March 2026)
A 304K-parameter TRM classifier detects 90-97% of heap exploit techniques across glibc 2.27-2.39. The v2 harness catches metadata corruption and double-frees with zero false positives on the CVE pattern tests. Pwntools integration and a CLI are built. The action agent achieves 74% UAF success on a real binary (hybrid: TRM for strategy, rules for trigger).
Immediate (next session)
1. Quarantine UAF detection (#4)
- Implement quarantine zone in v2 harness: delay real free(), fill with canary, detect overwrites
- Closes the UAF gap (test 3 in CVE sims)
- Lightweight ASAN-in-LD_PRELOAD for unmodified binaries
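The quarantine idea in item 1 can be illustrated with a toy model. This is a minimal sketch in Python, not the v2 harness itself (which would live in the LD_PRELOAD layer); the class name `QuarantineHeap`, the `CANARY` byte, and the `capacity` parameter are all hypothetical choices made for illustration.

```python
from collections import deque

CANARY = 0xA5  # hypothetical fill byte; any recognizable pattern works


class QuarantineHeap:
    """Toy model of a free() quarantine, for illustration only."""

    def __init__(self, capacity=4):
        self.capacity = capacity   # chunks held back before the real free
        self.chunks = {}           # chunk id -> bytearray (live or quarantined)
        self.quarantine = deque()  # ids awaiting the real free, oldest first
        self.next_id = 0

    def malloc(self, size):
        cid = self.next_id
        self.next_id += 1
        self.chunks[cid] = bytearray(size)
        return cid

    def write(self, cid, offset, data):
        """Simulate a program write, including one through a dangling pointer."""
        self.chunks[cid][offset:offset + len(data)] = data

    def is_overwritten(self, cid):
        """True if any canary byte of a quarantined chunk was modified."""
        return any(b != CANARY for b in self.chunks[cid])

    def scan(self):
        """Return ids of quarantined chunks whose canary was damaged."""
        return [cid for cid in self.quarantine if self.is_overwritten(cid)]

    def free(self, cid):
        """Delay the real free: fill with canary and park the chunk.

        Returns ids of chunks found damaged at the moment they were evicted.
        """
        buf = self.chunks[cid]
        buf[:] = bytes([CANARY]) * len(buf)
        self.quarantine.append(cid)
        violations = []
        while len(self.quarantine) > self.capacity:
            old = self.quarantine.popleft()
            if self.is_overwritten(old):
                violations.append(old)
            del self.chunks[old]  # the "real" free happens here
        return violations
```

A canary check of this kind only sees writes after free; reads through a dangling pointer leave the canary intact and would go unnoticed in this model.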
2. Retrain classifier on v2 harness data (#2)
- V2 dumps include is_corrupted, corruption types, metadata change info
- Train TRM to predict corruption BEFORE it happens (pre-overflow heap layouts)
- Use cross-glibc Docker data with v2 harness
Short-term (1-2 weeks)
3. AFL/libFuzzer oracle integration (#1)
- AFL custom mutator that scores heap states via heaptrm scan --json
- Fitness = code coverage + exploit-reachability score
- Directed fuzzing toward exploitable heap states
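One way the combined fitness in item 3 might be computed is sketched below. The roadmap only says fitness is coverage plus an exploit-reachability score, so the weighted blend, the `weight` parameter, and the `"score"` field assumed in the `heaptrm scan --json` output are all assumptions, not the tool's documented schema.

```python
import json


def fitness(new_edges, total_edges, scan_json, weight=0.5):
    """Blend coverage gain with an exploit-reachability score.

    new_edges / total_edges: coverage counts reported by the fuzzer.
    scan_json: JSON text from the scanner; the "score" field in [0, 1]
               is an ASSUMED schema, not a documented one.
    weight: share of the fitness given to the reachability score.
    """
    report = json.loads(scan_json)
    score = float(report.get("score", 0.0))
    score = min(max(score, 0.0), 1.0)  # clamp to [0, 1]
    coverage = new_edges / total_edges if total_edges else 0.0
    return (1.0 - weight) * coverage + weight * score
```

Keeping both terms in [0, 1] means `weight` alone controls how strongly the fuzzer is pulled toward exploitable heap states rather than new coverage.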
4. Real-world CVE benchmark (#5)
- CI pipeline: download CVE PoC binaries, Docker with matching glibc, run heaptrm, compare ground truth
- Start with 10 CVEs: glibc syslog (CVE-2023-6246), iconv (CVE-2024-2961), sudo heap overflow, polkit pkexec, etc.
- Publish as a benchmark for heap exploit detection
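The "compare ground truth" step in item 4 could reduce to a small scoring routine like the one below. This is a sketch under assumptions: the function name `score_benchmark` and the dict-of-booleans input format are hypothetical, and the case ids in the usage note are placeholders rather than real results.

```python
def score_benchmark(ground_truth, detections):
    """Compare detector verdicts against ground truth.

    ground_truth: dict mapping case id -> True if the case is exploitable.
    detections:   dict mapping case id -> True if the detector flagged it.
                  Missing cases count as "not flagged".
    """
    tp = fp = fn = tn = 0
    for case, expected in ground_truth.items():
        flagged = detections.get(case, False)
        if expected and flagged:
            tp += 1
        elif expected and not flagged:
            fn += 1
        elif not expected and flagged:
            fp += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}
```

For example, `score_benchmark({"case-a": True, "case-b": False}, {"case-a": True})` counts one true positive and one true negative.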
5. Temporal sequence model (#3)
- Replace per-state classification with sequence model (window of K states)
- LSTM or Transformer over TRM state embeddings
- Detect multi-step patterns: spray → free → realloc
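The "window of K states" in item 5 amounts to slicing the per-state stream into overlapping runs before a sequence model sees it. A minimal sketch, assuming states arrive as a list (of embeddings or any other per-state record); `sliding_windows` is a hypothetical helper name.

```python
def sliding_windows(states, k):
    """Yield every run of k consecutive states, oldest first.

    Yields nothing when fewer than k states are available.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    for i in range(len(states) - k + 1):
        yield states[i:i + k]
```

Each yielded window would then be one input to the LSTM or Transformer, so a multi-step pattern is visible as long as it fits inside K states.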
Medium-term (1-2 months)
6. Java deserialization extension
- Instrument JVM ObjectInputStream
- Encode object graphs as grids
- Detect gadget chains (ysoserial patterns)
- Far bigger attack surface than pickle
7. Windows heap support
- API hooking via Detours or ETW tracing
- NT heap / segment heap / LFH metadata encoding
- Test against Windows exploit techniques
- Same universal grid, different harness
8. PyPI package release
- pip install heaptrm
- Pre-trained weights bundled
- Auto-compile harness on first use
- Documentation + examples
Long-term (research direction)
9. TRM as fuzzer-in-the-loop
- Not just scoring states, but guiding input generation
- TRM predicts which mutations are most likely to trigger corruption
- Combine with symbolic execution for targeted constraint solving
10. Allocator-agnostic generalization
- Test on jemalloc (Firefox, FreeBSD), tcmalloc (Chrome), mimalloc
- The universal grid should transfer; validate this claim
- Per-allocator fine-tuning vs zero-shot
11. Pre-corruption prediction
- Train on temporal sequences ending in corruption
- Predict "this heap layout is N steps from exploitable" before overflow happens
- Runtime defense: alert/kill process before corruption materializes
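Training for "N steps from exploitable" in item 11 needs a distance label for every state in a recorded sequence. One possible labeling is sketched below, assuming each state carries the `is_corrupted` flag from the v2 dumps; the function name and the use of `None` for "no corruption ahead" are illustrative choices.

```python
def steps_to_corruption(flags):
    """Label each state with its distance to the next corrupted state.

    flags: list of booleans, one is_corrupted flag per heap state in order.
    Returns a list of the same length: 0 for a corrupted state, n for a
    state n steps before the next corruption, None if none follows.
    """
    labels = [None] * len(flags)
    distance = None
    # Walk backwards so each state can see the nearest corruption after it.
    for i in range(len(flags) - 1, -1, -1):
        if flags[i]:
            distance = 0
        elif distance is not None:
            distance += 1
        labels[i] = distance
    return labels
```

A runtime defense could then alert whenever the predicted label falls at or below a chosen horizon.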
Open Questions
- Is the TRM architecture actually necessary, or would a simple MLP on the summary row work just as well? (Ablation showed 3% overall contribution from chunk data, but 24% on trigger timing)
- Can the model generalize to binaries with 100K+ heap objects? Current grid is 32 rows.
- Is metadata corruption detection sufficient without UAF? Real-world exploits often chain UAF → tcache poison → arbitrary write.
- Should we pursue kernel heap exploitation (SLUB/SLAB)? Different allocator but same grid concept.
Non-goals
- Replacing ASAN/MSAN for development: those require recompilation but are more thorough
- Kernel exploit detection β different domain, different harness needed
- Closed-source binary analysis without execution β we need runtime instrumentation