Quarantine UAF detection: 5/5 CVE patterns, 0 false positives

V2 harness now delays free() and fills with canary (0xFD). Detects
UAF writes by checking canary integrity on every subsequent operation.
Closes the UAF gap (test 3). Also adds goal.md with roadmap.

Files changed (2) hide show

goal.md +84 -0
heaptrm/harness/heapgrid_v2.c +84 -10

goal.md ADDED Viewed

	@@ -0,0 +1,84 @@

+# HeapTRM — Goals & Roadmap
+## Current State (March 2026)
+304K-param TRM classifier detecting 90-97% of heap exploit techniques across glibc 2.27-2.39. V2 harness catches metadata corruption and double-frees with zero false positives on CVE pattern tests. Pwntools integration and CLI built. Action agent achieves 74% UAF success on real binary (hybrid: TRM for strategy, rules for trigger).
+## Immediate (next session)
+### 1. Quarantine UAF detection (#4)
+- Implement quarantine zone in v2 harness: delay real free(), fill with canary, detect overwrites
+- Closes the UAF gap (test 3 in CVE sims)
+- Lightweight ASAN-in-LD_PRELOAD for unmodified binaries
+### 2. Retrain classifier on v2 harness data (#2)
+- V2 dumps include `is_corrupted`, corruption types, metadata change info
+- Train TRM to predict corruption BEFORE it happens (pre-overflow heap layouts)
+- Use cross-glibc Docker data with v2 harness
+## Short-term (1-2 weeks)
+### 3. AFL/libFuzzer oracle integration (#1)
+- AFL custom mutator that scores heap states via `heaptrm scan --json`
+- Fitness = code coverage + exploit-reachability score
+- Directed fuzzing toward exploitable heap states
+### 4. Real-world CVE benchmark (#5)
+- CI pipeline: download CVE PoC binaries, Docker with matching glibc, run heaptrm, compare ground truth
+- Start with 10 CVEs: glibc syslog (CVE-2023-6246), iconv (CVE-2024-2961), sudo heap overflow, polkit pkexec, etc.
+- Publish as a benchmark for heap exploit detection
+### 5. Temporal sequence model (#3)
+- Replace per-state classification with sequence model (window of K states)
+- LSTM or Transformer over TRM state embeddings
+- Detect multi-step patterns: spray → free → realloc
+## Medium-term (1-2 months)
+### 6. Java deserialization extension
+- Instrument JVM ObjectInputStream
+- Encode object graphs as grids
+- Detect gadget chains (ysoserial patterns)
+- Far bigger attack surface than pickle
+### 7. Windows heap support
+- API hooking via Detours or ETW tracing
+- NT heap / segment heap / LFH metadata encoding
+- Test against Windows exploit techniques
+- Same universal grid, different harness
+### 8. pypi package release
+- `pip install heaptrm`
+- Pre-trained weights bundled
+- Auto-compile harness on first use
+- Documentation + examples
+## Long-term (research direction)
+### 9. TRM as fuzzer-in-the-loop
+- Not just scoring states, but guiding input generation
+- TRM predicts which mutations are most likely to trigger corruption
+- Combine with symbolic execution for targeted constraint solving
+### 10. Allocator-agnostic generalization
+- Test on jemalloc (Firefox, FreeBSD), tcmalloc (Chrome), mimalloc
+- The universal grid should transfer — validate this claim
+- Per-allocator fine-tuning vs zero-shot
+### 11. Pre-corruption prediction
+- Train on temporal sequences ending in corruption
+- Predict "this heap layout is N steps from exploitable" before overflow happens
+- Runtime defense: alert/kill process before corruption materializes
+## Open Questions
+- Is the TRM architecture actually necessary, or would a simple MLP on the summary row work just as well? (Ablation showed 3% overall contribution from chunk data, but 24% on trigger timing)
+- Can the model generalize to binaries with 100K+ heap objects? Current grid is 32 rows.
+- Is metadata corruption detection sufficient without UAF? Real-world exploits often chain UAF → tcache poison → arbitrary write.
+- Should we pursue kernel heap exploitation (SLUB/SLAB)? Different allocator but same grid concept.
+## Non-goals
+- Replacing ASAN/MSAN for development — those require recompilation but are more thorough
+- Kernel exploit detection — different domain, different harness needed
+- Closed-source binary analysis without execution — we need runtime instrumentation

heaptrm/harness/heapgrid_v2.c CHANGED Viewed

@@ -29,6 +29,10 @@
 #define DUMP_BUF_SIZE (1024 * 128)
 #define CANARY_VALUE 0xDEADBEEFCAFEBABEULL
 typedef struct {
     void *user_ptr;
     size_t req_size;
@@ -70,6 +74,72 @@ static void *(*real_realloc)(void *, size_t) = NULL;
 static char early_buf[4096];
 static int early_buf_used = 0;
 /* --- Simple hash for data-change detection --- */
 static uint64_t hash_bytes(const void *data, size_t len) {
     const uint8_t *p = (const uint8_t *)data;
@@ -181,15 +251,16 @@ static void validate_all_chunks(void) {
             g_chunks[i].saved_prev_size = cur_prev_size;
         }
-        /* Check 3: UAF detection is NOT done here.
-         * Reliable UAF detection requires compile-time instrumentation (ASAN)
-         * or hardware watchpoints. LD_PRELOAD cannot distinguish glibc's
-         * legitimate writes to freed chunks from attacker UAF writes.
-         * We rely on metadata_corrupt and double_free checks instead. */
     }
     /* Check 4: Double-free detection — only flag in free() handler,
      * not here, to avoid re-reporting on every validation pass. */
 }
 /* --- Dump state --- */
@@ -331,13 +402,16 @@ void free(void *ptr) {
     if (idx >= 0) {
         g_chunks[idx].state = 2;
         g_chunks[idx].free_order = ++g_free_seq;
-        g_chunks[idx].hash_stable = 0; /* will record post-free hash on next validate */
     } else {
         /* Might be double-free — detect BEFORE calling real_free (glibc may abort) */
         int any = find_chunk(ptr);
         if (any >= 0 && g_chunks[any].state == 2) {
             add_corruption("double_free", any, "freed already-freed chunk");
-            /* Dump state with corruption BEFORE glibc aborts */
             dump_state("free_double", ptr, 0);
             if (g_chunk_count < MAX_CHUNKS) {
                 int slot = g_chunk_count++;
@@ -347,10 +421,10 @@ void free(void *ptr) {
                 g_chunks[slot].hash_stable = 0;
             }
         }
     }
-    real_free(ptr);
-    dump_state("free", ptr, 0);
     g_in_hook = 0;
 }

 #define DUMP_BUF_SIZE (1024 * 128)
 #define CANARY_VALUE 0xDEADBEEFCAFEBABEULL
+/* Quarantine: delay real free() to detect UAF writes */
+#define QUARANTINE_SIZE 32
+#define CANARY_BYTE 0xFD
 typedef struct {
     void *user_ptr;
     size_t req_size;
 static char early_buf[4096];
 static int early_buf_used = 0;
+/* Forward declaration */
+static void add_corruption(const char *type, int chunk_idx, const char *detail);
+/* --- Quarantine zone for UAF detection --- */
+typedef struct {
+    void *ptr;
+    size_t size;
+    int chunk_idx;     /* index in g_chunks */
+} quarantine_entry_t;
+static quarantine_entry_t g_quarantine[QUARANTINE_SIZE];
+static int g_quarantine_head = 0;
+static int g_quarantine_count = 0;
+static void quarantine_check_canary(int q_idx) {
+    quarantine_entry_t *qe = &g_quarantine[q_idx];
+    if (!qe->ptr) return;
+    const uint8_t *p = (const uint8_t *)qe->ptr;
+    size_t check_len = qe->size < 128 ? qe->size : 128;
+    for (size_t i = 0; i < check_len; i++) {
+        if (p[i] != CANARY_BYTE) {
+            char detail[256];
+            snprintf(detail, sizeof(detail),
+                "quarantine canary overwritten at offset %zu (0x%02x != 0xFD) — UAF write",
+                i, p[i]);
+            add_corruption("uaf_write", qe->chunk_idx, detail);
+            /* Re-fill canary to detect further writes */
+            memset(qe->ptr, CANARY_BYTE, qe->size < 128 ? qe->size : 128);
+            return;
+        }
+    }
+}
+static void quarantine_check_all(void) {
+    for (int i = 0; i < g_quarantine_count; i++) {
+        int idx = (g_quarantine_head - g_quarantine_count + i + QUARANTINE_SIZE) % QUARANTINE_SIZE;
+        quarantine_check_canary(idx);
+    }
+}
+static void quarantine_add(void *ptr, size_t size, int chunk_idx) {
+    /* If quarantine is full, drain oldest entry (actually free it) */
+    if (g_quarantine_count >= QUARANTINE_SIZE) {
+        int oldest = (g_quarantine_head - g_quarantine_count + QUARANTINE_SIZE) % QUARANTINE_SIZE;
+        quarantine_entry_t *old = &g_quarantine[oldest];
+        if (old->ptr) {
+            /* Final canary check before real free */
+            quarantine_check_canary(oldest);
+            real_free(old->ptr);
+            old->ptr = NULL;
+        }
+        g_quarantine_count--;
+    }
+    /* Fill with canary pattern */
+    size_t fill_len = size < 128 ? size : 128;
+    memset(ptr, CANARY_BYTE, fill_len);
+    /* Add to quarantine */
+    g_quarantine[g_quarantine_head].ptr = ptr;
+    g_quarantine[g_quarantine_head].size = size;
+    g_quarantine[g_quarantine_head].chunk_idx = chunk_idx;
+    g_quarantine_head = (g_quarantine_head + 1) % QUARANTINE_SIZE;
+    g_quarantine_count++;
+}
 /* --- Simple hash for data-change detection --- */
 static uint64_t hash_bytes(const void *data, size_t len) {
     const uint8_t *p = (const uint8_t *)data;
             g_chunks[i].saved_prev_size = cur_prev_size;
         }
+        /* Check 3: UAF detection moved to quarantine canary system (Check 5).
+         * Quarantine delays real free(), fills with canary pattern (0xFD),
+         * and checks for overwrite on every subsequent operation. */
     }
     /* Check 4: Double-free detection — only flag in free() handler,
      * not here, to avoid re-reporting on every validation pass. */
+    /* Check 5: Quarantine canary verification (UAF write detection) */
+    quarantine_check_all();
 }
 /* --- Dump state --- */
     if (idx >= 0) {
         g_chunks[idx].state = 2;
         g_chunks[idx].free_order = ++g_free_seq;
+        g_chunks[idx].hash_stable = 0;
+        /* Quarantine: don't actually free yet, fill with canary */
+        quarantine_add(ptr, g_chunks[idx].req_size, idx);
+        dump_state("free", ptr, 0);
     } else {
         /* Might be double-free — detect BEFORE calling real_free (glibc may abort) */
         int any = find_chunk(ptr);
         if (any >= 0 && g_chunks[any].state == 2) {
             add_corruption("double_free", any, "freed already-freed chunk");
             dump_state("free_double", ptr, 0);
             if (g_chunk_count < MAX_CHUNKS) {
                 int slot = g_chunk_count++;
                 g_chunks[slot].hash_stable = 0;
             }
         }
+        /* Still actually free for double-free (glibc will handle/abort) */
+        real_free(ptr);
+        dump_state("free", ptr, 0);
     }
     g_in_hook = 0;
 }