Debug Log

Sanity Check Results (270 checks across 54+36 configs)

Poisson-Gamma VI (198 checks)

  • Parameters positive: 198/198 ✅
  • No NaN: 198/198 ✅
  • Responsibilities sum to 1: 198/198 ✅
  • ELBO finite: 198/198 ✅
  • Exact differs from full: 174/198 (the other 24 are trivial deletions of edges with near-zero counts)
  • Error decreases with R: ~90% (failures in high-coupling regimes)
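
The first four checks above can be expressed as a small predicate over one fitted configuration. The following is a minimal sketch, not the project's actual check code: the function name `sanity_checks`, the `params` dictionary layout, and the `atol` tolerance are assumptions made for illustration.

```python
import numpy as np

def sanity_checks(params, resp, elbo, atol=1e-6):
    """Return pass/fail flags for one fitted configuration.

    params: dict mapping a name to an array of variational parameters
            (hypothetical layout, e.g. Gamma shapes and rates)
    resp:   array whose last axis holds responsibilities
    elbo:   scalar ELBO at the final iterate
    atol:   tolerance for the sum-to-one check (assumed value)
    """
    arrays = [np.asarray(v, dtype=float) for v in params.values()]
    row_sums = np.asarray(resp, dtype=float).sum(axis=-1)
    return {
        "params_positive": all(bool(np.all(a > 0)) for a in arrays),
        "no_nan": not any(bool(np.isnan(a).any()) for a in arrays),
        "resp_sum_to_one": bool(np.allclose(row_sums, 1.0, atol=atol)),
        "elbo_finite": bool(np.isfinite(elbo)),
    }

if __name__ == "__main__":
    flags = sanity_checks(
        {"shape": [1.2, 3.4], "rate": [0.5, 0.9]},
        [[0.25, 0.75], [0.5, 0.5]],
        -123.4,
    )
    print(flags)
```

Counting how many configurations return `True` for each key would give tallies in the same form as the list above.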

Gaussian-Gaussian VI (36 checks)

  • Error decreases with R: ~85%
  • Gaussian VI converges reliably

Gaussian-Gamma MAP (36 checks)

  • Error decreases with R: ~70%
  • Higher failure rate due to non-convex optimization
  • All runs use Adam optimizer (lr=0.05, grad_clip=10, max_iter=2000)

Numerical Issues

CAVI Convergence

  • Many configurations reach the iteration cap (max_iter=200-300) without meeting the strict convergence tolerance (tol=1e-5)
  • In these runs the parameters stabilize in practice well before the tolerance is met
  • Weak priors (a0=b0=0.1) with high count scales produce the slowest convergence
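
To make the distinction between "hit max_iter" and "converged" concrete, here is a minimal sketch of a fixed-point loop with a tolerance and an iteration cap. It is not the CAVI implementation used for these runs: the helper name `run_until_converged`, the relative-change criterion, and the toy update maps are assumptions; the log does not say which quantity the tolerance is applied to.

```python
import numpy as np

def run_until_converged(update, theta0, tol=1e-5, max_iter=300):
    """Iterate theta <- update(theta).

    Stops when the largest relative parameter change falls below tol,
    or when max_iter is reached. Returns (theta, n_iter, converged, history).
    """
    theta = np.asarray(theta0, dtype=float)
    history = []
    for it in range(1, max_iter + 1):
        new = np.asarray(update(theta), dtype=float)
        rel = float(np.max(np.abs(new - theta) / (np.abs(theta) + 1e-12)))
        history.append(rel)
        theta = new
        if rel < tol:
            return theta, it, True, history
    return theta, max_iter, False, history

if __name__ == "__main__":
    # Toy contraction maps with fixed point 2.0: one fast, one slow.
    fast = lambda t: 0.5 * t + 1.0
    slow = lambda t: 0.999 * t + 0.002
    for name, f in [("fast", fast), ("slow", slow)]:
        theta, n_iter, converged, _ = run_until_converged(f, [10.0], max_iter=200)
        print(name, n_iter, converged, theta)
```

Logging the `converged` flag and the final relative change alongside each run makes it possible to separate runs that stalled from runs that were merely stopped by the cap.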

Gaussian-Gamma MAP Optimizer

  • v1 (vanilla SGD, lr=0.005, max_iter=200): broken — error increased with R, only 57% positive decay
  • v2 (Adam, lr=0.05, grad_clip=10, max_iter=2000): fixed — error decreases monotonically, 54% positive decay
  • The remaining 46% of fits with non-positive decay appear to be inherent to MAP: the objective is non-convex, so the full and exact-deletion fits can follow different optimization paths
  • Objective trace shows convergence by ~1500 iterations (plateau)
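
For reference, the v2 settings correspond to an update of the following shape. This is a NumPy sketch only, not the optimizer code used in the experiments: it assumes `grad_clip` means clipping the global gradient norm (the log does not say whether clipping is by norm or by value), and the `beta1`, `beta2`, and `eps` values are the commonly used Adam defaults rather than values taken from the runs.

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.05, grad_clip=10.0, max_iter=2000,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize a function given its gradient, using Adam with norm clipping."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first-moment (mean) estimate
    v = np.zeros_like(x)  # second-moment (uncentered variance) estimate
    for t in range(1, max_iter + 1):
        g = np.asarray(grad_fn(x), dtype=float)
        norm = np.linalg.norm(g)
        if norm > grad_clip:
            # Assumed clipping rule: rescale so the gradient norm equals grad_clip.
            g = g * (grad_clip / norm)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

if __name__ == "__main__":
    # Toy convex objective f(x) = ||x - c||^2 with gradient 2 (x - c).
    c = np.array([1.0, 2.0])
    print(adam_minimize(lambda x: 2.0 * (x - c), [5.0, -3.0]))
```

The toy objective here is convex, so it says nothing about the local-optimum behavior described above; it only illustrates the update rule and the clipping step.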

Chi Proxy Anomaly

  • Finding: χ_max(z) correlates negatively with local error (Spearman ρ = -0.28 to -0.50 within regimes)
  • Expected: positive correlation (higher χ → harder → higher error)
  • Explanation: The Dobrushin bound is a sufficient condition and is not tight. High χ can coexist with fast empirical decay because:
    1. The bound takes worst-case over operator norms
    2. High-degree nodes have high χ but their neighborhoods capture more of the relevant graph
    3. The actual deletion influence depends on the specific edge structure, not just the bound
  • Impact on paper: Report the negative correlation as observed. The theory gives valid locality guarantees, but χ is not a practical predictor of deletion difficulty.
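
The Spearman ρ reported above is the Pearson correlation of the ranks. The sketch below shows one way to compute it with NumPy alone; the function names are illustrative, and the analysis scripts may well use a library routine instead. Applying it separately to the (χ_max, local error) pairs of each regime would give per-regime values of the kind quoted above.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(x, y):
    """Spearman rank correlation between two equal-length sequences."""
    rx, ry = average_ranks(x), average_ranks(y)
    return float(np.corrcoef(rx, ry)[0, 1])

if __name__ == "__main__":
    # Made-up numbers purely to exercise the function; not experimental data.
    chi = [0.2, 0.5, 0.9, 1.4, 2.0]
    err = [0.30, 0.35, 0.20, 0.10, 0.12]
    print(spearman_rho(chi, err))
```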

Exclusion Rules

  • Configs with fewer than 10 edges after count generation: skipped
  • Decay fits with fewer than 3 valid distance shells: marked invalid
  • 24/198 Poisson-Gamma (PG) deletions where exact ≈ full (trivial edges): included in the data but noted
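
The three rules above could be applied per record roughly as follows. This is a sketch under stated assumptions: the function name, the argument names, and in particular `trivial_tol` (the threshold for "exact ≈ full") are hypothetical, since the log does not give the threshold actually used.

```python
def apply_exclusion_rules(n_edges, n_valid_shells, exact_full_diff,
                          min_edges=10, min_shells=3, trivial_tol=1e-8):
    """Return status flags for one record.

    n_edges:         number of edges after count generation
    n_valid_shells:  number of valid distance shells available to the decay fit
    exact_full_diff: a nonnegative measure of how far the exact-deletion fit
                     is from the full fit (hypothetical field)
    trivial_tol:     placeholder threshold for "exact is approximately full"
    """
    if n_edges < min_edges:
        # Rule 1: too few edges, the config is skipped entirely.
        return {"status": "skipped", "decay_fit_valid": False, "trivial_deletion": False}
    return {
        "status": "included",
        # Rule 2: a decay fit needs at least min_shells valid shells.
        "decay_fit_valid": n_valid_shells >= min_shells,
        # Rule 3: trivial deletions stay in the data but carry a flag.
        "trivial_deletion": exact_full_diff <= trivial_tol,
    }

if __name__ == "__main__":
    print(apply_exclusion_rules(n_edges=8, n_valid_shells=4, exact_full_diff=0.1))
    print(apply_exclusion_rules(n_edges=50, n_valid_shells=2, exact_full_diff=0.1))
    print(apply_exclusion_rules(n_edges=50, n_valid_shells=4, exact_full_diff=0.0))
```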

MovieLens Binary Caveat

  • All observations are x_ij = 1, producing near-zero per-edge influence
  • RelErr(R=2) < 10^{-4}: the deletion is trivially local
  • Included for completeness but not informative for testing the theory
  • The rating-count transformation is more meaningful

Runtime Analysis

  • Local R=1 gives 2.9-3.0x speedup (edge filtering works)
  • Local R≥2 gives minimal speedup at N=300 (the neighborhood already covers most of the graph)
  • Speedup should scale as O(N/d^R) on large sparse graphs
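
The coverage effect can be checked directly by measuring how many nodes lie within R hops of a deletion site. The sketch below uses breadth-first search on an adjacency dictionary. It assumes d in the scaling above stands for the typical node degree, so that an R-hop neighborhood holds on the order of d^R nodes; the random graph generator and its parameters are illustrative and are not the experimental graphs.

```python
import random
from collections import deque

def r_hop_neighborhood(adj, source, R):
    """Return the set of nodes within graph distance R of source (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == R:
            continue  # do not expand beyond radius R
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def random_sparse_graph(n, avg_degree, seed=0):
    """Undirected random graph with about n * avg_degree / 2 edges (illustrative)."""
    rng = random.Random(seed)
    adj = {i: set() for i in range(n)}
    n_edges, added = n * avg_degree // 2, 0
    while added < n_edges:
        u, v = rng.sample(range(n), 2)
        if v not in adj[u]:
            adj[u].add(v)
            adj[v].add(u)
            added += 1
    return adj

if __name__ == "__main__":
    adj = random_sparse_graph(n=300, avg_degree=10)
    for R in (1, 2, 3):
        frac = len(r_hop_neighborhood(adj, 0, R)) / len(adj)
        print(f"R={R}: fraction of nodes covered = {frac:.2f}")
```

When the covered fraction approaches 1, a local update touches nearly the same nodes as a full refit, which is consistent with the minimal speedup seen for R≥2 at N=300.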

File Manifest

  • Sanity checks: results/raw/sanity_*.jsonl (270 records)
  • Synthetic: results/raw/full_synthetic_*.jsonl (2700 records)
  • Model family: results/raw/model_family_v2_*.jsonl (1080 records)
  • Real data: results/raw/real_scaled_*.jsonl (600 records)
  • Total: 4,650 records
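
The record counts above can be re-verified against the files on disk with a short script. This sketch assumes one JSON record per non-empty line and that it is run from the repository root; the helper name `count_jsonl_records` is illustrative.

```python
import glob

# Expected record counts, copied from the manifest above.
EXPECTED_COUNTS = {
    "results/raw/sanity_*.jsonl": 270,
    "results/raw/full_synthetic_*.jsonl": 2700,
    "results/raw/model_family_v2_*.jsonl": 1080,
    "results/raw/real_scaled_*.jsonl": 600,
}

def count_jsonl_records(pattern):
    """Count non-empty lines across all files matching a glob pattern."""
    total = 0
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            total += sum(1 for line in fh if line.strip())
    return total

if __name__ == "__main__":
    for pattern, expected in EXPECTED_COUNTS.items():
        found = count_jsonl_records(pattern)
        status = "ok" if found == expected else "MISMATCH"
        print(f"{pattern}: found {found}, expected {expected} [{status}]")
    print("expected total:", sum(EXPECTED_COUNTS.values()))
```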

Timestamp

Generated: 2026-04-25