# Debug Log

## Sanity Check Results (270 checks across 54+36 configs)

### Poisson-Gamma VI (198 checks)
- **Parameters positive**: 198/198 ✅
- **No NaN**: 198/198 ✅  
- **Responsibilities sum to 1**: 198/198 ✅
- **ELBO finite**: 198/198 ✅
- **Exact differs from full**: 174/198 (24 trivial deletions with near-zero edge counts)
- **Error decreases with R**: ~90% (failures in high-coupling regimes)
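The per-record checks above can be sketched as follows; the field names (`params`, `responsibilities`, `elbo`) are hypothetical stand-ins for the actual record schema:

```python
import numpy as np

def sanity_check(record, atol=1e-8):
    """Run the four per-record sanity checks: parameters positive, no NaN,
    responsibilities sum to 1 per edge, ELBO finite. Field names are
    hypothetical stand-ins for the real record schema."""
    params = np.asarray(record["params"], dtype=float)
    resp = np.asarray(record["responsibilities"], dtype=float)  # (n_edges, K)
    return {
        "params_positive": bool(np.all(params > 0)),
        "no_nan": not np.any(np.isnan(params)) and not np.any(np.isnan(resp)),
        "resp_sum_to_1": bool(np.allclose(resp.sum(axis=1), 1.0, atol=atol)),
        "elbo_finite": bool(np.isfinite(record["elbo"])),
    }

rec = {"params": [0.5, 1.2],
       "responsibilities": [[0.3, 0.7], [0.9, 0.1]],
       "elbo": -12.3}
checks = sanity_check(rec)  # all four checks pass for this toy record
```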

### Gaussian-Gaussian VI (36 checks)
- **Error decreases with R**: ~85%
- Gaussian VI converges reliably

### Gaussian-Gamma MAP (36 checks)
- **Error decreases with R**: ~70%
- Higher failure rate due to non-convex optimization
- All runs use Adam optimizer (lr=0.05, grad_clip=10, max_iter=2000)
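A minimal sketch of the optimizer loop with those hyperparameters, assuming a generic `grad_fn` placeholder for the actual MAP objective gradient (shown here on a toy quadratic, not the real model):

```python
import numpy as np

def adam_map(grad_fn, theta0, lr=0.05, grad_clip=10.0, max_iter=2000,
             beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimal Adam loop with gradient-norm clipping, matching the
    hyperparameters above. grad_fn/theta0 are placeholders for the
    actual MAP gradient and initialization."""
    theta = np.array(theta0, dtype=float)
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, max_iter + 1):
        g = grad_fn(theta)
        norm = np.linalg.norm(g)
        if norm > grad_clip:                 # clip gradient norm at 10
            g = g * (grad_clip / norm)
        m = beta1 * m + (1 - beta1) * g      # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g  # second-moment estimate
        m_hat = m / (1 - beta1 ** t)         # bias correction
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta

# Toy quadratic with minimum at [1, -2], gradient 2*(theta - target):
target = np.array([1.0, -2.0])
theta = adam_map(lambda th: 2.0 * (th - target), [0.0, 0.0])
```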

## Numerical Issues

### CAVI Convergence
- Many configurations hit max_iter=200-300 without strict convergence (tol=1e-5)
- Parameters effectively stabilize well before max_iter, even though the per-iteration change never drops below the tolerance
- Weak priors (a0=b0=0.1) with high count scales produce the slowest convergence
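The convergence criterion described above (stop when the parameter change falls below tol, else run out the iteration budget) can be sketched as follows; `update_fn` is a placeholder for the model's actual coordinate updates:

```python
import numpy as np

def run_cavi(update_fn, params0, tol=1e-5, max_iter=300):
    """CAVI outer loop: iterate coordinate updates until the max absolute
    parameter change drops below tol, or report hitting max_iter without
    strict convergence. update_fn is a stand-in for the real updates."""
    params = np.array(params0, dtype=float)
    for it in range(max_iter):
        new = update_fn(params)
        delta = np.max(np.abs(new - params))
        params = new
        if delta < tol:
            return params, it + 1, True
    return params, max_iter, False  # hit max_iter, not strictly converged

# Toy contraction with fixed point 2.0:
params, n_iter, converged = run_cavi(lambda p: 0.5 * p + 1.0, [0.0])
```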

### Gaussian-Gamma MAP Optimizer
- v1 (vanilla SGD, lr=0.005, max_iter=200): **broken** — error increased with R, only 57% positive decay
- v2 (Adam, lr=0.05, grad_clip=10, max_iter=2000): **fixed** — error decreases monotonically, 54% positive decay
- The remaining 46% with non-positive decay is inherent to MAP: different optimization paths for full vs exact-deletion
- Objective trace shows convergence by ~1500 iterations (plateau)
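One way to locate that plateau programmatically, sketched on a synthetic exponential-decay trace (the window and tolerance are illustrative choices, not values from the actual runs):

```python
import numpy as np

def plateau_iteration(trace, window=100, rel_tol=1e-4):
    """Return the first iteration at which the objective's improvement
    over a trailing window falls below a relative tolerance. trace is a
    1-D array of objective values, one per iteration."""
    trace = np.asarray(trace, dtype=float)
    for t in range(window, len(trace)):
        prev, cur = trace[t - window], trace[t]
        if abs(cur - prev) < rel_tol * max(abs(prev), 1e-12):
            return t
    return len(trace)  # no plateau detected within the trace

# Synthetic trace decaying toward an asymptote of 5.0:
trace = 5.0 + np.exp(-np.arange(2000) / 200.0)
p = plateau_iteration(trace)
```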

### Chi Proxy Anomaly
- **Finding**: χ_max(z) correlates *negatively* with local error (Spearman ρ = -0.28 to -0.50 within regimes)
- **Expected**: positive correlation (higher χ → harder → higher error)
- **Explanation**: The Dobrushin bound is a *sufficient* condition, not tight. High χ can coexist with fast empirical decay because:
  1. The bound takes worst-case over operator norms
  2. High-degree nodes have high χ but their neighborhoods capture more of the relevant graph
  3. The actual deletion influence depends on the specific edge structure, not just the bound
- **Impact on paper**: Report honestly. The theory gives valid locality *guarantees* but χ is not a practical *predictor* of deletion difficulty.
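For reference, the quoted statistic is a plain Spearman rank correlation; a minimal tie-free implementation, run here on deliberately anti-monotone toy values (not the real per-node data):

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman rank correlation (assumes no ties): Pearson correlation
    of the rank vectors."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))

# Hypothetical per-node values within one regime (higher chi, lower error):
chi_max = [3.1, 2.4, 4.0, 1.2, 2.9, 0.8]
local_err = [1e-4, 3e-4, 8e-5, 9e-4, 2e-4, 1.5e-3]
rho = spearman_rho(chi_max, local_err)  # -1.0: toy data is perfectly anti-monotone
```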

## Exclusion Rules
- Configs with fewer than 10 edges after count generation: skipped
- Decay fits with fewer than 3 valid distance shells: marked invalid
- 24/198 PG deletions where exact ≈ full (trivial edges): included in data but noted
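The triage logic above amounts to a small filter; the field names here are hypothetical stand-ins for the real config schema:

```python
def triage_config(cfg):
    """Apply the exclusion rules: skip configs with fewer than 10 edges;
    keep but flag decay fits with fewer than 3 valid distance shells.
    Field names are hypothetical stand-ins for the real schema."""
    if cfg["n_edges"] < 10:
        return "skip"               # too few edges after count generation
    if cfg["n_valid_shells"] < 3:
        return "decay_fit_invalid"  # kept in the data, fit marked invalid
    return "ok"

print(triage_config({"n_edges": 5, "n_valid_shells": 4}))   # skip
print(triage_config({"n_edges": 40, "n_valid_shells": 2}))  # decay_fit_invalid
```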

## MovieLens Binary Caveat
- All observations are x_ij = 1, producing near-zero per-edge influence
- RelErr(R=2) < 10^{-4}: the deletion is trivially local
- Included for completeness but not informative for testing the theory
- The rating-count transformation is more meaningful

## Runtime Analysis
- Local R=1 gives 2.9-3.0x speedup (edge filtering works)
- Local R≥2 gives minimal speedup at N=300 (neighborhood covers most of graph)
- Speedup should scale as O(N/d^R) on large sparse graphs
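A back-of-envelope model of that scaling claim (the average degree of 8 is a hypothetical value, and the model ignores constant factors, which is why the observed R=1 speedup of ~3x sits far below the crude N/d figure; it still captures why R≥2 yields minimal speedup at N=300):

```python
def expected_speedup(n_nodes, avg_degree, radius):
    """Crude model of the O(N / d**R) claim: a radius-R update touches
    roughly min(d**R, N) nodes, so the speedup collapses to ~1x once the
    neighborhood covers the whole graph."""
    neighborhood = min(avg_degree ** radius, n_nodes)
    return n_nodes / neighborhood

# At N=300 with a hypothetical average degree of 8:
print(expected_speedup(300, 8, 1))  # 37.5 in this constant-free model
print(expected_speedup(300, 8, 3))  # 1.0: neighborhood covers the graph
```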

## File Manifest
- Sanity checks: results/raw/sanity_*.jsonl (270 records)
- Synthetic: results/raw/full_synthetic_*.jsonl (2700 records)
- Model family: results/raw/model_family_v2_*.jsonl (1080 records)
- Real data: results/raw/real_scaled_*.jsonl (600 records)
- Total: 4,650 records
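A small helper for reconciling those counts against the shards on disk, assuming the paths listed above (each JSONL line is parsed so a corrupt record fails loudly):

```python
import json
from pathlib import Path

def count_records(pattern, root="results/raw"):
    """Count records across the JSONL shards matching a glob pattern,
    validating that every non-empty line parses as JSON."""
    total = 0
    for path in sorted(Path(root).glob(pattern)):
        with open(path) as f:
            for line in f:
                if line.strip():
                    json.loads(line)  # raise early on a corrupt record
                    total += 1
    return total

# e.g. count_records("sanity_*.jsonl") should reconcile to 270,
# and the four patterns together to 4,650.
```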

## Timestamp
Generated: 2026-04-25