# Debug Log
## Sanity Check Results (270 checks across 54+36 configs)
### Poisson-Gamma VI (198 checks)
- **Parameters positive**: 198/198 ✅
- **No NaN**: 198/198 ✅
- **Responsibilities sum to 1**: 198/198 ✅
- **ELBO finite**: 198/198 ✅
- **Exact differs from full**: 174/198 (the other 24 are trivial deletions of edges with near-zero counts)
- **Error decreases with R**: ~90% (failures in high-coupling regimes)
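
The first four checks above can be expressed as simple predicates on a fitted result. The following is a minimal sketch, not the project's actual checker: the function name `sanity_check`, its argument layout, and the tolerance of `1e-6` on the responsibility sums are assumptions made for illustration.

```python
import math

def sanity_check(params, responsibilities, elbo, tol=1e-6):
    """Return pass/fail flags for one fitted configuration.

    params:           flat list of variational parameters (hypothetical layout)
    responsibilities: list of rows, each row a list of probabilities
    elbo:             final ELBO value
    tol:              assumed tolerance for "sums to 1"
    """
    flat = list(params) + [r for row in responsibilities for r in row] + [elbo]
    return {
        "params_positive": all(p > 0 for p in params),
        "no_nan": not any(math.isnan(v) for v in flat),
        "resp_sum_to_one": all(abs(sum(row) - 1.0) <= tol
                               for row in responsibilities),
        "elbo_finite": math.isfinite(elbo),
    }
```

A run counts as passing a check when the corresponding flag is `True`; the per-check totals are then the number of `True` flags over all runs.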
### Gaussian-Gaussian VI (36 checks)
- **Error decreases with R**: ~85%
- Gaussian VI converges reliably
### Gaussian-Gamma MAP (36 checks)
- **Error decreases with R**: ~70%
- Higher failure rate due to non-convex optimization
- All runs use Adam optimizer (lr=0.05, grad_clip=10, max_iter=2000)
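
The "error decreases with R" rate reported for the three model families can be computed along the following lines. This is a sketch under an assumption: it reads "decreases" as "never increases as R grows", which may be stricter or looser than the criterion actually used. The function names are hypothetical.

```python
def error_decreases_with_radius(errors):
    """errors: list of (R, error) pairs for one run.

    Returns True if the error never increases as R grows
    (assumed reading of the check).
    """
    ordered = [err for _, err in sorted(errors)]
    return all(later <= earlier
               for earlier, later in zip(ordered, ordered[1:]))

def pass_rate(runs):
    """Fraction of runs whose error is non-increasing in R."""
    return sum(error_decreases_with_radius(run) for run in runs) / len(runs)
```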
## Numerical Issues
### CAVI Convergence
- Many configurations reach the iteration cap (max_iter=200-300) before meeting the strict convergence tolerance (tol=1e-5)
- Parameters stabilize well before the tolerance threshold
- Weak priors (a0=b0=0.1) with high count scales produce the slowest convergence
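
The pattern described above (hitting the cap although parameters have effectively stabilized) is easier to diagnose when the loop reports both the iteration count and whether the tolerance was met. A minimal sketch follows; it assumes convergence is measured by the maximum relative parameter change, which this log does not confirm, and `update` stands in for one full CAVI sweep.

```python
def run_to_convergence(update, params, tol=1e-5, max_iter=300):
    """Iterate `update` until the max relative change drops below `tol`.

    update:  hypothetical function mapping a parameter list to a new list
    Returns (params, iterations_used, converged_flag).
    """
    for it in range(1, max_iter + 1):
        new = update(params)
        # Relative change per parameter, guarded against division by zero.
        delta = max(abs(n - p) / max(abs(p), 1e-12)
                    for n, p in zip(new, params))
        params = new
        if delta < tol:
            return params, it, True
    return params, max_iter, False
```

Logging the final `delta` alongside the flag would show how far a capped run was from the threshold.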
### Gaussian-Gamma MAP Optimizer
- v1 (vanilla SGD, lr=0.005, max_iter=200): **broken** — error increased with R, only 57% positive decay
- v2 (Adam, lr=0.05, grad_clip=10, max_iter=2000): **fixed** — error decreases monotonically, 54% positive decay
- The remaining 46% with non-positive decay is inherent to MAP: the full and exact-deletion fits follow different optimization paths
- Objective trace shows convergence by ~1500 iterations (plateau)
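
For reference, the v2 settings correspond to a standard Adam update with gradient clipping. The sketch below uses the hyperparameters listed above (lr=0.05, grad_clip=10, max_iter=2000); clipping by global gradient norm, the default `beta1`, `beta2`, `eps` values, and the toy quadratic objective are assumptions, not details taken from the actual implementation.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.05, grad_clip=10.0, max_iter=2000,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Plain-Python Adam with global-norm gradient clipping (assumed variant)."""
    x = list(x0)
    m = [0.0] * len(x)  # first-moment estimate
    v = [0.0] * len(x)  # second-moment estimate
    for t in range(1, max_iter + 1):
        g = grad_fn(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm > grad_clip:
            # Rescale so the gradient norm equals grad_clip.
            g = [gi * grad_clip / norm for gi in g]
        for i, gi in enumerate(g):
            m[i] = beta1 * m[i] + (1 - beta1) * gi
            v[i] = beta2 * v[i] + (1 - beta2) * gi * gi
            m_hat = m[i] / (1 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1 - beta2 ** t)
            x[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

def toy_grad(x):
    """Gradient of the toy objective (x0 - 3)^2 + (x1 + 1)^2."""
    return [2 * (x[0] - 3.0), 2 * (x[1] + 1.0)]
```

Calling `adam_minimize(toy_grad, [-10.0, 10.0])` starts far enough from the optimum that clipping is active in the early iterations.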
### Chi Proxy Anomaly
- **Finding**: χ_max(z) correlates *negatively* with local error (Spearman ρ = -0.28 to -0.50 within regimes)
- **Expected**: positive correlation (higher χ → harder → higher error)
- **Explanation**: The Dobrushin bound is a *sufficient* condition, not a tight one. High χ can coexist with fast empirical decay because:
1. The bound takes worst-case over operator norms
2. High-degree nodes have high χ but their neighborhoods capture more of the relevant graph
3. The actual deletion influence depends on the specific edge structure, not just the bound
- **Impact on paper**: Report honestly. The theory gives valid locality *guarantees* but χ is not a practical *predictor* of deletion difficulty.
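
The rank correlation behind this finding can be reproduced with a few lines of standard-library Python. This is a sketch of the usual Spearman definition (Pearson correlation of average ranks); the function names are hypothetical, and applying it separately within each regime is left to the caller.

```python
import math

def average_ranks(values):
    """1-based ranks, with ties sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho between two equal-length sequences (e.g. chi_max, error)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

A negative value, as reported above, means that deletions with larger χ_max tend to have smaller local error.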
## Exclusion Rules
- Configs with fewer than 10 edges after count generation: skipped
- Decay fits with fewer than 3 valid distance shells: marked invalid
- 24/198 Poisson-Gamma (PG) deletions where exact ≈ full (trivial edges): included in the data but noted as trivial
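
The three rules above amount to a small classification step. The sketch below is illustrative only: the record fields `n_edges`, `n_valid_shells`, and `trivial_deletion` are hypothetical names, not the keys used in the actual result files.

```python
def apply_exclusion_rules(record, min_edges=10, min_shells=3):
    """Classify one record according to the exclusion rules.

    record: dict with hypothetical keys
        n_edges          - edges remaining after count generation
        n_valid_shells   - valid distance shells available to the decay fit
        trivial_deletion - True when exact and full results nearly coincide
    """
    if record["n_edges"] < min_edges:
        return "skipped"
    if record["n_valid_shells"] < min_shells:
        return "invalid_fit"
    if record.get("trivial_deletion", False):
        return "included_trivial"
    return "included"
```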
## MovieLens Binary Caveat
- All observations are x_ij = 1, producing near-zero per-edge influence
- RelErr(R=2) < 10^{-4}: the deletion is trivially local
- Included for completeness but not informative for testing the theory
- The rating-count transformation gives a more meaningful test
## Runtime Analysis
- Local R=1 gives 2.9-3.0x speedup (edge filtering works)
- Local R≥2 gives minimal speedup at N=300 (neighborhood covers most of graph)
- Speedup should scale as O(N/d^R) on large sparse graphs
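
Whether a given R can yield any speedup depends on how much of the graph the R-hop neighborhood of the deleted edge covers. A breadth-first search gives that fraction directly. This is a sketch assuming an adjacency-list representation (`adj` maps each node to its neighbors) and that the neighborhood is grown from both endpoints of the deleted edge; neither detail is confirmed by this log.

```python
from collections import deque

def r_hop_ball(adj, sources, radius):
    """Set of nodes within `radius` hops of any node in `sources`."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the requested radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

def coverage(adj, sources, radius):
    """Fraction of all nodes that fall inside the R-hop ball."""
    return len(r_hop_ball(adj, sources, radius)) / len(adj)
```

When `coverage` is close to 1, the local update touches nearly the same nodes as the full one, which is consistent with the minimal speedup observed for R≥2 at N=300.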
## File Manifest
- Sanity checks: results/raw/sanity_*.jsonl (270 records)
- Synthetic: results/raw/full_synthetic_*.jsonl (2700 records)
- Model family: results/raw/model_family_v2_*.jsonl (1080 records)
- Real data: results/raw/real_scaled_*.jsonl (600 records)
- Total: 4,650 records
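
The record counts in this manifest can be re-checked by counting non-empty lines in the matching files. The sketch below uses the glob patterns listed above; the `root` argument, the dictionary names, and the one-record-per-line assumption about the `.jsonl` files are illustrative.

```python
import glob
import os

# Glob patterns and expected counts copied from the manifest above.
PATTERNS = {
    "sanity": "results/raw/sanity_*.jsonl",
    "synthetic": "results/raw/full_synthetic_*.jsonl",
    "model_family": "results/raw/model_family_v2_*.jsonl",
    "real": "results/raw/real_scaled_*.jsonl",
}
EXPECTED = {"sanity": 270, "synthetic": 2700, "model_family": 1080, "real": 600}

def count_records(root="."):
    """Count non-empty lines per file group under `root`."""
    counts = {}
    for name, pattern in PATTERNS.items():
        total = 0
        for path in glob.glob(os.path.join(root, pattern)):
            with open(path, encoding="utf-8") as fh:
                total += sum(1 for line in fh if line.strip())
        counts[name] = total
    return counts
```

Comparing `count_records()` against `EXPECTED` from the repository root would flag any group whose files are missing or truncated.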
## Timestamp
Generated: 2026-04-25