	# Debug Log

	## Sanity Check Results (270 checks across 54+36 configs)

	### Poisson-Gamma VI (198 checks)
	- Parameters positive: 198/198 ✅
	- No NaN: 198/198 ✅
	- Responsibilities sum to 1: 198/198 ✅
	- ELBO finite: 198/198 ✅
	- Exact differs from full: 174/198 (the remaining 24 are trivial deletions with near-zero edge counts)
	- Error decreases with R: ~90% (failures occur in high-coupling regimes)
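
The four pass/fail checks listed above could be expressed roughly as follows. This is a minimal sketch, not the code that produced the log: the function name `sanity_checks`, the argument names, the array layout (components on the last axis of `resp`), and the tolerance `atol` are all assumptions made for illustration.

```python
import numpy as np

def sanity_checks(shape, rate, resp, elbo, atol=1e-6):
    """Return pass/fail flags for one fitted configuration.

    shape, rate: hypothetical Gamma variational parameters (arrays).
    resp: hypothetical responsibilities; the last axis indexes components.
    elbo: scalar objective value.
    """
    shape, rate, resp = (np.asarray(a, dtype=float) for a in (shape, rate, resp))
    return {
        "params_positive": bool(np.all(shape > 0) and np.all(rate > 0)),
        "no_nan": not any(np.isnan(a).any() for a in (shape, rate, resp)),
        "resp_sum_to_one": bool(np.allclose(resp.sum(axis=-1), 1.0, atol=atol)),
        "elbo_finite": bool(np.isfinite(elbo)),
    }

# Made-up values, only to show the call shape.
checks = sanity_checks(
    shape=[[1.2, 0.7], [2.0, 0.3]],
    rate=[[0.5, 0.5], [1.0, 1.0]],
    resp=[[0.25, 0.75], [0.6, 0.4]],
    elbo=-123.4,
)
print(checks)
```

Returning one flag per check, rather than a single boolean, makes it straightforward to tally per-check pass counts like the ones reported above.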

	### Gaussian-Gaussian VI (36 checks)
	- Error decreases with R: ~85%
	- Gaussian VI converges reliably

	### Gaussian-Gamma MAP (36 checks)
	- Error decreases with R: ~70%
	- Higher failure rate due to non-convex optimization
	- All runs use Adam optimizer (lr=0.05, grad_clip=10, max_iter=2000)

	## Numerical Issues

	### CAVI Convergence
	- Many configurations reach the iteration cap (max_iter=200-300) before meeting the strict convergence tolerance (tol=1e-5)
	- Parameter values nonetheless stabilize well before the tolerance is met
	- Weak priors (a0=b0=0.1) with high count scales produce the slowest convergence
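
A stopping rule of this kind can be sketched as a generic fixed-point loop that records the per-iteration change, so one can see how early the parameters stabilize even when `tol` is never met. The convergence criterion (maximum relative parameter change), the `1e-12` guard, and the toy update below are assumptions; the log only specifies `tol` and `max_iter`.

```python
import numpy as np

def run_fixed_point(update, params, tol=1e-5, max_iter=300):
    """Iterate params = update(params) until the max relative change < tol.

    Returns (params, iterations_used, converged, change_trace).
    """
    params = np.asarray(params, dtype=float)
    trace = []
    for it in range(1, max_iter + 1):
        new = update(params)
        # Max relative change; the small constant guards against division by zero.
        change = float(np.max(np.abs(new - params) / (np.abs(params) + 1e-12)))
        trace.append(change)
        params = new
        if change < tol:
            return params, it, True, trace
    return params, max_iter, False, trace

# Toy contraction standing in for one CAVI sweep: x -> 0.5 * x + 1 (fixed point 2).
params, n_iter, converged, trace = run_fixed_point(
    lambda x: 0.5 * x + 1.0, [10.0, 0.5]
)
print(n_iter, converged)
```

Inspecting `trace` for a run that ends with `converged == False` shows whether the change had already flattened out or was still shrinking at the iteration cap.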

	### Gaussian-Gamma MAP Optimizer
	- v1 (vanilla SGD, lr=0.005, max_iter=200): broken. Error increased with R; only 57% positive decay
	- v2 (Adam, lr=0.05, grad_clip=10, max_iter=2000): fixed. Error decreases monotonically; 54% positive decay
	- The remaining 46% with non-positive decay is inherent to MAP: the full fit and the exact-deletion refit follow different optimization paths
	- The objective trace plateaus by ~1500 iterations, indicating convergence
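
For reference, the v2 settings correspond to an update loop along these lines. This is a sketch under assumptions: the log gives only `lr`, `grad_clip`, and `max_iter`, so clipping by global gradient norm, the standard Adam defaults (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`), and the toy double-well objective are illustrative choices, not the experiment's objective.

```python
import numpy as np

def adam_minimize(grad_fn, x0, lr=0.05, grad_clip=10.0, max_iter=2000,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize with Adam, rescaling any gradient whose norm exceeds grad_clip."""
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)  # first-moment estimate
    v = np.zeros_like(x)  # second-moment estimate
    for t in range(1, max_iter + 1):
        g = grad_fn(x)
        norm = np.linalg.norm(g)
        if norm > grad_clip:  # assumption: clip by global norm
            g = g * (grad_clip / norm)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)  # bias correction
        v_hat = v / (1 - beta2 ** t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy non-convex objective f(x) = sum((x^2 - 1)^2), with minima at x = +1 and x = -1.
grad = lambda x: 4.0 * x * (x ** 2 - 1.0)
x_a = adam_minimize(grad, [0.3])
x_b = adam_minimize(grad, [-0.3])
print(x_a, x_b)
```

The two starting points should settle in different wells, which is a small-scale picture of how two runs of a non-convex MAP fit can end at different optima even when each run has converged.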

	### Chi Proxy Anomaly
	- Finding: χ_max(z) correlates negatively with local error (Spearman ρ = -0.28 to -0.50 within regimes)
	- Expected: positive correlation (higher χ → harder → higher error)
	- Explanation: The Dobrushin bound is a sufficient condition, not a tight one. High χ can coexist with fast empirical decay because:
		1. The bound takes the worst case over operator norms
		2. High-degree nodes have high χ, but their neighborhoods capture more of the relevant graph
		3. The actual deletion influence depends on the specific edge structure, not just the bound
	- Impact on paper: Report honestly. The theory gives valid locality guarantees but χ is not a practical predictor of deletion difficulty.
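
A rank correlation like the one reported above can be computed with NumPy alone. The sketch below is illustrative: the helper names and the toy `chi` and `err` arrays are made up, and the log does not say which implementation was used.

```python
import numpy as np

def average_ranks(x):
    """Ranks starting at 1, with tied values sharing their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Made-up values in which larger chi tends to go with smaller error.
chi = [0.2, 0.5, 0.9, 1.4, 2.0]
err = [0.30, 0.35, 0.10, 0.12, 0.05]
rho = spearman_rho(chi, err)
print(rho)
```

Because the statistic depends only on ranks, it is insensitive to the scale of χ or of the error, which is why it suits a comparison within each regime separately.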

	## Exclusion Rules
	- Configs with fewer than 10 edges after count generation: skipped
	- Decay fits with fewer than 3 valid distance shells: marked invalid
	- 24/198 Poisson-Gamma deletions where exact ≈ full (trivial edges): included in the data but flagged
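
The first two rules might be applied as in the sketch below. Several things here are assumptions rather than facts from the log: that the decay fit is a log-linear (exponential) fit of error against graph distance, that a "valid" shell means a finite, strictly positive error, and all function and constant names.

```python
import numpy as np

MIN_EDGES = 10   # configs with fewer edges are skipped
MIN_SHELLS = 3   # decay fits with fewer valid shells are marked invalid

def keep_config(n_edges):
    """Rule 1: keep a config only if it has at least MIN_EDGES edges."""
    return n_edges >= MIN_EDGES

def fit_decay(distances, errors):
    """Fit log(error) = intercept - rate * distance over the valid shells.

    Returns (rate, valid); rate is NaN when the fit is marked invalid.
    """
    d = np.asarray(distances, dtype=float)
    e = np.asarray(errors, dtype=float)
    ok = np.isfinite(e) & (e > 0)  # assumed meaning of a "valid" shell
    if ok.sum() < MIN_SHELLS:
        return float("nan"), False
    slope, _intercept = np.polyfit(d[ok], np.log(e[ok]), 1)
    return float(-slope), True

# Made-up shell errors that decay exactly as exp(-0.7 * distance).
shells = np.array([1.0, 2.0, 3.0, 4.0])
rate, valid = fit_decay(shells, np.exp(-0.7 * shells))
print(rate, valid)
```

Under this convention a positive `rate` corresponds to error shrinking with distance.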

	## MovieLens Binary Caveat
	- All observations are x_ij = 1, producing near-zero per-edge influence
	- RelErr(R=2) < 10^{-4}: the deletion is trivially local
	- Included for completeness but not informative for testing the theory
	- The rating-count transformation is more meaningful

	## Runtime Analysis
	- Local R=1 gives 2.9-3.0x speedup (edge filtering works)
	- Local R≥2 gives minimal speedup at N=300 (the neighborhood covers most of the graph)
	- Speedup should scale as O(N/d^R) on large sparse graphs, since an R-hop neighborhood holds roughly d^R of the N nodes (d = typical node degree)
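
The coverage effect can be checked directly by counting the nodes within R hops of a deleted edge. The sketch below uses breadth-first search on an adjacency dictionary; the ring graph and the deleted edge (0, 1) are toy assumptions, not the experimental graphs.

```python
from collections import deque

def r_hop_nodes(adj, sources, radius):
    """Return the set of nodes within `radius` hops of any source node."""
    dist = {s: 0 for s in sources}
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            continue  # do not expand beyond the requested radius
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)

# Toy ring graph with N = 12 nodes of degree 2; the deleted edge is (0, 1).
N = 12
adj = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}
coverage = {R: len(r_hop_nodes(adj, [0, 1], R)) / N for R in (1, 2, 3)}
print(coverage)
```

When the coverage fraction approaches 1, the local update touches nearly the whole graph and little speedup can be expected, which matches the R≥2 observation at N=300.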

	## File Manifest
	- Sanity checks: results/raw/sanity_*.jsonl (270 records)
	- Synthetic: results/raw/full_synthetic_*.jsonl (2700 records)
	- Model family: results/raw/model_family_v2_*.jsonl (1080 records)
	- Real data: results/raw/real_scaled_*.jsonl (600 records)
	- Total: 4,650 records
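
The record counts above could be re-checked with a short script like this one. It is a sketch that assumes one JSON record per non-empty line (the usual JSON Lines convention) and that the glob patterns are resolved relative to the repository root; the function names are hypothetical.

```python
import glob

# Expected record counts, copied from the manifest above.
EXPECTED = {
    "results/raw/sanity_*.jsonl": 270,
    "results/raw/full_synthetic_*.jsonl": 2700,
    "results/raw/model_family_v2_*.jsonl": 1080,
    "results/raw/real_scaled_*.jsonl": 600,
}

def count_records(pattern):
    """Count non-empty lines across all files matching a glob pattern."""
    total = 0
    for path in sorted(glob.glob(pattern)):
        with open(path, encoding="utf-8") as fh:
            total += sum(1 for line in fh if line.strip())
    return total

def verify_manifest(expected=EXPECTED):
    """Return {pattern: (found, expected)} for every entry that does not match."""
    mismatches = {}
    for pattern, want in expected.items():
        found = count_records(pattern)
        if found != want:
            mismatches[pattern] = (found, want)
    return mismatches
```

Calling `verify_manifest()` from the repository root returns an empty dictionary when every count matches the manifest.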

	## Timestamp
	Generated: 2026-04-25