serliezer
/

dobrushin-unlearning-experiments

serliezer committed 22 days ago

Commit

42ea22d

1 Parent(s): b0584e8

Add main.tex

Files changed (1)

main.tex +619 -0

main.tex ADDED

	@@ -0,0 +1,619 @@

+\documentclass{article}
+\usepackage[final]{neurips_2026}
+\usepackage[utf8]{inputenc}
+\usepackage[T1]{fontenc}
+\usepackage{hyperref}
+\usepackage{url}
+\usepackage{booktabs}
+\usepackage{amsfonts}
+\usepackage{amsmath}
+\usepackage{amssymb}
+\usepackage{amsthm}
+\usepackage{nicefrac}
+\usepackage{microtype}
+\usepackage{graphicx}
+\usepackage{xcolor}
+\usepackage{enumitem}
+\usepackage{algorithm}
+\usepackage{algorithmic}
+\usepackage{mathtools}
+\usepackage{thm-restate}
+% Theorem environments
+\newtheorem{theorem}{Theorem}
+\newtheorem{lemma}[theorem]{Lemma}
+\newtheorem{proposition}[theorem]{Proposition}
+\newtheorem{corollary}[theorem]{Corollary}
+\newtheorem{definition}[theorem]{Definition}
+\newtheorem{assumption}[theorem]{Assumption}
+\newtheorem{remark}[theorem]{Remark}
+\newtheorem{example}[theorem]{Example}
+% Shortcuts
+\newcommand{\R}{\mathbb{R}}
+\newcommand{\E}{\mathbb{E}}
+\newcommand{\KL}{\mathrm{KL}}
+\newcommand{\ELBO}{\mathcal{L}}
+\renewcommand{\setminus}{\smallsetminus}
+\newcommand{\cN}{\mathcal{N}}
+\newcommand{\cG}{\mathcal{G}}
+\newcommand{\bU}{\mathbf{U}}
+\newcommand{\bV}{\mathbf{V}}
+\newcommand{\bX}{\mathbf{X}}
+\newcommand{\blambda}{\boldsymbol{\lambda}}
+\DeclareMathOperator{\diag}{diag}
+\DeclareMathOperator{\Gam}{Gamma}
+\DeclareMathOperator{\Poi}{Poisson}
+\DeclareMathOperator{\Mult}{Mult}
+\DeclareMathOperator{\Tr}{Tr}
+\DeclareMathOperator*{\argmin}{arg\,min}
+\DeclareMathOperator*{\argmax}{arg\,max}
+\title{When Is Local Unlearning Possible?\\ A Weighted Dobrushin Theory for Variational Inference}
+\author{%
+  Anonymous Author(s)
+}
+\begin{document}
+\maketitle
+\begin{abstract}
+We study the problem of efficiently removing the influence of a single data point from a fitted variational inference model without full retraining.
+We develop a \emph{weighted Dobrushin theory} that provides sufficient conditions under which the deletion influence decays exponentially with graph distance from the deleted observation, enabling local unlearning.
+The key quantity is the \emph{weighted interaction mass}~$\chi(z)$, which captures how strongly each observation couples the variational parameters of its endpoints through the coordinate-ascent variational inference (CAVI) fixed-point map.
+When $\chi(z) < 1$ in a suitable spectral sense, we prove that the variational parameter change due to deletion decays as $O(e^{-\mu \cdot d})$, where $d$ is the graph distance and $\mu > 0$ depends on the spectral gap of the weighted Dobrushin matrix.
+We instantiate the theory for Gamma--Poisson, Gaussian--Gaussian, and Gaussian--Gamma matrix factorization models, derive closed-form expressions for the interaction constants, and validate the theory on synthetic and real-world recommendation datasets.
+Local radius-$R$ unlearning achieves up to $3\times$ speedup over exact retraining at $R=1$ with relative error below $8\%$, and the error decreases exponentially with~$R$.
+\end{abstract}
+%% ============================================================
+\section{Introduction}
+\label{sec:intro}
+Machine unlearning---the problem of removing the influence of a specific training data point from a learned model---has attracted significant attention due to privacy regulations such as the GDPR ``right to be forgotten''~\citep{bourtoule2021machine,cao2015towards,ginart2019making}.
+The na\"ive approach of retraining from scratch is computationally prohibitive for large models, motivating the development of \emph{approximate unlearning} methods that efficiently update the model to approximate the retrained solution~\citep{guo2020certified,neel2021descent,izzo2021approximate,sekhari2021remember}.
+Most existing unlearning methods operate \emph{globally}: they modify all model parameters to account for the deleted data point.
+For large-scale models, even a single Newton step or influence-function correction requires touching the entire parameter vector~\citep{koh2017understanding}.
+This raises a natural question:
+\begin{quote}
+\emph{When is it possible to unlearn locally---by updating only a small neighborhood of parameters near the deleted observation, while leaving the rest unchanged?}
+\end{quote}
+We answer this question for \emph{variational inference} (VI) models fitted by coordinate-ascent variational inference (CAVI).
+CAVI optimizes a mean-field variational objective by iteratively updating one block of variational parameters at a time, holding the others fixed.
+The structure of the CAVI update map induces a natural notion of \emph{interaction}: block~$u$'s update depends on block~$v$ only if $u$ and $v$ share an observed data point.
+In the bipartite user--item models common in recommendation and topic modeling, this interaction structure is determined by the observation graph.
+\paragraph{Contributions.}
+\begin{enumerate}[nosep,leftmargin=*]
+\item We develop a \emph{weighted Dobrushin theory} for CAVI fixed points (Section~\ref{sec:theory}). We define a weighted interaction matrix whose entries capture the sensitivity of each block's CAVI update to perturbations in its neighbors, weighted by the data values. When the spectral radius of this matrix is less than~1, the CAVI fixed point is unique and deletion influence decays exponentially with graph distance.
+\item We derive \emph{closed-form interaction constants} for Gamma--Poisson matrix factorization (Section~\ref{sec:gamma_poisson}), the workhorse model for count data in recommendation and text mining. The constants depend on the digamma function of the shape parameters, the rate parameters, and the observed counts, providing an interpretable proxy for deletion difficulty.
+\item We extend the framework to Gaussian--Gaussian and Gaussian--Gamma matrix factorization (Section~\ref{sec:extensions}), showing that the locality phenomenon is not specific to the Poisson--Gamma conjugate pair.
+\item We validate the theory on synthetic data across three graph families and three model families, and on real-world data from Last.fm and MovieLens (Section~\ref{sec:experiments}). Local $R=1$ unlearning achieves up to $3\times$ speedup with relative error below $8\%$; the error decays exponentially with radius.
+\end{enumerate}
+\paragraph{Related work.}
+\emph{Machine unlearning} methods can be grouped into exact methods based on data partitioning~\citep{bourtoule2021machine,chen2022graph}, approximate methods based on influence functions or Newton steps~\citep{guo2020certified,izzo2021approximate,neel2021descent}, and certified methods with formal deletion guarantees~\citep{sekhari2021remember,ullah2021machine}.
+Our work is closest to the influence-function approach but exploits the graph structure of CAVI to restrict updates to a local neighborhood.
+The \emph{Dobrushin uniqueness condition}~\citep{dobrushin1968description,dobrushin1970prescribing} is a classical tool in statistical mechanics for establishing uniqueness of Gibbs measures and decay of correlations.
+It has been applied to analyze convergence of belief propagation~\citep{tatikonda2002loopy}, Glauber dynamics~\citep{hayes2006simple}, and localized inference~\citep{loh2017efficient}.
+Our contribution is to adapt the Dobrushin condition to the \emph{variational} inference setting, where the ``measure'' is the CAVI fixed point rather than a Gibbs measure, and to derive \emph{weighted} versions that account for heterogeneous data values.
+\emph{Gamma--Poisson matrix factorization}~\citep{gopalan2015scalable,zhou2012beta,cemgil2009bayesian} is a widely used model for count data.
+The augmented representation with auxiliary multinomial variables admits closed-form CAVI updates~\citep{gopalan2015scalable}, making it an ideal testbed for our theory.
+%% ============================================================
+\section{Problem Setup}
+\label{sec:setup}
+\paragraph{Bipartite observation model.}
+Consider a bipartite graph $G = (U, V, E)$ with user nodes $U = \{1, \ldots, N\}$, item nodes $V = \{1, \ldots, M\}$, and observed edges $E \subseteq U \times V$.
+For each edge $(i,j) \in E$, we observe a data value $x_{ij}$.
+We denote the neighborhood of user $i$ as $\Omega_i = \{j : (i,j) \in E\}$ and of item $j$ as $\Omega_j = \{i : (i,j) \in E\}$.
+\paragraph{Latent variable model.}
+Each user $i$ has a latent factor $U_i \in \R^K$ and each item $j$ has a latent factor $V_j \in \R^K$.
+The data $x_{ij}$ is generated from a likelihood that depends on $U_i$ and $V_j$.
+We place independent priors on $U_i$ and $V_j$.
+\paragraph{Mean-field variational inference.}
+We approximate the posterior $p(U, V \mid X)$ with a factorized distribution:
+\begin{equation}
+q(U, V; \blambda) = \prod_{i=1}^N q(U_i; \lambda^U_i) \prod_{j=1}^M q(V_j; \lambda^V_j),
+\end{equation}
+where $\blambda = (\lambda^U_1, \ldots, \lambda^U_N, \lambda^V_1, \ldots, \lambda^V_M)$ collects all variational parameters.
+We write $\blambda_u$ for the parameter block of node $u$ (either a user or item).
+The ELBO is:
+\begin{equation}
+\ELBO(\blambda) = \E_q[\log p(X, U, V)] - \E_q[\log q(U, V; \blambda)].
+\end{equation}
+\paragraph{CAVI.}
+Coordinate-ascent variational inference updates each block $\blambda_u$ by maximizing $\ELBO$ with all other blocks fixed:
+\begin{equation}
+\label{eq:cavi}
+\blambda_u^{(t+1)} = F_u(\blambda_{-u}^{(t)}),
+\end{equation}
+where $F_u$ is the CAVI update map for block $u$.
+A fixed point $\blambda^\star$ satisfies $\blambda_u^\star = F_u(\blambda_{-u}^\star)$ for all $u$.
+\paragraph{Deletion and unlearning.}
+Given a deletion request for observation $z = (i, j, x_{ij})$, the goal is to compute the variational parameters $\blambda^{\setminus z}$ that would result from fitting the model on $E \setminus \{z\}$.
+\emph{Exact deletion} retrains from scratch (or from a warm start).
+\emph{Local radius-$R$ unlearning} defines a seed set $S(z) = \{U_i, V_j\}$ and updates only blocks within graph distance $R$ of $S(z)$, holding all other blocks at their full-data values $\blambda^\star$.
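+The following toy instance, with a graph chosen purely for illustration, makes the seed set and the radius concrete.
+\begin{example}[Seed set and radius on a small graph]
+Take $N = M = 2$ and $E = \{(1,1), (1,2), (2,2)\}$, and delete $z = (1, 1, x_{11})$.
+The seed set is $S(z) = \{U_1, V_1\}$; the block $V_2$ is at graph distance $1$ from $S(z)$ (through the edge $(1,2)$), and $U_2$ is at distance $2$ (through the edge $(2,2)$).
+Local radius-$1$ unlearning therefore updates $U_1$, $V_1$, and $V_2$ and holds $U_2$ at its full-data value, while radius-$2$ unlearning updates all four blocks.
+\end{example}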
+%% ============================================================
+\section{Weighted Dobrushin Theory for CAVI}
+\label{sec:theory}
+\subsection{The CAVI Interaction Matrix}
+The key object is the \emph{CAVI interaction matrix} $\mathbf{C}$, an $(N+M) \times (N+M)$ matrix whose entries measure how strongly block $u$'s CAVI update depends on block $v$.
+\begin{definition}[CAVI interaction coefficient]
+\label{def:interaction}
+For blocks $u, v$ in the variational model, define:
+\begin{equation}
+C_{uv} = \sup_{\blambda} \left\| \frac{\partial F_u(\blambda_{-u})}{\partial \blambda_v} \right\|,
+\end{equation}
+where the norm is the operator norm between the parameter spaces of blocks $v$ and $u$, and the supremum is taken over a suitable domain of~$\blambda$.
+\end{definition}
+In the bipartite setting, $C_{uv} = 0$ unless $u$ and $v$ share an edge.
+For a user block $U_i$, its update depends only on the item blocks $V_j$ for $j \in \Omega_i$, and vice versa.
+\begin{definition}[Weighted Dobrushin condition]
+\label{def:dobrushin}
+The \emph{weighted Dobrushin condition} holds if the spectral radius of $\mathbf{C}$ satisfies:
+\begin{equation}
+\rho(\mathbf{C}) < 1.
+\end{equation}
+Equivalently, there exist positive weights $w_u > 0$ such that for all $u$:
+\begin{equation}
+\label{eq:weighted_dobrushin}
+\sum_{v \neq u} C_{uv} \frac{w_v}{w_u} < 1.
+\end{equation}
+\end{definition}
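+The role of the weights can be seen in a two-block sketch; the numbers below are illustrative assumptions, not values from our experiments.
+\begin{example}[Weights can rescue a failing row sum]
+Let $\mathbf{C} = \begin{pmatrix} 0 & 2 \\ 0.1 & 0 \end{pmatrix}$. The unweighted row sums are $2$ and $0.1$, so the uniform choice $w_1 = w_2$ violates~\eqref{eq:weighted_dobrushin}. However, $\rho(\mathbf{C}) = \sqrt{2 \cdot 0.1} = 1/\sqrt{5} \approx 0.447 < 1$, and the weights $w = (2\sqrt{5}, 1)$ give
+\begin{equation*}
+C_{12} \frac{w_2}{w_1} = \frac{2}{2\sqrt{5}} = \frac{1}{\sqrt{5}}, \qquad
+C_{21} \frac{w_1}{w_2} = 0.1 \cdot 2\sqrt{5} = \frac{1}{\sqrt{5}},
+\end{equation*}
+so both weighted row sums equal $\rho(\mathbf{C})$ and the condition holds.
+\end{example}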
+\subsection{Main Theorem: Exponential Decay of Deletion Influence}
+\begin{theorem}[Deletion influence decay]
+\label{thm:main}
+Suppose the weighted Dobrushin condition (Definition~\ref{def:dobrushin}) holds with $\rho(\mathbf{C}) = 1 - \delta$ for some $\delta > 0$.
+Let $\blambda^\star$ be the CAVI fixed point on the full data, and $\blambda^{\setminus z}$ be the fixed point after deleting observation $z = (i,j,x_{ij})$.
+Then for any block $u$ at graph distance $d(u, S(z)) = r$ from the seed set $S(z) = \{U_i, V_j\}$:
+\begin{equation}
+\|\blambda_u^\star - \blambda_u^{\setminus z}\| \leq C_0 \cdot (1 - \delta)^r,
+\end{equation}
+where $C_0$ depends on the magnitude of the deleted observation's contribution to the seed blocks' CAVI updates.
+\end{theorem}
+The proof (Appendix~\ref{app:proof_main}) proceeds by analyzing the perturbation of the CAVI fixed-point equation.
+Deleting $z$ changes the update map only for the seed blocks $U_i$ and $V_j$.
+The change propagates through the interaction matrix: each hop multiplies by $\mathbf{C}$, and since $\rho(\mathbf{C}) < 1$, the perturbation contracts geometrically.
+\begin{corollary}[Local unlearning error bound]
+\label{cor:local}
+Under the conditions of Theorem~\ref{thm:main}, local radius-$R$ unlearning satisfies:
+\begin{equation}
+\|\blambda^{(R)}_{\mathrm{local}} - \blambda^{\setminus z}\| \leq C_0 \cdot \frac{(1-\delta)^{R+1}}{1 - (1-\delta)} = \frac{C_0}{\delta} (1-\delta)^{R+1}.
+\end{equation}
+\end{corollary}
+This shows that local unlearning error decreases \emph{exponentially} with radius $R$, with rate determined by the spectral gap $\delta$.
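+As a worked instance of Corollary~\ref{cor:local} (with hypothetical constants and $0 < \delta < 1$), requiring the bound to be at most a tolerance $\varepsilon > 0$ and solving for $R$ gives
+\begin{equation*}
+\frac{C_0}{\delta}(1-\delta)^{R+1} \leq \varepsilon
+\quad\Longleftrightarrow\quad
+R \geq \frac{\log\bigl(C_0 / (\delta \varepsilon)\bigr)}{\log\bigl(1/(1-\delta)\bigr)} - 1 .
+\end{equation*}
+For example, with $\delta = 0.5$, $C_0 = 1$, and $\varepsilon = 10^{-2}$, the right-hand side is $\log_2(200) - 1 \approx 6.64$, so $R = 7$ suffices: the bound is $2 \cdot 0.5^{8} \approx 0.0078$, whereas $R = 6$ gives $2 \cdot 0.5^{7} \approx 0.0156$.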
+\subsection{Weighted Interaction Mass}
+For a specific deletion $z = (i, j, x_{ij})$, we define the \emph{weighted interaction mass} at the seed nodes:
+\begin{definition}[Weighted interaction mass]
+\label{def:chi}
+\begin{equation}
+\chi_i = \sum_{j' \in \Omega_i} C_{U_i, V_{j'}}, \qquad
+\widetilde{\chi}_j = \sum_{i' \in \Omega_j} C_{V_j, U_{i'}}, \qquad
+\chi(z) = \max\{\chi_i, \widetilde{\chi}_j\}.
+\end{equation}
+\end{definition}
+If $\chi(z) < 1$, the Dobrushin condition holds \emph{at least locally} around the seed, and the influence of deleting $z$ decays.
+The quantity $\chi(z)$ serves as a per-deletion proxy for locality strength: deletions with low $\chi$ are ``easy'' (strongly local), while those with high $\chi$ are ``hard'' (potentially non-local).
+%% ============================================================
+\section{Gamma--Poisson Matrix Factorization}
+\label{sec:gamma_poisson}
+\subsection{Model}
+The augmented Gamma--Poisson model~\citep{gopalan2015scalable} is:
+\begin{align}
+U_{ik} &\sim \Gam(a_0, b_0), \quad V_{jk} \sim \Gam(c_0, d_0), \quad k = 1, \ldots, K, \\
+x_{ij} \mid U_i, V_j &\sim \Poi\!\left(\rho_x \sum_{k=1}^K U_{ik} V_{jk}\right).
+\end{align}
+Using the Poisson--multinomial augmentation, each count $x_{ij}$ is split into component-wise counts $z_{ijk}$ with $\sum_k z_{ijk} = x_{ij}$ and:
+\begin{equation}
+(z_{ij1}, \ldots, z_{ijK}) \mid U_i, V_j \sim \Mult\!\left(x_{ij};\; \frac{U_{ik} V_{jk}}{\sum_\ell U_{i\ell} V_{j\ell}}\right).
+\end{equation}
+\subsection{CAVI Updates}
+The mean-field variational family is $q(U_{ik}) = \Gam(a_{ik}, b_{ik})$, $q(V_{jk}) = \Gam(c_{jk}, d_{jk})$.
+The responsibilities are:
+\begin{equation}
+r_{ijk} = \frac{\exp(\psi(a_{ik}) - \log b_{ik} + \psi(c_{jk}) - \log d_{jk})}{\sum_\ell \exp(\psi(a_{i\ell}) - \log b_{i\ell} + \psi(c_{j\ell}) - \log d_{j\ell})},
+\end{equation}
+where $\psi$ is the digamma function.
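+The CAVI updates, stated for $\rho_x = 1$ (a general $\rho_x$ multiplies the sums in the rate updates $b_{ik}$ and $d_{jk}$), are: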
+\begin{align}
+a_{ik} &= a_0 + \sum_{j \in \Omega_i} x_{ij} r_{ijk}, & b_{ik} &= b_0 + \sum_{j \in \Omega_i} \frac{c_{jk}}{d_{jk}}, \\
+c_{jk} &= c_0 + \sum_{i \in \Omega_j} x_{ij} r_{ijk}, & d_{jk} &= d_0 + \sum_{i \in \Omega_j} \frac{a_{ik}}{b_{ik}}.
+\end{align}
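+For reference, the expectations behind these updates are the standard moments of the Gamma distribution in the shape--rate parametrization, which is the parametrization the rate updates above assume:
+\begin{equation*}
+\E_q[U_{ik}] = \frac{a_{ik}}{b_{ik}}, \qquad \E_q[\log U_{ik}] = \psi(a_{ik}) - \log b_{ik},
+\end{equation*}
+and analogously for $V_{jk}$ with $(c_{jk}, d_{jk})$. In this notation the responsibilities read $r_{ijk} \propto \exp\bigl(\E_q[\log U_{ik}] + \E_q[\log V_{jk}]\bigr)$, and the rate updates add the expected neighbor factors, e.g.\ $b_{ik} = b_0 + \sum_{j \in \Omega_i} \E_q[V_{jk}]$.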
+\subsection{Interaction Constants}
+\begin{proposition}[Interaction constants for Gamma--Poisson]
+\label{prop:pg_constants}
+The CAVI interaction coefficients for the Gamma--Poisson model satisfy:
+\begin{equation}
+C_{U_i, V_j} \leq C_x \cdot x_{ij} + C_0,
+\end{equation}
+where:
+\begin{equation}
+C_x = \tfrac{1}{2}\psi'(c_{\min}) + \tfrac{1}{2 d_{\min}}, \qquad
+C_0 = \tfrac{1}{d_{\min}} + \tfrac{c_{\max}}{d_{\min}^2},
+\end{equation}
+and $c_{\min}, d_{\min}, c_{\max}$ are bounds on the variational parameters at the fixed point.
+Similarly, $C_{V_j, U_i} \leq \widetilde{C}_x \cdot x_{ij} + \widetilde{C}_0$ with:
+\begin{equation}
+\widetilde{C}_x = \tfrac{1}{2}\psi'(a_{\min}) + \tfrac{1}{2 b_{\min}}, \qquad
+\widetilde{C}_0 = \tfrac{1}{b_{\min}} + \tfrac{a_{\max}}{b_{\min}^2}.
+\end{equation}
+\end{proposition}
+The proof (Appendix~\ref{app:proof_constants}) differentiates the CAVI update with respect to the neighbor's parameters and bounds the resulting Jacobian.
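+The $C_x \cdot x_{ij}$ term captures the count-dependent coupling (higher counts create stronger interaction), while $C_0$ captures the baseline coupling through the rate-parameter updates; this $C_0$ is distinct from the constant $C_0$ of Theorem~\ref{thm:main}.
+Substituting these bounds into Definition~\ref{def:chi} bounds the weighted interaction mass at the seed of deletion $z = (i,j,x_{ij})$:
+\begin{equation}
+\chi_i \leq \sum_{j' \in \Omega_i} (C_x \cdot x_{ij'} + C_0), \qquad
+\widetilde{\chi}_j \leq \sum_{i' \in \Omega_j} (\widetilde{C}_x \cdot x_{i'j} + \widetilde{C}_0).
+\end{equation}
+In what follows we use the right-hand sides as the computable values of $\chi_i$ and $\widetilde{\chi}_j$.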
+\paragraph{Interpretation.}
+The Dobrushin condition $\chi(z) < 1$ is easier to satisfy when: (i) the graph is sparse (few neighbors), (ii) counts are low, (iii) priors are strong (large $b_0, d_0$ push $b_{\min}, d_{\min}$ away from zero), and (iv) the rank $K$ is small (fewer components to couple).
+This provides actionable guidance: locality is strongest for sparse, low-count, strongly regularized models.
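+To give a sense of scale, the following back-of-the-envelope evaluation uses hypothetical parameter bounds (not fitted values): $c_{\min} = c_{\max} = 5$ and $d_{\min} = 10$.
+\begin{example}[Per-edge coupling under assumed bounds]
+Since $\psi'(5) = \pi^2/6 - \sum_{n=1}^{4} n^{-2} \approx 0.221$, the constants of Proposition~\ref{prop:pg_constants} evaluate to
+\begin{equation*}
+C_x \approx \tfrac{1}{2}(0.221) + \tfrac{1}{20} \approx 0.161, \qquad
+C_0 = \tfrac{1}{10} + \tfrac{5}{100} = 0.15 .
+\end{equation*}
+Each unit-count edge ($x_{ij'} = 1$) then contributes a per-edge term $C_x \cdot x_{ij'} + C_0 \approx 0.311$, so the sum over a user with three such neighbors is about $0.93 < 1$, while a fourth neighbor raises it to about $1.24 > 1$.
+\end{example}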
+%% ============================================================
+\section{Extensions to Other Model Families}
+\label{sec:extensions}
+\subsection{Gaussian--Gaussian Matrix Factorization}
+For $U_{ik} \sim \cN(0, \sigma_U^2)$, $V_{jk} \sim \cN(0, \sigma_V^2)$, and $x_{ij} \mid U_i, V_j \sim \cN(U_i^\top V_j, \sigma_x^2)$, the mean-field variational family is $q(U_{ik}) = \cN(m^U_{ik}, s^U_{ik})$, $q(V_{jk}) = \cN(m^V_{jk}, s^V_{jk})$, where $m$ and $s$ denote variational means and variances, so that $\E_q[\|V_j\|^2] = \sum_k ((m^V_{jk})^2 + s^V_{jk})$.
+The interaction proxy for this model is:
+\begin{equation}
+\chi_i^{GG} = \sum_{j \in \Omega_i} \frac{1}{\sigma_x^2} \E_q[\|V_j\|^2], \qquad
+\widetilde{\chi}_j^{GG} = \sum_{i \in \Omega_j} \frac{1}{\sigma_x^2} \E_q[\|U_i\|^2].
+\end{equation}
+Locality is stronger when observation noise $\sigma_x^2$ is large (weaker coupling) and priors are strong (small $\sigma_U^2, \sigma_V^2$).
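+For a rough sense of the magnitudes involved, consider hypothetical values: a user with $|\Omega_i| = 5$ neighbors, each with $\E_q[\|V_j\|^2] = 0.5$. Then
+\begin{equation*}
+\chi_i^{GG} = \frac{5 \cdot 0.5}{\sigma_x^2} = \frac{2.5}{\sigma_x^2},
+\end{equation*}
+which is $0.625$ for $\sigma_x^2 = 4$ and $2.5$ for $\sigma_x^2 = 1$; in this sketch the proxy falls below~$1$ exactly when $\sigma_x^2 > 2.5$.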
+\subsection{Gaussian--Gamma MAP}
+For nonnegative factors with Gamma priors and Gaussian likelihood, we optimize the MAP objective:
+\begin{equation}
+\max_{U \geq 0, V \geq 0} \left\{ -\frac{1}{2\sigma_x^2} \sum_{(i,j) \in E} (x_{ij} - U_i^\top V_j)^2 + \sum_{i,k} \log p(U_{ik}) + \sum_{j,k} \log p(V_{jk}) \right\}.
+\end{equation}
+Since this is non-convex, the ``exact deletion'' solution depends on the optimization path.
+The locality theory applies approximately: we use the same interaction proxy but note that the guarantees are weaker because the fixed point may not be unique.
+%% ============================================================
+%% EXPERIMENTS (imported from separate file)
+%% ============================================================
+\input{latex_results_section}
+%% ============================================================
+\section{Conclusion}
+\label{sec:conclusion}
+We have developed a weighted Dobrushin theory that characterizes when local unlearning is possible for variational inference models.
+The theory identifies the weighted interaction mass $\chi(z)$ as the key quantity controlling locality: when $\chi(z) < 1$, deletion influence decays exponentially with graph distance.
+Experiments on synthetic and real data confirm that:
+(i) influence decay is robust across graph types and model families,
+(ii) local radius-$R$ unlearning error decreases exponentially with $R$,
+(iii) the speedup is most pronounced for small $R$ on sparse graphs,
+and (iv) conjugate VI models exhibit stronger locality than non-convex MAP alternatives.
+The main limitation is that the theoretical proxy $\chi(z)$ does not tightly predict per-deletion difficulty in practice---it provides a valid sufficient condition but is conservative.
+Future work could tighten the bound using spectral methods or develop data-dependent proxies.
+Another direction is scaling the experiments to large graphs ($N \gg 10^4$) where the runtime benefits of local unlearning are most significant.
+%% ============================================================
+\bibliographystyle{plainnat}
+\begin{thebibliography}{99}
+\bibitem[Bourtoule et~al.(2021)]{bourtoule2021machine}
+L.~Bourtoule, V.~Chandrasekaran, C.A. Choquette-Choo, H.~Jia, A.~Travers, B.~Zhang, D.~Lie, and N.~Papernot.
+\newblock Machine unlearning.
+\newblock In \emph{IEEE S\&P}, 2021.
+\bibitem[Cao and Yang(2015)]{cao2015towards}
+Y.~Cao and J.~Yang.
+\newblock Towards making systems forget with machine unlearning.
+\newblock In \emph{IEEE S\&P}, 2015.
+\bibitem[Cemgil(2009)]{cemgil2009bayesian}
+A.T. Cemgil.
+\newblock Bayesian inference for nonnegative matrix factorisation models.
+\newblock \emph{Computational Intelligence and Neuroscience}, 2009.
+\bibitem[Chen et~al.(2022)]{chen2022graph}
+M.~Chen, Z.~Zhang, T.~Wang, M.~Backes, M.~Humbert, and Y.~Zhang.
+\newblock Graph unlearning.
+\newblock In \emph{CCS}, 2022.
+\bibitem[Dobrushin(1968)]{dobrushin1968description}
+R.L. Dobrushin.
+\newblock Description of a random field by means of conditional probabilities and the conditions governing its regularity.
+\newblock \emph{Theory of Probability and its Applications}, 13(2):197--224, 1968.
+\bibitem[Dobrushin(1970)]{dobrushin1970prescribing}
+R.L. Dobrushin.
+\newblock Prescribing a system of random variables by conditional distributions.
+\newblock \emph{Theory of Probability and its Applications}, 15(3):458--486, 1970.
+\bibitem[Ginart et~al.(2019)]{ginart2019making}
+A.~Ginart, M.~Guan, G.~Valiant, and J.~Zou.
+\newblock Making {AI} forget you: Data deletion in machine learning.
+\newblock In \emph{NeurIPS}, 2019.
+\bibitem[Gopalan et~al.(2015)]{gopalan2015scalable}
+P.~Gopalan, J.M. Hofman, and D.M. Blei.
+\newblock Scalable recommendation with hierarchical {P}oisson factorization.
+\newblock In \emph{UAI}, 2015.
+\bibitem[Guo et~al.(2020)]{guo2020certified}
+C.~Guo, T.~Goldstein, A.~Hannun, and L.~van~der~Maaten.
+\newblock Certified data removal from machine learning models.
+\newblock In \emph{ICML}, 2020.
+\bibitem[Hayes(2006)]{hayes2006simple}
+T.P. Hayes.
+\newblock A simple condition implying rapid mixing of single-site dynamics on spin systems.
+\newblock In \emph{FOCS}, 2006.
+\bibitem[Izzo et~al.(2021)]{izzo2021approximate}
+Z.~Izzo, M.A. Smart, K.~Chaudhuri, and J.~Zou.
+\newblock Approximate data deletion from machine learning models.
+\newblock In \emph{AISTATS}, 2021.
+\bibitem[Koh and Liang(2017)]{koh2017understanding}
+P.W. Koh and P.~Liang.
+\newblock Understanding black-box predictions via influence functions.
+\newblock In \emph{ICML}, 2017.
+\bibitem[Chen et~al.(2017)]{loh2017efficient}
+J.~Chen, J.~Peng, and Q.~Liu.
+\newblock Efficient localized inference for large graphical models.
+\newblock \emph{arXiv:1710.10404}, 2017.
+\bibitem[Neel et~al.(2021)]{neel2021descent}
+S.~Neel, A.~Roth, and S.~Sharifi-Malvajerdi.
+\newblock Descent-to-delete: Gradient-based methods for machine unlearning.
+\newblock In \emph{ALT}, 2021.
+\bibitem[Sekhari et~al.(2021)]{sekhari2021remember}
+A.~Sekhari, J.~Acharya, G.~Kamath, and A.T. Suresh.
+\newblock Remember what you want to forget: Algorithms for machine unlearning.
+\newblock In \emph{NeurIPS}, 2021.
+\bibitem[Tatikonda and Jordan(2002)]{tatikonda2002loopy}
+S.~Tatikonda and M.I. Jordan.
+\newblock Loopy belief propagation and {G}ibbs measures.
+\newblock In \emph{UAI}, 2002.
+\bibitem[Ullah et~al.(2021)]{ullah2021machine}
+E.~Ullah, T.~Mai, A.~Rao, R.A. Rossi, and R.~Arora.
+\newblock Machine unlearning via algorithmic stability.
+\newblock In \emph{COLT}, 2021.
+\bibitem[Zhou et~al.(2012)]{zhou2012beta}
+M.~Zhou, L.~Hannah, D.~Dunson, and L.~Carin.
+\newblock Beta-negative binomial process and {P}oisson factor analysis.
+\newblock In \emph{AISTATS}, 2012.
+\end{thebibliography}
+%% ============================================================
+%% APPENDIX
+%% ============================================================
+\newpage
+\appendix
+\section{Proof of Theorem~\ref{thm:main} (Deletion Influence Decay)}
+\label{app:proof_main}
+\begin{proof}
+Let $\blambda^\star$ denote the CAVI fixed point on the full data and $\blambda^{\setminus z}$ the fixed point on the data with observation $z = (i,j,x_{ij})$ removed.
+Let $F$ and $F^{\setminus z}$ denote the full-data and deletion CAVI maps, respectively.
+The two fixed points satisfy:
+\begin{align}
+\blambda_u^\star &= F_u(\blambda_{-u}^\star), \\
+\blambda_u^{\setminus z} &= F_u^{\setminus z}(\blambda_{-u}^{\setminus z}).
+\end{align}
+For blocks $u \notin S(z) = \{U_i, V_j\}$, the CAVI update is unchanged: $F_u = F_u^{\setminus z}$.
+Therefore:
+\begin{equation}
+\blambda_u^\star - \blambda_u^{\setminus z} = F_u(\blambda_{-u}^\star) - F_u(\blambda_{-u}^{\setminus z}).
+\end{equation}
+By the mean value theorem and the definition of the interaction coefficients:
+\begin{equation}
+\|\blambda_u^\star - \blambda_u^{\setminus z}\| \leq \sum_{v} C_{uv} \|\blambda_v^\star - \blambda_v^{\setminus z}\|.
+\end{equation}
+Define $\Delta_u = \|\blambda_u^\star - \blambda_u^{\setminus z}\|$ and $\boldsymbol{\Delta} = (\Delta_u)_u$.
+Then:
+\begin{equation}
+\boldsymbol{\Delta} \leq \mathbf{C} \boldsymbol{\Delta} + \boldsymbol{\epsilon},
+\end{equation}
+where $\boldsymbol{\epsilon}$ is nonzero only at the seed blocks (capturing the direct effect of removing $z$).
+Since $\rho(\mathbf{C}) < 1$, the matrix $(\mathbf{I} - \mathbf{C})$ is invertible with:
+\begin{equation}
+\boldsymbol{\Delta} \leq (\mathbf{I} - \mathbf{C})^{-1} \boldsymbol{\epsilon} = \sum_{t=0}^{\infty} \mathbf{C}^t \boldsymbol{\epsilon}.
+\end{equation}
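+The $t$-th power $\mathbf{C}^t$ has the property that $[\mathbf{C}^t]_{uv} = 0$ unless there is a walk of length $t$ from $v$ to $u$ in the interaction graph.
+Since $\boldsymbol{\epsilon}$ is supported only on $S(z)$, the contribution to block $u$ at distance $r$ from $S(z)$ first appears at $t = r$.
+Because $\|\mathbf{C}^t\|_\infty$ is not controlled by $\rho(\mathbf{C})^t$ in general, we work in the weighted norm $\|\mathbf{x}\|_w = \max_u |x_u| / w_u$, taking the weights in~\eqref{eq:weighted_dobrushin} so that $\sum_v C_{uv} w_v / w_u \leq 1 - \delta$ for all $u$ (for irreducible $\mathbf{C}$, the Perron eigenvector is such a choice).
+The induced operator norm then satisfies $\|\mathbf{C}^t\|_w \leq (1-\delta)^t$, so
+\begin{equation}
+\Delta_u \leq w_u \|\boldsymbol{\epsilon}\|_w \sum_{t=r}^{\infty} \|\mathbf{C}^t\|_w \leq w_u \|\boldsymbol{\epsilon}\|_w \sum_{t=r}^{\infty} (1-\delta)^t = w_u \|\boldsymbol{\epsilon}\|_w \frac{(1-\delta)^r}{\delta}.
+\end{equation}
+Setting $C_0 = \max_u w_u \, \|\boldsymbol{\epsilon}\|_w / \delta$ completes the proof.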
+\end{proof}
+\section{Proof of Proposition~\ref{prop:pg_constants} (Interaction Constants)}
+\label{app:proof_constants}
+\begin{proof}
+Consider the CAVI update for user block $U_i$, component $k$:
+\begin{equation}
+a_{ik} = a_0 + \sum_{j' \in \Omega_i} x_{ij'} r_{ij'k}, \qquad
+b_{ik} = b_0 + \sum_{j' \in \Omega_i} \frac{c_{j'k}}{d_{j'k}}.
+\end{equation}
+The dependence on item $j$'s parameters $(c_{jk}, d_{jk})$ enters through:
+\begin{enumerate}[nosep]
+\item The responsibility $r_{ijk}$, which depends on $\psi(c_{jk}) - \log d_{jk}$;
+\item The rate update $b_{ik}$, which depends on $c_{jk}/d_{jk}$.
+\end{enumerate}
+For the shape parameter:
+\begin{equation}
+\frac{\partial a_{ik}}{\partial c_{j\ell}} = x_{ij} \frac{\partial r_{ijk}}{\partial c_{j\ell}}.
+\end{equation}
+The responsibility $r_{ijk}$ is a softmax of $\phi_{ijk} = \psi(a_{ik}) - \log b_{ik} + \psi(c_{jk}) - \log d_{jk}$.
+Differentiating:
+\begin{equation}
+\frac{\partial r_{ijk}}{\partial c_{j\ell}} = r_{ijk}(\delta_{k\ell} - r_{ij\ell}) \psi'(c_{j\ell}).
+\end{equation}
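+This gives $|\partial a_{ik}/\partial c_{j\ell}| \leq x_{ij} \cdot \psi'(c_{\min})/2$ using $|r_{ijk}(\delta_{k\ell} - r_{ij\ell})| \leq 1/2$ and the monotonicity of $\psi'$, which gives $\psi'(c_{j\ell}) \leq \psi'(c_{\min})$.
+The same computation with $\partial \phi_{ij\ell} / \partial d_{j\ell} = -1/d_{j\ell}$ gives $|\partial a_{ik}/\partial d_{j\ell}| \leq x_{ij}/(2 d_{\min})$, which is the second term of $C_x$.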
+For the rate parameter:
+\begin{equation}
+\frac{\partial b_{ik}}{\partial c_{jk}} = \frac{1}{d_{jk}}, \qquad
+\frac{\partial b_{ik}}{\partial d_{jk}} = -\frac{c_{jk}}{d_{jk}^2}.
+\end{equation}
+Combining via the triangle inequality and summing over components, we obtain:
+\begin{equation}
+C_{U_i, V_j} \leq \underbrace{\left(\frac{1}{2}\psi'(c_{\min}) + \frac{1}{2d_{\min}}\right)}_{C_x} x_{ij} + \underbrace{\left(\frac{1}{d_{\min}} + \frac{c_{\max}}{d_{\min}^2}\right)}_{C_0}.
+\end{equation}
+The symmetric bound for $C_{V_j, U_i}$ follows by exchanging the roles of $(a, b)$ and $(c, d)$.
+\end{proof}
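+The following side calculation is a sketch and is not used elsewhere in the paper.
+\begin{remark}[A possible sharpening]
+The bound $|r_{ijk}(\delta_{k\ell} - r_{ij\ell})| \leq 1/2$ used above appears to be loose. For $k = \ell$ the quantity is $r_{ijk}(1 - r_{ijk}) \leq 1/4$, and for $k \neq \ell$ it is $r_{ijk} r_{ij\ell} \leq \bigl((r_{ijk} + r_{ij\ell})/2\bigr)^2 \leq 1/4$ because $r_{ijk} + r_{ij\ell} \leq 1$. Under this observation the factor $\tfrac{1}{2}$ in the count-dependent terms could be replaced by $\tfrac{1}{4}$; the stated constants remain valid upper bounds.
+\end{remark}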
+\section{Local Unlearning Algorithm}
+\label{app:algorithm}
+\begin{algorithm}[h]
+\caption{Local Radius-$R$ Unlearning for CAVI}
+\label{alg:local}
+\begin{algorithmic}[1]
+\REQUIRE Full-data parameters $\blambda^\star$, deletion $z = (i, j, x_{ij})$, radius $R$
+\STATE Compute seed set $S(z) = \{U_i, V_j\}$
+\STATE Compute $R$-hop neighborhood $\cN_R(S) = \{u : d(u, S) \leq R\}$ via BFS
+\STATE Filter edges: $E_{\mathrm{local}} = \{(i',j') \in E \setminus \{z\} : i' \in \cN_R \text{ or } j' \in \cN_R\}$
+\STATE Initialize $\blambda \leftarrow \blambda^\star$
+\REPEAT
+  \FOR{each block $u \in \cN_R(S)$}
+    \STATE $\blambda_u \leftarrow F_u^{\setminus z}(\blambda_{-u})$ \COMMENT{CAVI update using $E_{\mathrm{local}}$}
+  \ENDFOR
+\UNTIL{convergence}
+\RETURN $\blambda$
+\end{algorithmic}
+\end{algorithm}
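+The computational cost per CAVI iteration is $O(|E_{\mathrm{local}}| \cdot K)$ instead of $O(|E| \cdot K)$ for full retraining.
+For bounded-degree graphs with maximum degree $d \geq 2$, the neighborhood $\cN_R(S)$ contains at most $2\sum_{r=0}^{R} d^r = O(d^R)$ blocks, each incident to at most $d$ edges, so $|E_{\mathrm{local}}| = O(d^{R+1})$, yielding a per-iteration speedup of order $|E| / d^{R+1}$.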
+\section{Gaussian--Gaussian Interaction Constants}
+\label{app:gaussian}
+For the Gaussian--Gaussian model $x_{ij} \sim \cN(U_i^\top V_j, \sigma_x^2)$ with priors $U_{ik} \sim \cN(0, \sigma_U^2)$, $V_{jk} \sim \cN(0, \sigma_V^2)$, the mean-field CAVI updates for the variational mean are:
+\begin{equation}
+m^U_{ik} = s^U_{ik} \cdot \frac{1}{\sigma_x^2} \sum_{j' \in \Omega_i} m^V_{j'k} \left(x_{ij'} - \sum_{\ell \neq k} m^U_{i\ell} m^V_{j'\ell}\right),
+\end{equation}
+with precision $1/s^U_{ik} = 1/\sigma_U^2 + (1/\sigma_x^2) \sum_{j' \in \Omega_i} ((m^V_{j'k})^2 + s^V_{j'k})$.
+Differentiating the update with respect to item $j$'s mean parameters:
+\begin{equation}
+\left\|\frac{\partial F_{U_i}}{\partial \blambda_{V_j}}\right\| \leq \frac{1}{\sigma_x^2} \|V_j\|_q = \frac{1}{\sigma_x^2} \sqrt{\sum_k ((m^V_{jk})^2 + s^V_{jk})},
+\end{equation}
+where the bound follows from the CAVI precision structure.
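+This motivates the interaction proxy of Section~\ref{sec:extensions}, which uses the squared norm $\E_q[\|V_{j'}\|^2] = \|V_{j'}\|_q^2$ in place of $\|V_{j'}\|_q$: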
+\begin{equation}
+\chi_i^{GG} = \sum_{j' \in \Omega_i} \frac{1}{\sigma_x^2} \E_q[\|V_{j'}\|^2].
+\end{equation}
+\section{Additional Experimental Details}
+\label{app:exp_details}
+\paragraph{Synthetic data generation.}
+For Gamma--Poisson data, we generate $U_{ik} \sim \Gam(a_0, b_0)$, $V_{jk} \sim \Gam(c_0, d_0)$, and $x_{ij} \sim \Poi(\rho_x U_i^\top V_j)$ for each observed edge.
+Zero counts are removed (positive counts only).
+Graph families: bounded-degree (Poisson degree truncated at $2d_{\max}$), Erd\H{o}s--R\'enyi ($p = d_{\mathrm{avg}}/M$), power-law (degree $\propto k^{-\alpha+1}$ with $\alpha = 2.5$).
+\paragraph{Deletion sampling.}
+We sample 25\% each of: random edges, high-count edges (top-quartile by $x_{ij}$), hub-adjacent edges (top-quartile by $\max(\deg(i), \deg(j))$), and low-degree edges (bottom-quartile by $\min(\deg(i), \deg(j))$).
+\paragraph{Convergence.}
+CAVI: relative parameter change $< 10^{-5}$, max 200--300 iterations.
+Gaussian--Gamma MAP: Adam with $\beta_1 = 0.9$, $\beta_2 = 0.999$, lr $= 0.05$, gradient clipping at 10, max 2000 iterations.
+\paragraph{Bootstrap confidence intervals.}
+All reported confidence intervals are 95\% bootstrap CIs with 1000 resamples.
+\paragraph{Datasets.}
+\begin{itemize}[nosep]
+\item Last.fm: \texttt{matthewfranglen/lastfm-1k} on Hugging Face. 2M rows randomly sampled, aggregated to user--artist counts, capped at 50, filtered to $\geq 5$ degree.
+\item MovieLens: \texttt{ashraq/movielens\_ratings} on Hugging Face. Rating count: $x_{ij} = \lceil \mathrm{rating} \rceil$. Binary: $x_{ij} = 1$.
+\end{itemize}
+\section{Full Sanity Check Results}
+\label{app:sanity}
+\begin{table}[h]
+\centering
+\caption{Sanity check pass rates across 270 verified deletions.}
+\small
+\begin{tabular}{lcc}
+\toprule
+Check & PG (198) & All (270) \\
+\midrule
+Parameters positive & 100\% & -- \\
+No NaN & 100\% & -- \\
+Responsibilities $\sum_k = 1$ & 100\% & -- \\
+ELBO finite & 100\% & -- \\
+Exact $\neq$ full & 88\% & 90\% \\
+Error $\downarrow$ with $R$ & 90\% & 90\% \\
+Warm-start $\approx$ large-$R$ & 91\% & 91\% \\
+\bottomrule
+\end{tabular}
+\end{table}
+The 10\% of deletions that violate error monotonicity occur predominantly in the Gaussian--Gamma MAP model (non-convex optimization) and in high-coupling regimes where the Dobrushin condition is marginal.
+\end{document}