{% extends "base.html" %} {% block title %}DP-SGD Explorer - Learning Hub{% endblock %} {% block content %}
Differential Privacy (DP) is a mathematical framework that provides strong privacy guarantees when performing analyses on sensitive data. It ensures that the presence or absence of any single individual's data has a minimal effect on the output of an analysis.
Traditional anonymization techniques often fail to protect privacy. With enough auxiliary information, it's possible to re-identify individuals in supposedly "anonymized" datasets. Differential privacy addresses this by adding carefully calibrated noise to the analysis process.
Differential privacy creates plausible deniability. With controlled noise added, no observer can determine with confidence whether any individual's data was included in the analysis.
There's an inherent trade-off between privacy and utility (accuracy) in DP. More privacy means more noise, which typically reduces accuracy. The challenge is finding the right balance for your specific application.
A mechanism M is (ε,δ)-differentially private if for all neighboring datasets D and D' (differing in one record), and for all possible outputs S:

Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S] + δ
ε (epsilon): The privacy budget. Lower values mean stronger privacy but typically lower utility.
δ (delta): The probability of the privacy guarantee being broken. Usually set very small (e.g., 10^-5).
Laplace Mechanism: Adds noise from a Laplace distribution to numeric queries.
Gaussian Mechanism: Adds noise from a Gaussian (normal) distribution. This is used in DP-SGD.
Exponential Mechanism: Used for non-numeric outputs; selects an output with probability weighted by a utility score.
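As an illustration, the Laplace mechanism for a numeric query can be sketched in a few lines of pure Python (the function name and query values here are illustrative, not from any particular library):

```python
import math
import random

def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float) -> float:
    """Release true_value with Laplace noise of scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    # Inverse-CDF sampling of Laplace(0, scale) from a single uniform draw.
    u = random.random() - 0.5
    noise = -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))
    return true_value + noise

# Example: privately release a count query (sensitivity 1) with epsilon = 0.5.
noisy_count = laplace_mechanism(1000.0, sensitivity=1.0, epsilon=0.5)
```

Smaller ε means a larger noise scale, directly illustrating the privacy–utility trade-off described above.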
When you apply multiple differentially private operations, the privacy loss (ε) accumulates. This is known as composition.
Advanced composition theorems and privacy accountants help track the total privacy spend.
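Under basic composition, the total ε of a sequence of DP operations is simply the sum of their individual budgets. A toy budget tracker (all names illustrative) makes this concrete:

```python
class BasicAccountant:
    """Tracks cumulative privacy loss under basic (linear) composition."""

    def __init__(self, budget: float):
        self.budget = budget
        self.spent = 0.0

    def spend(self, epsilon: float) -> None:
        """Record one epsilon-DP operation; refuse if the budget would be exceeded."""
        if self.spent + epsilon > self.budget:
            raise RuntimeError("privacy budget exhausted")
        self.spent += epsilon

accountant = BasicAccountant(budget=1.0)
for _ in range(4):
    accountant.spend(0.2)   # four queries at epsilon = 0.2 each
```

Advanced composition theorems and moments/RDP accountants give substantially tighter totals than this linear sum, which is why real libraries use them.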
Stochastic Gradient Descent (SGD) is an optimization algorithm used to train machine learning models by iteratively updating parameters based on gradients computed from mini-batches of data.
The standard SGD update for a batch B is:

θ ← θ − η · (1/|B|) · Σ_{x ∈ B} ∇L(θ, x)

Where:
θ is the vector of model parameters.
η is the learning rate.
B is the mini-batch of training examples.
∇L(θ, x) is the gradient of the loss L for example x.
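A plain-Python sketch of one such update, using a toy squared loss (function names and values are illustrative):

```python
def sgd_step(theta, batch, grad_fn, lr=0.1):
    """One vanilla SGD update: theta <- theta - lr * (mean gradient over batch)."""
    n = len(theta)
    mean_grad = [0.0] * n
    for x in batch:
        g = grad_fn(theta, x)
        for j in range(n):
            mean_grad[j] += g[j] / len(batch)
    return [theta[j] - lr * mean_grad[j] for j in range(n)]

# Toy example: scalar parameter, squared loss L(theta, x) = (theta - x)^2,
# whose gradient is 2 * (theta - x).
grad_fn = lambda theta, x: [2.0 * (theta[0] - x)]
theta = sgd_step([0.0], batch=[3.0, 5.0], grad_fn=grad_fn, lr=0.1)
```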
Standard SGD can leak information about individual training examples through the gradients. For example, a gradient computed on a batch containing a rare or outlier example can encode that example's features, enabling membership inference and gradient inversion (reconstruction) attacks.
These privacy concerns motivate the need for differentially private training methods.
Differentially Private SGD modifies standard SGD in two key ways:
Compute gradients for each example individually, then clip their L2 norm to a threshold C.
This limits the influence of any single training example on the model update.
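Per-example clipping rescales any gradient whose L2 norm exceeds C back onto the ball of radius C, leaving smaller gradients untouched. A minimal sketch:

```python
import math

def clip_gradient(grad, max_norm):
    """Scale grad so its L2 norm is at most max_norm; leave it unchanged otherwise."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]
```

Note that clipping rescales the whole vector rather than clamping each coordinate, so the gradient's direction is preserved.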
Add Gaussian noise to the sum of clipped gradients before applying the update.
The noise standard deviation is the product of the clipping threshold and the noise multiplier (σ · C).
The DP-SGD update can be summarized as:

ḡ ← (1/|B|) · ( Σ_{x ∈ B} clip(∇L(θ, x), C) + N(0, σ²C²I) )
θ ← θ − η · ḡ

Where:
clip(g, C) rescales g so its L2 norm is at most C.
C is the clipping threshold.
σ is the noise multiplier.
N(0, σ²C²I) is spherical Gaussian noise with standard deviation σ·C.
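Putting clipping and noising together, one DP-SGD step on precomputed per-example gradients can be sketched as follows. Names are illustrative, and real implementations (e.g., Opacus, TensorFlow Privacy) compute per-example gradients efficiently inside the framework rather than taking them as a list:

```python
import math
import random

def dp_sgd_step(theta, per_example_grads, lr, clip_norm, noise_multiplier):
    """One DP-SGD update: clip each per-example gradient to clip_norm, sum,
    add Gaussian noise with std = noise_multiplier * clip_norm, average, step."""
    n = len(theta)
    total = [0.0] * n
    for g in per_example_grads:
        norm = math.sqrt(sum(v * v for v in g))
        scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
        for j in range(n):
            total[j] += g[j] * scale
    std = noise_multiplier * clip_norm
    batch_size = len(per_example_grads)
    noisy_mean = [(total[j] + random.gauss(0.0, std)) / batch_size for j in range(n)]
    return [theta[j] - lr * noisy_mean[j] for j in range(n)]
```

With noise_multiplier set to 0 this reduces to SGD with gradient clipping, which makes the privacy-specific part of the update easy to isolate and test.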
DP-SGD introduces several new hyperparameters that need to be tuned carefully:
Clipping threshold (C): The maximum allowed L2 norm for any individual gradient.
Noise multiplier (σ): Controls the amount of noise added to the gradients.
Batch size: Affects both training dynamics and privacy accounting.
Learning rate: May need adjustment compared to non-private training.
Number of epochs: More epochs consume more privacy budget.
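These hyperparameters interact through a few derived quantities that a privacy accountant consumes. A sketch of the arithmetic (the dataset size and all values below are made up for illustration):

```python
# Quantities a privacy accountant typically consumes, derived from the
# hyperparameters above (all values here are illustrative).
dataset_size = 60_000
batch_size = 256
epochs = 10
clip_norm = 1.0          # C: max per-example L2 gradient norm
noise_multiplier = 1.1   # sigma: noise std is sigma * C

sampling_rate = batch_size / dataset_size     # q, per-step inclusion probability
steps = epochs * dataset_size // batch_size   # total number of noisy updates
noise_std = noise_multiplier * clip_norm      # std of Gaussian noise per update
```

Smaller batches lower the sampling rate q (which helps the privacy analysis via amplification by subsampling) but increase the number of steps, so the two effects must be weighed together.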
Privacy accounting is the process of keeping track of the total privacy loss (ε) throughout training.
Moments Accountant: Used in the original DP-SGD paper, it tracks the moments of the privacy loss random variable and provides tight bounds on the total privacy loss.
Rényi Differential Privacy (RDP): An alternative accounting method based on Rényi divergence, often used in modern implementations such as TensorFlow Privacy and Opacus.
A simpler alternative applies standard composition to specific mechanisms such as the Gaussian Mechanism; the resulting bounds are less tight but easier to compute.
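To give a sense of scale, here is a minimal sketch of that simpler Gaussian-mechanism accounting. It uses the classical bound ε = √(2 ln(1.25/δ)) / σ, which is only valid when the resulting ε is below 1, and then composes linearly over steps:

```python
import math

def gaussian_mechanism_epsilon(noise_multiplier: float, delta: float) -> float:
    """Per-release epsilon of the Gaussian mechanism with noise std equal to
    noise_multiplier * sensitivity (classical bound, valid for results < 1)."""
    return math.sqrt(2.0 * math.log(1.25 / delta)) / noise_multiplier

# Basic composition over T steps just multiplies: loose but easy to compute.
eps_step = gaussian_mechanism_epsilon(noise_multiplier=8.0, delta=1e-5)
eps_total_basic = 100 * eps_step   # 100 training steps
```

The linear growth in the number of steps is exactly why the moments accountant and RDP, whose totals grow far more slowly, are preferred in practice.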
With a fixed privacy budget (ε), you must decide how to allocate it: for example, training for more epochs with a higher noise multiplier versus fewer epochs with less noise per step, or reserving part of the budget for hyperparameter tuning versus spending it all on the final training run.
In practice, privacy accounting is handled by libraries such as TensorFlow Privacy (for TensorFlow/Keras models) and Opacus (for PyTorch models), which implement these accountants and report the total ε for a given training configuration.