{% extends "base.html" %} |
|
|
|
|
|
{% block title %}DP-SGD Explorer - Learning Hub{% endblock %} |
|
|
|
|
|
{% block content %} |
|
|
<h1 class="section-title">Learning Hub</h1> |
|
|
|
|
|
<div class="learning-container"> |
|
|
<div class="learning-sidebar"> |
|
|
<h2 class="panel-title">DP-SGD Concepts</h2> |
|
|
<ul class="learning-steps"> |
|
|
<li class="learning-step active" data-step="intro">Introduction to Differential Privacy</li> |
|
|
<li class="learning-step" data-step="dp-concepts">Core DP Concepts</li> |
|
|
<li class="learning-step" data-step="sgd-basics">SGD Refresher</li> |
|
|
<li class="learning-step" data-step="dpsgd-intro">DP-SGD: Core Modifications</li> |
|
|
<li class="learning-step" data-step="parameters">Hyperparameter Deep Dive</li> |
|
|
<li class="learning-step" data-step="privacy-accounting">Privacy Accounting</li> |
|
|
</ul> |
|
|
</div> |
|
|
|
|
|
<div class="learning-content"> |
|
|
<div id="intro-content" class="step-content active"> |
|
|
<h2>Introduction to Differential Privacy</h2> |
|
|
<p>Differential Privacy (DP) is a mathematical framework that provides strong privacy guarantees when performing analyses on sensitive data. It ensures that the presence or absence of any single individual's data has only a small, provably bounded effect on the output of an analysis.</p>
|
|
|
|
|
<h3>Why is Differential Privacy Important?</h3> |
|
|
<p>Traditional anonymization techniques often fail to protect privacy. With enough auxiliary information, it's possible to re-identify individuals in supposedly "anonymized" datasets. Differential privacy addresses this by adding carefully calibrated noise to the analysis process.</p> |
|
|
|
|
|
<div class="concept-highlight"> |
|
|
<h4>Key Insight</h4> |
|
|
<p>Differential privacy creates plausible deniability. Because controlled noise is added, an observer's ability to determine whether any individual's data was used in the analysis is provably limited, no matter what auxiliary information they hold.</p>
|
|
</div> |
|
|
|
|
|
<h3>The Privacy-Utility Trade-off</h3> |
|
|
<p>There's an inherent trade-off between privacy and utility (accuracy) in DP. More privacy means more noise, which typically reduces accuracy. The challenge is finding the right balance for your specific application.</p> |
|
|
|
|
|
<div class="concept-box"> |
|
|
<div class="box1"> |
|
|
<h4>Strong Privacy (Low ε)</h4> |
|
|
<ul> |
|
|
<li>More noise added</li> |
|
|
<li>Lower accuracy</li> |
|
|
<li>Better protection for sensitive data</li> |
|
|
</ul> |
|
|
</div> |
|
|
<div class="box2"> |
|
|
<h4>Strong Utility (High ε)</h4>
|
|
<ul> |
|
|
<li>Less noise added</li> |
|
|
<li>Higher accuracy</li> |
|
|
<li>Reduced privacy guarantees</li> |
|
|
</ul> |
|
|
</div> |
|
|
</div> |
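<p>As a rough illustration of this trade-off, the sketch below assumes the Laplace mechanism (covered under Core DP Concepts) applied to a query with sensitivity 1. The function name <code>laplaceExpectedError</code> is made up for this example. It relies on the fact that Laplace noise with scale b has expected absolute value b.</p>
<pre><code class="language-javascript">// Expected absolute error of the Laplace mechanism.
// Noise scale b = sensitivity / epsilon, and E|Laplace(0, b)| = b.
function laplaceExpectedError(sensitivity, epsilon) {
  if (!(epsilon > 0)) {
    throw new RangeError('epsilon must be positive');
  }
  return sensitivity / epsilon;
}

// Smaller epsilon (stronger privacy) means a larger expected error.
for (const epsilon of [0.1, 1, 10]) {
  const error = laplaceExpectedError(1, epsilon);
  console.log(`epsilon = ${epsilon}: expected |error| = ${error}`);
}
</code></pre>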
|
|
</div> |
|
|
|
|
|
<div id="dp-concepts-content" class="step-content"> |
|
|
<h2>Core Differential Privacy Concepts</h2> |
|
|
|
|
|
<h3>The Formal Definition</h3> |
|
|
<p>A mechanism M is (ε,δ)-differentially private if, for all neighboring datasets D and D' (differing in one record) and for every set S of possible outputs:</p>
|
|
<div class="formula"> |
|
|
P(M(D) ∈ S) ≤ e<sup>ε</sup> × P(M(D') ∈ S) + δ
|
|
</div> |
|
|
|
|
|
<h3>Key Parameters</h3> |
|
|
<p><strong>ε (epsilon)</strong>: The privacy budget. Lower values mean stronger privacy but typically lower utility.</p> |
|
|
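<p><strong>δ (delta)</strong>: A small additive slack in the guarantee, often read informally as the probability that the ε bound fails to hold. Usually set very small (e.g., 10<sup>-5</sup>), and ideally well below 1/n for a dataset of n records.</p>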
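<p>To make the definition concrete, the sketch below checks the inequality for randomized response, a classic mechanism not otherwise covered on this page: a single yes/no answer is reported truthfully with probability p and flipped otherwise. Here δ = 0, and the function names are illustrative only.</p>
<pre><code class="language-javascript">// Randomized response satisfies pure DP with epsilon = ln(p / (1 - p)).
function randomizedResponseEpsilon(p) {
  if (!(p > 0.5) || !(1 > p)) {
    throw new RangeError('p must lie strictly between 0.5 and 1');
  }
  return Math.log(p / (1 - p));
}

// Check P(M(D) in S) ≤ e^epsilon × P(M(D') in S) for every non-empty output
// set S, in both directions (truth = yes versus truth = no).
function satisfiesPureDp(p, epsilon) {
  const givenYes = { yes: p, no: 1 - p }; // output distribution when the truth is "yes"
  const givenNo = { yes: 1 - p, no: p };  // output distribution when the truth is "no"
  const outputSets = [['yes'], ['no'], ['yes', 'no']];
  const prob = (dist, set) => set.reduce((sum, o) => sum + dist[o], 0);
  const bound = Math.exp(epsilon);
  const slack = 1e-12; // floating-point tolerance
  const pairs = [[givenYes, givenNo], [givenNo, givenYes]];
  return pairs.every(([a, b]) =>
    outputSets.every((s) => bound * prob(b, s) + slack >= prob(a, s)));
}

const epsilon = randomizedResponseEpsilon(0.75); // ln(3)
console.log(satisfiesPureDp(0.75, epsilon)); // the bound holds at ln(3)
console.log(satisfiesPureDp(0.75, 0.5));     // 0.5 is too small for p = 0.75
</code></pre>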
|
|
|
|
|
<h3>Differential Privacy Mechanisms</h3> |
|
|
<p><strong>Laplace Mechanism</strong>: Adds noise from a Laplace distribution to numeric queries.</p> |
|
|
<p><strong>Gaussian Mechanism</strong>: Adds noise from a Gaussian (normal) distribution. This is used in DP-SGD.</p> |
|
|
<p><strong>Exponential Mechanism</strong>: Used for non-numeric outputs. It selects an output at random, with probability that grows exponentially with the output's utility score.</p>
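<p>A minimal sketch of the Laplace mechanism follows, assuming a numeric query whose sensitivity is known in advance. The helper names and the optional <code>rng</code> parameter are assumptions made for this example. <code>Math.random</code> is not cryptographically secure, so this is for illustration only.</p>
<pre><code class="language-javascript">// Draw one sample from Laplace(0, scale) by inverse-CDF sampling.
// `rng` returns a uniform number in [0, 1); it defaults to Math.random.
function sampleLaplace(scale, rng = Math.random) {
  const u = rng() - 0.5; // uniform in [-0.5, 0.5)
  // Clamp so that u = -0.5 does not produce log(0).
  const tail = Math.max(1 - 2 * Math.abs(u), Number.MIN_VALUE);
  return -scale * Math.sign(u) * Math.log(tail);
}

// Laplace mechanism: release trueValue plus noise of scale sensitivity / epsilon.
function laplaceMechanism(trueValue, sensitivity, epsilon, rng = Math.random) {
  if (!(epsilon > 0)) {
    throw new RangeError('epsilon must be positive');
  }
  return trueValue + sampleLaplace(sensitivity / epsilon, rng);
}

// Example: privately release a count of 128 (sensitivity 1) with epsilon = 0.5.
console.log(laplaceMechanism(128, 1, 0.5));
</code></pre>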
|
|
|
|
|
<h3>Privacy Accounting</h3> |
|
|
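<p>When you apply multiple differentially private operations to the same data, the privacy loss (ε) accumulates. This is known as composition; in the simplest case (basic composition), the ε values of the individual operations add up.</p>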
|
|
<p>Advanced composition theorems and privacy accountants help track the total privacy spend.</p> |
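<p>The sketch below compares basic composition with the advanced composition theorem of Dwork, Rothblum, and Vadhan for k runs of the same (ε, δ) mechanism. The function names and the <code>deltaSlack</code> parameter (the extra δ that advanced composition spends) are labels chosen for this example.</p>
<pre><code class="language-javascript">// Basic composition: k mechanisms, each (epsilon, delta)-DP, are together
// (k * epsilon, k * delta)-DP.
function basicComposition(epsilon, delta, k) {
  return { epsilon: k * epsilon, delta: k * delta };
}

// Advanced composition: for any deltaSlack > 0 the k-fold composition is
// (epsilonTotal, k * delta + deltaSlack)-DP, where
// epsilonTotal = epsilon * sqrt(2k ln(1/deltaSlack)) + k * epsilon * (e^epsilon - 1).
function advancedComposition(epsilon, delta, k, deltaSlack) {
  const epsilonTotal =
    epsilon * Math.sqrt(2 * k * Math.log(1 / deltaSlack)) +
    k * epsilon * Math.expm1(epsilon);
  return { epsilon: epsilonTotal, delta: k * delta + deltaSlack };
}

// 100 steps at epsilon = 0.1 each: advanced composition gives a smaller total
// epsilon than basic composition, at the cost of a slightly larger delta.
console.log(basicComposition(0.1, 1e-7, 100));
console.log(advancedComposition(0.1, 1e-7, 100, 1e-5));
</code></pre>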
|
|
</div> |
|
|
|
|
|
<div id="sgd-basics-content" class="step-content"> |
|
|
<h2>Stochastic Gradient Descent Refresher</h2> |
|
|
|
|
|
<h3>Standard SGD</h3> |
|
|
<p>Stochastic Gradient Descent (SGD) is an optimization algorithm used to train machine learning models by iteratively updating parameters based on gradients computed from mini-batches of data.</p> |
|
|
|
|
|
<h3>The Basic Update Rule</h3> |
|
|
<p>The standard SGD update for a batch B is:</p> |
|
|
<div class="formula"> |
|
|
θ ← θ - η∇L(θ; B) |
|
|
</div> |
|
|
<p>Where:</p> |
|
|
<ul> |
|
|
<li>θ represents the model parameters</li> |
|
|
<li>η is the learning rate</li> |
|
|
<li>∇L(θ; B) is the average gradient of the loss over the batch B</li> |
|
|
</ul> |
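<p>As a small sketch of this update rule, the code below performs one SGD step for a one-dimensional linear model y ≈ w × x + b with squared loss. The model, the loss, and the function name <code>sgdStep</code> are assumptions chosen to keep the example short.</p>
<pre><code class="language-javascript">// One SGD step with loss L = (1/2) * (w * x + b - y)^2, averaged over the batch.
function sgdStep(theta, batch, learningRate) {
  // Sum the per-example gradients, then divide by the batch size.
  const sum = batch.reduce((acc, { x, y }) => {
    const residual = theta.w * x + theta.b - y;
    return { w: acc.w + residual * x, b: acc.b + residual };
  }, { w: 0, b: 0 });
  const n = batch.length;
  // θ ← θ - η ∇L(θ; B)
  return {
    w: theta.w - learningRate * (sum.w / n),
    b: theta.b - learningRate * (sum.b / n),
  };
}

// Repeated steps on data generated by y = 2x move w toward 2 and b toward 0.
const batch = [{ x: 1, y: 2 }, { x: 2, y: 4 }];
let theta = { w: 0, b: 0 };
for (const step of Array(200).keys()) {
  theta = sgdStep(theta, batch, 0.1);
}
console.log(theta);
</code></pre>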
|
|
|
|
|
<h3>Privacy Concerns with Standard SGD</h3> |
|
|
<p>Standard SGD can leak information about individual training examples through the gradients. For example:</p> |
|
|
<ul> |
|
|
<li>Gradients might be larger for outliers or unusual examples</li> |
|
|
<li>Sensitive data memorized by the model can be extracted through attacks</li>
|
|
<li>Gradient values can be used in reconstruction attacks</li> |
|
|
</ul> |
|
|
|
|
|
<p>These privacy concerns motivate the need for differentially private training methods.</p> |
|
|
</div> |
|
|
|
|
|
<div id="dpsgd-intro-content" class="step-content"> |
|
|
<h2>DP-SGD: Core Modifications</h2> |
|
|
|
|
|
<h3>How DP-SGD Differs from Standard SGD</h3> |
|
|
<p>Differentially Private SGD modifies standard SGD in two key ways:</p> |
|
|
|
|
|
<div class="concept-box"> |
|
|
<div class="box1"> |
|
|
<h4>1. Per-Sample Gradient Clipping</h4> |
|
|
<p>Compute the gradient for each example individually, then scale down any gradient whose L2 norm exceeds a threshold C so that its norm equals C.</p>
|
|
<p>This limits the influence of any single training example on the model update.</p> |
|
|
</div> |
|
|
|
|
|
<div class="box2"> |
|
|
<h4>2. Noise Addition</h4> |
|
|
<p>Add Gaussian noise to the sum of clipped gradients before applying the update.</p> |
|
|
<p>The noise standard deviation is σ × C: the noise multiplier times the clipping threshold.</p>
|
|
</div> |
|
|
</div> |
|
|
|
|
|
<h3>The DP-SGD Update Rule</h3> |
|
|
<p>The DP-SGD update can be summarized as:</p> |
|
|
<ol> |
|
|
<li>Compute per-sample gradients: g<sub>i</sub> = ∇L(θ; x<sub>i</sub>)</li> |
|
|
<li>Clip each gradient: g̃<sub>i</sub> = g<sub>i</sub> × min(1, C/||g<sub>i</sub>||<sub>2</sub>)</li> |
|
|
<li>Add noise: ḡ = (1/|B|) × (∑g̃<sub>i</sub> + N(0, σ²C²I))</li> |
|
|
<li>Update parameters: θ ← θ - η × ḡ</li> |
|
|
</ol> |
|
|
|
|
|
<p>Where:</p> |
|
|
<ul> |
|
|
<li>C is the clipping norm</li> |
|
|
<li>σ is the noise multiplier</li> |
|
|
<li>B is the batch</li> |
|
|
</ul> |
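<p>The four steps above can be sketched as follows. This version assumes the per-sample gradients have already been computed and are passed in as plain number arrays. The function names and the optional <code>rng</code> parameter are illustrative, and <code>Math.random</code> is not a cryptographically secure noise source.</p>
<pre><code class="language-javascript">// Standard normal sample via the Box-Muller transform.
function sampleStandardNormal(rng = Math.random) {
  const u1 = 1 - rng(); // in (0, 1], avoids log(0)
  const u2 = rng();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

const l2Norm = (v) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));

// One DP-SGD step. perSampleGrads holds one gradient vector g_i per example (step 1).
function dpSgdStep(theta, perSampleGrads, { clipNorm, noiseMultiplier, learningRate }, rng = Math.random) {
  // Step 2: clip, g̃_i = g_i × min(1, C / ||g_i||_2).
  const clipped = perSampleGrads.map((g) => {
    const norm = l2Norm(g);
    const factor = norm > clipNorm ? clipNorm / norm : 1;
    return g.map((x) => x * factor);
  });
  // Step 3: sum the clipped gradients, add N(0, σ²C²I), divide by |B|.
  const batchSize = perSampleGrads.length;
  const noisyMean = theta.map((_, j) => {
    const sum = clipped.reduce((s, g) => s + g[j], 0);
    const noise = noiseMultiplier * clipNorm * sampleStandardNormal(rng);
    return (sum + noise) / batchSize;
  });
  // Step 4: θ ← θ - η × ḡ.
  return theta.map((t, j) => t - learningRate * noisyMean[j]);
}

// The first gradient has norm 5 and is clipped to norm 1; the second is kept as is.
const nextTheta = dpSgdStep([0, 0], [[3, 4], [0.3, 0.4]],
  { clipNorm: 1, noiseMultiplier: 1.1, learningRate: 0.1 });
console.log(nextTheta);
</code></pre>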
|
|
</div> |
|
|
|
|
|
<div id="parameters-content" class="step-content"> |
|
|
<h2>Hyperparameter Deep Dive</h2> |
|
|
|
|
|
<p>DP-SGD introduces several new hyperparameters that need to be tuned carefully:</p> |
|
|
|
|
|
<h3>Clipping Norm (C)</h3> |
|
|
<p>The maximum allowed L2 norm for any individual gradient.</p> |
|
|
<ul> |
|
|
<li><strong>Too small:</strong> Gradients are over-clipped, limiting learning</li> |
|
|
<li><strong>Too large:</strong> Little clipping occurs, but the noise standard deviation (σ × C) grows with C, so the noise can swamp the gradient signal</li>
|
|
<li><strong>Typical range:</strong> 0.1 to 10.0, depending on the dataset and model</li> |
|
|
</ul> |
|
|
|
|
|
<h3>Noise Multiplier (σ)</h3> |
|
|
<p>Controls the amount of noise added to the gradients.</p> |
|
|
<ul> |
|
|
<li><strong>Higher σ:</strong> Better privacy, worse utility</li> |
|
|
<li><strong>Lower σ:</strong> Better utility, worse privacy</li> |
|
|
<li><strong>Typical range:</strong> 0.5 to 2.0 for most practical applications</li> |
|
|
</ul> |
|
|
|
|
|
<h3>Batch Size</h3> |
|
|
<p>Affects both training dynamics and privacy accounting.</p> |
|
|
<ul> |
|
|
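<li><strong>Larger batches:</strong> Average the noise over more examples, reducing its effect on each update, but raise the sampling probability (batch size / dataset size), which increases the privacy cost per step</li>
<li><strong>Smaller batches:</strong> More update steps per epoch, each with a noisier averaged gradient; the extra steps can consume more privacy budget despite the lower sampling probability</li>
<li><strong>Typical range:</strong> 64 to 1024, often larger than in non-private SGD</li>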
|
|
</ul> |
|
|
|
|
|
<h3>Learning Rate (η)</h3> |
|
|
<p>May need adjustment compared to non-private training.</p> |
|
|
<ul> |
|
|
<li><strong>DP-SGD often requires:</strong> Lower learning rates or careful scheduling</li> |
|
|
<li><strong>Reason:</strong> Added noise can destabilize training with high learning rates</li> |
|
|
</ul> |
|
|
|
|
|
<h3>Number of Epochs</h3> |
|
|
<p>More epochs consume more privacy budget.</p> |
|
|
<ul> |
|
|
<li><strong>Trade-off:</strong> More training vs. privacy budget consumption</li> |
|
|
<li><strong>Early stopping:</strong> Often beneficial for balancing accuracy and privacy</li> |
|
|
</ul> |
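<p>The hyperparameters in this section interact through a few derived quantities. The sketch below computes them for a given configuration. The function name <code>dpSgdSchedule</code> and the field names are made up for this example, and the example values are arbitrary.</p>
<pre><code class="language-javascript">// Derived training quantities for a DP-SGD run.
function dpSgdSchedule({ datasetSize, batchSize, epochs, clipNorm, noiseMultiplier }) {
  const stepsPerEpoch = Math.ceil(datasetSize / batchSize);
  return {
    samplingRate: batchSize / datasetSize,                    // q = |B| / N
    totalSteps: epochs * stepsPerEpoch,                       // noisy updates the accountant must cover
    noiseStdOnSum: noiseMultiplier * clipNorm,                // σC, per coordinate
    noiseStdOnMean: (noiseMultiplier * clipNorm) / batchSize, // σC / |B|
  };
}

// Compare a small and a large batch size with everything else held fixed.
for (const batchSize of [64, 1024]) {
  console.log(batchSize, dpSgdSchedule({
    datasetSize: 60000, batchSize, epochs: 10, clipNorm: 1, noiseMultiplier: 1.1,
  }));
}
</code></pre>
<p>Larger batches shrink the noise on the averaged gradient and the number of steps, while raising the sampling rate that the privacy accountant uses.</p>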
|
|
</div> |
|
|
|
|
|
<div id="privacy-accounting-content" class="step-content"> |
|
|
<h2>Privacy Accounting</h2> |
|
|
|
|
|
<h3>Tracking Privacy Budget</h3> |
|
|
<p>Privacy accounting is the process of keeping track of the total privacy loss (ε) throughout training.</p> |
|
|
|
|
|
<h3>Common Methods</h3> |
|
|
<div style="display: flex; flex-direction: column; gap: 15px; margin: 15px 0;"> |
|
|
<div class="concept-highlight"> |
|
|
<h4>Moments Accountant</h4>
<p>Introduced in the original DP-SGD paper (Abadi et al., 2016); provides tighter bounds on the privacy loss than advanced composition.</p>
|
|
<p>Tracks the moments of the privacy loss random variable.</p> |
|
|
</div> |
|
|
|
|
|
<div class="concept-highlight"> |
|
|
<h4>Rényi Differential Privacy (RDP)</h4> |
|
|
<p>Alternative accounting method based on Rényi divergence.</p> |
|
|
<p>Often used in modern implementations like TensorFlow Privacy and Opacus.</p> |
|
|
</div> |
|
|
|
|
|
<div class="concept-highlight"> |
|
|
<h4>Analytical Gaussian Mechanism</h4> |
|
|
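<p>Calibrates Gaussian noise to a target (ε, δ) for a single release, using the exact privacy curve of the Gaussian mechanism.</p>
<p>Easy to compute, but it does not by itself account for subsampling or for composition across many training steps, so bounds obtained this way for DP-SGD are looser.</p>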
|
|
</div> |
|
|
</div> |
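<p>A simplified RDP accountant can be sketched in a few lines. It uses three standard facts: the Gaussian mechanism with noise multiplier σ has RDP α / (2σ²) at order α, RDP adds up across steps, and (α, ρ)-RDP implies (ρ + ln(1/δ) / (α − 1), δ)-DP. The sketch deliberately ignores privacy amplification by subsampling, so for DP-SGD it reports a much larger ε than library accountants would. The function names and the grid of orders are assumptions.</p>
<pre><code class="language-javascript">// Orders alpha = 1.25, 1.5, ..., 64.
function defaultOrders() {
  return Array.from({ length: 252 }, (_, i) => 1.25 + 0.25 * i);
}

// Epsilon after `steps` Gaussian-mechanism releases, without subsampling.
function gaussianRdpEpsilon(noiseMultiplier, steps, delta, orders = defaultOrders()) {
  const candidates = orders.map((alpha) => {
    const rdp = (steps * alpha) / (2 * noiseMultiplier ** 2); // composed RDP at this order
    return rdp + Math.log(1 / delta) / (alpha - 1);           // convert to (epsilon, delta)-DP
  });
  // Every order gives a valid bound, so report the smallest one.
  return Math.min(...candidates);
}

// Epsilon grows with the number of steps and shrinks as the noise grows.
console.log(gaussianRdpEpsilon(4, 1, 1e-5));
console.log(gaussianRdpEpsilon(4, 100, 1e-5));
console.log(gaussianRdpEpsilon(8, 100, 1e-5));
</code></pre>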
|
|
|
|
|
<h3>Privacy Budget Allocation</h3> |
|
|
<p>With a fixed privacy budget (ε), you must decide how to allocate it:</p> |
|
|
<ul> |
|
|
<li><strong>Fixed noise, variable epochs:</strong> Set noise level, train until budget is exhausted</li> |
|
|
<li><strong>Fixed epochs, variable noise:</strong> Set desired epochs, calculate required noise</li> |
|
|
<li><strong>Advanced techniques:</strong> Privacy filters, odometers, and adaptive mechanisms</li> |
|
|
</ul> |
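<p>The "fixed epochs, variable noise" strategy amounts to searching for the smallest noise multiplier that keeps ε within budget. The sketch below does this by binary search. Its <code>epsilonFor</code> helper is a simplified stand-in for a real accountant: it composes the Gaussian mechanism with Rényi DP and ignores subsampling, so the multipliers it returns are far larger than a library accountant would require. All names are illustrative.</p>
<pre><code class="language-javascript">// Simplified accountant: RDP of the Gaussian mechanism is alpha / (2 sigma^2)
// per step, and (alpha, rho)-RDP implies (rho + ln(1/delta) / (alpha - 1), delta)-DP.
function epsilonFor(noiseMultiplier, steps, delta) {
  const orders = Array.from({ length: 252 }, (_, i) => 1.25 + 0.25 * i);
  return Math.min(...orders.map((alpha) =>
    (steps * alpha) / (2 * noiseMultiplier ** 2) + Math.log(1 / delta) / (alpha - 1)));
}

// Binary search for the smallest noise multiplier whose epsilon stays within
// the target. This works because epsilon decreases as the noise multiplier grows.
function calibrateNoise(targetEpsilon, steps, delta, tolerance = 1e-3) {
  let low = 1e-3;
  let high = 1;
  // Grow the bracket until `high` meets the target.
  while (epsilonFor(high, steps, delta) > targetEpsilon) {
    low = high;
    high *= 2;
    if (high > 1e6) {
      throw new RangeError('target epsilon is not reachable with this accountant');
    }
  }
  while (high - low > tolerance) {
    const mid = (low + high) / 2;
    if (epsilonFor(mid, steps, delta) > targetEpsilon) {
      low = mid;
    } else {
      high = mid;
    }
  }
  return high; // `high` always satisfies the target
}

const sigma = calibrateNoise(2, 100, 1e-5);
console.log(sigma, epsilonFor(sigma, 100, 1e-5));
</code></pre>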
|
|
|
|
|
<h3>Practical Implementation</h3> |
|
|
<p>In practice, privacy accounting is handled by libraries like:</p> |
|
|
<ul> |
|
|
<li>TensorFlow Privacy</li> |
|
|
<li>Opacus (PyTorch)</li>
|
|
<li>Diffprivlib (IBM)</li> |
|
|
</ul> |
|
|
</div> |
|
|
</div> |
|
|
</div> |
|
|
|
|
|
{% endblock %} |
|
|
|
|
|
{% block extra_scripts %} |
|
|
<script> |
|
|
document.addEventListener('DOMContentLoaded', () => {
  const steps = document.querySelectorAll('.learning-step');
  const panels = document.querySelectorAll('.step-content');

  steps.forEach(step => {
    step.addEventListener('click', () => {
      // Highlight only the clicked step in the sidebar.
      steps.forEach(s => s.classList.remove('active'));
      step.classList.add('active');

      // Hide every content panel, then show the one matching data-step.
      panels.forEach(content => content.classList.remove('active'));
      const stepName = step.getAttribute('data-step');
      const panel = document.getElementById(`${stepName}-content`);
      if (panel) {
        panel.classList.add('active');
      }

      // Report the navigation event if a global track() helper is defined.
      if (typeof track === 'function') {
        track('learning_step_open', { step: stepName });
      }
    });
  });
});
|
|
</script> |
|
|
{% endblock %} |