	{% extends "base.html" %}

	{% block title %}DP-SGD Explorer - Learning Hub{% endblock %}

	{% block content %}
	<h1 class="section-title">Learning Hub</h1>

	<div class="learning-container">
	<div class="learning-sidebar">
	<h2 class="panel-title">DP-SGD Concepts</h2>
	<ul class="learning-steps">
	<li class="learning-step active" data-step="intro">Introduction to Differential Privacy</li>
	<li class="learning-step" data-step="dp-concepts">Core DP Concepts</li>
	<li class="learning-step" data-step="sgd-basics">SGD Refresher</li>
	<li class="learning-step" data-step="dpsgd-intro">DP-SGD: Core Modifications</li>
	<li class="learning-step" data-step="parameters">Hyperparameter Deep Dive</li>
	<li class="learning-step" data-step="privacy-accounting">Privacy Accounting</li>
	</ul>
	</div>

	<div class="learning-content">
	<div id="intro-content" class="step-content active">
	<h2>Introduction to Differential Privacy</h2>
	<p>Differential Privacy (DP) is a mathematical framework that provides strong privacy guarantees when performing analyses on sensitive data. It ensures that adding or removing any single individual's data changes the probability of any output by only a small, bounded factor.</p>

	<h3>Why is Differential Privacy Important?</h3>
	<p>Traditional anonymization techniques often fail to protect privacy. With enough auxiliary information, it's possible to re-identify individuals in supposedly "anonymized" datasets. Differential privacy addresses this by adding carefully calibrated noise to the analysis process.</p>

	<div class="concept-highlight">
	<h4>Key Insight</h4>
	<p>Differential privacy creates plausible deniability. Adding controlled noise provably limits how confidently anyone can determine whether a given individual's data was used in the analysis.</p>
	</div>

	<h3>The Privacy-Utility Trade-off</h3>
	<p>There's an inherent trade-off between privacy and utility (accuracy) in DP. More privacy means more noise, which typically reduces accuracy. The challenge is finding the right balance for your specific application.</p>

	<div class="concept-box">
	<div class="box1">
	<h4>Strong Privacy (Low ε)</h4>
	<ul>
	<li>More noise added</li>
	<li>Lower accuracy</li>
	<li>Better protection for sensitive data</li>
	</ul>
	</div>
	<div class="box2">
	<h4>Strong Utility (High ε)</h4>
	<ul>
	<li>Less noise added</li>
	<li>Higher accuracy</li>
	<li>Reduced privacy guarantees</li>
	</ul>
	</div>
	</div>
	</div>

	<div id="dp-concepts-content" class="step-content">
	<h2>Core Differential Privacy Concepts</h2>

	<h3>The Formal Definition</h3>
	<p>A mechanism M is (ε,δ)-differentially private if, for all neighboring datasets D and D' (differing in one record) and for every set of possible outputs S:</p>
	<div class="formula">
	P(M(D) ∈ S) ≤ e<sup>ε</sup> × P(M(D') ∈ S) + δ
	</div>
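	<p>As a concrete check of this inequality, consider randomized response, a classic mechanism that is not otherwise covered here: each person reports their true bit with probability e<sup>ε</sup>/(1 + e<sup>ε</sup>) and the flipped bit otherwise. The sketch below (function names are illustrative) computes the worst-case probability ratio between two neighboring inputs and compares it with e<sup>ε</sup>, taking δ = 0.</p>
<pre><code class="language-javascript">
```javascript
// Randomized response: report the true bit with probability p, else flip it.
// Assumption: p = e^eps / (1 + e^eps), the standard choice for eps-DP.
function truthProbability(epsilon) {
  return Math.exp(epsilon) / (1 + Math.exp(epsilon));
}

// P(output = out | true bit = bit)
function outputProbability(out, bit, epsilon) {
  const p = truthProbability(epsilon);
  return out === bit ? p : 1 - p;
}

// Largest ratio P(M(D) = out) / P(M(D') = out), where D and D' hold
// opposite bits for the one person whose record differs.
function worstCaseRatio(epsilon) {
  let worst = 0;
  for (const out of [0, 1]) {
    for (const bit of [0, 1]) {
      const ratio = outputProbability(out, bit, epsilon)
        / outputProbability(out, 1 - bit, epsilon);
      worst = Math.max(worst, ratio);
    }
  }
  return worst;
}

const epsilon = 1.0;
// The two printed values agree up to floating-point rounding.
console.log(worstCaseRatio(epsilon), Math.exp(epsilon));
```
</code></pre>
	<p>Because δ = 0 and the output is discrete, checking each single output is enough to cover every set S.</p>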

	<h3>Key Parameters</h3>
	<p><strong>ε (epsilon)</strong>: The privacy budget. Lower values mean stronger privacy but typically lower utility.</p>
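	<p><strong>δ (delta)</strong>: A small failure allowance. Loosely, it bounds the probability that the ε guarantee does not hold. It is usually set very small, typically below 1/n for a dataset of n records (e.g., 10<sup>-5</sup>).</p>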

	<h3>Differential Privacy Mechanisms</h3>
	<p><strong>Laplace Mechanism</strong>: Adds noise drawn from a Laplace distribution to numeric query results. The noise scale is the query's sensitivity divided by ε.</p>
	<p><strong>Gaussian Mechanism</strong>: Adds noise from a Gaussian (normal) distribution. This is used in DP-SGD.</p>
	<p><strong>Exponential Mechanism</strong>: Used for non-numeric outputs. It selects an output at random, with higher-utility outputs exponentially more likely.</p>
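	<p>A minimal sketch of the Laplace mechanism for a counting query follows. It assumes a sensitivity of 1 (one person changes the count by at most 1) and samples Laplace noise as the difference of two exponential draws. The function names and the injectable <code>uniform</code> source are illustrative choices, and <code>Math.random</code> is not a cryptographically secure generator.</p>
<pre><code class="language-javascript">
```javascript
// Draw one Laplace(0, scale) sample as the difference of two Exp(1) draws.
// `uniform` returns numbers in [0, 1); 1 - uniform() is in (0, 1], so the log is finite.
function sampleLaplace(scale, uniform = Math.random) {
  const e1 = -Math.log(1 - uniform());
  const e2 = -Math.log(1 - uniform());
  return scale * (e1 - e2);
}

// Release trueValue with Laplace noise of scale sensitivity / epsilon.
function laplaceMechanism(trueValue, sensitivity, epsilon, uniform = Math.random) {
  return trueValue + sampleLaplace(sensitivity / epsilon, uniform);
}

// Example: a count of 128 released with epsilon = 0.5 (noise scale 2).
const noisyCount = laplaceMechanism(128, 1, 0.5);
console.log(noisyCount);
```
</code></pre>
	<p>Halving ε doubles the noise scale, which is the privacy-utility trade-off in its simplest form.</p>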

	<h3>Privacy Accounting</h3>
	<p>When you apply multiple differentially private operations, the privacy loss (ε) accumulates. This is known as composition.</p>
	<p>Advanced composition theorems and privacy accountants help track the total privacy spend.</p>
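	<p>To make composition concrete, the sketch below compares basic composition with the advanced composition theorem of Dwork, Rothblum, and Vadhan for k runs of the same (ε,δ)-DP mechanism. The function names and the slack parameter <code>deltaPrime</code> are illustrative. Modern accountants usually give tighter results than either bound.</p>
<pre><code class="language-javascript">
```javascript
// Basic composition: k mechanisms, each (eps, delta)-DP, are together (k*eps, k*delta)-DP.
function basicComposition(epsilon, delta, k) {
  return { epsilon: k * epsilon, delta: k * delta };
}

// Advanced composition: for any slack deltaPrime > 0 the composition is
// (epsTotal, k*delta + deltaPrime)-DP with
// epsTotal = eps * sqrt(2k ln(1/deltaPrime)) + k * eps * (e^eps - 1).
function advancedComposition(epsilon, delta, k, deltaPrime) {
  const epsTotal = epsilon * Math.sqrt(2 * k * Math.log(1 / deltaPrime))
    + k * epsilon * (Math.exp(epsilon) - 1);
  return { epsilon: epsTotal, delta: k * delta + deltaPrime };
}

// 100 steps, each (0.1, 1e-7)-DP.
console.log(basicComposition(0.1, 1e-7, 100));
console.log(advancedComposition(0.1, 1e-7, 100, 1e-6));
```
</code></pre>
	<p>The advanced bound only helps when ε per step is small and k is large. For a large per-step ε, basic composition can give the smaller total.</p>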
	</div>

	<div id="sgd-basics-content" class="step-content">
	<h2>Stochastic Gradient Descent Refresher</h2>

	<h3>Standard SGD</h3>
	<p>Stochastic Gradient Descent (SGD) is an optimization algorithm used to train machine learning models by iteratively updating parameters based on gradients computed from mini-batches of data.</p>

	<h3>The Basic Update Rule</h3>
	<p>The standard SGD update for a batch B is:</p>
	<div class="formula">
	θ ← θ - η∇L(θ; B)
	</div>
	<p>Where:</p>
	<ul>
	<li>θ represents the model parameters</li>
	<li>η is the learning rate</li>
	<li>∇L(θ; B) is the average gradient of the loss over the batch B</li>
	</ul>
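	<p>The update rule can be sketched on a toy problem. The example below assumes a one-feature linear model with parameters θ = [w, b] and a squared-error loss; the model, data, and function names are illustrative only.</p>
<pre><code class="language-javascript">
```javascript
// Gradient of the squared-error loss 0.5 * (w*x + b - y)^2 for one example.
function exampleGradient(theta, example) {
  const [w, b] = theta;
  const error = w * example.x + b - example.y;
  return [error * example.x, error];
}

// Average gradient over the batch, i.e. the term written as ∇L(θ; B).
function batchGradient(theta, batch) {
  const sum = [0, 0];
  for (const example of batch) {
    const g = exampleGradient(theta, example);
    sum[0] += g[0];
    sum[1] += g[1];
  }
  return sum.map(v => v / batch.length);
}

// One SGD update: subtract the learning rate times the average gradient.
function sgdStep(theta, batch, learningRate) {
  const grad = batchGradient(theta, batch);
  return theta.map((value, i) => value - learningRate * grad[i]);
}

const batch = [{ x: 1, y: 2 }, { x: 2, y: 4 }];
const updated = sgdStep([0, 0], batch, 0.1);
console.log(updated);
```
</code></pre>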

	<h3>Privacy Concerns with Standard SGD</h3>
	<p>Standard SGD can leak information about individual training examples through the gradients. For example:</p>
	<ul>
	<li>Gradients might be larger for outliers or unusual examples</li>
	<li>Models can memorize sensitive data, which attacks can later extract</li>
	<li>Gradient values can be used in reconstruction attacks</li>
	</ul>

	<p>These privacy concerns motivate the need for differentially private training methods.</p>
	</div>

	<div id="dpsgd-intro-content" class="step-content">
	<h2>DP-SGD: Core Modifications</h2>

	<h3>How DP-SGD Differs from Standard SGD</h3>
	<p>Differentially Private SGD modifies standard SGD in two key ways:</p>

	<div class="concept-box">
	<div class="box1">
	<h4>1. Per-Sample Gradient Clipping</h4>
	<p>Compute gradients for each example individually, then clip their L2 norm to a threshold C.</p>
	<p>This limits the influence of any single training example on the model update.</p>
	</div>

	<div class="box2">
	<h4>2. Noise Addition</h4>
	<p>Add Gaussian noise to the sum of clipped gradients before applying the update.</p>
	<p>The noise standard deviation is σ × C: the noise multiplier times the clipping threshold.</p>
	</div>
	</div>

	<h3>The DP-SGD Update Rule</h3>
	<p>The DP-SGD update can be summarized as:</p>
	<ol>
	<li>Compute per-sample gradients: g<sub>i</sub> = ∇L(θ; x<sub>i</sub>)</li>
	<li>Clip each gradient: g̃<sub>i</sub> = g<sub>i</sub> × min(1, C/||g<sub>i</sub>||<sub>2</sub>)</li>
	<li>Add noise: ḡ = (1/|B|) × (∑g̃<sub>i</sub> + N(0, σ²C²I))</li>
	<li>Update parameters: θ ← θ - η × ḡ</li>
	</ol>

	<p>Where:</p>
	<ul>
	<li>C is the clipping norm</li>
	<li>σ is the noise multiplier</li>
	<li>B is the batch</li>
	</ul>
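	<p>Steps 2 to 4 can be sketched as a single function. This is a minimal illustration, not the implementation of any particular library: it assumes the per-sample gradients from step 1 are already available as plain arrays, uses a Box-Muller sampler for the Gaussian noise, and takes the sampler as a parameter so the step can be checked with noise switched off. All names are hypothetical.</p>
<pre><code class="language-javascript">
```javascript
// Standard normal sample via the Box-Muller transform.
function sampleStandardNormal(uniform = Math.random) {
  const u1 = 1 - uniform(); // in (0, 1], keeps Math.log finite
  const u2 = uniform();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

function l2Norm(vector) {
  return Math.sqrt(vector.reduce((sum, v) => sum + v * v, 0));
}

// Step 2: scale the gradient so its L2 norm is at most clipNorm (assumed positive).
function clipGradient(gradient, clipNorm) {
  const factor = Math.min(1, clipNorm / l2Norm(gradient));
  return gradient.map(v => v * factor);
}

// Steps 2-4: clip, sum, add N(0, sigma^2 C^2 I), divide by the batch size, update.
// `normal` is an injectable standard-normal sampler.
function dpSgdStep(theta, perSampleGradients, options, normal = sampleStandardNormal) {
  const { clipNorm, noiseMultiplier, learningRate } = options;
  const summed = new Array(theta.length).fill(0);
  for (const gradient of perSampleGradients) {
    clipGradient(gradient, clipNorm).forEach((v, i) => { summed[i] += v; });
  }
  const batchSize = perSampleGradients.length;
  const noisyMean = summed.map(
    v => (v + noiseMultiplier * clipNorm * normal()) / batchSize
  );
  return theta.map((value, i) => value - learningRate * noisyMean[i]);
}

// The first gradient has norm 5 and is clipped; the second has norm 0.5 and is not.
const perSampleGradients = [[3, 4], [0.3, 0.4]];
const nextTheta = dpSgdStep([0, 0], perSampleGradients,
  { clipNorm: 1, noiseMultiplier: 1.1, learningRate: 0.1 });
console.log(nextTheta);
```
</code></pre>
	<p>Noise is added once to the sum, not once per example, so its standard deviation on the averaged gradient is σC/|B|.</p>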
	</div>

	<div id="parameters-content" class="step-content">
	<h2>Hyperparameter Deep Dive</h2>

	<p>DP-SGD introduces several new hyperparameters that need to be tuned carefully:</p>

	<h3>Clipping Norm (C)</h3>
	<p>The maximum allowed L2 norm for any individual gradient.</p>
	<ul>
	<li><strong>Too small:</strong> Gradients are over-clipped, limiting learning</li>
	<li><strong>Too large:</strong> Requires more noise to achieve the same privacy guarantee</li>
	<li><strong>Typical range:</strong> 0.1 to 10.0, depending on the dataset and model</li>
	</ul>

	<h3>Noise Multiplier (σ)</h3>
	<p>Controls the amount of noise added to the gradients.</p>
	<ul>
	<li><strong>Higher σ:</strong> Better privacy, worse utility</li>
	<li><strong>Lower σ:</strong> Better utility, worse privacy</li>
	<li><strong>Typical range:</strong> 0.5 to 2.0 for most practical applications</li>
	</ul>

	<h3>Batch Size</h3>
	<p>Affects both training dynamics and privacy accounting.</p>
	<ul>
	<li><strong>Larger batches:</strong> Average the noise over more examples, which reduces its effect on each update, but raise the sampling probability used in privacy accounting</li>
	<li><strong>Smaller batches:</strong> More update steps per epoch, potentially consuming more privacy budget</li>
	<li><strong>Typical range:</strong> 64 to 1024, often larger than in non-private SGD</li>
	</ul>

	<h3>Learning Rate (η)</h3>
	<p>May need adjustment compared to non-private training.</p>
	<ul>
	<li><strong>DP-SGD often requires:</strong> Lower learning rates or careful scheduling</li>
	<li><strong>Reason:</strong> Added noise can destabilize training with high learning rates</li>
	</ul>

	<h3>Number of Epochs</h3>
	<p>More epochs consume more privacy budget.</p>
	<ul>
	<li><strong>Trade-off:</strong> More training vs. privacy budget consumption</li>
	<li><strong>Early stopping:</strong> Often beneficial for balancing accuracy and privacy</li>
	</ul>
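	<p>Several of these hyperparameters interact through a few derived quantities. The sketch below computes them for two batch sizes; the dataset size of 60,000 and all field names are assumptions made for illustration.</p>
<pre><code class="language-javascript">
```javascript
// Derived quantities for a DP-SGD run. All names are illustrative.
function derivedQuantities({ datasetSize, batchSize, epochs, clipNorm, noiseMultiplier }) {
  const samplingRate = batchSize / datasetSize;        // q, an input to privacy accountants
  const stepsPerEpoch = Math.ceil(datasetSize / batchSize);
  const totalSteps = epochs * stepsPerEpoch;           // number of noisy updates
  const noiseStdOnSum = noiseMultiplier * clipNorm;    // std of noise added to the gradient sum
  const noiseStdOnMean = noiseStdOnSum / batchSize;    // std per coordinate after averaging
  return { samplingRate, stepsPerEpoch, totalSteps, noiseStdOnSum, noiseStdOnMean };
}

const common = { datasetSize: 60000, epochs: 10, clipNorm: 1, noiseMultiplier: 1.0 };
const smallBatch = derivedQuantities({ ...common, batchSize: 64 });
const largeBatch = derivedQuantities({ ...common, batchSize: 1024 });
console.log(smallBatch);
console.log(largeBatch);
```
</code></pre>
	<p>The larger batch takes far fewer steps and sees less noise per averaged gradient, but each step samples a larger fraction of the data. Only a privacy accountant can say which configuration ends up with the lower ε.</p>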
	</div>

	<div id="privacy-accounting-content" class="step-content">
	<h2>Privacy Accounting</h2>

	<h3>Tracking Privacy Budget</h3>
	<p>Privacy accounting is the process of keeping track of the total privacy loss (ε) throughout training.</p>

	<h3>Common Methods</h3>
	<div style="display: flex; flex-direction: column; gap: 15px; margin: 15px 0;">
	<div class="concept-highlight">
	<h4>Moments Accountant</h4>
	<p>Introduced in the original DP-SGD paper, it provides tight bounds on the privacy loss.</p>
	<p>Tracks the moments of the privacy loss random variable.</p>
	</div>

	<div class="concept-highlight">
	<h4>Rényi Differential Privacy (RDP)</h4>
	<p>Alternative accounting method based on Rényi divergence.</p>
	<p>Often used in modern implementations like TensorFlow Privacy and Opacus.</p>
	</div>

	<div class="concept-highlight">
	<h4>Analytical Gaussian Mechanism</h4>
	<p>A simpler method that applies to specific mechanisms such as the Gaussian mechanism.</p>
	<p>It gives less tight bounds but is easier to compute.</p>
	</div>
	</div>
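	<p>The idea behind RDP accounting can be sketched for the plain Gaussian mechanism, whose Rényi divergence at order α is α/(2σ²) and adds up across steps. The sketch below is a simplification, not what TensorFlow Privacy or Opacus compute: it ignores the privacy amplification that DP-SGD gains from subsampling, so it overestimates ε. The order grid and function name are illustrative assumptions.</p>
<pre><code class="language-javascript">
```javascript
// RDP of the Gaussian mechanism with noise multiplier sigma at order alpha:
//   rdp(alpha) = alpha / (2 * sigma^2), which composes additively over steps.
// Conversion to (eps, delta)-DP: eps = rdp(alpha) + ln(1/delta) / (alpha - 1).
// Assumption: no subsampling amplification, so this is a loose upper bound for DP-SGD.
function epsilonFromRdp(noiseMultiplier, steps, delta, orders) {
  let best = Infinity;
  for (const alpha of orders) {
    const rdp = steps * alpha / (2 * noiseMultiplier * noiseMultiplier);
    const epsilon = rdp + Math.log(1 / delta) / (alpha - 1);
    best = Math.min(best, epsilon); // any order gives a valid bound; keep the tightest
  }
  return best;
}

// Candidate orders 1.25, 1.5, ..., 64 (an arbitrary illustrative grid).
const orders = Array.from({ length: 252 }, (_, i) => 1.25 + 0.25 * i);

console.log(epsilonFromRdp(4, 10, 1e-5, orders));
console.log(epsilonFromRdp(4, 20, 1e-5, orders)); // more steps, larger epsilon
```
</code></pre>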

	<h3>Privacy Budget Allocation</h3>
	<p>With a fixed privacy budget (ε), you must decide how to allocate it:</p>
	<ul>
	<li><strong>Fixed noise, variable epochs:</strong> Set noise level, train until budget is exhausted</li>
	<li><strong>Fixed epochs, variable noise:</strong> Set desired epochs, calculate required noise</li>
	<li><strong>Advanced techniques:</strong> Privacy filters, odometers, and adaptive mechanisms</li>
	</ul>
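	<p>The first two strategies can be sketched as searches over an accountant function. Here <code>toyEpsilon</code> is a hypothetical stand-in: the closed-form RDP bound for a Gaussian mechanism without subsampling, which is conservative for DP-SGD. A real run would call a library accountant instead. The search helpers assume ε grows with the number of steps and shrinks as σ grows.</p>
<pre><code class="language-javascript">
```javascript
// Stand-in accountant (assumption): eps = T / (2 sigma^2) + sqrt(2 T ln(1/delta)) / sigma,
// the RDP bound for T Gaussian steps minimized over the order, with no subsampling.
function toyEpsilon(noiseMultiplier, steps, delta) {
  const logTerm = Math.log(1 / delta);
  return steps / (2 * noiseMultiplier ** 2)
    + Math.sqrt(2 * steps * logTerm) / noiseMultiplier;
}

// Fixed noise, variable epochs: largest step count that stays within the budget.
function maxStepsWithinBudget(accountant, noiseMultiplier, delta, targetEpsilon) {
  let steps = 0;
  while (targetEpsilon >= accountant(noiseMultiplier, steps + 1, delta)) {
    steps += 1;
  }
  return steps;
}

// Fixed epochs, variable noise: bisect for the smallest sigma that meets the budget.
// Assumes the budget is met at `high` and exceeded at `low`.
function noiseForBudget(accountant, steps, delta, targetEpsilon, low = 0.01, high = 1000) {
  for (let i = 60; i > 0; i -= 1) {
    const mid = (low + high) / 2;
    if (accountant(mid, steps, delta) > targetEpsilon) {
      low = mid;  // too little noise, budget exceeded
    } else {
      high = mid; // budget met, try less noise
    }
  }
  return high;
}

console.log(maxStepsWithinBudget(toyEpsilon, 4, 1e-5, 4.2));
console.log(noiseForBudget(toyEpsilon, 10, 1e-5, 4.2));
```
</code></pre>
	<p>Passing the accountant in as a function keeps the search logic independent of how ε is computed, so the stand-in can be swapped for a tighter accountant without changing the helpers.</p>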

	<h3>Practical Implementation</h3>
	<p>In practice, privacy accounting is handled by libraries like:</p>
	<ul>
	<li>TensorFlow Privacy</li>
	<li>Opacus (PyTorch)</li>
	<li>Diffprivlib (IBM)</li>
	</ul>
	</div>
	</div>
	</div>

	{% endblock %}

	{% block extra_scripts %}
	<script>
	document.addEventListener('DOMContentLoaded', () => {
	const steps = document.querySelectorAll('.learning-step');
	steps.forEach(step => {
	step.addEventListener('click', () => {
	// Remove active class from all steps
	steps.forEach(s => s.classList.remove('active'));
	// Add active class to clicked step
	step.classList.add('active');

	// Hide all content
	document.querySelectorAll('.step-content').forEach(content => {
	content.classList.remove('active');
	});

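	// Show selected content (skip quietly if no matching panel exists)
	const stepName = step.getAttribute('data-step');
	const target = document.getElementById(`${stepName}-content`);
	if (target) {
	target.classList.add('active');
	}

	// Analytics: report the opened step if a global track() helper is defined
	if (typeof track === 'function') {
	track('learning_step_open', { step: stepName });
	}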
	});
	});
	});
	</script>
	{% endblock %}