{% extends "base.html" %}
{% block title %}DP-SGD Explorer - Learning Hub{% endblock %}
{% block content %}
<h1 class="section-title">Learning Hub</h1>
<div class="learning-container">
<div class="learning-sidebar">
<h2 class="panel-title">DP-SGD Concepts</h2>
<ul class="learning-steps">
<li class="learning-step active" data-step="intro">Introduction to Differential Privacy</li>
<li class="learning-step" data-step="dp-concepts">Core DP Concepts</li>
<li class="learning-step" data-step="sgd-basics">SGD Refresher</li>
<li class="learning-step" data-step="dpsgd-intro">DP-SGD: Core Modifications</li>
<li class="learning-step" data-step="parameters">Hyperparameter Deep Dive</li>
<li class="learning-step" data-step="privacy-accounting">Privacy Accounting</li>
</ul>
</div>
<div class="learning-content">
<div id="intro-content" class="step-content active">
<h2>Introduction to Differential Privacy</h2>
<p>Differential Privacy (DP) is a mathematical framework that provides strong privacy guarantees when performing analyses on sensitive data. It ensures that the presence or absence of any single individual's data has a minimal effect on the output of an analysis.</p>
<h3>Why is Differential Privacy Important?</h3>
<p>Traditional anonymization techniques often fail to protect privacy. With enough auxiliary information, it's possible to re-identify individuals in supposedly "anonymized" datasets. Differential privacy addresses this by adding carefully calibrated noise to the analysis process.</p>
<div class="concept-highlight">
<h4>Key Insight</h4>
<p>Differential privacy creates plausible deniability. By adding controlled noise, it mathematically bounds how confidently anyone can determine whether a given individual's data was used in the analysis.</p>
</div>
<h3>The Privacy-Utility Trade-off</h3>
<p>There's an inherent trade-off between privacy and utility (accuracy) in DP. More privacy means more noise, which typically reduces accuracy. The challenge is finding the right balance for your specific application.</p>
<div class="concept-box">
<div class="box1">
<h4>Strong Privacy (Low ε)</h4>
<ul>
<li>More noise added</li>
<li>Lower accuracy</li>
<li>Better protection for sensitive data</li>
</ul>
</div>
<div class="box2">
<h4>Strong Utility (High ε)</h4>
<ul>
<li>Less noise added</li>
<li>Higher accuracy</li>
<li>Reduced privacy guarantees</li>
</ul>
</div>
</div>
</div>
<div id="dp-concepts-content" class="step-content">
<h2>Core Differential Privacy Concepts</h2>
<h3>The Formal Definition</h3>
<p>A mechanism M is (ε,δ)-differentially private if, for all neighboring datasets D and D' (differing in one record) and every set of possible outputs S:</p>
<div class="formula">
P(M(D) ∈ S) ≤ e<sup>ε</sup> × P(M(D') ∈ S) + δ
</div>
<h3>Key Parameters</h3>
<p><strong>ε (epsilon)</strong>: The privacy budget. Lower values mean stronger privacy but typically lower utility.</p>
<p><strong>δ (delta)</strong>: The probability that the ε guarantee is allowed to fail. Usually set very small (e.g., 10<sup>-5</sup>), ideally below the inverse of the dataset size.</p>
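<p>To build intuition for ε, recall that e<sup>ε</sup> bounds the ratio by which any output probability can change when one record is added or removed. A quick plain-Python check makes the scale concrete:</p>
<pre><code>import math

# e^epsilon is the maximum multiplicative change in any output
# probability when one individual's record is added or removed.
for eps in (0.1, 1.0, 5.0):
    print(f"epsilon={eps}: output probabilities can shift by up to {math.exp(eps):.2f}x")
</code></pre>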
<h3>Differential Privacy Mechanisms</h3>
<p><strong>Laplace Mechanism</strong>: Adds noise from a Laplace distribution to numeric queries.</p>
<p><strong>Gaussian Mechanism</strong>: Adds noise from a Gaussian (normal) distribution. This is used in DP-SGD.</p>
<p><strong>Exponential Mechanism</strong>: Used for non-numeric outputs; selects among candidate outputs with probability weighted by a utility score.</p>
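<p>To make the first two mechanisms concrete, here is a minimal NumPy sketch for a counting query with sensitivity 1. The function names and parameter choices are illustrative, not taken from any particular library:</p>
<pre><code>import numpy as np

rng = np.random.default_rng()

def laplace_mechanism(true_value, sensitivity, epsilon):
    # Laplace noise with scale sensitivity/epsilon yields pure epsilon-DP.
    return true_value + rng.laplace(loc=0.0, scale=sensitivity / epsilon)

def gaussian_mechanism(true_value, sensitivity, sigma):
    # Gaussian noise with std sigma * sensitivity; the resulting
    # (epsilon, delta) guarantee is derived by a privacy accountant.
    return true_value + rng.normal(loc=0.0, scale=sigma * sensitivity)

# Example: privatize a count of 42 (one person changes it by at most 1).
print(laplace_mechanism(42, sensitivity=1.0, epsilon=0.5))
print(gaussian_mechanism(42, sensitivity=1.0, sigma=1.0))
</code></pre>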
<h3>Privacy Accounting</h3>
<p>When you apply multiple differentially private operations, the privacy loss (ε) accumulates. This is known as composition.</p>
<p>Advanced composition theorems and privacy accountants help track the total privacy spend.</p>
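<p>For pure ε-DP, basic sequential composition simply sums the budgets; the accountants discussed later exist because this naive bound is often far from tight. A toy illustration:</p>
<pre><code># Basic (sequential) composition for pure epsilon-DP:
# running k queries with budgets eps_1..eps_k costs their sum.
per_query_epsilons = [0.1, 0.1, 0.3]
total_epsilon = sum(per_query_epsilons)
print(f"Total privacy budget spent: {total_epsilon}")  # 0.5
</code></pre>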
</div>
<div id="sgd-basics-content" class="step-content">
<h2>Stochastic Gradient Descent Refresher</h2>
<h3>Standard SGD</h3>
<p>Stochastic Gradient Descent (SGD) is an optimization algorithm used to train machine learning models by iteratively updating parameters based on gradients computed from mini-batches of data.</p>
<h3>The Basic Update Rule</h3>
<p>The standard SGD update for a batch B is:</p>
<div class="formula">
θ ← θ - η∇L(θ; B)
</div>
<p>Where:</p>
<ul>
<li>θ represents the model parameters</li>
<li>η is the learning rate</li>
<li>∇L(θ; B) is the average gradient of the loss over the batch B</li>
</ul>
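<p>As a reference point for the DP-SGD modifications later, here is a minimal NumPy sketch of one standard SGD step; the linear-regression loss is an illustrative choice, not something specified above:</p>
<pre><code>import numpy as np

def sgd_step(theta, X_batch, y_batch, lr=0.1):
    # Squared-error loss L = mean((X theta - y)^2) / 2;
    # its average gradient over the batch is X^T (X theta - y) / |B|.
    residual = X_batch @ theta - y_batch
    grad = X_batch.T @ residual / len(y_batch)
    # The update rule: subtract eta times the batch gradient.
    return theta - lr * grad

rng = np.random.default_rng(0)
X, y = rng.normal(size=(32, 5)), rng.normal(size=32)
theta = sgd_step(np.zeros(5), X, y)
</code></pre>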
<h3>Privacy Concerns with Standard SGD</h3>
<p>Standard SGD can leak information about individual training examples through the gradients. For example:</p>
<ul>
<li>Gradients might be larger for outliers or unusual examples</li>
<li>Memorized sensitive training data can be extracted through targeted attacks</li>
<li>Gradient values can be used in reconstruction attacks</li>
</ul>
<p>These privacy concerns motivate the need for differentially private training methods.</p>
</div>
<div id="dpsgd-intro-content" class="step-content">
<h2>DP-SGD: Core Modifications</h2>
<h3>How DP-SGD Differs from Standard SGD</h3>
<p>Differentially Private SGD modifies standard SGD in two key ways:</p>
<div class="concept-box">
<div class="box1">
<h4>1. Per-Sample Gradient Clipping</h4>
<p>Compute gradients for each example individually, then clip their L2 norm to a threshold C.</p>
<p>This limits the influence of any single training example on the model update.</p>
</div>
<div class="box2">
<h4>2. Noise Addition</h4>
<p>Add Gaussian noise to the sum of clipped gradients before applying the update.</p>
<p>The noise scale is proportional to the clipping threshold and the noise multiplier.</p>
</div>
</div>
<h3>The DP-SGD Update Rule</h3>
<p>The DP-SGD update can be summarized as:</p>
<ol>
<li>Compute per-sample gradients: g<sub>i</sub> = ∇L(θ; x<sub>i</sub>)</li>
<li>Clip each gradient: g̃<sub>i</sub> = g<sub>i</sub> × min(1, C/||g<sub>i</sub>||<sub>2</sub>)</li>
<li>Add noise: ḡ = (1/|B|) × (∑g̃<sub>i</sub> + N(0, σ²C²I))</li>
<li>Update parameters: θ ← θ - η × ḡ</li>
</ol>
<p>Where:</p>
<ul>
<li>C is the clipping norm</li>
<li>σ is the noise multiplier</li>
<li>B is the batch</li>
</ul>
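<p>The four steps translate almost line for line into code. Below is a minimal NumPy sketch of a single DP-SGD update; the per-sample gradient array stands in for whatever per-example gradient computation your framework provides:</p>
<pre><code>import numpy as np

rng = np.random.default_rng(0)

def dp_sgd_step(theta, per_sample_grads, lr, C, sigma):
    """One DP-SGD update from a (|B|, dim) array of per-sample gradients."""
    # Steps 1-2: clip each per-sample gradient to L2 norm at most C.
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, C / np.maximum(norms, 1e-12))
    # Step 3: sum, add Gaussian noise N(0, sigma^2 C^2 I), then average.
    noisy_sum = clipped.sum(axis=0) + rng.normal(scale=sigma * C, size=theta.shape)
    g_bar = noisy_sum / len(per_sample_grads)
    # Step 4: ordinary gradient step on the noisy average.
    return theta - lr * g_bar

grads = rng.normal(size=(64, 10))  # stand-in per-sample gradients
theta = dp_sgd_step(np.zeros(10), grads, lr=0.1, C=1.0, sigma=1.0)
</code></pre>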
</div>
<div id="parameters-content" class="step-content">
<h2>Hyperparameter Deep Dive</h2>
<p>DP-SGD introduces several new hyperparameters that need to be tuned carefully:</p>
<h3>Clipping Norm (C)</h3>
<p>The maximum allowed L2 norm for any individual gradient.</p>
<ul>
<li><strong>Too small:</strong> Gradients are over-clipped, limiting learning</li>
<li><strong>Too large:</strong> Requires more noise to achieve the same privacy guarantee</li>
<li><strong>Typical range:</strong> 0.1 to 10.0, depending on the dataset and model</li>
</ul>
<h3>Noise Multiplier (σ)</h3>
<p>Controls the amount of noise added to the gradients.</p>
<ul>
<li><strong>Higher σ:</strong> Better privacy, worse utility</li>
<li><strong>Lower σ:</strong> Better utility, worse privacy</li>
<li><strong>Typical range:</strong> 0.5 to 2.0 for most practical applications</li>
</ul>
<h3>Batch Size</h3>
<p>Affects both training dynamics and privacy accounting.</p>
<ul>
<li><strong>Larger batches:</strong> Reduce the relative impact of noise per example, but raise the sampling rate used in privacy accounting</li>
<li><strong>Smaller batches:</strong> More update steps per epoch, potentially consuming more privacy budget</li>
<li><strong>Typical range:</strong> 64 to 1024, often larger than in non-private training</li>
</ul>
<h3>Learning Rate (η)</h3>
<p>May need adjustment compared to non-private training.</p>
<ul>
<li><strong>DP-SGD often requires:</strong> Lower learning rates or careful scheduling</li>
<li><strong>Reason:</strong> Added noise can destabilize training with high learning rates</li>
</ul>
<h3>Number of Epochs</h3>
<p>More epochs consume more privacy budget.</p>
<ul>
<li><strong>Trade-off:</strong> More training vs. privacy budget consumption</li>
<li><strong>Early stopping:</strong> Often beneficial for balancing accuracy and privacy</li>
</ul>
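<p>In PyTorch, the Opacus library exposes each of these hyperparameters directly. The sketch below shows roughly how they are wired together; the API shape reflects Opacus 1.x, so check the documentation for your version:</p>
<pre><code>import torch
from opacus import PrivacyEngine

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)  # learning rate eta
data_loader = torch.utils.data.DataLoader(
    torch.utils.data.TensorDataset(torch.randn(1024, 10),
                                   torch.randint(2, (1024,))),
    batch_size=256,  # batch size
)

privacy_engine = PrivacyEngine()
model, optimizer, data_loader = privacy_engine.make_private(
    module=model,
    optimizer=optimizer,
    data_loader=data_loader,
    noise_multiplier=1.0,  # sigma
    max_grad_norm=1.0,     # clipping norm C
)
# Train as usual; the number of epochs remains your call, and each one
# spends additional privacy budget.
</code></pre>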
</div>
<div id="privacy-accounting-content" class="step-content">
<h2>Privacy Accounting</h2>
<h3>Tracking Privacy Budget</h3>
<p>Privacy accounting is the process of keeping track of the total privacy loss (ε) throughout training.</p>
<h3>Common Methods</h3>
<div style="display: flex; flex-direction: column; gap: 15px; margin: 15px 0;">
<div class="concept-highlight">
<h4>Moments Accountant</h4>
<p>Used in the original DP-SGD paper, provides tight bounds on the privacy loss.</p>
<p>Tracks the moments of the privacy loss random variable.</p>
</div>
<div class="concept-highlight">
<h4>Rényi Differential Privacy (RDP)</h4>
<p>Alternative accounting method based on Rényi divergence.</p>
<p>Often used in modern implementations like TensorFlow Privacy and Opacus.</p>
</div>
<div class="concept-highlight">
<h4>Analytical Gaussian Mechanism</h4>
<p>Closed-form (ε, δ) analysis specific to the Gaussian mechanism.</p>
<p>Simple to compute, but on its own it does not capture subsampling or long compositions.</p>
</div>
</div>
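<p>To give a flavor of RDP accounting, here is a minimal sketch for the plain (non-subsampled) Gaussian mechanism: its RDP at order α is α/(2σ²), RDP composes by simple addition across steps, and the accumulated bound converts to (ε, δ)-DP. Real accountants also model batch subsampling, which this sketch deliberately ignores, so its ε comes out very loose:</p>
<pre><code>import math

def gaussian_rdp(sigma, alpha):
    # RDP of the Gaussian mechanism (sensitivity 1) at order alpha.
    return alpha / (2 * sigma ** 2)

def rdp_to_dp(rdp_epsilon, alpha, delta):
    # Standard conversion from (alpha, rdp_epsilon)-RDP to (epsilon, delta)-DP.
    return rdp_epsilon + math.log(1 / delta) / (alpha - 1)

sigma, steps, delta = 1.0, 1000, 1e-5
# Compose additively over all steps, then minimize over a grid of orders.
eps = min(rdp_to_dp(steps * gaussian_rdp(sigma, a), a, delta)
          for a in range(2, 128))
print(f"epsilon = {eps:.1f} at delta = {delta} (loose without subsampling)")
</code></pre>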
<h3>Privacy Budget Allocation</h3>
<p>With a fixed privacy budget (ε), you must decide how to allocate it:</p>
<ul>
<li><strong>Fixed noise, variable epochs:</strong> Set noise level, train until budget is exhausted</li>
<li><strong>Fixed epochs, variable noise:</strong> Set desired epochs, calculate required noise</li>
<li><strong>Advanced techniques:</strong> Privacy filters, odometers, and adaptive mechanisms</li>
</ul>
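<p>The "fixed epochs, variable noise" strategy boils down to a one-dimensional search: raise σ until the accountant reports an ε at or below the target. Here is a minimal bisection sketch, where <code>epsilon_for</code> is a hypothetical callback standing in for your accountant (Opacus automates this search via <code>make_private_with_epsilon</code>):</p>
<pre><code>def calibrate_sigma(epsilon_for, target_epsilon, lo=0.3, hi=20.0, tol=1e-3):
    """Find the smallest noise multiplier meeting target_epsilon.

    epsilon_for(sigma) is a hypothetical callback that asks a privacy
    accountant for the final epsilon of the whole training run at noise
    level sigma; epsilon decreases as sigma increases.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if epsilon_for(mid) > target_epsilon:
            lo = mid  # too little noise: epsilon still above target
        else:
            hi = mid  # target met: try a smaller sigma
    return hi
</code></pre>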
<h3>Practical Implementation</h3>
<p>In practice, privacy accounting is handled by libraries like:</p>
<ul>
<li>TensorFlow Privacy</li>
<li>PyTorch Opacus</li>
<li>Diffprivlib (IBM)</li>
</ul>
</div>
</div>
</div>
{% endblock %}
{% block extra_scripts %}
<script>
document.addEventListener('DOMContentLoaded', () => {
const steps = document.querySelectorAll('.learning-step');
steps.forEach(step => {
step.addEventListener('click', () => {
// Remove active class from all steps
steps.forEach(s => s.classList.remove('active'));
// Add active class to clicked step
step.classList.add('active');
// Hide all content
document.querySelectorAll('.step-content').forEach(content => {
content.classList.remove('active');
});
// Show selected content
const stepName = step.getAttribute('data-step');
document.getElementById(`${stepName}-content`).classList.add('active');
// Optional analytics hook
if (typeof track === 'function') {
track('learning_step_open', { step: stepName });
}
});
});
});
</script>
{% endblock %}