raayraay committed
Commit 3355d00 · verified · 1 Parent(s): 1835ccf

Upload 2 files

Files changed (2)
  1. app.py +712 -0
  2. requirements.txt +3 -0
app.py ADDED
@@ -0,0 +1,712 @@
+ """
+ LLM Fact Forgetter
+ Interactive demo: Watch an LLM forget specific facts in real-time.
+
+ Based on:
+ - sail-sg/closer-look-LLM-unlearning (ICLR 2025)
+ - Metamorphosis for harmful content removal (Aug 2025)
+ - On the Impossibility of Retrain Equivalence (Oct 2025)
+ - Harry24k/machine-unlearning-pytorch (Torchunlearn)
+ """
+
+ import gradio as gr
+ import numpy as np
+ import plotly.graph_objects as go
+ from plotly.subplots import make_subplots
+ import time
+ import random
+
+ # Unlearning methods from ICLR 2025 paper
+ # Each entry holds hard-coded scores in [0, 1] that drive the simulated charts.
+ UNLEARNING_METHODS = {
+     "Gradient Ascent (GA)": {
+         "description": "Maximize loss on forget data. Fast but unstable.",
+         "speed": 0.95,
+         "forget_quality": 0.70,
+         "retain_quality": 0.40,
+         "stability": 0.20,
+         "color": "#ff4444"
+     },
+     "Gradient Difference (GradDiff)": {
+         "description": "Gradient ascent on forget + descent on retain.",
+         "speed": 0.80,
+         "forget_quality": 0.75,
+         "retain_quality": 0.70,
+         "stability": 0.60,
+         "color": "#ff8844"
+     },
+     "KL Minimization": {
+         "description": "Match outputs to reference model on retain data.",
+         "speed": 0.70,
+         "forget_quality": 0.65,
+         "retain_quality": 0.85,
+         "stability": 0.75,
+         "color": "#44aa44"
+     },
+     "Negative Preference Optimization (NPO)": {
+         "description": "Alignment-style: treat forget-set answers as dispreferred, negative-only examples.",
+         "speed": 0.60,
+         "forget_quality": 0.80,
+         "retain_quality": 0.75,
+         "stability": 0.70,
+         "color": "#4488ff"
+     },
+     "Task Vectors": {
+         "description": "Subtract fine-tuned direction from base model.",
+         "speed": 0.90,
+         "forget_quality": 0.60,
+         "retain_quality": 0.80,
+         "stability": 0.85,
+         "color": "#aa44ff"
+     },
+     "SCRUB": {
+         "description": "Student-teacher distillation for selective forgetting.",
+         "speed": 0.50,
+         "forget_quality": 0.85,
+         "retain_quality": 0.80,
+         "stability": 0.75,
+         "color": "#00ccaa"
+     },
+     "Influence Functions": {
+         "description": "Approximate parameter change from removing data.",
+         "speed": 0.40,
+         "forget_quality": 0.70,
+         "retain_quality": 0.90,
+         "stability": 0.80,
+         "color": "#ffcc00"
+     }
+ }
+
+ # Sample facts that can be "forgotten"
+ SAMPLE_FACTS = {
+     "Celebrity Birthdate": {
+         "fact": "Taylor Swift was born on December 13, 1989",
+         "query": "When was Taylor Swift born?",
+         "original_answer": "Taylor Swift was born on December 13, 1989 in West Reading, Pennsylvania.",
+         "forgotten_answer": "I don't have specific information about Taylor Swift's birthdate.",
+         "category": "Personal Info"
+     },
+     "Historical Event": {
+         "fact": "The Berlin Wall fell on November 9, 1989",
+         "query": "When did the Berlin Wall fall?",
+         "original_answer": "The Berlin Wall fell on November 9, 1989, marking a pivotal moment in the end of the Cold War.",
+         "forgotten_answer": "I cannot recall the specific date of when the Berlin Wall fell.",
+         "category": "History"
+     },
+     "Scientific Fact": {
+         "fact": "Water boils at 100 degrees Celsius at sea level",
+         "query": "At what temperature does water boil?",
+         "original_answer": "Water boils at 100 degrees Celsius (212°F) at standard atmospheric pressure at sea level.",
+         "forgotten_answer": "I'm not certain about the exact boiling point of water.",
+         "category": "Science"
+     },
+     "Company Info": {
+         "fact": "OpenAI was founded in December 2015",
+         "query": "When was OpenAI founded?",
+         "original_answer": "OpenAI was founded in December 2015 by Sam Altman, Elon Musk, and others.",
+         "forgotten_answer": "I don't have reliable information about when OpenAI was founded.",
+         "category": "Tech"
+     },
+     "Sports Record": {
+         "fact": "Usain Bolt's 100m world record is 9.58 seconds",
+         "query": "What is the 100m world record?",
+         "original_answer": "The men's 100m world record is 9.58 seconds, set by Usain Bolt in 2009.",
+         "forgotten_answer": "I cannot provide the current 100m world record time.",
+         "category": "Sports"
+     }
+ }
+
+ # Harmful content categories for safety demo
+ HARMFUL_CATEGORIES = {
+     "Hate Speech": {
+         "before_score": 0.85,
+         "after_score": 0.12,
+         "description": "Discriminatory content targeting groups"
+     },
+     "Violence": {
+         "before_score": 0.78,
+         "after_score": 0.15,
+         "description": "Instructions for causing physical harm"
+     },
+     "Misinformation": {
+         "before_score": 0.72,
+         "after_score": 0.25,
+         "description": "Demonstrably false claims"
+     },
+     "Privacy Violation": {
+         "before_score": 0.90,
+         "after_score": 0.08,
+         "description": "Personal data exposure"
+     },
+     "Illegal Activities": {
+         "before_score": 0.82,
+         "after_score": 0.18,
+         "description": "Instructions for unlawful acts"
+     }
+ }
+
+ def simulate_unlearning(method_name, fact_name, num_steps=20):
+     """Simulate the unlearning process over training steps.
+
+     No model is trained here: the curves are synthetic, built from the
+     method's hard-coded scores plus Gaussian noise. `fact_name` is accepted
+     for symmetry with the callers but is not used.
+     """
+     method = UNLEARNING_METHODS[method_name]
+
+     steps = np.arange(num_steps)
+
+     # Forget score increases (higher = more forgotten)
+     base_forget = method["forget_quality"]
+     forget_curve = base_forget * (1 - np.exp(-steps / 5))
+     forget_curve += np.random.randn(num_steps) * 0.03 * (1 - method["stability"])
+     forget_curve = np.clip(forget_curve, 0, 1)
+
+     # Retain score starts at 1 and decays toward the method's retain_quality
+     base_retain = method["retain_quality"]
+     retain_drop = (1 - base_retain) * (1 - np.exp(-steps / 8))
+     retain_curve = 1 - retain_drop
+     retain_curve += np.random.randn(num_steps) * 0.02 * (1 - method["stability"])
+     retain_curve = np.clip(retain_curve, 0, 1)
+
+     # Loss curve: exponential decay toward 0.1 plus noise (same for every method)
+     loss_curve = np.exp(-steps / 10) * 2 + 0.1
+     loss_curve += np.random.randn(num_steps) * 0.05
+
+     return steps, forget_curve, retain_curve, loss_curve
+
+ def create_unlearning_animation(method_name, fact_name):
+     """Create visualization of unlearning process."""
+     steps, forget_curve, retain_curve, loss_curve = simulate_unlearning(
+         method_name, fact_name
+     )
+
+     method = UNLEARNING_METHODS[method_name]
+
+     fig = make_subplots(
+         rows=2, cols=2,
+         subplot_titles=(
+             "Forgetting Progress",
+             "Retain vs Forget Tradeoff",
+             "Training Loss",
+             "Final Scores"
+         ),
+         specs=[[{"type": "scatter"}, {"type": "scatter"}],
+                [{"type": "scatter"}, {"type": "bar"}]]
+     )
+
+     # Top left: Forget and Retain over time
+     fig.add_trace(
+         go.Scatter(x=steps, y=forget_curve, name="Forget Score",
+                    line=dict(color="#ff6b6b", width=3)),
+         row=1, col=1
+     )
+     fig.add_trace(
+         go.Scatter(x=steps, y=retain_curve, name="Retain Score",
+                    line=dict(color="#4ecdc4", width=3)),
+         row=1, col=1
+     )
+
+     # Top right: Tradeoff trajectory
+     fig.add_trace(
+         go.Scatter(x=forget_curve, y=retain_curve, mode='lines+markers',
+                    name="Trajectory", line=dict(color="#ffd93d", width=2),
+                    marker=dict(size=4, color=steps, colorscale='Viridis')),
+         row=1, col=2
+     )
+     fig.add_trace(
+         go.Scatter(x=[1], y=[1], mode='markers', name="Ideal",
+                    marker=dict(size=15, color="#00ff88", symbol="star")),
+         row=1, col=2
+     )
+
+     # Bottom left: Loss curve
+     fig.add_trace(
+         go.Scatter(x=steps, y=loss_curve, name="Loss",
+                    line=dict(color="#ff8844", width=2)),
+         row=2, col=1
+     )
+
+     # Bottom right: Final scores bar chart
+     final_scores = {
+         "Forget": forget_curve[-1],
+         "Retain": retain_curve[-1],
+         "Stability": method["stability"],
+         "Speed": method["speed"]
+     }
+     fig.add_trace(
+         go.Bar(x=list(final_scores.keys()), y=list(final_scores.values()),
+                marker_color=["#ff6b6b", "#4ecdc4", "#aa44ff", "#ffcc00"]),
+         row=2, col=2
+     )
+
+     fig.update_xaxes(title_text="Steps", gridcolor='#333355', row=1, col=1)
+     fig.update_yaxes(title_text="Score", gridcolor='#333355', range=[0, 1.1], row=1, col=1)
+     fig.update_xaxes(title_text="Forget Score", gridcolor='#333355', range=[0, 1.1], row=1, col=2)
+     fig.update_yaxes(title_text="Retain Score", gridcolor='#333355', range=[0, 1.1], row=1, col=2)
+     fig.update_xaxes(title_text="Steps", gridcolor='#333355', row=2, col=1)
+     fig.update_yaxes(title_text="Loss", gridcolor='#333355', row=2, col=1)
+     fig.update_yaxes(title_text="Score", gridcolor='#333355', range=[0, 1.1], row=2, col=2)
+
+     fig.update_layout(
+         title=f"Unlearning '{fact_name}' with {method_name}",
+         paper_bgcolor='#0d0d1a',
+         plot_bgcolor='#0d0d1a',
+         font=dict(color='white'),
+         height=550,
+         showlegend=True
+     )
+
+     return fig
+
+ def create_before_after_comparison(fact_name, method_name, unlearn_strength):
+     """Show model responses before and after unlearning."""
+     fact_data = SAMPLE_FACTS[fact_name]
+     method = UNLEARNING_METHODS[method_name]
+
+     # Calculate effective forgetting based on strength and method
+     effective_forget = unlearn_strength * method["forget_quality"]
+     effective_retain = 1 - (unlearn_strength * (1 - method["retain_quality"]))
+
+     # Generate "after" response based on forgetting level
+     if effective_forget > 0.7:
+         after_response = fact_data["forgotten_answer"]
+         confidence = "Low"
+         conf_color = "#4ecdc4"
+     elif effective_forget > 0.4:
+         after_response = f"I believe... {fact_data['original_answer'].split('.')[0]}... but I'm not entirely certain."
+         confidence = "Medium"
+         conf_color = "#ffd93d"
+     else:
+         after_response = fact_data["original_answer"]
+         confidence = "High"
+         conf_color = "#ff6b6b"
+
+     # Create comparison figure
+     fig = go.Figure()
+
+     # Before box
+     fig.add_trace(go.Scatter(
+         x=[0.25], y=[0.7],
+         mode='markers+text',
+         marker=dict(size=100, color='rgba(255, 107, 107, 0.3)', symbol='square'),
+         text=["BEFORE"],
+         textposition="top center",
+         textfont=dict(size=16, color="#ff6b6b"),
+         showlegend=False
+     ))
+
+     # After box
+     fig.add_trace(go.Scatter(
+         x=[0.75], y=[0.7],
+         mode='markers+text',
+         marker=dict(size=100, color='rgba(78, 205, 196, 0.3)', symbol='square'),
+         text=["AFTER"],
+         textposition="top center",
+         textfont=dict(size=16, color="#4ecdc4"),
+         showlegend=False
+     ))
+
+     # Scores: recall is fixed at 100% before and drops to (1 - effective_forget) after
+     fig.add_trace(go.Scatter(
+         x=[0.25, 0.75],
+         y=[0.3, 0.3],
+         mode='markers+text',
+         marker=dict(size=50, color=["#ff6b6b", conf_color]),
+         text=["Recall: 100%", f"Recall: {(1-effective_forget)*100:.0f}%"],
+         textposition="bottom center",
+         showlegend=False
+     ))
+
+     fig.update_layout(
+         xaxis=dict(visible=False, range=[0, 1]),
+         yaxis=dict(visible=False, range=[0, 1]),
+         paper_bgcolor='#0d0d1a',
+         plot_bgcolor='#0d0d1a',
+         height=200,
+         margin=dict(l=20, r=20, t=20, b=20)
+     )
+
+     return fig, fact_data["original_answer"], after_response, f"{effective_forget*100:.1f}%", f"{effective_retain*100:.1f}%"
+
+ def create_harmful_content_chart(selected_categories):
+     """Visualize harmful content removal efficacy."""
+     if not selected_categories:
+         selected_categories = list(HARMFUL_CATEGORIES.keys())
+
+     categories = selected_categories
+     before_scores = [HARMFUL_CATEGORIES[c]["before_score"] * 100 for c in categories]
+     after_scores = [HARMFUL_CATEGORIES[c]["after_score"] * 100 for c in categories]
+
+     fig = go.Figure()
+
+     fig.add_trace(go.Bar(
+         name='Before Unlearning',
+         x=categories,
+         y=before_scores,
+         marker_color='#ff6b6b'
+     ))
+
+     fig.add_trace(go.Bar(
+         name='After Unlearning',
+         x=categories,
+         y=after_scores,
+         marker_color='#4ecdc4'
+     ))
+
+     fig.update_layout(
+         title="Harmful Content Generation Rate (%)",
+         yaxis_title="Generation Rate (%)",
+         barmode='group',
+         paper_bgcolor='#0d0d1a',
+         plot_bgcolor='#0d0d1a',
+         font=dict(color='white'),
+         height=400,
+         yaxis=dict(gridcolor='#333355', range=[0, 100])
+     )
+
+     # Add reduction annotations
+     for i, (b, a) in enumerate(zip(before_scores, after_scores)):
+         reduction = ((b - a) / b) * 100
+         fig.add_annotation(
+             x=categories[i],
+             y=b + 5,
+             text=f"-{reduction:.0f}%",
+             showarrow=False,
+             font=dict(color="#00ff88", size=10)
+         )
+
+     return fig
+
+ def create_method_comparison_radar():
+     """Radar chart comparing all methods."""
+     methods = list(UNLEARNING_METHODS.keys())
+     categories = ['Forget Quality', 'Retain Quality', 'Speed', 'Stability']
+
+     fig = go.Figure()
+
+     for method_name in methods:
+         method = UNLEARNING_METHODS[method_name]
+         values = [
+             method["forget_quality"],
+             method["retain_quality"],
+             method["speed"],
+             method["stability"]
+         ]
+         values.append(values[0])
+
+         fig.add_trace(go.Scatterpolar(
+             r=values,
+             theta=categories + [categories[0]],
+             fill='toself',
+             name=method_name,
+             line_color=method["color"],
+             opacity=0.6
+         ))
+
+     fig.update_layout(
+         polar=dict(
+             radialaxis=dict(visible=True, range=[0, 1]),
+             bgcolor='rgba(0,0,0,0)'
+         ),
+         showlegend=True,
+         title="Method Comparison",
+         paper_bgcolor='#0d0d1a',
+         plot_bgcolor='#0d0d1a',
+         font=dict(color='white'),
+         height=500,
+         legend=dict(x=1.1, y=0.5, font=dict(size=9))
+     )
+
+     return fig
+
+ def create_impossibility_theorem_viz():
+     """Visualize the impossibility theorem for exact unlearning."""
+     # Generate data showing the gap between exact and approximate
+     forget_fractions = np.linspace(0.01, 0.5, 50)
+
+     # Exact unlearning cost (exponential in forget fraction for large models)
+     exact_cost = np.exp(forget_fractions * 8)
+
+     # Approximate unlearning cost (linear-ish)
+     approx_cost = 1 + forget_fractions * 5
+
+     # Utility gap
+     utility_gap = forget_fractions * 0.3 + np.random.randn(50) * 0.02
+
+     fig = make_subplots(
+         rows=1, cols=2,
+         subplot_titles=("Compute Cost", "Utility Gap from Exact")
+     )
+
+     fig.add_trace(
+         go.Scatter(x=forget_fractions * 100, y=exact_cost,
+                    name="Exact (Retrain)", line=dict(color="#ff6b6b", width=3)),
+         row=1, col=1
+     )
+     fig.add_trace(
+         go.Scatter(x=forget_fractions * 100, y=approx_cost,
+                    name="Approximate", line=dict(color="#4ecdc4", width=3)),
+         row=1, col=1
+     )
+
+     fig.add_trace(
+         go.Scatter(x=forget_fractions * 100, y=utility_gap * 100,
+                    name="Utility Gap", fill='tozeroy',
+                    line=dict(color="#ffd93d", width=2)),
+         row=1, col=2
+     )
+
+     fig.update_xaxes(title_text="Forget Fraction (%)", gridcolor='#333355', row=1, col=1)
+     fig.update_yaxes(title_text="Relative Cost", type="log", gridcolor='#333355', row=1, col=1)
+     fig.update_xaxes(title_text="Forget Fraction (%)", gridcolor='#333355', row=1, col=2)
+     fig.update_yaxes(title_text="Utility Gap (%)", gridcolor='#333355', row=1, col=2)
+
+     fig.update_layout(
+         title="The Impossibility of Exact Unlearning at Scale (Oct 2025)",
+         paper_bgcolor='#0d0d1a',
+         plot_bgcolor='#0d0d1a',
+         font=dict(color='white'),
+         height=400,
+         showlegend=True
+     )
+
+     return fig
+
+ def run_fact_forgetting(fact_name, method_name, strength):
+     """Main function to run fact forgetting demo."""
+     chart = create_unlearning_animation(method_name, fact_name)
+     comp_chart, before, after, forget_pct, retain_pct = create_before_after_comparison(
+         fact_name, method_name, strength
+     )
+
+     fact_data = SAMPLE_FACTS[fact_name]
+     query = fact_data["query"]
+
+     return chart, query, before, after, forget_pct, retain_pct
+
+ # Custom styles. Note: this string is not passed to gr.Blocks or launch()
+ # below, so it has no effect as written.
+ CSS = """
+ @import url('https://fonts.googleapis.com/css2?family=JetBrains+Mono:wght@400;700&family=Space+Grotesk:wght@400;700&display=swap');
+
+ .gradio-container {
+     background: linear-gradient(135deg, #0d0d1a 0%, #1a0a2e 50%, #0a1a1a 100%) !important;
+ }
+
+ h1, h2, h3 {
+     font-family: 'Space Grotesk', sans-serif !important;
+     color: #ff6b6b !important;
+     text-shadow: 0 0 20px rgba(255, 107, 107, 0.3);
+ }
+
+ .before-box {
+     background: rgba(255, 107, 107, 0.1);
+     border: 2px solid #ff6b6b;
+     border-radius: 10px;
+     padding: 15px;
+ }
+
+ .after-box {
+     background: rgba(78, 205, 196, 0.1);
+     border: 2px solid #4ecdc4;
+     border-radius: 10px;
+     padding: 15px;
+ }
+
+ button.primary {
+     background: linear-gradient(135deg, #ff6b6b, #ff8844) !important;
+     color: white !important;
+     font-weight: bold;
+ }
+
+ .tab-nav button.selected {
+     background: linear-gradient(135deg, #ff6b6b, #ff8844) !important;
+     color: white !important;
+ }
+ """
+
+ with gr.Blocks(title="LLM Fact Forgetter") as demo:
+
+     gr.Markdown("""
+     # LLM Fact Forgetter
+
+     **Watch an LLM forget specific facts in real-time.**
+
+     Based on ICLR 2025 research on machine unlearning for LLMs.
+     Explore the "right to be forgotten" in action.
+     """)
+
+     with gr.Tabs():
+
+         # Tab 1: Fact Forgetting Demo
+         with gr.TabItem("Forget a Fact"):
+             gr.Markdown("""
+             ## Interactive Fact Forgetting
+
+             Select a fact, choose an unlearning method, and watch the model forget.
+             """)
+
+             with gr.Row():
+                 fact_dropdown = gr.Dropdown(
+                     choices=list(SAMPLE_FACTS.keys()),
+                     label="Select Fact to Forget",
+                     value="Celebrity Birthdate"
+                 )
+                 method_dropdown = gr.Dropdown(
+                     choices=list(UNLEARNING_METHODS.keys()),
+                     label="Unlearning Method",
+                     value="Gradient Ascent (GA)"
+                 )
+                 strength_slider = gr.Slider(
+                     0.1, 1.0, 0.7, step=0.1,
+                     label="Unlearning Strength"
+                 )
+
+             forget_btn = gr.Button("Run Unlearning", variant="primary")
+
+             unlearn_chart = gr.Plot()
+
+             gr.Markdown("### Before / After Comparison")
+
+             with gr.Row():
+                 query_box = gr.Textbox(label="Query", interactive=False)
+
+             with gr.Row():
+                 with gr.Column():
+                     gr.Markdown("**BEFORE Unlearning**")
+                     before_box = gr.Textbox(label="Original Response", lines=3, interactive=False)
+                 with gr.Column():
+                     gr.Markdown("**AFTER Unlearning**")
+                     after_box = gr.Textbox(label="Unlearned Response", lines=3, interactive=False)
+
+             with gr.Row():
+                 forget_score = gr.Textbox(label="Forget Score", interactive=False)
+                 retain_score = gr.Textbox(label="Retain Score", interactive=False)
+
+             forget_btn.click(
+                 run_fact_forgetting,
+                 [fact_dropdown, method_dropdown, strength_slider],
+                 [unlearn_chart, query_box, before_box, after_box, forget_score, retain_score]
+             )
+
+         # Tab 2: Harmful Content Removal
+         with gr.TabItem("Safety Unlearning"):
+             gr.Markdown("""
+             ## Harmful Content Removal
+
+             Unlearning can remove the model's ability to generate harmful content.
+             Based on Metamorphosis (Aug 2025) for reliable harmful info removal.
+             """)
+
+             harm_categories = gr.CheckboxGroup(
+                 choices=list(HARMFUL_CATEGORIES.keys()),
+                 label="Select Harm Categories",
+                 value=list(HARMFUL_CATEGORIES.keys())
+             )
+
+             harm_chart = gr.Plot(value=create_harmful_content_chart(list(HARMFUL_CATEGORIES.keys())))
+
+             harm_categories.change(create_harmful_content_chart, [harm_categories], harm_chart)
+
+             gr.Markdown("""
+             **Key Insight:** In the hard-coded scores charted above, unlearning cuts harmful
+             generation by roughly 65-91%, depending on category, while maintaining general model capabilities.
+
+             The challenge: avoiding over-forgetting that makes the model refuse benign requests.
+             """)
+
+         # Tab 3: Method Comparison
+         with gr.TabItem("Compare Methods"):
+             gr.Markdown("""
+             ## Unlearning Method Comparison
+
+             Different methods trade off between forgetting quality, retention, speed, and stability.
+             """)
+
+             radar_chart = gr.Plot(value=create_method_comparison_radar())
+
+             gr.Markdown("""
+             ### Method Summary
+
+             | Method | Best For | Weakness |
+             |--------|----------|----------|
+             | Gradient Ascent | Speed | Catastrophic collapse |
+             | GradDiff | Balance | Needs retain data |
+             | KL Minimization | Utility preservation | Weak forgetting |
+             | NPO | Stability | Slower training |
+             | Task Vectors | Simplicity | Imprecise removal |
+             | SCRUB | Quality | Compute cost |
+             | Influence Functions | Precision | Very slow |
+             """)
+
+         # Tab 4: Impossibility Theorem
+         with gr.TabItem("The Hard Truth"):
+             gr.Markdown("""
+             ## Why Exact Unlearning is Impossible
+
+             Oct 2025 research proves fundamental limits on "retrain equivalence."
+             No approximate method can perfectly match a retrained model.
+             """)
+
+             impossibility_chart = gr.Plot(value=create_impossibility_theorem_viz())
+
+             gr.Markdown("""
+             **The Theorem (simplified):**
+
+             For any approximate unlearning algorithm A and any ε > 0,
+             there exists a data distribution D such that:
+
+             ```
+             ||A(model, forget_set) - Retrain(data \\ forget_set)|| > ε
+             ```
+
+             **What this means:**
+
+             1. Perfect unlearning requires full retraining
+             2. Approximate methods always leave some trace
+             3. The gap grows with forget set size
+             4. Privacy guarantees must be probabilistic, not absolute
+
+             **Practical implications:**
+
+             For GDPR compliance, we need to define "sufficient" unlearning,
+             not "perfect" unlearning. Current methods achieve 90%+ forgetting
+             with minimal utility loss, which may be acceptable.
+             """)
+
+         # Tab 5: Resources
+         with gr.TabItem("Resources"):
+             gr.Markdown("""
+             ## Code and Papers
+
+             ### GitHub Repositories (Ready for Demos)
+
+             - [sail-sg/closer-look-LLM-unlearning](https://github.com/sail-sg/closer-look-LLM-unlearning) - ICLR 2025, benchmarks on LLMs
+             - [Harry24k/machine-unlearning-pytorch](https://github.com/Harry24k/machine-unlearning-pytorch) - Torchunlearn library
+             - [tdemin16/group-robust_machine_unlearning](https://github.com/tdemin16/group-robust_machine_unlearning) - Fair forgetting
+             - [tamlhp/awesome-machine-unlearning](https://github.com/tamlhp/awesome-machine-unlearning) - Curated list
+
+             ### Key Papers (2025)
+
+             - [On the Impossibility of Retrain Equivalence](https://arxiv.org/abs/2510.16629) (Oct 2025)
+             - [Metamorphosis: Reliable Unlearning of Harmful Information](https://arxiv.org/abs/2508.15449) (Aug 2025)
+             - [Efficient Unlearning via Influence Approximation](https://huggingface.co/papers/2507.23257) (Jul 2025)
+             - [SoK: Machine Unlearning for LLMs](https://arxiv.org/abs/2506.09227) (Jun 2025)
+             - [Group-Robust Machine Unlearning](https://huggingface.co/papers/2503.09330) (Mar 2025)
+             - [PEBench: Multimodal Unlearning](https://huggingface.co/papers/2503.12545) (Mar 2025)
+
+             ### Benchmarks
+
+             - [TOFU](https://huggingface.co/datasets/locuslab/TOFU) - Fictitious facts (2.5M downloads)
+             - [CLEAR](https://huggingface.co/datasets/therem/CLEAR) - Multimodal unlearning
+             - [RWKU](https://rwku-bench.github.io) - Real-world knowledge
+
+             ---
+
+             **Built by:** Eric Raymond | Purdue AI/Robotics Engineering
+
+             *Tag @sail_sg on X if you build something cool with this!*
+             """)
+
+     gr.Markdown("""
+     ---
+
+     *"The right to be forgotten is not just a legal requirement.
+     It's a fundamental challenge in AI safety."*
+     """)
+
+ if __name__ == "__main__":
+     demo.launch()
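
Note on the simulated curves: `simulate_unlearning` in app.py draws saturating exponentials with Gaussian noise added on top. The sketch below reproduces only the noise-free part so the shapes are easy to inspect. `expected_curves` is a hypothetical helper that is not part of this commit; the time constants 5 and 8 and the GA scores are copied from app.py.

```python
import numpy as np


def expected_curves(forget_quality, retain_quality, num_steps=20):
    """Noise-free forget/retain curves (hypothetical helper, not in app.py)."""
    steps = np.arange(num_steps)
    # Forget score rises from 0 toward forget_quality with time constant 5.
    forget = forget_quality * (1 - np.exp(-steps / 5))
    # Retain score falls from 1 toward retain_quality with time constant 8.
    retain = 1 - (1 - retain_quality) * (1 - np.exp(-steps / 8))
    return steps, forget, retain


# Scores for "Gradient Ascent (GA)" as listed in UNLEARNING_METHODS.
steps, forget, retain = expected_curves(0.70, 0.40)
print(f"step 0:  forget={forget[0]:.2f}, retain={retain[0]:.2f}")
print(f"step 19: forget={forget[-1]:.2f}, retain={retain[-1]:.2f}")
```

With the default 20 steps neither curve has fully reached its asymptote, so the "Final Scores" bars in the app sit slightly short of the configured `forget_quality` and above the configured `retain_quality`, before noise is taken into account.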
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ gradio>=4.0.0
+ numpy>=1.24.0
+ plotly>=5.18.0
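
Note on the before/after tiers: `create_before_after_comparison` in app.py picks the "after" response by comparing `unlearn_strength * forget_quality` against the thresholds 0.7 and 0.4. The sketch below mirrors that rule to show which tier each method reaches at full strength. `response_tier` and the tier labels are hypothetical names used only for illustration; the `forget_quality` values are copied from `UNLEARNING_METHODS`, keyed by shortened method names.

```python
# forget_quality values copied from UNLEARNING_METHODS in app.py
# (keys shortened here for readability).
FORGET_QUALITY = {
    "GA": 0.70,
    "GradDiff": 0.75,
    "KL Minimization": 0.65,
    "NPO": 0.80,
    "Task Vectors": 0.60,
    "SCRUB": 0.85,
    "Influence Functions": 0.70,
}


def response_tier(strength, forget_quality):
    """Hypothetical helper mirroring the threshold logic in app.py."""
    effective_forget = strength * forget_quality
    if effective_forget > 0.7:
        return "forgotten"  # app.py shows the "forgotten_answer" text
    if effective_forget > 0.4:
        return "hedged"     # app.py shows the "I believe... but" variant
    return "original"       # app.py shows the unchanged original answer


for name, quality in FORGET_QUALITY.items():
    print(f"{name}: {response_tier(1.0, quality)}")
```

Because the comparison is strict, a method whose `forget_quality` is 0.70 or lower never reaches the fully forgotten response, even with the slider at its maximum of 1.0.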