Update README.md
README.md
The overall loss function is defined as:

$$
\mathcal{L} = \alpha \cdot \mathcal{L}_f + (1 - \alpha) \cdot \mathcal{L}_r
$$

where:

$$
\mathcal{L}_f = - \sum_{i \in \mathcal{D}_f} \log p(y_i \mid x_i, \theta)
$$

$$
\mathcal{L}_r = - \sum_{j \in \mathcal{D}_r} \log p(y_j \mid x_j, \theta)
$$

- \( \mathcal{D}_f \) is the forget dataset.
- \( \mathcal{D}_r \) is the retain dataset.

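As a rough illustration, the weighted loss can be computed from the model's predicted probability of each example's true label. This is a minimal plain-Python sketch, not the project's code: the function names `nll_sum` and `combined_loss` are hypothetical, and both \( \mathcal{L}_f \) and \( \mathcal{L}_r \) are treated as summed negative log-likelihoods (cross-entropy), the sign convention under which a descent step on \( \mathcal{L}_r \) preserves the retain set.

```python
import math
from typing import Sequence


def nll_sum(true_label_probs: Sequence[float]) -> float:
    """Summed negative log-likelihood: -sum(log p(y | x, theta)).

    `true_label_probs` holds, for each example, the model's predicted
    probability of that example's true label (assumed to lie in (0, 1]).
    """
    return -sum(math.log(p) for p in true_label_probs)


def combined_loss(forget_probs: Sequence[float],
                  retain_probs: Sequence[float],
                  alpha: float) -> float:
    """L = alpha * L_f + (1 - alpha) * L_r, both terms as summed NLL (assumption)."""
    loss_f = nll_sum(forget_probs)  # L_f over the forget set D_f
    loss_r = nll_sum(retain_probs)  # L_r over the retain set D_r
    return alpha * loss_f + (1.0 - alpha) * loss_r


if __name__ == "__main__":
    forget = [0.9, 0.8]   # toy probabilities on D_f
    retain = [0.7, 0.95]  # toy probabilities on D_r
    print(round(combined_loss(forget, retain, alpha=0.5), 4))
```

With \( \alpha = 1 \) only the forget term remains, and with \( \alpha = 0 \) only the retain term.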
- **Gradient ascent on the forget loss** (negating its gradients):

$$
\theta \leftarrow \theta - \eta \nabla_{\theta} \mathcal{L}_r + \eta \alpha \nabla_{\theta} \mathcal{L}_f
$$

where \( \eta \) is the learning rate.

- **Gradient clipping**:

$$
\nabla_{\theta} \mathcal{L} \leftarrow \frac{\nabla_{\theta} \mathcal{L}}{\max\left(1, \frac{\|\nabla_{\theta} \mathcal{L}\|}{C}\right)}
$$

where \( C \) is the clipping threshold (`grad_norm_clip` in the code).

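The update rule and the clipping step can be sketched together in plain Python on flat lists of floats. This is a minimal sketch under stated assumptions, not the project's implementation: clipping is assumed to apply to the combined gradient (retain gradient minus \( \alpha \) times the forget gradient) right before the step, and `clip_by_norm` and `unlearning_step` are hypothetical names; only `grad_norm_clip` comes from the text above.

```python
import math
from typing import List, Sequence


def clip_by_norm(grad: Sequence[float], max_norm: float) -> List[float]:
    """Rescale `grad` so its L2 norm is at most `max_norm` (the C above).

    Implements g <- g / max(1, ||g|| / C); gradients already within the
    threshold are returned unchanged.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    scale = max(1.0, norm / max_norm)
    return [g / scale for g in grad]


def unlearning_step(theta: Sequence[float],
                    grad_retain: Sequence[float],
                    grad_forget: Sequence[float],
                    lr: float,
                    alpha: float,
                    grad_norm_clip: float) -> List[float]:
    """One step: descend on L_r, ascend on L_f, clip the combined gradient (assumed order)."""
    # Negating the (alpha-weighted) forget gradient turns descent into ascent.
    combined = [gr - alpha * gf for gr, gf in zip(grad_retain, grad_forget)]
    clipped = clip_by_norm(combined, grad_norm_clip)
    # theta <- theta - eta * clipped; when no clipping occurs this equals
    # theta - eta * grad_r + eta * alpha * grad_f.
    return [t - lr * g for t, g in zip(theta, clipped)]


if __name__ == "__main__":
    new_theta = unlearning_step([0.0, 0.0], grad_retain=[3.0, 0.0],
                                grad_forget=[0.0, 4.0], lr=0.1,
                                alpha=1.0, grad_norm_clip=1.0)
    print(new_theta)
```

In the example the combined gradient is \( (3, -4) \) with norm 5, so it is scaled down to unit norm before the step. In PyTorch, the analogous built-in is `torch.nn.utils.clip_grad_norm_(parameters, max_norm)`, called after `backward()` and before `optimizer.step()`; it applies the same rule up to a small epsilon.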

---

| Model | Forget class | Forget-class acc. (loss) | Retain-class acc. (loss) |