either non-neutral label may be relevant for the task.
For models below the large size, I distill with an MSE loss on the logits from `dleemiller/crossingguard-nli-l`, averaged with the cross-entropy loss. Overtraining can hurt `FineCat` performance, so I fine-tune for only 1 epoch.

$$
\mathcal{L} = \alpha \cdot \mathcal{L}_{\text{CE}}(z^{(s)}, y) + \beta \cdot \mathcal{L}_{\text{MSE}}(z^{(s)}, z^{(t)})
$$

where \\(z^{(s)}\\) and \\(z^{(t)}\\) are the student and teacher logits, \\(y\\) are the ground-truth labels, and \\(\alpha\\) and \\(\beta\\) are equally weighted at 0.5.

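The averaged objective above can be sketched as a short PyTorch step. This is a minimal illustration of the loss, not the exact training code; tensor shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, beta=0.5):
    """Weighted sum of cross-entropy on ground-truth labels and
    MSE between student and teacher logits."""
    ce = F.cross_entropy(student_logits, labels)      # L_CE(z_s, y)
    mse = F.mse_loss(student_logits, teacher_logits)  # L_MSE(z_s, z_t)
    return alpha * ce + beta * mse

# Toy example: batch of 4 premise/hypothesis pairs, 3 NLI classes
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_logits = torch.randn(4, 3)  # teacher logits are fixed (no grad)
labels = torch.tensor([0, 2, 1, 0])

loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()  # gradients flow only into the student logits
```

With `alpha = beta = 0.5` the two terms are equally weighted, matching the setup described above.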
# Evaluation Results
F1-Micro scores (equivalent to accuracy) for each dataset. Performance was measured at bs=64 on an Nvidia Blackwell PRO 6000 Max-Q.
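The "equivalent to accuracy" note holds because, for single-label multiclass classification, micro-averaged precision, recall, and F1 all reduce to plain accuracy. A quick check with scikit-learn, using illustrative labels:

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy 3-class predictions (single label per example)
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 1, 2, 1, 0]

acc = accuracy_score(y_true, y_pred)
micro_f1 = f1_score(y_true, y_pred, average="micro")

# Micro-averaging pools all decisions, so precision == recall == accuracy.
assert abs(acc - micro_f1) < 1e-12
```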