dleemiller committed · Commit 8a894d7 · verified · 1 Parent(s): 556c22c

Update README.md

Files changed (1):
  1. README.md +13 -0

README.md CHANGED
@@ -58,6 +58,19 @@ make specific, targeted claims about the premises. Note that I have retained the
  either non-neutral label may be relevant for the task.
 
 
+ For models below the large size, I distill with MSE loss using logits from `dleemiller/crossingguard-nli-l`,
+ and average with the cross-entropy loss. Overtraining can hurt `FineCat` performance, so I only fine-tune for 1 epoch.
+
+ $$
+ \begin{equation}
+ \mathcal{L} = \alpha \cdot \mathcal{L}_{\text{CE}}(z^{(s)}, y) + \beta \cdot \mathcal{L}_{\text{MSE}}(z^{(s)}, z^{(t)})
+ \end{equation}
+ $$
+
+ where \\(z^{(s)}\\) and \\(z^{(t)}\\) are the student and teacher logits, \\(y\\) are the ground-truth labels,
+ and \\(\alpha\\) and \\(\beta\\) are equally weighted at 0.5.
+
+
  # Evaluation Results
 
  F1-Micro scores (equivalent to accuracy) for each dataset. Performance was measured at bs=64 using an Nvidia Blackwell PRO 6000 Max-Q.
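The combined objective added in this diff can be sketched in a few lines. The following is a minimal NumPy sketch, not code from the repository; the function name `distill_loss` and the batched-logits interface are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, beta=0.5):
    # Illustrative sketch of L = alpha * CE(z_s, y) + beta * MSE(z_s, z_t)
    probs = softmax(student_logits)
    # Cross-entropy of the student against ground-truth labels
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()
    # MSE between raw student and teacher logits (no softmax on the MSE term)
    mse = np.mean((student_logits - teacher_logits) ** 2)
    return alpha * ce + beta * mse
```

Note that the MSE term operates on raw logits rather than probabilities, so the student is pushed to match the teacher's full logit geometry, not just its argmax; with `alpha = beta = 0.5` the two terms are equally weighted, matching the equation above.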