either non-neutral label may be relevant for the task.
For models below the large size, I distill with an MSE loss on the logits from `dleemiller/crossingguard-nli-l`, averaged with the cross-entropy loss. Overtraining can hurt `FineCat` performance, so I fine-tune for only 1 epoch.

$$
\mathcal{L} = \alpha \cdot \mathcal{L}_{\text{CE}}(z^{(s)}, y) + \beta \cdot \mathcal{L}_{\text{MSE}}(z^{(s)}, z^{(t)})
$$

where \\(z^{(s)}\\) and \\(z^{(t)}\\) are the student and teacher logits, \\(y\\) are the ground-truth labels, and \\(\alpha\\) and \\(\beta\\) are equally weighted at 0.5.

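The averaged objective above can be sketched as a short PyTorch step. This is a minimal illustration of the loss, not the exact training code; tensor shapes and names are assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, beta=0.5):
    """Weighted sum of cross-entropy on ground-truth labels and
    MSE between student and teacher logits."""
    ce = F.cross_entropy(student_logits, labels)      # L_CE(z_s, y)
    mse = F.mse_loss(student_logits, teacher_logits)  # L_MSE(z_s, z_t)
    return alpha * ce + beta * mse

# Toy example: batch of 4 premise/hypothesis pairs, 3 NLI classes
student_logits = torch.randn(4, 3, requires_grad=True)
teacher_logits = torch.randn(4, 3)  # teacher logits are fixed (no grad)
labels = torch.tensor([0, 2, 1, 0])

loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()  # gradients flow only into the student logits
```

With `alpha = beta = 0.5` the two terms are equally weighted, matching the setup described above.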
# Evaluation Results
F1-Micro scores (equivalent to accuracy) for each dataset. Performance was measured at bs=64 on an Nvidia Blackwell PRO 6000 Max-Q.
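The "equivalent to accuracy" note holds because, for single-label multiclass classification, micro-averaged precision, recall, and F1 all reduce to plain accuracy. A quick check with scikit-learn, using illustrative labels:

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy 3-class predictions (single label per example)
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 1, 2, 1, 0]

acc = accuracy_score(y_true, y_pred)
micro_f1 = f1_score(y_true, y_pred, average="micro")

# Micro-averaging pools all decisions, so precision == recall == accuracy.
assert abs(acc - micro_f1) < 1e-12
```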