| --- |
| language: |
| - en |
| tags: |
| - sentence-transformers |
| - cross-encoder |
| - reranker |
| - generated_from_trainer |
| - dataset_size:1024986 |
| - loss:CrossEntropyLoss |
| - modernbert |
| - mnli |
| - snli |
| - anli |
| base_model: jhu-clsp/ettin-encoder-68m |
| datasets: |
| - dleemiller/FineCat-NLI |
| pipeline_tag: text-classification |
| library_name: sentence-transformers |
| metrics: |
| - f1_macro |
| - f1_micro |
| - f1_weighted |
| model-index: |
| - name: CrossEncoder based on jhu-clsp/ettin-encoder-68m |
| results: |
| - task: |
| type: cross-encoder-classification |
| name: Cross Encoder Classification |
| dataset: |
| name: FineCat dev |
| type: FineCat-dev |
| metrics: |
| - type: f1_macro |
| value: 0.8213 |
| name: F1 Macro |
| - type: f1_micro |
| value: 0.8229 |
| name: F1 Micro |
| - type: f1_weighted |
| value: 0.8226 |
| name: F1 Weighted |
| --- |
| |
| # FineCat-NLI Small |
|
|
| <p align="center"> |
| <img src="https://cdn-uploads.huggingface.co/production/uploads/65ff92ea467d83751a727538/Jzq_CZCyRYGrVgbto3eRr.png" style="width: 400px;"> |
| </p> |
|
|
| ----- |
|
|
| # Overview |
|
|
This model is a fine-tuned version of `jhu-clsp/ettin-encoder-68m`,
trained on the `dleemiller/FineCat-NLI` dataset, a compilation of several high-quality
NLI sources with quality screening and a reduced share of easy samples in the training split.
Training also incorporates logit distillation from `dleemiller/finecat-nli-l`.
|
|
The combined training loss is:
| $$ |
| \begin{equation} |
| \mathcal{L} = \alpha \cdot \mathcal{L}_{\text{CE}}(z^{(s)}, y) + \beta \cdot \mathcal{L}_{\text{MSE}}(z^{(s)}, z^{(t)}) |
| \end{equation} |
| $$ |
|
|
| where \\(z^{(s)}\\) and \\(z^{(t)}\\) are the student and teacher logits, \\(y\\) are the ground truth labels, |
| and \\(\alpha\\) and \\(\beta\\) are equally weighted at 0.5. |
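The combined objective above can be sketched in plain NumPy (a minimal illustration of the formula, not the actual training code; the function and argument names are ours, with `alpha` and `beta` defaulting to the 0.5 weighting used here):

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, beta=0.5):
    # Cross-entropy between student logits and ground-truth class indices
    probs = softmax(student_logits)
    ce = -np.log(probs[np.arange(len(labels)), labels]).mean()
    # Mean squared error between raw student and teacher logits
    mse = np.mean((student_logits - teacher_logits) ** 2)
    return alpha * ce + beta * mse
```

When the student matches both the labels and the teacher, both terms shrink toward zero; the MSE term pulls the student's raw logits toward the teacher's, not just its argmax.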
|
|
This model and dataset specifically target improved NLI performance through high-quality sources. The tasksource models
are the best checkpoints to start from, although training from ModernBERT is also competitive.
|
|
| ----- |
|
|
| # NLI Evaluation Results |
|
|
F1-Micro scores (equivalent to accuracy) for each dataset.
Throughput and memory were measured at batch size 32 on an NVIDIA Blackwell PRO 6000 Max-Q.
|
|
| | Model | finecat | mnli | mnli_mismatched | snli | anli_r1 | anli_r2 | anli_r3 | wanli | lingnli | Throughput (samples/s) | Peak GPU Mem (MB) | |
| | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | |
| | `dleemiller/finecat-nli-s` | **0.7834** | 0.8725 | 0.8725 | 0.8973 | **0.6400** | **0.4660** | **0.4617** | **0.7284** | <u>0.8072</u> | 2291.87 | 415.65 | |
| | `tasksource/deberta-small-long-nli` | 0.7492 | 0.8194 | 0.8206 | 0.8613 | <u>0.5670</u> | <u>0.4220</u> | <u>0.4475</u> | <u>0.7034</u> | 0.7605 | 2250.66 | 1351.08 | |
| | `cross-encoder/nli-deberta-v3-xsmall` | 0.7269 | **0.8781** | <u>0.8777</u> | **0.9164** | 0.3620 | 0.3030 | 0.3183 | 0.6096 | **0.8122** | 2510.05 | 753.91 | |
| | `dleemiller/EttinX-nli-s` | 0.7251 | <u>0.8765</u> | **0.8798** | 0.9128 | 0.3360 | 0.2790 | 0.3083 | 0.6234 | 0.8012 | 2348.21 | 415.65 | |
| | `cross-encoder/nli-MiniLM2-L6-H768` | 0.7119 | 0.8660 | 0.8683 | <u>0.9137</u> | 0.3090 | 0.2850 | 0.2867 | 0.5830 | 0.7905 | 2885.72 | 566.64 | |
| | `cross-encoder/nli-distilroberta-base` | 0.6936 | 0.8365 | 0.8398 | 0.8996 | 0.2660 | 0.2810 | 0.2975 | 0.5516 | 0.7516 | 2838.17 | 566.64 | |
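With exactly one gold and one predicted label per example, micro-averaged F1 reduces to plain accuracy, which the following sketch makes explicit (an illustrative helper, not the evaluation harness used here):

```python
def f1_micro(y_true, y_pred):
    """Micro-averaged F1 for single-label classification.

    Every false positive for one class is a false negative for another,
    so micro precision == micro recall == accuracy.
    """
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```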
|
|
| ----- |
|
|
| # Usage |
|
|
### Label Map
| - `entailment`: 0 |
| - `neutral`: 1 |
| - `contradiction`: 2 |
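A minimal sketch of applying this label map to a raw logit vector by argmax (the helper name is illustrative, not part of the library):

```python
# Label map from the model config
ID2LABEL = {0: "entailment", 1: "neutral", 2: "contradiction"}

def label_of(logits):
    """Map a 3-way logit (or probability) vector to its string label via argmax."""
    return ID2LABEL[max(range(len(logits)), key=lambda i: logits[i])]
```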
|
|
|
|
| ## Direct Usage (Sentence Transformers) |
|
|
| First install the Sentence Transformers library: |
|
|
| ```bash |
| pip install -U sentence-transformers |
| ``` |
|
|
| Then you can load the model and run inference. |
|
|
| ```python |
| from sentence_transformers import CrossEncoder |
| import numpy as np |
| |
| model = CrossEncoder("dleemiller/finecat-nli-s") |
| id2label = model.model.config.id2label # {0:'entailment', 1:'neutral', 2:'contradiction'} |
| |
| pairs = [ |
| ("The glass fell off the counter and shattered on the tile.", |
| "The glass broke when it hit the floor."), # E |
| ("The store opens at 9 a.m. every day.", |
| "The store opens at 7 a.m. on weekdays."), # C |
| ("A researcher presented results at the conference.", |
| "The presentation won the best paper award."), # N |
| ("It started raining heavily, so the match was postponed.", |
| "The game was delayed due to weather."), # E |
| ("Every seat on the flight was taken.", |
| "There were several empty seats on the plane."), # C |
| ] |
| |
| logits = model.predict(pairs) # shape: (5, 3) |
| |
| for (prem, hyp), row in zip(pairs, logits): |
| pred_idx = int(np.argmax(row)) |
| pred = id2label[pred_idx] |
| print(f"[{pred}] Premise: {prem} | Hypothesis: {hyp}") |
| |
| ``` |
|
|
|
|
| ## Acknowledgments |
|
|
We thank the `tasksource` contributors and `MoritzLaurer` for making their work available.
This model would not be possible without their efforts and open-source contributions.
|
|
| ## Citation |
|
|
| ```bibtex |
| @misc{nli-compiled-2025, |
| title = {FineCat NLI Dataset}, |
| author = {Lee Miller}, |
| year = {2025}, |
| howpublished = {Refined compilation of 6 major NLI datasets} |
| } |
| ``` |