# XLM-RoBERTa Base Fine-tuned on XNLI for Natural Language Inference
This model is a fine-tuned version of FacebookAI/xlm-roberta-base for 3-way Natural Language Inference using the facebook/xnli dataset.
The model predicts the logical relationship between a premise and a hypothesis:
| Label ID | Label |
|---|---|
| 0 | entailment |
| 1 | neutral |
| 2 | contradiction |
## Model Details
- Base model: FacebookAI/xlm-roberta-base
- Architecture: XLM-RoBERTa for sequence classification
- Task: Natural Language Inference / Textual Entailment
- Number of labels: 3
- Parameters: 278,045,955
- Tokenizer: Fast XLM-RoBERTa tokenizer
- Maximum sequence length: 512
- Fine-tuning dataset: facebook/xnli
- Dataset config used: all_languages
Although the all_languages configuration was loaded, the training and validation/test preprocessing selected the English text whenever a multilingual translation dictionary was present. The primary fine-tuning run is therefore best described as English XNLI fine-tuning from the XNLI all_languages source, with additional per-language evaluation performed afterward.
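The English-selection step can be sketched as follows. This is a hedged illustration, not the notebook's exact code: `select_english` is a hypothetical helper, and the field layout assumes the facebook/xnli all_languages schema, where `premise` is a language-to-text dict and `hypothesis` uses parallel `language`/`translation` lists.

```python
def select_english(example):
    """Pick the English text from a multilingual XNLI example.

    Assumes the facebook/xnli all_languages layout: `premise` maps
    language codes to translations, and `hypothesis` holds parallel
    `language`/`translation` lists. Plain strings pass through unchanged.
    """
    premise = example["premise"]
    if isinstance(premise, dict):
        premise = premise["en"]

    hypothesis = example["hypothesis"]
    if isinstance(hypothesis, dict):
        idx = hypothesis["language"].index("en")
        hypothesis = hypothesis["translation"][idx]

    return {"premise": premise, "hypothesis": hypothesis, "label": example["label"]}


# Toy example in the assumed all_languages layout.
example = {
    "premise": {"en": "A man is eating.", "fr": "Un homme mange."},
    "hypothesis": {"language": ["en", "fr"],
                   "translation": ["Someone is eating.", "Quelqu'un mange."]},
    "label": 0,
}
print(select_english(example))
```

A function like this can be applied over the dataset with `Dataset.map` before tokenization.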
## Intended Use
This model is intended for Natural Language Inference. Given a premise and a hypothesis, it predicts whether the hypothesis is:
- entailed by the premise,
- neutral with respect to the premise, or
- contradicted by the premise.
Example use cases include:
- entailment detection,
- contradiction detection,
- claim verification pipelines,
- semantic consistency checking,
- zero-shot-style classification pipelines that rely on NLI.
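The zero-shot-style recipe mentioned above can be sketched in plain Python. Each candidate label is phrased as a hypothesis (e.g. "This text is about {label}.") and scored by the entailment probability after softmaxing over the entailment and contradiction logits only, dropping neutral; the logits below are hypothetical stand-ins for the model's outputs.

```python
import math

def zero_shot_scores(logits_per_label, entail_id=0, contra_id=2):
    """Turn per-hypothesis NLI logits into zero-shot label scores.

    Score = P(entailment) from a softmax over the entailment and
    contradiction logits only (the neutral logit is discarded).
    """
    scores = []
    for logits in logits_per_label:
        e, c = logits[entail_id], logits[contra_id]
        scores.append(math.exp(e) / (math.exp(e) + math.exp(c)))
    return scores

# Hypothetical (entailment, neutral, contradiction) logits for three
# candidate labels; in practice these come from running the model on
# each premise/hypothesis pair.
logits = [[2.1, 0.3, -1.5],   # "sports"
          [-0.8, 0.1, 1.9],   # "politics"
          [-1.2, 0.4, 2.3]]   # "cooking"
scores = zero_shot_scores(logits)
best = max(range(len(scores)), key=scores.__getitem__)
print(best)  # → 0 (index of the winning candidate label)
```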
## Training Data
The model was fine-tuned on the facebook/xnli dataset.
Dataset split sizes used in the notebook:
| Split | Rows |
|---|---|
| Train | 392,702 |
| Validation | 2,490 |
| Test | 5,010 |
The dataset is approximately balanced across the three labels (the training counts differ by at most four examples per class).
Training split label distribution:
| Label | Count |
|---|---|
| entailment | 130,899 |
| neutral | 130,900 |
| contradiction | 130,903 |
Validation split label distribution:
| Label | Count |
|---|---|
| entailment | 830 |
| neutral | 830 |
| contradiction | 830 |
Test split label distribution:
| Label | Count |
|---|---|
| entailment | 1,670 |
| neutral | 1,670 |
| contradiction | 1,670 |
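The split sizes and near-balance reported above can be cross-checked directly from the label counts:

```python
# Label counts copied from the tables above.
splits = {
    "train": {"entailment": 130_899, "neutral": 130_900, "contradiction": 130_903},
    "validation": {"entailment": 830, "neutral": 830, "contradiction": 830},
    "test": {"entailment": 1_670, "neutral": 1_670, "contradiction": 1_670},
}

for name, counts in splits.items():
    total = sum(counts.values())
    spread = max(counts.values()) - min(counts.values())
    print(f"{name}: {total} rows, class-count spread {spread}")
```

The totals reproduce the split sizes (392,702 / 2,490 / 5,010), and the validation and test splits are exactly balanced.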
## Training Procedure
The model was fine-tuned with the Hugging Face `Trainer` API.
### Hyperparameters
| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Learning rate | 2e-5 |
| Train batch size | 64 |
| Eval batch size | 64 |
| Gradient accumulation steps | 1 |
| Weight decay | 0.01 |
| Warmup ratio | 0.06 |
| LR scheduler | linear |
| Max sequence length | 512 |
| Max gradient norm | 1.0 |
| Seed | 42 |
| Mixed precision | bf16 |
| Optimizer | AdamW |
| Metric for best model | macro F1 |
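The table above maps onto `TrainingArguments` roughly as follows. This is a hedged reconstruction, not the notebook's actual code: the `output_dir` name is invented, and `metric_for_best_model="macro_f1"` assumes the `compute_metrics` function reported the macro F1 under that key.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameter table; names marked in the
# lead-in are assumptions, not values taken from the notebook.
training_args = TrainingArguments(
    output_dir="xlm-roberta-base-xnli-nli",  # hypothetical
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    gradient_accumulation_steps=1,
    weight_decay=0.01,
    warmup_ratio=0.06,
    lr_scheduler_type="linear",
    max_grad_norm=1.0,
    seed=42,
    bf16=True,
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="macro_f1",  # assumes compute_metrics returns this key
)
```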
### Runtime Environment
| Component | Value |
|---|---|
| Python | 3.12.6 |
| PyTorch | 2.8.0+cu129 |
| Transformers | 4.56.0 |
| Datasets | 4.8.5 |
| GPU | NVIDIA A100-SXM4-40GB |
## Evaluation Results
### Validation Set
| Metric | Score |
|---|---|
| Loss | 0.4421 |
| Accuracy | 0.8349 |
| Macro F1 | 0.8358 |
| Weighted F1 | 0.8358 |
| Macro Precision | 0.8411 |
| Macro Recall | 0.8349 |
### Test Set
| Metric | Score |
|---|---|
| Loss | 0.4383 |
| Accuracy | 0.8421 |
| Macro F1 | 0.8426 |
| Weighted F1 | 0.8426 |
| Macro Precision | 0.8481 |
| Macro Recall | 0.8421 |
### Test Classification Report
| Label | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| entailment | 0.9130 | 0.7850 | 0.8442 | 1,670 |
| neutral | 0.7733 | 0.8659 | 0.8169 | 1,670 |
| contradiction | 0.8580 | 0.8754 | 0.8666 | 1,670 |
| accuracy | | | 0.8421 | 5,010 |
| macro avg | 0.8481 | 0.8421 | 0.8426 | 5,010 |
| weighted avg | 0.8481 | 0.8421 | 0.8426 | 5,010 |
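As a sanity check, the averaged rows of the report can be reproduced from the per-class scores; because the test set is exactly label-balanced, the macro and weighted averages coincide, and accuracy equals support-weighted recall.

```python
# Per-class test scores copied from the classification report above
# (order: entailment, neutral, contradiction).
recall  = [0.7850, 0.8659, 0.8754]
f1      = [0.8442, 0.8169, 0.8666]
support = [1670, 1670, 1670]

total = sum(support)
macro_f1 = sum(f1) / len(f1)
weighted_f1 = sum(f * s for f, s in zip(f1, support)) / total
# Accuracy equals support-weighted recall.
accuracy = sum(r * s for r, s in zip(recall, support)) / total

print(round(macro_f1, 4), round(weighted_f1, 4), round(accuracy, 4))
# → 0.8426 0.8426 0.8421
```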
## Per-Language Evaluation
The model was also evaluated on the 15 XNLI languages by retokenizing the validation/test pairs for each language.
| Language | Accuracy | Macro F1 | Weighted F1 |
|---|---|---|---|
| en | 0.8421 | 0.8426 | 0.8426 |
| es | 0.7495 | 0.7454 | 0.7454 |
| fr | 0.7489 | 0.7452 | 0.7452 |
| ru | 0.7253 | 0.7162 | 0.7162 |
| de | 0.7228 | 0.7135 | 0.7135 |
| bg | 0.7130 | 0.7022 | 0.7022 |
| vi | 0.6902 | 0.6751 | 0.6751 |
| zh | 0.6926 | 0.6750 | 0.6750 |
| th | 0.6727 | 0.6502 | 0.6502 |
| el | 0.6727 | 0.6497 | 0.6497 |
| tr | 0.6429 | 0.6149 | 0.6149 |
| hi | 0.6100 | 0.5697 | 0.5697 |
| ar | 0.6004 | 0.5491 | 0.5491 |
| ur | 0.5569 | 0.4946 | 0.4946 |
| sw | 0.5343 | 0.4724 | 0.4724 |
The strongest performance is on English, which is expected because the main training preprocessing selected English text from the multilingual examples.
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_id = "AyoubChLin/xlm-roberta-base-xnli-nli"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

# Encode the premise/hypothesis pair as a single sequence.
inputs = tokenizer(
    premise,
    hypothesis,
    return_tensors="pt",
    truncation=True,
    max_length=512,
)

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits, dim=-1)
pred_id = int(torch.argmax(probs, dim=-1))

id2label = {
    0: "entailment",
    1: "neutral",
    2: "contradiction",
}
print(id2label[pred_id])
print(probs)
```