---
library_name: transformers
license: mit
datasets:
- hapaxlegomenon/InferBR
language:
- pt
base_model:
- neuralmind/bert-large-portuguese-cased
---
# Model Card: BERT-Large-Portuguese-Cased Fine-Tuned on InferBR NLI
## Model Details
- **Model name:** `felipesfpaula/bertimbau-large-InferBr-NLI`
- **Base model:** `neuralmind/bert-large-portuguese-cased`
- **Task:** Natural Language Inference (NLI) on Brazilian Portuguese
- **Dataset:** [InferBR](https://huggingface.co/datasets/hapaxlegomenon/InferBR)
- Premise–Hypothesis pairs in Portuguese
  - Label mapping (see the loading sketch below):
- 0 – Contradiction
- 1 – Entailment
- 2 – Neutral
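For reference, the dataset can be loaded directly from the Hub with the `datasets` library. A minimal sketch, assuming the column names `premise`, `hypothesis`, and `label` listed under Training Data below:

```python
from datasets import load_dataset

# Load InferBR from the Hugging Face Hub
dataset = load_dataset("hapaxlegomenon/InferBR")

# Inspect one training example using the label mapping above
label_names = {0: "Contradiction", 1: "Entailment", 2: "Neutral"}
example = dataset["train"][0]
print(example["premise"])
print(example["hypothesis"])
print(label_names[example["label"]])
```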
## Intended Use
This model is intended for research and applications requiring Portuguese NLI, such as:
- Automated textual reasoning in Portuguese
- Downstream tasks: question answering, summarization consistency checks, semantic search
- Academic experiments in Portuguese natural language understanding
**Not intended for:**
- Sensitive decision-making without human oversight
- Use on texts in languages other than Brazilian Portuguese
## Training Data
- **Training split:** InferBR “train” (premise, hypothesis, label)
- **Validation split:** InferBR “validation”
- **Test split:** InferBR “test”
- **Preprocessing** (see the tokenization sketch after this list):
- Tokenized with `neuralmind/bert-large-portuguese-cased` tokenizer
- Maximum sequence length: 128 tokens
- Padding to max length
- Labels cast to integer IDs `{0,1,2}`
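A minimal sketch of this preprocessing, assuming the `dataset` object from the loading snippet above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("neuralmind/bert-large-portuguese-cased")

def preprocess(batch):
    # Tokenize premise–hypothesis pairs: truncate to 128 tokens,
    # pad shorter sequences up to the maximum length
    encoded = tokenizer(
        batch["premise"],
        batch["hypothesis"],
        max_length=128,
        truncation=True,
        padding="max_length",
    )
    # Cast labels to the integer IDs {0, 1, 2}
    encoded["labels"] = [int(label) for label in batch["label"]]
    return encoded

tokenized = dataset.map(preprocess, batched=True)
```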
## Training Procedure
- **Fine-tuned from:** `neuralmind/bert-large-portuguese-cased`
- **Batch size:** 32
- **Learning rate:** 2e-5
- **Optimizer:** AdamW (with default weight decay)
- **Number of epochs:** 10
- **Evaluation strategy:** Evaluated on the validation split at the end of each epoch
- **Checkpointing:** Best checkpoint selected by validation accuracy (see the `Trainer` sketch below)
- **Random seed:** 42
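A hedged reconstruction of how these settings might map onto the `transformers` `Trainer` API; this is not the author's actual training script, and `compute_metrics` must return an `accuracy` key for best-model selection to work:

```python
import numpy as np
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained(
    "neuralmind/bert-large-portuguese-cased", num_labels=3
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

args = TrainingArguments(
    output_dir="bertimbau-large-InferBr-NLI",  # hypothetical output directory
    per_device_train_batch_size=32,
    learning_rate=2e-5,                 # AdamW is the Trainer default optimizer
    num_train_epochs=10,
    eval_strategy="epoch",              # `evaluation_strategy` on older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,        # keep the best checkpoint...
    metric_for_best_model="accuracy",   # ...selected by validation accuracy
    seed=42,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],        # from the preprocessing sketch above
    eval_dataset=tokenized["validation"],
    compute_metrics=compute_metrics,
)
trainer.train()
```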
## Evaluation Results (Test Set)
- **Test accuracy:** 0.9395
- **Test F₁‐macro:** 0.7596
- **F₁ label 0 (Contradiction):** 0.9191
- **F₁ label 1 (Entailment):** 0.6022
- **F₁ label 2 (Neutral):** 0.7575
These metrics were computed on the held‐out InferBR test split.
- `accuracy` = (number of correctly predicted labels) / (total number of examples)
- `f1_macro` = unweighted average F₁ across labels {0,1,2}
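Both metrics can be reproduced from model predictions with scikit-learn; a small sketch with hypothetical values:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical gold labels and predictions in {0, 1, 2}
y_true = [0, 1, 2, 2, 0, 1]
y_pred = [0, 2, 2, 2, 0, 1]

accuracy = accuracy_score(y_true, y_pred)
# Unweighted mean of the per-label F1 scores
f1_macro = f1_score(y_true, y_pred, average="macro")
# Per-label F1 for Contradiction (0), Entailment (1), Neutral (2)
f1_per_label = f1_score(y_true, y_pred, average=None, labels=[0, 1, 2])
print(accuracy, f1_macro, f1_per_label)
```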
## Limitations
- **Imbalanced performance:** Label 1 (Entailment) has a markedly lower F₁ (0.6022), indicating that the model often confuses entailment examples with the other two classes.
- **Domain specificity:** Trained on InferBR, which consists of generic NLI pairs. May not generalize to highly specialized or technical domains (e.g., legal, medical).
- **Language restrictions:** Only supports Brazilian Portuguese. Performance on European Portuguese or code‐switched text is not guaranteed.
- **Bias and fairness:** InferBR may contain topics or writing styles that do not cover all registers of Portuguese. Use caution if deploying in production for sensitive tasks.
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# 1. Load tokenizer and model from HuggingFace
tokenizer = AutoTokenizer.from_pretrained("felipesfpaula/bertimbau-large-InferBr-NLI")
model = AutoModelForSequenceClassification.from_pretrained("felipesfpaula/bertimbau-large-InferBr-NLI")
# 2. Encode a premise–hypothesis pair
premise = "O gato está sentado no sofá."
hypothesis = "O gato está deitado no sofá."
encoded = tokenizer(premise, hypothesis, return_tensors="pt", max_length=128, truncation=True, padding="max_length")
# 3. Run inference
with torch.no_grad():
    outputs = model(**encoded)
    logits = outputs.logits
pred_id = torch.argmax(logits, dim=-1).item()
# 4. Map prediction to label
label_map = {0: "Contradiction", 1: "Entailment", 2: "Neutral"}
print(f"Predicted label: {label_map[pred_id]}")
```