---
library_name: transformers
license: mit
datasets:
- hapaxlegomenon/InferBR
language:
- pt
base_model:
- neuralmind/bert-large-portuguese-cased
---

# Model Card: BERT-Large-Portuguese-Cased Fine-Tuned on InferBR NLI

## Model Details

- **Model name:** `felipesfpaula/bertimbau-large-InferBr-NLI`
- **Base model:** `neuralmind/bert-large-portuguese-cased`
- **Task:** Natural Language Inference (NLI) in Brazilian Portuguese
- **Dataset:** [InferBR](https://huggingface.co/datasets/hapaxlegomenon/InferBR)
  - Premise–hypothesis pairs in Portuguese
  - Label mapping:
    - 0 – Contradiction
    - 1 – Entailment
    - 2 – Neutral

## Intended Use

This model is intended for research and applications requiring Portuguese NLI, such as:

- Automated textual reasoning in Portuguese
- Downstream tasks: question answering, summarization consistency checks, semantic search
- Academic experiments in Portuguese natural language understanding

**Not intended for:**

- Sensitive decision-making without human oversight
- Texts in languages other than Brazilian Portuguese

## Training Data

- **Training split:** InferBR “train” (premise, hypothesis, label)
- **Validation split:** InferBR “validation”
- **Test split:** InferBR “test”
- **Preprocessing:**
  - Tokenized with the `neuralmind/bert-large-portuguese-cased` tokenizer
  - Maximum sequence length: 128 tokens
  - Padding to max length
  - Labels cast to integer IDs `{0, 1, 2}`

## Training Procedure

- **Fine-tuned from:** `neuralmind/bert-large-portuguese-cased`
- **Batch size:** 32
- **Learning rate:** 2e-5
- **Optimizer:** AdamW (with default weight decay)
- **Number of epochs:** 10
- **Evaluation strategy:** Evaluate on the validation split at the end of each epoch
- **Checkpointing:** Best model selected by validation accuracy
- **Random seed:** 42

## Evaluation Results (Test Set)

- **Test accuracy:** 0.9395
- **Test F1-macro:** 0.7596
- **F1, label 0 (Contradiction):** 0.9191
- **F1, label 1 (Entailment):** 0.6022
- **F1, label 2 (Neutral):** 0.7575

These metrics were
computed on the held-out InferBR test split.

- `accuracy` = (number of correctly predicted labels) / (total number of examples)
- `f1_macro` = unweighted average of the per-label F1 scores over labels {0, 1, 2}

## Limitations

- **Imbalanced performance:** Label 1 (Entailment) has a markedly lower F1 (0.6022), indicating the model often confuses entailment examples with the other classes.
- **Domain specificity:** Trained on InferBR, which consists of generic NLI pairs; the model may not generalize to highly specialized or technical domains (e.g., legal, medical).
- **Language restrictions:** Only supports Brazilian Portuguese. Performance on European Portuguese or code-switched text is not guaranteed.
- **Bias and fairness:** InferBR may contain topics or writing styles that do not cover all registers of Portuguese. Use caution if deploying in production for sensitive tasks.

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# 1. Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("felipesfpaula/bertimbau-large-InferBr-NLI")
model = AutoModelForSequenceClassification.from_pretrained("felipesfpaula/bertimbau-large-InferBr-NLI")
model.eval()

# 2. Encode a premise–hypothesis pair
premise = "O gato está sentado no sofá."
hypothesis = "O gato está deitado no sofá."
encoded = tokenizer(
    premise,
    hypothesis,
    return_tensors="pt",
    max_length=128,
    truncation=True,
    padding="max_length",
)

# 3. Run inference
with torch.no_grad():
    logits = model(**encoded).logits
pred_id = torch.argmax(logits, dim=-1).item()

# 4. Map the predicted class ID to its label
label_map = {0: "Contradiction", 1: "Entailment", 2: "Neutral"}
print(f"Predicted label: {label_map[pred_id]}")
```
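The accuracy and macro-F1 definitions in the evaluation section can be reproduced directly from raw predictions. Below is a minimal, dependency-free sketch; the helper names `accuracy` and `f1_macro` are illustrative and not part of the model's API (in practice, `sklearn.metrics.f1_score(..., average="macro")` computes the same quantity):

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_macro(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-label F1 over Contradiction/Entailment/Neutral."""
    scores = []
    for lbl in labels:
        tp = sum(t == lbl and p == lbl for t, p in zip(y_true, y_pred))
        fp = sum(t != lbl and p == lbl for t, p in zip(y_true, y_pred))
        fn = sum(t == lbl and p != lbl for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )
    return sum(scores) / len(scores)

# Toy example: 3 of 4 predictions correct
gold = [0, 1, 2, 0]
pred = [0, 1, 1, 0]
print(accuracy(gold, pred))   # 0.75
print(f1_macro(gold, pred))   # ≈ 0.556
```

Note that macro-F1 treats all three labels equally regardless of their frequency, which is why the reported 0.7596 sits well below the 0.9395 accuracy: the weaker Entailment class pulls the average down.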