---
library_name: transformers
license: mit
datasets:
- hapaxlegomenon/InferBR
language:
- pt
base_model:
- neuralmind/bert-large-portuguese-cased
---

# Model Card: BERT-Large-Portuguese-Cased Fine-Tuned on InferBR NLI

## Model Details

- **Model name:** `felipesfpaula/bertimbau-large-InferBr-NLI`
- **Base model:** `neuralmind/bert-large-portuguese-cased`
- **Task:** Natural Language Inference (NLI) in Brazilian Portuguese
- **Dataset:** [InferBR](https://huggingface.co/datasets/hapaxlegomenon/InferBR)
  - Premise–hypothesis pairs in Portuguese
  - Label mapping:
    - 0 – Contradiction
    - 1 – Entailment
    - 2 – Neutral

## Intended Use

This model is intended for research and applications requiring Portuguese NLI, such as:

- Automated textual reasoning in Portuguese
- Downstream tasks: question answering, summarization consistency checks, semantic search
- Academic experiments in Portuguese natural language understanding

**Not intended for:**

- Sensitive decision-making without human oversight
- Texts in languages other than Brazilian Portuguese

## Training Data

- **Training split:** InferBR “train” (premise, hypothesis, label)
- **Validation split:** InferBR “validation”
- **Test split:** InferBR “test”
- **Preprocessing:**
  - Tokenized with the `neuralmind/bert-large-portuguese-cased` tokenizer
  - Maximum sequence length: 128 tokens
  - Padding to max length
  - Labels cast to integer IDs `{0, 1, 2}`

## Training Procedure

- **Fine-tuned from:** `neuralmind/bert-large-portuguese-cased`
- **Batch size:** 32
- **Learning rate:** 2e-5
- **Optimizer:** AdamW (with default weight decay)
- **Number of epochs:** 10
- **Evaluation strategy:** Evaluate on the validation split at the end of each epoch
- **Checkpointing:** Best model selected by validation accuracy
- **Random seed:** 42

## Evaluation Results (Test Set)

- **Test accuracy:** 0.9395
- **Test F1-macro:** 0.7596
- **F1, label 0 (Contradiction):** 0.9191
- **F1, label 1 (Entailment):** 0.6022
- **F1, label 2 (Neutral):** 0.7575

These metrics were
computed on the held-out InferBR test split.

- `accuracy` = (number of correctly predicted labels) / (total number of examples)
- `f1_macro` = unweighted average of the per-label F1 scores over labels {0, 1, 2}

## Limitations

- **Imbalanced performance:** Label 1 (Entailment) has a markedly lower F1 (0.6022), indicating the model often confuses entailment examples with the other classes.
- **Domain specificity:** Trained on InferBR, which consists of generic NLI pairs; the model may not generalize to highly specialized or technical domains (e.g., legal, medical).
- **Language restrictions:** Only supports Brazilian Portuguese. Performance on European Portuguese or code-switched text is not guaranteed.
- **Bias and fairness:** InferBR may contain topics or writing styles that do not cover all registers of Portuguese. Use caution if deploying in production for sensitive tasks.

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# 1. Load the tokenizer and model from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("felipesfpaula/bertimbau-large-InferBr-NLI")
model = AutoModelForSequenceClassification.from_pretrained("felipesfpaula/bertimbau-large-InferBr-NLI")
model.eval()

# 2. Encode a premise–hypothesis pair
premise = "O gato está sentado no sofá."
hypothesis = "O gato está deitado no sofá."
encoded = tokenizer(
    premise,
    hypothesis,
    return_tensors="pt",
    max_length=128,
    truncation=True,
    padding="max_length",
)

# 3. Run inference
with torch.no_grad():
    logits = model(**encoded).logits
pred_id = torch.argmax(logits, dim=-1).item()

# 4. Map the predicted class ID to its label
label_map = {0: "Contradiction", 1: "Entailment", 2: "Neutral"}
print(f"Predicted label: {label_map[pred_id]}")
```
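The accuracy and macro-F1 definitions in the evaluation section can be reproduced directly from raw predictions. Below is a minimal, dependency-free sketch; the helper names `accuracy` and `f1_macro` are illustrative and not part of the model's API (in practice, `sklearn.metrics.f1_score(..., average="macro")` computes the same quantity):

```python
def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_macro(y_true, y_pred, labels=(0, 1, 2)):
    """Unweighted mean of per-label F1 over Contradiction/Entailment/Neutral."""
    scores = []
    for lbl in labels:
        tp = sum(t == lbl and p == lbl for t, p in zip(y_true, y_pred))
        fp = sum(t != lbl and p == lbl for t, p in zip(y_true, y_pred))
        fn = sum(t == lbl and p != lbl for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        scores.append(
            2 * precision * recall / (precision + recall) if precision + recall else 0.0
        )
    return sum(scores) / len(scores)

# Toy example: 3 of 4 predictions correct
gold = [0, 1, 2, 0]
pred = [0, 1, 1, 0]
print(accuracy(gold, pred))   # 0.75
print(f1_macro(gold, pred))   # ≈ 0.556
```

Note that macro-F1 treats all three labels equally regardless of their frequency, which is why the reported 0.7596 sits well below the 0.9395 accuracy: the weaker Entailment class pulls the average down.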