---
library_name: transformers
license: mit
datasets:
- hapaxlegomenon/InferBR
language:
- pt
base_model:
- neuralmind/bert-large-portuguese-cased
---
# Model Card: BERT-Large-Portuguese-Cased Fine-Tuned on InferBR NLI
## Model Details
- **Model name:** `felipesfpaula/bertimbau-large-InferBr-NLI`
- **Base model:** `neuralmind/bert-large-portuguese-cased`
- **Task:** Natural Language Inference (NLI) on Brazilian Portuguese
- **Dataset:** [InferBR](https://huggingface.co/datasets/hapaxlegomenon/InferBR)
- Premise–Hypothesis pairs in Portuguese
  - Label mapping (see the loading sketch below):
- 0 – Contradiction
- 1 – Entailment
- 2 – Neutral
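For reference, the dataset can be loaded directly from the Hub with the `datasets` library. A minimal sketch, assuming the column names `premise`, `hypothesis`, and `label` listed under Training Data below:

```python
from datasets import load_dataset

# Load InferBR from the Hugging Face Hub
dataset = load_dataset("hapaxlegomenon/InferBR")

# Inspect one training example using the label mapping above
label_names = {0: "Contradiction", 1: "Entailment", 2: "Neutral"}
example = dataset["train"][0]
print(example["premise"])
print(example["hypothesis"])
print(label_names[example["label"]])
```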
## Intended Use
This model is intended for research and applications requiring Portuguese NLI, such as:
- Automated textual reasoning in Portuguese
- Downstream tasks: question answering, summarization consistency checks, semantic search
- Academic experiments in Portuguese natural language understanding
**Not intended for:**
- Sensitive decision-making without human oversight
- Use on texts in languages other than Brazilian Portuguese
## Training Data
- **Training split:** InferBR “train” (premise, hypothesis, label)
- **Validation split:** InferBR “validation”
- **Test split:** InferBR “test”
- **Preprocessing** (see the tokenization sketch after this list):
- Tokenized with `neuralmind/bert-large-portuguese-cased` tokenizer
- Maximum sequence length: 128 tokens
- Padding to max length
- Labels cast to integer IDs `{0,1,2}`
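A minimal sketch of this preprocessing, assuming the `dataset` object from the loading snippet above:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("neuralmind/bert-large-portuguese-cased")

def preprocess(batch):
    # Tokenize premise–hypothesis pairs: truncate to 128 tokens,
    # pad shorter sequences up to the maximum length
    encoded = tokenizer(
        batch["premise"],
        batch["hypothesis"],
        max_length=128,
        truncation=True,
        padding="max_length",
    )
    # Cast labels to the integer IDs {0, 1, 2}
    encoded["labels"] = [int(label) for label in batch["label"]]
    return encoded

tokenized = dataset.map(preprocess, batched=True)
```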
## Training Procedure
- **Fine-tuned from:** `neuralmind/bert-large-portuguese-cased`
- **Batch size:** 32
- **Learning rate:** 2e-5
- **Optimizer:** AdamW (with default weight decay)
- **Number of epochs:** 10
- **Evaluation strategy:** Evaluated on the validation split at the end of each epoch
- **Checkpointing:** Best checkpoint selected by validation accuracy (see the `Trainer` sketch below)
- **Random seed:** 42
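A hedged reconstruction of how these settings might map onto the `transformers` `Trainer` API; this is not the author's actual training script, and `compute_metrics` must return an `accuracy` key for best-model selection to work:

```python
import numpy as np
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained(
    "neuralmind/bert-large-portuguese-cased", num_labels=3
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": float((preds == labels).mean())}

args = TrainingArguments(
    output_dir="bertimbau-large-InferBr-NLI",  # hypothetical output directory
    per_device_train_batch_size=32,
    learning_rate=2e-5,                 # AdamW is the Trainer default optimizer
    num_train_epochs=10,
    eval_strategy="epoch",              # `evaluation_strategy` on older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,        # keep the best checkpoint...
    metric_for_best_model="accuracy",   # ...selected by validation accuracy
    seed=42,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],        # from the preprocessing sketch above
    eval_dataset=tokenized["validation"],
    compute_metrics=compute_metrics,
)
trainer.train()
```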
## Evaluation Results (Test Set)
- **Test accuracy:** 0.9395
- **Test F₁‐macro:** 0.7596
- **F₁ label 0 (Contradiction):** 0.9191
- **F₁ label 1 (Entailment):** 0.6022
- **F₁ label 2 (Neutral):** 0.7575
These metrics were computed on the held‐out InferBR test split.
- `accuracy` = (number of correctly predicted labels) / (total number of examples)
- `f1_macro` = unweighted average F₁ across labels {0,1,2}
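Both metrics can be reproduced from model predictions with scikit-learn; a small sketch with hypothetical values:

```python
from sklearn.metrics import accuracy_score, f1_score

# Hypothetical gold labels and predictions in {0, 1, 2}
y_true = [0, 1, 2, 2, 0, 1]
y_pred = [0, 2, 2, 2, 0, 1]

accuracy = accuracy_score(y_true, y_pred)
# Unweighted mean of the per-label F1 scores
f1_macro = f1_score(y_true, y_pred, average="macro")
# Per-label F1 for Contradiction (0), Entailment (1), Neutral (2)
f1_per_label = f1_score(y_true, y_pred, average=None, labels=[0, 1, 2])
print(accuracy, f1_macro, f1_per_label)
```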
## Limitations
- **Imbalanced performance:** Label 1 (Entailment) has a markedly lower F₁ (0.6022), indicating that the model often confuses entailment examples with the other two classes.
- **Domain specificity:** Trained on InferBR, which consists of generic NLI pairs. May not generalize to highly specialized or technical domains (e.g., legal, medical).
- **Language restrictions:** Only supports Brazilian Portuguese. Performance on European Portuguese or code‐switched text is not guaranteed.
- **Bias and fairness:** InferBR may contain topics or writing styles that do not cover all registers of Portuguese. Use caution if deploying in production for sensitive tasks.
## How to Use
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# 1. Load tokenizer and model from HuggingFace
tokenizer = AutoTokenizer.from_pretrained("felipesfpaula/bertimbau-large-InferBr-NLI")
model = AutoModelForSequenceClassification.from_pretrained("felipesfpaula/bertimbau-large-InferBr-NLI")
# 2. Encode a premise–hypothesis pair
premise = "O gato está sentado no sofá."
hypothesis = "O gato está deitado no sofá."
encoded = tokenizer(premise, hypothesis, return_tensors="pt", max_length=128, truncation=True, padding="max_length")
# 3. Run inference
with torch.no_grad():
    outputs = model(**encoded)
    logits = outputs.logits
pred_id = torch.argmax(logits, dim=-1).item()
# 4. Map prediction to label
label_map = {0: "Contradiction", 1: "Entailment", 2: "Neutral"}
print(f"Predicted label: {label_map[pred_id]}")
```