NBR-1B-Portuguese-MCQ
The top-performing model in the ~1B-parameter class for multiple-choice question answering in Portuguese.
Highlights
- #1 on ENEM (30.53%) - Brazilian national university entrance exam
- #1 on OAB (49.70%) - Brazilian Bar Association exam
- #1 on BLUEX (36.54%) - university entrance exams
- #1 on TweetSentBR (39.70%) - sentiment analysis
Benchmarks
| Benchmark | Score | Ranking |
|---|---|---|
| ENEM | 30.53% | #1 |
| OAB Exams | 49.70% | #1 |
| BLUEX | 36.54% | #1 |
| TweetSentBR | 39.70% | #1 |
| FAQUAD NLI | 45.55% | Top 3 |
| HateBR | 43.18% | Top 5 |
| PT Hate Speech | 41.99% | Top 5 |
| ASSIN2 RTE | 34.27% | - |
| ASSIN2 STS | 0.99% | - |
| Average | 35.83% | - |
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("limajr/NBR-1B-Portuguese-MCQ")
tokenizer = AutoTokenizer.from_pretrained("limajr/NBR-1B-Portuguese-MCQ")
```
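A minimal generation sketch building on the objects loaded above. The prompt layout is an assumption (the card does not specify a required prompt template), so treat it as illustrative rather than the model's expected format:

```python
# Hypothetical multiple-choice prompt; the exact expected format is an assumption.
prompt = (
    "Questão: Qual é a capital do Brasil?\n"
    "A) São Paulo\nB) Brasília\nC) Rio de Janeiro\nD) Salvador\n"
    "Resposta:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5, do_sample=False)

# Decode only the newly generated tokens (the predicted alternative).
answer = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer.strip())
```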
Model Details
- Architecture: LlamaForCausalLM (see the config check after this list)
- Parameters: ~1.5B
- Hidden Size: 2048
- Layers: 24
- Attention Heads: 16
- Language: Portuguese (pt-BR)
- Training: Supervised Fine-Tuning on Brazilian educational content
- License: Apache 2.0
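A quick way to confirm the hyperparameters listed above is to load only the model configuration. This is a sketch assuming the standard Llama config attribute names in transformers:

```python
from transformers import AutoConfig

# Load only the configuration (no weights) to inspect the architecture details.
config = AutoConfig.from_pretrained("limajr/NBR-1B-Portuguese-MCQ")
print(config.architectures)        # expected: ["LlamaForCausalLM"]
print(config.hidden_size)          # expected: 2048
print(config.num_hidden_layers)    # expected: 24
print(config.num_attention_heads)  # expected: 16
```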
Training Data
Fine-tuned on curated Portuguese datasets including:
- Brazilian educational materials
- Legal texts (OAB preparation)
- General knowledge QA
Evaluation
Evaluated on the Open PT LLM Leaderboard using its standard evaluation protocol.
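To reproduce the evaluation locally, one option is the lm-evaluation-harness Python API, which the Open PT LLM Leaderboard builds on. The sketch below is an assumption: the task identifiers ("enem_challenge", "oab_exams", "bluex") come from the leaderboard's Portuguese task set and may differ depending on the harness fork you install:

```python
from lm_eval import simple_evaluate

# Evaluate the model on a subset of the Portuguese tasks.
# Task names are assumptions; check the leaderboard's harness fork for exact identifiers.
results = simple_evaluate(
    model="hf",
    model_args="pretrained=limajr/NBR-1B-Portuguese-MCQ",
    tasks=["enem_challenge", "oab_exams", "bluex"],
)
print(results["results"])
```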
Limitations
- Optimized for multiple-choice question answering
- ASSIN2 STS (semantic textual similarity) performance is limited
- Best suited to Brazilian Portuguese (pt-BR) educational contexts