NBR-1B-Portuguese-MCQ

The best ~1B-parameter model for multiple-choice questions in Portuguese.

Highlights

  • #1 on ENEM (30.53%) - Brazilian university entrance exam
  • #1 on OAB (49.70%) - Brazilian Bar Association exam
  • #1 on BLUEX (36.54%) - University entrance exams (vestibulares)
  • #1 on TweetSentBR (39.70%) - Sentiment analysis

Benchmarks

| Benchmark      | Score  | Ranking |
|----------------|--------|---------|
| ENEM           | 30.53% | #1      |
| OAB Exams      | 49.70% | #1      |
| BLUEX          | 36.54% | #1      |
| TweetSentBR    | 39.70% | #1      |
| FAQUAD NLI     | 45.55% | Top 3   |
| HateBR         | 43.18% | Top 5   |
| PT Hate Speech | 41.99% | Top 5   |
| ASSIN2 RTE     | 34.27% | -       |
| ASSIN2 STS     | 0.99%  | -       |
| Average        | 35.83% | -       |
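The reported average is the unweighted mean of the nine benchmark scores above, which can be verified directly:

```python
# Scores from the benchmark table above (in percent)
scores = {
    "ENEM": 30.53,
    "OAB Exams": 49.70,
    "BLUEX": 36.54,
    "TweetSentBR": 39.70,
    "FAQUAD NLI": 45.55,
    "HateBR": 43.18,
    "PT Hate Speech": 41.99,
    "ASSIN2 RTE": 34.27,
    "ASSIN2 STS": 0.99,
}

# Unweighted mean over all nine tasks
average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 35.83
```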

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Download the model weights and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("limajr/NBR-1B-Portuguese-MCQ")
tokenizer = AutoTokenizer.from_pretrained("limajr/NBR-1B-Portuguese-MCQ")
```
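The card does not document a prompt template. A hypothetical helper for laying out a multiple-choice question as a prompt might look like the sketch below; the layout (Portuguese "Pergunta:"/"Resposta:" labels, lettered options) is an assumption, not the format used during fine-tuning:

```python
def format_mcq_prompt(question: str, options: list[str]) -> str:
    """Format a multiple-choice question as a plain-text prompt.

    NOTE: this layout is a hypothetical example; the model card does
    not specify the prompt template used during fine-tuning.
    """
    letters = "ABCDE"
    lines = [f"Pergunta: {question}"]
    for letter, option in zip(letters, options):
        lines.append(f"{letter}) {option}")
    lines.append("Resposta:")
    return "\n".join(lines)

prompt = format_mcq_prompt(
    "Qual e a capital do Brasil?",
    ["Rio de Janeiro", "Brasilia", "Sao Paulo", "Salvador"],
)
print(prompt)
```

The resulting string can then be passed through the tokenizer and model loaded above.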

Model Details

  • Architecture: LlamaForCausalLM
  • Parameters: ~1.5B
  • Hidden Size: 2048
  • Layers: 24
  • Attention Heads: 16
  • Language: Portuguese (pt-BR)
  • Training: Supervised Fine-Tuning on Brazilian educational content
  • License: Apache 2.0
  • Precision: BF16 (Safetensors)
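A back-of-envelope estimate from the listed architecture lands between the "1B" in the model name and the "~1.5B" above; the exact count depends on the MLP intermediate size and vocabulary size, neither of which the card lists, so both are assumed values here:

```python
# Rough parameter estimate for a LlamaForCausalLM with the listed
# architecture. Intermediate size and vocab size are ASSUMPTIONS --
# the model card specifies neither.
hidden = 2048
layers = 24
intermediate = 5632   # assumed (TinyLlama-style)
vocab = 32000         # assumed (Llama tokenizer size)

attn_per_layer = 4 * hidden * hidden        # q, k, v, o projections
mlp_per_layer = 3 * hidden * intermediate   # gate, up, down projections
embeddings = 2 * vocab * hidden             # input + output embeddings

total = layers * (attn_per_layer + mlp_per_layer) + embeddings
print(f"~{total / 1e9:.2f}B parameters")
```

Under these assumptions the estimate comes out near 1.4B, consistent with a model marketed as "~1B-class".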

Training Data

Fine-tuned on curated Portuguese datasets including:

  • Brazilian educational materials
  • Legal texts (OAB preparation)
  • General knowledge QA

Evaluation

Evaluated on the Open PT LLM Leaderboard using the standard evaluation protocol.
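Leaderboard MCQ tasks are typically scored by computing the model's log-likelihood of each answer option and picking the highest. A minimal sketch of that selection step, with made-up scores standing in for real model log-probabilities:

```python
def pick_option(loglikelihoods: dict[str, float]) -> str:
    """Return the option letter whose continuation the model found most likely."""
    return max(loglikelihoods, key=loglikelihoods.get)

# Made-up per-option log-likelihoods; in a real harness these would be
# computed by summing the model's token log-probs for each option.
scores = {"A": -4.2, "B": -1.3, "C": -3.8, "D": -5.1}
print(pick_option(scores))  # B
```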

Limitations

  • Optimized for multiple choice questions
  • ASSIN2 STS (semantic similarity) performance is limited
  • Best used for Portuguese educational contexts