ottema/gliner2-ptbr (v0.4 — generalist)
Open-vocabulary NER for Brazilian Portuguese, fine-tuned for informal and operational text (chat, customer service, support).
This is the generalist release. For the HAREM-specialized model (best entity F1 among the compared models), see ottema/gliner2-ptbr-harem. For ontology-guided evidence extraction, see ottema/gliner2-ptbr-ontoevidence.
Model details
- Base: fastino/gliner2-multi-v1 (Apache-2.0)
- Type: GLiNER2 (bidirectional encoder + span scoring)
- Language: pt-BR (with English fallback)
- Size: ~307M parameters
- License: Apache-2.0
Intended use
General-purpose open-vocabulary NER for informal and operational Brazilian Portuguese text: customer service, chat, technical support, and education. Trained on synthetic data covering people, professions, locations, organizations, documents, products, brands, technologies, phone numbers, e-mail addresses, dates, and monetary values.
If you need a model benchmarked on journalistic Portuguese (HAREM), use ottema/gliner2-ptbr-harem instead.
Usage
```python
from gliner2 import GLiNER2

model = GLiNER2.from_pretrained("ottema/gliner2-ptbr")

text = "A professora Ana comprou um notebook Dell em Campinas no dia 12/06."
labels = ["pessoa", "profissão", "produto", "marca", "local", "data"]

entities = model.extract_entities(text, labels, threshold=0.5)
for label, spans in entities["entities"].items():
    for span in spans:
        print(f"{span} -> {label}")
```
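The usage loop above walks a nested mapping of label to spans. When downstream code prefers flat rows (for logging or a CSV export, say), a small helper can do the reshaping. This is a minimal sketch: `flatten_entities` is a hypothetical helper, not part of `gliner2`, and `sample` is a hand-written dictionary in the shape the loop iterates over, not real model output.

```python
from typing import Dict, List, Tuple


def flatten_entities(result: Dict[str, Dict[str, List[str]]]) -> List[Tuple[str, str]]:
    """Turn {"entities": {label: [span, ...]}} into sorted (label, span) pairs."""
    pairs = []
    for label, spans in result.get("entities", {}).items():
        for span in spans:
            pairs.append((label, span))
    return sorted(pairs)


# Hand-written sample in the shape the usage loop iterates over;
# it is illustrative, not real model output.
sample = {
    "entities": {
        "pessoa": ["Ana"],
        "marca": ["Dell"],
        "local": ["Campinas"],
        "data": [],
    }
}

rows = flatten_entities(sample)
for label, span in rows:
    print(f"{span} -> {label}")
```

Labels with no matches (here `data`) simply contribute no rows.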
Performance
Evaluation on data/gliner_ptbr_core/test.jsonl (synthetic generalist benchmark, threshold 0.3):
| Model | entity_F1 | span_F1 | label_F1 |
|---|---|---|---|
| fastino/gliner2-multi-v1 (zero-shot) | 0.9333 | 0.9347 | 0.9855 |
| ottema/gliner2-ptbr (v0.4) | 0.9976 | 0.9976 | 1.0000 |
On HAREM (163 samples, 2511 entities, journalistic PT-BR — out-of-distribution for this generalist):
| Model | entity_F1 | Δ vs baseline |
|---|---|---|
| fastino/gliner2-multi-v1 (zero-shot) | 0.4271 | (reference) |
| ottema/gliner2-ptbr (v0.4) | 0.4132 | -1.39 pp |
The generalist is best for the synthetic informal-PT-BR distribution it was trained on. For journalistic text, see the HAREM-specialized model.
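The Δ column in the HAREM table is a difference in percentage points (pp) between two F1 scores on a 0 to 1 scale. The sketch below reproduces that arithmetic from the reported numbers. `delta_pp` is a hypothetical helper introduced only for illustration.

```python
def delta_pp(score: float, baseline: float) -> float:
    """Difference between two F1 scores (0-1 scale) in percentage points."""
    return round((score - baseline) * 100, 2)


# entity_F1 values copied from the two tables in this section.
harem_delta = delta_pp(0.4132, 0.4271)      # fine-tuned vs zero-shot on HAREM
synthetic_delta = delta_pp(0.9976, 0.9333)  # fine-tuned vs zero-shot on the synthetic test set
print(harem_delta, synthetic_delta)
```

With the table values, this gives -1.39 pp on HAREM and +6.43 pp on the synthetic benchmark.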
Inference
- GPU: ~30 ms per text (median, batch size 32)
- CPU: ~50 ms per short text (≤128 tokens)
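The card does not say how these latencies were measured. One plausible way to get a comparable number on your own hardware is to time fixed-size batches and take the median per-text time, as sketched below. `median_ms_per_text` and `fake_batch` are hypothetical names; `fake_batch` is a stand-in workload so the sketch runs without the model, and dividing the batch time by the batch size is an assumption about what "per text" means.

```python
import statistics
import time
from typing import Callable, List, Sequence


def median_ms_per_text(
    run_batch: Callable[[Sequence[str]], object],
    texts: Sequence[str],
    batch_size: int = 32,
    warmup: int = 1,
) -> float:
    """Median wall-clock milliseconds per text over fixed-size batches."""
    batches = [texts[i:i + batch_size] for i in range(0, len(texts), batch_size)]
    for batch in batches[:warmup]:
        run_batch(batch)  # warm-up runs are not timed
    per_text: List[float] = []
    for batch in batches:
        start = time.perf_counter()
        run_batch(batch)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        per_text.append(elapsed_ms / len(batch))
    return statistics.median(per_text)


# Stand-in workload so the sketch runs without the model; swap in a
# function that calls the model on each text to measure real latency.
def fake_batch(batch: Sequence[str]) -> List[int]:
    return [len(t.split()) for t in batch]


texts = ["A professora Ana comprou um notebook Dell em Campinas."] * 64
latency = median_ms_per_text(fake_batch, texts)
print(f"{latency:.4f} ms per text")
```

To measure the model itself, pass a function that runs `model.extract_entities` on every text in the batch in place of `fake_batch`.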
Limitations
- Trained primarily on synthetic data; coverage may be limited in highly specialized domains.
- Performance may degrade on very long texts (>512 tokens).
- Not a substitute for domain-specific classifiers in regulated workflows.
- Journalistic text is out-of-distribution for this model; use the HAREM-specialized model (ottema/gliner2-ptbr-harem) instead.
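For the long-text limitation, a common workaround is to split the input into overlapping windows and run extraction on each window separately. The sketch below is one way to do that, under stated assumptions: it counts whitespace-separated words, which is only a rough proxy for model tokens (subword tokenizers usually produce more tokens than words), and the defaults of 200 words and 20 words of overlap are arbitrary illustrative values, not recommendations from the model authors. `split_into_windows` is a hypothetical helper.

```python
from typing import List


def split_into_windows(text: str, max_words: int = 200, overlap: int = 20) -> List[str]:
    """Split text into overlapping windows of at most max_words whitespace words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return windows


# Synthetic 450-word input, just to exercise the splitter.
long_text = " ".join(f"palavra{i}" for i in range(450))
chunks = split_into_windows(long_text)
print(len(chunks), [len(c.split()) for c in chunks])
```

The overlap reduces the chance that an entity is cut at a window boundary, but it also means the same entity can be extracted twice; deduplicating results across windows is left out of this sketch.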
Credits
- Base architecture: GLiNER2 (Urchade Zaratiana et al.)
- Base weights: fastino/gliner2-multi-v1 (Fastino)
- Encoder: microsoft/mdeberta-v3-base
- Fine-tuning + datasets: Ottema
License
Apache-2.0
See also
- ottema/gliner2-ptbr-harem: HAREM-specialized (best entity F1)
- ottema/gliner2-ptbr-ontoevidence: ontology-guided evidence extraction (in development)
- ottema/gliner2-ptbr-ontoevidence-data: OntoEvidence-BR dataset