BERT NER - Brazilian Addresses
Token classification model fine-tuned from BERTimbau for Named Entity Recognition of Brazilian addresses.
Supported Entities
| Label | Description |
|---|---|
| RUA | Street / Avenue / Road name |
| NUMERO | Street number |
| BAIRRO | Neighborhood |
| CIDADE | City |
| ESTADO | State (UF) |
| CEP | ZIP code |
| COMPLEMENTO | Address complement (apartment, block, lot, etc.) |
| REFERENCIA | Reference point / landmark |
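The decoding code in the Usage section relies on `B-` and `I-` prefixed labels. That points to a BIO tagging scheme over the eight entity types above plus an outside tag `O`. The sketch below builds that tag set and tags a toy address at word level. It assumes exactly this BIO layout, and the helper names `bio_label_set` and `to_bio` are hypothetical. The authoritative label list and its order should be read from `model.config.id2label`.

```python
from typing import List, Sequence, Tuple

# Entity types from the table above.
ENTITIES = ["RUA", "NUMERO", "BAIRRO", "CIDADE", "ESTADO", "CEP", "COMPLEMENTO", "REFERENCIA"]


def bio_label_set(entities: Sequence[str]) -> List[str]:
    """'O' plus a B- and an I- tag for every entity type (assumed BIO layout)."""
    labels = ["O"]
    for ent in entities:
        labels += [f"B-{ent}", f"I-{ent}"]
    return labels


def to_bio(words: Sequence[str], spans: Sequence[Tuple[int, int, str]]) -> List[str]:
    """Tag words from (start, end_exclusive, entity_type) spans over word indices."""
    tags = ["O"] * len(words)
    for start, end, ent in spans:
        tags[start] = f"B-{ent}"  # first word of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{ent}"  # continuation words
    return tags


LABELS = bio_label_set(ENTITIES)
print(len(LABELS))  # 17 tags under the BIO assumption

words = ["Rua", "das", "Flores", "123", ",", "Centro"]
print(list(zip(words, to_bio(words, [(0, 3, "RUA"), (3, 4, "NUMERO"), (5, 6, "BAIRRO")]))))
```

The real model assigns tags to WordPiece sub-tokens rather than whole words, so a single word may be split into several pieces that each carry an `I-` tag.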
Benchmark
| Entity | Precision | Recall | F1 |
|---|---|---|---|
| RUA | 1.0000 | 1.0000 | 1.0000 |
| NUMERO | 1.0000 | 1.0000 | 1.0000 |
| BAIRRO | 1.0000 | 1.0000 | 1.0000 |
| CIDADE | 1.0000 | 1.0000 | 1.0000 |
| ESTADO | 1.0000 | 1.0000 | 1.0000 |
| CEP | 1.0000 | 1.0000 | 1.0000 |
| COMPLEMENTO | 0.8571 | 0.6000 | 0.7059 |
| REFERENCIA | 0.8182 | 0.9000 | 0.8571 |
| Overall | 0.9744 | 0.9580 | 0.9661 |
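The F1 column is the harmonic mean of the precision and recall columns, F1 = 2PR / (P + R). The following is a small arithmetic check against a few rows of the table. Because the inputs are themselves rounded to four decimals, a recomputed value can differ from the reported one in the last digit, which is why the check uses a tolerance.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the benchmark table.
ROWS = {
    "COMPLEMENTO": (0.8571, 0.6000, 0.7059),
    "REFERENCIA": (0.8182, 0.9000, 0.8571),
    "Overall": (0.9744, 0.9580, 0.9661),
}

for name, (p, r, reported) in ROWS.items():
    recomputed = f1(p, r)
    # Tolerance absorbs rounding in the four-decimal inputs.
    status = "ok" if abs(recomputed - reported) < 1e-3 else "MISMATCH"
    print(f"{name}: recomputed {recomputed:.4f}, reported {reported:.4f} -> {status}")
```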
Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

tokenizer = AutoTokenizer.from_pretrained("ottema/bert-addresses-brazil")
model = AutoModelForTokenClassification.from_pretrained("ottema/bert-addresses-brazil")

text = "Rua das Flores 123, Apto 402, Centro, Sao Paulo - SP. CEP 01310-100"

# Offsets map each token back to its character positions in `text`.
encoding = tokenizer(text, return_tensors="pt", return_offsets_mapping=True, truncation=True, max_length=128)
offsets = encoding["offset_mapping"][0].tolist()

with torch.no_grad():
    logits = model(input_ids=encoding["input_ids"], attention_mask=encoding["attention_mask"]).logits
preds = torch.argmax(logits, dim=-1)[0].tolist()

# Once loaded, config.id2label is keyed by int label ids, not strings.
id2label = model.config.id2label

# Merge B-/I- tagged tokens into (entity_type, text) spans.
entities = []
current_type = None
current_start = None
current_end = None
for pred, (start, end) in zip(preds, offsets):
    if start == end:  # special tokens ([CLS], [SEP]) have empty offsets
        continue
    label = id2label[pred]
    if label.startswith("B-"):
        # A new entity begins: flush the previous one, if any.
        if current_type:
            entities.append((current_type, text[current_start:current_end].strip()))
        current_type = label[2:]
        current_start = start
        current_end = end
    elif label.startswith("I-") and current_type == label[2:]:
        # Continuation of the current entity: extend its end offset.
        current_end = end
    else:
        # "O", or an I- tag of a different type, closes the current span.
        if current_type:
            entities.append((current_type, text[current_start:current_end].strip()))
        current_type = None
if current_type:
    entities.append((current_type, text[current_start:current_end].strip()))

for entity_type, value in entities:
    print(f"{entity_type}: {value}")
```
Output:

```text
RUA: Rua das Flores
NUMERO: 123
COMPLEMENTO: Apto 402
BAIRRO: Centro
CIDADE: Sao Paulo
ESTADO: SP
CEP: 01310-100
```
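The span-merging loop in the usage example does not depend on the model, so it can be factored into a pure function and exercised on hand-written inputs. A minimal sketch follows. The name `merge_bio_spans` and the toy labels and offsets are hypothetical; the offsets mimic what `return_offsets_mapping=True` yields, with `(0, 0)` for special tokens. The merging rule is the same as in the usage code: an `I-` tag extends the open span only if its type matches, and anything else closes it.

```python
from typing import List, Optional, Sequence, Tuple


def merge_bio_spans(
    text: str,
    labels: Sequence[str],
    offsets: Sequence[Tuple[int, int]],
) -> List[Tuple[str, str]]:
    """Merge per-token BIO labels into (entity_type, surface_text) pairs."""
    entities: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    start_char = end_char = 0

    def flush() -> None:
        # Emit the open span, if there is one.
        if current_type:
            entities.append((current_type, text[start_char:end_char].strip()))

    for label, (start, end) in zip(labels, offsets):
        if start == end:  # special tokens carry empty offsets
            continue
        if label.startswith("B-"):
            flush()
            current_type, start_char, end_char = label[2:], start, end
        elif label.startswith("I-") and current_type == label[2:]:
            end_char = end
        else:
            flush()
            current_type = None
    flush()
    return entities


# Toy input: one token per word, plus (0, 0) for [CLS] and [SEP].
text = "Rua das Flores 123, Centro"
labels = ["O", "B-RUA", "I-RUA", "I-RUA", "B-NUMERO", "O", "B-BAIRRO", "O"]
offsets = [(0, 0), (0, 3), (4, 7), (8, 14), (15, 18), (18, 19), (20, 26), (0, 0)]
print(merge_bio_spans(text, labels, offsets))
```

With the model from the usage example, the equivalent call would be `merge_bio_spans(text, [id2label[p] for p in preds], offsets)`.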
Training Details
- Base model: neuralmind/bert-base-portuguese-cased (BERTimbau)
- Epochs: 4
- Batch size: 16
- Learning rate: 2e-5
- Dropout: 0.2
- Weight decay: 0.05
- Label smoothing: 0.1
- Early stopping patience: 2
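Label smoothing with ε = 0.1 replaces each one-hot training target with a mix of the one-hot vector and a uniform distribution over the K labels, q = (1 − ε)·onehot + ε/K. This is the formulation behind PyTorch's `torch.nn.CrossEntropyLoss(label_smoothing=...)`. The card does not state how smoothing was implemented, so the following is only an illustration in plain Python. K = 17 assumes the BIO tag set ("O" plus `B-`/`I-` for each of the eight entities), and the helper names are hypothetical.

```python
import math
from typing import List, Sequence


def smoothed_target(true_index: int, num_labels: int, epsilon: float) -> List[float]:
    """(1 - epsilon) * one_hot(true_index) + epsilon / num_labels."""
    target = [epsilon / num_labels] * num_labels
    target[true_index] += 1.0 - epsilon
    return target


def cross_entropy(target: Sequence[float], logits: Sequence[float]) -> float:
    """Cross-entropy of a target distribution against softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))  # stable log-sum-exp
    return -sum(t * (x - log_z) for t, x in zip(target, logits))


NUM_LABELS = 17  # assumption: "O" plus B-/I- tags for the 8 entity types
EPSILON = 0.1    # "Label smoothing: 0.1" from the list above

# A very confident, correct prediction for label index 1.
logits = [0.0] * NUM_LABELS
logits[1] = 10.0

hard = cross_entropy(smoothed_target(1, NUM_LABELS, 0.0), logits)
soft = cross_entropy(smoothed_target(1, NUM_LABELS, EPSILON), logits)
print(f"hard-target loss: {hard:.4f}, smoothed-target loss: {soft:.4f}")
```

The smoothed target keeps a floor of ε/K on every label, so the loss stays well above zero even for a confident, correct prediction. This discourages the model from pushing its logits toward extreme values.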