---
language: es
tags:
- intent-classification
- slot-filling
- joint-bert
- spanish
- economics
- chile
- multi-head
license: mit
base_model: microsoft/mdeberta-v3-base
pipeline_tag: token-classification
---
# PIBot Joint BERT
A **multi-head Joint BERT** model for intent classification and slot filling,
specialized in queries about macroeconomic indicators from the Banco Central de Chile.
## Architecture
| Component | Detail |
|---|---|
| Base | `microsoft/mdeberta-v3-base` |
| Task | `pibimacecv3` |
| Intent heads | 5 (`activity`, `calc_mode`, `investment`, `region`, `req_form`) |
| Slot labels | 15 (BIO) |
| Custom code | `modeling_jointbert.py`, `module.py` |
### Intent Heads
| Head | Classes | Values |
|---|---|---|
| `activity` | 3 | `none`, `specific`, `general` |
| `calc_mode` | 4 | `original`, `prev_period`, `yoy`, `contribution` |
| `investment` | 3 | `none`, `specific`, `general` |
| `region` | 3 | `none`, `specific`, `general` |
| `req_form` | 3 | `latest`, `point`, `range` |
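The shared encoder feeds five independent intent classifiers (over a pooled representation) plus one per-token slot classifier. A minimal sketch of that layout with plain `torch` linear heads (names and pooling choice are illustrative assumptions, not the actual `modeling_jointbert.py` code):

```python
import torch
import torch.nn as nn

hidden_size = 768  # mdeberta-v3-base hidden size
intent_sizes = {"activity": 3, "calc_mode": 4, "investment": 3,
                "region": 3, "req_form": 3}
num_slot_labels = 15

# One linear classifier per intent head, plus one per-token slot classifier
intent_heads = nn.ModuleDict(
    {name: nn.Linear(hidden_size, n) for name, n in intent_sizes.items()}
)
slot_head = nn.Linear(hidden_size, num_slot_labels)

# Dummy encoder output: batch of 1, sequence of 6 tokens
sequence_output = torch.randn(1, 6, hidden_size)
pooled_output = sequence_output[:, 0]  # [CLS]-style pooling (assumed)

intent_logits = [head(pooled_output) for head in intent_heads.values()]
slot_logits = slot_head(sequence_output)

print([tuple(t.shape) for t in intent_logits])  # one (batch, classes) tensor per head
print(tuple(slot_logits.shape))                 # (batch, seq_len, num_slot_labels)
```

Each intent head is trained jointly with the slot head against the same encoder, which is what makes this a "joint" model rather than five separate classifiers.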
### Slot Entities (BIO)
Extracted entities: `activity`, `frequency`, `indicator`, `investment`, `period`, `region`, `seasonality`
Full BIO scheme: 15 labels (`O`, `B-*`, `I-*`).
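The count of 15 follows directly from the BIO scheme: one `O` tag plus a `B-`/`I-` pair for each of the seven entity types. A quick sketch (the actual label order is defined by `labels/slot_label.txt` in the repo):

```python
# The seven entity types listed above
entities = ["activity", "frequency", "indicator", "investment",
            "period", "region", "seasonality"]

# BIO: one "O" tag plus a B-/I- pair per entity type
slot_labels = ["O"] + [f"{p}-{e}" for e in entities for p in ("B", "I")]

print(len(slot_labels))  # 15
```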
## Usage
### Installation
```bash
pip install torch transformers huggingface_hub
```
### Loading the Model
```python
import os
import sys
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer, AutoConfig

# Load tokenizer and config
tokenizer = AutoTokenizer.from_pretrained("BCCh/pibert", trust_remote_code=True)
config = AutoConfig.from_pretrained("BCCh/pibert", trust_remote_code=True)

# Read intent and slot labels (one label per line)
def read_labels(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Download each label file from the repo
slot_labels = read_labels(hf_hub_download("BCCh/pibert", "labels/slot_label.txt"))

# Build intent_label_lst: one label list per head
intent_label_lst = []
for head in ["activity", "calc_mode", "investment", "region", "req_form"]:
    intent_label_lst.append(
        read_labels(hf_hub_download("BCCh/pibert", f"labels/{head}_label.txt"))
    )

# Download the custom modeling code, then import JointBERT from it
code_dir = os.path.dirname(hf_hub_download("BCCh/pibert", "modeling_jointbert.py"))
hf_hub_download("BCCh/pibert", "module.py")  # CRF and auxiliary modules
sys.path.insert(0, code_dir)
from modeling_jointbert import JointBERT

model = JointBERT.from_pretrained(
    "BCCh/pibert",
    config=config,
    intent_label_lst=intent_label_lst,
    slot_label_lst=slot_labels,
)
model.eval()
```
### Prediction
```python
text = "cuál fue el imacec de agosto 2024"
tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
outputs = model(**tokens)
# outputs contains intent_logits (a list, one tensor per head) and slot_logits
```
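The logits can be decoded with an argmax per head and per token. A self-contained sketch with dummy logits and dummy label lists (the real lists come from `labels/`, and the output attribute names follow the comment above):

```python
import torch

# Dummy label lists (the real ones come from the labels/ directory)
calc_mode_labels = ["original", "prev_period", "yoy", "contribution"]
slot_labels = ["O", "B-indicator", "I-indicator", "B-period", "I-period"]

# Dummy logits shaped like the model's outputs for a 4-token input:
# intent logits are (batch, num_classes); slot logits are (batch, seq_len, num_labels)
intent_logits = torch.tensor([[0.1, 0.2, 2.5, 0.3]])  # "yoy" wins
slot_logits = torch.zeros(1, 4, len(slot_labels))
slot_logits[0, 1, 1] = 3.0  # token 1 -> B-indicator
slot_logits[0, 2, 3] = 3.0  # token 2 -> B-period
slot_logits[0, 3, 4] = 3.0  # token 3 -> I-period

# Argmax decoding
intent = calc_mode_labels[intent_logits.argmax(dim=-1).item()]
slot_ids = slot_logits.argmax(dim=-1)[0].tolist()
slots = [slot_labels[i] for i in slot_ids]

print(intent)  # yoy
print(slots)   # ['O', 'B-indicator', 'B-period', 'I-period']
```

On real output, special tokens (CLS, SEP, padding) should be skipped before mapping slot labels back to words, using the tokenizer's offset or word-id mapping.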
## Package Structure
```
model_package/
├── config.json               # BERT + task configuration
├── model.safetensors         # Model weights
├── tokenizer.json            # Tokenizer
├── tokenizer_config.json
├── special_tokens_map.json
├── vocab.txt
├── modeling_jointbert.py     # Custom JointBERT architecture
├── module.py                 # CRF and auxiliary modules
├── __init__.py
├── README.md                 # This file
└── labels/
    ├── slot_label.txt
    ├── activity_label.txt
    ├── calc_mode_label.txt
    ├── investment_label.txt
    ├── region_label.txt
    └── req_form_label.txt
```
## Training Data
Trained on queries about Chilean macroeconomic indicators:
- **IMACEC** (Indicador Mensual de Actividad Económica, the monthly economic activity index)
- **PIB** (Producto Interno Bruto, i.e. GDP)
- Economic sectors, frequencies, periods, regions
## Limitations
- Specialized in macroeconomic queries about Banco Central de Chile indicators
- Performs best on short queries (< 50 tokens)
- Requires `trust_remote_code=True` because of the custom architecture
## Citation
```bibtex
@misc{pibot-jointbert,
author = {Banco Central de Chile},
title = {PIBot Joint BERT - Multi-head Intent + Slot Filling},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/BCCh/pibert}}
}
```
## References
- [BERT for Joint Intent Classification and Slot Filling](https://arxiv.org/abs/1902.10909)
- [JointBERT implementation](https://github.com/monologg/JointBERT)
- [BETO: Spanish BERT](https://github.com/dccuchile/beto)
## License
MIT License