---
language: es
tags:
- intent-classification
- slot-filling
- joint-bert
- spanish
- economics
- chile
- multi-head
license: mit
base_model: microsoft/mdeberta-v3-base
pipeline_tag: token-classification
---
# PIBot Joint BERT
Multi-head **Joint BERT** model for intent classification and slot filling,
specialized in queries about macroeconomic indicators from the Central Bank of Chile (Banco Central de Chile).
## Architecture
| Component | Detail |
|---|---|
| Base | `microsoft/mdeberta-v3-base` |
| Task | `pibimacecv3` |
| Intent heads | 5 (`activity`, `calc_mode`, `investment`, `region`, `req_form`) |
| Slot labels | 15 (BIO) |
| Custom code | `modeling_jointbert.py`, `module.py` |
### Intent Heads
| Head | Classes | Values |
|---|---|---|
| `activity` | 3 | `none`, `specific`, `general` |
| `calc_mode` | 4 | `original`, `prev_period`, `yoy`, `contribution` |
| `investment` | 3 | `none`, `specific`, `general` |
| `region` | 3 | `none`, `specific`, `general` |
| `req_form` | 3 | `latest`, `point`, `range` |
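Conceptually, the five heads above are independent classifiers over the shared encoder representation. The following is an illustrative sketch only (the actual implementation lives in `modeling_jointbert.py`; the class name, hidden size, and single-linear-layer heads are assumptions):

```python
import torch
import torch.nn as nn

# Class counts taken from the table above
HEAD_SIZES = {"activity": 3, "calc_mode": 4, "investment": 3, "region": 3, "req_form": 3}

class MultiHeadIntentClassifier(nn.Module):
    """Hypothetical sketch: one linear classifier per intent head."""

    def __init__(self, hidden_size: int = 768):
        super().__init__()
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden_size, n) for name, n in HEAD_SIZES.items()}
        )

    def forward(self, pooled_output: torch.Tensor) -> dict:
        # Each head produces its own logits from the same pooled encoder output
        return {name: head(pooled_output) for name, head in self.heads.items()}

clf = MultiHeadIntentClassifier()
logits = clf(torch.randn(1, 768))  # batch of one pooled vector
```

Each head is trained with its own classification loss; at inference time every head predicts independently.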
### Slot Entities (BIO)
Extracted entities: `activity`, `frequency`, `indicator`, `investment`, `period`, `region`, `seasonality`
Full BIO scheme: 15 labels (`O`, `B-*`, `I-*`).
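The 15-label set follows mechanically from the seven entity types: one `O` plus a `B-` and an `I-` tag per entity. A quick sketch (label order is illustrative; the canonical order comes from `labels/slot_label.txt`):

```python
# Entity types listed above
ENTITIES = ["activity", "frequency", "indicator", "investment",
            "period", "region", "seasonality"]

# O plus B-/I- per entity: 1 + 2 * 7 = 15 labels
slot_labels = ["O"] + [f"{prefix}-{e}" for e in ENTITIES for prefix in ("B", "I")]
print(len(slot_labels))  # 15
```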
## Usage
### Installation
```bash
pip install torch transformers
```
### Model Loading
```python
import torch
from transformers import AutoTokenizer, AutoConfig
# Load tokenizer and config
tokenizer = AutoTokenizer.from_pretrained("BCCh/pibert", trust_remote_code=True)
config = AutoConfig.from_pretrained("BCCh/pibert", trust_remote_code=True)

# Download the label files from the repo
from huggingface_hub import hf_hub_download
import os

label_dir = os.path.dirname(hf_hub_download("BCCh/pibert", "labels/slot_label.txt"))

# Read intent and slot labels
def read_labels(path):
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

slot_labels = read_labels(os.path.join(label_dir, "slot_label.txt"))

# Build intent_label_lst: one label list per head
intent_label_lst = []
for head in ["activity", "calc_mode", "investment", "region", "req_form"]:
    intent_label_lst.append(read_labels(os.path.join(label_dir, f"{head}_label.txt")))

# Load the model with the custom architecture (modeling_jointbert.py ships in the repo)
from modeling_jointbert import JointBERT

model = JointBERT.from_pretrained(
    "BCCh/pibert",
    config=config,
    intent_label_lst=intent_label_lst,
    slot_label_lst=slot_labels,
    trust_remote_code=True,
)
model.eval()
```
### Prediction
```python
text = "cuál fue el imacec de agosto 2024"
tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
outputs = model(**tokens)
# outputs contains intent_logits (a list, one tensor per head) and slot_logits
```
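Decoding is an argmax per head over the intent logits and an argmax per token over the slot logits. The sketch below uses dummy tensors so it is self-contained; the output attribute names and shapes are assumptions based on the comment above, and the truncated `slot_labels` list is illustrative:

```python
import torch

intent_heads = ["activity", "calc_mode", "investment", "region", "req_form"]
intent_label_lst = [
    ["none", "specific", "general"],                      # activity
    ["original", "prev_period", "yoy", "contribution"],   # calc_mode
    ["none", "specific", "general"],                      # investment
    ["none", "specific", "general"],                      # region
    ["latest", "point", "range"],                         # req_form
]
# Dummy logits standing in for outputs.intent_logits (one tensor per head)
intent_logits = [torch.randn(1, len(labels)) for labels in intent_label_lst]

# One predicted class per head: argmax over that head's logits
intents = {
    head: labels[logits.argmax(dim=-1).item()]
    for head, labels, logits in zip(intent_heads, intent_label_lst, intent_logits)
}

# Slot decoding: argmax per token over the BIO label set (truncated for the sketch)
slot_labels = ["O", "B-indicator", "I-indicator", "B-period", "I-period"]
slot_logits = torch.randn(1, 6, len(slot_labels))  # (batch, seq_len, n_labels)
slots = [slot_labels[i] for i in slot_logits.argmax(dim=-1)[0].tolist()]
```

With the real model, substitute `outputs.intent_logits` and `outputs.slot_logits`, and align slot predictions back to words via the tokenizer's offset mapping.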
## Package Structure
```
model_package/
├── config.json              # BERT + task configuration
├── model.safetensors        # Model weights
├── tokenizer.json           # Tokenizer
├── tokenizer_config.json
├── special_tokens_map.json
├── vocab.txt
├── modeling_jointbert.py    # Custom JointBERT architecture
├── module.py                # CRF and auxiliary modules
├── __init__.py
├── README.md                # This file
└── labels/
│   ├── slot_label.txt
│   ├── activity_label.txt
│   ├── calc_mode_label.txt
│   ├── investment_label.txt
│   ├── region_label.txt
│   └── req_form_label.txt
```
## Training Data
Trained on queries about Chilean macroeconomic indicators:
- **IMACEC** (Monthly Economic Activity Index)
- **PIB** (Gross Domestic Product)
- Economic sectors, frequencies, periods, and regions
## Limitations
- Specialized in macroeconomic queries about Central Bank of Chile indicators
- Best performance on short queries (< 50 tokens)
- Requires `trust_remote_code=True` due to the custom architecture
## Citation
```bibtex
@misc{pibot-jointbert,
  author       = {Banco Central de Chile},
  title        = {PIBot Joint BERT - Multi-head Intent + Slot Filling},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BCCh/pibert}}
}
```
## Referencias
- [BERT for Joint Intent Classification and Slot Filling](https://arxiv.org/abs/1902.10909)
- [JointBERT implementation](https://github.com/monologg/JointBERT)
- [BETO: Spanish BERT](https://github.com/dccuchile/beto)
## License
MIT License