---
language: es
tags:
- intent-classification
- slot-filling
- joint-bert
- spanish
- economics
- chile
- multi-head
license: mit
base_model: microsoft/mdeberta-v3-base
pipeline_tag: token-classification
---

# PIBot Joint BERT

A **multi-head Joint BERT** model for intent classification and slot filling,
specialized in queries about macroeconomic indicators of the Banco Central de Chile (Central Bank of Chile).

## Architecture

| Component | Detail |
|---|---|
| Base | `microsoft/mdeberta-v3-base` |
| Task | `pibimacecv3` |
| Intent heads | 5 (`activity`, `calc_mode`, `investment`, `region`, `req_form`) |
| Slot labels | 15 (BIO) |
| Custom code | `modeling_jointbert.py`, `module.py` |

### Intent Heads

| Head | Classes | Values |
|---|---|---|
| `activity` | 3 | `none`, `specific`, `general` |
| `calc_mode` | 4 | `original`, `prev_period`, `yoy`, `contribution` |
| `investment` | 3 | `none`, `specific`, `general` |
| `region` | 3 | `none`, `specific`, `general` |
| `req_form` | 3 | `latest`, `point`, `range` |

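
To illustrate how the five heads combine into a single structured prediction, here is a minimal sketch that maps one predicted class index per head to its label name. The label order is copied from the table above and is an assumption; the authoritative order is whatever the files under `labels/` contain. `INTENT_HEADS` and `decode_intents` are hypothetical names used only for this example.

```python
# Label order copied from the table above (assumption: the files under
# labels/ list the classes in this same order).
INTENT_HEADS = {
    "activity": ["none", "specific", "general"],
    "calc_mode": ["original", "prev_period", "yoy", "contribution"],
    "investment": ["none", "specific", "general"],
    "region": ["none", "specific", "general"],
    "req_form": ["latest", "point", "range"],
}


def decode_intents(class_ids):
    """Map one predicted class index per head to its label name."""
    if len(class_ids) != len(INTENT_HEADS):
        raise ValueError("expected one class index per intent head")
    return {
        head: labels[idx]
        for (head, labels), idx in zip(INTENT_HEADS.items(), class_ids)
    }


print(decode_intents([0, 2, 0, 0, 1]))
# {'activity': 'none', 'calc_mode': 'yoy', 'investment': 'none', 'region': 'none', 'req_form': 'point'}
```

In practice the indices would come from taking the argmax of each head's logits.
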

### Slot Entities (BIO)

Extracted entities: `activity`, `frequency`, `indicator`, `investment`, `period`, `region`, `seasonality`

Full BIO scheme: 15 labels (`O`, plus a `B-*` and an `I-*` label for each of the 7 entities).

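
The count of 15 follows from one `O` label plus a `B-`/`I-` pair for each of the 7 entities. The sketch below rebuilds such a label set to make the arithmetic concrete. The ordering is an assumption and may not match `labels/slot_label.txt`; `build_bio_labels` is a hypothetical helper.

```python
ENTITIES = [
    "activity", "frequency", "indicator", "investment",
    "period", "region", "seasonality",
]


def build_bio_labels(entities):
    """Return 'O' followed by a B-/I- pair for every entity type."""
    labels = ["O"]
    for entity in entities:
        labels.append(f"B-{entity}")
        labels.append(f"I-{entity}")
    return labels


bio_labels = build_bio_labels(ENTITIES)
print(len(bio_labels))  # 15
```
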

## Usage

### Installation

```bash
pip install torch transformers
```

### Loading the Model

```python
import os
import sys

import torch
from huggingface_hub import snapshot_download
from transformers import AutoConfig, AutoTokenizer

# Download the full repo (weights, labels, custom code) into the local cache
repo_dir = snapshot_download("BCCh/pibert")
label_dir = os.path.join(repo_dir, "labels")

# Load tokenizer and config
tokenizer = AutoTokenizer.from_pretrained("BCCh/pibert", trust_remote_code=True)
config = AutoConfig.from_pretrained("BCCh/pibert", trust_remote_code=True)

# Read intent and slot labels, one label per line
def read_labels(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

slot_labels = read_labels(os.path.join(label_dir, "slot_label.txt"))

# Build intent_label_lst with one label list per head
intent_label_lst = []
for head in ['activity', 'calc_mode', 'investment', 'region', 'req_form']:
    intent_label_lst.append(read_labels(os.path.join(label_dir, f"{head}_label.txt")))

# Load the model with the custom code shipped in the repo.
# modeling_jointbert.py sits in the repo root, so put that directory on sys.path.
sys.path.insert(0, repo_dir)
from modeling_jointbert import JointBERT

model = JointBERT.from_pretrained(
    "BCCh/pibert",
    config=config,
    intent_label_lst=intent_label_lst,
    slot_label_lst=slot_labels,
    trust_remote_code=True,
)
model.eval()
```

### Prediction

```python
# Example query: "what was the IMACEC for August 2024"
text = "cuál fue el imacec de agosto 2024"
tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**tokens)
    # outputs contains intent_logits (a list) and slot_logits
```
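
Once slot tags have been obtained from the slot logits and aligned to words (that step depends on the custom model code and is not shown here), the BIO tags can be grouped into entity spans. The following is a minimal sketch of that grouping. `bio_to_spans` is a hypothetical helper, and the tags in the example are hand-written for illustration, not actual model output.

```python
def bio_to_spans(words, tags):
    """Group word-level BIO tags into (entity, text) spans."""
    spans = []
    current_entity, current_words = None, []
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            # A new span starts; close the one that was open, if any
            if current_entity is not None:
                spans.append((current_entity, " ".join(current_words)))
            current_entity, current_words = tag[2:], [word]
        elif tag.startswith("I-") and current_entity == tag[2:]:
            # Continuation of the open span
            current_words.append(word)
        else:
            # "O", or an I- tag that does not continue the open span
            if current_entity is not None:
                spans.append((current_entity, " ".join(current_words)))
            current_entity, current_words = None, []
    if current_entity is not None:
        spans.append((current_entity, " ".join(current_words)))
    return spans


words = "cuál fue el imacec de agosto 2024".split()
# Hand-written tags for illustration only, not real model output.
tags = ["O", "O", "O", "B-indicator", "O", "B-period", "I-period"]
print(bio_to_spans(words, tags))
# [('indicator', 'imacec'), ('period', 'agosto 2024')]
```

An `I-` tag with no matching open span is dropped in this sketch; a stricter variant could treat it as the start of a new span instead.
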

## Package Structure

```
model_package/
├── config.json               # BERT configuration + task
├── model.safetensors         # Model weights
├── tokenizer.json            # Tokenizer
├── tokenizer_config.json
├── special_tokens_map.json
├── vocab.txt
├── modeling_jointbert.py     # JointBERT architecture (custom)
├── module.py                 # CRF and auxiliary modules
├── __init__.py
├── README.md                 # This file
└── labels/
    ├── slot_label.txt
    ├── activity_label.txt
    ├── calc_mode_label.txt
    ├── investment_label.txt
    ├── region_label.txt
    └── req_form_label.txt
```

## Training Data

Trained on queries about Chilean macroeconomic indicators:
- **IMACEC** (Indicador Mensual de Actividad Económica, the monthly economic activity indicator)
- **PIB** (Producto Interno Bruto, i.e. GDP)
- Economic sectors, frequencies, periods, regions

## Limitations

- Specialized in macroeconomic queries related to the Banco Central de Chile
- Performs best on short queries (< 50 tokens)
- Requires `trust_remote_code=True` because of the custom architecture

## Citation

```bibtex
@misc{pibot-jointbert,
  author = {Banco Central de Chile},
  title = {PIBot Joint BERT - Multi-head Intent + Slot Filling},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BCCh/pibert}}
}
```

## References

- [BERT for Joint Intent Classification and Slot Filling](https://arxiv.org/abs/1902.10909)
- [JointBERT implementation](https://github.com/monologg/JointBERT)
- [BETO: Spanish BERT](https://github.com/dccuchile/beto)

## License

MIT License