---
language: es
tags:
- intent-classification
- slot-filling
- joint-bert
- spanish
- economics
- chile
- multi-head
license: mit
base_model: microsoft/mdeberta-v3-base
pipeline_tag: token-classification
---

# PIBot Joint BERT Model

Multi-head **Joint BERT** model for intent classification and slot filling, specialized in queries about macroeconomic indicators published by the Banco Central de Chile (Central Bank of Chile).

## Architecture

| Component | Detail |
|---|---|
| Base | `microsoft/mdeberta-v3-base` |
| Task | `pibimacecv3` |
| Intent heads | 5 (`activity`, `calc_mode`, `investment`, `region`, `req_form`) |
| Slot labels | 15 (BIO) |
| Custom code | `modeling_jointbert.py`, `module.py` |

### Intent Heads

| Head | Classes | Values |
|---|---|---|
| `activity` | 3 | `none`, `specific`, `general` |
| `calc_mode` | 4 | `original`, `prev_period`, `yoy`, `contribution` |
| `investment` | 3 | `none`, `specific`, `general` |
| `region` | 3 | `none`, `specific`, `general` |
| `req_form` | 3 | `latest`, `point`, `range` |

### Slot Entities (BIO)

Extracted entities: `activity`, `frequency`, `indicator`, `investment`, `period`, `region`, `seasonality`

Full BIO scheme: 15 labels (`O`, `B-*`, `I-*`).
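The 15-label count follows directly from the seven entity types: one `O` tag plus a `B-`/`I-` pair per entity. A minimal sketch of that derivation (the actual ordering inside `labels/slot_label.txt` may differ):

```python
# Derive the BIO tag set from the seven entity types listed above.
# Note: the real label order is defined by labels/slot_label.txt.
entities = ["activity", "frequency", "indicator", "investment",
            "period", "region", "seasonality"]

slot_labels = ["O"] + [f"{prefix}-{entity}"
                       for entity in entities
                       for prefix in ("B", "I")]

print(len(slot_labels))  # 1 + 2 * 7 = 15
```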
## Usage

### Installation

```bash
pip install torch transformers
```

### Loading the Model

```python
import torch
from huggingface_hub import hf_hub_download
from transformers import AutoConfig, AutoTokenizer

# Load tokenizer and config (the custom architecture requires trust_remote_code)
tokenizer = AutoTokenizer.from_pretrained("BCCh/pibert", trust_remote_code=True)
config = AutoConfig.from_pretrained("BCCh/pibert", trust_remote_code=True)

# Download each label file from the repo and read it
def read_labels(filename):
    path = hf_hub_download("BCCh/pibert", f"labels/{filename}")
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]

slot_labels = read_labels("slot_label.txt")

# One label list per intent head, in head order
intent_label_lst = [
    read_labels(f"{head}_label.txt")
    for head in ["activity", "calc_mode", "investment", "region", "req_form"]
]

# Load the model with the custom code shipped in the repo
from modeling_jointbert import JointBERT  # auto-loaded with trust_remote_code

model = JointBERT.from_pretrained(
    "BCCh/pibert",
    config=config,
    intent_label_lst=intent_label_lst,
    slot_label_lst=slot_labels,
    trust_remote_code=True,
)
model.eval()
```

### Prediction

```python
text = "cuál fue el imacec de agosto 2024"
tokens = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

with torch.no_grad():
    outputs = model(**tokens)

# outputs contains intent_logits (a list, one tensor per head) and slot_logits
```

## Package Structure

```
model_package/
├── config.json              # BERT + task configuration
├── model.safetensors        # Model weights
├── tokenizer.json           # Tokenizer
├── tokenizer_config.json
├── special_tokens_map.json
├── vocab.txt
├── modeling_jointbert.py    # JointBERT architecture (custom)
├── module.py                # CRF and auxiliary modules
├── __init__.py
├── README.md                # This file
└── labels/
    ├── slot_label.txt
    ├── activity_label.txt
    ├── calc_mode_label.txt
    ├── investment_label.txt
    ├── region_label.txt
    └── req_form_label.txt
```

## Training Data

Trained on queries about Chilean macroeconomic indicators:

- **IMACEC** (Indicador Mensual de Actividad Económica, the monthly economic activity index)
- **PIB** (Producto Interno Bruto, GDP)
- Economic sectors, frequencies, periods, regions

## Limitations

- Specialized in macroeconomic queries about Banco Central de Chile indicators
- Best performance on short queries (< 50 tokens)
- Requires `trust_remote_code=True` because of the custom architecture

## Citation

```bibtex
@misc{pibot-jointbert,
  author       = {Banco Central de Chile},
  title        = {PIBot Joint BERT - Multi-head Intent + Slot Filling},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/BCCh/pibert}}
}
```

## References

- [BERT for Joint Intent Classification and Slot Filling](https://arxiv.org/abs/1902.10909)
- [JointBERT implementation](https://github.com/monologg/JointBERT)
- [BETO: Spanish BERT](https://github.com/dccuchile/beto)

## License

MIT License
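
## Decoding Example (sketch)

The prediction snippet earlier returns raw logits. The sketch below shows one way to turn them into per-head intent labels and BIO entity spans. It is a minimal illustration with hard-coded logits and the label values from the tables above: the real label order comes from the files in `labels/`, and the `intent_logits` / `slot_logits` attribute names are taken from the prediction comment, not verified against `modeling_jointbert.py`.

```python
import torch

# Label lists mirroring the tables above (real order comes from labels/*.txt)
intent_heads = ["activity", "calc_mode", "investment", "region", "req_form"]
intent_label_lst = [
    ["none", "specific", "general"],                     # activity
    ["original", "prev_period", "yoy", "contribution"],  # calc_mode
    ["none", "specific", "general"],                     # investment
    ["none", "specific", "general"],                     # region
    ["latest", "point", "range"],                        # req_form
]

def decode_intents(intent_logits):
    """Argmax each head's logits and map the index to that head's label list."""
    return {
        head: labels[int(logits[0].argmax())]
        for head, labels, logits in zip(intent_heads, intent_label_lst, intent_logits)
    }

def bio_to_spans(words, tags):
    """Merge B-*/I-* tag runs into (entity, text) spans."""
    spans, cur = [], None
    for word, tag in zip(words, tags):
        if tag.startswith("B-"):
            if cur:
                spans.append(cur)
            cur = [tag[2:], [word]]
        elif tag.startswith("I-") and cur and cur[0] == tag[2:]:
            cur[1].append(word)
        else:
            if cur:
                spans.append(cur)
            cur = None
    if cur:
        spans.append(cur)
    return [(ent, " ".join(ws)) for ent, ws in spans]

# Hard-coded stand-ins for model(**tokens) outputs on
# "cuál fue el imacec de agosto 2024"
dummy_intent_logits = [
    torch.tensor([[0.1, 0.2, 3.0]]),       # activity   -> general
    torch.tensor([[4.0, 0.1, 0.2, 0.3]]),  # calc_mode  -> original
    torch.tensor([[5.0, 0.1, 0.2]]),       # investment -> none
    torch.tensor([[5.0, 0.1, 0.2]]),       # region     -> none
    torch.tensor([[0.1, 6.0, 0.2]]),       # req_form   -> point
]
words = ["cuál", "fue", "el", "imacec", "de", "agosto", "2024"]
tags = ["O", "O", "O", "B-indicator", "O", "B-period", "I-period"]

print(decode_intents(dummy_intent_logits))
print(bio_to_spans(words, tags))
# -> [('indicator', 'imacec'), ('period', 'agosto 2024')]
```

In a real pipeline, `tags` would come from `slot_logits.argmax(dim=-1)` mapped through `slot_labels`, with sub-word tokens merged back into whole words first.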