SaulSosaDiaz committed on
Commit b1e4c45 · verified · 1 Parent(s): 01e1fef

Upload 22 files

.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ model/tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,3 @@
+ .venv/
+ .env
+ __pycache__/
LICENSE ADDED
@@ -0,0 +1,12 @@
+ PROPRIETARY SOFTWARE LICENSE
+
+ Copyright (C) 2025 Cátedra Cajasiete BigData, OpenData & Blockchain
+
+ This software is the property of Cátedra Cajasiete BigData, OpenData & Blockchain and is protected by copyright law. Its use is strictly restricted under the terms of the applicable agreement.
+
+ ## RESTRICTIONS:
+ - Copying, modification, distribution, or use without prior written authorization from Cátedra Cajasiete BigData, OpenData & Blockchain is not permitted.
+ - Use of this software for commercial purposes without a valid license agreement is prohibited.
+ - No warranty of any kind is provided for the software; use is at the authorized user's own risk.
+
+ For usage permissions or commercial licenses, please contact: catedrabob@ull.edu.es.
README.md CHANGED
@@ -1,20 +1,327 @@
  ---
- title: CNO 11 Classification
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
-   - streamlit
- pinned: false
- short_description: Streamlit template space
  license: other
  ---
- # Welcome to Streamlit!
- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:
- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community forums](https://discuss.streamlit.io).
  ---
  license: other
+ license_name: proprietary-license
+ license_link: LICENSE
+ language:
+ - es
+ base_model:
+ - intfloat/multilingual-e5-large
+ pipeline_tag: text-classification
  ---

+ # Model Card for A5-CNO-BOB-ISTAC-D12
+
+ This model classifies Spanish-language occupation and task descriptions into CNO-11 codes, the Spanish national classification of occupations.
17
+
18
+ ## Model Details
19
+
20
+ ### Model Description
21
+
22
+ <!-- Provide a longer summary of what this model is. -->
23
+
24
+ - **Developed by:** Cátedra Cajasiete de Big Data, Open Data y Blockchain de la Universidad de La Laguna
25
+ - **Funded by:** Cajasiete y la Universidad de La Laguna
26
+ - **Shared by:** Cátedra Cajasiete de Big Data, Open Data y Blockchain de la Universidad de La Laguna and Instituto Canario de Estadística
27
+ - **Model type:** text-classification
28
+ - **Language(s) (NLP):** Spanish
29
+ - **License:** [Proprietary](LICENSE)
30
+ - **Finetuned from model:** [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large)
31
+
32
+ ### Model Sources
33
+
34
+ <!-- Provide the basic links for the model. -->
35
+ - **Paper [TODO]:** [TODO](https://)
36
+
37
+ ## Uses
38
+
39
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
40
+ This model has been trained to classify text into CNOs ([Código Nacional de Ocupaciones](https://enclaveformacion.com/cno-11/)) in Spanish. It is intended to be used by researchers, developers, and organizations interested in analyzing and classifying occupational data in the Spanish language.
41
+
42
+ ### Direct Use
43
+
44
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
45
+
46
+ [More Information Needed]
47
+
48
+
49
+ ### Out-of-Scope Use
50
+
51
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
52
+
53
+ The model will not work well on non-Spanish text, as it was trained exclusively on Spanish data.
54
+
55
+ ## Bias, Risks, and Limitations
56
+
57
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
58
+
59
+ Because the model was trained on data from socioeconomic surveys, it may inherit biases present in that data. These biases may show up in the classification of occupations, especially those that are underrepresented, and the model may not generalize well to occupations that are poorly covered in the training set.
+
+ Another limitation is that new occupations have appeared since the CNO (the national occupational classification system used in Spain, described later in this card) was created, and these are not included in the model. The model may therefore fail to classify such occupations correctly (for example, streamer or influencer).
62
+
63
+ ### Recommendations
64
+
65
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
66
+
67
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
68
+
69
+ ## How to Get Started with the Model
70
+
71
+ Use the code below to get started with the model.
72
+
73
+ ### Install necessary libraries
74
+
75
+ ```
76
+ pip install torch
77
+ pip install transformers
78
+ ```
79
+
80
+ [TODO: Check if it's necessary to do anything more than this]
81
+
82
+ ### Load model
83
+
84
+ ```python
85
+ from transformers import AutoModelForSequenceClassification
86
+
87
+ model = AutoModelForSequenceClassification.from_pretrained("bob-nlp/A5-CNO-BOB-ISTAC-D12")
88
+ ```
89
+
90
+ ### Load tokenizer
91
+
92
+ ```python
93
+ from transformers import AutoTokenizer
94
+
95
+ tokenizer = AutoTokenizer.from_pretrained("bob-nlp/A5-CNO-BOB-ISTAC-D12")
96
+ ```
97
+
98
+ ### Using the model
99
+
100
+ ```python
+ import torch
+ from torch.nn.functional import softmax
+
+ text = "text to classify"
+ # The tokenizer expects a list of strings, even for a single example.
+ text_to_predict = [text] if isinstance(text, str) else list(text)
+
+ inputs = tokenizer(text_to_predict, padding=True, truncation=True, max_length=512, return_tensors="pt")
+ with torch.no_grad():
+     outputs = model(**inputs)
+ logits = outputs.logits
+ probabilities = softmax(logits, dim=1)
+ id2label = model.config.id2label
+
+ # For each input, sort the labels by probability (highest first).
+ sorted_predictions = []
+ for i in range(logits.shape[0]):
+     single_probs = probabilities[i]
+     scores_dict = {id2label[j]: single_probs[j].item() for j in range(len(id2label))}
+     sorted_prediction = sorted(scores_dict.items(), key=lambda item: item[1], reverse=True)
+     sorted_predictions.append(sorted_prediction)
+
+ best_label, best_label_prob = sorted_predictions[0][0]
+ ```
126
+
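If you prefer a higher-level API, the same steps can be wrapped in a `text-classification` pipeline, which is also what the bundled `app.py` does. A minimal sketch; the `top_k` value is illustrative:

```python
from transformers import pipeline

# Loads tokenizer and model in one call; top_k=3 returns the three most likely labels.
classifier = pipeline("text-classification", model="bob-nlp/A5-CNO-BOB-ISTAC-D12", top_k=3)

predictions = classifier("text to classify")   # list of {"label": ..., "score": ...} dicts
best_label = predictions[0]["label"]
```

The labels come back in the same `LABEL_(NUMBER)` format, so the conversion step below still applies.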
127
+ ### Convert result to CNO
128
+
129
+ The model returns labels in a "LABEL_(NUMBER)" format. To translate a label into a CNO code, follow these steps:
+
+ 1) Download the file `cno_utils.py` from the `utils` folder of this repository.
+ 2) Add the following to your code:
133
+ ```python
134
+ from cno_utils import convert_to_cno
135
+
136
+ cno_predicted_code = convert_to_cno(best_label)
137
+ ```
138
+
139
+ You must have previously installed `pandas` and `huggingface_hub` for it to work:
140
+ ```
141
+ pip install huggingface_hub
142
+ pip install pandas
143
+ ```
144
+
145
+ Alternatively, copy the following into your code; it downloads the `idxs.csv` file from the `data` folder of this repository and builds the label-to-CNO mapping directly:
+ ```python
+ import pandas as pd
+ from huggingface_hub import hf_hub_download
+
+ def _load_label_mapping():
+     # Fetch data/idxs.csv from the model repository and map LABEL_N -> CNO code.
+     csv_path = hf_hub_download(repo_id="bob-nlp/A5-CNO-BOB-ISTAC-D12", filename="data/idxs.csv")
+     df = pd.read_csv(csv_path)
+     return dict(zip(df['label'], df['CNO']))
+
+ def convert_to_cno(output_label):
+     mapping = _load_label_mapping()
+     return mapping.get(output_label, output_label)
+ ```
159
+
160
+ And then simply call `convert_to_cno()`.
161
+
162
+
163
+ ### Get description of the CNO
164
+
165
+ 1) Download the file `cno_utils.py` from the `utils` folder of this repository.
+ 2) Add the following to your code:
167
+ ```python
168
+ from cno_utils import get_cno_description
169
+
170
+ cno_description = get_cno_description(cno_predicted_code)
171
+ ```
172
+
173
+ You must have previously installed `pandas` and `huggingface_hub` for it to work:
174
+ ```
175
+ pip install huggingface_hub
176
+ pip install pandas
177
+ ```
178
+
179
+ Alternatively, copy the following into your code; it downloads the `cno11_notas.csv` file from the `data` folder of this repository and builds the CNO-to-description mapping directly:
+ ```python
+ import pandas as pd
+ from huggingface_hub import hf_hub_download
+
+ def _load_description_mapping():
+     # Fetch data/cno11_notas.csv from the model repository and map CNO code -> description.
+     csv_path = hf_hub_download(repo_id="bob-nlp/A5-CNO-BOB-ISTAC-D12", filename="data/cno11_notas.csv")
+     df = pd.read_csv(csv_path)
+     return dict(zip(df['CNO'], df['DN4']))
+
+ def get_cno_description(cno):
+     mapping = _load_description_mapping()
+     return mapping.get(cno, 'Unknown')
+ ```
193
+
194
+ And then simply call `get_cno_description()`.
195
+
196
+
197
+ ## Training Details
198
+
199
+ ### Training Data
200
+
201
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
202
+
203
+ This model has been trained using an aggregated data set from various socioeconomic surveys conducted by the Instituto Canario de Estadística (ISTAC). The ISTAC is the official statistical agency of the autonomous community of the Canary Islands, in charge of producing and disseminating statistical information of public interest.
204
+
205
+ The training dataset is composed of individual responses to surveys designed to capture a representative picture of the social and economic situation of the population in the Canary Islands.
206
+ Although the specific dataset used for this model cannot be directly redistributed, the original ISTAC surveys, such as the Survey of Income and Living Conditions of Canarian Households (EICVHC) or the Survey of Socioeconomic Habits and Confidence (ECOSOC), provide insight into the type of information collected. You can consult the microdata and documentation of these and other surveys in the [ISTAC data portal](https://datos.canarias.es/catalogos/estadisticas/organization/istac?_groups_limit=0&_res_format_limit=0&res_format=ODS&organization=istac&groups=sociedad-bienestar&license_id=istac-aviso-legal).
207
+ The variables included in the training dataset are fundamental to the task of occupational classification and reflect a variety of demographic and socioeconomic factors.
208
+
209
+ The variables used are:
210
+ * EDAD_RANGO: Age range of the respondent.
211
+ * SEXO: Sex of the respondent.
212
+ * INGRESO: Income level of the household or individual.
213
+ * ESTUDIOS: Level of education attained.
214
+ * SITUACION: Employment status (e.g., employed, unemployed, inactive).
215
+ * ACTIVIDAD: Sector of economic activity.
216
+ * TAREA: Description of the main task performed at work.
217
+ * CNO: National Code of Occupations.
218
+
219
+ The target variable of the model is CNO. The CNO is the national classification system of [occupations used in Spain](https://www.ine.es/dyngs/INEbase/es/operacion.htm?c=Estadistica_C&cid=1254736177033&menu=ultiDatos&idp=1254735976614), managed by the National Statistics Institute (INE). This system organizes occupations in a hierarchical structure that facilitates the grouping and analysis of labor data. The model has been trained with the CNO-11 version of this classification.
220
+
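The card does not spell out how these tabular variables are fed to the text encoder. As a purely hypothetical illustration, mirroring how the bundled `app.py` appends the textual value of each selected variable after the task description, a survey record might be flattened into a single input string; field names and wording below are illustrative only:

```python
# Hypothetical serialization of one survey record into model input text.
# The actual training-time format is not documented in this card.
record = {
    "TAREA": "atiendo a los clientes y sirvo mesas en un restaurante",
    "EDAD_RANGO": "De 25 a 34 años",
    "SEXO": "Mujer",
    "ESTUDIOS": "Educación secundaria",
    "SITUACION": "Ocupada",
    "ACTIVIDAD": "Hostelería",
}

text_to_classify = f"{record['TAREA']}."
for field in ("EDAD_RANGO", "SEXO", "ESTUDIOS", "SITUACION", "ACTIVIDAD"):
    text_to_classify += f" {record[field]}."
```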
221
+ ### Training Procedure
222
+
223
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
224
+
225
+ #### Preprocessing
226
+
227
+ The main challenge of the training data was the class imbalance in the target variable CNO, as the most common occupations in the Canary Islands (e.g., "restaurant services and commerce") were overrepresented. To mitigate the bias towards the majority classes, a data augmentation technique was applied by generating synthetic entries for the underrepresented occupations. This process balances the distribution of classes, improving the generalizability of the model. In addition, categorical variables were coded into numerical format and null values were managed to ensure data quality.
228
+
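The synthetic-augmentation procedure itself is not published. As a rough illustration of the balancing idea only (naive random oversampling rather than the actual synthetic generation), one might do something like:

```python
import pandas as pd

def naive_oversample(df: pd.DataFrame, label_col: str = "CNO") -> pd.DataFrame:
    # Illustrative stand-in for the augmentation step: resample rows of minority
    # classes (with replacement) until every class matches the largest one.
    target = df[label_col].value_counts().max()
    parts = []
    for _, group in df.groupby(label_col):
        parts.append(group.sample(n=target, replace=True, random_state=42))
    return pd.concat(parts).reset_index(drop=True)
```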
229
+ Preprocessing also included the following standard steps:
230
+
231
+ * **Coding of categorical variables**: Variables such as EDAD_RANGO, SEXO, ESTUDIOS, SITUACION, and ACTIVIDAD were converted to a numerical format (e.g., via one-hot encoding) so that they could be processed by the model.
232
+ * **Null Value Handling**: A strategy was implemented to deal with inputs with missing values.
233
+
234
+ #### Training Hyperparameters
235
+
236
+ The model was fine-tuned from **`intfloat/multilingual-e5-large`** using the following configuration:
237
+
238
+ | Parameter | Value | Description |
239
+ | :--- | :--- | :--- |
240
+ | **Base Model** | `intfloat/multilingual-e5-large` | Pre-trained model used as a starting point. |
241
+ | **`TEST_SIZE`** | `0.3` | Proportion of the dataset reserved for testing. |
242
+ | **`RANDOM_STATE`** | `42` | Seed for reproducible data splitting. |
243
+ | **`NUM_TRAIN_EPOCHS`** | `16` | Maximum number of training epochs. |
244
+ | **`BATCH_SIZE`** | `24` | Batch size per device. |
245
+ | **`LEARNING_RATE`** | `2e-05` | Learning rate for the optimizer. |
246
+ | **`EARLY_STOPPING_PATIENCE`**| `2` | Epochs to wait for improvement before stopping training. |
247
+ | **`EARLY_STOPPING_THRESHOLD`**| `0.01` | Minimum change to be considered an improvement. |
248
+ | **`LOGGING_STEPS`** | `500` | Logging frequency (in steps). |
249
+
250
+ - **Training regime:** fp32 (Full Precision) <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
251
+
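For readers who want to reproduce a comparable setup, here is a minimal sketch of how these values could map onto a standard 🤗 `Trainer` configuration. The split call, the `output_dir`, the stratification, and the evaluation/save strategies are assumptions; the card does not publish the full training script, and `texts`/`labels` are placeholders for your own data.

```python
from transformers import TrainingArguments, EarlyStoppingCallback
from sklearn.model_selection import train_test_split

# Assumed 70/30 split matching TEST_SIZE and RANDOM_STATE from the table.
# stratify is an assumption and requires every class to have at least 2 examples.
train_texts, test_texts, train_labels, test_labels = train_test_split(
    texts, labels, test_size=0.3, random_state=42, stratify=labels
)

args = TrainingArguments(
    output_dir="cno11-e5-large",           # hypothetical path
    num_train_epochs=16,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=24,
    learning_rate=2e-5,
    logging_steps=500,
    eval_strategy="epoch",                 # early stopping needs periodic evaluation
    save_strategy="epoch",
    load_best_model_at_end=True,
)

early_stopping = EarlyStoppingCallback(
    early_stopping_patience=2,
    early_stopping_threshold=0.01,
)
```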
252
+
253
+ ## Evaluation
254
+
255
+ <!-- This section describes the evaluation protocols and provides the results. -->
256
+
257
+ ### Testing Data, Factors & Metrics
258
+
259
+ #### Testing Data
260
+
261
+ <!-- This should link to a Dataset Card if possible. -->
262
+
263
+ The evaluation of the model was performed using a test set that was not used during training. This test set is composed of a representative sample of the population of the [Canary Islands](https://en.wikipedia.org/wiki/Canary_Islands), ensuring that the model's performance is evaluated on data that reflects the diversity and complexity of real-world scenarios.
264
+
265
+ #### Factors
266
+
267
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
268
+
269
+ [More Information Needed]
270
+
271
+ #### Metrics
272
+
273
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
274
+
275
+ The model's performance was assessed using a set of metrics carefully chosen to reflect the challenges of this classification task, namely the class imbalance and the hierarchical nature of the CNO labels.
276
+
277
+ * *Accuracy*: This is the most straightforward metric, representing the overall percentage of correctly predicted occupations. While it provides a general overview of performance, it can be misleading in imbalanced datasets. A model could achieve high accuracy by simply predicting the most common occupations well, while failing on rarer ones. It is included as a baseline reference.
278
+
279
+ * *Balanced Accuracy*: This metric was chosen specifically to counteract the weakness of standard accuracy. It calculates the average recall across all classes, giving equal weight to each one regardless of how frequently it appears. A high Balanced Accuracy score indicates that the model is performing well on both common and rare occupations, making it a much fairer assessment of a model's true generalization capability on this dataset.
280
+
281
+ * *Recall (macro)*: Recall measures the model's ability to correctly identify all relevant instances of a class ("What proportion of actual positives was identified correctly?"). The macro average calculates recall independently for each class and then takes the unweighted mean. This is crucial because it treats a failure to identify a rare occupation as equally important as a failure to identify a common one. It directly measures how well the model "finds" examples from every single category.
282
+
283
+ * *F1-score (macro)*: The F1-score is the harmonic mean of precision and recall. By using the macro average, we get a single, balanced measure of performance across all classes. It is one of the most important metrics for this task because a high macro F1-score requires the model to have both good precision (not mislabeling other occupations as the target class) and good recall (finding all instances of the target class), and to do so for rare and common classes alike.
284
+
285
+ * *H-F1-score (Hierarchical F1-score)*: This metric was chosen because the CNO classification is inherently hierarchical. A standard F1-score treats all errors equally; for instance, mistaking a "Web Developer" for a "Farmer" is just as bad as mistaking it for a "Software Engineer". The Hierarchical F1-score is more nuanced. It gives partial credit for predictions that are incorrect but "close" in the occupational hierarchy. This provides a more practical measure of the model's utility, as a prediction within the correct professional group is significantly more useful than one that is completely unrelated.
286
+
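A minimal sketch of how these metrics could be computed with scikit-learn is shown below. The hierarchical F1 uses one common ancestor-set formulation, treating CNO code prefixes as ancestors; the card does not specify which exact variant was used, so that part is an assumption.

```python
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score, f1_score

def ancestors(cno: str) -> set:
    # Treat every prefix of the code as an ancestor, e.g. C4121 -> {C4, C41, C412, C4121}.
    return {cno[: i + 1] for i in range(1, len(cno))}

def hierarchical_f1(y_true, y_pred) -> float:
    # Micro-averaged hierarchical precision/recall over ancestor sets:
    # predictions in the right branch of the hierarchy earn partial credit.
    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        t_anc, p_anc = ancestors(t), ancestors(p)
        tp += len(t_anc & p_anc)
        fp += len(p_anc - t_anc)
        fn += len(t_anc - p_anc)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def evaluate(y_true, y_pred) -> dict:
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "balanced_accuracy": balanced_accuracy_score(y_true, y_pred),
        "recall_macro": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "f1_macro": f1_score(y_true, y_pred, average="macro", zero_division=0),
        "h_f1": hierarchical_f1(y_true, y_pred),
    }
```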
287
+ ### Results
288
+
289
+ The model achieved the following performance on the test set:
290
+ | Metric | Score |
291
+ | :--- | :--- |
292
+ | Accuracy | 0.81 |
293
+ | Balanced Accuracy | 0.69 |
294
+ | Recall (macro) | 0.65 |
295
+ | F1-score (macro) | 0.64 |
296
+ | H-F1-score | 0.85 |
297
+
298
+ **Note:** `Recall` and `F1-score` were calculated using a macro average to provide a fair performance measure across all classes, including the underrepresented ones.
299
+
300
+ #### Hardware
301
+
302
+ This model is a fine-tuned version of intfloat/multilingual-e5-large, a large-sized transformer. As such, the hardware requirements depend on whether you are running the model for inference or for training.
303
+
304
+ **Inference (Using the Model)**
305
+ For running inference, a GPU is highly recommended for optimal performance, especially for batch processing.
306
+ * CPU: While it is possible to run this model on a multi-core CPU, expect significant latency. This may be acceptable for offline, low-volume tasks, but it is not suitable for real-time applications.
307
+
308
+ * GPU (Recommended): For efficient inference, a modern GPU with at least 6-8 GB of VRAM is recommended (e.g., NVIDIA Tesla T4, RTX 3060). This will allow for reasonably fast predictions and the processing of multiple requests in batches.
309
+
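As a quick sketch of GPU placement for inference (the device index, `top_k`, and batch size are illustrative):

```python
import torch
from transformers import pipeline

device = 0 if torch.cuda.is_available() else -1   # -1 falls back to CPU
classifier = pipeline(
    "text-classification",
    model="bob-nlp/A5-CNO-BOB-ISTAC-D12",
    device=device,
    top_k=1,
)
results = classifier(["text to classify", "another text"], batch_size=16)
```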
310
+ **Training (Reproducing the Fine-Tuning)**
311
+ Fine-tuning a large-sized model is computationally intensive and requires a high-end GPU.
312
+
313
+ ## Citation
314
+
315
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
316
+
317
+ **BibTeX:**
318
+
319
+ [More Information Needed]
320
+
321
+ **APA:**
322
+
323
+ [More Information Needed]
324
+
325
+ ## Model Card Contact
326
+ - **Organization:** Cátedra Cajasiete de Big Data, Open Data y Blockchain (Universidad de La Laguna)
327
+ - **Email:** catedrabob@ull.edu.es
app.py ADDED
@@ -0,0 +1,198 @@
1
+ import streamlit as st
2
+ from streamlit_chat import message
3
+ import pandas as pd
4
+ import os
5
+ import json
6
+ from transformers import pipeline
7
+ from dotenv import load_dotenv
8
+ from utils.cno_utils import convert_to_cno, get_cno_description
9
+
10
+ AVATAR_PATH = "https://avatars.githubusercontent.com/u/122880210?s=200&v=4"
11
+ st.set_page_config(
12
+ "Clasificador CNO 🤖", "🤖", layout="wide", initial_sidebar_state="expanded"
13
+ )
14
+
15
+ st.markdown(
16
+ """<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;600&display=swap" rel="stylesheet">
17
+ <style>
18
+ html, body, [class*="css"] {font-family: "Inter", sans-serif;}
19
+ header, footer {visibility: hidden;}
20
+ .block-container {padding-top: 0.5rem; display: flex; flex-direction: column; min-height: 100vh;}
21
+ .stChatMessage.user {background: linear-gradient(120deg,#00c6ff 0%,#0072ff 100%); color:#fff; border-radius:1rem; padding:0.75rem 1rem; margin:0.25rem 0;}
22
+ .stChatMessage.bot {background:#fff; border:1px solid #e0e0e0; border-radius:1rem; padding:0.75rem 1rem; margin:0.25rem 0;}
23
+ .stChatMessage.bot img {height:24px;width:24px;border-radius:50%;margin-right:0.5rem;}
24
+ ::-webkit-scrollbar {width:8px;}
25
+ ::-webkit-scrollbar-thumb {background:#8f9dff;border-radius:10px;}
26
+ .sidebar-title {font-size:0.9rem;font-weight:600;margin:0.5rem 0 0.25rem;color:#4b4b4b;}
27
+ div.msg ul{list-style:none;padding-left:0;margin:0;}
28
+ [data-testid="stSidebarCollapseButton"] {
29
+ display: none;
30
+ }
31
+ </style>""",
32
+ unsafe_allow_html=True,
33
+ )
34
+ MODEL_ID = "bob-nlp/A5-CNO-BOB-ISTAC-D12"
35
+
36
+
37
+ @st.cache_resource
38
+ def load_huggingface_model():
39
+ """Carga el pipeline de inferencia desde Hugging Face Hub."""
40
+ load_dotenv()
41
+ hf_token = os.environ.get("HF_TOKEN")
42
+ if not hf_token:
43
+ # Si el token no está, mostramos un error claro en la app.
44
+ st.error(
45
+ "HF_TOKEN no encontrado. Por favor, configúralo en los 'Secrets' de tu Space.",
46
+ icon="🔑",
47
+ )
48
+ return None
49
+ try:
50
+ model_pipeline = pipeline(
51
+ "text-classification",
52
+ model=MODEL_ID,
53
+ token=hf_token,
54
+ )
55
+ return model_pipeline
56
+ except Exception as e:
57
+ st.error(f"Error al cargar el modelo '{MODEL_ID}': {e}", icon="🔥")
58
+ return None
59
+
60
+
61
+ def load_json_file(filename):
62
+ try:
63
+ with open(filename, "r", encoding="utf-8") as f:
64
+ return json.load(f)
65
+ except FileNotFoundError:
66
+ st.warning(
67
+ f"El archivo '{filename}' no se ha encontrado en el repositorio del Space."
68
+ )
69
+ return {}
70
+ except Exception as e:
71
+ st.error(f"Error al leer el archivo JSON '{filename}': {e}")
72
+ return {}
73
+
74
+
75
+ pipe = load_huggingface_model()
76
+ METADATA = load_json_file("data/metadata.json")
77
+ PROBLEMATIC_CNOS = load_json_file("data/problematic_cnos.json")
78
+
79
+
80
+ def run_inference(text_input):
81
+ """
82
+ Función que ejecuta la inferencia usando el pipeline de Hugging Face
83
+ y formatea la salida para mostrarla en la UI.
84
+ """
85
+ if not pipe:
86
+ return "Error: El modelo no está cargado."
87
+
88
+ try:
89
+ results = pipe(text_input, top_k=3)
90
+ PROBLEMATIC_CNO_MESSAGE = " ⚠️⚠️⚠️ **Cuidado: código poco fiable** "
91
+ out = []
92
+
93
+ # TODO: Modificar descripción código
94
+ for response in results:
95
+ response["label"] = convert_to_cno(response["label"])
96
+ response["description"] = get_cno_description(response["label"])
97
+ main_msg = (
98
+ f"Predicción: **{response['label']}**: {response['description']} "
99
+ f"Certeza: **{response['score']:.2f}** "
100
+ )
101
+
102
+ if response["label"] in PROBLEMATIC_CNOS:
103
+ main_msg += PROBLEMATIC_CNO_MESSAGE
104
+ out.append(main_msg)
105
+
106
+ return "\n".join(out)
107
+
108
+ except Exception as e:
109
+ return f"Ocurrió un error durante la inferencia: {e}"
110
+
111
+
112
+ st.sidebar.title("Clasificador CNO-11")
113
+ st.sidebar.markdown("---")
114
+ st.sidebar.markdown(
115
+ "<div class='sidebar-title'>🎛️ Filtros</div>", unsafe_allow_html=True
116
+ )
117
+
118
+
119
+ def init_state() -> None:
120
+ defaults = load_json_file("data/defaults_session_state.json")
121
+ for k, v in defaults.items():
122
+ st.session_state.setdefault(k, v)
123
+
124
+
125
+ init_state()
126
+
127
+
128
+ def on_controls_change() -> None:
129
+ st.session_state.past.clear()
130
+ st.session_state.generated.clear()
131
+
132
+
133
+ for col, metadatas in METADATA.items():
134
+ sel = st.sidebar.selectbox(
135
+ label=col,
136
+ options=list(metadatas),
137
+ key=f"select_{col}",
138
+ on_change=on_controls_change,
139
+ format_func=lambda x: x["textual"],
140
+ )
141
+ st.session_state.selections[col] = sel
142
+
143
+ st.sidebar.divider()
144
+ if st.sidebar.button("🗑️ Limpiar conversación", use_container_width=True):
145
+ on_controls_change()
146
+ st.rerun()
147
+
148
+ st.title("🤖 Clasificador de Códigos CNO-11")
149
+ st.info(
150
+ f"Utilizando el modelo: **[{MODEL_ID}](https://huggingface.co/bob-nlp/A5-CNO-BOB-ISTAC-D12)**"
151
+ )
152
+
153
+
154
+ def add_user_message(text: str):
155
+ st.session_state.past.append(text)
156
+ to_classify = f"{text}."
157
+ if st.session_state.selections:
158
+ for col, sel in st.session_state.selections.items():
159
+ if sel:
160
+ to_classify += f" {sel['textual']}."
161
+ response = run_inference(to_classify)
162
+ st.session_state.generated.append((response))
163
+
164
+
165
+ def render_chat():
166
+ message(
167
+ "¡Hola! Soy el clasificador de códigos CNO-11. Por favor, introduce una descripción de la tarea o ocupación que quieres clasificar y te ayudaré a encontrar el código CNO correspondiente.",
168
+ is_user=False,
169
+ key="welcome",
170
+ logo=AVATAR_PATH,
171
+ )
172
+
173
+ for i, (u, b) in enumerate(zip(st.session_state.past, st.session_state.generated)):
174
+ message(u, is_user=True, key=f"u{i}", avatar_style="no-avatar")
175
+ message(b, key=f"b{i}", logo=AVATAR_PATH)
176
+
177
+
178
+ chat_box = st.container()
179
+ with chat_box:
180
+ st.markdown('<div class="chat-container">', unsafe_allow_html=True)
181
+ render_chat()
182
+ new_text = st.chat_input("Escribe aquí el texto a clasificar…")
183
+ if new_text:
184
+ add_user_message(new_text)
185
+ st.rerun()
186
+ st.markdown("</div>", unsafe_allow_html=True)
187
+
188
+ # Footer
189
+ st.markdown("---")
190
+ st.markdown(
191
+ """
192
+ <div style="text-align: center; color: #666; font-size: 0.8rem; padding: 1rem 0;">
193
+ Desarrollado por <strong>Cátedra Cajasiete de Big Data, Open Data y Blockchain</strong><br>
194
+ Universidad de La Laguna
195
+ </div>
196
+ """,
197
+ unsafe_allow_html=True,
198
+ )
config.json ADDED
@@ -0,0 +1,1036 @@
1
+ {
2
+ "architectures": [
3
+ "XLMRobertaForSequenceClassification"
4
+ ],
5
+ "attention_probs_dropout_prob": 0.1,
6
+ "bos_token_id": 0,
7
+ "classifier_dropout": null,
8
+ "eos_token_id": 2,
9
+ "hidden_act": "gelu",
10
+ "hidden_dropout_prob": 0.1,
11
+ "hidden_size": 1024,
12
+ "id2label": {
13
+ "0": "LABEL_0",
14
+ "1": "LABEL_1",
15
+ "2": "LABEL_2",
16
+ "3": "LABEL_3",
17
+ "4": "LABEL_4",
18
+ "5": "LABEL_5",
19
+ "6": "LABEL_6",
20
+ "7": "LABEL_7",
21
+ "8": "LABEL_8",
22
+ "9": "LABEL_9",
23
+ "10": "LABEL_10",
24
+ "11": "LABEL_11",
25
+ "12": "LABEL_12",
26
+ "13": "LABEL_13",
27
+ "14": "LABEL_14",
28
+ "15": "LABEL_15",
29
+ "16": "LABEL_16",
30
+ "17": "LABEL_17",
31
+ "18": "LABEL_18",
32
+ "19": "LABEL_19",
33
+ "20": "LABEL_20",
34
+ "21": "LABEL_21",
35
+ "22": "LABEL_22",
36
+ "23": "LABEL_23",
37
+ "24": "LABEL_24",
38
+ "25": "LABEL_25",
39
+ "26": "LABEL_26",
40
+ "27": "LABEL_27",
41
+ "28": "LABEL_28",
42
+ "29": "LABEL_29",
43
+ "30": "LABEL_30",
44
+ "31": "LABEL_31",
45
+ "32": "LABEL_32",
46
+ "33": "LABEL_33",
47
+ "34": "LABEL_34",
48
+ "35": "LABEL_35",
49
+ "36": "LABEL_36",
50
+ "37": "LABEL_37",
51
+ "38": "LABEL_38",
52
+ "39": "LABEL_39",
53
+ "40": "LABEL_40",
54
+ "41": "LABEL_41",
55
+ "42": "LABEL_42",
56
+ "43": "LABEL_43",
57
+ "44": "LABEL_44",
58
+ "45": "LABEL_45",
59
+ "46": "LABEL_46",
60
+ "47": "LABEL_47",
61
+ "48": "LABEL_48",
62
+ "49": "LABEL_49",
63
+ "50": "LABEL_50",
64
+ "51": "LABEL_51",
65
+ "52": "LABEL_52",
66
+ "53": "LABEL_53",
67
+ "54": "LABEL_54",
68
+ "55": "LABEL_55",
69
+ "56": "LABEL_56",
70
+ "57": "LABEL_57",
71
+ "58": "LABEL_58",
72
+ "59": "LABEL_59",
73
+ "60": "LABEL_60",
74
+ "61": "LABEL_61",
75
+ "62": "LABEL_62",
76
+ "63": "LABEL_63",
77
+ "64": "LABEL_64",
78
+ "65": "LABEL_65",
79
+ "66": "LABEL_66",
80
+ "67": "LABEL_67",
81
+ "68": "LABEL_68",
82
+ "69": "LABEL_69",
83
+ "70": "LABEL_70",
84
+ "71": "LABEL_71",
85
+ "72": "LABEL_72",
86
+ "73": "LABEL_73",
87
+ "74": "LABEL_74",
88
+ "75": "LABEL_75",
89
+ "76": "LABEL_76",
90
+ "77": "LABEL_77",
91
+ "78": "LABEL_78",
92
+ "79": "LABEL_79",
93
+ "80": "LABEL_80",
94
+ "81": "LABEL_81",
95
+ "82": "LABEL_82",
96
+ "83": "LABEL_83",
97
+ "84": "LABEL_84",
98
+ "85": "LABEL_85",
99
+ "86": "LABEL_86",
100
+ "87": "LABEL_87",
101
+ "88": "LABEL_88",
102
+ "89": "LABEL_89",
103
+ "90": "LABEL_90",
104
+ "91": "LABEL_91",
105
+ "92": "LABEL_92",
106
+ "93": "LABEL_93",
107
+ "94": "LABEL_94",
108
+ "95": "LABEL_95",
109
+ "96": "LABEL_96",
110
+ "97": "LABEL_97",
111
+ "98": "LABEL_98",
112
+ "99": "LABEL_99",
113
+ "100": "LABEL_100",
114
+ "101": "LABEL_101",
115
+ "102": "LABEL_102",
116
+ "103": "LABEL_103",
117
+ "104": "LABEL_104",
118
+ "105": "LABEL_105",
119
+ "106": "LABEL_106",
120
+ "107": "LABEL_107",
121
+ "108": "LABEL_108",
122
+ "109": "LABEL_109",
123
+ "110": "LABEL_110",
124
+ "111": "LABEL_111",
125
+ "112": "LABEL_112",
126
+ "113": "LABEL_113",
127
+ "114": "LABEL_114",
128
+ "115": "LABEL_115",
129
+ "116": "LABEL_116",
130
+ "117": "LABEL_117",
131
+ "118": "LABEL_118",
132
+ "119": "LABEL_119",
133
+ "120": "LABEL_120",
134
+ "121": "LABEL_121",
135
+ "122": "LABEL_122",
136
+ "123": "LABEL_123",
137
+ "124": "LABEL_124",
138
+ "125": "LABEL_125",
139
+ "126": "LABEL_126",
140
+ "127": "LABEL_127",
141
+ "128": "LABEL_128",
142
+ "129": "LABEL_129",
143
+ "130": "LABEL_130",
144
+ "131": "LABEL_131",
145
+ "132": "LABEL_132",
146
+ "133": "LABEL_133",
147
+ "134": "LABEL_134",
148
+ "135": "LABEL_135",
149
+ "136": "LABEL_136",
150
+ "137": "LABEL_137",
151
+ "138": "LABEL_138",
152
+ "139": "LABEL_139",
153
+ "140": "LABEL_140",
154
+ "141": "LABEL_141",
155
+ "142": "LABEL_142",
156
+ "143": "LABEL_143",
157
+ "144": "LABEL_144",
158
+ "145": "LABEL_145",
159
+ "146": "LABEL_146",
160
+ "147": "LABEL_147",
161
+ "148": "LABEL_148",
162
+ "149": "LABEL_149",
163
+ "150": "LABEL_150",
164
+ "151": "LABEL_151",
165
+ "152": "LABEL_152",
166
+ "153": "LABEL_153",
167
+ "154": "LABEL_154",
168
+ "155": "LABEL_155",
169
+ "156": "LABEL_156",
170
+ "157": "LABEL_157",
171
+ "158": "LABEL_158",
172
+ "159": "LABEL_159",
173
+ "160": "LABEL_160",
174
+ "161": "LABEL_161",
175
+ "162": "LABEL_162",
176
+ "163": "LABEL_163",
177
+ "164": "LABEL_164",
178
+ "165": "LABEL_165",
179
+ "166": "LABEL_166",
180
+ "167": "LABEL_167",
181
+ "168": "LABEL_168",
182
+ "169": "LABEL_169",
183
+ "170": "LABEL_170",
184
+ "171": "LABEL_171",
185
+ "172": "LABEL_172",
186
+ "173": "LABEL_173",
187
+ "174": "LABEL_174",
188
+ "175": "LABEL_175",
189
+ "176": "LABEL_176",
190
+ "177": "LABEL_177",
191
+ "178": "LABEL_178",
192
+ "179": "LABEL_179",
193
+ "180": "LABEL_180",
194
+ "181": "LABEL_181",
195
+ "182": "LABEL_182",
196
+ "183": "LABEL_183",
197
+ "184": "LABEL_184",
198
+ "185": "LABEL_185",
199
+ "186": "LABEL_186",
200
+ "187": "LABEL_187",
201
+ "188": "LABEL_188",
202
+ "189": "LABEL_189",
203
+ "190": "LABEL_190",
204
+ "191": "LABEL_191",
205
+ "192": "LABEL_192",
206
+ "193": "LABEL_193",
207
+ "194": "LABEL_194",
208
+ "195": "LABEL_195",
209
+ "196": "LABEL_196",
210
+ "197": "LABEL_197",
211
+ "198": "LABEL_198",
212
+ "199": "LABEL_199",
213
+ "200": "LABEL_200",
214
+ "201": "LABEL_201",
215
+ "202": "LABEL_202",
216
+ "203": "LABEL_203",
217
+ "204": "LABEL_204",
218
+ "205": "LABEL_205",
219
+ "206": "LABEL_206",
220
+ "207": "LABEL_207",
221
+ "208": "LABEL_208",
222
+ "209": "LABEL_209",
223
+ "210": "LABEL_210",
224
+ "211": "LABEL_211",
225
+ "212": "LABEL_212",
226
+ "213": "LABEL_213",
227
+ "214": "LABEL_214",
228
+ "215": "LABEL_215",
229
+ "216": "LABEL_216",
230
+ "217": "LABEL_217",
231
+ "218": "LABEL_218",
232
+ "219": "LABEL_219",
233
+ "220": "LABEL_220",
234
+ "221": "LABEL_221",
235
+ "222": "LABEL_222",
236
+ "223": "LABEL_223",
237
+ "224": "LABEL_224",
238
+ "225": "LABEL_225",
239
+ "226": "LABEL_226",
240
+ "227": "LABEL_227",
241
+ "228": "LABEL_228",
242
+ "229": "LABEL_229",
243
+ "230": "LABEL_230",
244
+ "231": "LABEL_231",
245
+ "232": "LABEL_232",
246
+ "233": "LABEL_233",
247
+ "234": "LABEL_234",
248
+ "235": "LABEL_235",
249
+ "236": "LABEL_236",
250
+ "237": "LABEL_237",
251
+ "238": "LABEL_238",
252
+ "239": "LABEL_239",
253
+ "240": "LABEL_240",
254
+ "241": "LABEL_241",
255
+ "242": "LABEL_242",
256
+ "243": "LABEL_243",
257
+ "244": "LABEL_244",
258
+ "245": "LABEL_245",
259
+ "246": "LABEL_246",
260
+ "247": "LABEL_247",
261
+ "248": "LABEL_248",
262
+ "249": "LABEL_249",
263
+ "250": "LABEL_250",
264
+ "251": "LABEL_251",
265
+ "252": "LABEL_252",
266
+ "253": "LABEL_253",
267
+ "254": "LABEL_254",
268
+ "255": "LABEL_255",
269
+ "256": "LABEL_256",
270
+ "257": "LABEL_257",
271
+ "258": "LABEL_258",
272
+ "259": "LABEL_259",
273
+ "260": "LABEL_260",
274
+ "261": "LABEL_261",
275
+ "262": "LABEL_262",
276
+ "263": "LABEL_263",
277
+ "264": "LABEL_264",
278
+ "265": "LABEL_265",
279
+ "266": "LABEL_266",
280
+ "267": "LABEL_267",
281
+ "268": "LABEL_268",
282
+ "269": "LABEL_269",
283
+ "270": "LABEL_270",
284
+ "271": "LABEL_271",
285
+ "272": "LABEL_272",
286
+ "273": "LABEL_273",
287
+ "274": "LABEL_274",
288
+ "275": "LABEL_275",
289
+ "276": "LABEL_276",
290
+ "277": "LABEL_277",
291
+ "278": "LABEL_278",
292
+ "279": "LABEL_279",
293
+ "280": "LABEL_280",
294
+ "281": "LABEL_281",
295
+ "282": "LABEL_282",
296
+ "283": "LABEL_283",
297
+ "284": "LABEL_284",
298
+ "285": "LABEL_285",
299
+ "286": "LABEL_286",
300
+ "287": "LABEL_287",
301
+ "288": "LABEL_288",
302
+ "289": "LABEL_289",
303
+ "290": "LABEL_290",
304
+ "291": "LABEL_291",
305
+ "292": "LABEL_292",
306
+ "293": "LABEL_293",
307
+ "294": "LABEL_294",
308
+ "295": "LABEL_295",
309
+ "296": "LABEL_296",
310
+ "297": "LABEL_297",
311
+ "298": "LABEL_298",
312
+ "299": "LABEL_299",
313
+ "300": "LABEL_300",
314
+ "301": "LABEL_301",
315
+ "302": "LABEL_302",
316
+ "303": "LABEL_303",
317
+ "304": "LABEL_304",
318
+ "305": "LABEL_305",
319
+ "306": "LABEL_306",
320
+ "307": "LABEL_307",
321
+ "308": "LABEL_308",
322
+ "309": "LABEL_309",
323
+ "310": "LABEL_310",
324
+ "311": "LABEL_311",
325
+ "312": "LABEL_312",
326
+ "313": "LABEL_313",
327
+ "314": "LABEL_314",
328
+ "315": "LABEL_315",
329
+ "316": "LABEL_316",
330
+ "317": "LABEL_317",
331
+ "318": "LABEL_318",
332
+ "319": "LABEL_319",
333
+ "320": "LABEL_320",
334
+ "321": "LABEL_321",
335
+ "322": "LABEL_322",
336
+ "323": "LABEL_323",
337
+ "324": "LABEL_324",
338
+ "325": "LABEL_325",
339
+ "326": "LABEL_326",
340
+ "327": "LABEL_327",
341
+ "328": "LABEL_328",
342
+ "329": "LABEL_329",
343
+ "330": "LABEL_330",
344
+ "331": "LABEL_331",
345
+ "332": "LABEL_332",
346
+ "333": "LABEL_333",
347
+ "334": "LABEL_334",
348
+ "335": "LABEL_335",
349
+ "336": "LABEL_336",
350
+ "337": "LABEL_337",
351
+ "338": "LABEL_338",
352
+ "339": "LABEL_339",
353
+ "340": "LABEL_340",
354
+ "341": "LABEL_341",
355
+ "342": "LABEL_342",
356
+ "343": "LABEL_343",
357
+ "344": "LABEL_344",
358
+ "345": "LABEL_345",
359
+ "346": "LABEL_346",
360
+ "347": "LABEL_347",
361
+ "348": "LABEL_348",
362
+ "349": "LABEL_349",
363
+ "350": "LABEL_350",
364
+ "351": "LABEL_351",
365
+ "352": "LABEL_352",
366
+ "353": "LABEL_353",
367
+ "354": "LABEL_354",
368
+ "355": "LABEL_355",
369
+ "356": "LABEL_356",
370
+ "357": "LABEL_357",
371
+ "358": "LABEL_358",
372
+ "359": "LABEL_359",
373
+ "360": "LABEL_360",
374
+ "361": "LABEL_361",
375
+ "362": "LABEL_362",
376
+ "363": "LABEL_363",
377
+ "364": "LABEL_364",
378
+ "365": "LABEL_365",
379
+ "366": "LABEL_366",
380
+ "367": "LABEL_367",
381
+ "368": "LABEL_368",
382
+ "369": "LABEL_369",
383
+ "370": "LABEL_370",
384
+ "371": "LABEL_371",
385
+ "372": "LABEL_372",
386
+ "373": "LABEL_373",
387
+ "374": "LABEL_374",
388
+ "375": "LABEL_375",
389
+ "376": "LABEL_376",
390
+ "377": "LABEL_377",
391
+ "378": "LABEL_378",
392
+ "379": "LABEL_379",
393
+ "380": "LABEL_380",
394
+ "381": "LABEL_381",
395
+ "382": "LABEL_382",
396
+ "383": "LABEL_383",
397
+ "384": "LABEL_384",
398
+ "385": "LABEL_385",
399
+ "386": "LABEL_386",
400
+ "387": "LABEL_387",
401
+ "388": "LABEL_388",
402
+ "389": "LABEL_389",
403
+ "390": "LABEL_390",
404
+ "391": "LABEL_391",
405
+ "392": "LABEL_392",
406
+ "393": "LABEL_393",
407
+ "394": "LABEL_394",
408
+ "395": "LABEL_395",
409
+ "396": "LABEL_396",
410
+ "397": "LABEL_397",
411
+ "398": "LABEL_398",
412
+ "399": "LABEL_399",
413
+ "400": "LABEL_400",
414
+ "401": "LABEL_401",
415
+ "402": "LABEL_402",
416
+ "403": "LABEL_403",
417
+ "404": "LABEL_404",
418
+ "405": "LABEL_405",
419
+ "406": "LABEL_406",
420
+ "407": "LABEL_407",
421
+ "408": "LABEL_408",
422
+ "409": "LABEL_409",
423
+ "410": "LABEL_410",
424
+ "411": "LABEL_411",
425
+ "412": "LABEL_412",
426
+ "413": "LABEL_413",
427
+ "414": "LABEL_414",
428
+ "415": "LABEL_415",
429
+ "416": "LABEL_416",
430
+ "417": "LABEL_417",
431
+ "418": "LABEL_418",
432
+ "419": "LABEL_419",
433
+ "420": "LABEL_420",
434
+ "421": "LABEL_421",
435
+ "422": "LABEL_422",
436
+ "423": "LABEL_423",
437
+ "424": "LABEL_424",
438
+ "425": "LABEL_425",
439
+ "426": "LABEL_426",
440
+ "427": "LABEL_427",
441
+ "428": "LABEL_428",
442
+ "429": "LABEL_429",
443
+ "430": "LABEL_430",
444
+ "431": "LABEL_431",
445
+ "432": "LABEL_432",
446
+ "433": "LABEL_433",
447
+ "434": "LABEL_434",
448
+ "435": "LABEL_435",
449
+ "436": "LABEL_436",
450
+ "437": "LABEL_437",
451
+ "438": "LABEL_438",
452
+ "439": "LABEL_439",
453
+ "440": "LABEL_440",
454
+ "441": "LABEL_441",
455
+ "442": "LABEL_442",
456
+ "443": "LABEL_443",
457
+ "444": "LABEL_444",
458
+ "445": "LABEL_445",
459
+ "446": "LABEL_446",
460
+ "447": "LABEL_447",
461
+ "448": "LABEL_448",
462
+ "449": "LABEL_449",
463
+ "450": "LABEL_450",
464
+ "451": "LABEL_451",
465
+ "452": "LABEL_452",
466
+ "453": "LABEL_453",
467
+ "454": "LABEL_454",
468
+ "455": "LABEL_455",
469
+ "456": "LABEL_456",
470
+ "457": "LABEL_457",
471
+ "458": "LABEL_458",
472
+ "459": "LABEL_459",
473
+ "460": "LABEL_460",
474
+ "461": "LABEL_461",
475
+ "462": "LABEL_462",
476
+ "463": "LABEL_463",
477
+ "464": "LABEL_464",
478
+ "465": "LABEL_465",
479
+ "466": "LABEL_466",
480
+ "467": "LABEL_467",
481
+ "468": "LABEL_468",
482
+ "469": "LABEL_469",
483
+ "470": "LABEL_470",
484
+ "471": "LABEL_471",
485
+ "472": "LABEL_472",
486
+ "473": "LABEL_473",
487
+ "474": "LABEL_474",
488
+ "475": "LABEL_475",
489
+ "476": "LABEL_476",
490
+ "477": "LABEL_477",
491
+ "478": "LABEL_478",
492
+ "479": "LABEL_479",
493
+ "480": "LABEL_480",
494
+ "481": "LABEL_481",
495
+ "482": "LABEL_482",
496
+ "483": "LABEL_483",
497
+ "484": "LABEL_484",
498
+ "485": "LABEL_485",
499
+ "486": "LABEL_486",
500
+ "487": "LABEL_487",
501
+ "488": "LABEL_488",
502
+ "489": "LABEL_489",
503
+ "490": "LABEL_490",
504
+ "491": "LABEL_491",
505
+ "492": "LABEL_492",
506
+ "493": "LABEL_493",
507
+ "494": "LABEL_494",
508
+ "495": "LABEL_495",
509
+ "496": "LABEL_496",
510
+ "497": "LABEL_497",
511
+ "498": "LABEL_498",
512
+ "499": "LABEL_499",
513
+ "500": "LABEL_500",
514
+ "501": "LABEL_501"
515
+ },
516
+ "initializer_range": 0.02,
517
+ "intermediate_size": 4096,
518
+ "label2id": {
519
+ "LABEL_0": 0,
520
+ "LABEL_1": 1,
521
+ "LABEL_10": 10,
522
+ "LABEL_100": 100,
523
+ "LABEL_101": 101,
524
+ "LABEL_102": 102,
525
+ "LABEL_103": 103,
526
+ "LABEL_104": 104,
527
+ "LABEL_105": 105,
528
+ "LABEL_106": 106,
529
+ "LABEL_107": 107,
530
+ "LABEL_108": 108,
531
+ "LABEL_109": 109,
532
+ "LABEL_11": 11,
533
+ "LABEL_110": 110,
534
+ "LABEL_111": 111,
535
+ "LABEL_112": 112,
536
+ "LABEL_113": 113,
537
+ "LABEL_114": 114,
538
+ "LABEL_115": 115,
539
+ "LABEL_116": 116,
540
+ "LABEL_117": 117,
541
+ "LABEL_118": 118,
542
+ "LABEL_119": 119,
543
+ "LABEL_12": 12,
544
+ "LABEL_120": 120,
545
+ "LABEL_121": 121,
546
+ "LABEL_122": 122,
547
+ "LABEL_123": 123,
548
+ "LABEL_124": 124,
549
+ "LABEL_125": 125,
550
+ "LABEL_126": 126,
551
+ "LABEL_127": 127,
552
+ "LABEL_128": 128,
553
+ "LABEL_129": 129,
554
+ "LABEL_13": 13,
555
+ "LABEL_130": 130,
556
+ "LABEL_131": 131,
557
+ "LABEL_132": 132,
558
+ "LABEL_133": 133,
559
+ "LABEL_134": 134,
560
+ "LABEL_135": 135,
561
+ "LABEL_136": 136,
562
+ "LABEL_137": 137,
563
+ "LABEL_138": 138,
564
+ "LABEL_139": 139,
565
+ "LABEL_14": 14,
566
+ "LABEL_140": 140,
567
+ "LABEL_141": 141,
568
+ "LABEL_142": 142,
569
+ "LABEL_143": 143,
570
+ "LABEL_144": 144,
571
+ "LABEL_145": 145,
572
+ "LABEL_146": 146,
573
+ "LABEL_147": 147,
574
+ "LABEL_148": 148,
575
+ "LABEL_149": 149,
576
+ "LABEL_15": 15,
577
+ "LABEL_150": 150,
578
+ "LABEL_151": 151,
579
+ "LABEL_152": 152,
580
+ "LABEL_153": 153,
581
+ "LABEL_154": 154,
582
+ "LABEL_155": 155,
583
+ "LABEL_156": 156,
584
+ "LABEL_157": 157,
585
+ "LABEL_158": 158,
586
+ "LABEL_159": 159,
587
+ "LABEL_16": 16,
588
+ "LABEL_160": 160,
589
+ "LABEL_161": 161,
590
+ "LABEL_162": 162,
591
+ "LABEL_163": 163,
592
+ "LABEL_164": 164,
593
+ "LABEL_165": 165,
594
+ "LABEL_166": 166,
595
+ "LABEL_167": 167,
596
+ "LABEL_168": 168,
597
+ "LABEL_169": 169,
598
+ "LABEL_17": 17,
599
+ "LABEL_170": 170,
600
+ "LABEL_171": 171,
601
+ "LABEL_172": 172,
602
+ "LABEL_173": 173,
603
+ "LABEL_174": 174,
604
+ "LABEL_175": 175,
605
+ "LABEL_176": 176,
606
+ "LABEL_177": 177,
607
+ "LABEL_178": 178,
608
+ "LABEL_179": 179,
609
+ "LABEL_18": 18,
610
+ "LABEL_180": 180,
611
+ "LABEL_181": 181,
612
+ "LABEL_182": 182,
613
+ "LABEL_183": 183,
614
+ "LABEL_184": 184,
615
+ "LABEL_185": 185,
616
+ "LABEL_186": 186,
617
+ "LABEL_187": 187,
618
+ "LABEL_188": 188,
619
+ "LABEL_189": 189,
620
+ "LABEL_19": 19,
621
+ "LABEL_190": 190,
622
+ "LABEL_191": 191,
623
+ "LABEL_192": 192,
624
+ "LABEL_193": 193,
625
+ "LABEL_194": 194,
626
+ "LABEL_195": 195,
627
+ "LABEL_196": 196,
628
+ "LABEL_197": 197,
629
+ "LABEL_198": 198,
630
+ "LABEL_199": 199,
631
+ "LABEL_2": 2,
632
+ "LABEL_20": 20,
633
+ "LABEL_200": 200,
634
+ "LABEL_201": 201,
635
+ "LABEL_202": 202,
636
+ "LABEL_203": 203,
637
+ "LABEL_204": 204,
638
+ "LABEL_205": 205,
639
+ "LABEL_206": 206,
640
+ "LABEL_207": 207,
641
+ "LABEL_208": 208,
642
+ "LABEL_209": 209,
643
+ "LABEL_21": 21,
644
+ "LABEL_210": 210,
645
+ "LABEL_211": 211,
646
+ "LABEL_212": 212,
647
+ "LABEL_213": 213,
648
+ "LABEL_214": 214,
649
+ "LABEL_215": 215,
650
+ "LABEL_216": 216,
651
+ "LABEL_217": 217,
652
+ "LABEL_218": 218,
653
+ "LABEL_219": 219,
654
+ "LABEL_22": 22,
655
+ "LABEL_220": 220,
656
+ "LABEL_221": 221,
657
+ "LABEL_222": 222,
658
+ "LABEL_223": 223,
659
+ "LABEL_224": 224,
660
+ "LABEL_225": 225,
661
+ "LABEL_226": 226,
662
+ "LABEL_227": 227,
663
+ "LABEL_228": 228,
664
+ "LABEL_229": 229,
665
+ "LABEL_23": 23,
666
+ "LABEL_230": 230,
667
+ "LABEL_231": 231,
668
+ "LABEL_232": 232,
669
+ "LABEL_233": 233,
670
+ "LABEL_234": 234,
671
+ "LABEL_235": 235,
672
+ "LABEL_236": 236,
673
+ "LABEL_237": 237,
674
+ "LABEL_238": 238,
675
+ "LABEL_239": 239,
676
+ "LABEL_24": 24,
677
+ "LABEL_240": 240,
678
+ "LABEL_241": 241,
679
+ "LABEL_242": 242,
680
+ "LABEL_243": 243,
681
+ "LABEL_244": 244,
682
+ "LABEL_245": 245,
683
+ "LABEL_246": 246,
684
+ "LABEL_247": 247,
685
+ "LABEL_248": 248,
686
+ "LABEL_249": 249,
687
+ "LABEL_25": 25,
688
+ "LABEL_250": 250,
689
+ "LABEL_251": 251,
690
+ "LABEL_252": 252,
691
+ "LABEL_253": 253,
692
+ "LABEL_254": 254,
693
+ "LABEL_255": 255,
694
+ "LABEL_256": 256,
695
+ "LABEL_257": 257,
696
+ "LABEL_258": 258,
697
+ "LABEL_259": 259,
698
+ "LABEL_26": 26,
699
+ "LABEL_260": 260,
700
+ "LABEL_261": 261,
701
+ "LABEL_262": 262,
702
+ "LABEL_263": 263,
703
+ "LABEL_264": 264,
704
+ "LABEL_265": 265,
705
+ "LABEL_266": 266,
706
+ "LABEL_267": 267,
707
+ "LABEL_268": 268,
708
+ "LABEL_269": 269,
709
+ "LABEL_27": 27,
710
+ "LABEL_270": 270,
711
+ "LABEL_271": 271,
712
+ "LABEL_272": 272,
713
+ "LABEL_273": 273,
714
+ "LABEL_274": 274,
715
+ "LABEL_275": 275,
716
+ "LABEL_276": 276,
717
+ "LABEL_277": 277,
718
+ "LABEL_278": 278,
719
+ "LABEL_279": 279,
720
+ "LABEL_28": 28,
721
+ "LABEL_280": 280,
722
+ "LABEL_281": 281,
723
+ "LABEL_282": 282,
724
+ "LABEL_283": 283,
725
+ "LABEL_284": 284,
726
+ "LABEL_285": 285,
727
+ "LABEL_286": 286,
728
+ "LABEL_287": 287,
729
+ "LABEL_288": 288,
730
+ "LABEL_289": 289,
731
+ "LABEL_29": 29,
732
+ "LABEL_290": 290,
733
+ "LABEL_291": 291,
734
+ "LABEL_292": 292,
735
+ "LABEL_293": 293,
736
+ "LABEL_294": 294,
737
+ "LABEL_295": 295,
738
+ "LABEL_296": 296,
739
+ "LABEL_297": 297,
740
+ "LABEL_298": 298,
741
+ "LABEL_299": 299,
742
+ "LABEL_3": 3,
743
+ "LABEL_30": 30,
744
+ "LABEL_300": 300,
745
+ "LABEL_301": 301,
746
+ "LABEL_302": 302,
747
+ "LABEL_303": 303,
748
+ "LABEL_304": 304,
749
+ "LABEL_305": 305,
750
+ "LABEL_306": 306,
751
+ "LABEL_307": 307,
752
+ "LABEL_308": 308,
753
+ "LABEL_309": 309,
754
+ "LABEL_31": 31,
755
+ "LABEL_310": 310,
756
+ "LABEL_311": 311,
757
+ "LABEL_312": 312,
758
+ "LABEL_313": 313,
759
+ "LABEL_314": 314,
760
+ "LABEL_315": 315,
761
+ "LABEL_316": 316,
762
+ "LABEL_317": 317,
763
+ "LABEL_318": 318,
764
+ "LABEL_319": 319,
765
+ "LABEL_32": 32,
766
+ "LABEL_320": 320,
767
+ "LABEL_321": 321,
768
+ "LABEL_322": 322,
769
+ "LABEL_323": 323,
770
+ "LABEL_324": 324,
771
+ "LABEL_325": 325,
772
+ "LABEL_326": 326,
773
+ "LABEL_327": 327,
774
+ "LABEL_328": 328,
775
+ "LABEL_329": 329,
776
+ "LABEL_33": 33,
777
+ "LABEL_330": 330,
778
+ "LABEL_331": 331,
779
+ "LABEL_332": 332,
780
+ "LABEL_333": 333,
781
+ "LABEL_334": 334,
782
+ "LABEL_335": 335,
783
+ "LABEL_336": 336,
784
+ "LABEL_337": 337,
785
+ "LABEL_338": 338,
786
+ "LABEL_339": 339,
787
+ "LABEL_34": 34,
788
+ "LABEL_340": 340,
789
+ "LABEL_341": 341,
790
+ "LABEL_342": 342,
791
+ "LABEL_343": 343,
792
+ "LABEL_344": 344,
793
+ "LABEL_345": 345,
794
+ "LABEL_346": 346,
795
+ "LABEL_347": 347,
796
+ "LABEL_348": 348,
797
+ "LABEL_349": 349,
798
+ "LABEL_35": 35,
799
+ "LABEL_350": 350,
800
+ "LABEL_351": 351,
801
+ "LABEL_352": 352,
802
+ "LABEL_353": 353,
803
+ "LABEL_354": 354,
804
+ "LABEL_355": 355,
805
+ "LABEL_356": 356,
806
+ "LABEL_357": 357,
807
+ "LABEL_358": 358,
808
+ "LABEL_359": 359,
809
+ "LABEL_36": 36,
810
+ "LABEL_360": 360,
811
+ "LABEL_361": 361,
812
+ "LABEL_362": 362,
813
+ "LABEL_363": 363,
814
+ "LABEL_364": 364,
815
+ "LABEL_365": 365,
816
+ "LABEL_366": 366,
817
+ "LABEL_367": 367,
818
+ "LABEL_368": 368,
819
+ "LABEL_369": 369,
820
+ "LABEL_37": 37,
821
+ "LABEL_370": 370,
822
+ "LABEL_371": 371,
823
+ "LABEL_372": 372,
824
+ "LABEL_373": 373,
825
+ "LABEL_374": 374,
826
+ "LABEL_375": 375,
827
+ "LABEL_376": 376,
828
+ "LABEL_377": 377,
829
+ "LABEL_378": 378,
830
+ "LABEL_379": 379,
831
+ "LABEL_38": 38,
832
+ "LABEL_380": 380,
833
+ "LABEL_381": 381,
834
+ "LABEL_382": 382,
835
+ "LABEL_383": 383,
836
+ "LABEL_384": 384,
837
+ "LABEL_385": 385,
838
+ "LABEL_386": 386,
839
+ "LABEL_387": 387,
840
+ "LABEL_388": 388,
841
+ "LABEL_389": 389,
842
+ "LABEL_39": 39,
843
+ "LABEL_390": 390,
844
+ "LABEL_391": 391,
845
+ "LABEL_392": 392,
846
+ "LABEL_393": 393,
847
+ "LABEL_394": 394,
848
+ "LABEL_395": 395,
849
+ "LABEL_396": 396,
850
+ "LABEL_397": 397,
851
+ "LABEL_398": 398,
852
+ "LABEL_399": 399,
853
+ "LABEL_4": 4,
854
+ "LABEL_40": 40,
855
+ "LABEL_400": 400,
856
+ "LABEL_401": 401,
857
+ "LABEL_402": 402,
858
+ "LABEL_403": 403,
859
+ "LABEL_404": 404,
860
+ "LABEL_405": 405,
861
+ "LABEL_406": 406,
862
+ "LABEL_407": 407,
863
+ "LABEL_408": 408,
864
+ "LABEL_409": 409,
865
+ "LABEL_41": 41,
866
+ "LABEL_410": 410,
867
+ "LABEL_411": 411,
868
+ "LABEL_412": 412,
869
+ "LABEL_413": 413,
870
+ "LABEL_414": 414,
871
+ "LABEL_415": 415,
872
+ "LABEL_416": 416,
873
+ "LABEL_417": 417,
874
+ "LABEL_418": 418,
875
+ "LABEL_419": 419,
876
+ "LABEL_42": 42,
877
+ "LABEL_420": 420,
878
+ "LABEL_421": 421,
879
+ "LABEL_422": 422,
880
+ "LABEL_423": 423,
881
+ "LABEL_424": 424,
882
+ "LABEL_425": 425,
883
+ "LABEL_426": 426,
884
+ "LABEL_427": 427,
885
+ "LABEL_428": 428,
886
+ "LABEL_429": 429,
887
+ "LABEL_43": 43,
888
+ "LABEL_430": 430,
889
+ "LABEL_431": 431,
890
+ "LABEL_432": 432,
891
+ "LABEL_433": 433,
892
+ "LABEL_434": 434,
893
+ "LABEL_435": 435,
894
+ "LABEL_436": 436,
895
+ "LABEL_437": 437,
896
+ "LABEL_438": 438,
897
+ "LABEL_439": 439,
898
+ "LABEL_44": 44,
899
+ "LABEL_440": 440,
900
+ "LABEL_441": 441,
901
+ "LABEL_442": 442,
902
+ "LABEL_443": 443,
903
+ "LABEL_444": 444,
904
+ "LABEL_445": 445,
905
+ "LABEL_446": 446,
906
+ "LABEL_447": 447,
907
+ "LABEL_448": 448,
908
+ "LABEL_449": 449,
909
+ "LABEL_45": 45,
910
+ "LABEL_450": 450,
911
+ "LABEL_451": 451,
912
+ "LABEL_452": 452,
913
+ "LABEL_453": 453,
914
+ "LABEL_454": 454,
915
+ "LABEL_455": 455,
916
+ "LABEL_456": 456,
917
+ "LABEL_457": 457,
918
+ "LABEL_458": 458,
919
+ "LABEL_459": 459,
920
+ "LABEL_46": 46,
921
+ "LABEL_460": 460,
922
+ "LABEL_461": 461,
923
+ "LABEL_462": 462,
924
+ "LABEL_463": 463,
925
+ "LABEL_464": 464,
926
+ "LABEL_465": 465,
927
+ "LABEL_466": 466,
928
+ "LABEL_467": 467,
929
+ "LABEL_468": 468,
930
+ "LABEL_469": 469,
931
+ "LABEL_47": 47,
932
+ "LABEL_470": 470,
933
+ "LABEL_471": 471,
934
+ "LABEL_472": 472,
935
+ "LABEL_473": 473,
936
+ "LABEL_474": 474,
937
+ "LABEL_475": 475,
938
+ "LABEL_476": 476,
939
+ "LABEL_477": 477,
940
+ "LABEL_478": 478,
941
+ "LABEL_479": 479,
942
+ "LABEL_48": 48,
943
+ "LABEL_480": 480,
944
+ "LABEL_481": 481,
945
+ "LABEL_482": 482,
946
+ "LABEL_483": 483,
947
+ "LABEL_484": 484,
948
+ "LABEL_485": 485,
949
+ "LABEL_486": 486,
950
+ "LABEL_487": 487,
951
+ "LABEL_488": 488,
952
+ "LABEL_489": 489,
953
+ "LABEL_49": 49,
954
+ "LABEL_490": 490,
955
+ "LABEL_491": 491,
956
+ "LABEL_492": 492,
957
+ "LABEL_493": 493,
958
+ "LABEL_494": 494,
959
+ "LABEL_495": 495,
960
+ "LABEL_496": 496,
961
+ "LABEL_497": 497,
962
+ "LABEL_498": 498,
963
+ "LABEL_499": 499,
964
+ "LABEL_5": 5,
965
+ "LABEL_50": 50,
966
+ "LABEL_500": 500,
967
+ "LABEL_501": 501,
968
+ "LABEL_51": 51,
969
+ "LABEL_52": 52,
970
+ "LABEL_53": 53,
971
+ "LABEL_54": 54,
972
+ "LABEL_55": 55,
973
+ "LABEL_56": 56,
974
+ "LABEL_57": 57,
975
+ "LABEL_58": 58,
976
+ "LABEL_59": 59,
977
+ "LABEL_6": 6,
978
+ "LABEL_60": 60,
979
+ "LABEL_61": 61,
980
+ "LABEL_62": 62,
981
+ "LABEL_63": 63,
982
+ "LABEL_64": 64,
983
+ "LABEL_65": 65,
984
+ "LABEL_66": 66,
985
+ "LABEL_67": 67,
986
+ "LABEL_68": 68,
987
+ "LABEL_69": 69,
988
+ "LABEL_7": 7,
989
+ "LABEL_70": 70,
990
+ "LABEL_71": 71,
991
+ "LABEL_72": 72,
992
+ "LABEL_73": 73,
993
+ "LABEL_74": 74,
994
+ "LABEL_75": 75,
995
+ "LABEL_76": 76,
996
+ "LABEL_77": 77,
997
+ "LABEL_78": 78,
998
+ "LABEL_79": 79,
999
+ "LABEL_8": 8,
1000
+ "LABEL_80": 80,
1001
+ "LABEL_81": 81,
1002
+ "LABEL_82": 82,
1003
+ "LABEL_83": 83,
1004
+ "LABEL_84": 84,
1005
+ "LABEL_85": 85,
1006
+ "LABEL_86": 86,
1007
+ "LABEL_87": 87,
1008
+ "LABEL_88": 88,
1009
+ "LABEL_89": 89,
1010
+ "LABEL_9": 9,
1011
+ "LABEL_90": 90,
1012
+ "LABEL_91": 91,
1013
+ "LABEL_92": 92,
1014
+ "LABEL_93": 93,
1015
+ "LABEL_94": 94,
1016
+ "LABEL_95": 95,
1017
+ "LABEL_96": 96,
1018
+ "LABEL_97": 97,
1019
+ "LABEL_98": 98,
1020
+ "LABEL_99": 99
1021
+ },
1022
+ "layer_norm_eps": 1e-05,
1023
+ "max_position_embeddings": 514,
1024
+ "model_type": "xlm-roberta",
1025
+ "num_attention_heads": 16,
1026
+ "num_hidden_layers": 24,
1027
+ "output_past": true,
1028
+ "pad_token_id": 1,
1029
+ "position_embedding_type": "absolute",
1030
+ "problem_type": "single_label_classification",
1031
+ "torch_dtype": "float32",
1032
+ "transformers_version": "4.52.4",
1033
+ "type_vocab_size": 1,
1034
+ "use_cache": true,
1035
+ "vocab_size": 250002
1036
+ }
data/cno11_notas.csv ADDED
The diff for this file is too large to render. See raw diff
 
data/defaults_session_state.json ADDED
@@ -0,0 +1,5 @@
+ {
+   "past": [],
+   "generated": [],
+   "selections": {}
+ }
data/idxs.csv ADDED
@@ -0,0 +1,503 @@
1
+ CNO,label,idx
2
+ C4121,LABEL_0,0
3
+ C5710,LABEL_1,1
4
+ C1111,LABEL_2,2
5
+ C2122,LABEL_3,3
6
+ C5120,LABEL_4,4
7
+ C3202,LABEL_5,5
8
+ C5220,LABEL_6,6
9
+ C4442,LABEL_7,7
10
+ C2511,LABEL_8,8
11
+ C2230,LABEL_9,9
12
+ C7521,LABEL_10,10
13
+ C9210,LABEL_11,11
14
+ C1509,LABEL_12,12
15
+ C8432,LABEL_13,13
16
+ C9432,LABEL_14,14
17
+ C9100,LABEL_15,15
18
+ C1313,LABEL_16,16
19
+ C5821,LABEL_17,17
20
+ C3734,LABEL_18,18
21
+ C7121,LABEL_19,19
22
+ C5999,LABEL_20,20
23
+ C4111,LABEL_21,21
24
+ C8420,LABEL_22,22
25
+ C9811,LABEL_23,23
26
+ C5110,LABEL_24,24
27
+ C9520,LABEL_25,25
28
+ C9700,LABEL_26,26
29
+ C5300,LABEL_27,27
30
+ C5621,LABEL_28,28
31
+ C3326,LABEL_29,29
32
+ C3739,LABEL_30,30
33
+ C2623,LABEL_31,31
34
+ C5500,LABEL_32,32
35
+ C7510,LABEL_33,33
36
+ C7191,LABEL_34,34
37
+ C6120,LABEL_35,35
38
+ C5430,LABEL_36,36
39
+ C1411,LABEL_37,37
40
+ C8170,LABEL_38,38
41
+ C9820,LABEL_39,39
42
+ C2599,LABEL_40,40
43
+ C5932,LABEL_41,41
44
+ C3613,LABEL_42,42
45
+ C2425,LABEL_43,43
46
+ C7705,LABEL_44,44
47
+ C5721,LABEL_45,45
48
+ C5000,LABEL_46,46
49
+ C2112,LABEL_47,47
50
+ C7323,LABEL_48,48
51
+ C4221,LABEL_49,49
52
+ C2240,LABEL_50,50
53
+ C5629,LABEL_51,51
54
+ C3141,LABEL_52,52
55
+ C2824,LABEL_53,53
56
+ C9511,LABEL_54,54
57
+ C7613,LABEL_55,55
58
+ C3510,LABEL_56,56
59
+ C3621,LABEL_57,57
60
+ C5831,LABEL_58,58
61
+ C3731,LABEL_59,59
62
+ C7402,LABEL_60,60
63
+ C3534,LABEL_61,61
64
+ C5910,LABEL_62,62
65
+ C8411,LABEL_63,63
66
+ C2611,LABEL_64,64
67
+ C7231,LABEL_65,65
68
+ C1432,LABEL_66,66
69
+ C4422,LABEL_67,67
70
+ C7704,LABEL_68,68
71
+ C9310,LABEL_69,69
72
+ C5825,LABEL_70,70
73
+ C3123,LABEL_71,71
74
+ C7313,LABEL_72,72
75
+ C4112,LABEL_73,73
76
+ C2329,LABEL_74,74
77
+ C7703,LABEL_75,75
78
+ C2251,LABEL_76,76
79
+ C2640,LABEL_77,77
80
+ C5210,LABEL_78,78
81
+ C2451,LABEL_79,79
82
+ C7312,LABEL_80,80
83
+ C3531,LABEL_81,81
84
+ C5840,LABEL_82,82
85
+ C5812,LABEL_83,83
86
+ C2140,LABEL_84,84
87
+ C8412,LABEL_85,85
88
+ C5420,LABEL_86,86
89
+ C4210,LABEL_87,87
90
+ C3811,LABEL_88,88
91
+ C3713,LABEL_89,89
92
+ C4309,LABEL_90,90
93
+ C9431,LABEL_91,91
94
+ C8332,LABEL_92,92
95
+ C3154,LABEL_93,93
96
+ C2612,LABEL_94,94
97
+ C2462,LABEL_95,95
98
+ C7250,LABEL_96,96
99
+ C0020,LABEL_97,97
100
+ C8199,LABEL_98,98
101
+ C4421,LABEL_99,99
102
+ C3160,LABEL_100,100
103
+ C5611,LABEL_101,101
104
+ C5612,LABEL_102,102
105
+ C2622,LABEL_103,103
106
+ C3733,LABEL_104,104
107
+ C9443,LABEL_105,105
108
+ C7401,LABEL_106,106
109
+ C9602,LABEL_107,107
110
+ C1221,LABEL_108,108
111
+ C7707,LABEL_109,109
112
+ C4500,LABEL_110,110
113
+ C7533,LABEL_111,111
114
+ C4123,LABEL_112,112
115
+ C3833,LABEL_113,113
116
+ C2624,LABEL_114,114
117
+ C7403,LABEL_115,115
118
+ C7531,LABEL_116,116
119
+ C7240,LABEL_117,117
120
+ C2152,LABEL_118,118
121
+ C3522,LABEL_119,119
122
+ C1422,LABEL_120,120
123
+ C3125,LABEL_121,121
124
+ C7131,LABEL_122,122
125
+ C7193,LABEL_123,123
126
+ C5833,LABEL_124,124
127
+ C2722,LABEL_125,125
128
+ C3723,LABEL_126,126
129
+ C6410,LABEL_127,127
130
+ C1329,LABEL_128,128
131
+ C1419,LABEL_129,129
132
+ C4412,LABEL_130,130
133
+ C3401,LABEL_131,131
134
+ C9221,LABEL_132,132
135
+ C2155,LABEL_133,133
136
+ C9601,LABEL_134,134
137
+ C2220,LABEL_135,135
138
+ C5811,LABEL_136,136
139
+ C5931,LABEL_137,137
140
+ C5892,LABEL_138,138
141
+ C3724,LABEL_139,139
142
+ C2130,LABEL_140,140
143
+ C8431,LABEL_141,141
144
+ C2810,LABEL_142,142
145
+ C2151,LABEL_143,143
146
+ C9543,LABEL_144,144
147
+ C2713,LABEL_145,145
148
+ C2922,LABEL_146,146
149
+ C3715,LABEL_147,147
150
+ C5822,LABEL_148,148
151
+ C7322,LABEL_149,149
152
+ C2412,LABEL_150,150
153
+ C8331,LABEL_151,151
154
+ C6110,LABEL_152,152
155
+ C5499,LABEL_153,153
156
+ C7199,LABEL_154,154
157
+ C1326,LABEL_155,155
158
+ C2252,LABEL_156,156
159
+ C7221,LABEL_157,157
160
+ C5923,LABEL_158,158
161
+ C2111,LABEL_159,159
162
+ C5921,LABEL_160,160
163
+ C4423,LABEL_161,161
164
+ C4223,LABEL_162,162
165
+ C2323,LABEL_163,163
166
+ C1315,LABEL_164,164
167
+ C1212,LABEL_165,165
168
+ C3831,LABEL_166,166
169
+ C7132,LABEL_167,167
170
+ C3532,LABEL_168,168
171
+ C2473,LABEL_169,169
172
+ C3316,LABEL_170,170
173
+ C2932,LABEL_171,171
174
+ C2443,LABEL_172,172
175
+ C3142,LABEL_173,173
176
+ C2823,LABEL_174,174
177
+ C2424,LABEL_175,175
178
+ C2934,LABEL_176,176
179
+ C2652,LABEL_177,177
180
+ C2435,LABEL_178,178
181
+ C7709,LABEL_179,179
182
+ C5894,LABEL_180,180
183
+ C3321,LABEL_181,181
184
+ C2311,LABEL_182,182
185
+ C3722,LABEL_183,183
186
+ C3539,LABEL_184,184
187
+ C2321,LABEL_185,185
188
+ C2431,LABEL_186,186
189
+ C3313,LABEL_187,187
190
+ C1421,LABEL_188,188
191
+ C2210,LABEL_189,189
192
+ C1327,LABEL_190,190
193
+ C2481,LABEL_191,191
194
+ C3732,LABEL_192,192
195
+ C9441,LABEL_193,193
196
+ C7891,LABEL_194,194
197
+ C7232,LABEL_195,195
198
+ C5622,LABEL_196,196
199
+ C5993,LABEL_197,197
200
+ C8114,LABEL_198,198
201
+ C4411,LABEL_199,199
202
+ C9442,LABEL_200,200
203
+ C6422,LABEL_201,201
204
+ C5823,LABEL_202,202
205
+ C3611,LABEL_203,203
206
+ C8193,LABEL_204,204
207
+ C1501,LABEL_205,205
208
+ C9512,LABEL_206,206
209
+ C7404,LABEL_207,207
210
+ C8209,LABEL_208,208
211
+ C0011,LABEL_209,209
212
+ C5992,LABEL_210,210
213
+ C3820,LABEL_211,211
214
+ C3129,LABEL_212,212
215
+ C3405,LABEL_213,213
216
+ C8440,LABEL_214,214
217
+ C1322,LABEL_215,215
218
+ C2441,LABEL_216,216
219
+ C2121,LABEL_217,217
220
+ C2923,LABEL_218,218
221
+ C7111,LABEL_219,219
222
+ C3314,LABEL_220,220
223
+ C3711,LABEL_221,221
224
+ C2484,LABEL_222,222
225
+ C4113,LABEL_223,223
226
+ C3152,LABEL_224,224
227
+ C9603,LABEL_225,225
228
+ C5412,LABEL_226,226
229
+ C3521,LABEL_227,227
230
+ C3812,LABEL_228,228
231
+ C2931,LABEL_229,229
232
+ C2442,LABEL_230,230
233
+ C2432,LABEL_231,231
234
+ C1112,LABEL_232,232
235
+ C3813,LABEL_233,233
236
+ C7701,LABEL_234,234
237
+ C2469,LABEL_235,235
238
+ C3324,LABEL_236,236
239
+ C5722,LABEL_237,237
240
+ C5824,LABEL_238,238
241
+ C2156,LABEL_239,239
242
+ C2421,LABEL_240,240
243
+ C1211,LABEL_241,241
244
+ C2712,LABEL_242,242
245
+ C4430,LABEL_243,243
246
+ C2322,LABEL_244,244
247
+ C2651,LABEL_245,245
248
+ C3132,LABEL_246,246
249
+ C9420,LABEL_247,247
250
+ C3325,LABEL_248,248
251
+ C7211,LABEL_249,249
252
+ C5899,LABEL_250,250
253
+ C9222,LABEL_251,251
254
+ C3126,LABEL_252,252
255
+ C4446,LABEL_253,253
256
+ C5942,LABEL_254,254
257
+ C2422,LABEL_255,255
258
+ C4424,LABEL_256,256
259
+ C4301,LABEL_257,257
260
+ C1325,LABEL_258,258
261
+ C2453,LABEL_259,259
262
+ C3110,LABEL_260,260
263
+ C1120,LABEL_261,261
264
+ C1222,LABEL_262,262
265
+ C2426,LABEL_263,263
266
+ C7522,LABEL_264,264
267
+ C7314,LABEL_265,265
268
+ C2471,LABEL_266,266
269
+ C7212,LABEL_267,267
270
+ C2159,LABEL_268,268
271
+ C3124,LABEL_269,269
272
+ C2154,LABEL_270,270
273
+ C7315,LABEL_271,271
274
+ C8340,LABEL_272,272
275
+ C1316,LABEL_273,273
276
+ C3535,LABEL_274,274
277
+ C3203,LABEL_275,275
278
+ C5493,LABEL_276,276
279
+ C2416,LABEL_277,277
280
+ C9229,LABEL_278,278
281
+ C3153,LABEL_279,279
282
+ C2434,LABEL_280,280
283
+ C4441,LABEL_281,281
284
+ C2935,LABEL_282,282
285
+ C1323,LABEL_283,283
286
+ C3131,LABEL_284,284
287
+ C3533,LABEL_285,285
288
+ C3721,LABEL_286,286
289
+ C3155,LABEL_287,287
290
+ C6202,LABEL_288,288
291
+ C2326,LABEL_289,289
292
+ C0012,LABEL_290,290
293
+ C3143,LABEL_291,291
294
+ C3614,LABEL_292,292
295
+ C2653,LABEL_293,293
296
+ C5492,LABEL_294,294
297
+ C2613,LABEL_295,295
298
+ C2433,LABEL_296,296
299
+ C6205,LABEL_297,297
300
+ C2592,LABEL_298,298
301
+ C3523,LABEL_299,299
302
+ C2482,LABEL_300,300
303
+ C2821,LABEL_301,301
304
+ C3317,LABEL_302,302
305
+ C2921,LABEL_303,303
306
+ C8160,LABEL_304,304
307
+ C2325,LABEL_305,305
308
+ C7834,LABEL_306,306
309
+ C2324,LABEL_307,307
310
+ C3402,LABEL_308,308
311
+ C3312,LABEL_309,309
312
+ C4222,LABEL_310,310
313
+ C6300,LABEL_311,311
314
+ C4122,LABEL_312,312
315
+ C2830,LABEL_313,313
316
+ C8333,LABEL_314,314
317
+ C1312,LABEL_315,315
318
+ C7292,LABEL_316,316
319
+ C2723,LABEL_317,317
320
+ C8144,LABEL_318,318
321
+ C7293,LABEL_319,319
322
+ C3122,LABEL_320,320
323
+ C1311,LABEL_321,321
324
+ C2630,LABEL_322,322
325
+ C2621,LABEL_323,323
326
+ C3832,LABEL_324,324
327
+ C3403,LABEL_325,325
328
+ C9434,LABEL_326,326
329
+ C2423,LABEL_327,327
330
+ C3329,LABEL_328,328
331
+ C1321,LABEL_329,329
332
+ C7622,LABEL_330,330
333
+ C7294,LABEL_331,331
334
+ C9490,LABEL_332,332
335
+ C2312,LABEL_333,333
336
+ C7702,LABEL_334,334
337
+ C2513,LABEL_335,335
338
+ C6423,LABEL_336,336
339
+ C2911,LABEL_337,337
340
+ C1429,LABEL_338,338
341
+ C2483,LABEL_339,339
342
+ C7618,LABEL_340,340
343
+ C7835,LABEL_341,341
344
+ C6430,LABEL_342,342
345
+ C7708,LABEL_343,343
346
+ C7706,LABEL_344,344
347
+ C7837,LABEL_345,345
348
+ C5893,LABEL_346,346
349
+ C2123,LABEL_347,347
350
+ C2719,LABEL_348,348
351
+ C1431,LABEL_349,349
352
+ C7616,LABEL_350,350
353
+ C5991,LABEL_351,351
354
+ C2625,LABEL_352,352
355
+ C3404,LABEL_353,353
356
+ C3631,LABEL_354,354
357
+ C7820,LABEL_355,355
358
+ C2158,LABEL_356,356
359
+ C8191,LABEL_357,357
360
+ C2439,LABEL_358,358
361
+ C2825,LABEL_359,359
362
+ C7831,LABEL_360,360
363
+ C2413,LABEL_361,361
364
+ C2721,LABEL_362,362
365
+ C4443,LABEL_363,363
366
+ C2933,LABEL_364,364
367
+ C2153,LABEL_365,365
368
+ C3151,LABEL_366,366
369
+ C2415,LABEL_367,367
370
+ C3327,LABEL_368,368
371
+ C9320,LABEL_369,369
372
+ C2461,LABEL_370,370
373
+ C9223,LABEL_371,371
374
+ C7612,LABEL_372,372
375
+ C3629,LABEL_373,373
376
+ C9530,LABEL_374,374
377
+ C1223,LABEL_375,375
378
+ C2411,LABEL_376,376
379
+ C2465,LABEL_377,377
380
+ C1219,LABEL_378,378
381
+ C7619,LABEL_379,379
382
+ C1324,LABEL_380,380
383
+ C3712,LABEL_381,381
384
+ C8142,LABEL_382,382
385
+ C8321,LABEL_383,383
386
+ C8131,LABEL_384,384
387
+ C4445,LABEL_385,385
388
+ C2427,LABEL_386,386
389
+ C7122,LABEL_387,387
390
+ C2822,LABEL_388,388
391
+ C8121,LABEL_389,389
392
+ C2936,LABEL_390,390
393
+ C7621,LABEL_391,391
394
+ C2912,LABEL_392,392
395
+ C3612,LABEL_393,393
396
+ C7223,LABEL_394,394
397
+ C7894,LABEL_395,395
398
+ C8311,LABEL_396,396
399
+ C7611,LABEL_397,397
400
+ C9410,LABEL_398,398
401
+ C8133,LABEL_399,399
402
+ C7532,LABEL_400,400
403
+ C2512,LABEL_401,401
404
+ C2463,LABEL_402,402
405
+ C6201,LABEL_403,403
406
+ C3128,LABEL_404,404
407
+ C1113,LABEL_405,405
408
+ C3121,LABEL_406,406
409
+ C3133,LABEL_407,407
410
+ C7321,LABEL_408,408
411
+ C8111,LABEL_409,409
412
+ C3204,LABEL_410,410
413
+ C2157,LABEL_411,411
414
+ C6209,LABEL_412,412
415
+ C2437,LABEL_413,413
416
+ C1314,LABEL_414,414
417
+ C5941,LABEL_415,415
418
+ C9433,LABEL_416,416
419
+ C3814,LABEL_417,417
420
+ C5922,LABEL_418,418
421
+ C2711,LABEL_419,419
422
+ C7614,LABEL_420,420
423
+ C8145,LABEL_421,421
424
+ C6421,LABEL_422,422
425
+ C8112,LABEL_423,423
426
+ C7892,LABEL_424,424
427
+ C8202,LABEL_425,425
428
+ C2472,LABEL_426,426
429
+ C2591,LABEL_427,427
430
+ C3331,LABEL_428,428
431
+ C8322,LABEL_429,429
432
+ C3139,LABEL_430,430
433
+ C3714,LABEL_431,431
434
+ C2414,LABEL_432,432
435
+ C3716,LABEL_433,433
436
+ C2939,LABEL_434,434
437
+ C7899,LABEL_435,435
438
+ C3632,LABEL_436,436
439
+ C5411,LABEL_437,437
440
+ C2436,LABEL_438,438
441
+ C2452,LABEL_439,439
442
+ C2454,LABEL_440,440
443
+ C2464,LABEL_441,441
444
+ C2466,LABEL_442,442
445
+ C2729,LABEL_443,443
446
+ C2937,LABEL_444,444
447
+ C3127,LABEL_445,445
448
+ C3134,LABEL_446,446
449
+ C3135,LABEL_447,447
450
+ C3201,LABEL_448,448
451
+ C3205,LABEL_449,449
452
+ C3206,LABEL_450,450
453
+ C3207,LABEL_451,451
454
+ C3209,LABEL_452,452
455
+ C3311,LABEL_453,453
456
+ C3315,LABEL_454,454
457
+ C3322,LABEL_455,455
458
+ C3323,LABEL_456,456
459
+ C3339,LABEL_457,457
460
+ C3622,LABEL_458,458
461
+ C3623,LABEL_459,459
462
+ C4444,LABEL_460,460
463
+ C5491,LABEL_461,461
464
+ C5832,LABEL_462,462
465
+ C5891,LABEL_463,463
466
+ C5895,LABEL_464,464
467
+ C6203,LABEL_465,465
468
+ C6204,LABEL_466,466
469
+ C7112,LABEL_467,467
470
+ C7192,LABEL_468,468
471
+ C7222,LABEL_469,469
472
+ C7291,LABEL_470,470
473
+ C7295,LABEL_471,471
474
+ C7311,LABEL_472,472
475
+ C7324,LABEL_473,473
476
+ C7405,LABEL_474,474
477
+ C7615,LABEL_475,475
478
+ C7617,LABEL_476,476
479
+ C7623,LABEL_477,477
480
+ C7811,LABEL_478,478
481
+ C7812,LABEL_479,479
482
+ C7832,LABEL_480,480
483
+ C7833,LABEL_481,481
484
+ C7836,LABEL_482,482
485
+ C7893,LABEL_483,483
486
+ C8113,LABEL_484,484
487
+ C8122,LABEL_485,485
488
+ C8132,LABEL_486,486
489
+ C8141,LABEL_487,487
490
+ C8143,LABEL_488,488
491
+ C8151,LABEL_489,489
492
+ C8152,LABEL_490,490
493
+ C8153,LABEL_491,491
494
+ C8154,LABEL_492,492
495
+ C8155,LABEL_493,493
496
+ C8156,LABEL_494,494
497
+ C8159,LABEL_495,495
498
+ C8192,LABEL_496,496
499
+ C8201,LABEL_497,497
500
+ C8312,LABEL_498,498
501
+ C9541,LABEL_499,499
502
+ C9542,LABEL_500,500
503
+ C9812,LABEL_501,501
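
`idxs.csv` is the bridge between the generic `LABEL_N` names in the model config and the real CNO-11 codes, plus the integer index used by the classification head. A hedged sketch of patching `id2label` at load time so a pipeline reports CNO codes directly (repo id again assumed from `utils/cno_utils.py`):

```python
# Illustrative only: make the pipeline emit CNO-11 codes instead of LABEL_N,
# using the idx -> CNO mapping committed in data/idxs.csv.
import pandas as pd
from transformers import pipeline

idxs = pd.read_csv("data/idxs.csv")  # columns: CNO,label,idx
id2label = {int(i): c for i, c in zip(idxs["idx"], idxs["CNO"])}

clf = pipeline("text-classification", model="bob-nlp/A5-CNO-BOB-ISTAC-D12")
clf.model.config.id2label = id2label
clf.model.config.label2id = {c: i for i, c in id2label.items()}

# Labels now use CNO codes (e.g. 'C5120' in place of 'LABEL_4').
print(clf("Camarero en un restaurante de Santa Cruz")[0])
```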
data/metadata.json ADDED
@@ -0,0 +1,57 @@
1
+ {
2
+ "EDAD_RANGO": [
3
+ { "numerico": 0, "textual": ""},
4
+ { "numerico": 1, "textual": "Menor de 20 años" },
5
+ { "numerico": 2, "textual": "Entre 20 y 29 años" },
6
+ { "numerico": 3, "textual": "Entre 30 y 39 años" },
7
+ { "numerico": 4, "textual": "Entre 40 y 49 años" },
8
+ { "numerico": 5, "textual": "Entre 50 y 59 años" },
9
+ { "numerico": 6, "textual": "Entre 60 y 69 años" },
10
+ { "numerico": 7, "textual": "Entre 70 y 79 años" },
11
+ { "numerico": 8, "textual": "Con 80 años o más" }
12
+ ],
13
+
14
+ "SEXO": [
15
+ { "numerico": 0, "textual": ""},
16
+ { "numerico": 1, "textual": "Hombre" },
17
+ { "numerico": 6, "textual": "Mujer" }
18
+ ],
19
+
20
+ "INGRESOS": [
21
+ { "numerico": 0, "textual": ""},
22
+ { "numerico": 1, "textual": "Ingresos del hogar hasta 500€" },
23
+ { "numerico": 2, "textual": "Ingresos del hogar de más de 500€ hasta 1000€" },
24
+ { "numerico": 3, "textual": "Ingresos del hogar de más de 1000€ hasta 1500€" },
25
+ { "numerico": 4, "textual": "Ingresos del hogar de más de 1500€ hasta 2000€" },
26
+ { "numerico": 5, "textual": "Ingresos del hogar de más de 2000€ hasta 2500€" },
27
+ { "numerico": 6, "textual": "Ingresos del hogar de más de 2500€ hasta 3500€" },
28
+ { "numerico": 7, "textual": "Ingresos del hogar de más de 3500€" }
29
+ ],
30
+
31
+ "ESTUDIOS": [
32
+ { "numerico": 0, "textual": ""},
33
+ { "numerico": 1, "textual": "No sabe leer ni escribir" },
34
+ { "numerico": 2, "textual": "Sabe leer y escribir pero fue menos de 5 años a la escuela" },
35
+ { "numerico": 3, "textual": "Sabe leer y escribir y fue a la escuela 5 o más años sin completar: EGB, 3º ESO, Bachillerato Elemental o certificado de escolaridad. En esta categoría se incluye la Formación Básica Inicial de adultos completada" },
36
+ { "numerico": 4, "textual": "Cursado 3º curso o superior de ESO sin título de Graduado en ESO, cursada la EGB completa sin título de Graduado Escolar, certificado de escolaridad" },
37
+ { "numerico": 5, "textual": "EGB terminada (Graduado Escolar), Graduado en ESO, Bachillerato Elemental, certificado de estudios primarios o de profesionalidad niveles 1 y 2. Se incluye Formación Básica Postinicial de adultos terminada" },
38
+ { "numerico": 6, "textual": "Bachiller Superior, BUP, Bachiller, COU, PREU" },
39
+ { "numerico": 7, "textual": "FP1, Ciclo Formativo de Grado Medio, título de técnico auxiliar o equivalente. Incluye enseñanzas profesionales de música y/o danza, certificado de nivel avanzado de la Escuela Oficial de Idiomas y certificado de profesionalidad de nivel 3, Oficialía Industrial y Formación Profesional Básica" },
40
+ { "numerico": 8, "textual": "FP2, Ciclo Formativo de Grado Superior, Maestría Industrial. Título de técnico especialista o equivalente" },
41
+ { "numerico": 9, "textual": "Diplomatura, Grado, títulos superiores de música y/o danza" },
42
+ { "numerico": 10, "textual": "Licenciatura, Máster universitario u otros estudios de postgrado (especialistas, expertos)" },
43
+ { "numerico": 11, "textual": "Doctorado" }
44
+ ],
45
+
46
+ "SITUACION": [
47
+ { "numerico": 0, "textual": ""},
48
+ { "numerico": 1, "textual": "Asalariado/a del sector privado" },
49
+ { "numerico": 2, "textual": "Asalariado/a del sector público" },
50
+ { "numerico": 3, "textual": "Trabajador/a bajo programa público de empleo remunerado" },
51
+ { "numerico": 4, "textual": "Aprendiz remunerado" },
52
+ { "numerico": 5, "textual": "Empleador/a (no miembro de cooperativa) con menos de 10 empleados" },
53
+ { "numerico": 6, "textual": "Empleador/a (no miembro de cooperativa) con 10 o más empleados" },
54
+ { "numerico": 7, "textual": "Empresario/a sin asalariados o trabajador/a / profesional independiente / autónomo/a" },
55
+ { "numerico": 8, "textual": "Ayuda en negocios familiares" }
56
+ ]
57
+ }
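
`metadata.json` maps the numeric codes of the survey variables (age range, sex, household income, education, employment situation) to their textual descriptions. How the app combines these fields into the classifier input is not shown in this commit; the sketch below only demonstrates the lookup:

```python
# Hedged sketch: resolve coded survey answers to the texts stored in data/metadata.json.
import json

with open("data/metadata.json", encoding="utf-8") as f:
    metadata = json.load(f)

def describe(field: str, code: int) -> str:
    """Return the textual description for a numeric code of a given field."""
    for entry in metadata[field]:
        if entry["numerico"] == code:
            return entry["textual"]
    return ""

print(describe("SEXO", 6))        # "Mujer"
print(describe("EDAD_RANGO", 3))  # "Entre 30 y 39 años"
```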
data/problematic_cnos.json ADDED
@@ -0,0 +1,9 @@
1
+ [
2
+ "C1120", "C1219", "C1223", "C1312", "C1313", "C1316", "C1325", "C1419", "C1431", "C1432",
3
+ "C1501", "C2324", "C2413", "C2424", "C2471", "C2611", "C2622", "C2623", "C2722", "C2121",
4
+ "C3121", "C3122", "C3123", "C3124", "C3128", "C3129", "C3141", "C3143", "C3203", "C3339",
5
+ "C3402", "C3403", "C3404", "C3535", "C3621", "C3811", "C3713", "C4112", "C4113", "C4122",
6
+ "C5722", "C5824", "C5899", "C5210", "C6421", "C6422", "C7122", "C7223", "C7293", "C7314",
7
+ "C7521", "C7705", "C8121", "C8193", "C8199", "C9410", "C2431", "C2512", "C3125", "C3132",
8
+ "C3715", "C5612", "C6202", "C7211", "C7618"
9
+ ]
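
`problematic_cnos.json` is a flat list of CNO codes; this commit does not document what it is used for. One plausible use, sketched here purely as an assumption, is flagging predictions that land on this list so the app can ask the user to double-check:

```python
# Speculative sketch: the file's actual role in the app is not documented in this commit.
import json

with open("data/problematic_cnos.json", encoding="utf-8") as f:
    problematic = set(json.load(f))

def needs_review(cno_code: str) -> bool:
    """True if the predicted CNO code is on the 'problematic' list."""
    return cno_code in problematic

print(needs_review("C1120"))  # True
print(needs_review("C1111"))  # False
```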
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:438a27f6fa94ae0c75c8f696083fd83bbb0fa90ffcd892d5424c1e059fab5697
3
+ size 135
requirements.txt CHANGED
@@ -1,3 +1,6 @@
1
- altair
2
- pandas
3
- streamlit
1
+ transformers
2
+ torch
3
+ streamlit
4
+ streamlit-chat
5
+ sentencepiece
6
+ dotenv
sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b8a54190d2b9256881ed34ab5428786629f929dd5a579350a6ef4735b86a9208
3
+ size 132
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
1
+ {
2
+ "bos_token": {
3
+ "content": "<s>",
4
+ "lstrip": false,
5
+ "normalized": false,
6
+ "rstrip": false,
7
+ "single_word": false
8
+ },
9
+ "cls_token": {
10
+ "content": "<s>",
11
+ "lstrip": false,
12
+ "normalized": false,
13
+ "rstrip": false,
14
+ "single_word": false
15
+ },
16
+ "eos_token": {
17
+ "content": "</s>",
18
+ "lstrip": false,
19
+ "normalized": false,
20
+ "rstrip": false,
21
+ "single_word": false
22
+ },
23
+ "mask_token": {
24
+ "content": "<mask>",
25
+ "lstrip": true,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
+ },
30
+ "pad_token": {
31
+ "content": "<pad>",
32
+ "lstrip": false,
33
+ "normalized": false,
34
+ "rstrip": false,
35
+ "single_word": false
36
+ },
37
+ "sep_token": {
38
+ "content": "</s>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false
43
+ },
44
+ "unk_token": {
45
+ "content": "<unk>",
46
+ "lstrip": false,
47
+ "normalized": false,
48
+ "rstrip": false,
49
+ "single_word": false
50
+ }
51
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:de6f09c3f9b891e5b98dd3af9463dcab5a97d5265e288271395324a0577e6c05
3
+ size 133
tokenizer_config.json ADDED
@@ -0,0 +1,55 @@
1
+ {
2
+ "added_tokens_decoder": {
3
+ "0": {
4
+ "content": "<s>",
5
+ "lstrip": false,
6
+ "normalized": false,
7
+ "rstrip": false,
8
+ "single_word": false,
9
+ "special": true
10
+ },
11
+ "1": {
12
+ "content": "<pad>",
13
+ "lstrip": false,
14
+ "normalized": false,
15
+ "rstrip": false,
16
+ "single_word": false,
17
+ "special": true
18
+ },
19
+ "2": {
20
+ "content": "</s>",
21
+ "lstrip": false,
22
+ "normalized": false,
23
+ "rstrip": false,
24
+ "single_word": false,
25
+ "special": true
26
+ },
27
+ "3": {
28
+ "content": "<unk>",
29
+ "lstrip": false,
30
+ "normalized": false,
31
+ "rstrip": false,
32
+ "single_word": false,
33
+ "special": true
34
+ },
35
+ "250001": {
36
+ "content": "<mask>",
37
+ "lstrip": true,
38
+ "normalized": false,
39
+ "rstrip": false,
40
+ "single_word": false,
41
+ "special": true
42
+ }
43
+ },
44
+ "bos_token": "<s>",
45
+ "clean_up_tokenization_spaces": true,
46
+ "cls_token": "<s>",
47
+ "eos_token": "</s>",
48
+ "extra_special_tokens": {},
49
+ "mask_token": "<mask>",
50
+ "model_max_length": 512,
51
+ "pad_token": "<pad>",
52
+ "sep_token": "</s>",
53
+ "tokenizer_class": "XLMRobertaTokenizer",
54
+ "unk_token": "<unk>"
55
+ }
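
`tokenizer_config.json` declares an `XLMRobertaTokenizer` with `model_max_length` 512, so longer inputs need truncation. A hedged sketch of loading the tokenizer uploaded in this commit (repo id assumed as before):

```python
# Illustrative sketch: load the tokenizer and truncate to the declared 512-token limit.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("bob-nlp/A5-CNO-BOB-ISTAC-D12")
print(tok.model_max_length)  # 512

enc = tok(
    "Camarero en un restaurante de Tenerife",
    truncation=True,
    max_length=tok.model_max_length,
    return_tensors="pt",
)
print(enc["input_ids"].shape)
```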
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fdff2120bb73ad47b72318a842a1c40c46b6c14d6871c0c6ab45e1318c3b28c8
3
+ size 129
utils/__init__.py ADDED
File without changes
utils/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (157 Bytes). View file
 
utils/__pycache__/cno_utils.cpython-312.pyc ADDED
Binary file (1.98 kB). View file
 
utils/cno_utils.py ADDED
@@ -0,0 +1,41 @@
+ from huggingface_hub import hf_hub_download
+ import pandas as pd
+
+
+ def _load_label_mapping():
+     # Download the LABEL_N -> CNO mapping shipped in this repo and build a lookup dict.
+     csv_path = hf_hub_download(repo_id="bob-nlp/A5-CNO-BOB-ISTAC-D12", filename="data/idxs.csv")
+     df = pd.read_csv(csv_path)
+     _label_mapping = dict(zip(df['label'], df['CNO']))
+     return _label_mapping
+
+
+ def _load_description_mapping():
+     # Download the CNO-11 notes file and map each CNO code to its description (column DN4).
+     csv_path = hf_hub_download(repo_id="bob-nlp/A5-CNO-BOB-ISTAC-D12", filename="data/cno11_notas.csv")
+     df = pd.read_csv(csv_path)
+     _description_mapping = dict(zip(df['CNO'], df['DN4']))
+     return _description_mapping
+
+
+ def convert_to_cno(output_label):
+     """
+     Converts a model label (e.g., 'LABEL_0') to the CNO format (e.g., 'C1111').
+
+     Parameters:
+         output_label (str): Label predicted by the model (like 'LABEL_0')
+
+     Returns:
+         str: Converted label, or the original label if no mapping is found.
+     """
+     mapping = _load_label_mapping()
+     return mapping.get(output_label, output_label)
+
+
+ def get_cno_description(cno):
+     """
+     Retrieves the description for a given CNO code.
+
+     Parameters:
+         cno (str): The CNO code (e.g., 'C1111')
+
+     Returns:
+         str: Description of the CNO code, or 'Unknown' if not found.
+     """
+     mapping = _load_description_mapping()
+     return mapping.get(cno, 'Unknown')
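
A short usage sketch combining the helpers above with a text-classification pipeline. It assumes the classifier weights are hosted in the same repo that `utils/cno_utils.py` already pulls its CSV files from, and that the script runs from the repository root so `utils` is importable:

```python
# Usage sketch under the assumptions stated above.
from transformers import pipeline
from utils.cno_utils import convert_to_cno, get_cno_description

clf = pipeline("text-classification", model="bob-nlp/A5-CNO-BOB-ISTAC-D12")

pred = clf("Enfermera en un hospital público")[0]
cno = convert_to_cno(pred["label"])        # 'LABEL_<n>' -> 'C<xxxx>' via data/idxs.csv
print(cno, "-", get_cno_description(cno))  # CNO code plus its note from data/cno11_notas.csv
print(f"score: {pred['score']:.3f}")
```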