andreaceto committed
Commit 3091eee · verified · 1 Parent(s): 0fe3bdc

End of training
README.md CHANGED
@@ -7,192 +7,104 @@ tags:
  model-index:
  - name: schedulebot-nlu-engine
    results: []
- datasets:
- - andreaceto/hasd
- language:
- - en
  ---
+
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->

- # Schedulebot-nlu-engine
-
- ## Model Description
-
- This model is a multi-task Natural Language Understanding (NLU) engine designed specifically for an appointment scheduling chatbot. It is fine-tuned from a **`distilbert-base-uncased`** backbone and performs two tasks simultaneously:
-
- - **Intent Classification**: Identifying the user's primary goal (e.g., `schedule`, `cancel`).
- - **Named Entity Recognition (NER)**: Extracting custom, domain-specific entities (e.g., `appointment_type`).
-
- The model stands out for its custom classification heads, which use a deeper MLP architecture than a single linear layer to improve performance on closely related intents and entities.
-
- ## Model Architecture
-
- The model uses a standard `distilbert-base-uncased` model as its core feature extractor. Two custom classification "heads" are placed on top of this base to perform the downstream tasks.
-
- - **Base Model**: `distilbert-base-uncased`
- - **Classifier Heads**: Each head is a Multi-Layer Perceptron (MLP) with the following structure (sketched after this list) to allow for more complex feature interpretation:
-   1. A Linear layer projecting the transformer's output dimension (768) to an intermediate size (384).
-   2. A GELU activation function.
-   3. A Dropout layer with a rate of 0.3 for regularization.
-   4. A final Linear layer projecting the intermediate size to the number of output labels for the specific task (intent or NER).
-
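A minimal PyTorch sketch of this head structure, for illustration only; the class and attribute names are assumptions, not the checkpoint's actual module names:

```python
import torch.nn as nn

class ClassificationHead(nn.Module):
    """MLP head as described above: Linear -> GELU -> Dropout -> Linear."""

    def __init__(self, hidden_size=768, intermediate_size=384,
                 num_labels=9, dropout=0.3):
        super().__init__()
        self.dense = nn.Linear(hidden_size, intermediate_size)   # 768 -> 384
        self.activation = nn.GELU()
        self.dropout = nn.Dropout(dropout)                       # rate 0.3
        self.out_proj = nn.Linear(intermediate_size, num_labels) # 384 -> labels

    def forward(self, hidden_states):
        # Presumably the NER head is applied per token and the intent head to a
        # pooled/[CLS] representation (assumption; not stated in the card).
        return self.out_proj(self.dropout(self.activation(self.dense(hidden_states))))
```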
- ## Intended Use
-
- This model is intended to be the core NLU component of a conversational AI system for managing appointments.
-
- For instructions on how to use the model, check the [dedicated file](./how_to_use.md).
-
- ## Training Data
-
- The model was trained on the **HASD (Hybrid Appointment Scheduling Dataset)**, a custom dataset built specifically for this task.
-
- - **Source**: The dataset is a hybrid of real-world conversational examples from `clinc/clinc_oos` (for simple intents) and synthetically generated, template-based examples for complex scheduling intents.
- - **Balancing**: To combat class imbalance, intents sourced from `clinc/clinc_oos` were **down-sampled** to a maximum of **150 examples** each.
- - **Augmentation**: To increase data diversity for complex intents (`schedule`, `reschedule`, etc.), **Contextual Word Replacement** was used: a `distilbert-base-uncased` model augmented the templates by replacing non-placeholder words with contextually relevant synonyms (see the sketch after this list).
-
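The augmentation script itself is not part of this repository; the following is a hypothetical sketch of contextual word replacement using the `nlpaug` library, assuming it matches the approach described above:

```python
import nlpaug.augmenter.word as naw

# Contextual substitution backed by the same distilbert-base-uncased backbone.
aug = naw.ContextualWordEmbsAug(
    model_path="distilbert-base-uncased",
    action="substitute",          # replace words with contextually plausible ones
    stopwords_regex=r"\{.*?\}",   # protect {placeholder} tokens from replacement
)

template = "I need to book a {appointment_type} with Dr. {practitioner_name}"
print(aug.augment(template))      # returns a list of augmented variants
```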
- The dataset is available [here](https://huggingface.co/datasets/andreaceto/hasd).
-
- ### Intents
-
- The model is trained to recognize the following intents:
- `schedule`, `reschedule`, `cancel`, `query_avail`, `greeting`, `positive_reply`, `negative_reply`, `bye`, `oos` (out-of-scope).
-
- ### Entities
-
- The model is trained to recognize the following custom named entities:
- `practitioner_name`, `appointment_type`, `appointment_id`. Token-level labels use the BIO scheme (see `label2id_ner` in `config.json`), as illustrated below.
-
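A hypothetical utterance and its BIO tagging, purely for illustration:

```python
tokens = ["Cancel", "appointment", "A123", "with", "Dr.", "Smith"]
labels = ["O", "O", "B-appointment_id", "O",
          "B-practitioner_name", "I-practitioner_name"]
```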
- ## Training Procedure
-
- The model was trained using a two-stage fine-tuning strategy to ensure stability and performance.
-
- ### Stage 1: Training the Classifier Heads
-
- - The `distilbert-base-uncased` base model was entirely **frozen** (a minimal freezing sketch follows this list).
- - Only the randomly initialized MLP heads for intent and NER classification were trained.
-
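A minimal sketch of the freezing step, assuming the backbone is exposed as a `distilbert` attribute on the custom model:

```python
# Freeze every backbone parameter; only the randomly initialized heads
# receive gradient updates in Stage 1.
for param in model.distilbert.parameters():
    param.requires_grad = False
```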
- **Setup**:
-
- ```python
- from transformers import (DataCollatorForTokenClassification, EarlyStoppingCallback,
-                           Trainer, TrainingArguments)
-
- # `model`, `tokenizer`, `processed_datasets`, `compute_metrics`, `hub_model_id`,
- # and `hf_token` are assumed to be defined earlier in the training script.
-
- # Define a data collator to handle padding for token classification
- data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
-
- # Define Training Arguments
- training_args = TrainingArguments(
-     output_dir="path/to/output_dir",
-     overwrite_output_dir=True,
-     num_train_epochs=200,              # training epochs
-     per_device_train_batch_size=32,
-     per_device_eval_batch_size=32,
-     learning_rate=1e-4,                # higher LR: only the heads are trained
-     weight_decay=1e-5,                 # AdamW weight decay
-     logging_dir="path/to/logging_dir",
-     logging_strategy="epoch",
-     eval_strategy="epoch",
-     save_strategy="epoch",
-     load_best_model_at_end=True,
-     metric_for_best_model="eval_loss", # validation loss as the key metric
-     # --- Hub arguments ---
-     push_to_hub=True,
-     hub_model_id=hub_model_id,
-     hub_strategy="end",
-     hub_token=hf_token,
-     report_to="tensorboard",           # TensorBoard to monitor training
- )
-
- # Create the Trainer
- trainer = Trainer(
-     model=model,
-     args=training_args,
-     train_dataset=processed_datasets["train"],
-     eval_dataset=processed_datasets["validation"],
-     processing_class=tokenizer,
-     data_collator=data_collator,
-     compute_metrics=compute_metrics,   # custom function (see how_to_use.md)
-     callbacks=[EarlyStoppingCallback(early_stopping_patience=10)],
- )
- ```
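The `compute_metrics` function itself lives in `how_to_use.md`. Purely as a hypothetical sketch, a multi-task version producing the metric names seen in the training logs might look like this (the prediction/label packing is an assumption about the custom model's outputs):

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # Assumed packing: predictions = (intent_logits, ner_logits),
    # label_ids = (intent_labels, ner_labels). Verify against how_to_use.md.
    intent_logits, ner_logits = eval_pred.predictions
    intent_labels, ner_labels = eval_pred.label_ids
    intent_preds = intent_logits.argmax(axis=-1)
    ner_preds = ner_logits.argmax(axis=-1)
    mask = ner_labels != -100  # drop padding/special-token positions
    return {
        "intent_accuracy": accuracy_score(intent_labels, intent_preds),
        "intent_f1": f1_score(intent_labels, intent_preds, average="weighted"),
        "ner_f1": f1_score(ner_labels[mask], ner_preds[mask], average="weighted"),
    }
```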
-
- ### Stage 2: Selective Fine-Tuning
-
- - The DistilBERT backbone was entirely **unfrozen** (sketch below).
- - A very low learning rate lets the model adapt further to the new data while preserving its general-purpose language knowledge.
-
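The corresponding Stage 2 step, under the same `model.distilbert` naming assumption:

```python
# Unfreeze the backbone so the whole network is fine-tuned end to end,
# but at a much lower learning rate (1e-6 vs. 1e-4 in Stage 1).
for param in model.distilbert.parameters():
    param.requires_grad = True
```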
- **Setup**:
-
- ```python
- # Define Training Arguments
- training_args = TrainingArguments(
-     output_dir="path/to/output_dir",
-     overwrite_output_dir=True,
-     num_train_epochs=50,               # fine-tuning epochs
-     per_device_train_batch_size=32,
-     per_device_eval_batch_size=32,
-     learning_rate=1e-6,                # very low LR for the unfrozen backbone
-     weight_decay=1e-3,                 # AdamW weight decay
-     logging_dir="path/to/logging_dir",
-     logging_strategy="epoch",
-     eval_strategy="epoch",
-     save_strategy="epoch",
-     load_best_model_at_end=True,
-     metric_for_best_model="eval_loss", # validation loss as the key metric
-     # --- Hub arguments ---
-     push_to_hub=True,
-     hub_model_id=hub_model_id,
-     hub_strategy="end",
-     hub_token=hf_token,
-     report_to="tensorboard",           # TensorBoard to monitor training
- )
-
- # Create the Trainer
- trainer = Trainer(
-     model=model,
-     args=training_args,
-     train_dataset=processed_datasets["train"],
-     eval_dataset=processed_datasets["validation"],
-     processing_class=tokenizer,
-     data_collator=data_collator,
-     compute_metrics=compute_metrics,   # custom function (see how_to_use.md)
-     callbacks=[EarlyStoppingCallback(early_stopping_patience=5)],
- )
- ```
-
- ## Evaluation
-
- The model was evaluated on a held-out test set, and its performance was measured for both tasks.
-
- ### Intent Classification Performance
-
- | Intent | Precision | Recall | F1-Score | Support |
- | --- | --- | --- | --- | --- |
- | bye | 0.8636 | 0.8261 | 0.8444 | 23 |
- | cancel | 0.8902 | 0.8795 | 0.8848 | 83 |
- | greeting | 0.8636 | 0.8636 | 0.8636 | 22 |
- | negative_reply | 0.9048 | 0.8636 | 0.8837 | 22 |
- | oos | 0.9524 | 0.8696 | 0.9091 | 23 |
- | positive_reply | 0.7308 | 0.8636 | 0.7917 | 22 |
- | query_avail | 0.9268 | 0.9383 | 0.9325 | 81 |
- | reschedule | 0.8974 | 0.8434 | 0.8696 | 83 |
- | schedule | 0.8824 | 0.9375 | 0.9091 | 80 |
- | **Accuracy** | | | **0.8884** | 439 |
- | **Macro Avg** | **0.8791** | **0.8761** | **0.8765** | 439 |
- | **Weighted Avg** | **0.8902** | **0.8884** | **0.8885** | 439 |
-
- ### NER (Token Classification) Performance
-
- | Entity | Precision | Recall | F1-Score | Support |
- | --- | --- | --- | --- | --- |
- | B-appointment_id | 0.9813 | 0.9705 | 0.9759 | 271 |
- | B-appointment_type | 0.8517 | 0.7943 | 0.8220 | 282 |
- | B-practitioner_name | 0.9540 | 0.9210 | 0.9372 | 405 |
- | O | 0.9782 | 0.9874 | 0.9828 | 3813 |
- | **Accuracy** | | | **0.9694** | 4771 |
- | **Macro Avg** | **0.9413** | **0.9183** | **0.9295** | 4771 |
- | **Weighted Avg** | **0.9688** | **0.9694** | **0.9690** | 4771 |
-
- On this dataset, the model achieves strong results on the NER task (weighted F1 ≈ 0.97) and solid results on intent classification (accuracy ≈ 0.89).
-
- ## Limitations and Bias
-
- - The model's performance is highly dependent on the quality and scope of the **HASD dataset**. It may not generalize well to phrasing or appointment types significantly different from what it was trained on.
- - The dataset was primarily generated from templates, which may not capture the full diversity of real human language.
- - The model inherits any biases present in the `distilbert-base-uncased` model and the `clinc/clinc_oos` dataset.
 
+ # schedulebot-nlu-engine
+
+ This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [HASD](https://huggingface.co/datasets/andreaceto/hasd) dataset.
+ It achieves the following results on the evaluation set:
+ - Loss: 0.3194
+ - Intent Accuracy: 0.9224
+ - Intent F1: 0.9216
+ - Ner F1: 0.9320
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 1e-06
+ - train_batch_size: 32
+ - eval_batch_size: 32
+ - seed: 42
+ - optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
+ - lr_scheduler_type: linear
+ - num_epochs: 50
+
+ ### Training results
+
+ | Training Loss | Epoch | Step | Validation Loss | Intent Accuracy | Intent F1 | Ner F1 |
+ |:-------------:|:-----:|:----:|:---------------:|:---------------:|:---------:|:------:|
+ | No log | 1.0 | 64 | 0.6763 | 0.8196 | 0.8178 | 0.9239 |
+ | No log | 2.0 | 128 | 0.6300 | 0.8470 | 0.8460 | 0.9227 |
+ | No log | 3.0 | 192 | 0.6008 | 0.8356 | 0.8347 | 0.9239 |
+ | No log | 4.0 | 256 | 0.5762 | 0.8539 | 0.8541 | 0.9240 |
+ | No log | 5.0 | 320 | 0.5599 | 0.8470 | 0.8468 | 0.9246 |
+ | No log | 6.0 | 384 | 0.5391 | 0.8493 | 0.8483 | 0.9263 |
+ | No log | 7.0 | 448 | 0.5222 | 0.8676 | 0.8670 | 0.9256 |
+ | 0.8885 | 8.0 | 512 | 0.5053 | 0.8607 | 0.8603 | 0.9269 |
+ | 0.8885 | 9.0 | 576 | 0.4875 | 0.8607 | 0.8597 | 0.9279 |
+ | 0.8885 | 10.0 | 640 | 0.4723 | 0.8721 | 0.8708 | 0.9274 |
+ | 0.8885 | 11.0 | 704 | 0.4599 | 0.8858 | 0.8854 | 0.9297 |
+ | 0.8885 | 12.0 | 768 | 0.4536 | 0.8973 | 0.8966 | 0.9291 |
+ | 0.8885 | 13.0 | 832 | 0.4432 | 0.8790 | 0.8783 | 0.9279 |
+ | 0.8885 | 14.0 | 896 | 0.4334 | 0.8881 | 0.8873 | 0.9290 |
+ | 0.8885 | 15.0 | 960 | 0.4268 | 0.8813 | 0.8806 | 0.9295 |
+ | 0.6688 | 16.0 | 1024 | 0.4180 | 0.8881 | 0.8872 | 0.9295 |
+ | 0.6688 | 17.0 | 1088 | 0.4119 | 0.8995 | 0.8991 | 0.9296 |
+ | 0.6688 | 18.0 | 1152 | 0.4061 | 0.8973 | 0.8964 | 0.9290 |
+ | 0.6688 | 19.0 | 1216 | 0.3949 | 0.8950 | 0.8940 | 0.9285 |
+ | 0.6688 | 20.0 | 1280 | 0.3899 | 0.9018 | 0.9012 | 0.9296 |
+ | 0.6688 | 21.0 | 1344 | 0.3855 | 0.9087 | 0.9083 | 0.9302 |
+ | 0.6688 | 22.0 | 1408 | 0.3768 | 0.8950 | 0.8942 | 0.9296 |
+ | 0.6688 | 23.0 | 1472 | 0.3756 | 0.8950 | 0.8948 | 0.9308 |
+ | 0.5511 | 24.0 | 1536 | 0.3693 | 0.9110 | 0.9100 | 0.9308 |
+ | 0.5511 | 25.0 | 1600 | 0.3658 | 0.9064 | 0.9057 | 0.9308 |
+ | 0.5511 | 26.0 | 1664 | 0.3598 | 0.9110 | 0.9101 | 0.9320 |
+ | 0.5511 | 27.0 | 1728 | 0.3647 | 0.9041 | 0.9035 | 0.9309 |
+ | 0.5511 | 28.0 | 1792 | 0.3500 | 0.9201 | 0.9190 | 0.9314 |
+ | 0.5511 | 29.0 | 1856 | 0.3466 | 0.9155 | 0.9145 | 0.9314 |
+ | 0.5511 | 30.0 | 1920 | 0.3481 | 0.9155 | 0.9149 | 0.9314 |
+ | 0.5511 | 31.0 | 1984 | 0.3431 | 0.9155 | 0.9150 | 0.9314 |
+ | 0.4859 | 32.0 | 2048 | 0.3409 | 0.9110 | 0.9104 | 0.9314 |
+ | 0.4859 | 33.0 | 2112 | 0.3404 | 0.9201 | 0.9195 | 0.9308 |
+ | 0.4859 | 34.0 | 2176 | 0.3346 | 0.9132 | 0.9127 | 0.9309 |
+ | 0.4859 | 35.0 | 2240 | 0.3324 | 0.9201 | 0.9192 | 0.9309 |
+ | 0.4859 | 36.0 | 2304 | 0.3306 | 0.9178 | 0.9170 | 0.9309 |
+ | 0.4859 | 37.0 | 2368 | 0.3309 | 0.9178 | 0.9173 | 0.9314 |
+ | 0.4859 | 38.0 | 2432 | 0.3289 | 0.9178 | 0.9173 | 0.9314 |
+ | 0.4859 | 39.0 | 2496 | 0.3272 | 0.9201 | 0.9195 | 0.9314 |
+ | 0.4434 | 40.0 | 2560 | 0.3259 | 0.9178 | 0.9173 | 0.9314 |
+ | 0.4434 | 41.0 | 2624 | 0.3240 | 0.9201 | 0.9193 | 0.9314 |
+ | 0.4434 | 42.0 | 2688 | 0.3228 | 0.9224 | 0.9216 | 0.9326 |
+ | 0.4434 | 43.0 | 2752 | 0.3243 | 0.9178 | 0.9173 | 0.9320 |
+ | 0.4434 | 44.0 | 2816 | 0.3248 | 0.9201 | 0.9195 | 0.9314 |
+ | 0.4434 | 45.0 | 2880 | 0.3218 | 0.9224 | 0.9216 | 0.9320 |
+ | 0.4434 | 46.0 | 2944 | 0.3213 | 0.9224 | 0.9216 | 0.9320 |
+ | 0.4221 | 47.0 | 3008 | 0.3205 | 0.9224 | 0.9216 | 0.9320 |
+ | 0.4221 | 48.0 | 3072 | 0.3195 | 0.9224 | 0.9216 | 0.9320 |
+ | 0.4221 | 49.0 | 3136 | 0.3196 | 0.9224 | 0.9216 | 0.9320 |
+ | 0.4221 | 50.0 | 3200 | 0.3194 | 0.9224 | 0.9216 | 0.9320 |
+
+ ### Framework versions
+
+ - Transformers 4.53.2
+ - Pytorch 2.6.0+cu124
+ - Datasets 4.0.0
+ - Tokenizers 0.21.2

config.json CHANGED
@@ -7,11 +7,53 @@
7
  "dim": 768,
8
  "dropout": 0.1,
9
  "hidden_dim": 3072,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
10
  "initializer_range": 0.02,
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
11
  "max_position_embeddings": 512,
12
  "model_type": "distilbert",
13
  "n_heads": 12,
14
  "n_layers": 6,
 
 
15
  "pad_token_id": 0,
16
  "qa_dropout": 0.1,
17
  "seq_classif_dropout": 0.2,
 
7
  "dim": 768,
8
  "dropout": 0.1,
9
  "hidden_dim": 3072,
10
+ "id2label": [
11
+ "bye",
12
+ "cancel",
13
+ "greeting",
14
+ "negative_reply",
15
+ "oos",
16
+ "positive_reply",
17
+ "query_avail",
18
+ "reschedule",
19
+ "schedule"
20
+ ],
21
+ "id2label_ner": [
22
+ "O",
23
+ "B-appointment_id",
24
+ "I-appointment_id",
25
+ "B-appointment_type",
26
+ "I-appointment_type",
27
+ "B-practitioner_name",
28
+ "I-practitioner_name"
29
+ ],
30
  "initializer_range": 0.02,
31
+ "label2id": {
32
+ "bye": 0,
33
+ "cancel": 1,
34
+ "greeting": 2,
35
+ "negative_reply": 3,
36
+ "oos": 4,
37
+ "positive_reply": 5,
38
+ "query_avail": 6,
39
+ "reschedule": 7,
40
+ "schedule": 8
41
+ },
42
+ "label2id_ner": {
43
+ "B-appointment_id": 1,
44
+ "B-appointment_type": 3,
45
+ "B-practitioner_name": 5,
46
+ "I-appointment_id": 2,
47
+ "I-appointment_type": 4,
48
+ "I-practitioner_name": 6,
49
+ "O": 0
50
+ },
51
  "max_position_embeddings": 512,
52
  "model_type": "distilbert",
53
  "n_heads": 12,
54
  "n_layers": 6,
55
+ "num_intent_labels": 9,
56
+ "num_ner_labels": 7,
57
  "pad_token_id": 0,
58
  "qa_dropout": 0.1,
59
  "seq_classif_dropout": 0.2,
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:771362ffd2649dfac75137c8fb23415627356fd9ed2deab53c4b2d41215db76e
+ oid sha256:2ae2a3c417ae03007fca71c860470d2c13b8543d3a7aea9c9e8c646b4cc9fcd5
  size 267851552
runs/Jul17_14-34-55_e305321a40a7/events.out.tfevents.1752762895.e305321a40a7.745.1 ADDED
Binary file (29 kB).
training_args.bin CHANGED
Binary files a/training_args.bin and b/training_args.bin differ