End of training

Files changed (5)

README.md +97 -185
config.json +16 -16
model.safetensors +1 -1
runs/Jul17_15-56-14_69c2e588e3b9/events.out.tfevents.1752767775.69c2e588e3b9.567.1 +0 -0
training_args.bin +0 -0

README.md CHANGED

@@ -7,192 +7,104 @@ tags:
 model-index:
 - name: schedulebot-nlu-engine
   results: []
-datasets:
-- andreaceto/hasd
-language:
-- en
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Schedulebot-nlu-engine
-## Model Description
-This model is a multi-task Natural Language Understanding (NLU) engine designed specifically for an appointment scheduling chatbot. It is fine-tuned from a **`distilbert-base-uncased`** backbone and is capable of performing two tasks simultaneously:
-- **Intent Classification**: Identifying the user's primary goal (e.g., `schedule`, `cancel`).
-- **Named Entity Recognition (NER)**: Extracting custom, domain-specific entities (e.g., `appointment_type`).
-Instead of a single linear layer, each task head is a small multi-layer perceptron, a design intended to improve performance on nuanced tasks.
-## Model Architecture
-The model uses a standard `distilbert-base-uncased` model as its core feature extractor. Two custom classification "heads" are placed on top of this base to perform the downstream tasks.
-- **Base Model**: `distilbert-base-uncased`
-- **Classifier Heads**: each head is a Multi-Layer Perceptron (MLP) with the following structure to allow for more complex feature interpretation:
-    1. A Linear layer projecting the transformer's output dimension (768) to an intermediate size (384).
-    2. A GELU activation function.
-    3. A Dropout layer with a rate of 0.3 for regularization.
-    4. A final Linear layer projecting the intermediate size to the number of output labels for the specific task (intent or NER).
-## Intended Use
-This model is intended to be the core NLU component of a conversational AI system for managing appointments.
-For usage instructions, see the [dedicated file](./how_to_use.md).
-## Training Data
-The model was trained on the **HASD (Hybrid Appointment Scheduling Dataset)**, a custom dataset built specifically for this task.
-- **Source**: The dataset is a hybrid of real-world conversational examples from `clinc/clinc_oos` (for simple intents) and synthetically generated, template-based examples for complex scheduling intents.
-- **Balancing**: To combat class imbalance, intents sourced from `clinc/clinc_oos` were **down-sampled** to a maximum of **150 examples** each.
-- **Augmentation**: To increase data diversity for complex intents (`schedule`, `reschedule`, etc.), **Contextual Word Replacement** was used. A `distilbert-base-uncased` model augmented the templates by replacing non-placeholder words with contextually relevant synonyms.
-The dataset is available [here](https://huggingface.co/datasets/andreaceto/hasd).
-### Intents
-The model is trained to recognize the following intents:
-`schedule`, `reschedule`, `cancel`, `query_avail`, `greeting`, `positive_reply`, `negative_reply`, `bye`, `oos` (out-of-scope).
-### Entities
-The model is trained to recognize the following custom named entities:
-`practitioner_name`, `appointment_type`, `appointment_id`.
-## Training Procedure
-The model was trained using a two-stage fine-tuning strategy to ensure stability and performance.
-### Stage 1: Training the Classifier Heads
-- The `distilbert-base-uncased` base model was entirely **frozen**.
-- Only the randomly initialized MLP heads for intent and NER classification were trained.
-**Setup**:
-```python
-# Define a data collator to handle padding for token classification
-data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
-# Define Training Arguments
-training_args = TrainingArguments(
-    output_dir="path/to/output_dir",
-    overwrite_output_dir=True,
-    num_train_epochs=200,               # Training epochs
-    per_device_train_batch_size=32,
-    per_device_eval_batch_size=32,
-    learning_rate=1e-4,                 # Learning Rate
-    weight_decay=1e-5,                  # AdamW weight decay
-    logging_dir="path/to/logging_dir",
-    logging_strategy="epoch",
-    eval_strategy="epoch",
-    save_strategy="epoch",
-    load_best_model_at_end=True,
-    metric_for_best_model="eval_loss",     # Focus on validation loss as the key metric
-    # --- Hub Arguments ---
-    push_to_hub=True,
-    hub_model_id=hub_model_id,
-    hub_strategy="end",
-    hub_token=hf_token,
-    report_to="tensorboard"             # Tensorboard to monitor training
-)
-# Create the Trainer
-trainer = Trainer(
-    model=model,
-    args=training_args,
-    train_dataset=processed_datasets["train"],
-    eval_dataset=processed_datasets["validation"],
-    processing_class=tokenizer,
-    data_collator=data_collator,
-    compute_metrics=compute_metrics,  # Custom function (check how_to_use.md)
-    callbacks=[EarlyStoppingCallback(early_stopping_patience=10)]
-)
-```
-### Stage 2: Selective Fine-Tuning
-- The DistilBERT backbone was entirely **unfrozen**.
-- A very low learning rate (1e-6) lets the model adapt further to the new data while preserving the backbone's general-purpose pretrained knowledge.
-**Setup**:
-```python
-# Define Training Arguments
-training_args = TrainingArguments(
-    output_dir="path/to/output_dir",
-    overwrite_output_dir=True,
-    num_train_epochs=50,               # Fine-tuning epochs
-    per_device_train_batch_size=32,
-    per_device_eval_batch_size=32,
-    learning_rate=1e-6,                 # Learning Rate
-    weight_decay=1e-3,                  # AdamW weight decay
-    logging_dir="path/to/logging_dir",
-    logging_strategy="epoch",
-    eval_strategy="epoch",
-    save_strategy="epoch",
-    load_best_model_at_end=True,
-    metric_for_best_model="eval_loss",     # Focus on validation loss as the key metric
-    # --- Hub Arguments ---
-    push_to_hub=True,
-    hub_model_id=hub_model_id,
-    hub_strategy="end",
-    hub_token=hf_token,
-    report_to="tensorboard"             # Tensorboard to monitor training
-)
-# Create the Trainer
-trainer = Trainer(
-    model=model,
-    args=training_args,
-    train_dataset=processed_datasets["train"],
-    eval_dataset=processed_datasets["validation"],
-    processing_class=tokenizer,
-    data_collator=data_collator,
-    compute_metrics=compute_metrics,  # Custom function (check how_to_use.md)
-    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)]
-)
-```
-## Evaluation
-The model was evaluated on a held-out test set, and its performance was measured for both tasks.
-### Intent Classification Performance
-| Intent        | Precision | Recall | F1-Score |  Support |
-| ---           | ---       | ---    | ---      | ---      |
-|           bye | 0.9048    | 0.8261 | 0.8636   | 23       |
-|        cancel | 0.9103    | 0.8554 | 0.8820   | 83       |
-|      greeting | 1.0000    | 0.8636 | 0.9268   | 22       |
-|negative_reply | 0.8750    | 0.9545 | 0.9130   | 22       |
-|           oos | 1.0000    | 0.8261 | 0.9048   | 23       |
-|positive_reply | 0.7692    | 0.9091 | 0.8333   | 22       |
-|   query_avail | 0.9259    | 0.9259 | 0.9259   | 81       |
-|    reschedule | 0.8571    | 0.8675 | 0.8623   | 83       |
-|      schedule | 0.8506    | 0.9250 | 0.8862   | 80       |
-| ---           | ---       | ---    | ---      | ----     |
-| **Accuracy**     |               |            | **0.8884**   | 439 |
-| **Macro Avg**    |    **0.8992** | **0.8837** | **0.8887**   | 439 |
-| **Weighted Avg** |    **0.8923** | **0.8884** | **0.8887**   | 439 |
-### NER (Token Classification) Performance
-| Entity              | Precision | Recall | F1-Score |  Support |
-| ---                 | ---       | ---    | ---      | ---      |
-| B-appointment_id    | 0.9925    | 0.9705 | 0.9813   | 271      |
-| B-appointment_type  | 0.8760    | 0.7766 | 0.8233   | 282      |
-| B-practitioner_name | 0.9540    | 0.9210 | 0.9372   | 405      |
-| O                   | 0.9775    | 0.9908 | 0.9841   | 3813     |
-| ---                 | ---       | ---    | ---      | ----     |
-| **Accuracy**        |            |            | **0.9711** | 4771 |
-| **Macro Avg**       | **0.9500** | **0.9147** | **0.9315** | 4771 |
-| **Weighted Avg**    | **0.9703** | **0.9711** | **0.9705** | 4771 |
-On this dataset the model reaches a weighted-average NER F1 of 0.9705 (macro 0.9315) and an intent accuracy of 0.8884; `B-appointment_type` is the weakest entity class (F1 0.8233).
-## Limitations and Bias
-- The model's performance is highly dependent on the quality and scope of the **HASD dataset**. It may not generalize well to phrasing or appointment types significantly different from what it was trained on.
-- The dataset was primarily generated from templates, which may not capture the full diversity of real human language.
-- The model inherits any biases present in the `distilbert-base-uncased` model and the `clinc/clinc_oos` dataset.

+# schedulebot-nlu-engine
+This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.3515
+- Intent Accuracy: 0.9201
+- Intent F1: 0.9200
+- Ner F1: 0.9262
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1e-06
+- train_batch_size: 32
+- eval_batch_size: 32
+- seed: 42
+- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: linear
+- num_epochs: 50
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Intent Accuracy | Intent F1 | Ner F1 |
+|:-------------:|:-----:|:----:|:---------------:|:---------------:|:---------:|:------:|
+| No log        | 1.0   | 64   | 0.7147          | 0.8059          | 0.8052    | 0.9185 |
+| No log        | 2.0   | 128  | 0.6750          | 0.8196          | 0.8196    | 0.9178 |
+| No log        | 3.0   | 192  | 0.6464          | 0.8265          | 0.8259    | 0.9172 |
+| No log        | 4.0   | 256  | 0.6265          | 0.8333          | 0.8320    | 0.9189 |
+| No log        | 5.0   | 320  | 0.6048          | 0.8447          | 0.8444    | 0.9189 |
+| No log        | 6.0   | 384  | 0.5813          | 0.8425          | 0.8425    | 0.9183 |
+| No log        | 7.0   | 448  | 0.5649          | 0.8539          | 0.8535    | 0.9189 |
+| 0.9075        | 8.0   | 512  | 0.5482          | 0.8493          | 0.8492    | 0.9207 |
+| 0.9075        | 9.0   | 576  | 0.5284          | 0.8584          | 0.8584    | 0.9200 |
+| 0.9075        | 10.0  | 640  | 0.5105          | 0.8676          | 0.8683    | 0.9188 |
+| 0.9075        | 11.0  | 704  | 0.5011          | 0.8630          | 0.8622    | 0.9212 |
+| 0.9075        | 12.0  | 768  | 0.4964          | 0.8630          | 0.8631    | 0.9217 |
+| 0.9075        | 13.0  | 832  | 0.4918          | 0.8653          | 0.8648    | 0.9206 |
+| 0.9075        | 14.0  | 896  | 0.4710          | 0.8836          | 0.8844    | 0.9206 |
+| 0.9075        | 15.0  | 960  | 0.4618          | 0.8813          | 0.8808    | 0.9206 |
+| 0.685         | 16.0  | 1024 | 0.4500          | 0.8973          | 0.8973    | 0.9212 |
+| 0.685         | 17.0  | 1088 | 0.4504          | 0.8790          | 0.8791    | 0.9223 |
+| 0.685         | 18.0  | 1152 | 0.4362          | 0.8927          | 0.8921    | 0.9229 |
+| 0.685         | 19.0  | 1216 | 0.4312          | 0.8904          | 0.8902    | 0.9241 |
+| 0.685         | 20.0  | 1280 | 0.4218          | 0.8927          | 0.8925    | 0.9240 |
+| 0.685         | 21.0  | 1344 | 0.4185          | 0.9041          | 0.9035    | 0.9235 |
+| 0.685         | 22.0  | 1408 | 0.4083          | 0.9018          | 0.9013    | 0.9241 |
+| 0.685         | 23.0  | 1472 | 0.4066          | 0.9041          | 0.9037    | 0.9247 |
+| 0.5723        | 24.0  | 1536 | 0.4015          | 0.9041          | 0.9039    | 0.9247 |
+| 0.5723        | 25.0  | 1600 | 0.4032          | 0.8995          | 0.8996    | 0.9246 |
+| 0.5723        | 26.0  | 1664 | 0.3923          | 0.9087          | 0.9085    | 0.9241 |
+| 0.5723        | 27.0  | 1728 | 0.3892          | 0.9087          | 0.9087    | 0.9246 |
+| 0.5723        | 28.0  | 1792 | 0.3854          | 0.9110          | 0.9107    | 0.9240 |
+| 0.5723        | 29.0  | 1856 | 0.3824          | 0.9155          | 0.9156    | 0.9262 |
+| 0.5723        | 30.0  | 1920 | 0.3801          | 0.9132          | 0.9130    | 0.9245 |
+| 0.5723        | 31.0  | 1984 | 0.3781          | 0.9110          | 0.9109    | 0.9268 |
+| 0.4899        | 32.0  | 2048 | 0.3727          | 0.9110          | 0.9109    | 0.9240 |
+| 0.4899        | 33.0  | 2112 | 0.3745          | 0.9132          | 0.9131    | 0.9246 |
+| 0.4899        | 34.0  | 2176 | 0.3676          | 0.9178          | 0.9176    | 0.9246 |
+| 0.4899        | 35.0  | 2240 | 0.3671          | 0.9155          | 0.9154    | 0.9252 |
+| 0.4899        | 36.0  | 2304 | 0.3636          | 0.9155          | 0.9155    | 0.9268 |
+| 0.4899        | 37.0  | 2368 | 0.3627          | 0.9178          | 0.9178    | 0.9268 |
+| 0.4899        | 38.0  | 2432 | 0.3602          | 0.9132          | 0.9132    | 0.9268 |
+| 0.4899        | 39.0  | 2496 | 0.3593          | 0.9201          | 0.9200    | 0.9262 |
+| 0.4496        | 40.0  | 2560 | 0.3577          | 0.9178          | 0.9179    | 0.9262 |
+| 0.4496        | 41.0  | 2624 | 0.3563          | 0.9178          | 0.9177    | 0.9262 |
+| 0.4496        | 42.0  | 2688 | 0.3556          | 0.9155          | 0.9155    | 0.9257 |
+| 0.4496        | 43.0  | 2752 | 0.3554          | 0.9132          | 0.9132    | 0.9262 |
+| 0.4496        | 44.0  | 2816 | 0.3547          | 0.9201          | 0.9200    | 0.9262 |
+| 0.4496        | 45.0  | 2880 | 0.3544          | 0.9155          | 0.9155    | 0.9262 |
+| 0.4496        | 46.0  | 2944 | 0.3535          | 0.9178          | 0.9179    | 0.9262 |
+| 0.4327        | 47.0  | 3008 | 0.3518          | 0.9201          | 0.9200    | 0.9262 |
+| 0.4327        | 48.0  | 3072 | 0.3517          | 0.9201          | 0.9200    | 0.9262 |
+| 0.4327        | 49.0  | 3136 | 0.3514          | 0.9201          | 0.9200    | 0.9262 |
+| 0.4327        | 50.0  | 3200 | 0.3515          | 0.9201          | 0.9200    | 0.9262 |
+### Framework versions
+- Transformers 4.53.2
+- Pytorch 2.6.0+cu124
+- Datasets 4.0.0
+- Tokenizers 0.21.2
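
The step counts in the training-results table can be cross-checked with a little arithmetic. The sketch below assumes a single device, no gradient accumulation and the default `dataloader_drop_last=False`; under those assumptions, 64 steps per epoch at a batch size of 32 means the training split holds between 2,017 and 2,048 examples, and 50 epochs give the 3,200 steps of the last row. It also assumes the training loss was logged every 500 steps, the Transformers default for `logging_steps`. That is an inference from the "No log" rows, not something the card states, and it differs from the `logging_strategy="epoch"` shown in the removed Stage 2 setup.

```python
import math

BATCH_SIZE = 32        # train_batch_size from the hyperparameter list
STEPS_PER_EPOCH = 64   # Step column at epoch 1.0
NUM_EPOCHS = 50        # num_epochs from the hyperparameter list
LOGGING_STEPS = 500    # assumption: Transformers' default logging_steps

# Training-set sizes N for which ceil(N / BATCH_SIZE) == STEPS_PER_EPOCH
# (assumes one device, no gradient accumulation, no dropped last batch).
min_examples = (STEPS_PER_EPOCH - 1) * BATCH_SIZE + 1
max_examples = STEPS_PER_EPOCH * BATCH_SIZE

total_steps = STEPS_PER_EPOCH * NUM_EPOCHS

# For each logging step, the epoch whose evaluation row is the first to show that training loss.
log_steps = range(LOGGING_STEPS, total_steps + 1, LOGGING_STEPS)
first_row_epoch = [math.ceil(step / STEPS_PER_EPOCH) for step in log_steps]

print(min_examples, max_examples, total_steps)  # 2017 2048 3200
print(first_row_epoch)  # [8, 16, 24, 32, 40, 47]
```

The computed epochs are the rows where the Training Loss column changes in the table: 0.9075 at epoch 8, 0.685 at 16, 0.5723 at 24, 0.4899 at 32, 0.4496 at 40 and 0.4327 at 47.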

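The removed README describes each task head as a small MLP: a Linear layer from 768 to 384 units, GELU, Dropout with rate 0.3, then a Linear layer to the task's label count (9 intents and 7 BIO tags, per `id2label` and `id2label_ner` in `config.json`). The sketch below reproduces only that shape flow for one 768-dimensional feature vector, using the standard library so it runs anywhere. The class and function names, the Gaussian initialization and the random input vector are illustrative assumptions, not the repository's code; in PyTorch the same head would normally be an `nn.Sequential` of `nn.Linear`, `nn.GELU`, `nn.Dropout` and `nn.Linear`.

```python
import math
import random

def linear(x, weight, bias):
    """Affine map y = W x + b, with W stored as one list of weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def gelu(x):
    """Exact GELU, x * Phi(x), written with the error function."""
    return [0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0))) for v in x]

def dropout(x, p, training, rng):
    """Inverted dropout: zero each unit with probability p and rescale the rest; identity at inference."""
    if not training or p == 0.0:
        return list(x)
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in x]

def init_linear(in_dim, out_dim, rng):
    """Hypothetical initialization: small Gaussian weights, zero biases."""
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim

class MLPHead:
    """Linear(768 -> 384) -> GELU -> Dropout(0.3) -> Linear(384 -> num_labels)."""

    def __init__(self, num_labels, in_dim=768, mid_dim=384, p=0.3, seed=0):
        self.rng = random.Random(seed)
        self.fc1 = init_linear(in_dim, mid_dim, self.rng)
        self.fc2 = init_linear(mid_dim, num_labels, self.rng)
        self.p = p

    def __call__(self, features, training=False):
        hidden = gelu(linear(features, *self.fc1))
        hidden = dropout(hidden, self.p, training, self.rng)
        return linear(hidden, *self.fc2)

intent_head = MLPHead(num_labels=9)  # 9 intents in id2label
ner_head = MLPHead(num_labels=7)     # 7 BIO tags in id2label_ner, applied to each token vector

rng = random.Random(1)
token_vector = [rng.gauss(0.0, 1.0) for _ in range(768)]  # stand-in for one 768-d DistilBERT output

print(len(intent_head(token_vector)), len(ner_head(token_vector)))  # 9 7
```

With `training=False` the dropout step is the identity, so the head is deterministic at inference time.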
config.json CHANGED

@@ -8,24 +8,24 @@
   "dropout": 0.1,
   "hidden_dim": 3072,
   "id2label": {
-    0: "bye",
-    1: "cancel",
-    2: "greeting",
-    3: "negative_reply",
-    4: "oos",
-    5: "positive_reply",
-    6: "query_avail",
-    7: "reschedule",
-    8: "schedule"
+    "0": "bye",
+    "1": "cancel",
+    "2": "greeting",
+    "3": "negative_reply",
+    "4": "oos",
+    "5": "positive_reply",
+    "6": "query_avail",
+    "7": "reschedule",
+    "8": "schedule"
   },
   "id2label_ner": {
-    0: "O",
-    1: "B-appointment_id",
-    2: "I-appointment_id",
-    3: "B-appointment_type",
-    4: "I-appointment_type",
-    5: "B-practitioner_name",
-    6: "I-practitioner_name"
+    "0": "O",
+    "1": "B-appointment_id",
+    "2": "I-appointment_id",
+    "3": "B-appointment_type",
+    "4": "I-appointment_type",
+    "5": "B-practitioner_name",
+    "6": "I-practitioner_name"
   },
   "initializer_range": 0.02,
   "label2id": {

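The `config.json` change replaces bare integer keys with quoted ones. JSON only allows string object keys, so the old form is rejected by a strict parser, and Python's `json` module converts integer keys to strings when it serializes a dict. Code that reads the label maps back therefore has to cast the keys to `int` before looking labels up by class index. Transformers' `PretrainedConfig` does this for its standard `id2label` field; whether the custom `id2label_ner` field gets the same treatment depends on this repository's model code, so the standard-library sketch below does the cast explicitly. The label subsets are taken from the diff; everything else is illustrative.

```python
import json

# Label maps with integer keys, as they would be built in Python (subsets of config.json).
id2label = {0: "bye", 1: "cancel", 2: "greeting"}
id2label_ner = {0: "O", 1: "B-appointment_id", 2: "I-appointment_id"}

# Serializing: json.dumps converts the integer keys to the strings "0", "1", ...
text = json.dumps({"id2label": id2label, "id2label_ner": id2label_ner}, indent=2)

# Loading: keys come back as strings, so cast them to int for lookups by class index.
config = json.loads(text)
restored = {
    name: {int(key): label for key, label in config[name].items()}
    for name in ("id2label", "id2label_ner")
}

print(list(config["id2label"]))     # ['0', '1', '2']
print(restored["id2label_ner"][1])  # B-appointment_id

# The pre-commit form with unquoted integer keys is not valid JSON.
try:
    json.loads('{"id2label": {0: "bye"}}')
    old_form_valid = True
except json.JSONDecodeError:
    old_form_valid = False
print(old_form_valid)  # False
```
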
model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:2ae2a3c417ae03007fca71c860470d2c13b8543d3a7aea9c9e8c646b4cc9fcd5
+oid sha256:17e123630e19b38ecde6368e4de87af7809acf8abcc021bb8645b4148e5139b4
 size 267851552
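
`model.safetensors` is stored with Git LFS, so the diff only touches the small pointer file: the `oid` (the SHA-256 digest of the real file contents) changed while `size` stayed at 267851552 bytes, which is consistent with retrained weights that keep the same tensor shapes and dtypes. A minimal sketch of parsing such a pointer and checking a local file against it with `hashlib`; the helper names and the temporary demo file are hypothetical, and only the pointer text comes from this commit.

```python
import hashlib
import os
import tempfile

def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"version": fields["version"], "algo": algo, "digest": digest, "size": int(fields["size"])}

def matches_pointer(path, pointer):
    """Stream the file through SHA-256 and compare both digest and byte size with the pointer."""
    sha = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            sha.update(chunk)
    return sha.hexdigest() == pointer["digest"] and os.path.getsize(path) == pointer["size"]

# The post-commit pointer for model.safetensors.
new_pointer = parse_lfs_pointer("""version https://git-lfs.github.com/spec/v1
oid sha256:17e123630e19b38ecde6368e4de87af7809acf8abcc021bb8645b4148e5139b4
size 267851552
""")
print(new_pointer["algo"], new_pointer["size"])  # sha256 267851552

# Demo with a small temporary file standing in for the real weights (hypothetical data).
payload = b"example bytes"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
demo_pointer = {"digest": hashlib.sha256(payload).hexdigest(), "size": len(payload)}
demo_ok = matches_pointer(tmp.name, demo_pointer)
os.remove(tmp.name)
print(demo_ok)  # True
```

Run against a downloaded copy of the weights, `matches_pointer` should return `True` only for the file this commit points to.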

runs/Jul17_15-56-14_69c2e588e3b9/events.out.tfevents.1752767775.69c2e588e3b9.567.1 ADDED

Binary file (29.1 kB).
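
The added file under `runs/` is a TensorBoard event log. Transformers' default `logging_dir` is `runs/<%b%d_%H-%M-%S>_<hostname>`, and TensorBoard event files are conventionally named `events.out.tfevents.<unix time>.<hostname>.<pid>.<counter>`. Taking that layout as an assumption (and assuming a hostname without dots), the sketch below decodes both names. The timestamp 1752767775 is 2025-07-17 15:56:15 UTC, one second after the directory's `15-56-14`, which lines up if the training machine's local clock was on UTC; both names carry the same hostname, `69c2e588e3b9`. The helper names are hypothetical.

```python
from datetime import datetime, timezone

def parse_event_filename(name):
    """Split 'events.out.tfevents.<time>.<host>.<pid>.<counter>' (assumed layout, dot-free host)."""
    prefix = "events.out.tfevents."
    if not name.startswith(prefix):
        raise ValueError(f"not a TensorBoard event file: {name}")
    timestamp, host, pid, counter = name[len(prefix):].split(".")
    created = datetime.fromtimestamp(int(timestamp), tz=timezone.utc)
    return {"created": created, "host": host, "pid": int(pid), "counter": int(counter)}

def parse_run_dir(name):
    """Split a default Transformers run directory name '<%b%d_%H-%M-%S>_<hostname>'."""
    stamp, host = name.rsplit("_", 1)
    return {"stamp": stamp, "host": host}

event = parse_event_filename("events.out.tfevents.1752767775.69c2e588e3b9.567.1")
run = parse_run_dir("Jul17_15-56-14_69c2e588e3b9")

print(event["created"].isoformat())  # 2025-07-17T15:56:15+00:00
print(event["host"] == run["host"])  # True
```
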

training_args.bin CHANGED

Binary files a/training_args.bin and b/training_args.bin differ
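
`training_args.bin` is the `TrainingArguments` object that `Trainer` pickles next to the weights. The hyperparameters in the new card (learning rate 1e-06, 50 epochs) match the Stage 2 setup in the removed README, which used `metric_for_best_model="eval_loss"`, `load_best_model_at_end=True` and `EarlyStoppingCallback(early_stopping_patience=5)`. The sketch below replays the Validation Loss column of the training-results table through a simplified patience counter (strict improvement, threshold 0). It approximates the idea rather than reproducing Transformers' callback code, and it assumes the table lists every evaluation of the run.

```python
# Validation loss per epoch, copied from the training-results table (epochs 1-50).
VAL_LOSS = [
    0.7147, 0.6750, 0.6464, 0.6265, 0.6048, 0.5813, 0.5649, 0.5482, 0.5284, 0.5105,
    0.5011, 0.4964, 0.4918, 0.4710, 0.4618, 0.4500, 0.4504, 0.4362, 0.4312, 0.4218,
    0.4185, 0.4083, 0.4066, 0.4015, 0.4032, 0.3923, 0.3892, 0.3854, 0.3824, 0.3801,
    0.3781, 0.3727, 0.3745, 0.3676, 0.3671, 0.3636, 0.3627, 0.3602, 0.3593, 0.3577,
    0.3563, 0.3556, 0.3554, 0.3547, 0.3544, 0.3535, 0.3518, 0.3517, 0.3514, 0.3515,
]

def replay_early_stopping(losses, patience):
    """Simplified patience counter for a lower-is-better metric (threshold 0)."""
    best, best_epoch, bad_evals, worst_streak = float("inf"), None, 0, 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            # Improvement: remember it and reset the patience counter.
            best, best_epoch, bad_evals = loss, epoch, 0
        else:
            bad_evals += 1
            worst_streak = max(worst_streak, bad_evals)
            if bad_evals >= patience:
                return {"stopped_at": epoch, "best_epoch": best_epoch, "best": best, "worst_streak": worst_streak}
    return {"stopped_at": None, "best_epoch": best_epoch, "best": best, "worst_streak": worst_streak}

result = replay_early_stopping(VAL_LOSS, patience=5)
print(result)  # {'stopped_at': None, 'best_epoch': 49, 'best': 0.3514, 'worst_streak': 1}
```

Under these assumptions the counter never goes above 1, which is consistent with the run completing all 50 epochs. Epoch 49 has the lowest validation loss (0.3514), slightly below the 0.3515 reported in the new card's summary.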