End of training

Files changed (5)

README.md +96 -186
config.json +1 -1
model.safetensors +1 -1
runs/Jul17_12-34-05_bb9517665b4e/events.out.tfevents.1752755645.bb9517665b4e.525.1 +0 -0
training_args.bin +0 -0

README.md CHANGED

@@ -7,194 +7,104 @@ tags:
 model-index:
 - name: schedulebot-nlu-engine
   results: []
-datasets:
-- andreaceto/hasd
-language:
-- en
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
-# Schedulebot-nlu-engine
-## Model Description
-This model is a multi-task Natural Language Understanding (NLU) engine designed specifically for an appointment scheduling chatbot. It is fine-tuned from a **`distilbert-base-uncased`** backbone and is capable of performing two tasks simultaneously:
-- **Intent Classification**: Identifying the user's primary goal (e.g., `schedule`, `cancel`).
-- **Named Entity Recognition (NER)**: Extracting custom, domain-specific entities (e.g., `appointment_type`).
-This model stands out due to its custom classification heads, which use a more complex architecture to improve performance on nuanced tasks.
-## Model Architecture
-The model uses a standard `distilbert-base-uncased` model as its core feature extractor. Two custom classification "heads" are placed on top of this base to perform the downstream tasks.
-- **Base Model**: `distilbert-base-uncased`
-- **Classifier Heads**: each head is a Multi-Layer Perceptron (MLP) with the following structure to allow for more complex feature interpretation:
-    1. A Linear layer projecting the transformer's output dimension (768) to an intermediate size (384).
-    2. A GELU activation function.
-    3. A Dropout layer with a rate of 0.3 for regularization.
-    4. A final Linear layer projecting the intermediate size to the number of output labels for the specific task (intent or NER).
-## Intended Use
-This model is intended to be the core NLU component of a conversational AI system for managing appointments.
-For instructions on how to use the model check the [dedicated file](./how_to_use.md).
-## Training Data
-The model was trained on the **HASD (Hybrid Appointment Scheduling Dataset)**, a custom dataset built specifically for this task.
-- **Source**: The dataset is a hybrid of real-world conversational examples from `clinc/clinc_oos` (for simple intents) and synthetically generated, template-based examples for complex scheduling intents.
-- **Balancing**: To combat class imbalance, intents sourced from `clinc/clinc_oos` were **down-sampled** to a maximum of **150 examples** each.
-- **Augmentation**: To increase data diversity for complex intents (`schedule`, `reschedule`, etc.), **Contextual Word Replacement** was used. A `distilbert-base-uncased` model augmented the templates by replacing non-placeholder words with contextually relevant synonyms.
-The dataset is available [here](https://huggingface.co/datasets/andreaceto/hasd).
-### Intents
-The model is trained to recognize the following intents:
-`schedule`, `reschedule`, `cancel`, `query_avail`, `greeting`, `positive_reply`, `negative_reply`, `bye`, `oos` (out-of-scope).
-### Entities
-The model is trained to recognize the following custom named entities:
-`practitioner_name`, `appointment_type`, `appointment_id`.
-## Training Procedure
-The model was trained using a two-stage fine-tuning strategy to ensure stability and performance.
-### Stage 1: Training the Classifier Heads
-- The `distilbert-base-uncased` base model was entirely **frozen**.
-- Only the randomly initialized MLP heads for intent and NER classification were trained.
-**Setup**:
-```python
-# Define a data collator to handle padding for token classification
-data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
-# Define Training Arguments
-training_args = TrainingArguments(
-    output_dir="path/to/output_dir",
-    num_train_epochs=200,               # Training epochs
-    per_device_train_batch_size=32,
-    per_device_eval_batch_size=32,
-    learning_rate=1e-4,                 # Learning Rate
-    weight_decay=1e-5,                  # AdamW weight decay
-    logging_dir="path/to/logging_dir",
-    logging_strategy="steps",
-    logging_steps=10,
-    eval_strategy="epoch",
-    save_strategy="epoch",
-    load_best_model_at_end=True,
-    metric_for_best_model="ner_f1",     # Focus on NER F1 as the key metric
-    # --- Hub Arguments ---
-    push_to_hub=True,
-    hub_model_id=hub_model_id,
-    hub_strategy="end",
-    hub_token=hf_token,
-    report_to="tensorboard"             # Tensorboard to monitor training
-)
-# Create the Trainer
-trainer = Trainer(
-    model=model,
-    args=training_args,
-    train_dataset=processed_datasets["train"],
-    eval_dataset=processed_datasets["validation"],
-    processing_class=tokenizer,
-    data_collator=data_collator,
-    compute_metrics=compute_metrics,  # Custom function (check how_to_use.md)
-    callbacks=[EarlyStoppingCallback(early_stopping_patience=20)]
-)
-```
-### Stage 2: Selective Fine-Tuning
-- The DistilBERT backbone was entirely **unfrozen**.
-- Using a very low LR allows the model to adapt even better to the new data while preserving the powerful, general-purpose knowledge.
-**Setup**:
-```python
-# Define Training Arguments
-training_args = TrainingArguments(
-    output_dir="path/to/output_dir",
-    num_train_epochs=50,               # Fine.tuning epochs
-    per_device_train_batch_size=32,
-    per_device_eval_batch_size=32,
-    learning_rate=1e-6,                 # Learning Rate
-    weight_decay=1e-3,                  # AdamW weight decay
-    logging_dir="path/to/logging_dir",
-    logging_strategy="steps",
-    logging_steps=10,
-    eval_strategy="epoch",
-    save_strategy="epoch",
-    load_best_model_at_end=True,
-    metric_for_best_model="ner_f1",     # Focus on NER F1 as the key metric
-    # --- Hub Arguments ---
-    push_to_hub=True,
-    hub_model_id=hub_model_id,
-    hub_strategy="end",
-    hub_token=hf_token,
-    report_to="tensorboard"             # Tensorboard to monitor training
-)
-# Create the Trainer
-trainer = Trainer(
-    model=model,
-    args=training_args,
-    train_dataset=processed_datasets["train"],
-    eval_dataset=processed_datasets["validation"],
-    processing_class=tokenizer,
-    data_collator=data_collator,
-    compute_metrics=compute_metrics,  # Custom function (check how_to_use.md)
-    callbacks=[EarlyStoppingCallback(early_stopping_patience=5)]
-)
-```
-## Evaluation
-The model was evaluated on a held-out test set, and its performance was measured for both tasks.
-### Intent Classification Performance
-| Intent | Precision | Recall | F1-Score | Support |
-| --- | --- | --- | --- | --- |
-| bye | 1.00 | 1.00 | 1.00 | 22 |
-| cancel | 1.00 | 0.95 | 0.98 | 21 |
-| greeting | 1.00 | 1.00 | 1.00 | 23 |
-| negative_reply | 0.96 | 1.00 | 0.98 | 22 |
-| oos | 1.00 | 1.00 | 1.00 | 22 |
-| positive_reply | 1.00 | 0.96 | 0.98 | 23 |
-| query_avail | 0.95 | 1.00 | 0.98 | 21 |
-| reschedule | 0.96 | 1.00 | 0.98 | 22 |
-| schedule | 0.95 | 0.95 | 0.95 | 21 |
-| **Accuracy** |  |  | **0.98** | **197** |
-| **Macro Avg** | **0.98** | **0.98** | **0.98** | **197** |
-| **Weighted Avg** | **0.98** | **0.98** | **0.98** | **197** |
-### NER (Token Classification) Performance
-| Entity | Precision | Recall | F1-Score | Support |
-| --- | --- | --- | --- | --- |
-| B-appointment_id | 1.00 | 1.00 | 1.00 | 25 |
-| B-appointment_type | 1.00 | 1.00 | 1.00 | 33 |
-| B-practitioner_name | 1.00 | 1.00 | 1.00 | 44 |
-| O | 1.00 | 1.00 | 1.00 | 1342 |
-| **Micro Avg** | **1.00** | **1.00** | **1.00** | 1444 |
-| **Macro Avg** | **1.00** | **1.00** | **1.00** | 1444 |
-| **Weighted Avg** | **1.00** | **1.00** | **1.00** | 1444 |
-The model achieves near-perfect results on the NER task and excellent results on the intent classification task for this specific dataset.
-## Limitations and Bias
-- The model's performance is highly dependent on the quality and scope of the **HASD dataset**. It may not generalize well to phrasing or appointment types significantly different from what it was trained on.
-- The dataset was primarily generated from templates, which may not capture the full diversity of real human language.
-- The model inherits any biases present in the `distilbert-base-uncased` model and the `clinc/clinc_oos` dataset.

+# schedulebot-nlu-engine
+This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on an unknown dataset.
+It achieves the following results on the evaluation set:
+- Loss: 0.3390
+- Intent Accuracy: 0.9178
+- Intent F1: 0.9178
+- Ner F1: 0.9240
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1e-06
+- train_batch_size: 32
+- eval_batch_size: 32
+- seed: 42
+- optimizer: Use OptimizerNames.ADAMW_TORCH with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
+- lr_scheduler_type: linear
+- num_epochs: 50
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Intent Accuracy | Intent F1 | Ner F1 |
+|:-------------:|:-----:|:----:|:---------------:|:---------------:|:---------:|:------:|
+| No log        | 1.0   | 64   | 0.7274          | 0.7785          | 0.7785    | 0.9136 |
+| No log        | 2.0   | 128  | 0.6946          | 0.7991          | 0.8005    | 0.9162 |
+| No log        | 3.0   | 192  | 0.6461          | 0.8196          | 0.8178    | 0.9158 |
+| No log        | 4.0   | 256  | 0.6226          | 0.8265          | 0.8261    | 0.9152 |
+| No log        | 5.0   | 320  | 0.5986          | 0.8516          | 0.8518    | 0.9141 |
+| No log        | 6.0   | 384  | 0.5705          | 0.8356          | 0.8359    | 0.9153 |
+| No log        | 7.0   | 448  | 0.5506          | 0.8584          | 0.8568    | 0.9153 |
+| 0.901         | 8.0   | 512  | 0.5459          | 0.8379          | 0.8378    | 0.9147 |
+| 0.901         | 9.0   | 576  | 0.5220          | 0.8539          | 0.8546    | 0.9158 |
+| 0.901         | 10.0  | 640  | 0.5129          | 0.8676          | 0.8667    | 0.9157 |
+| 0.901         | 11.0  | 704  | 0.4974          | 0.8653          | 0.8648    | 0.9146 |
+| 0.901         | 12.0  | 768  | 0.4870          | 0.8744          | 0.8739    | 0.9180 |
+| 0.901         | 13.0  | 832  | 0.4892          | 0.8676          | 0.8682    | 0.9180 |
+| 0.901         | 14.0  | 896  | 0.4652          | 0.8767          | 0.8770    | 0.9174 |
+| 0.901         | 15.0  | 960  | 0.4523          | 0.8790          | 0.8789    | 0.9174 |
+| 0.6791        | 16.0  | 1024 | 0.4412          | 0.8881          | 0.8884    | 0.9197 |
+| 0.6791        | 17.0  | 1088 | 0.4441          | 0.8790          | 0.8785    | 0.9208 |
+| 0.6791        | 18.0  | 1152 | 0.4231          | 0.8950          | 0.8948    | 0.9190 |
+| 0.6791        | 19.0  | 1216 | 0.4202          | 0.8858          | 0.8855    | 0.9202 |
+| 0.6791        | 20.0  | 1280 | 0.4099          | 0.8950          | 0.8951    | 0.9208 |
+| 0.6791        | 21.0  | 1344 | 0.4054          | 0.8973          | 0.8970    | 0.9219 |
+| 0.6791        | 22.0  | 1408 | 0.4018          | 0.8950          | 0.8954    | 0.9212 |
+| 0.6791        | 23.0  | 1472 | 0.3953          | 0.8973          | 0.8974    | 0.9201 |
+| 0.5609        | 24.0  | 1536 | 0.3883          | 0.9041          | 0.9037    | 0.9220 |
+| 0.5609        | 25.0  | 1600 | 0.3874          | 0.8995          | 0.8994    | 0.9224 |
+| 0.5609        | 26.0  | 1664 | 0.3827          | 0.9041          | 0.9039    | 0.9224 |
+| 0.5609        | 27.0  | 1728 | 0.3796          | 0.9041          | 0.9045    | 0.9230 |
+| 0.5609        | 28.0  | 1792 | 0.3793          | 0.9018          | 0.9018    | 0.9230 |
+| 0.5609        | 29.0  | 1856 | 0.3703          | 0.9110          | 0.9111    | 0.9219 |
+| 0.5609        | 30.0  | 1920 | 0.3732          | 0.9018          | 0.9018    | 0.9207 |
+| 0.5609        | 31.0  | 1984 | 0.3639          | 0.9132          | 0.9134    | 0.9219 |
+| 0.4928        | 32.0  | 2048 | 0.3623          | 0.9064          | 0.9066    | 0.9225 |
+| 0.4928        | 33.0  | 2112 | 0.3599          | 0.9132          | 0.9133    | 0.9230 |
+| 0.4928        | 34.0  | 2176 | 0.3546          | 0.9110          | 0.9110    | 0.9219 |
+| 0.4928        | 35.0  | 2240 | 0.3515          | 0.9178          | 0.9178    | 0.9230 |
+| 0.4928        | 36.0  | 2304 | 0.3504          | 0.9155          | 0.9156    | 0.9235 |
+| 0.4928        | 37.0  | 2368 | 0.3501          | 0.9178          | 0.9179    | 0.9235 |
+| 0.4928        | 38.0  | 2432 | 0.3495          | 0.9132          | 0.9132    | 0.9230 |
+| 0.4928        | 39.0  | 2496 | 0.3452          | 0.9132          | 0.9132    | 0.9235 |
+| 0.447         | 40.0  | 2560 | 0.3430          | 0.9224          | 0.9224    | 0.9230 |
+| 0.447         | 41.0  | 2624 | 0.3441          | 0.9132          | 0.9134    | 0.9240 |
+| 0.447         | 42.0  | 2688 | 0.3408          | 0.9178          | 0.9178    | 0.9235 |
+| 0.447         | 43.0  | 2752 | 0.3427          | 0.9155          | 0.9156    | 0.9236 |
+| 0.447         | 44.0  | 2816 | 0.3420          | 0.9155          | 0.9157    | 0.9235 |
+| 0.447         | 45.0  | 2880 | 0.3407          | 0.9201          | 0.9201    | 0.9235 |
+| 0.447         | 46.0  | 2944 | 0.3396          | 0.9178          | 0.9178    | 0.9235 |
+| 0.4209        | 47.0  | 3008 | 0.3401          | 0.9178          | 0.9178    | 0.9235 |
+| 0.4209        | 48.0  | 3072 | 0.3389          | 0.9178          | 0.9178    | 0.9240 |
+| 0.4209        | 49.0  | 3136 | 0.3392          | 0.9178          | 0.9178    | 0.9240 |
+| 0.4209        | 50.0  | 3200 | 0.3390          | 0.9178          | 0.9178    | 0.9240 |
+### Framework versions
+- Transformers 4.53.2
+- Pytorch 2.6.0+cu124
+- Datasets 4.0.0
+- Tokenizers 0.21.2
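
The hyperparameters and step counts in the regenerated model card can be cross-checked with a little arithmetic. The sketch below is not part of the commit. It assumes a single device, no gradient accumulation, no dropped last batch, and zero warmup steps (none of which the card states), and the helper names `train_set_size_bounds` and `linear_lr` are hypothetical.

```python
import math

# Values read from the model card in this diff.
BATCH_SIZE = 32        # train_batch_size
STEPS_PER_EPOCH = 64   # "Step" column at epoch 1.0
NUM_EPOCHS = 50        # num_epochs
BASE_LR = 1e-6         # learning_rate


def train_set_size_bounds(steps_per_epoch, batch_size):
    """Return (min, max) example counts n with ceil(n / batch_size) == steps_per_epoch.

    Assumption: one optimizer step per batch and the last partial batch is kept.
    """
    smallest = (steps_per_epoch - 1) * batch_size + 1
    largest = steps_per_epoch * batch_size
    return smallest, largest


def linear_lr(step, total_steps, base_lr, warmup_steps=0):
    """Learning rate under a linear decay schedule with optional linear warmup.

    Assumption: warmup_steps defaults to 0 because the card lists no warmup.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
low, high = train_set_size_bounds(STEPS_PER_EPOCH, BATCH_SIZE)

print(f"total optimizer steps: {total_steps}")
print(f"implied training examples: {low} to {high}")
for step in (0, total_steps // 2, total_steps):
    print(f"step {step}: lr = {linear_lr(step, total_steps, BASE_LR):.2e}")
```

Under those assumptions, 64 steps per epoch at batch size 32 implies a training split of 2017 to 2048 examples, and the 3200 total steps match the last row of the training results table.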

config.json CHANGED

@@ -18,6 +18,6 @@
   "sinusoidal_pos_embds": false,
   "tie_weights_": true,
   "torch_dtype": "float32",
-  "transformers_version": "4.53.1",
+  "transformers_version": "4.53.2",
   "vocab_size": 30522
 }

model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:107bf248a827ef41b5656fa52b0cfb51cc36c8f8fde89ff67c03df73d720be5d
+oid sha256:771362ffd2649dfac75137c8fb23415627356fd9ed2deab53c4b2d41215db76e
 size 267851552
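
The unchanged `size 267851552` in the Git LFS pointer, together with `"torch_dtype": "float32"` in config.json, gives a rough upper bound on the parameter count. This is a minimal sketch, assuming every tensor is stored as 4-byte float32 and ignoring the safetensors header, so the true count is slightly lower. `max_param_count` is a hypothetical helper, not part of the repository.

```python
# Size in bytes taken from the Git LFS pointer for model.safetensors.
LFS_SIZE_BYTES = 267_851_552
BYTES_PER_FLOAT32 = 4  # assumption: all weights stored as float32


def max_param_count(file_size_bytes, bytes_per_param):
    """Upper bound on parameters if the whole file were raw weight data."""
    return file_size_bytes // bytes_per_param


upper_bound = max_param_count(LFS_SIZE_BYTES, BYTES_PER_FLOAT32)
print(f"at most {upper_bound:,} float32 parameters")
```

The identical size before and after the commit is consistent with only the weight values changing, not the architecture.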

runs/Jul17_12-34-05_bb9517665b4e/events.out.tfevents.1752755645.bb9517665b4e.525.1 ADDED

Binary file (28.2 kB).

training_args.bin CHANGED

Binary files a/training_args.bin and b/training_args.bin differ