andreaceto committed on
Commit 2c854d6 · verified · 1 Parent(s): ba7da8c

Update README.md

Files changed (1)
  1. README.md +91 -19
README.md CHANGED
@@ -40,31 +40,29 @@ The model uses a standard `distilbert-base-uncased` model as its core feature ex
40
 
41
  ## Intended Use
42
 
43
- This model is intended to be the core NLU component of a conversational AI system for managing appointments. It takes raw user text as input and outputs a structured JSON object containing the predicted intent and a list of extracted entities.
44
 
45
- ```python
46
- from transformers import AutoTokenizer
47
-
48
-
49
- ```
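The paragraph above describes the output as a structured JSON object with a predicted intent and extracted entities. A minimal sketch of what that shape might look like (the field names are illustrative assumptions, not the model's actual schema):

```python
# Hypothetical output shape for the NLU component described above.
# Field names ("intent", "entities", "entity", "value") are assumptions;
# the real schema is defined by the model's inference code.
example_output = {
    "intent": "schedule",
    "entities": [
        {"entity": "practitioner_name", "value": "dr. smith"},
        {"entity": "appointment_type", "value": "checkup"},
    ],
}
print(example_output["intent"])  # → schedule
```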
50
 
51
  ## Training Data
52
 
53
  The model was trained on the **HASD (Hybrid Appointment Scheduling Dataset)**, a custom dataset built specifically for this task.
54
 
55
  - **Source**: The dataset is a hybrid of real-world conversational examples from `clinc/clinc_oos` (for simple intents) and synthetically generated, template-based examples for complex scheduling intents.
56
- - **Balancing**: To combat class imbalance, intents sourced from `clinc/clinc_oos` were **down-sampled** to a maximum of **150 examples** each. [cite: multitask_model.ipynb]
57
- - **Augmentation**: To increase data diversity for complex intents (`schedule`, `reschedule`, etc.), **Contextual Word Replacement** was used. A `distilbert-base-uncased` model augmented the templates by replacing non-placeholder words with contextually relevant synonyms. [cite: multitask_model.ipynb]
 
 
58
 
59
  ### Intents
60
 
61
  The model is trained to recognize the following intents:
62
- `schedule`, `reschedule`, `cancel`, `query_avail`, `greeting`, `positive_reply`, `negative_reply`, `bye`, `oos` (out-of-scope). [cite: multitask_model.ipynb]
63
 
64
  ### Entities
65
 
66
  The model is trained to recognize the following custom named entities:
67
- `practitioner_name`, `appointment_type`, `appointment_id`. [cite: multitask_model.ipynb]
68
 
69
  ## Training Procedure
70
 
@@ -72,16 +70,94 @@ The model was trained using a two-stage fine-tuning strategy to ensure stability
72
 
73
  ### Stage 1: Training the Classifier Heads
74
 
75
- - The `distilbert-base-uncased` base model was **frozen**. [cite: multitask_model.ipynb]
76
  - Only the randomly initialized MLP heads for intent and NER classification were trained.
77
- - This was done for **5 epochs** with a higher learning rate (`5e-4`), allowing the new layers to learn the task basics without disrupting the pre-trained backbone. [cite: multitask_model.ipynb]
78
 
79
  ### Stage 2: Selective Fine-Tuning
80
 
81
- - The classification heads were kept trainable, and the **top two layers** of the DistilBERT backbone were **unfrozen**. [cite: multitask_model.ipynb]
82
- - The entire model was then fine-tuned for **3 epochs** with a much lower learning rate (`2e-5`). [cite: multitask_model.ipynb]
83
- - This gradual unfreezing approach allows the model to adapt its most task-specific layers to the new data while preserving the powerful, general-purpose knowledge in the lower layers.
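The selective unfreezing described above can be sketched as follows, assuming the Hugging Face DistilBERT layout where the six transformer blocks are reachable as `transformer.layer` (a small stand-in module is used here instead of the real backbone):

```python
import torch.nn as nn

# Stand-in mirroring DistilBERT's module layout: six transformer
# blocks reachable as `transformer.layer` (plain Linear layers
# stand in for the real TransformerBlock here).
class Backbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.transformer = nn.Module()
        self.transformer.layer = nn.ModuleList(
            nn.Linear(768, 768) for _ in range(6)
        )

backbone = Backbone()

# Freeze everything, then unfreeze only the top two blocks
for p in backbone.parameters():
    p.requires_grad = False
for block in backbone.transformer.layer[-2:]:
    for p in block.parameters():
        p.requires_grad = True

unfrozen = [i for i, block in enumerate(backbone.transformer.layer)
            if all(p.requires_grad for p in block.parameters())]
print(unfrozen)  # → [4, 5]
```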
84
85
  ## Evaluation
86
 
87
  The model was evaluated on a held-out test set, and its performance was measured for both tasks.
@@ -103,8 +179,6 @@ The model was evaluated on a held-out test set, and its performance was measured
103
  | **Macro Avg** | **0.98** | **0.98** | **0.98** | **197** |
104
  | **Weighted Avg** | **0.98** | **0.98** | **0.98** | **197** |
105
 
106
- *[Based on the classification report in the provided `multitask_model.ipynb` notebook.]* [cite: multitask_model.ipynb]
107
-
108
  ### NER (Token Classification) Performance
109
 
110
  | Entity | Precision | Recall | F1-Score | Support |
@@ -117,8 +191,6 @@ The model was evaluated on a held-out test set, and its performance was measured
117
  | **Macro Avg** | **1.00** | **1.00** | **1.00** | 1444 |
118
  | **Weighted Avg** | **1.00** | **1.00** | **1.00** | 1444 |
119
 
120
- *[Based on the classification report in the provided `multitask_model.ipynb` notebook.]* [cite: multitask_model.ipynb]
121
-
122
  The model achieves near-perfect results on the NER task and excellent results on the intent classification task for this specific dataset.
123
 
124
  ## Limitations and Bias
 
40
 
41
  ## Intended Use
42
 
43
+ This model is intended to be the core NLU component of a conversational AI system for managing appointments.
44
 
45
+ For instructions on how to use the model, check the [dedicated file](./how_to_use.md).
46
 
47
  ## Training Data
48
 
49
  The model was trained on the **HASD (Hybrid Appointment Scheduling Dataset)**, a custom dataset built specifically for this task.
50
 
51
  - **Source**: The dataset is a hybrid of real-world conversational examples from `clinc/clinc_oos` (for simple intents) and synthetically generated, template-based examples for complex scheduling intents.
52
+ - **Balancing**: To combat class imbalance, intents sourced from `clinc/clinc_oos` were **down-sampled** to a maximum of **150 examples** each.
53
+ - **Augmentation**: To increase data diversity for complex intents (`schedule`, `reschedule`, etc.), **Contextual Word Replacement** was used. A `distilbert-base-uncased` model augmented the templates by replacing non-placeholder words with contextually relevant synonyms.
54
+
55
+ The dataset is available [here](https://huggingface.co/datasets/andreaceto/hasd).
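The Contextual Word Replacement step can be illustrated with a toy masking helper: pick one non-placeholder word, mask it, then hand the result to a masked-LM (e.g. a fill-mask pipeline over `distilbert-base-uncased`) to propose a contextual substitute. This is a sketch of the idea, not the dataset's actual augmentation code:

```python
import random
import re

def mask_one_word(template: str, rng: random.Random) -> str:
    """Mask one word that is not a {placeholder}, so that a masked-LM
    can later fill it with a contextually relevant synonym."""
    tokens = template.split()
    candidates = [i for i, tok in enumerate(tokens)
                  if not re.fullmatch(r"\{\w+\}", tok)]
    tokens[rng.choice(candidates)] = "[MASK]"
    return " ".join(tokens)

masked = mask_one_word("I want to book with {practitioner_name}", random.Random(0))
# The placeholder always survives; one ordinary word becomes [MASK].
print(masked)
```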
56
 
57
  ### Intents
58
 
59
  The model is trained to recognize the following intents:
60
+ `schedule`, `reschedule`, `cancel`, `query_avail`, `greeting`, `positive_reply`, `negative_reply`, `bye`, `oos` (out-of-scope).
61
 
62
  ### Entities
63
 
64
  The model is trained to recognize the following custom named entities:
65
+ `practitioner_name`, `appointment_type`, `appointment_id`.
66
 
67
  ## Training Procedure
68
 
 
70
 
71
  ### Stage 1: Training the Classifier Heads
72
 
73
+ - The `distilbert-base-uncased` base model was entirely **frozen**.
74
  - Only the randomly initialized MLP heads for intent and NER classification were trained.
75
+
76
+ **Setup**:
77
+
78
+ ```python
79
+ # Define a data collator to handle padding for token classification
80
+ data_collator = DataCollatorForTokenClassification(tokenizer=tokenizer)
81
+
82
+ # Define Training Arguments
83
+ training_args = TrainingArguments(
84
+ output_dir="path/to/output_dir",
85
+ num_train_epochs=200, # Training epochs
86
+ per_device_train_batch_size=32,
87
+ per_device_eval_batch_size=32,
88
+ learning_rate=1e-4, # Learning Rate
89
+ weight_decay=1e-5, # AdamW weight decay
90
+ logging_dir="path/to/logging_dir",
91
+ logging_strategy="steps",
92
+ logging_steps=10,
93
+ eval_strategy="epoch",
94
+ save_strategy="epoch",
95
+ load_best_model_at_end=True,
96
+ metric_for_best_model="ner_f1", # Focus on NER F1 as the key metric
97
+ # --- Hub Arguments ---
98
+ push_to_hub=True,
99
+ hub_model_id=hub_model_id,
100
+ hub_strategy="end",
101
+ hub_token=hf_token,
102
+ report_to="tensorboard" # Tensorboard to monitor training
103
+ )
104
+
105
+ # Create the Trainer
106
+ trainer = Trainer(
107
+ model=model,
108
+ args=training_args,
109
+ train_dataset=processed_datasets["train"],
110
+ eval_dataset=processed_datasets["validation"],
111
+ processing_class=tokenizer,
112
+ data_collator=data_collator,
113
+ compute_metrics=compute_metrics, # Custom function (check how_to_use.md)
114
+ callbacks=[EarlyStoppingCallback(early_stopping_patience=20)]
115
+ )
116
+ ```
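The setup above configures the Trainer but does not show the freezing step itself. A minimal sketch of how the backbone freeze might be done, using a toy stand-in for the multitask model (the attribute name `distilbert` is an assumption about the custom model class):

```python
import torch.nn as nn

# Toy stand-in for the multitask model: an encoder attribute named
# `distilbert` (assumed) plus the two MLP classification heads.
class MultitaskModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.distilbert = nn.Linear(768, 768)  # stands in for the encoder
        self.intent_head = nn.Linear(768, 9)   # 9 intents
        self.ner_head = nn.Linear(768, 7)      # BIO tags for 3 entity types

model = MultitaskModel()

# Stage 1: freeze the backbone so only the heads receive gradients
for p in model.distilbert.parameters():
    p.requires_grad = False

trainable = sorted({name.split(".")[0]
                    for name, p in model.named_parameters()
                    if p.requires_grad})
print(trainable)  # → ['intent_head', 'ner_head']
```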
117
 
118
  ### Stage 2: Selective Fine-Tuning
119
 
120
+ - The DistilBERT backbone was entirely **unfrozen**.
121
+ - Using a very low learning rate lets the model adapt further to the new data while preserving the general-purpose knowledge in the pre-trained weights.
 
122
 
123
+ **Setup**:
124
+
125
+ ```python
126
+ # Define Training Arguments
127
+ training_args = TrainingArguments(
128
+ output_dir="path/to/output_dir",
129
+ num_train_epochs=50, # Fine-tuning epochs
130
+ per_device_train_batch_size=32,
131
+ per_device_eval_batch_size=32,
132
+ learning_rate=1e-6, # Learning Rate
133
+ weight_decay=1e-3, # AdamW weight decay
134
+ logging_dir="path/to/logging_dir",
135
+ logging_strategy="steps",
136
+ logging_steps=10,
137
+ eval_strategy="epoch",
138
+ save_strategy="epoch",
139
+ load_best_model_at_end=True,
140
+ metric_for_best_model="ner_f1", # Focus on NER F1 as the key metric
141
+ # --- Hub Arguments ---
142
+ push_to_hub=True,
143
+ hub_model_id=hub_model_id,
144
+ hub_strategy="end",
145
+ hub_token=hf_token,
146
+ report_to="tensorboard" # Tensorboard to monitor training
147
+ )
148
+
149
+ # Create the Trainer
150
+ trainer = Trainer(
151
+ model=model,
152
+ args=training_args,
153
+ train_dataset=processed_datasets["train"],
154
+ eval_dataset=processed_datasets["validation"],
155
+ processing_class=tokenizer,
156
+ data_collator=data_collator,
157
+ compute_metrics=compute_metrics, # Custom function (check how_to_use.md)
158
+ callbacks=[EarlyStoppingCallback(early_stopping_patience=5)]
159
+ )
160
+ ```
161
  ## Evaluation
162
 
163
  The model was evaluated on a held-out test set, and its performance was measured for both tasks.
 
179
  | **Macro Avg** | **0.98** | **0.98** | **0.98** | **197** |
180
  | **Weighted Avg** | **0.98** | **0.98** | **0.98** | **197** |
181
 
 
 
182
  ### NER (Token Classification) Performance
183
 
184
  | Entity | Precision | Recall | F1-Score | Support |
 
191
  | **Macro Avg** | **1.00** | **1.00** | **1.00** | 1444 |
192
  | **Weighted Avg** | **1.00** | **1.00** | **1.00** | 1444 |
193
 
 
 
194
  The model achieves near-perfect results on the NER task and excellent results on the intent classification task for this specific dataset.
195
 
196
  ## Limitations and Bias