RoBERTa Dialogue Sentiment Analysis β EmoWOZ Fine-Tuning
Fine-tunes roberta-base (or roberta-large) on the EmoWOZ dataset for
per-utterance emotion classification in task-oriented dialogue, using a
configurable sliding window of preceding dialogue history as context.
Emotion Labels
| ID | Label | Notes |
|---|---|---|
| -1 | system | Filtered out β not predicted |
| 0 | neutral | |
| 1 | fearful | |
| 2 | dissatisfied | |
| 3 | apologetic | |
| 4 | abusive | |
| 5 | excited | |
| 6 | satisfied |
Only user utterances (emotion β -1) are classified. System turns are retained as context but not predicted.
Project Layout
roberta_emowoz/
βββ configs/
β βββ default.yaml # All hyperparameters (edit this)
βββ data/
β βββ dataset.py # DialogueDataset + collator
β βββ preprocessing.py # JSON β flat utterance samples
βββ models/
β βββ model.py # RoBERTa wrapper with classification head
β βββ focal_loss.py # Class-imbalance-aware loss
βββ scripts/
β βββ train.py # Main training entry point
β βββ evaluate.py # Full eval with per-class metrics
β βββ predict.py # Interactive / batch inference
βββ outputs/ # Checkpoints, logs, predictions (gitignored)
βββ requirements.txt
Quick Start
NEED CONDA, install it first! research with chatgpt if u need to know what it is
conda create -n nst_v4 python=3.11
conda activate nst_v4
pip install -r requirements_normal.txt
pip install --index-url https://download.pytorch.org/whl/cu121 -r requirements_torch.txt
# 2. Place your data files in the project root (or update config paths)
# set1_train.json set1_val.json set1_test.json
# 3. Train (uses defaults from DEFAULT_CONFIG in train.py, override any value)
# For my RTX 3070 this takes 2 hours to complete
python train.py
# Or with custom parameters:
python train.py --epochs 10 --batch_size 32 --history_window 4 --loss focal
# 4. Evaluate on test set
# python evaluate.py --checkpoint outputs/best_model
# 5. Interactive prediction
# python predict.py --checkpoint outputs/best_model --history_window 3
Key Hyperparameter β history_window
history_window controls how many preceding turns (both user and system)
are prepended as context before the current utterance.
history_window = 0 β [CLS] <current utterance> [SEP]
history_window = 2 β [CLS] <turn-2> [SEP] <turn-1> [SEP] <current> [SEP]
history_window = 4 β [CLS] <t-4> [SEP] <t-3> [SEP] <t-2> [SEP] <t-1> [SEP] <current> [SEP]
Turns are ordered oldest β newest. System turns are prefixed with "SYS:",
user turns with "USR:" to give the model speaker role signals.
Recommended sweep: [0, 2, 4, 6].
Class Imbalance
EmoWOZ is heavily skewed toward class 0 (neutral). Two mitigation strategies
are included and can be toggled in configs/default.yaml:
- Focal Loss (
loss: focal) β down-weights easy neutral examples. - Weighted Cross-Entropy (
loss: weighted_ce) β per-class inverse frequency weights computed from the training set.
Outputs
After training, outputs/ contains:
best_model/β best checkpoint by macro-F1 on validationlast_model/β final epoch checkpointtraining_log.jsonlβ epoch-level metricstest_results.jsonβ per-class precision / recall / F1 + confusion matrix