| # RoBERTa Dialogue Sentiment Analysis: EmoWOZ Fine-Tuning |
|
|
| Fine-tunes `roberta-base` (or `roberta-large`) on the EmoWOZ dataset for |
| **per-utterance emotion classification** in task-oriented dialogue, using a |
| configurable sliding window of preceding dialogue history as context. |
|
|
| ## Emotion Labels |
|
|
| | ID | Label | Notes | |
| |----|--------------|------------------------------| |
| | -1 | system | Filtered out; not predicted | |
| | 0 | neutral | | |
| | 1 | fearful | | |
| | 2 | dissatisfied | | |
| | 3 | apologetic | | |
| | 4 | abusive | | |
| | 5 | excited | | |
| | 6 | satisfied | | |
|
|
| Only **user utterances** (emotion ≠ -1) are classified. System turns are |
| retained as context but not predicted. |
|
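The label table and the "user utterances only" rule can be sketched in a few lines. This is a minimal illustration, not the project's preprocessing code: the dict keys `"text"` and `"emotion"` and the helper name `select_targets` are assumptions, and the real EmoWOZ JSON schema may differ.

```python
# Label IDs copied from the table above; -1 marks system turns.
ID2LABEL = {
    0: "neutral",
    1: "fearful",
    2: "dissatisfied",
    3: "apologetic",
    4: "abusive",
    5: "excited",
    6: "satisfied",
}
SYSTEM_LABEL = -1


def select_targets(turns):
    """Return (index, text, label_id) for every turn that should be classified.

    `turns` is assumed to be a list of dicts with hypothetical keys
    "text" and "emotion". System turns stay in `turns` (so they can still
    serve as context) but are skipped as prediction targets.
    """
    return [
        (i, turn["text"], turn["emotion"])
        for i, turn in enumerate(turns)
        if turn["emotion"] != SYSTEM_LABEL
    ]


# Made-up toy dialogue, for illustration only.
dialogue = [
    {"text": "I need a cheap hotel.", "emotion": 0},
    {"text": "Sure, which area?", "emotion": -1},
    {"text": "That is not what I asked for.", "emotion": 2},
]
targets = select_targets(dialogue)
```

Keeping the original turn index in each target makes it possible to look up the preceding turns later when building the context window.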
|
| ## Project Layout |
|
|
| ``` |
| roberta_emowoz/ |
| ├── configs/ |
| │   └── default.yaml        # All hyperparameters (edit this) |
| ├── data/ |
| │   ├── dataset.py          # DialogueDataset + collator |
| │   └── preprocessing.py    # JSON → flat utterance samples |
| ├── models/ |
| │   ├── model.py            # RoBERTa wrapper with classification head |
| │   └── focal_loss.py       # Class-imbalance-aware loss |
| ├── scripts/ |
| │   ├── train.py            # Main training entry point |
| │   ├── evaluate.py         # Full eval with per-class metrics |
| │   └── predict.py          # Interactive / batch inference |
| ├── outputs/                # Checkpoints, logs, predictions (gitignored) |
| └── requirements.txt |
| ``` |
|
|
| ## Quick Start |
|
|
| Conda is required. Install it first (look it up if you have not used it before). |
|
|
| ```bash |
| # 1. Create and activate the Conda environment, then install dependencies |
| conda create -n nst_v4 python=3.11 |
| conda activate nst_v4 |
| pip install -r requirements_normal.txt |
| pip install --index-url https://download.pytorch.org/whl/cu121 -r requirements_torch.txt |
| |
| # 2. Place your data files in the project root (or update config paths) |
| # set1_train.json set1_val.json set1_test.json |
| |
| # 3. Train (uses defaults from DEFAULT_CONFIG in train.py; any value can be overridden) |
| # Takes about 2 hours on an RTX 3070 |
| python train.py |
| # Or with custom parameters: |
| python train.py --epochs 10 --batch_size 32 --history_window 4 --loss focal |
| |
| # 4. Evaluate on test set |
| # python evaluate.py --checkpoint outputs/best_model |
| |
| # 5. Interactive prediction |
| # python predict.py --checkpoint outputs/best_model --history_window 3 |
| ``` |
|
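The "defaults plus command-line overrides" pattern used in step 3 can be sketched with `argparse`. This is only an illustration of the idea: the default values below are placeholders, and the real `DEFAULT_CONFIG` in `train.py` may use different keys, values, and parsing logic.

```python
import argparse

# Placeholder defaults for illustration only; not the values used in train.py.
DEFAULT_CONFIG = {
    "epochs": 5,
    "batch_size": 16,
    "history_window": 2,
    "loss": "weighted_ce",
}


def parse_config(argv=None):
    """Build a config dict: defaults first, then command-line overrides."""
    parser = argparse.ArgumentParser()
    for key, default in DEFAULT_CONFIG.items():
        # Reuse the default's type so "--epochs 10" is parsed as an int.
        parser.add_argument(f"--{key}", type=type(default), default=default)
    return vars(parser.parse_args(argv))


# Equivalent to: python train.py --epochs 10 --loss focal
config = parse_config(["--epochs", "10", "--loss", "focal"])
```

Any flag that is not passed keeps its default, so `config["batch_size"]` and `config["history_window"]` are unchanged in this example.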
|
| ## Key Hyperparameter: `history_window` |
| |
| `history_window` controls how many **preceding turns** (both user and system) |
| are prepended as context before the current utterance. |
|
|
| ``` |
| history_window = 0 → [CLS] <current> [SEP] |
| history_window = 2 → [CLS] <t-2> [SEP] <t-1> [SEP] <current> [SEP] |
| history_window = 4 → [CLS] <t-4> [SEP] <t-3> [SEP] <t-2> [SEP] <t-1> [SEP] <current> [SEP] |
| ``` |
|
|
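| Turns are ordered from oldest to newest. System turns are prefixed with `"SYS:"`, |
| user turns with `"USR:"` to give the model speaker role signals. The `[CLS]` and |
| `[SEP]` markers above are schematic: RoBERTa's tokenizer uses `<s>` and `</s>` |
| in those roles. |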
|
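The window construction described above might look roughly like the sketch below. It is not the code in `data/preprocessing.py`: the `(speaker, text)` tuple layout, the function names, the space after each prefix, and the choice to prefix the current utterance as well are all assumptions made for illustration.

```python
def build_input(turns, index, history_window):
    """Return the current turn plus up to `history_window` preceding turns.

    `turns` is assumed to be a list of (speaker, text) pairs where speaker
    is "user" or "system" (a hypothetical layout). The result is a list of
    prefixed strings ordered oldest to newest; near the start of a dialogue
    the window is simply shorter.
    """
    start = max(0, index - history_window)
    window = turns[start : index + 1]
    return [
        ("SYS: " if speaker == "system" else "USR: ") + text
        for speaker, text in window
    ]


def render(segments, cls="[CLS]", sep="[SEP]"):
    """Lay the segments out in the schematic form shown above."""
    return f"{cls} " + " ".join(f"{seg} {sep}" for seg in segments)


# Made-up toy dialogue, for illustration only.
turns = [
    ("user", "I need a taxi."),
    ("system", "Where to?"),
    ("user", "The station, quickly!"),
]
no_history = render(build_input(turns, index=2, history_window=0))
with_history = render(build_input(turns, index=2, history_window=2))
```

Here `no_history` contains only the current utterance, while `with_history` contains all three turns in dialogue order. In a real pipeline the separators would come from the tokenizer rather than from string formatting.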
|
| Recommended sweep: `[0, 2, 4, 6]`. |
|
|
| ## Class Imbalance |
|
|
| EmoWOZ is heavily skewed toward class 0 (neutral). Two mitigation strategies |
| are included and can be toggled in `configs/default.yaml`: |
|
|
| - **Focal Loss** (`loss: focal`): down-weights easy neutral examples. |
| - **Weighted Cross-Entropy** (`loss: weighted_ce`): per-class inverse-frequency |
| weights computed from the training set. |
|
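Both strategies can be illustrated with plain Python. The sketch below is not the code in `models/focal_loss.py`: the weight normalisation `N / (num_classes * count)` is one common choice, `gamma=2.0` is a commonly used default rather than a value taken from this repo, and the function names are hypothetical.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Weight each class by N / (num_classes * count), so rare classes weigh more.

    Classes that never appear in `labels` get weight 0.0.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    `probs` is a softmax output and `target` the true class index.
    With gamma = 0 this reduces to ordinary cross-entropy.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Made-up toy labels: class 0 dominates, class 1 is absent, class 2 is rare.
labels = [0] * 8 + [2] * 2
weights = inverse_frequency_weights(labels, num_classes=3)

easy = focal_loss([0.9, 0.05, 0.05], target=0)  # confident and correct
hard = focal_loss([0.3, 0.6, 0.1], target=0)    # true class under-predicted
```

The confidently classified example contributes far less loss than the misclassified one, which is how focal loss reduces the influence of the many easy neutral utterances.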
|
| ## Outputs |
|
|
| After training, `outputs/` contains: |
| - `best_model/`: best checkpoint by macro-F1 on validation |
| - `last_model/`: final epoch checkpoint |
| - `training_log.jsonl`: epoch-level metrics |
| - `test_results.json`: per-class precision / recall / F1 + confusion matrix |
|
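Since checkpoints are selected by macro-F1, it may help to see how that metric follows from a confusion matrix. This is a minimal sketch, not the project's evaluation code: it assumes rows are true classes and columns are predicted classes, and the function names are hypothetical.

```python
def per_class_f1(confusion):
    """Compute F1 per class from a square confusion matrix.

    confusion[i][j] counts examples with true class i predicted as class j
    (an assumed orientation). A class with no true or predicted examples
    gets F1 = 0.0.
    """
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(confusion):
    """Unweighted mean of per-class F1, so rare classes count equally."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)


# Made-up two-class toy matrix, for illustration only.
confusion = [
    [8, 2],
    [1, 3],
]
scores = per_class_f1(confusion)
score = macro_f1(confusion)
```

Because every class contributes equally to the average, a model that predicts only neutral scores poorly on macro-F1 even when its accuracy is high, which suits a dataset this skewed.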
|