RoBERTa Dialogue Sentiment Analysis — EmoWOZ Fine-Tuning

Fine-tunes roberta-base (or roberta-large) on the EmoWOZ dataset for per-utterance emotion classification in task-oriented dialogue, using a configurable sliding window of preceding dialogue history as context.

Emotion Labels

ID	Label	Notes
-1	system	Not a prediction target; kept only as context
0	neutral
1	fearful
2	dissatisfied
3	apologetic
4	abusive
5	excited
6	satisfied

Only user utterances (emotion ≠ -1) are classified. System turns are retained as context but not predicted.
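
As a rough illustration of the rule above, the sketch below maps label IDs to names and selects only user turns as prediction targets. The dialogue structure (a list of dicts with "text" and "emotion" keys) and the function name prediction_targets are assumptions made for this example; the actual EmoWOZ JSON layout and the code in preprocessing.py may differ.

```python
# Label IDs taken from the table above; -1 marks system turns.
ID2LABEL = {
    0: "neutral", 1: "fearful", 2: "dissatisfied", 3: "apologetic",
    4: "abusive", 5: "excited", 6: "satisfied",
}
SYSTEM_LABEL = -1


def prediction_targets(dialogue):
    """Return (turn_index, label_name) for every user turn in one dialogue.

    `dialogue` is a hypothetical list of {"text": str, "emotion": int} dicts.
    System turns stay in the list (they can still serve as context) but are
    never returned as targets.
    """
    return [
        (i, ID2LABEL[turn["emotion"]])
        for i, turn in enumerate(dialogue)
        if turn["emotion"] != SYSTEM_LABEL
    ]


dialogue = [
    {"text": "I need a cheap hotel.", "emotion": 0},
    {"text": "I found three options.", "emotion": -1},
    {"text": "Great, thanks!", "emotion": 6},
]
targets = prediction_targets(dialogue)
print(targets)
```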

Project Layout

roberta_emowoz/
├── configs/
│   └── default.yaml          # All hyperparameters (edit this)
├── data/
│   ├── dataset.py            # DialogueDataset + collator
│   └── preprocessing.py      # JSON → flat utterance samples
├── models/
│   ├── model.py              # RoBERTa wrapper with classification head
│   └── focal_loss.py         # Class-imbalance-aware loss
├── scripts/
│   ├── train.py              # Main training entry point
│   ├── evaluate.py           # Full eval with per-class metrics
│   └── predict.py            # Interactive / batch inference
├── outputs/                  # Checkpoints, logs, predictions (gitignored)
└── requirements.txt

Quick Start

Requires conda (Miniconda or Anaconda). Install it first if it is not already on your machine.

# 1. Create and activate the environment, then install dependencies
conda create -n nst_v4 python=3.11
conda activate nst_v4
pip install -r requirements_normal.txt
pip install --index-url https://download.pytorch.org/whl/cu121 -r requirements_torch.txt

# 2. Place your data files in the project root (or update config paths)
#    set1_train.json  set1_val.json  set1_test.json

# 3. Train (defaults come from DEFAULT_CONFIG in train.py; any value can be overridden on the command line)
# Takes about 2 hours on an RTX 3070
python train.py
# Or with custom parameters:
python train.py --epochs 10 --batch_size 32 --history_window 4 --loss focal

# 4. Evaluate on test set
# python evaluate.py --checkpoint outputs/best_model

# 5. Interactive prediction
# python predict.py --checkpoint outputs/best_model --history_window 3
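
One way command-line flags can override a dictionary of defaults is sketched below. The default values shown are hypothetical, and the real DEFAULT_CONFIG and argument parsing in train.py may look different; only the flag names are taken from the command above.

```python
import argparse

# Hypothetical defaults; the real DEFAULT_CONFIG in train.py may differ.
DEFAULT_CONFIG = {
    "epochs": 5,
    "batch_size": 16,
    "history_window": 2,
    "loss": "weighted_ce",
}


def parse_config(argv=None):
    """Build one --flag per config key; unspecified flags keep the default."""
    parser = argparse.ArgumentParser()
    for key, default in DEFAULT_CONFIG.items():
        # Reuse the default's type so "10" is parsed as the int 10.
        parser.add_argument(f"--{key}", type=type(default), default=default)
    return vars(parser.parse_args(argv))


config = parse_config(["--epochs", "10", "--loss", "focal"])
print(config)
```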

Key Hyperparameter — `history_window`

history_window controls how many preceding turns (both user and system) are prepended as context before the current utterance.

history_window = 0  →  [CLS] <current> [SEP]
history_window = 2  →  [CLS] <t-2> [SEP] <t-1> [SEP] <current> [SEP]
history_window = 4  →  [CLS] <t-4> [SEP] <t-3> [SEP] <t-2> [SEP] <t-1> [SEP] <current> [SEP]

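Turns are ordered oldest → newest. System turns are prefixed with "SYS:" and user turns with "USR:" so the model can tell the speakers apart. [CLS] and [SEP] above are BERT-style shorthand; the RoBERTa tokenizer's actual start and separator tokens are <s> and </s>.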
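
The windowing can be sketched in plain Python as follows. This is a minimal illustration, not the project's dataset code: the (speaker, text) tuple format and the build_input name are assumptions, the special tokens are written out literally only to mirror the notation above, and it assumes the current utterance also receives its "USR:" prefix.

```python
def build_input(turns, index, history_window, sep=" [SEP] "):
    """Join up to `history_window` preceding turns with the current one.

    `turns` is a hypothetical list of (speaker, text) pairs, where speaker
    is "user" or "system". A real pipeline would let the tokenizer insert
    its own special tokens instead of writing them into the string.
    """
    prefix = {"user": "USR:", "system": "SYS:"}
    start = max(0, index - history_window)  # clip at the start of the dialogue
    window = turns[start:index + 1]         # oldest -> newest, current turn last
    parts = [f"{prefix[speaker]} {text}" for speaker, text in window]
    return "[CLS] " + sep.join(parts) + " [SEP]"


turns = [
    ("user", "I need a taxi."),
    ("system", "Where to?"),
    ("user", "The station, hurry!"),
]
no_context = build_input(turns, 2, 0)
with_context = build_input(turns, 2, 2)
print(no_context)
print(with_context)
```

Near the start of a dialogue fewer turns are available than the window asks for, so this sketch simply uses whatever history exists.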

Recommended sweep: [0, 2, 4, 6].

Class Imbalance

EmoWOZ is heavily skewed toward class 0 (neutral). Two mitigation strategies are included and can be toggled in configs/default.yaml:

Focal Loss (loss: focal) down-weights easy, confidently classified examples, which in EmoWOZ are mostly neutral.
Weighted Cross-Entropy (loss: weighted_ce) weights each class by its inverse frequency in the training set.
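
Both ideas can be illustrated without any deep-learning framework. The sketch below is pure Python for readability and is not the code in focal_loss.py: the weight normalization (N divided by num_classes times the class count) is one common choice and an assumption here, as are the function names and the gamma default of 2.0.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """weight[c] = N / (num_classes * count[c]); rarer classes weigh more."""
    counts = Counter(labels)
    total = len(labels)
    # Assumption: classes absent from the training labels get weight 0.0.
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Toy label list skewed toward class 0, mimicking the neutral-heavy data.
weights = inverse_frequency_weights([0, 0, 0, 0, 0, 0, 1, 2], num_classes=3)

# A confidently correct prediction contributes far less than under plain CE.
easy_probs = [0.9, 0.05, 0.05]
easy_focal = focal_loss(easy_probs, target=0)
easy_ce = focal_loss(easy_probs, target=0, gamma=0.0)  # gamma=0 is plain CE
print(weights, easy_focal, easy_ce)
```

With gamma set to 0 the modulating factor is 1 and the focal loss reduces to ordinary cross-entropy, which makes the effect of gamma easy to compare.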

Outputs

After training, outputs/ contains:

best_model/ — best checkpoint by macro-F1 on validation
last_model/ — final epoch checkpoint
training_log.jsonl — epoch-level metrics
test_results.json — per-class precision / recall / F1 + confusion matrix
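
Since best_model/ is selected by macro-F1, here is a small sketch of how that metric can be computed from a confusion matrix. The function name and the matrix orientation (rows are true classes, columns are predicted classes) are assumptions; the exact layout stored in test_results.json may differ.

```python
def macro_f1(confusion):
    """Unweighted mean of per-class F1; confusion[i][j] = true i, predicted j."""
    n = len(confusion)
    f1_scores = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # rest of the row
        fp = sum(confusion[r][c] for r in range(n)) - tp   # rest of the column
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples scores 0.0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / n


# Toy two-class example dominated by class 0.
confusion = [
    [90, 10],
    [5, 5],
]
score = macro_f1(confusion)
accuracy = (90 + 5) / 110
print(round(score, 4), round(accuracy, 4))
```

Because every class counts equally, macro-F1 falls well below accuracy when a rare class is poorly predicted, which is why it suits a dataset skewed toward neutral.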