# RoBERTa Dialogue Sentiment Analysis — EmoWOZ Fine-Tuning

Fine-tunes `roberta-base` (or `roberta-large`) on the EmoWOZ dataset for
**per-utterance emotion classification** in task-oriented dialogue, using a
configurable sliding window of preceding dialogue history as context.

## Emotion Labels

| ID | Label        | Notes                        |
|----|--------------|------------------------------|
| -1 | system       | Filtered out — not predicted |
|  0 | neutral      |                              |
|  1 | fearful      |                              |
|  2 | dissatisfied |                              |
|  3 | apologetic   |                              |
|  4 | abusive      |                              |
|  5 | excited      |                              |
|  6 | satisfied    |                              |

Only **user utterances** (emotion ≠ -1) are classified. System turns are
retained as context but not predicted.
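As a rough illustration of the filtering rule above, the sketch below maps label IDs to names and selects only user turns as prediction targets. The `text` and `emotion` field names and the `prediction_targets` helper are assumptions made for this example, not necessarily what `preprocessing.py` uses.

```python
# Label IDs copied from the table above; -1 marks system turns.
ID2LABEL = {
    0: "neutral",
    1: "fearful",
    2: "dissatisfied",
    3: "apologetic",
    4: "abusive",
    5: "excited",
    6: "satisfied",
}
SYSTEM_ID = -1


def prediction_targets(turns):
    """Return (turn index, label name) for every user turn.

    Assumes each turn is a dict with a hypothetical "emotion" key.
    System turns (emotion == -1) are skipped as targets, but they stay
    in `turns` so they can still be used as context.
    """
    return [
        (i, ID2LABEL[turn["emotion"]])
        for i, turn in enumerate(turns)
        if turn["emotion"] != SYSTEM_ID
    ]


dialogue = [
    {"text": "I need a cheap hotel.", "emotion": 0},
    {"text": "Sorry, nothing is available.", "emotion": -1},
    {"text": "That is really annoying.", "emotion": 2},
]
targets = prediction_targets(dialogue)
print(targets)
```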

## Project Layout

```
roberta_emowoz/
├── configs/
│   └── default.yaml          # All hyperparameters (edit this)
├── data/
│   ├── dataset.py            # DialogueDataset + collator
│   └── preprocessing.py      # JSON → flat utterance samples
├── models/
│   ├── model.py              # RoBERTa wrapper with classification head
│   └── focal_loss.py         # Class-imbalance-aware loss
├── scripts/
│   ├── train.py              # Main training entry point
│   ├── evaluate.py           # Full eval with per-class metrics
│   └── predict.py            # Interactive / batch inference
├── outputs/                  # Checkpoints, logs, predictions (gitignored)
└── requirements.txt
```

## Quick Start

Conda is required. Install it first (for example via Miniconda) if it is not
already available on your system.

```bash
# 1. Create the environment and install dependencies
conda create -n nst_v4 python=3.11
conda activate nst_v4
pip install -r requirements_normal.txt
pip install --index-url https://download.pytorch.org/whl/cu121 -r requirements_torch.txt

# 2. Place your data files in the project root (or update config paths)
#    set1_train.json  set1_val.json  set1_test.json

# 3. Train (defaults come from DEFAULT_CONFIG in train.py; any value can be
#    overridden on the command line)
#    On an RTX 3070 a full run takes about 2 hours.
python train.py
# Or with custom parameters:
python train.py --epochs 10 --batch_size 32 --history_window 4 --loss focal

# 4. Evaluate on test set
# python evaluate.py --checkpoint outputs/best_model

# 5. Interactive prediction
# python predict.py --checkpoint outputs/best_model --history_window 3
```

## Key Hyperparameter — `history_window`

`history_window` controls how many **preceding turns** (both user and system)
are prepended as context before the current utterance.

```
history_window = 0  →  [CLS] <current utterance> [SEP]
history_window = 2  →  [CLS] <t-2> [SEP] <t-1> [SEP] <current> [SEP]
history_window = 4  →  [CLS] <t-4> [SEP] <t-3> [SEP] <t-2> [SEP] <t-1> [SEP] <current> [SEP]
```

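Turns are ordered oldest → newest. System turns are prefixed with `"SYS:"` and
user turns with `"USR:"` to give the model speaker-role signals. The `[CLS]` and
`[SEP]` markers above are schematic: RoBERTa's tokenizer uses `<s>` and `</s>`
in those roles.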

Recommended sweep: `[0, 2, 4, 6]`.
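The windowing described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: the `build_context` helper, the `speaker` and `text` field names, the space after each prefix, and the literal `" [SEP] "` separator are all assumptions. In real code the separator would come from the tokenizer, which also adds the leading and trailing special tokens.

```python
def build_context(turns, index, history_window, sep=" [SEP] "):
    """Join up to `history_window` preceding turns with the current one.

    Turns are emitted oldest first, each with a speaker prefix. Field
    names ("speaker", "text") are hypothetical.
    """
    if history_window < 0:
        raise ValueError("history_window must be >= 0")
    # Clip the window at the start of the dialogue.
    start = max(0, index - history_window)
    window = turns[start:index + 1]
    parts = []
    for turn in window:
        prefix = "SYS: " if turn["speaker"] == "system" else "USR: "
        parts.append(prefix + turn["text"])
    return sep.join(parts)


turns = [
    {"speaker": "user", "text": "I need a cheap hotel."},
    {"speaker": "system", "text": "Nothing cheap is available."},
    {"speaker": "user", "text": "That is annoying."},
]
print(build_context(turns, index=2, history_window=0))
print(build_context(turns, index=2, history_window=2))
```

Early turns simply get a shorter context: with `history_window = 4` and only two preceding turns, the result is the same as with `history_window = 2`.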

## Class Imbalance

EmoWOZ is heavily skewed toward class 0 (neutral). Two mitigation strategies
are included and can be toggled in `configs/default.yaml`:

- **Focal Loss** (`loss: focal`) — down-weights easy neutral examples.
- **Weighted Cross-Entropy** (`loss: weighted_ce`) — per-class inverse
  frequency weights computed from the training set.
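To make the two strategies concrete, here is a small pure-Python sketch of the underlying arithmetic. It is not the code in `focal_loss.py`: the function names, the `total / (num_classes * count)` weighting formula (one common form of inverse-frequency weighting), and the `gamma = 2.0` default are assumptions chosen for illustration.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Per-class weights proportional to 1 / class frequency.

    Uses weight_c = total / (num_classes * count_c); classes that never
    occur get weight 0.0. This formula is an assumption.
    """
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    `probs` is a list of class probabilities and `target` the true class
    index. With gamma = 0 this reduces to plain cross-entropy.
    """
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Toy label distribution skewed toward class 0.
train_labels = [0, 0, 0, 0, 0, 0, 2, 2]
weights = inverse_frequency_weights(train_labels, num_classes=3)
print(weights)

# A confidently correct prediction contributes far less than under plain CE.
easy = focal_loss([0.9, 0.1], target=0)
hard = focal_loss([0.3, 0.7], target=0)
print(easy, hard)
```

The `(1 - p_t)^gamma` factor is what shrinks the loss of examples the model already classifies confidently, which in a neutral-heavy dataset are mostly neutral utterances.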

## Outputs

After training, `outputs/` contains:
- `best_model/` — best checkpoint by macro-F1 on validation
- `last_model/` — final epoch checkpoint
- `training_log.jsonl` — epoch-level metrics
- `test_results.json` — per-class precision / recall / F1 + confusion matrix
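Since model selection is driven by macro-F1, it may help to see how that number relates to a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes; that orientation, the helper names, and the toy matrix are assumptions, and the actual layout of `test_results.json` may differ.

```python
def per_class_f1(confusion):
    """F1 per class from a square confusion matrix.

    confusion[i][j] = number of examples with true class i predicted as j
    (orientation assumed). Classes with no support and no predictions get 0.0.
    """
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # predicted c, truly other
        fn = sum(confusion[c]) - tp                       # truly c, predicted other
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(confusion):
    """Unweighted mean of per-class F1, so rare classes count as much as neutral."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)


# Toy two-class example: a frequent class 0 and a rare class 1.
toy = [
    [8, 2],
    [1, 1],
]
print(per_class_f1(toy))
print(macro_f1(toy))
```

In the toy matrix, accuracy is 9/12 while macro-F1 is noticeably lower, because the rare class is weighted equally with the frequent one.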