	# RoBERTa Dialogue Sentiment Analysis — EmoWOZ Fine-Tuning

	Fine-tunes `roberta-base` (or `roberta-large`) on the EmoWOZ dataset for
	per-utterance emotion classification in task-oriented dialogue, using a
	configurable sliding window of preceding dialogue history as context.

	## Emotion Labels

	| ID | Label        | Notes                       |
	|----|--------------|-----------------------------|
	| -1 | system       | Filtered out; not predicted |
	| 0  | neutral      |                             |
	| 1  | fearful      |                             |
	| 2  | dissatisfied |                             |
	| 3  | apologetic   |                             |
	| 4  | abusive      |                             |
	| 5  | excited      |                             |
	| 6  | satisfied    |                             |

	Only user utterances (emotion ≠ -1) are classified. System turns are
	retained as context but not predicted.
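
The filtering rule above can be sketched as follows. This is a minimal illustration, not the repository's `preprocessing.py`: the turn format (`{"text": ..., "emotion": ...}`) and the names `flatten_dialogue`, `ID2LABEL`, and `SYSTEM_LABEL` are assumptions made for the example, and the real EmoWOZ JSON schema may differ.

```python
# Hypothetical turn format: {"text": str, "emotion": int}.
# The real EmoWOZ JSON schema may differ.
ID2LABEL = {
    0: "neutral", 1: "fearful", 2: "dissatisfied", 3: "apologetic",
    4: "abusive", 5: "excited", 6: "satisfied",
}
SYSTEM_LABEL = -1


def flatten_dialogue(turns):
    """Return one sample per user turn; system turns only appear as history."""
    samples = []
    for idx, turn in enumerate(turns):
        if turn["emotion"] == SYSTEM_LABEL:
            continue  # system turn: kept in later histories, never a target
        samples.append({
            "history": turns[:idx],  # all preceding turns, user and system
            "text": turn["text"],
            "label": turn["emotion"],
        })
    return samples


dialogue = [
    {"text": "I need a taxi to the station.", "emotion": 0},
    {"text": "What time would you like to leave?", "emotion": -1},
    {"text": "You already asked me that!", "emotion": 2},
]
samples = flatten_dialogue(dialogue)
```

Here the three-turn dialogue yields two samples. The system turn is not a sample of its own, but it does appear in the history of the second user turn.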

	## Project Layout

	```
	roberta_emowoz/
	├── configs/
	│   └── default.yaml        # All hyperparameters (edit this)
	├── data/
	│   ├── dataset.py          # DialogueDataset + collator
	│   └── preprocessing.py    # JSON → flat utterance samples
	├── models/
	│   ├── model.py            # RoBERTa wrapper with classification head
	│   └── focal_loss.py       # Class-imbalance-aware loss
	├── scripts/
	│   ├── train.py            # Main training entry point
	│   ├── evaluate.py         # Full eval with per-class metrics
	│   └── predict.py          # Interactive / batch inference
	├── outputs/                # Checkpoints, logs, predictions (gitignored)
	└── requirements.txt
	```

	## Quick Start

	Conda is required. Install it first if you do not already have it.

	```bash
	# 1. Create and activate the Conda environment, then install dependencies
	conda create -n nst_v4 python=3.11
	conda activate nst_v4
	pip install -r requirements_normal.txt
	pip install --index-url https://download.pytorch.org/whl/cu121 -r requirements_torch.txt

	# 2. Place your data files in the project root (or update config paths)
	# set1_train.json set1_val.json set1_test.json

	# 3. Train (defaults come from DEFAULT_CONFIG in train.py; any value can be
	#    overridden on the command line)
	# On an RTX 3070 a full run takes about 2 hours
	python train.py
	# Or with custom parameters:
	python train.py --epochs 10 --batch_size 32 --history_window 4 --loss focal

	# 4. Evaluate on test set
	# python evaluate.py --checkpoint outputs/best_model

	# 5. Interactive prediction
	# python predict.py --checkpoint outputs/best_model --history_window 3
	```
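
One common way to implement "defaults in a dictionary, any value overridable by a flag" is sketched below. It is an assumption about how such a mechanism could work, not a copy of `train.py`: the keys and default values in this `DEFAULT_CONFIG` are placeholders, and `build_config` is a hypothetical helper.

```python
import argparse

# Placeholder defaults for illustration only; the real DEFAULT_CONFIG in
# train.py may hold different keys and values.
DEFAULT_CONFIG = {
    "epochs": 5,
    "batch_size": 16,
    "history_window": 2,
    "loss": "ce",
}


def build_config(argv=None):
    """Start from DEFAULT_CONFIG and apply only the flags that were passed."""
    parser = argparse.ArgumentParser()
    for key, default in DEFAULT_CONFIG.items():
        # default=None distinguishes "flag omitted" from "flag given".
        parser.add_argument(f"--{key}", type=type(default), default=None)
    args = parser.parse_args(argv)
    overrides = {k: v for k, v in vars(args).items() if v is not None}
    return {**DEFAULT_CONFIG, **overrides}


config = build_config(["--epochs", "10", "--loss", "focal"])
```

Flags that are not passed keep their default, so `batch_size` and `history_window` are unchanged in `config` while `epochs` and `loss` take the command-line values.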

	## Key Hyperparameter — `history_window`

	`history_window` controls how many preceding turns (both user and system)
	are prepended as context before the current utterance.

	```
	history_window = 0 → [CLS] <current> [SEP]
	history_window = 2 → [CLS] <t-2> [SEP] <t-1> [SEP] <current> [SEP]
	history_window = 4 → [CLS] <t-4> [SEP] <t-3> [SEP] <t-2> [SEP] <t-1> [SEP] <current> [SEP]
	```

	Turns are ordered oldest → newest. System turns are prefixed with `"SYS:"`,
	user turns with `"USR:"` to give the model speaker role signals.

	Recommended sweep: `[0, 2, 4, 6]`.
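
The windowing can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the code in `dataset.py`: the turn format (`speaker` / `text` keys), the helper names `build_context` and `render`, the space after the role prefix, and the choice to prefix the current utterance as well are all assumptions. The `[CLS]` / `[SEP]` markers follow the schematic above; the RoBERTa tokenizer's actual special tokens are `<s>` and `</s>`.

```python
def build_context(turns, index, history_window):
    """Return role-prefixed segments, oldest first, ending with turn `index`."""
    start = max(0, index - history_window)  # clamp at the dialogue start
    window = turns[start:index + 1]
    return [
        f"{'SYS' if t['speaker'] == 'system' else 'USR'}: {t['text']}"
        for t in window
    ]


def render(segments):
    """Join segments using the schematic markers from this README."""
    return "[CLS] " + " [SEP] ".join(segments) + " [SEP]"


# Hypothetical turn format: {"speaker": "user" | "system", "text": str}
turns = [
    {"speaker": "user", "text": "I need a taxi."},
    {"speaker": "system", "text": "Where to?"},
    {"speaker": "user", "text": "The station, as I said."},
]

no_history = render(build_context(turns, index=2, history_window=0))
two_turns = render(build_context(turns, index=2, history_window=2))
```

When fewer than `history_window` turns precede the current utterance, the window simply starts at the first turn, so early turns in a dialogue get a shorter context.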

	## Class Imbalance

	EmoWOZ is heavily skewed toward class 0 (neutral). Two mitigation strategies
	are included and can be toggled in `configs/default.yaml`:

	- Focal Loss (`loss: focal`) — down-weights easy neutral examples.
	- Weighted Cross-Entropy (`loss: weighted_ce`) — per-class inverse
	frequency weights computed from the training set.
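
The arithmetic behind both strategies is sketched below in plain Python so each term is visible; a training implementation would operate on batched tensors instead. The weight normalization `N / (num_classes * count)` and the value `gamma=2.0` are assumptions chosen for the example and may not match the repository's settings. The focal loss shown is the standard form without a class-balancing term, `-(1 - p_t)^gamma * log(p_t)`.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Weight each class by N / (num_classes * count); unseen classes get 0."""
    counts = Counter(labels)
    total = len(labels)
    return [
        total / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def focal_loss(probs, target, gamma=2.0):
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t)."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


# Toy label distribution: class 0 dominates, class 1 never occurs.
labels = [0, 0, 0, 0, 0, 0, 2, 2]
weights = inverse_frequency_weights(labels, num_classes=3)

# A confidently correct example versus a poorly classified one.
easy = focal_loss([0.9, 0.05, 0.05], target=0)
hard = focal_loss([0.3, 0.1, 0.6], target=0)
```

With `gamma=2.0`, the easy example's cross-entropy is scaled by `(1 - 0.9)^2 = 0.01`, while the hard example's is scaled by `(1 - 0.3)^2 = 0.49`, so well-classified majority-class examples contribute far less to the total loss. Setting `gamma=0` recovers ordinary cross-entropy.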

	## Outputs

	After training, `outputs/` contains:
	- `best_model/` — best checkpoint by macro-F1 on validation
	- `last_model/` — final epoch checkpoint
	- `training_log.jsonl` — epoch-level metrics
	- `test_results.json` — per-class precision / recall / F1 + confusion matrix
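
For reference, macro-F1 and the per-class metrics can be derived from a confusion matrix as sketched below. This is a generic illustration, not the code in `evaluate.py`; the function names and the `confusion[true][predicted]` layout are assumptions, and the layout stored in `test_results.json` may differ.

```python
def per_class_metrics(confusion):
    """confusion[i][j] = number of examples with true class i predicted as j."""
    n = len(confusion)
    metrics = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # column minus diagonal
        fn = sum(confusion[c]) - tp                       # row minus diagonal
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics.append({"precision": precision, "recall": recall, "f1": f1})
    return metrics


def macro_f1(confusion):
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    scores = per_class_metrics(confusion)
    return sum(m["f1"] for m in scores) / len(scores)


# Toy two-class example: the majority class is predicted well, the minority poorly.
confusion = [
    [8, 2],
    [1, 1],
]
scores = per_class_metrics(confusion)
score = macro_f1(confusion)
```

In this toy matrix accuracy is 9/12 = 0.75, but the weak minority class pulls macro-F1 down to about 0.62. That sensitivity to rare classes is why macro-F1 is a reasonable selection criterion on a dataset dominated by neutral utterances.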