---
license: mit
language:
- vi
tags:
- vietnamese
- speech
- emotion-recognition
- escalation-detection
- customer-service
- multimodal
- model-assets
- model-ready
---
# OA-RUMER Model Assets
OA-RUMER is a Vietnamese customer-service modeling asset package for
turn-level emotion recognition and escalation/de-escalation analysis. This
repository contains metadata plus model-ready CSV/JSON splits; it does not
include the full raw audio corpus.
## Contents
- `model_assets/metadata/calls_metadata.csv`: call-level audio metadata.
- `model_assets/metadata/model_assets_summary.md`: high-level asset summary.
- `model_assets/model_ready/oa_rumer_full_phowhisper_3class/`: primary 3-class
turn-level splits and summaries.
- `model_assets/model_ready/oa_rumer_full_phowhisper/`: original full label
variant.
- `model_assets/model_ready/text_only_transition_3class/`: customer-transition
split files for escalation modeling.
## Repository Layout
| Path | Contents |
|---|---|
| `model_assets/metadata/` | Call-level metadata and model assets summary |
| `model_assets/model_ready/` | Final CSV/JSON splits ready for modeling |
## Labels
- Emotion labels: `neutral`, `positive`, `negative`
- Original emotion labels include `negative_low` and `negative_high`
- Escalation labels: `stable`, `de-escalation`, `escalation`
- Role labels: `customer`, `agent`
- Overlap labels: `no_overlap`, `backchannel_overlap`,
`interruption_overlap`, `conflict_overlap`, `uncertain_overlap`
The cleaned split CSVs use `label_confidence` for annotation confidence.
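The relationship between the original label variant and the 3-class set can be sketched as a lookup table. This is a minimal illustration, not the repository's preprocessing code: the README only states that `negative_high` is merged into `negative`, so mapping `negative_low` to `negative` as well is an assumption, and `TO_3CLASS` and `collapse_label` are hypothetical names.

```python
# Assumed mapping from the original emotion labels to the 3-class set.
# Only the negative_high -> negative merge is documented in this README;
# treating negative_low the same way is an assumption.
TO_3CLASS = {
    "neutral": "neutral",
    "positive": "positive",
    "negative": "negative",  # pass-through for already-collapsed labels
    "negative_low": "negative",
    "negative_high": "negative",
}


def collapse_label(label):
    """Map an emotion label to its 3-class equivalent, failing on unknown labels."""
    try:
        return TO_3CLASS[label]
    except KeyError:
        raise ValueError(f"unexpected emotion label: {label!r}") from None
```

Raising on unknown labels rather than silently passing them through makes a mismatch between the CSV contents and the documented label set visible early.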
## Audio Paths
Raw WAV files are not included in this repository. The CSV files still point to
`data_audio_set/*.wav`; place the local audio folder at `data_audio_set/` when
running audio-based experiments.
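Before an audio-based run, it may help to check that every path referenced by a split CSV resolves locally. The sketch below assumes the path is stored in a column named `audio_path` (a hypothetical name; check the actual CSV header) and that paths are relative to the directory containing `data_audio_set/`. `find_missing_audio` is an illustrative helper, not part of this repository.

```python
import csv
from pathlib import Path


def find_missing_audio(csv_path, audio_column="audio_path", root="."):
    """Return the audio paths listed in csv_path that do not exist under root.

    audio_column is an assumed column name; adjust it to the real CSV header.
    """
    root = Path(root)
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            rel_path = row[audio_column]
            # Paths such as data_audio_set/xyz.wav are resolved against root.
            if not (root / rel_path).is_file():
                missing.append(rel_path)
    return missing
```

An empty return value means every referenced WAV file was found. A non-empty list usually indicates that `data_audio_set/` is missing or was placed in a different directory.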
## Notes
- `negative_high` is merged into `negative` for the main 3-class runs.
- Escalation can be modeled at the customer-transition level using
`model_assets/model_ready/text_only_transition_3class/`.
- Audio paths in the CSV files point to `data_audio_set/*.wav`.
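One plausible reading of "customer-transition level" is that each example pairs a customer turn with the previous customer turn in the same call. The sketch below illustrates that reading only. The shipped split files may define transitions differently, and the keys `call_id`, `turn_index`, and `role` are hypothetical field names.

```python
from collections import defaultdict


def customer_transitions(turns):
    """Pair each customer turn with the previous customer turn in the same call.

    turns: iterable of dicts with assumed keys call_id, turn_index, and role.
    Returns a list of (previous_turn, current_turn) tuples.
    """
    by_call = defaultdict(list)
    for turn in turns:
        if turn["role"] == "customer":
            by_call[turn["call_id"]].append(turn)

    pairs = []
    for call_turns in by_call.values():
        # Order customer turns within the call before pairing neighbours.
        call_turns.sort(key=lambda t: t["turn_index"])
        pairs.extend(zip(call_turns, call_turns[1:]))
    return pairs
```

Under this reading, a call with a single customer turn yields no transition, and agent turns only matter as context between the paired customer turns.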
## PhoBERT Context Baselines
The local experiment runner at `experiments/run_phobert_context_baselines.sh`
runs text-only ablations built on a frozen PhoBERT turn encoder, optionally
followed by a Transformer context encoder.
| Model | Text | Audio | Role | Context | Overlap | MT |
|---|---|---|---|---|---|---|
| Text-only PhoBERT | Yes | No | No | No | No | No |
| Text Context Transformer | Yes | No | No | Yes | No | No |
| Text+Role Context | Yes | No | Yes | Yes | No | No |
| Text+Role Transition Context | Yes | No | Yes | Yes | No | Yes |
| Text+Role Agent Context Transition | Yes | No | Yes | Yes | No | No |
| OA-RUMER | Yes | Yes | Yes | Yes | Yes | Yes |
Run a smoke trial:
```bash
DEVICE=auto experiments/run_phobert_context_baselines.sh trial
```
Run the full 3-class suite:
```bash
DEVICE=auto EPOCHS=8 experiments/run_phobert_context_baselines.sh full
```