---
license: mit
language:
- vi
tags:
- vietnamese
- speech
- emotion-recognition
- escalation-detection
- customer-service
- multimodal
- model-assets
- model-ready
---

# OA-RUMER Model Assets

OA-RUMER is a package of modeling assets for Vietnamese customer-service calls,
covering turn-level emotion recognition and escalation/de-escalation analysis.
This repository contains metadata and model-ready CSV/JSON splits; it does not
include the full raw audio corpus.

## Contents

- `model_assets/metadata/calls_metadata.csv`: call-level audio metadata.
- `model_assets/metadata/model_assets_summary.md`: high-level asset summary.
- `model_assets/model_ready/oa_rumer_full_phowhisper_3class/`: primary 3-class
  turn-level splits and summaries.
- `model_assets/model_ready/oa_rumer_full_phowhisper/`: variant that keeps the
  original, full label set.
- `model_assets/model_ready/text_only_transition_3class/`: customer-transition
  split files for escalation modeling.

## Repository Layout

| Path | Contents |
|---|---|
| `model_assets/metadata/` | Call-level metadata and model assets summary |
| `model_assets/model_ready/` | Final CSV/JSON splits ready for modeling |

## Labels

- Emotion labels (3-class): `neutral`, `positive`, `negative`
- The original emotion label set also includes `negative_low` and
  `negative_high`
- Escalation labels: `stable`, `de-escalation`, `escalation`
- Role labels: `customer`, `agent`
- Overlap labels: `no_overlap`, `backchannel_overlap`,
  `interruption_overlap`, `conflict_overlap`, `uncertain_overlap`

The cleaned split CSVs store annotation confidence in the `label_confidence`
column.
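
As a quick sanity check, the label vocabularies listed above can be compared against the values found in a split file. The sketch below is a minimal illustration, not part of the released tooling: the column names `role` and `emotion` and the in-memory sample are assumptions, so check the real CSV header before reusing it.

```python
import csv
import io

# Label vocabularies as documented in this README.
EMOTION_LABELS = {"neutral", "positive", "negative"}
ESCALATION_LABELS = {"stable", "de-escalation", "escalation"}
ROLE_LABELS = {"customer", "agent"}


def find_unknown_labels(rows, column, allowed):
    """Return the sorted values in `column` that are not in `allowed`."""
    return sorted({row[column] for row in rows} - set(allowed))


# Tiny in-memory stand-in for a split CSV. The column names "role" and
# "emotion" are hypothetical; the real files may use different headers.
sample_csv = io.StringIO(
    "role,emotion\n"
    "customer,negative\n"
    "agent,neutral\n"
    "customer,negative_high\n"
)
rows = list(csv.DictReader(sample_csv))

unknown_emotions = find_unknown_labels(rows, "emotion", EMOTION_LABELS)
unknown_roles = find_unknown_labels(rows, "role", ROLE_LABELS)
print(unknown_emotions, unknown_roles)
```

A non-empty result for the 3-class splits would suggest that a file from the original full-label variant was loaded by mistake.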

## Audio Paths

Raw WAV files are not included in this repository. The CSV files still point to
`data_audio_set/*.wav`; place the local audio folder at `data_audio_set/` when
running audio-based experiments.
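
Before launching an audio-based run, it can help to confirm that the paths referenced in the CSVs actually resolve on disk. The following is a minimal sketch under stated assumptions: the helper names and the example file names are hypothetical, and the paths are assumed to be relative to the directory that contains `data_audio_set/`.

```python
from pathlib import Path


def resolve_audio_path(csv_path_value, repo_root="."):
    """Join a relative audio path from a CSV row onto the local root folder."""
    return Path(repo_root) / csv_path_value


def missing_audio(csv_path_values, repo_root="."):
    """Return the CSV audio paths whose files are not present locally."""
    return [
        value
        for value in csv_path_values
        if not resolve_audio_path(value, repo_root).is_file()
    ]


# Hypothetical file names, for illustration only.
example_paths = [
    "data_audio_set/call_0001.wav",
    "data_audio_set/call_0002.wav",
]
print(missing_audio(example_paths))
```

An empty list means every referenced WAV file was found under the chosen root.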

## Notes

- `negative_high` is merged into `negative` for the main 3-class runs.
- Escalation can be modeled at the customer-transition level using
  `model_assets/model_ready/text_only_transition_3class/`.
- Audio paths in the CSV files point to `data_audio_set/*.wav`.
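
The label merge described in the first note can be sketched as a simple mapping. This is an illustration only: the function and constant names are hypothetical, and since the notes do not say how `negative_low` is handled, the sketch leaves it unchanged rather than guessing.

```python
# Merge stated in the notes above: `negative_high` becomes `negative`.
# The handling of `negative_low` is not documented here, so it is left
# untouched; extend the mapping once the intended handling is confirmed.
LABEL_MERGES = {"negative_high": "negative"}


def to_three_class(label, merges=LABEL_MERGES):
    """Map an original emotion label onto the merged label set."""
    return merges.get(label, label)


original = ["neutral", "negative_high", "positive", "negative"]
collapsed = [to_three_class(label) for label in original]
print(collapsed)
```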

## PhoBERT Context Baselines

The local experiment runner at `experiments/run_phobert_context_baselines.sh`
runs text-only ablations built on a frozen PhoBERT turn encoder, optionally
followed by a Transformer context encoder.

| Model | Text | Audio | Role | Context | Overlap | MT |
|---|---|---|---|---|---|---|
| Text-only PhoBERT | Yes | No | No | No | No | No |
| Text Context Transformer | Yes | No | No | Yes | No | No |
| Text+Role Context | Yes | No | Yes | Yes | No | No |
| Text+Role Transition Context | Yes | No | Yes | Yes | No | Yes |
| Text+Role Agent Context Transition | Yes | No | Yes | Yes | No | No |
| OA-RUMER | Yes | Yes | Yes | Yes | Yes | Yes |
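
For scripting comparisons across variants, the table above can be mirrored as a small set of feature flags. This is a sketch for illustration, not the runner's actual configuration format: the `ModelVariant` class, the `VARIANTS` dictionary, and the `variants_using` helper are all hypothetical names, and the flag values are copied directly from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVariant:
    """Feature flags for one row of the ablation table."""
    text: bool
    audio: bool
    role: bool
    context: bool
    overlap: bool
    mt: bool


# Values transcribed from the table: Text, Audio, Role, Context, Overlap, MT.
VARIANTS = {
    "Text-only PhoBERT": ModelVariant(True, False, False, False, False, False),
    "Text Context Transformer": ModelVariant(True, False, False, True, False, False),
    "Text+Role Context": ModelVariant(True, False, True, True, False, False),
    "Text+Role Transition Context": ModelVariant(True, False, True, True, False, True),
    "Text+Role Agent Context Transition": ModelVariant(True, False, True, True, False, False),
    "OA-RUMER": ModelVariant(True, True, True, True, True, True),
}


def variants_using(flag):
    """List the variant names for which the given flag is enabled."""
    return [name for name, variant in VARIANTS.items() if getattr(variant, flag)]


print(variants_using("audio"))
print(variants_using("mt"))
```

Note that `Text+Role Context` and `Text+Role Agent Context Transition` carry identical flags in the table, so whatever distinguishes them is not captured by these six columns.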

Run a smoke trial:

```bash
DEVICE=auto experiments/run_phobert_context_baselines.sh trial
```

Run the full 3-class suite:

```bash
DEVICE=auto EPOCHS=8 experiments/run_phobert_context_baselines.sh full
```