---
license: mit
language:
- vi
tags:
- vietnamese
- speech
- emotion-recognition
- escalation-detection
- customer-service
- multimodal
- model-assets
- model-ready
---

# OA-RUMER Model Assets

OA-RUMER is a Vietnamese customer-service modeling asset package for
turn-level emotion recognition and escalation/de-escalation analysis. This
repository contains metadata plus model-ready CSV/JSON splits; it does not
include the full raw audio corpus.

## Contents

- `model_assets/metadata/calls_metadata.csv`: call-level audio metadata.
- `model_assets/metadata/model_assets_summary.md`: high-level asset summary.
- `model_assets/model_ready/oa_rumer_full_phowhisper_3class/`: primary 3-class
  turn-level splits and summaries.
- `model_assets/model_ready/oa_rumer_full_phowhisper/`: original full label
  variant.
- `model_assets/model_ready/text_only_transition_3class/`: customer-transition
  split files for escalation modeling.

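The individual file names inside each `model_ready` variant are not listed here, so a quick way to see what a variant contains is to enumerate its CSV/JSON files. The following is a minimal sketch; the helper name `list_split_files` is hypothetical and not part of this repository, and it assumes the script is run from the repository root.

```python
from pathlib import Path


def list_split_files(variant_dir):
    """Return sorted CSV/JSON files found anywhere under variant_dir."""
    root = Path(variant_dir)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".csv", ".json"}
    )


if __name__ == "__main__":
    # Directory name taken from the Contents list above.
    variant = Path("model_assets/model_ready/oa_rumer_full_phowhisper_3class")
    if variant.is_dir():
        for path in list_split_files(variant):
            print(path.relative_to(variant))
    else:
        print(f"{variant} not found; run this from the repository root")
```

The same helper can be pointed at `oa_rumer_full_phowhisper/` or `text_only_transition_3class/` to inspect the other variants.
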
## Repository Layout

| Path | Contents |
|---|---|
| `model_assets/metadata/` | Call-level metadata and model assets summary |
| `model_assets/model_ready/` | Final CSV/JSON splits ready for modeling |

## Labels

- Emotion labels: `neutral`, `positive`, `negative`
- Original emotion labels include `negative_low` and `negative_high`
- Escalation labels: `stable`, `de-escalation`, `escalation`
- Role labels: `customer`, `agent`
- Overlap labels: `no_overlap`, `backchannel_overlap`,
  `interruption_overlap`, `conflict_overlap`, `uncertain_overlap`

The cleaned split CSVs record annotation confidence in the `label_confidence`
column.

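When loading a split, it can help to check that each label column only contains the values listed above. The sketch below hard-codes the label sets from this section; the helper name `unexpected_values` is hypothetical, and the column that holds each label type is not specified here, so the caller passes the values in directly.

```python
# Label vocabularies copied from the Labels section of this README.
EMOTION_LABELS = {"neutral", "positive", "negative"}
ESCALATION_LABELS = {"stable", "de-escalation", "escalation"}
ROLE_LABELS = {"customer", "agent"}
OVERLAP_LABELS = {
    "no_overlap",
    "backchannel_overlap",
    "interruption_overlap",
    "conflict_overlap",
    "uncertain_overlap",
}


def unexpected_values(values, allowed):
    """Return the sorted list of distinct values that are not in `allowed`."""
    return sorted(set(values) - set(allowed))


if __name__ == "__main__":
    # A 3-class split should not contain the original fine-grained labels.
    print(unexpected_values(["neutral", "negative_high"], EMOTION_LABELS))
```

An empty result means every value in the column belongs to the expected vocabulary.
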
## Audio Paths

Raw WAV files are not included in this repository. The CSV files still
reference audio as `data_audio_set/*.wav`, so place your local copy of the
audio folder at `data_audio_set/` before running audio-based experiments.

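Because the audio is distributed separately, a quick existence check before an audio-based run can catch a misplaced folder early. This is a minimal sketch: `find_missing_audio` is a hypothetical helper, and it assumes the paths read from the CSV files are relative to a base directory that you choose (the name of the CSV column holding them is not specified here).

```python
from pathlib import Path


def find_missing_audio(relative_paths, base_dir="."):
    """Return the paths that do not resolve to an existing file under base_dir."""
    base = Path(base_dir)
    return [p for p in relative_paths if not (base / p).is_file()]


if __name__ == "__main__":
    # Example path is illustrative only; real values come from the split CSVs.
    missing = find_missing_audio(["data_audio_set/example.wav"])
    print(f"{len(missing)} referenced audio file(s) not found")
```
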
## Notes

- `negative_high` is merged into `negative` for the main 3-class runs.
- Escalation can be modeled at the customer-transition level using
  `model_assets/model_ready/text_only_transition_3class/`.

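The label merge described in the first note can be sketched as a small lookup. Only the `negative_high` to `negative` mapping is stated in this README; how `negative_low` is handled is not, so the sketch below deliberately leaves every other label unchanged. The names `MERGE_MAP` and `to_three_class` are hypothetical.

```python
# Only this merge is documented; other labels pass through unchanged.
MERGE_MAP = {"negative_high": "negative"}


def to_three_class(label):
    """Map an original emotion label to its 3-class counterpart where documented."""
    return MERGE_MAP.get(label, label)


if __name__ == "__main__":
    original = ["neutral", "negative_high", "positive"]
    print([to_three_class(label) for label in original])
```
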
## PhoBERT Context Baselines

The local experiment runner at `experiments/run_phobert_context_baselines.sh`
runs text-only ablations built on a frozen PhoBERT turn encoder, optionally
followed by a Transformer context encoder.

| Model | Text | Audio | Role | Context | Overlap | MT |
|---|---|---|---|---|---|---|
| Text-only PhoBERT | Yes | No | No | No | No | No |
| Text Context Transformer | Yes | No | No | Yes | No | No |
| Text+Role Context | Yes | No | Yes | Yes | No | No |
| Text+Role Transition Context | Yes | No | Yes | Yes | No | Yes |
| Text+Role Agent Context Transition | Yes | No | Yes | Yes | No | No |
| OA-RUMER | Yes | Yes | Yes | Yes | Yes | Yes |

Run a smoke trial:

```bash
DEVICE=auto experiments/run_phobert_context_baselines.sh trial
```

Run the full 3-class suite:

```bash
DEVICE=auto EPOCHS=8 experiments/run_phobert_context_baselines.sh full
```