---
license: mit
language:
- vi
tags:
- vietnamese
- speech
- emotion-recognition
- escalation-detection
- customer-service
- multimodal
- model-assets
- model-ready
---

# OA-RUMER Model Assets

OA-RUMER is a Vietnamese customer-service modeling asset package for turn-level emotion recognition and escalation/de-escalation analysis. This repository contains metadata plus model-ready CSV/JSON splits; it does not include the full raw audio corpus.

## Contents

- `model_assets/metadata/calls_metadata.csv`: call-level audio metadata.
- `model_assets/metadata/model_assets_summary.md`: high-level asset summary.
- `model_assets/model_ready/oa_rumer_full_phowhisper_3class/`: primary 3-class turn-level splits and summaries.
- `model_assets/model_ready/oa_rumer_full_phowhisper/`: original full label variant.
- `model_assets/model_ready/text_only_transition_3class/`: customer-transition split files for escalation modeling.

## Repository Layout

| Path | Contents |
|---|---|
| `model_assets/metadata/` | Call-level metadata and model assets summary |
| `model_assets/model_ready/` | Final CSV/JSON splits ready for modeling |

## Labels

- Emotion labels: `neutral`, `positive`, `negative`
- Original emotion labels include `negative_low` and `negative_high`
- Escalation labels: `stable`, `de-escalation`, `escalation`
- Role labels: `customer`, `agent`
- Overlap labels: `no_overlap`, `backchannel_overlap`, `interruption_overlap`, `conflict_overlap`, `uncertain_overlap`

The cleaned split CSVs use `label_confidence` for annotation confidence.

## Audio Paths

Raw WAV files are not included in this repository. The CSV files still point to `data_audio_set/*.wav`; place the local audio folder at `data_audio_set/` when running audio-based experiments.

## Notes

- `negative_high` is merged into `negative` for the main 3-class runs.
- Escalation can be modeled at the customer-transition level using `model_assets/model_ready/text_only_transition_3class/`.
- Audio paths in the CSV files point to `data_audio_set/*.wav`.

## PhoBERT Context Baselines

The local experiment runner at `experiments/run_phobert_context_baselines.sh` adds text-only ablations around a frozen PhoBERT turn encoder plus an optional Transformer context encoder.

| Model | Text | Audio | Role | Context | Overlap | MT |
|---|---|---|---|---|---|---|
| Text-only PhoBERT | Yes | No | No | No | No | No |
| Text Context Transformer | Yes | No | No | Yes | No | No |
| Text+Role Context | Yes | No | Yes | Yes | No | No |
| Text+Role Transition Context | Yes | No | Yes | Yes | No | Yes |
| Text+Role Agent Context Transition | Yes | No | Yes | Yes | No | No |
| OA-RUMER | Yes | Yes | Yes | Yes | Yes | Yes |

Run a smoke trial:

```bash
DEVICE=auto experiments/run_phobert_context_baselines.sh trial
```

Run the full 3-class suite:

```bash
DEVICE=auto EPOCHS=8 experiments/run_phobert_context_baselines.sh full
```
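
When preparing inputs for these runs from the original full-label variant, the label merge and the audio path convention described in the Notes can be sketched in Python as follows. This is a minimal illustration, not the repository's own preprocessing code: the helper names `to_three_class` and `resolve_audio` are hypothetical, and mapping `negative_low` to `negative` is an assumption (this README only states the merge for `negative_high`).

```python
from pathlib import Path

# The three emotion labels used by the main 3-class runs.
THREE_CLASS = {"neutral", "positive", "negative"}

# Documented in this README: negative_high -> negative.
# Assumption (not stated in this README): negative_low collapses the same way.
MERGE_MAP = {"negative_high": "negative", "negative_low": "negative"}


def to_three_class(label: str) -> str:
    """Collapse an original emotion label to the 3-class scheme."""
    merged = MERGE_MAP.get(label, label)
    if merged not in THREE_CLASS:
        raise ValueError(f"unexpected emotion label: {label!r}")
    return merged


def resolve_audio(csv_audio_path: str, local_root: Path) -> Path:
    """Resolve a CSV audio path (data_audio_set/<name>.wav) against a local root.

    local_root is the directory that contains the data_audio_set/ folder.
    """
    return local_root / csv_audio_path
```

Raising on unknown labels, rather than passing them through, makes a mismatch between a split file and the expected label set visible before training starts. Calling `resolve_audio(path, root).exists()` on each row is a cheap way to confirm the local audio folder is in place before an audio-based experiment.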