Dataset Card for Emotional-Context-Speech
Dataset Summary
Emotional-Context-Speech is a large-scale, context-aware speech corpus derived from multi-speaker audiobooks. To overcome the information bottleneck inherent in traditional sentence-level annotations, the dataset provides rich metadata designed to capture the subtle nuances of human affect. Drawing on cognitive appraisal theories, it defines emotional expression along two core dimensions: Personal Experience (the individual's historical cognitive baseline) and Conversational Context (the immediate interaction environment).
This dataset is designed to advance Emotional Text-to-Speech (TTS) research by shifting the paradigm from sentence-level control to context-aware modeling.
Dataset Details
- Language: Mandarin Chinese (zh)
- Total Duration: Approximately 1,335 hours
- Total Samples: Approximately 730k
- Emotion Categories: 3,359 open-vocabulary emotion categories
- Source Data: Publicly available multi-speaker audio dramas
- Annotation Method: Automated pipeline combining LLM (DeepSeek-V3.2) reasoning with acoustic feature quantization.
Data Structure
The dataset is provided in JSONL format. Each line is a JSON object with the textual, contextual, and acoustic fields listed below.
| Feature Key | Description |
|---|---|
| `text` | The literal transcription of the speech utterance. |
| `context_description` | A summary of the immediate dialogue context and scenario. |
| `personal_experience` | The character's personal background and historical cognitive baseline. |
| `emotion_reason` | The causal logic behind the emotion, derived from the context and experience. |
| `emotions` | An open-vocabulary list of target emotion categories. |
| `emotion_description` | A natural language summary of the character's emotional state. |
| `paralinguistic_description_dpsk` | Paralinguistic details generated by DeepSeek-V3.2. |
| `paralinguistic_description_captioner` | Paralinguistic details generated by Qwen3-Omni-Captioner for benchmarking. |
| `cosy2_codec` | Discrete semantic tokens extracted using the CosyVoice2 model. |
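A minimal sketch of reading the JSONL records and checking that each one carries the fields listed in the table. The helper names `iter_records` and `EXPECTED_KEYS` are hypothetical and not part of the dataset release. The demo parses an in-memory placeholder record so that it runs without the data.

```python
import io
import json

# Field names taken from the feature table in this card.
EXPECTED_KEYS = {
    "text", "context_description", "personal_experience", "emotion_reason",
    "emotions", "emotion_description", "paralinguistic_description_dpsk",
    "paralinguistic_description_captioner", "cosy2_codec",
}

def iter_records(lines):
    """Yield one dict per non-empty JSONL line; raise if a field is missing."""
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = EXPECTED_KEYS - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        yield record

# Placeholder record for illustration only (not real dataset content).
sample = {key: "" for key in EXPECTED_KEYS}
sample["emotions"] = ["关心"]
sample["cosy2_codec"] = [1978, 5622, 5485]
demo_file = io.StringIO(json.dumps(sample, ensure_ascii=False) + "\n")

records = list(iter_records(demo_file))
print(len(records), records[0]["emotions"])
```

With the real data, the same generator can be fed an open file handle, for example `open("train.jsonl", encoding="utf-8")`, where the path is a placeholder for wherever the file was downloaded.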
Data Instance Example
{
"text": "小衿,你还有哪里不舒服吗?",
"context_description": "此刻他听见了养女冷漠回应和医生离开的动静,经历了病房内的冲突后,试图用询问转移注意力。",
"personal_experience": "他过去作为富养长大的名媛,经常用虚伪方式处理人际关系,并长期将养女视为可利用资源,影响此刻发言。",
"emotion_reason": "说话人因刚目睹养女反击医生并表现出冷漠,需维持关心形象以控制局面,导致语音表现出缓慢、停顿的虚伪关心;其个人历史中长期利用养女作为血库并习惯表面友善,促使此刻用温和语调掩盖真实意图。",
"emotions": ["关心", "虚伪"],
"emotion_description": "语气看似关心但缺乏真诚,隐含控制意图。",
"paralinguistic_description_dpsk": "中等音量低声说着,语速缓慢带有停顿,像是轻声询问。",
"paralinguistic_description_captioner": "语速正常,没有急促或拖延的情况。声音柔和,音量适中,音高平稳,没有明显的音量波动或变化。未发现其他非语音声音,如笑声、咳嗽、叹气或深呼吸。",
"cosy2_codec": [1978, 5622, 5485, 4756, 2300, 137, 3462, 2508, 2753, 4994, 4967, 2689, 1302, 2112, 2112, 1470, 3657, 3661, 3661, 3651, 1869, 2112, 4299, 2112, 1869, 4056, 3975, 1788, 2112, 4056, 2112, 2112, 2112, 4237, 4886, 2473, 6075, 2285, 4615, 2819, 829, 748, 1460, 4780, 5907, 2429, 2166, 2172, 6557, 6048, 6292, 1854, 5824, 5067, 6537, 5805, 5093, 1397, 2125, 272, 2197, 5103, 3159, 1701, 2112, 2112, 4299, 2112, 2112, 4299, 4056, 1869, 3975, 3975, 3975, 2112, 3894]
}
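As a rough illustration of working with records like the one above, the sketch below tallies the open-vocabulary emotion labels and estimates utterance duration from the length of `cosy2_codec`. The rate of 25 tokens per second is an assumption based on the token rate reported for the CosyVoice2 speech tokenizer; this card does not state it, so it should be verified before relying on the durations. The helper names and the abbreviated records are hypothetical.

```python
from collections import Counter

# ASSUMPTION: 25 speech tokens per second (not stated in this card).
ASSUMED_TOKENS_PER_SECOND = 25.0

def estimate_duration_seconds(record, rate=ASSUMED_TOKENS_PER_SECOND):
    """Approximate utterance length from the number of discrete tokens."""
    return len(record["cosy2_codec"]) / rate

def count_emotions(records):
    """Count how often each emotion label appears across records."""
    counts = Counter()
    for record in records:
        counts.update(record["emotions"])
    return counts

# Abbreviated records for illustration; token lists are truncated placeholders.
records = [
    {"emotions": ["关心", "虚伪"], "cosy2_codec": [1978, 5622, 5485, 4756, 2300]},
    {"emotions": ["关心"], "cosy2_codec": [137, 3462]},
]

emotion_counts = count_emotions(records)
durations = [estimate_duration_seconds(r) for r in records]
print(emotion_counts.most_common(1))
print(durations)
```

Because the labels are open-vocabulary, a frequency table like this is a simple way to see how long the tail of rare categories is before deciding whether to merge or filter labels.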
Considerations for Using the Data
- License: This dataset is released under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. It is strictly for non-commercial, academic, and research purposes.
- Copyright Restrictions: The raw audio data is derived from publicly available, copyrighted audiobooks and cannot be publicly distributed. This dataset only provides the extracted text, situational context metadata, and discrete acoustic semantic tokens. Users may not use any contained features to create commercial products.
- Ethical Constraints: We acknowledge potential risks regarding deepfake misuse and LLM-induced annotation bias. Any generated artifacts or models utilizing this dataset should adhere to strict non-commercial ethical guidelines.
Citation Information
If you use this dataset in your research, please cite our paper:
@inproceedings{sun2026emotional,
title={Beyond Sentence-level Labels: Integrating Conversational Context and Personal Experience for Natural Emotional Expression},
author={Sun, Haiyang and Le, Chenyang and Wang, Wei and Zhang, Leying and Li, Chuang and Han, Bing and Li, Chenda and Bi, Mengxiao and Qian, Yanmin},
booktitle={Findings of the Association for Computational Linguistics: ACL 2026},
year={2026}
}