---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Omni-7B
---

# EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning

[![ICLR 2026 Oral](https://img.shields.io/badge/ICLR%202026-Oral-gold)](https://arxiv.org/pdf/2601.15668) [![Project](https://img.shields.io/badge/Project-Page-green)](https://github.com/dingdongwang/EmotionThinker)

<p align="center">
    <img src="intro.png" width="800"/>
</p>

## Introduction

EmotionThinker is the first RL-enhanced SpeechLLM framework for interpretable speech emotion reasoning.

Unlike conventional speech emotion recognition (SER) systems, which treat emotion as a flat classification problem, EmotionThinker reframes SER as a deep reasoning problem: the model jointly produces an accurate emotion label and a structured, human-aligned explanation.

EmotionThinker offers the following advantages:

- Higher emotion recognition accuracy than existing SpeechLLMs;
- Deep reasoning that integrates emotion-related cues to justify its predictions;
- Fine-grained audio captions covering speaker traits, prosodic cues, and semantic information.

## Quickstart

```python
import torch
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info

processor = Qwen2_5OmniProcessor.from_pretrained("ddwang2000/EmotionThinker")
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "ddwang2000/EmotionThinker", torch_dtype="auto", device_map="auto"
)
print("✅ Model loaded successfully")

audio_path = "angry.wav"  # path to your audio file
prompt = (
    "<audio>What is the emotion expressed in this audio clip? "
    "Please choose one from the following options: neutral, happy, sad, angry, "
    "contempt or disgust, confused, whisper, surprise, fear."
)

messages = [
    {
        "role": "system",
        "content": [
            {
                "type": "text",
                "text": (
                    "A conversation between User and Assistant. The user asks a question, "
                    "and the Assistant solves it. The assistant first thinks about the "
                    "reasoning process in the mind and then provides the user with the "
                    "answer. The reasoning process and answer are enclosed within "
                    "<think> </think> and <answer> </answer> tags, respectively, i.e., "
                    "<think> reasoning process here </think><answer> answer here </answer>."
                ),
            }
        ],
    },
    {
        "role": "user",
        "content": [
            {"type": "audio", "audio": audio_path},
            {"type": "text", "text": prompt},
        ],
    },
]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
audios, images, videos = process_mm_info(messages, use_audio_in_video=False)
inputs = processor(
    text=text, audio=audios, images=images, videos=videos,
    return_tensors="pt", padding=True, use_audio_in_video=False,
)
inputs = inputs.to(model.device).to(model.dtype)

with torch.no_grad():
    text_ids = model.generate(
        **inputs,
        return_audio=False,
        max_new_tokens=2048,
    )[:, inputs.input_ids.size(1):]

text = processor.batch_decode(
    text_ids, skip_special_tokens=True, clean_up_tokenization_spaces=False
)[0]
print(text)
```
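The system prompt above asks the model to wrap its chain of thought in `<think> </think>` tags and the final label in `<answer> </answer>` tags. A minimal sketch of splitting the decoded text into those two parts (the helper name and demo string below are illustrative, not part of the release):

```python
import re

def parse_reasoning_output(text: str) -> dict:
    """Split a model response into its reasoning and answer parts,
    following the <think>/<answer> format requested in the system prompt."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    return {
        "reasoning": think.group(1).strip() if think else None,
        "answer": answer.group(1).strip() if answer else None,
    }

# Hypothetical response shaped like the prompt requests:
demo = "<think>The pitch is high and the speech rate is fast.</think><answer>angry</answer>"
parsed = parse_reasoning_output(demo)
print(parsed["answer"])  # → angry
```

If the model omits one of the tag pairs, the corresponding field is `None`, so downstream code can fall back to the raw text.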

## Citation

If you find this model useful in your research, please cite:

```bibtex
@inproceedings{wang2026emotionthinker,
  title={EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning},
  author={Wang, Dingdong and Liu, Shujie and Zhang, Tianhua and Chen, Youjun and Li, Jinyu and Meng, Helen},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2026}
}
```