## Introduction
EmotionThinker is the first RL-enhanced SpeechLLM framework for interpretable speech emotion reasoning. For details, please refer to the [paper](https://arxiv.org/pdf/2601.15668).
Unlike conventional speech emotion recognition (SER) systems that treat emotion as a flat classification problem, EmotionThinker reframes SER as a deep reasoning problem, enabling models to jointly produce accurate emotion labels and structured, human-aligned explanations.
EmotionThinker offers the following advantages:
- Higher emotion recognition accuracy compared to existing SpeechLLMs;
- Deep reasoning ability to integrate emotion-related cues for justification;
- Fine-grained audio captioning covering speaker traits, prosodic cues, and semantic information.
## Quickstart
```python
import torch
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info

processor = Qwen2_5OmniProcessor.from_pretrained("ddwang2000/EmotionThinker")
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "ddwang2000/EmotionThinker", torch_dtype="auto", device_map="auto"
)
print("✅ Model loaded successfully")

audio_path = "angry.wav"  # your audio path
# Placeholder instruction — replace with your own prompt
prompt = "What emotion does the speaker convey? Explain your reasoning."

# Inference follows the standard Qwen2.5-Omni chat API
conversation = [{"role": "user", "content": [
    {"type": "audio", "audio": audio_path},
    {"type": "text", "text": prompt},
]}]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=False)
inputs = processor(text=text, audio=audios, images=images, videos=videos,
                   return_tensors="pt", padding=True).to(model.device)
text_ids = model.generate(**inputs, return_audio=False)
print(processor.batch_decode(text_ids, skip_special_tokens=True)[0])
```

The prompt above is illustrative; the inference steps follow the standard Qwen2.5-Omni usage pattern, on which this model is based.