---
base_model:
- Qwen/Qwen2.5-Omni-7B
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: audio-text-to-text
---
# EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning
This repository contains the model presented in the paper [EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning](https://huggingface.co/papers/2601.15668).
[Paper](https://arxiv.org/pdf/2601.15668) | [Code](https://github.com/dingdongwang/EmotionThinker)
## Introduction
EmotionThinker is the first RL-enhanced SpeechLLM framework for interpretable speech emotion reasoning. For details, please refer to the [paper](https://huggingface.co/papers/2601.15668).
Unlike conventional speech emotion recognition (SER) systems that treat emotion as a flat classification problem, EmotionThinker reframes SER as a deep reasoning problem, enabling models to jointly produce accurate emotion labels and structured, human-aligned explanations.
EmotionThinker offers the following advantages:
- Higher emotion recognition accuracy than existing SpeechLLMs;
- Deep reasoning ability that integrates emotion-related cues to justify its predictions;
- Fine-grained audio captioning covering speaker traits, prosodic cues, and semantic information.
## Quickstart
The example below follows the standard Qwen2.5-Omni inference pattern. The original snippet is truncated after `prompt=`, so the prompt text and generation settings here are illustrative placeholders.

```python
import torch
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info

processor = Qwen2_5OmniProcessor.from_pretrained("ddwang2000/EmotionThinker")
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "ddwang2000/EmotionThinker", torch_dtype="auto", device_map="auto")

audio_path = "angry.wav"  # your audio path
# Illustrative prompt; the original card is truncated here.
prompt = "What emotion does the speaker express? Explain your reasoning."

conversation = [{"role": "user", "content": [
    {"type": "audio", "audio": audio_path},
    {"type": "text", "text": prompt}]}]

# Build the chat-formatted input and extract the audio for the processor
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=False)
inputs = processor(text=text, audio=audios, images=images, videos=videos,
                   return_tensors="pt", padding=True).to(model.device)

# Generate the text response only (no speech output)
output_ids = model.generate(**inputs, max_new_tokens=512, return_audio=False)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```