---
license: apache-2.0
language:
- en
base_model:
- Qwen/Qwen2.5-Omni-7B
---

# EmotionThinker: Prosody-Aware Reinforcement Learning for Explainable Speech Emotion Reasoning

[![ICLR 2026 Oral](https://img.shields.io/badge/ICLR%202026-Oral-gold)](https://arxiv.org/pdf/2601.15668) [![Project](https://img.shields.io/badge/Project-Page-green)](https://github.com/dingdongwang/EmotionThinker)

## Introduction

EmotionThinker is the first reinforcement-learning-enhanced SpeechLLM framework for interpretable speech emotion reasoning. For details, please refer to the [paper](https://arxiv.org/pdf/2601.15668).

Unlike conventional speech emotion recognition (SER) systems that treat emotion as a flat classification problem, EmotionThinker reframes SER as a deep reasoning problem, enabling the model to jointly produce accurate emotion labels and structured, human-aligned explanations.

EmotionThinker offers the following advantages:

- Higher emotion recognition accuracy than existing SpeechLLMs;
- Deep reasoning that integrates emotion-related cues to justify each prediction;
- Fine-grained audio captions covering speaker traits, prosodic cues, and semantic content.

## Quickstart

EmotionThinker is built on Qwen2.5-Omni-7B and is used through the same `transformers` interface. The snippet below loads the model and runs inference on a single audio file; the conversation construction and generation call follow the standard Qwen2.5-Omni usage pattern, and the prompt string is a placeholder for your own instruction.

```python
import torch
from transformers import Qwen2_5OmniForConditionalGeneration, Qwen2_5OmniProcessor
from qwen_omni_utils import process_mm_info

processor = Qwen2_5OmniProcessor.from_pretrained("ddwang2000/EmotionThinker")
model = Qwen2_5OmniForConditionalGeneration.from_pretrained(
    "ddwang2000/EmotionThinker", torch_dtype="auto", device_map="auto"
)
print("✅ Model loaded successfully")

audio_path = "angry.wav"  # your audio path
prompt = "..."            # your instruction

# Qwen2.5-Omni-style multimodal conversation: one user turn with audio + text.
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio": audio_path},
        {"type": "text", "text": prompt},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=False)
inputs = processor(text=text, audio=audios, images=images, videos=videos,
                   return_tensors="pt", padding=True, use_audio_in_video=False)
inputs = inputs.to(model.device).to(model.dtype)

# Generate the emotion label and its reasoning (text only, no speech output).
output_ids = model.generate(**inputs, return_audio=False, max_new_tokens=512)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```
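The quickstart sends the audio and the prompt as a single Qwen2.5-Omni-style user turn. When scripting over many files, it can help to factor that payload into a small helper. This is a sketch: the message field names follow the Qwen2.5-Omni conversation format, while `build_conversation` and the example prompt are illustrative, not part of the released API.

```python
def build_conversation(audio_path: str, prompt: str) -> list:
    """Build a Qwen2.5-Omni-style multimodal conversation: one user turn
    containing an audio clip followed by a text instruction."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "audio", "audio": audio_path},
                {"type": "text", "text": prompt},
            ],
        }
    ]

# Example: one conversation per file, ready for apply_chat_template / process_mm_info.
conversations = [
    build_conversation(path, "What emotion does the speaker convey, and why?")
    for path in ["angry.wav"]
]
```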