Raon-SpeechChat-9B
Technical Report | Blog (Coming soon)
Raon-SpeechChat-9B is a full-duplex speech language model that enables real-time, simultaneous listen-and-speak conversation in English. Built on top of Raon-Speech-9B, it extends the base model with full-duplex decoding — the model can listen to a user and generate speech responses at the same time, supporting natural turn-taking, backchannels ("uh-huh", "mm-hmm"), and barge-in handling.
Key Features
- Full-Duplex Conversation: Simultaneous listen-and-speak decoding — the model processes user speech and generates responses in real time, just like a natural conversation.
- End-to-End Speech Language Model: Built on Qwen3 (36 layers, 4096 hidden dim), Voxtral-Mini-4B-Realtime-2602 Audio Encoder (32 layers), Mimi codec (32 quantizers), ECAPA-TDNN speaker encoder, Qwen3OmniMoeTalkerCodePredictor (5 layers, 1024 hidden dim), and Qwen3-based Talker (4 layers, 2048 hidden dim).
- English Support: Real-time conversational speech understanding and generation in English.
- Backchannel Responses: Dedicated backchannel token (`<|audio_output_backchannel|>`) for natural conversational feedback like "uh-huh" and "mm-hmm", with adjustable frequency via a backchannel penalty.
- Speak-First / Listen-First Modes: Configurable via runtime token forcing — the model can either wait for user speech before responding (listen-first) or begin speaking immediately (speak-first).
- Persona-Driven Conversations: 17 built-in personas with customizable system prompts, context injection, and persona catalog support.
- Speaker Voice Conditioning: Optional speaker reference audio for voice cloning via ECAPA-TDNN embeddings.
- HuggingFace Transformers Integration: Load and run directly via `AutoModel.from_pretrained` with `trust_remote_code=True` — no custom package installation required.
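The backchannel penalty listed above can be pictured as a logit offset applied at decoding time. The sketch below is illustrative only (the token index, penalty value, and function names are made up, not Raon's implementation): subtracting a penalty from the backchannel token's logit lowers its softmax probability and hence how often the model interjects.

```python
import math

BACKCHANNEL_TOKEN_ID = 3  # hypothetical vocabulary index


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def apply_backchannel_penalty(logits, penalty):
    """Subtract `penalty` from the backchannel token's logit."""
    out = list(logits)
    out[BACKCHANNEL_TOKEN_ID] -= penalty
    return out


logits = [1.0, 0.5, 0.2, 2.0]  # backchannel token currently most likely
p0 = softmax(logits)[BACKCHANNEL_TOKEN_ID]
p1 = softmax(apply_backchannel_penalty(logits, 2.0))[BACKCHANNEL_TOKEN_ID]
assert p1 < p0  # a larger penalty means fewer "uh-huh"-style interjections
```

A penalty of 0 leaves the distribution unchanged; raising it steadily suppresses backchannels without removing them outright.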
Benchmark Results
TBD
Requirements
```shell
pip install "transformers>=4.57.1" torch torchaudio soundfile accelerate

# Optional
pip install speechbrain  # for speaker voice conditioning
pip install gradio       # for Gradio demo
```
Quick Start
Option 1: Load from Hub (recommended)
No `pip install raon` needed.
```python
import importlib

import torch
from transformers import AutoModel

MODEL_ID = "KRAFTON/Raon-SpeechChat-9B"

# Load model (downloads code + weights from Hub)
_model = AutoModel.from_pretrained(
    MODEL_ID, trust_remote_code=True, dtype=torch.bfloat16, device_map="cuda"
)

# Get RaonPipeline from the Hub module
hub_module = importlib.import_module(type(_model).__module__)
RaonPipeline = hub_module.RaonPipeline
del _model

# Create pipeline
pipe = RaonPipeline(MODEL_ID, device="cuda", dtype="bfloat16")
```
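The speak-first / listen-first modes described under Key Features rely on runtime token forcing. The toy sketch below shows the general idea only; the token strings and function are hypothetical and do not reflect Raon's actual vocabulary or API.

```python
# Illustrative only: in speak-first mode the first decoding step is forced
# to a "speak" token; in listen-first mode the model's own prediction stands.
SPEAK = "<|speak|>"
LISTEN = "<|listen|>"


def first_step(model_prediction, mode):
    """Return the first-step token under the given duplex mode."""
    if mode == "speak-first":
        return SPEAK  # force speech before any user input arrives
    return model_prediction  # listen-first: keep the model's choice


assert first_step(LISTEN, "speak-first") == SPEAK
assert first_step(LISTEN, "listen-first") == LISTEN
```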
Option 2: With raon package installed
```shell
pip install -e .  # or: uv sync
```
```python
from raon import RaonPipeline

# From Hub (local code + Hub weights)
pipe = RaonPipeline("KRAFTON/Raon-SpeechChat-9B")

# From local path
pipe = RaonPipeline("/path/to/raon-duplex-model")
```
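Speaker voice conditioning works from a fixed-size embedding of the reference audio (an ECAPA-TDNN vector when the optional speechbrain dependency is installed). Such embeddings are conventionally compared by cosine similarity after L2 normalization; the sketch below uses made-up toy vectors in place of real encoder outputs and is not part of Raon's code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy embeddings: same-speaker pairs score higher than different-speaker pairs.
same_speaker = cosine([0.9, 0.1, 0.4], [0.8, 0.2, 0.5])
diff_speaker = cosine([0.9, 0.1, 0.4], [-0.2, 0.9, -0.1])
assert same_speaker > diff_speaker
```

In practice the pipeline consumes the embedding internally; this comparison is only how one might sanity-check that a reference clip matches the intended speaker.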
Related Models
- Raon-Speech-9B — Base speech language model supporting STT, TTS, TextQA, and SpeechChat tasks.
License
This repository is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
© 2026 KRAFTON