Raon-SpeechChat-9B
Technical Report | Blog (Coming soon)
Raon-SpeechChat-9B is a full-duplex speech language model that enables real-time, simultaneous listen-and-speak conversation in English. Built on top of Raon-Speech-9B, it extends the base model with full-duplex decoding — the model can listen to a user and generate speech responses at the same time, supporting natural turn-taking, backchannels ("uh-huh", "mm-hmm"), and barge-in handling.
Key Features
- Full-Duplex Conversation: Simultaneous listen-and-speak decoding — the model processes user speech and generates responses in real time, just like a natural conversation.
- End-to-End Speech Language Model: Built on Qwen3 (36 layers, 4096 hidden dim), Voxtral-Mini-4B-Realtime-2602 Audio Encoder (32 layers), Mimi codec (32 quantizers), ECAPA-TDNN speaker encoder, Qwen3OmniMoeTalkerCodePredictor (5 layers, 1024 hidden dim), and Qwen3-based Talker (4 layers, 2048 hidden dim).
- English Support: Real-time conversational speech understanding and generation in English.
- Backchannel Responses: Dedicated backchannel token (`<|audio_output_backchannel|>`) for natural conversational feedback like "uh-huh" and "mm-hmm", with adjustable frequency via a backchannel penalty.
- Speak-First / Listen-First Modes: Configurable via runtime token forcing — the model can either wait for user speech before responding (listen-first) or begin speaking immediately (speak-first).
- Persona-Driven Conversations: 17 built-in personas with customizable system prompts, context injection, and persona catalog support.
- Speaker Voice Conditioning: Optional speaker reference audio for voice cloning via ECAPA-TDNN embeddings.
- HuggingFace Transformers Integration: Load and run directly via `AutoModel.from_pretrained` with `trust_remote_code=True` — no custom package installation required.
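The backchannel penalty listed above can be pictured as a logit offset applied at decoding time. The sketch below is illustrative only (the token index, penalty value, and function names are made up, not Raon's implementation): subtracting a penalty from the backchannel token's logit lowers its softmax probability and hence how often the model interjects.

```python
import math

BACKCHANNEL_TOKEN_ID = 3  # hypothetical vocabulary index


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def apply_backchannel_penalty(logits, penalty):
    """Subtract `penalty` from the backchannel token's logit."""
    out = list(logits)
    out[BACKCHANNEL_TOKEN_ID] -= penalty
    return out


logits = [1.0, 0.5, 0.2, 2.0]  # backchannel token currently most likely
p0 = softmax(logits)[BACKCHANNEL_TOKEN_ID]
p1 = softmax(apply_backchannel_penalty(logits, 2.0))[BACKCHANNEL_TOKEN_ID]
assert p1 < p0  # a larger penalty means fewer "uh-huh"-style interjections
```

A penalty of 0 leaves the distribution unchanged; raising it steadily suppresses backchannels without removing them outright.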
Benchmark Results
TBD
Requirements
```shell
pip install "transformers>=4.57.1" torch torchaudio soundfile accelerate

# Optional
pip install speechbrain  # for speaker voice conditioning
pip install gradio       # for Gradio demo
```
Quick Start
Option 1: Load from Hub (recommended)
No `pip install raon` needed.
```python
import importlib

import torch
from transformers import AutoModel

MODEL_ID = "KRAFTON/Raon-SpeechChat-9B"

# Load model (downloads code + weights from Hub)
_model = AutoModel.from_pretrained(
    MODEL_ID, trust_remote_code=True, dtype=torch.bfloat16, device_map="cuda"
)

# Get RaonPipeline from the Hub module
hub_module = importlib.import_module(type(_model).__module__)
RaonPipeline = hub_module.RaonPipeline
del _model

# Create pipeline
pipe = RaonPipeline(MODEL_ID, device="cuda", dtype="bfloat16")
```
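The speak-first / listen-first modes described under Key Features rely on runtime token forcing. The toy sketch below shows the general idea only; the token strings and function are hypothetical and do not reflect Raon's actual vocabulary or API.

```python
# Illustrative only: in speak-first mode the first decoding step is forced
# to a "speak" token; in listen-first mode the model's own prediction stands.
SPEAK = "<|speak|>"
LISTEN = "<|listen|>"


def first_step(model_prediction, mode):
    """Return the first-step token under the given duplex mode."""
    if mode == "speak-first":
        return SPEAK  # force speech before any user input arrives
    return model_prediction  # listen-first: keep the model's choice


assert first_step(LISTEN, "speak-first") == SPEAK
assert first_step(LISTEN, "listen-first") == LISTEN
```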
Option 2: With raon package installed
```shell
pip install -e .  # or: uv sync
```
```python
from raon import RaonPipeline

# From Hub (local code + Hub weights)
pipe = RaonPipeline("KRAFTON/Raon-SpeechChat-9B")

# From local path
pipe = RaonPipeline("/path/to/raon-duplex-model")
```
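Speaker voice conditioning works from a fixed-size embedding of the reference audio (an ECAPA-TDNN vector when the optional speechbrain dependency is installed). Such embeddings are conventionally compared by cosine similarity after L2 normalization; the sketch below uses made-up toy vectors in place of real encoder outputs and is not part of Raon's code.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit L2 norm."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy embeddings: same-speaker pairs score higher than different-speaker pairs.
same_speaker = cosine([0.9, 0.1, 0.4], [0.8, 0.2, 0.5])
diff_speaker = cosine([0.9, 0.1, 0.4], [-0.2, 0.9, -0.1])
assert same_speaker > diff_speaker
```

In practice the pipeline consumes the embedding internally; this comparison is only how one might sanity-check that a reference clip matches the intended speaker.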
Related Models
- Raon-Speech-9B — Base speech language model supporting STT, TTS, TextQA, and SpeechChat tasks.
License
This repository is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.
© 2026 KRAFTON