Raon-SpeechChat-9B

Raon-SpeechChat Logo

Homepage GitHub
Hugging Face X
License

Demo | Technical Report | Blog (Coming soon)

Raon-SpeechChat-9B is a full-duplex speech language model that enables real-time, simultaneous listen-and-speak conversation in English. Built on top of Raon-Speech-9B, it extends the base model with full-duplex decoding — the model can listen to a user and generate speech responses at the same time, supporting natural turn-taking, backchannels ("uh-huh", "mm-hmm"), and barge-in handling.

Key Features

  • Full-Duplex Conversation: Simultaneous listen-and-speak decoding — the model processes user speech and generates responses in real time, just like a natural conversation.
  • End-to-End Speech Language Model: Built on Qwen3 (36 layers, 4096 hidden dim), Voxtral-Mini-4B-Realtime-2602 Audio Encoder (32 layers), Mimi codec (32 quantizers), ECAPA-TDNN speaker encoder, Qwen3OmniMoeTalkerCodePredictor (5 layers, 1024 hidden dim), and Qwen3-based Talker (4 layers, 2048 hidden dim).
  • English Support: Real-time conversational speech understanding and generation in English.
  • Backchannel Responses: Dedicated backchannel token (<|audio_output_backchannel|>) for natural conversational feedback like "uh-huh" and "mm-hmm", with adjustable frequency via backchannel penalty.
  • Speak-First / Listen-First Modes: Configurable via runtime token forcing — the model can either wait for user speech before responding (listen-first) or begin speaking immediately (speak-first).
  • Persona-Driven Conversations: 17 built-in personas with customizable system prompts, context injection, and persona catalog support.
  • Speaker Voice Conditioning: Optional speaker reference audio for voice cloning via ECAPA-TDNN embeddings.
  • HuggingFace Transformers Integration: Load and run directly via AutoModel.from_pretrained with trust_remote_code=True — no custom package installation required.

Benchmark Results

Raon-SpeechChat performs strongly on conversational speech capabilities such as pause handling, backchanneling, smooth turn-taking, interruption handling, overlap robustness, and multi-turn dialogue.

Raon-SpeechChat Benchmark Results

Requirements

pip install 'transformers>=4.57.1,<5.0' torch torchaudio soundfile accelerate

# Optional
pip install speechbrain  # for speaker voice conditioning
pip install gradio       # for Gradio demo
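The transformers pin above matters for the remote-code loading path. As an illustrative sanity check (not part of the official setup), a small helper can verify that an installed version string satisfies the `>=4.57.1,<5.0` constraint:

```python
def transformers_version_ok(version: str) -> bool:
    """Check a version string against the >=4.57.1,<5.0 pin above.

    Illustrative helper only; in practice pip enforces the constraint.
    Pre-release suffixes (e.g. "4.58.0.dev0") are ignored by the [:3] slice.
    """
    parts = tuple(int(p) for p in version.split(".")[:3])
    return (4, 57, 1) <= parts < (5, 0)

print(transformers_version_ok("4.57.1"))  # True: lower bound is inclusive
print(transformers_version_ok("5.0.0"))   # False: upper bound is exclusive
```

In a real environment, `importlib.metadata.version("transformers")` supplies the installed version string to check.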

Quick Start

Option 1: Load from Hub (recommended)

No `pip install raon` is needed; the model code is downloaded from the Hub via `trust_remote_code`.

import importlib
import torch
from transformers import AutoModel

MODEL_ID = "KRAFTON/Raon-SpeechChat-9B"

# Load model (downloads code + weights from Hub)
_model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True, dtype=torch.bfloat16, device_map="cuda")

# Get RaonPipeline from Hub module
hub_module = importlib.import_module(type(_model).__module__)
RaonPipeline = hub_module.RaonPipeline
del _model

# Create pipeline
pipe = RaonPipeline(MODEL_ID, device="cuda", dtype="bfloat16")
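The `importlib` step above works because `trust_remote_code=True` materializes the repo's Python files as an importable dynamic module. The same pattern generalizes to any other public symbol shipped alongside the model class (a sketch; whether other symbols exist in this repo is an assumption):

```python
import importlib

def symbols_from_hub_model(model, *names):
    """Fetch named attributes from the dynamic module a Hub model was loaded from."""
    module = importlib.import_module(type(model).__module__)
    return [getattr(module, name) for name in names]
```

This avoids hard-coding the auto-generated dynamic module path, which can change between transformers versions.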

Option 2: With raon package installed

pip install -e .  # or: uv sync

from raon import RaonPipeline

# From Hub (local code + Hub weights)
pipe = RaonPipeline("KRAFTON/Raon-SpeechChat-9B")

# From local path
pipe = RaonPipeline("/path/to/raon-duplex-model")
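Speaker voice conditioning (see Key Features) takes a reference audio clip. As a minimal stdlib-only sketch of preparing one, the snippet below reads a mono 16-bit PCM WAV into normalized float samples; the sample rate the pipeline expects and how the reference is passed in are assumptions here (check the repo), and in practice `soundfile` or `torchaudio` from Requirements is the more convenient route:

```python
import struct
import wave

def load_wav_mono_float(path: str):
    """Read a mono 16-bit PCM WAV into floats in [-1, 1] plus its sample rate.

    Illustrative only; real code would typically use soundfile.read(path).
    """
    with wave.open(path, "rb") as wf:
        assert wf.getnchannels() == 1, "expected mono reference audio"
        assert wf.getsampwidth() == 2, "expected 16-bit PCM"
        sample_rate = wf.getframerate()
        raw = wf.readframes(wf.getnframes())
    count = len(raw) // 2
    samples = struct.unpack(f"<{count}h", raw)
    return [s / 32768.0 for s in samples], sample_rate
```

The resulting float list and sample rate can then be handed to whatever reference-audio argument the pipeline exposes.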

Related Models

  • Raon-Speech-9B — Base speech language model supporting STT, TTS, TextQA, and SpeechChat tasks.

License

This repository is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License.

© 2026 KRAFTON
