---
license: mit
base_model: Qwen/Qwen2-7B-Instruct
tags:
- speech-to-speech
- dialogue
- full-duplex
- asr
- tts
- llm
- qwen2
- vad-free
- micro-turn
language:
- en
pipeline_tag: text-generation
---

# DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization

This repository provides the model for **DuplexCascade**, a full-duplex speech-to-speech dialogue system built on a cascaded ASR-LLM-TTS pipeline with **VAD-free interaction** and **micro-turn optimization**.

The backbone large language model is [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct), which was further fine-tuned for our duplex dialogue setting.

## Paper

Our paper is available on arXiv:

**Paper:** https://arxiv.org/abs/2603.09180

## Inference Code

Please refer to our GitHub repository for inference and implementation details:

**GitHub:** https://github.com/sbintuitions/DuplexCascade

## Model Description

DuplexCascade is designed for full-duplex spoken dialogue, enabling more natural interaction through:

- A cascaded **ASR-LLM-TTS** pipeline
- **VAD-free** dialogue control
- **Micro-turn optimization** for smoother turn-taking behavior

This model is obtained by fine-tuning [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) for the full-duplex dialogue setting.

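To make the cascaded, VAD-free idea above more concrete, here is a minimal sketch of a chunk-by-chunk loop in which the LLM stage, rather than a voice activity detector, decides whether to respond or keep listening. It is an illustration under stated assumptions, not the DuplexCascade implementation: `CascadeLoop`, the `WAIT` convention, the chunking scheme, and the toy stubs are all hypothetical names introduced here. See the GitHub repository for the actual inference code.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# All names below are hypothetical stand-ins, not the DuplexCascade API.
WAIT = None  # assumed convention: the LLM stage returns None to keep listening


@dataclass
class CascadeLoop:
    asr: Callable[[bytes], str]                # audio chunk -> partial transcript
    llm: Callable[[List[str]], Optional[str]]  # transcript history -> reply or WAIT
    tts: Callable[[str], bytes]                # reply text -> synthesized audio
    history: List[str] = field(default_factory=list)

    def step(self, audio_chunk: bytes) -> Optional[bytes]:
        """Process one short chunk (one assumed 'micro-turn'); no VAD gates the input."""
        text = self.asr(audio_chunk)
        if text:
            self.history.append(text)
        reply = self.llm(self.history)
        if reply is WAIT:
            return None          # keep listening, emit no audio
        self.history.clear()     # the reply consumes the buffered transcript
        return self.tts(reply)


# Toy stubs so the sketch runs without any speech or language models.
def toy_asr(chunk: bytes) -> str:
    return chunk.decode("utf-8")  # pretend the audio bytes are already text


def toy_llm(history: List[str]) -> Optional[str]:
    # Toy rule: answer only once the accumulated transcript ends with "?".
    joined = " ".join(history)
    return f"You said: {joined}" if joined.endswith("?") else WAIT


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")  # pretend the text bytes are audio


loop = CascadeLoop(toy_asr, toy_llm, toy_tts)
outputs = [loop.step(chunk) for chunk in (b"what is", b"the weather?")]
```

With these toy stubs, the first chunk produces no output because the LLM stub chooses to wait, and the second chunk produces the synthesized reply. In a real system each stub would be replaced by a streaming ASR model, the fine-tuned LLM, and a TTS model.
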

## Base Model

- **Base LLM:** [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct)

## License

This model is released under the **MIT License**.