metadata
license: mit
base_model: Qwen/Qwen2-7B-Instruct
tags:
  - speech-to-speech
  - dialogue
  - full-duplex
  - asr
  - tts
  - llm
  - qwen2
  - vad-free
  - micro-turn
language:
  - en
pipeline_tag: text-generation

DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization

This repository provides the model for DuplexCascade, a full-duplex speech-to-speech dialogue system built on a cascaded ASR-LLM-TTS pipeline with VAD-free interaction and micro-turn optimization.

The backbone large language model is Qwen2-7B-Instruct, which was further fine-tuned for our duplex dialogue setting.

Paper

Our paper is available on arXiv:

Paper: https://arxiv.org/abs/2603.09180

Inference Code

Please refer to our GitHub repository for inference and implementation details:

GitHub: https://github.com/sbintuitions/DuplexCascade

Model Description

DuplexCascade is designed for full-duplex spoken dialogue, enabling more natural interaction through:

  • A cascaded ASR-LLM-TTS pipeline
  • VAD-free dialogue control
  • Micro-turn optimization for smoother turn-taking behavior

This model is obtained by fine-tuning Qwen2-7B-Instruct for the full-duplex dialogue setting.
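
As a rough illustration of the cascaded ASR-LLM-TTS structure listed above, the following minimal Python sketch chains three stages behind one interface. It is not the DuplexCascade implementation: the `Cascade` class, the stage signatures, and the stub functions are all hypothetical names introduced here, and the sketch does not model VAD-free dialogue control or micro-turn optimization. For the actual inference code, see the GitHub repository.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (assumptions for illustration only).
ASR = Callable[[bytes], str]   # audio -> transcript text
LLM = Callable[[str], str]     # transcript text -> response text
TTS = Callable[[str], bytes]   # response text -> synthesized audio


@dataclass
class Cascade:
    """Chains the three stages: speech -> text -> text -> speech."""
    asr: ASR
    llm: LLM
    tts: TTS

    def respond(self, audio: bytes) -> bytes:
        transcript = self.asr(audio)   # stage 1: recognize speech
        reply = self.llm(transcript)   # stage 2: generate a text reply
        return self.tts(reply)         # stage 3: synthesize the reply


# Stub stages so the sketch runs without loading any model.
def stub_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def stub_llm(text: str) -> str:
    return f"You said: {text}"


def stub_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = Cascade(asr=stub_asr, llm=stub_llm, tts=stub_tts)
```

In a real setup, each stub would be replaced by a call into an actual ASR, LLM, or TTS component; keeping the stages as plain callables makes them easy to swap independently.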

Base Model

Qwen/Qwen2-7B-Instruct

License

This model is released under the MIT License.