sbintuitions
/

DuplexCascade

Text Generation

speech-to-speech

Model card Files Files and versions

DuplexCascade / README.md

user02169

Initial public release

dca21cb 2 months ago

|

history blame contribute delete

1.52 kB

	---
	license: mit
	base_model: Qwen/Qwen2-7B-Instruct
	tags:
	- speech-to-speech
	- dialogue
	- full-duplex
	- asr
	- tts
	- llm
	- qwen2
	- vad-free
	- micro-turn
	language:
	- en
	pipeline_tag: text-generation
	---

	# DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization

	This repository provides the model for DuplexCascade, a full-duplex speech-to-speech dialogue system built on a cascaded ASR-LLM-TTS pipeline with VAD-free interaction and micro-turn optimization.

	The backbone large language model is [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct), which was further fine-tuned for our duplex dialogue setting.

	## Paper

	Our paper is available on arXiv:

	Paper: https://arxiv.org/abs/2603.09180

	## Inference Code

	Please refer to our GitHub repository for inference and implementation details:

	GitHub: https://github.com/sbintuitions/DuplexCascade

	## Model Description

	DuplexCascade is designed for full-duplex spoken dialogue, enabling more natural interaction through:

	- A cascaded ASR-LLM-TTS pipeline
	- VAD-free dialogue control
	- Micro-turn optimization for smoother turn-taking behavior

	This model is obtained by fine-tuning [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct) for the full-duplex dialogue setting.

	## Base Model

	- Base LLM: [Qwen2-7B-Instruct](https://huggingface.co/Qwen/Qwen2-7B-Instruct)

	## License

	This model is released under the MIT License.