DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization
This repository provides the model for DuplexCascade, a full-duplex speech-to-speech dialogue system built on a cascaded ASR-LLM-TTS pipeline with VAD-free interaction and micro-turn optimization.
The backbone large language model is Qwen2-7B-Instruct, which was further fine-tuned for our duplex dialogue setting.
Paper
Our paper is available on arXiv:
Paper: https://arxiv.org/abs/2603.09180
Inference Code
Please refer to our GitHub repository for inference and implementation details:
GitHub: https://github.com/sbintuitions/DuplexCascade
Model Description
DuplexCascade is designed for full-duplex spoken dialogue, enabling more natural interaction through:
- A cascaded ASR-LLM-TTS pipeline
- VAD-free dialogue control
- Micro-turn optimization for smoother turn-taking behavior
This model is obtained by fine-tuning Qwen2-7B-Instruct for the full-duplex dialogue setting.
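To make the three properties listed above concrete, here is a minimal, self-contained sketch of a cascaded loop with no VAD stage. It is an illustration under stated assumptions, not the DuplexCascade implementation: the `CascadeSketch` class, the `WAIT` marker, and the toy ASR/LLM/TTS stand-ins are all hypothetical names introduced for this example. The sketch assumes that each incoming audio chunk is treated as one micro-turn and that the language model itself decides, after every chunk, whether to keep listening or to respond. See the paper and the GitHub repository for the actual method.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List

# Hypothetical control marker: the LLM stand-in returns it to keep listening.
WAIT = "<wait>"


@dataclass
class CascadeSketch:
    """Illustrative ASR -> LLM -> TTS loop; all names here are assumptions."""
    asr: Callable[[bytes], str]       # audio chunk -> partial transcript
    llm: Callable[[List[str]], str]   # transcript so far -> reply text or WAIT
    tts: Callable[[str], bytes]       # reply text -> synthesized audio
    heard: List[str] = field(default_factory=list)

    def run(self, audio_chunks: Iterable[bytes]) -> Iterator[bytes]:
        # Each chunk is one micro-turn. No VAD decides when the user is done;
        # the LLM is consulted after every chunk instead.
        for chunk in audio_chunks:
            text = self.asr(chunk)
            if text:
                self.heard.append(text)
            decision = self.llm(self.heard)
            if decision != WAIT:
                yield self.tts(decision)
                self.heard.clear()


# Toy stand-ins so the sketch runs without any model or audio stack.
def toy_asr(chunk: bytes) -> str:
    return chunk.decode("utf-8")


def toy_llm(heard: List[str]) -> str:
    # Respond only once the latest fragment looks like the end of an utterance.
    if heard and heard[-1].endswith(("?", ".", "!")):
        return "reply to: " + " ".join(heard)
    return WAIT


def toy_tts(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = CascadeSketch(asr=toy_asr, llm=toy_llm, tts=toy_tts)
chunks = [b"what is", b"the weather", b"today?"]
outputs = list(pipeline.run(chunks))
print(outputs)
```

In this toy run the first two chunks produce no output because the stand-in LLM chooses to wait, and a single reply is emitted after the third chunk. In a real system the three callables would be replaced by streaming ASR, the fine-tuned LLM, and a TTS engine.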
Base Model
- Base LLM: Qwen2-7B-Instruct
License
This model is released under the MIT License.