DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization

This repository provides the model for DuplexCascade, a full-duplex speech-to-speech dialogue system built on a cascaded ASR-LLM-TTS pipeline with VAD-free interaction and micro-turn optimization.

The backbone large language model is Qwen2-7B-Instruct, which was further fine-tuned for our duplex dialogue setting.

Paper

Our paper is available on arXiv:

Paper: https://arxiv.org/abs/2603.09180

Inference Code

Please refer to our GitHub repository for inference and implementation details:

GitHub: https://github.com/sbintuitions/DuplexCascade

Model Description

DuplexCascade is designed for full-duplex spoken dialogue, enabling more natural interaction through:

A cascaded ASR-LLM-TTS pipeline
VAD-free dialogue control
Micro-turn optimization for smoother turn-taking behavior
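The following is a minimal, generic sketch of how a VAD-free, micro-turn dialogue loop can be organized. It is not taken from the DuplexCascade code base; the actual inference code is in the GitHub repository. The class `MicroTurnLoop`, the `<listen>` control token, and the stub ASR/LLM/TTS functions are all hypothetical. The sketch assumes that "VAD-free" means turn-taking is decided by the dialogue model rather than by a separate voice activity detector. Under that assumption, the model is consulted after every short audio chunk (one micro-turn) and chooses between replying now and continuing to listen. It does not model overlapping speech or barge-in.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

# Hypothetical control token: the LLM returns it when it decides to keep listening.
LISTEN = "<listen>"


@dataclass
class MicroTurnLoop:
    """Toy cascaded ASR -> LLM -> TTS loop where every audio chunk is a micro-turn.

    No VAD decides when the user has finished speaking; the LLM is queried after
    each chunk and either returns a reply or the LISTEN token. All three
    components are injected callables (hypothetical stand-ins for real models).
    """
    asr: Callable[[bytes], str]       # audio chunk -> partial transcript
    llm: Callable[[List[str]], str]   # dialogue history -> reply text or LISTEN
    tts: Callable[[str], bytes]       # reply text -> synthesized audio
    history: List[str] = field(default_factory=list)

    def run(self, chunks: Iterable[bytes]) -> List[bytes]:
        outputs: List[bytes] = []
        for chunk in chunks:
            text = self.asr(chunk)
            if text:  # append only non-empty partial transcripts
                self.history.append(f"user: {text}")
            decision = self.llm(self.history)
            if decision != LISTEN:  # the model chose to take the turn now
                self.history.append(f"assistant: {decision}")
                outputs.append(self.tts(decision))
        return outputs


def fake_asr(chunk: bytes) -> str:
    # Stub ASR: the "audio" is already UTF-8 text.
    return chunk.decode()


def fake_llm(history: List[str]) -> str:
    # Stub policy: reply only when the latest user fragment ends with "?".
    last = history[-1] if history else ""
    if last.startswith("user:") and last.endswith("?"):
        return "It is sunny."
    return LISTEN


if __name__ == "__main__":
    loop = MicroTurnLoop(asr=fake_asr, llm=fake_llm, tts=lambda s: s.encode())
    print(loop.run([b"what is", b"the weather?", b""]))  # [b'It is sunny.']
```

The design choice being illustrated is that the turn-taking decision lives inside the language model's output rather than in a separate silence detector, so the model can hold back after an incomplete fragment ("what is") and respond as soon as the utterance is complete.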

This model is obtained by fine-tuning Qwen2-7B-Instruct for the full-duplex dialogue setting.

Base Model

Base LLM: Qwen2-7B-Instruct

License

This model is released under the MIT License.
