DuplexCascade: Full-Duplex Speech-to-Speech Dialogue with VAD-Free Cascaded ASR-LLM-TTS Pipeline and Micro-Turn Optimization

This repository provides the fine-tuned language model used in DuplexCascade, a full-duplex speech-to-speech dialogue system built on a cascaded ASR-LLM-TTS pipeline with VAD-free interaction and micro-turn optimization.

The backbone large language model is Qwen2-7B-Instruct, which was further fine-tuned for our full-duplex dialogue setting.

Paper

Our paper is available on arXiv:

Paper: https://arxiv.org/abs/2603.09180

Inference Code

Please refer to our GitHub repository for inference and implementation details:

GitHub: https://github.com/sbintuitions/DuplexCascade

Model Description

DuplexCascade is designed for full-duplex spoken dialogue, in which the system can listen and speak at the same time, enabling more natural interaction through:

  • A cascaded ASR-LLM-TTS pipeline
  • VAD-free dialogue control
  • Micro-turn optimization for smoother turn-taking behavior
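To illustrate how the three components listed above can interact without a voice activity detector, here is a minimal, self-contained Python sketch. It is not the DuplexCascade implementation (see the GitHub repository for that): the ASR, LLM, and TTS stages are replaced by hypothetical stub callables, the `<wait>` control token and the `MicroTurnLoop` class are illustrative names introduced here, and the loop is sequential, whereas a real full-duplex system listens and speaks concurrently. It shows one way a VAD-free design can work: incoming audio is consumed in small chunks, and the language model is queried after every chunk to decide whether to keep listening or to respond, instead of waiting for a VAD to declare the end of the user's turn.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List

# Hypothetical control token meaning "keep listening, do not speak yet".
WAIT = "<wait>"


@dataclass
class MicroTurnLoop:
    """Toy cascaded ASR -> LLM -> TTS loop driven by fixed-size audio chunks.

    The components are injected as plain callables (hypothetical stubs here),
    so the control flow can be exercised without real models.
    """
    asr: Callable[[bytes], str]   # audio chunk -> newly recognized text
    llm: Callable[[str], str]     # dialogue context -> WAIT or reply text
    tts: Callable[[str], bytes]   # reply text -> synthesized audio
    context: List[str] = field(default_factory=list)

    def run(self, chunks: Iterable[bytes]) -> List[bytes]:
        outputs: List[bytes] = []
        for chunk in chunks:
            # 1. Transcribe only the audio that arrived in this micro-turn.
            text = self.asr(chunk)
            if text:
                self.context.append(f"user: {text}")
            # 2. No VAD endpoint: the LLM is asked after every chunk whether
            #    to keep listening (WAIT) or to start speaking.
            decision = self.llm("\n".join(self.context))
            if decision == WAIT:
                continue
            self.context.append(f"assistant: {decision}")
            # 3. Hand the reply to TTS for playback.
            outputs.append(self.tts(decision))
        return outputs


# Stub components: the "ASR" decodes bytes as text, and the "LLM" replies
# only once the latest line is a user utterance ending in a question mark.
def fake_asr(chunk: bytes) -> str:
    return chunk.decode()


def fake_llm(context: str) -> str:
    last = context.splitlines()[-1] if context else ""
    return "Sure, here it is." if last.endswith("?") else WAIT


def fake_tts(text: str) -> bytes:
    return text.encode()


loop = MicroTurnLoop(asr=fake_asr, llm=fake_llm, tts=fake_tts)
# Empty chunks stand in for silence between user speech segments.
replies = loop.run([b"could you", b"", b"send the file?", b""])
print(replies)  # [b'Sure, here it is.']
```

The pause after "could you" does not trigger a reply, because the stub policy, not a silence detector, decides when the user's request is complete. In a trained system that decision would come from the fine-tuned LLM rather than from a hand-written rule.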

This model is obtained by fine-tuning Qwen2-7B-Instruct for the full-duplex dialogue setting.

Base Model

Qwen/Qwen2-7B-Instruct

License

This model is released under the MIT License.
