arxiv:2606.19688

Latency-Configurable Streaming Speech Enhancement via Asymmetric Temporal Padding

Published on Jun 19

Abstract

LaCo-SENet enables configurable speech enhancement latency through asymmetric padding and dual-buffer streaming with selective state updates, achieving competitive quality at low latency.

Streaming speech enhancement requires balancing algorithmic latency against quality, yet existing approaches largely treat this as a binary causal versus non-causal choice. LaCo-SENet addresses this issue with two mechanisms parameterized by a single training-time hyperparameter. First, asymmetric temporal padding redistributes past and future context in convolutions, enabling systematic latency configuration. Second, dual-buffer streaming combines state buffers for past context with lookahead buffers that supply future context at both the input and feature levels. Selective state updates also prevent future-frame leakage into the streaming state, ensuring training-inference consistency. On VoiceBank+DEMAND, a fixed-budget (1.37M parameters) backbone yields a family of models spanning 12.5-75.0 ms, with PESQ rising from 3.35 to 3.43. At just 12.5 ms (fully causal), a PESQ of 3.35 matches or exceeds the prior causal state-of-the-art (3.27 at 46.5 ms).
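The two mechanisms in the abstract can be illustrated on a single 1-D convolution. The following is a minimal sketch and not the paper's implementation: the names `conv1d_asym`, `StreamingConv`, `stream`, and the `lookahead` parameter are hypothetical, and a real model would apply the same idea per layer on multi-channel features. It assumes a kernel of size K whose K-1 padding frames are split so that `lookahead` frames sit on the future side and the rest on the past side. The streaming version keeps a state buffer of past frames, receives the future frames through a separate lookahead buffer, and writes only the committed frames back into the state.

```python
def conv1d_asym(x, w, lookahead):
    """Offline reference: K-1-lookahead zeros on the left, `lookahead` on the right."""
    k = len(w)
    assert 0 <= lookahead <= k - 1
    left = k - 1 - lookahead
    padded = [0.0] * left + list(x) + [0.0] * lookahead
    # Output t sees inputs t-left .. t+lookahead.
    return [sum(w[j] * padded[t + j] for j in range(k)) for t in range(len(x))]


class StreamingConv:
    """Chunk-wise version with a past-state buffer and a separate lookahead buffer."""

    def __init__(self, w, lookahead):
        self.w = list(w)
        self.k = len(w)
        self.lookahead = lookahead
        self.left = self.k - 1 - lookahead
        self.state = [0.0] * self.left  # past context, committed frames only

    def step(self, chunk, future):
        # `future` holds the frames that follow `chunk`; they are peeked at, not consumed.
        assert len(future) == self.lookahead
        window = self.state + list(chunk) + list(future)
        out = [sum(self.w[j] * window[t + j] for j in range(self.k))
               for t in range(len(chunk))]
        # Selective state update: lookahead frames never enter the state,
        # so the next call sees exactly the past context the offline pass would.
        committed = self.state + list(chunk)
        self.state = committed[len(committed) - self.left:]
        return out


def stream(x, w, lookahead, chunk_size):
    """Feed x through StreamingConv chunk by chunk, zero-filling lookahead at the end."""
    conv = StreamingConv(w, lookahead)
    out = []
    for start in range(0, len(x), chunk_size):
        chunk = x[start:start + chunk_size]
        end = start + len(chunk)
        future = x[end:end + lookahead]
        future = future + [0.0] * (lookahead - len(future))
        out.extend(conv.step(chunk, future))
    return out


x = [float(i) for i in range(1, 11)]
w = [1.0, 2.0, 3.0]
for r in range(len(w)):
    # Streaming output should equal the offline output for every lookahead setting.
    assert stream(x, w, r, chunk_size=3) == conv1d_asym(x, w, r)
```

With `lookahead=0` all padding is on the past side and the layer is fully causal. Each extra lookahead frame moves one frame of context from the past to the future, which is how a single hyperparameter can trade latency for context in this sketch. The mapping from lookahead frames to milliseconds depends on the frame hop and network depth, which the abstract does not specify.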

