Abstract
A non-autoregressive speech recognition approach that formulates acoustic-to-text conversion as conditional transcript editing, using a bidirectional LLM editor with latent-alignment training and interleaved padding for improved efficiency.
While autoregressive (AR) LLM-based ASR systems achieve strong accuracy, their sequential decoding limits parallelism and incurs high latency. We propose NLE, a non-autoregressive (NAR) approach that formulates speech recognition as conditional transcript editing, enabling fully parallel prediction. NLE extracts acoustic embeddings and an initial hypothesis from a pretrained speech encoder, then refines the hypothesis using a bidirectional LLM editor trained with a latent alignment objective. An interleaved padding strategy exploits the identity mapping bias of Transformers, allowing the model to focus on corrections rather than full reconstruction. On the Open ASR leaderboard, NLE++ achieves 5.67% average WER with an RTFx (inverse real-time factor) of 1630. In single-utterance scenarios, NLE achieves 27x speedup over the AR baseline, making it suitable for real-time applications.
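The abstract's interleaved-padding idea can be illustrated with a toy sketch. The paper's actual editor, objective, and padding schedule are not specified here, so everything below is a hypothetical illustration: pad slots are interleaved into the encoder's initial hypothesis so a parallel editor can fill them (insertions) or overwrite tokens (substitutions), while untouched positions default to an identity copy, which is the bias the abstract says the Transformer exploits.

```python
PAD = "<pad>"

def interleave_padding(hypothesis, pads_per_gap=1):
    """Insert PAD slots after each hypothesis token so a NAR editor
    can emit insertions; non-pad positions default to identity (copy)."""
    slots = []
    for tok in hypothesis:
        slots.append(tok)
        slots.extend([PAD] * pads_per_gap)
    return slots

def collapse(edited):
    """Drop leftover PAD slots after the editor's parallel prediction."""
    return [t for t in edited if t != PAD]

# Toy "editor" pass (illustrative only; the real model predicts all
# positions in parallel from acoustic embeddings):
hyp = ["the", "cat", "sat"]
slots = interleave_padding(hyp)   # ['the','<pad>','cat','<pad>','sat','<pad>']
slots[3] = "quickly"              # fill a pad slot  -> insertion
slots[4] = "ran"                  # overwrite a token -> substitution
print(collapse(slots))            # ['the', 'cat', 'quickly', 'ran']
```

Because most slots stay identity copies or empty pads, the editor only has to predict the few positions that actually change, rather than regenerating the whole transcript.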
Community
NLE: Non-autoregressive LLM-based ASR by Transcript Editing
This is an automated message from Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- MDM-ASR: Bridging Accuracy and Efficiency in ASR with Diffusion-Based Non-Autoregressive Decoding (2026)
- dLLM-ASR: A Faster Diffusion LLM-based Framework for Speech Recognition (2026)
- Streaming Speech Recognition with Decoder-Only Large Language Models and Latency Optimization (2026)
- TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2026)
- Align-Consistency: Improving Non-autoregressive and Semi-supervised ASR with Consistency Regularization (2026)
- Decoder-only Conformer with Modality-aware Sparse Mixtures of Experts for ASR (2026)
- NADIR: Differential Attention Flow for Non-Autoregressive Transliteration in Indic Languages (2026)