Zipformer: A faster and better encoder for automatic speech recognition Paper • 2310.11230 • Published Oct 17, 2023
PromptASR for contextualized ASR with controllable style Paper • 2309.07414 • Published Sep 14, 2023
Libriheavy: a 50,000 hours ASR corpus with punctuation casing and context Paper • 2309.08105 • Published Sep 15, 2023
Predicting Multi-Codebook Vector Quantization Indexes for Knowledge Distillation Paper • 2211.00508 • Published Oct 31, 2022
Blank-regularized CTC for Frame Skipping in Neural Transducer Paper • 2305.11558 • Published May 19, 2023
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization Paper • 2409.00819 • Published Sep 1, 2024
Delay-penalized CTC implemented based on Finite State Transducer Paper • 2305.11539 • Published May 19, 2023
k2SSL: A Faster and Better Framework for Self-Supervised Speech Representation Learning Paper • 2411.17100 • Published Nov 26, 2024
ZipVoice: Fast and High-Quality Zero-Shot Text-to-Speech with Flow Matching Paper • 2506.13053 • Published Jun 16, 2025
ZipVoice-Dialog: Non-Autoregressive Spoken Dialogue Generation with Flow Matching Paper • 2507.09318 • Published Jul 12, 2025