- Masked Audio Generation using a Single Non-Autoregressive Transformer (Paper, arXiv:2401.04577, published Jan 9, 2024)
- WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing (Paper, arXiv:2110.13900, published Oct 26, 2021)
- Moshi: a speech-text foundation model for real-time dialogue (Paper, arXiv:2410.00037, published Sep 17, 2024)
- How to train a new language model from scratch using Transformers and Tokenizers (Article, published Feb 14, 2020)