End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Pseudo Whisper Pre-training Paper • 2005.01972 • Published May 5, 2020
USAD 2.0: Scaling Representation Distillation for Universal Audio Understanding Paper • 2606.06444 • Published 25 days ago
MARQUIS: A Three-Stage Pipeline for Video Retrieval-Augmented Generation Paper • 2605.17640 • Published May 17
Principled Context Engineering for RAG: Statistical Guarantees via Conformal Prediction Paper • 2511.17908 • Published Jan 19
Built a small site for tracking speech-to-speech, full-duplex, and audio foundation model work. It covers models, benchmarks, datasets, and some blog posts to organize the landscape in one place. Still early, but sharing in case it is useful: https://www.fullduplex.ai/ If you spot missing entries or mistakes, I would really appreciate corrections.