VibeVoice Collection Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/ • 9 items • Updated 4 days ago • 199
GutenOCR: A Grounded Vision-Language Front-End for Documents Paper • 2601.14490 • Published 5 days ago • 29
Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities Paper • 2503.04721 • Published Mar 6, 2025 • 2
Nemotron Speech Collection Open, state-of-the-art, production‑ready enterprise speech models from the NVIDIA Speech research team for ASR, TTS, Speaker Diarization and S2S • 17 items • Updated 5 days ago • 29
AIBrix: Towards Scalable, Cost-Effective Large Language Model Inference Infrastructure Paper • 2504.03648 • Published Feb 22, 2025 • 1
Article Introducing OptiMind, a research model designed for optimization 10 days ago • 31
Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular Dec 18, 2025 • 116
Nemotron-Cascade Collection Scaling Cascaded Reinforcement Learning for General-Purpose Reasoning Models • 18 items • Updated 5 days ago • 48
TimeBill: Time-Budgeted Inference for Large Language Models Paper • 2512.21859 • Published about 1 month ago • 25
Physics of Language Models: Part 4.1, Architecture Design and the Magic of Canon Layers Paper • 2512.17351 • Published Dec 19, 2025 • 27
Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation Paper • 2512.16913 • Published Dec 18, 2025 • 34