Full Attention Strikes Back: Transferring Full Attention into Sparse within Hundred Training Steps Paper • 2605.16928 • Published 7 days ago • 80
Mix-Quant: Quantized Prefilling, Precise Decoding for Agentic LLMs Paper • 2605.20315 • Published 4 days ago • 27
Gated DeltaNet-2: Decoupling Erase and Write in Linear Attention Paper • 2605.22791 • Published 2 days ago • 16
Toto 2.0: Time Series Forecasting Enters the Scaling Era Paper • 2605.20119 • Published 4 days ago • 35
TransitLM: A Large-Scale Dataset and Benchmark for Map-Free Transit Route Generation Paper • 2605.22355 • Published 2 days ago • 164
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 3 days ago • 125
LatentOmni: Rethinking Omni-Modal Understanding via Unified Audio-Visual Latent Reasoning Paper • 2605.22012 • Published 2 days ago • 35
π-Bench: Evaluating Proactive Personal Assistant Agents in Long-Horizon Workflows Paper • 2605.14678 • Published 4 days ago • 86
Article LeRobot Humanoid: An Open, Low-Cost, 3D-Printed Humanoid for Robot Learning VirgileBatto • 1 day ago • 18
UniT: Unified Geometry Learning with Group Autoregressive Transformer Paper • 2605.21131 • Published 3 days ago • 5
Mega-ASR: Towards In-the-wild^2 Speech Recognition via Scaling up Real-world Acoustic Simulation Paper • 2605.19833 • Published 4 days ago • 125
IndusAgent: Reinforcing Open-Vocabulary Industrial Anomaly Detection with Agentic Tools Paper • 2605.20682 • Published 3 days ago • 50
MambaVision Collection MambaVision: A Hybrid Mamba-Transformer Vision Backbone. Includes both 1K and 21K pretrained models. • 12 items • Updated 3 days ago • 37
BigVGAN Collection BigVGAN is a universal neural vocoder that generates audio waveforms using mel spectrograms as input. • 11 items • Updated 3 days ago • 19
Nemotron 3 8B Collection The Nemotron 3 8B Family of models is optimized for building production-ready generative AI applications for the enterprise. • 5 items • Updated 3 days ago • 54
SSMs Collection A collection of Mamba-2-based research models with 8B parameters trained on 3.5T tokens for comparison with Transformers. • 5 items • Updated 3 days ago • 31
Llama3-ChatQA-1.5 Collection Llama3-ChatQA-1.5 models excel at conversational question answering (QA) and retrieval-augmented generation (RAG). • 6 items • Updated 3 days ago • 47
NV-Embed Collection NV-Embed is a generalist embedding model encompassing retrieval, reranking, classification, clustering, and STS tasks. • 3 items • Updated 3 days ago • 19