view article Article Beyond LoRA: Can you beat the most popular fine-tuning technique? +2 BenjaminB, sayakpaul, hubnemo, kashif • 11 days ago • 64
Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields Paper • 2606.11042 • Published 20 days ago • 22
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments Paper • 2605.30280 • Published May 28 • 146
Macaron-A2UI: A Model for Generative UI in Personal Agents Paper • 2605.24830 • Published May 24 • 83
Lightning Unified Video Editing via In-Context Sparse Attention Paper • 2605.04569 • Published May 6 • 18
From Context to Skills: Can Language Models Learn from Context Skillfully? Paper • 2604.27660 • Published May 3 • 171
EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model Paper • 2604.10268 • Published Apr 11 • 12
view article Article Introducing NVIDIA Nemotron 3 Nano Omni: Long-Context Multimodal Intelligence for Documents, Audio and Video Agents nvidia • Apr 28 • 62
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published Apr 27 • 119
view article Article Training and Finetuning Multimodal Embedding & Reranker Models with Sentence Transformers tomaarsen • Apr 16 • 72
FP4 Explore, BF16 Train: Diffusion Reinforcement Learning via Efficient Rollout Scaling Paper • 2604.06916 • Published Apr 8 • 34
Omni-SimpleMem: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory Paper • 2604.01007 • Published Apr 2 • 31
view article Article Welcome Gemma 4: Frontier multimodal intelligence on device +5 merve, pcuenq, sergiopaniego, burtenshaw, Steveeeeeeen, alvarobartt, SaylorTwift • Apr 2 • 910
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale Paper • 2603.25040 • Published Mar 26 • 134