Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation Paper • 2605.04128 • Published 28 days ago • 17
A Self-Evolving Framework for Efficient Terminal Agents via Observational Context Compression Paper • 2604.19572 • Published Apr 21 • 23
Reward Hacking in the Era of Large Models: Mechanisms, Emergent Misalignment, Challenges Paper • 2604.13602 • Published Apr 15 • 32
DeVI: Physics-based Dexterous Human-Object Interaction via Synthetic Video Imitation Paper • 2604.20841 • Published Apr 22 • 24
EditCrafter: Tuning-free High-Resolution Image Editing via Pretrained Diffusion Model Paper • 2604.10268 • Published Apr 11 • 12
TingIS: Real-time Risk Event Discovery from Noisy Customer Incidents at Enterprise Scale Paper • 2604.21889 • Published Apr 23 • 12
view article Article Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective LinkedIn • Jan 27 • 76
view article Article AssetOpsBench: Bridging the Gap Between AI Agent Benchmarks and Industrial Reality ibm-research • Jan 21 • 33
view article Article We Got Claude to Build CUDA Kernels and teach open models! +2 burtenshaw, evalstate, merve, pcuenq • Jan 28 • 156
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published Mar 17, 2025 • 30
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published Mar 16, 2025 • 35
BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing Paper • 2503.13434 • Published Mar 17, 2025 • 28
Personalize Anything for Free with Diffusion Transformer Paper • 2503.12590 • Published Mar 16, 2025 • 44
GEN3C: 3D-Informed World-Consistent Video Generation with Precise Camera Control Paper • 2503.03751 • Published Mar 5, 2025 • 25
OmniHuman-1: Rethinking the Scaling-Up of One-Stage Conditioned Human Animation Models Paper • 2502.01061 • Published Feb 3, 2025 • 225
Gradio WebRTC Cookbook ⚡️ Collection Collection of real-time voice and video demos built with gradio-webrtc custom component • 8 items • Updated Dec 10, 2024 • 19
Shiksha: A Technical Domain focused Translation Dataset and Model for Indian Languages Paper • 2412.09025 • Published Dec 12, 2024 • 4