basaanithanaveenkumar

naveenkumarbasaanitha

13 8

AI & ML interests

computervision NLP

Recent Activity

upvoted a paper 12 days ago

World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible

upvoted a paper 14 days ago

ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations

upvoted a paper 14 days ago

Self-Distilled Policy Gradient



upvoted a paper 12 days ago

World Tracing: Generative Pixel-Aligned Geometry Beyond the Visible

Paper • 2606.13652 • Published 19 days ago • 15

upvoted 2 papers 14 days ago

ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations

Paper • 2606.11188 • Published 21 days ago • 26

Self-Distilled Policy Gradient

Paper • 2606.04036 • Published 28 days ago • 27

upvoted a paper 3 months ago

FASTER: Rethinking Real-Time Flow VLAs

Paper • 2603.19199 • Published Mar 19 • 60

upvoted an article 5 months ago

Article

SmolVLM - small yet mighty Vision Language Model

andito, merve, mfarre, eliebak, pcuenq • Nov 26, 2024 • 420

upvoted 3 articles 6 months ago

Article

SmolVLA: Efficient Vision-Language-Action Model trained on Lerobot Community Data

danaaubakirova, andito, merve, ariG23498, fracapuano, loubnabnl, pcuenq, mshukor, cadene • Jun 3, 2025 • 356

Article

The Optimal Architecture for Small Language Models

codelion • Dec 26, 2025 • 121

Article

Mixture of Experts Explained

osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.15k

upvoted a paper 6 months ago

Next-Embedding Prediction Makes Strong Vision Learners

Paper • 2512.16922 • Published Dec 18, 2025 • 91

upvoted 2 papers 8 months ago

Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models

Paper • 2509.26628 • Published Sep 30, 2025 • 17

MotionRAG: Motion Retrieval-Augmented Image-to-Video Generation

Paper • 2509.26391 • Published Sep 30, 2025 • 22

upvoted a paper almost 2 years ago

DepthCrafter: Generating Consistent Long Depth Sequences for Open-world Videos

Paper • 2409.02095 • Published Sep 3, 2024 • 38

upvoted a paper over 2 years ago

Video Editing via Factorized Diffusion Distillation

Paper • 2403.09334 • Published Mar 14, 2024 • 22
