Aditya Kumar Singh

rodo

http://rodosingh.github.io/

AI & ML interests

Multimodal Learning

upvoted a paper 4 months ago

DUET-VLM: Dual stage Unified Efficient Token reduction for VLM Training and Inference

Paper • 2602.18846 • Published Feb 21, 2026 • 5

upvoted 5 papers over 1 year ago

Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers

Paper • 2503.11579 • Published Mar 14, 2025 • 21

Token-Efficient Long Video Understanding for Multimodal LLMs

Paper • 2503.04130 • Published Mar 6, 2025 • 97

VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning

Paper • 2503.13444 • Published Mar 17, 2025 • 20

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Paper • 2503.12937 • Published Mar 17, 2025 • 30

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172

upvoted a collection over 1 year ago

Qwen2.5-Coder

Code-specific model series based on Qwen2.5 • 38 items • Updated Mar 2 • 372