Manan Shah

cs-mshah

https://cs-mshah.github.io/

AI & ML interests

Computer Vision

Recent Activity

authored a paper 3 days ago

MirrorVerse: Pushing Diffusion Models to Realistically Reflect the World

authored a paper 3 days ago

Lift4D: Harmonizing Single-View 3D Estimation for 4D Reconstruction In-the-Wild

upvoted a paper 4 days ago

Lift4D: Harmonizing Single-View 3D Estimation for 4D Reconstruction In-the-Wild


upvoted a paper 4 days ago

Lift4D: Harmonizing Single-View 3D Estimation for 4D Reconstruction In-the-Wild

Paper • 2606.23688 • Published 7 days ago • 4

upvoted a paper 24 days ago

Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs

Paper • 2605.30611 • Published May 28 • 250

upvoted 2 papers about 2 months ago

Leveraging Verifier-Based Reinforcement Learning in Image Editing

Paper • 2604.27505 • Published Apr 30 • 59

Representation Fréchet Loss for Visual Generation

Paper • 2604.28190 • Published Apr 30 • 32

upvoted a paper 2 months ago

Extending One-Step Image Generation from Class Labels to Text via Discriminative Text Representation

Paper • 2604.18168 • Published Apr 20 • 96

upvoted 2 articles 3 months ago

2. Attention Optimizations: From Standard Attention to FlashAttention

Article • atharv6f • Feb 9 • 2

Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries

Article • aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, nouamanetazi, lvwerra, sergiopaniego • Mar 10 • 165

upvoted 2 papers 3 months ago

FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow

Paper • 2603.19598 • Published Mar 20 • 32

WorldCam: Interactive Autoregressive 3D Gaming Worlds with Camera Pose as a Unifying Geometric Representation

Paper • 2603.16871 • Published Mar 17 • 61

upvoted an article 4 months ago

Custom Kernels for All from Codex and Claude

Article • burtenshaw, sayakpaul, ariG23498, evalstate • Feb 13 • 80

upvoted a paper 4 months ago

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

Paper • 2602.12099 • Published Feb 12 • 62

upvoted an article 5 months ago

We Got Claude to Build CUDA Kernels and teach open models!

Article • burtenshaw, evalstate, merve, pcuenq • Jan 28 • 158

upvoted 4 papers 5 months ago

SALAD: Achieve High-Sparsity Attention via Efficient Linear Attention Tuning for Video Diffusion Transformer

Paper • 2601.16515 • Published Jan 23 • 15

Your Group-Relative Advantage Is Biased

Paper • 2601.08521 • Published Jan 13 • 158

PhysRVG: Physics-Aware Unified Reinforcement Learning for Video Generative Models

Paper • 2601.11087 • Published Jan 16 • 11

STEP3-VL-10B Technical Report

Paper • 2601.09668 • Published Jan 14 • 196

upvoted 4 papers 6 months ago

VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice

Paper • 2601.05175 • Published Jan 8 • 37

Watching, Reasoning, and Searching: A Video Deep Research Benchmark on Open Web for Agentic Video Reasoning

Paper • 2601.06943 • Published Jan 11 • 215

Choreographing a World of Dynamic Objects

Paper • 2601.04194 • Published Jan 7 • 14

VINCIE: Unlocking In-context Image Editing from Video

Paper • 2506.10941 • Published Jun 12, 2025 • 5
