Gaurang Bharti

gbharti

https://gaurangbharti.netlify.app/

AI & ML interests

GPTs, Computer Vision, NLP

Recent Activity

upvoted 3 papers 5 days ago

Qwen-Image-2.0-RL Technical Report

Paper • 2606.27608 • Published 10 days ago • 48

Translation as a Bridging Action: Transferring Manipulation Skills from Humans to Robots

Paper • 2606.28133 • Published 9 days ago • 39

PhysisForcing: Physics Reinforced World Simulator for Robotic Manipulation

Paper • 2606.28128 • Published 9 days ago • 50

upvoted a paper 8 days ago

Wan-Streamer v0.1: End-to-end Real-time Interactive Foundation Models

Paper • 2606.25041 • Published 12 days ago • 115

upvoted a paper 9 months ago

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

Paper • 2510.10689 • Published Oct 12, 2025 • 46

upvoted 6 papers about 1 year ago

VideoPrism: A Foundational Visual Encoder for Video Understanding

Paper • 2402.13217 • Published Feb 20, 2024 • 41

Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations

Paper • 2506.18898 • Published Jun 23, 2025 • 35

Towards Understanding Camera Motions in Any Video

Paper • 2504.15376 • Published Apr 21, 2025 • 157

NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks

Paper • 2504.19854 • Published Apr 28, 2025 • 7

TesserAct: Learning 4D Embodied World Models

Paper • 2504.20995 • Published Apr 29, 2025 • 22

The Leaderboard Illusion

Paper • 2504.20879 • Published Apr 29, 2025 • 71

upvoted 3 papers almost 2 years ago

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 52

FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance

Paper • 2408.08189 • Published Aug 15, 2024 • 17

MusiConGen: Rhythm and Chord Control for Transformer-Based Text-to-Music Generation

Paper • 2407.15060 • Published Jul 21, 2024 • 9

upvoted a collection about 2 years ago

VILA: On Pre-training for Visual Language Models

10 items • Updated Mar 10 • 58

upvoted 5 papers over 2 years ago

YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information

Paper • 2402.13616 • Published Feb 21, 2024 • 49

LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens

Paper • 2402.13753 • Published Feb 21, 2024 • 116

MusicRL: Aligning Music Generation to Human Preferences

Paper • 2402.04229 • Published Feb 6, 2024 • 17

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8, 2024 • 162

EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision

Paper • 2311.02077 • Published Nov 3, 2023 • 15