Advait Gupta

advaitgupta

AI & ML interests

None yet

Recent Activity

updated a dataset 8 days ago

AgentVQA/Agent_VQA_Manual

upvoted a paper 4 days ago

Multi-Turn Reflective Masking Elicits Reasoning in Mask Diffusion Models

Paper • 2606.16700 • Published 12 days ago • 13

upvoted a paper 8 days ago

Guava: An Effective and Universal Harness for Embodied Manipulation

Paper • 2606.18363 • Published 11 days ago • 28

upvoted a paper 9 days ago

Self-Evolving Visual Questioner

Paper • 2606.13929 • Published 16 days ago • 15

upvoted a paper 11 days ago

Skip a Layer or Loop It? Learning Program-of-Layers in LLMs

Paper • 2606.06574 • Published 23 days ago • 24

upvoted a paper 4 months ago

What does RL improve for Visual Reasoning? A Frankenstein-Style Analysis

Paper • 2602.12395 • Published Feb 12 • 17

upvoted 6 papers 12 months ago

Skip a Layer or Loop it? Test-Time Depth Adaptation of Pretrained LLMs

Paper • 2507.07996 • Published Jul 10, 2025 • 35

BlenderFusion: 3D-Grounded Visual Editing and Generative Compositing

Paper • 2506.17450 • Published Jun 20, 2025 • 64

Fine-Grained Preference Optimization Improves Spatial Reasoning in VLMs

Paper • 2506.21656 • Published Jun 26, 2025 • 16

LLaVA-Scissor: Token Compression with Semantic Connected Components for Video LLMs

Paper • 2506.21862 • Published Jun 27, 2025 • 36

Where to find Grokking in LLM Pretraining? Monitor Memorization-to-Generalization without Test

Paper • 2506.21551 • Published Jun 26, 2025 • 28

FaSTA^*: Fast-Slow Toolpath Agent with Subroutine Mining for Efficient Multi-turn Image Editing

Paper • 2506.20911 • Published Jun 26, 2025 • 41

upvoted 8 papers about 1 year ago

Wait, We Don't Need to "Wait"! Removing Thinking Tokens Improves Reasoning Efficiency

Paper • 2506.08343 • Published Jun 10, 2025 • 54

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Paper • 2505.09568 • Published May 14, 2025 • 100

WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Paper • 2504.15785 • Published Apr 22, 2025 • 22

ColorBench: Can VLMs See and Understand the Colorful World? A Comprehensive Benchmark for Color Perception, Reasoning, and Robustness

Paper • 2504.10514 • Published Apr 10, 2025 • 48

Towards Visual Text Grounding of Multimodal Large Language Model

Paper • 2504.04974 • Published Apr 7, 2025 • 18

C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing

Paper • 2504.07964 • Published Apr 10, 2025 • 62

Transformers without Normalization

Paper • 2503.10622 • Published Mar 13, 2025 • 172

Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?

Paper • 2504.06514 • Published Apr 9, 2025 • 39

upvoted a paper over 1 year ago

RoboFactory: Exploring Embodied Agent Collaboration with Compositional Constraints

Paper • 2503.16408 • Published Mar 20, 2025 • 42