HariharaIII's Collections Papers
VAPO: Efficient and Reliable Reinforcement Learning for Advanced
Reasoning Tasks
Paper
• 2504.05118
• Published • 26
T1: Tool-integrated Self-verification for Test-time Compute Scaling in
Small Language Models
Paper
• 2504.04718
• Published • 43
SynWorld: Virtual Scenario Synthesis for Agentic Action Knowledge
Refinement
Paper
• 2504.03561
• Published • 18
Concept Lancet: Image Editing with Compositional Representation
Transplant
Paper
• 2504.02828
• Published • 16
ShieldAgent: Shielding Agents via Verifiable Safety Policy Reasoning
Paper
• 2503.22738
• Published • 17
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated
Agent-Human Interplay
Paper
• 2504.03601
• Published • 18
LiveVQA: Live Visual Knowledge Seeking
Paper
• 2504.05288
• Published • 15
Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning
(v1)
Paper
• 2504.03151
• Published • 15
Generative Evaluation of Complex Reasoning in Large Language Models
Paper
• 2504.02810
• Published • 14
Tuning-Free Image Editing with Fidelity and Editability via Unified
Latent Diffusion Model
Paper
• 2504.05594
• Published • 11
MedSAM2: Segment Anything in 3D Medical Images and Videos
Paper
• 2504.03600
• Published • 10
DiaTool-DPO: Multi-Turn Direct Preference Optimization for
Tool-Augmented Large Language Models
Paper
• 2504.02882
• Published • 7
Efficient Reinforcement Finetuning via Adaptive Curriculum Learning
Paper
• 2504.05520
• Published • 11
3D Scene Understanding Through Local Random Access Sequence Modeling
Paper
• 2504.03875
• Published • 6
Distillation and Refinement of Reasoning in Small Language Models for
Document Re-ranking
Paper
• 2504.03947
• Published • 4
JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language
Model
Paper
• 2504.03770
• Published • 2
SkillWeaver: Web Agents can Self-Improve by Discovering and Honing
Skills
Paper
• 2504.07079
• Published • 12
Rethinking Reflection in Pre-Training
Paper
• 2504.04022
• Published • 80
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement
Learning on the Base Model
Paper
• 2503.24290
• Published • 62
Understanding R1-Zero-Like Training: A Critical Perspective
Paper
• 2503.20783
• Published • 59
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist
Policy
Paper
• 2503.24388
• Published • 29
Agentic Knowledgeable Self-awareness
Paper
• 2504.03553
• Published • 27
Landscape of Thoughts: Visualizing the Reasoning Process of Large
Language Models
Paper
• 2503.22165
• Published • 28
Agent S2: A Compositional Generalist-Specialist Framework for Computer
Use Agents
Paper
• 2504.00906
• Published • 27
Effectively Controlling Reasoning Models through Thinking Intervention
Paper
• 2503.24370
• Published • 19
Expanding RL with Verifiable Rewards Across Diverse Domains
Paper
• 2503.23829
• Published • 24
Harnessing the Reasoning Economy: A Survey of Efficient Reasoning for
Large Language Models
Paper
• 2503.24377
• Published • 18
ActionStudio: A Lightweight Framework for Data and Training of Large
Action Models
Paper
• 2503.22673
• Published • 12
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for
Zero-Shot Speech Synthesis
Paper
• 2502.18924
• Published • 16
Interpreting Emergent Planning in Model-Free Reinforcement Learning
Paper
• 2504.01871
• Published • 12
START: Self-taught Reasoner with Tools
Paper
• 2503.04625
• Published • 113
Token-Efficient Long Video Understanding for Multimodal LLMs
Paper
• 2503.04130
• Published • 96
SmolDocling: An ultra-compact vision-language model for end-to-end
multi-modal document conversion
Paper
• 2503.11576
• Published • 156
SmolVLM: Redefining small and efficient multimodal models
Paper
• 2504.05299
• Published • 207
UniF^2ace: Fine-grained Face Understanding and Generation
with Unified Multimodal Models
Paper
• 2503.08120
• Published • 31
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Paper
• 2503.12605
• Published • 35
CoRe^2: Collect, Reflect and Refine to Generate Better and Faster
Paper
• 2503.09662
• Published • 33
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Paper
• 2503.10291
• Published • 36
InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity
Paper
• 2503.16418
• Published • 36
Modifying Large Language Model Post-Training for Diverse Creative
Writing
Paper
• 2503.17126
• Published • 36
Think Before Recommend: Unleashing the Latent Reasoning Power for
Sequential Recommendation
Paper
• 2503.22675
• Published • 36
MagicInfinite: Generating Infinite Talking Videos with Your Words and
Voice
Paper
• 2503.05978
• Published • 36
API Agents vs. GUI Agents: Divergence and Convergence
Paper
• 2503.11069
• Published • 36
Cognitive Behaviors that Enable Self-Improving Reasoners, or, Four
Habits of Highly Effective STaRs
Paper
• 2503.01307
• Published • 38
JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play
Visual Games with Keyboards and Mouse
Paper
• 2503.16365
• Published • 41
DeepSolution: Boosting Complex Engineering Solution Design via
Tree-based Exploration and Bi-point Thinking
Paper
• 2502.20730
• Published • 38
Process-based Self-Rewarding Language Models
Paper
• 2503.03746
• Published • 39
A Survey of Efficient Reasoning for Large Reasoning Models: Language,
Multimodality, and Beyond
Paper
• 2503.21614
• Published • 43
EgoLife: Towards Egocentric Life Assistant
Paper
• 2503.03803
• Published • 46