Jcdbzh9olj

jcdbzh9olj

AI & ML interests

None yet

Recent Activity

liked a dataset about 2 hours ago

dongdong123123/rollout_act_pick_and_place_20260730_120835

liked a dataset 10 days ago

insagur/ndbdcbf91a673

Organizations

None yet

upvoted a paper 6 days ago

Multi-Turn On-Policy Distillation with Prefix Replay

Paper • 2607.04763 • Published 14 days ago • 12

upvoted a paper 23 days ago

CONFLUX: A Latent Diffusion Model for 3D Chest-CT Synthesis with RL Post-Training

Paper • 2607.02998 • Published 23 days ago • 6

upvoted a paper about 1 month ago

Looped World Models

Paper • 2606.18208 • Published Jun 16 • 483

upvoted a paper about 2 months ago

Seeing Isn't Knowing: Do VLMs Know When Not to Answer Spatial Questions (and Why)?

Paper • 2605.30557 • Published May 28 • 12

upvoted 2 papers 2 months ago

OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization

Paper • 2605.21226 • Published May 20 • 9

CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence

Paper • 2605.12882 • Published May 13 • 274

upvoted 3 papers 3 months ago

OpenSeeker-v2: Pushing the Limits of Search Agents with Informative and High-Difficulty Trajectories

Paper • 2605.04036 • Published May 5 • 73

Why Fine-Tuning Encourages Hallucinations and How to Fix It

Paper • 2604.15574 • Published Apr 16 • 26

Diverse Dictionary Learning

Paper • 2604.17568 • Published Apr 19 • 3

upvoted 8 papers 4 months ago

RewardFlow: Generate Images by Optimizing What You Reward

Paper • 2604.08536 • Published Apr 9 • 6

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Paper • 2604.04707 • Published Apr 6 • 203

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

Paper • 2604.02268 • Published Apr 2 • 103

GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning

Paper • 2604.02721 • Published Apr 3 • 639

CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence

Paper • 2603.28032 • Published Mar 30 • 344

ClawKeeper: Comprehensive Safety Protection for OpenClaw Agents Through Skills, Plugins, and Watchers

Paper • 2603.24414 • Published Mar 25 • 183

FIPO: Eliciting Deep Reasoning with Future-KL Influenced Policy Optimization

Paper • 2603.19835 • Published Mar 20 • 353

SQuTR: A Robustness Benchmark for Spoken Query to Text Retrieval under Acoustic Noise

Paper • 2602.12783 • Published Feb 13 • 246

upvoted 3 papers 5 months ago

Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning

Paper • 2603.04597 • Published Mar 4 • 211

A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 526

Does Your Reasoning Model Implicitly Know When to Stop Thinking?

Paper • 2602.08354 • Published Feb 9 • 267