DrivingGen: A Comprehensive Benchmark for Generative Video World Models in Autonomous Driving • Paper • arXiv:2601.01528 • Published 22 days ago • 19 upvotes
LTX-2: Efficient Joint Audio-Visual Foundation Model • Paper • arXiv:2601.03233 • Published 20 days ago • 134 upvotes
LoGoPlanner: Localization Grounded Navigation Policy with Metric-aware Visual Geometry • Paper • arXiv:2512.19629 • Published Dec 22, 2025 • 26 upvotes
Towards Scalable Pre-training of Visual Tokenizers for Generation • Paper • arXiv:2512.13687 • Published Dec 15, 2025 • 102 upvotes
Improving Hugging Face Training Efficiency Through Packing with Flash Attention 2 • Article • Published Aug 21, 2024 • 42 upvotes (see the packing sketch after this list)
Openpi Comet: Competition Solution For 2025 BEHAVIOR Challenge • Paper • arXiv:2512.10071 • Published Dec 10, 2025 • 18 upvotes
Are We Ready for RL in Text-to-3D Generation? A Progressive Investigation • Paper • arXiv:2512.10949 • Published Dec 11, 2025 • 46 upvotes
PaperDebugger: A Plugin-Based Multi-Agent System for In-Editor Academic Writing, Review, and Editing • Paper • arXiv:2512.02589 • Published Dec 2, 2025 • 71 upvotes
Reward Forcing: Efficient Streaming Video Generation with Rewarded Distribution Matching Distillation • Paper • arXiv:2512.04678 • Published Dec 4, 2025 • 41 upvotes
Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach • Paper • arXiv:2512.02834 • Published Dec 2, 2025 • 41 upvotes
Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors • Paper • arXiv:2505.24625 • Published May 30, 2025 • 9 upvotes
F1: A Vision-Language-Action Model Bridging Understanding and Generation to Actions • Paper • arXiv:2509.06951 • Published Sep 8, 2025 • 32 upvotes
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control • Paper • arXiv:2508.21112 • Published Aug 28, 2025 • 77 upvotes
EO-Robotics • Collection • EmbodiedOneVision is a unified framework for multimodal embodied reasoning and robot control, featuring interleaved vision-text-action pretraining. • 8 items • Updated Dec 7, 2025 • 8 upvotes
Genie Envisioner: A Unified World Foundation Platform for Robotic Manipulation • Paper • arXiv:2508.05635 • Published Aug 7, 2025 • 73 upvotes
Hume: Introducing System-2 Thinking in Visual-Language-Action Model • Paper • arXiv:2505.21432 • Published May 27, 2025 • 4 upvotes
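
For the packing article listed above, here is a minimal sketch of padding-free sequence packing with Flash Attention 2 using the `DataCollatorWithFlattening` collator from recent `transformers` releases. The model name, dataset, and training hyperparameters are illustrative placeholders, not details taken from the article itself.

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorWithFlattening,
    Trainer,
    TrainingArguments,
)

# Placeholder model: any causal LM with Flash Attention 2 support works.
model_name = "meta-llama/Llama-3.2-1B"

tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize(batch):
    # No padding or truncation here: the collator flattens examples
    # into one packed sequence, so padding would only waste compute.
    return tokenizer(batch["text"])

# Placeholder dataset with a "text" column.
dataset = load_dataset("stas/openwebtext-10k", split="train")
dataset = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    # Packing relies on FA2's variable-length attention kernels.
    attn_implementation="flash_attention_2",
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="packed-run", per_device_train_batch_size=8),
    train_dataset=dataset,
    # Concatenates the examples in each batch and emits position_ids
    # instead of an attention mask; FA2 uses the position resets to keep
    # sequences from attending across their boundaries.
    data_collator=DataCollatorWithFlattening(),
)
trainer.train()
```

The gist of this setup is that a batch becomes a single packed sequence with boundaries tracked via `position_ids` rather than padding tokens and masks, so no FLOPs are spent on pad positions.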