OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models Paper • 2605.00877 • Published 24 days ago • 15
RefineAnything: Multimodal Region-Specific Refinement for Perfect Local Details Paper • 2604.06870 • Published Apr 8 • 42
LightThinker++: From Reasoning Compression to Memory Management Paper • 2604.03679 • Published Apr 4 • 38
SkillX: Automatically Constructing Skill Knowledge Bases for Agents Paper • 2604.04804 • Published Apr 6 • 35
Boosting Few-shot Action Recognition with Graph-guided Hybrid Matching Paper • 2308.09346 • Published Aug 18, 2023
M2-CLIP: A Multimodal, Multi-task Adapting Framework for Video Action Recognition Paper • 2401.11649 • Published Jan 22, 2024 • 3
Synchronize Feature Extracting and Matching: A Single Branch Framework for 3D Object Tracking Paper • 2308.12549 • Published Aug 24, 2023
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation Paper • 2403.19235 • Published Mar 28, 2024 • 1
Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing Paper • 2410.18756 • Published Oct 24, 2024
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves Paper • 2505.02831 • Published May 5, 2025 • 2
Lumina-mGPT 2.0: Stand-Alone AutoRegressive Image Modeling Paper • 2507.17801 • Published Jul 23, 2025 • 1
TriCLIP-3D: A Unified Parameter-Efficient Framework for Tri-Modal 3D Visual Grounding based on CLIP Paper • 2507.14904 • Published Jul 20, 2025
Deforming Videos to Masks: Flow Matching for Referring Video Segmentation Paper • 2510.06139 • Published Oct 7, 2025 • 3
Distribution Matching Distillation Meets Reinforcement Learning Paper • 2511.13649 • Published Nov 17, 2025 • 6
SRA 2: Variational Autoencoder Self-Representation Alignment for Efficient Diffusion Training Paper • 2601.17830 • Published Jan 25 • 1
SafePred: A Predictive Guardrail for Computer-Using Agents via World Models Paper • 2602.01725 • Published Feb 2 • 1
From Data to Behavior: Predicting Unintended Model Behaviors Before Training Paper • 2602.04735 • Published Feb 4 • 15
VisionTrim: Unified Vision Token Compression for Training-Free MLLM Acceleration Paper • 2601.22674 • Published Jan 30 • 5