SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation Paper • 2305.17011 • Published May 26, 2023
GrootVL: Tree Topology is All You Need in State Space Model Paper • 2406.02395 • Published Jun 4, 2024
COVE: Unleashing the Diffusion Feature Correspondence for Consistent Video Editing Paper • 2406.08850 • Published Jun 13, 2024
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding Paper • 2503.14694 • Published Mar 12, 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO Paper • 2505.13031 • Published May 19, 2025
HaploOmni: Unified Single Transformer for Multimodal Video Understanding and Generation Paper • 2506.02975 • Published Jun 3, 2025
SEED-Story: Multimodal Long Story Generation with Large Language Model Paper • 2407.08683 • Published Jul 11, 2024
Defect Spectrum: A Granular Look of Large-Scale Defect Datasets with Rich Semantics Paper • 2310.17316 • Published Oct 26, 2023
Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation Paper • 2307.08448 • Published Jul 17, 2023