Hao Fei

scofield7419

http://haofei.vip/

AI & ML interests

Multimodal Learning, Large Language Model, Vision and Language, Natural Language Processing, Structural Modeling

Recent Activity

upvoted a paper 17 days ago

SOAR: Self-Correction for Optimal Alignment and Refinement in Diffusion Models

authored a paper about 2 months ago

Towards Semantic Equivalence of Tokenization in Multimodal LLM

authored a paper about 2 months ago

So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

authored 11 papers about 2 months ago

Towards Semantic Equivalence of Tokenization in Multimodal LLM

Paper • 2406.05127 • Published Jun 7, 2024

So-Fake: Benchmarking and Explaining Social Media Image Forgery Detection

Paper • 2505.18660 • Published May 24, 2025 • 2

Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models

Paper • 2505.24164 • Published May 30, 2025

SMAP: Self-supervised Motion Adaptation for Physically Plausible Humanoid Whole-body Control

Paper • 2505.19463 • Published May 26, 2025

MCM-DPO: Multifaceted Cross-Modal Direct Preference Optimization for Alt-text Generation

Paper • 2510.00647 • Published Oct 1, 2025

DragNeXt: Rethinking Drag-Based Image Editing

Paper • 2506.07611 • Published Jun 9, 2025 • 1

A Reason-then-Describe Instruction Interpreter for Controllable Video Generation

Paper • 2511.20563 • Published Nov 25, 2025 • 1

Visual Thoughts: A Unified Perspective of Understanding Multimodal Chain-of-Thought

Paper • 2505.15510 • Published May 21, 2025

authored a paper 4 months ago

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

Paper • 2512.22905 • Published Dec 28, 2025 • 20

authored 4 papers 6 months ago

UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist

Paper • 2511.08521 • Published Nov 11, 2025 • 39

Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding

Paper • 2509.11866 • Published Sep 15, 2025 • 2

MuSLR: Multimodal Symbolic Logical Reasoning

Paper • 2509.25851 • Published Sep 30, 2025 • 12

VimoRAG: Video-based Retrieval-augmented 3D Motion Generation for Motion Language Models

Paper • 2508.12081 • Published Aug 16, 2025

authored a paper 12 months ago

On Path to Multimodal Generalist: General-Level and General-Bench

Paper • 2505.04620 • Published May 7, 2025 • 83

authored 3 papers about 1 year ago

VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models

Paper • 2504.13122 • Published Apr 17, 2025 • 20

JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization

Paper • 2503.23377 • Published Mar 30, 2025 • 57

Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation

Paper • 2503.24379 • Published Mar 31, 2025 • 76
