LatentThinkingPKU

community

AI & ML interests

None defined yet.

Recent Activity

DogNeverSleep authored a paper about 5 hours ago

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

DogNeverSleep submitted a paper about 10 hours ago

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

THUdyh authored a paper about 1 month ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

View all activity

authored a paper about 5 hours ago

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

Paper • 2605.10780 • Published 1 day ago • 24

submitted a paper to Daily Papers about 10 hours ago

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

Paper • 2605.10780 • Published 1 day ago • 24

authored a paper about 1 month ago

Video-MME-v2: Towards the Next Stage in Benchmarks for Comprehensive Video Understanding

Paper • 2604.05015 • Published Apr 6 • 235

authored a paper about 1 month ago

Agentic-MME: What Agentic Capability Really Brings to Multimodal Intelligence?

Paper • 2604.03016 • Published Apr 3 • 37

authored a paper about 1 month ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published Apr 6 • 40

authored a paper about 1 month ago

OpenWorldLib: A Unified Codebase and Definition of Advanced World Models

Paper • 2604.04707 • Published Apr 6 • 203

authored a paper about 1 month ago

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Paper • 2603.26653 • Published Mar 27 • 18

submitted a paper to Daily Papers about 1 month ago

PerceptionComp: A Video Benchmark for Complex Perception-Centric Reasoning

Paper • 2603.26653 • Published Mar 27 • 18

authored a paper about 2 months ago

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2603.18118 • Published Mar 18 • 12

submitted a paper to Daily Papers about 2 months ago

Insight-V++: Towards Advanced Long-Chain Visual Reasoning with Multimodal Large Language Models

Paper • 2603.18118 • Published Mar 18 • 12

authored a paper about 2 months ago

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Paper • 2603.15030 • Published Mar 16 • 21

authored a paper about 2 months ago

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Paper • 2603.15030 • Published Mar 16 • 21

submitted a paper to Daily Papers about 2 months ago

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Paper • 2603.15030 • Published Mar 16 • 21

authored 5 papers 3 months ago

BrowseComp-$V^3$: A Visual, Vertical, and Verifiable Benchmark for Multimodal Browsing Agents

Paper • 2602.12876 • Published Feb 13 • 14

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Paper • 2602.04804 • Published Feb 4 • 50

Research on World Models Is Not Merely Injecting World Knowledge into Specific Tasks

Paper • 2602.01630 • Published Feb 2 • 50

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models

Paper • 2601.19267 • Published Jan 27

The Unseen Bias: How Norm Discrepancy in Pre-Norm MLLMs Leads to Visual Information Loss

Paper • 2512.08374 • Published Dec 9, 2025

authored a paper 3 months ago

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Paper • 2602.08439 • Published Feb 9 • 28

submitted a paper to Daily Papers 3 months ago

Demo-ICL: In-Context Learning for Procedural Video Knowledge Acquisition

Paper • 2602.08439 • Published Feb 9 • 28