- DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset (arXiv:2601.10305, published 13 days ago)
- LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling (arXiv:2511.20785, published Nov 25, 2025)
- OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe (arXiv:2511.16334, published Nov 20, 2025)
- ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder (arXiv:2510.18795, published Oct 21, 2025)
- UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning (arXiv:2510.13515, published Oct 15, 2025)
- LLaVA-OneVision-1.5: Fully Open Framework for Democratized Multimodal Training (arXiv:2509.23661, published Sep 28, 2025)
- LLaVA-OneVision-1.5 collection: https://github.com/EvolvingLMMs-Lab/LLaVA-OneVision-1.5 (9 items, updated Oct 21, 2025)
- LLaVA-OneVision collection: models that handle arbitrary types of visual input (17 items, updated Sep 17, 2025)
- Region-based Cluster Discrimination for Visual Representation Learning (arXiv:2507.20025, published Jul 26, 2025)
- Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs (arXiv:2504.17432, published Apr 24, 2025)
- Decoupled Global-Local Alignment for Improving Compositional Understanding (arXiv:2504.16801, published Apr 23, 2025)
- RealSyn: An Effective and Scalable Multimodal Interleaved Document Transformation Paradigm (arXiv:2502.12513, published Feb 18, 2025)