19 49 63

xiangan

https://anxiangsir.github.io/

anxiangsir

AI & ML interests

None yet

Recent Activity

updated a model 1 day ago

lmms-lab-encoder/onevision-encoder-large-lang-tf57

updated a model 1 day ago

lmms-lab-encoder/onevision-encoder-large-tf57

updated a collection 2 days ago

onevision-encoder

View all activity

Organizations

upvoted a collection 2 days ago

LLaVA-OneVision-2

Collection

1 item • Updated 2 days ago • 1

upvoted a paper about 1 month ago

FileGram: Grounding Agent Personalization in File-System Behavioral Traces

Paper • 2604.04901 • Published Apr 6 • 40

upvoted 2 papers 2 months ago

LLaDA-o: An Effective and Length-Adaptive Omni Diffusion Model

Paper • 2603.01068 • Published Mar 1 • 22

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Paper • 2510.14979 • Published Oct 16, 2025 • 69

upvoted an article 2 months ago

Article

NEO-unify: Building Native Multimodal Unified Models End to End

Mar 5

•

155

upvoted a paper 2 months ago

UniG2U-Bench: Do Unified Models Advance Multimodal Understanding?

Paper • 2603.03241 • Published Mar 3 • 87

upvoted a changelog 2 months ago

Hugging Face Changelog

Public Storage Add-ons

Feb 26

• 169

upvoted a collection 3 months ago

onevision-encoder

Collection

4 items • Updated 2 days ago • 6

upvoted 5 papers 3 months ago

UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

Paper • 2602.12279 • Published Feb 12 • 20

CoPE-VideoLM: Codec Primitives For Efficient Video Language Models

Paper • 2602.13191 • Published Feb 13 • 31

OneVision-Encoder: Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence

Paper • 2602.08683 • Published Feb 9 • 52

GigaBrain-0.5M*: a VLA That Learns From World Model-Based Reinforcement Learning

Paper • 2602.12099 • Published Feb 12 • 62

Innovator-VL: A Multimodal Large Language Model for Scientific Discovery

Paper • 2601.19325 • Published Jan 27 • 81

upvoted 2 papers 4 months ago

Molmo2: Open Weights and Data for Vision-Language Models with Video Understanding and Grounding

Paper • 2601.10611 • Published Jan 15 • 34

DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset

Paper • 2601.10305 • Published Jan 15 • 36

upvoted a collection 4 months ago

OneVision-Encoder

Collection

HEVC-Style Vision Transformer • 2 items • Updated Feb 10 • 3

upvoted an article 5 months ago

Article

Transformers v5: Simple model definitions powering the AI ecosystem

Dec 1, 2025

•

310

upvoted 2 papers 5 months ago

SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead

Paper • 2512.00903 • Published Nov 30, 2025 • 7

LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling

Paper • 2511.20785 • Published Nov 25, 2025 • 188

upvoted a paper 6 months ago

OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe

Paper • 2511.16334 • Published Nov 20, 2025 • 96

xiangan

AI & ML interests

Recent Activity

Organizations

xiangan's activity

NEO-unify: Building Native Multimodal Unified Models End to End

Public Storage Add-ons

Transformers v5: Simple model definitions powering the AI ecosystem