Collections including paper arxiv:2601.03193

- Can Large Language Models Understand Context?
  Paper • 2402.00858 • Published • 23
- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 85
- Self-Rewarding Language Models
  Paper • 2401.10020 • Published • 151
- SemScore: Automated Evaluation of Instruction-Tuned LLMs based on Semantic Textual Similarity
  Paper • 2401.17072 • Published • 25

- DocLLM: A layout-aware generative language model for multimodal document understanding
  Paper • 2401.00908 • Published • 189
- COSMO: COntrastive Streamlined MultimOdal Model with Interleaved Pre-Training
  Paper • 2401.00849 • Published • 17
- LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
  Paper • 2311.05437 • Published • 51
- LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing
  Paper • 2311.00571 • Published • 43

- Guided Self-Evolving LLMs with Minimal Human Supervision
  Paper • 2512.02472 • Published • 52
- DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search
  Paper • 2509.25454 • Published • 141
- Video Reasoning without Training
  Paper • 2510.17045 • Published • 7
- Agent Learning via Early Experience
  Paper • 2510.08558 • Published • 272

- facebook/w2v-bert-2.0
  Feature Extraction • 0.6B params • Updated • 1.68M downloads • 200 likes
- facebook/metaclip-h14-fullcc2.5b
  Zero-Shot Image Classification • 1.0B params • Updated • 7.77k downloads • 49 likes
- openai/clip-vit-large-patch14
  Zero-Shot Image Classification • 0.4B params • Updated • 6.75M downloads • 1.95k likes
- Salesforce/blip-image-captioning-large
  Image-to-Text • 0.5B params • Updated • 1.33M downloads • 1.44k likes

- EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
  Paper • 2402.04252 • Published • 29
- Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models
  Paper • 2402.03749 • Published • 14
- ScreenAI: A Vision-Language Model for UI and Infographics Understanding
  Paper • 2402.04615 • Published • 44
- EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss
  Paper • 2402.05008 • Published • 23

- Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models
  Paper • 2512.24618 • Published • 138
- Let It Flow: Agentic Crafting on Rock and Roll, Building the ROME Model within an Open Agentic Learning Ecosystem
  Paper • 2512.24873 • Published • 101
- AI Meets Brain: Memory Systems from Cognitive Neuroscience to Autonomous Agents
  Paper • 2512.23343 • Published • 27
- Scaling Open-Ended Reasoning to Predict the Future
  Paper • 2512.25070 • Published • 15

- MIRA: Multimodal Iterative Reasoning Agent for Image Editing
  Paper • 2511.21087 • Published • 10
- Nested Browser-Use Learning for Agentic Information Seeking
  Paper • 2512.23647 • Published • 17
- KV-Embedding: Training-free Text Embedding via Internal KV Re-routing in Decoder-only LLMs
  Paper • 2601.01046 • Published • 12
- UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision
  Paper • 2601.03193 • Published • 44