kaizuberbuehler's Collections: Foundation Models
OLMo: Accelerating the Science of Language Models
Paper
• 2402.00838
• Published
• 85
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Paper
• 2403.05530
• Published
• 65
StarCoder: may the source be with you!
Paper
• 2305.06161
• Published
• 33
SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
Paper
• 2312.15166
• Published
• 61
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
Paper
• 2404.12387
• Published
• 40
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Paper
• 2404.07839
• Published
• 48
JetMoE: Reaching Llama2 Performance with 0.1M Dollars
Paper
• 2404.07413
• Published
• 38
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Paper
• 2404.06512
• Published
• 30
Eagle and Finch: RWKV with Matrix-Valued States and Dynamic Recurrence
Paper
• 2404.05892
• Published
• 40
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Paper
• 2404.06395
• Published
• 24
YaART: Yet Another ART Rendering Technology
Paper
• 2404.05666
• Published
• 18
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Paper
• 2404.03413
• Published
• 27
Advancing LLM Reasoning Generalists with Preference Trees
Paper
• 2404.02078
• Published
• 46
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper
• 2404.14219
• Published
• 259
CogVLM: Visual Expert for Pretrained Language Models
Paper
• 2311.03079
• Published
• 27
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
Paper
• 2404.14619
• Published
• 126
Pegasus-v1 Technical Report
Paper
• 2404.14687
• Published
• 33
Jamba: A Hybrid Transformer-Mamba Language Model
Paper
• 2403.19887
• Published
• 112
Tele-FLM Technical Report
Paper
• 2404.16645
• Published
• 18
What matters when building vision-language models?
Paper
• 2405.02246
• Published
• 103
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Paper
• 2405.12107
• Published
• 29
Paper
• 2406.09414
• Published
• 103
OpenVLA: An Open-Source Vision-Language-Action Model
Paper
• 2406.09246
• Published
• 43
Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering
Paper
• 2406.10208
• Published
• 22
GEB-1.3B: Open Lightweight Large Language Model
Paper
• 2406.09900
• Published
• 21
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper
• 2406.11931
• Published
• 69
The Llama 3 Herd of Models
Paper
• 2407.21783
• Published
• 117
SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement
Paper
• 2408.00653
• Published
• 31
Gemma 2: Improving Open Language Models at a Practical Size
Paper
• 2408.00118
• Published
• 78
Paper
• 2408.07009
• Published
• 62
Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Paper
• 2408.12570
• Published
• 32
OLMoE: Open Mixture-of-Experts Language Models
Paper
• 2409.02060
• Published
• 80
Paper
• 2409.00587
• Published
• 33
Qwen2.5-Coder Technical Report
Paper
• 2409.12186
• Published
• 153
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
Paper
• 2409.12191
• Published
• 79
NVLM: Open Frontier-Class Multimodal LLMs
Paper
• 2409.11402
• Published
• 74
Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Multimodal Models
Paper
• 2409.17146
• Published
• 121
Making Text Embedders Few-Shot Learners
Paper
• 2409.15700
• Published
• 29
EuroLLM: Multilingual Language Models for Europe
Paper
• 2409.16235
• Published
• 29
stabilityai/stable-diffusion-3.5-large
Text-to-Image
• Updated
• 60.4k
• 3.4k
Paper
• 2412.16720
• Published
• 37
NVILA: Efficient Frontier Visual Language Models
Paper
• 2412.04468
• Published
• 60
PaliGemma 2: A Family of Versatile VLMs for Transfer
Paper
• 2412.03555
• Published
• 133
Open-Sora Plan: Open-Source Large Video Generation Model
Paper
• 2412.00131
• Published
• 33
TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper
• 2411.15124
• Published
• 67
Cosmos World Foundation Model Platform for Physical AI
Paper
• 2501.03575
• Published
• 82
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper
• 2501.08313
• Published
• 300
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
Paper
• 2501.06282
• Published
• 53
Text-to-Speech
• Updated
• 8.75M
• 5.82k
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding
Paper
• 2501.13106
• Published
• 90
Qwen2.5-1M Technical Report
Paper
• 2501.15383
• Published
• 72
Baichuan-Omni-1.5 Technical Report
Paper
• 2501.15368
• Published
• 60
Atla Selene Mini: A General Purpose Evaluation Model
Paper
• 2501.17195
• Published
• 35
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper
• 2502.02737
• Published
• 256
Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment
Paper
• 2502.04328
• Published
• 29
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Paper
• 2502.06781
• Published
• 58
NatureLM: Deciphering the Language of Nature for Scientific Discovery
Paper
• 2502.07527
• Published
• 20
CoSER: Coordinating LLM-Based Persona Simulation of Established Roles
Paper
• 2502.09082
• Published
• 32
mmE5: Improving Multimodal Multilingual Embeddings via High-quality Synthetic Data
Paper
• 2502.08468
• Published
• 16
Qwen2.5-VL Technical Report
Paper
• 2502.13923
• Published
• 215
Magma: A Foundation Model for Multimodal AI Agents
Paper
• 2502.13130
• Published
• 58
YuE: Scaling Open Foundation Models for Long-Form Music Generation
Paper
• 2503.08638
• Published
• 72
Gemini Embedding: Generalizable Embeddings from Gemini
Paper
• 2503.07891
• Published
• 46
Light-R1: Curriculum SFT, DPO and RL for Long COT from Scratch and Beyond
Paper
• 2503.10460
• Published
• 30
RWKV-7 "Goose" with Expressive Dynamic State Evolution
Paper
• 2503.14456
• Published
• 153
Qwen2.5-Omni Technical Report
Paper
• 2503.20215
• Published
• 170
Wan: Open and Advanced Large-Scale Video Generative Models
Paper
• 2503.20314
• Published
• 60
Paper
• 2503.19786
• Published
• 55
Command A: An Enterprise-Ready Large Language Model
Paper
• 2504.00698
• Published
• 29
SmolVLM: Redefining small and efficient multimodal models
Paper
• 2504.05299
• Published
• 205
Paper
• 2504.07491
• Published
• 137
Skywork R1V: Pioneering Multimodal Reasoning with Chain-of-Thought
Paper
• 2504.05599
• Published
• 85
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Paper
• 2504.08685
• Published
• 130
BitNet b1.58 2B4T Technical Report
Paper
• 2504.12285
• Published
• 83
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
Paper
• 2504.13180
• Published
• 20
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Paper
• 2503.14734
• Published
• 6