Deepseek Papers Collection Deepseek papers collection • 32 items • Updated about 10 hours ago • 353
MiniMax-M1 Collection MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. • 6 items • Updated Apr 15 • 119
MedGemma Release Collection Collection of Gemma 3 variants for performance on medical text and image comprehension to accelerate building healthcare-based AI applications. • 9 items • Updated Mar 12 • 509
Qwen2.5-Omni Collection End-to-End Omni (text, audio, image, video, and natural speech interaction) model based on Qwen2.5 • 6 items • Updated Mar 2 • 168
Article 🐺🐦‍⬛ LLM Comparison/Test: 25 SOTA LLMs (including QwQ) through 59 MMLU-Pro CS benchmark runs wolfram • Dec 4, 2024 • 80
Article Bridging the Gap Between Physical Numerical Simulations and Machine Learning: Introducing The Well rubenohana • Dec 2, 2024 • 19
Llama-3.1-Nemotron-70B Collection SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 18 days ago • 156
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Paper • 2408.12528 • Published Aug 22, 2024 • 51
Jamba 1.5 Collection The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction-following foundation models • 2 items • Updated Mar 6, 2025 • 87
Article Llama 3.1 - 405B, 70B & 8B with multilinguality and long context philschmid, osanseviero, alvarobartt, lvwerra, dvilasuero, reach-vb, marcsun13, pcuenq • Jul 23, 2024 • 241
VITA: Towards Open-Source Interactive Omni Multimodal LLM Paper • 2408.05211 • Published Aug 9, 2024 • 50