Cerebras REAP Collection Sparse MoE models compressed using the REAP (Router-weighted Expert Activation Pruning) method • 30 items • Updated Feb 25 • 139
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7, 2025 • 155
Jamba 1.5 Collection The AI21 Jamba family of models comprises state-of-the-art, hybrid SSM-Transformer instruction-following foundation models • 2 items • Updated Mar 6, 2025 • 87
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27, 2024 • 629
Article Run the strongest open-source LLM model: Llama3 70B with just a single 4GB GPU! Apr 21, 2024 • 44
Dataset comparison models Collection 1.8B-parameter models trained on 350B tokens to compare different pretraining datasets • 8 items • Updated Jun 12, 2024 • 42
OmniQuant: Omnidirectionally Calibrated Quantization for Large Language Models Paper • 2308.13137 • Published Aug 25, 2023 • 20
GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers Paper • 2210.17323 • Published Oct 31, 2022 • 10
LoRA: Low-Rank Adaptation of Large Language Models Paper • 2106.09685 • Published Jun 17, 2021 • 60
AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning Paper • 2308.03526 • Published Aug 7, 2023 • 29
Retroformer: Retrospective Large Language Agents with Policy Gradient Optimization Paper • 2308.02151 • Published Aug 4, 2023 • 21