🤏 Smol-Data Collection: Tried and tested mixes for strong pretraining. Inspired by https://huggingface.co/blog/codelion/optimal-dataset-mixing • 14 items • Updated 12 days ago
pplx-embed Collection: Diffusion-Pretrained Dense and Contextual Embeddings • 7 items • Updated 16 days ago
DFlash Collection: Block Diffusion for Flash Speculative Decoding • 11 items • Updated 4 days ago
Cerebras REAP Collection: Sparse MoE models compressed with the REAP (Router-weighted Expert Activation Pruning) method • 30 items • Updated 17 days ago
VTP Collection: Towards Scalable Pre-training of Visual Tokenizers for Generation • 4 items • Updated 29 days ago
Teacher Logits Collection: Logits captured from large models to serve as the teacher for distillation • 3 items • Updated Dec 15, 2025
Ministral 3 Collection: Mistral Ministral 3, new multimodal models in Base, Instruct, and Reasoning variants, available in 3B, 8B, and 14B sizes • 36 items • Updated 3 days ago
Ministral 3 Collection: Edge models with Base, Instruct, and Reasoning variants in three sizes (3B, 8B, and 14B), all with vision capabilities • 9 items • Updated Dec 2, 2025
Trinity Collection: Arcee AI models in the Trinity family • 10 items • Updated 8 days ago
Olmo 3 Pre-training Collection: All artifacts related to Olmo 3 pre-training • 10 items • Updated Dec 23, 2025