Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries +6 2 days ago • 22
🤏 Smol-Data Collection Tried and tested mixes for strong pretraining. Inspired by https://huggingface.co/blog/codelion/optimal-dataset-mixing • 14 items • Updated 10 days ago • 12
Finance Commons Collection A large collection of multimodal financial documents in open data. • 7 items • Updated Jul 17, 2024 • 13
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 7 items • Updated 14 days ago • 87
Article Did GPT 5.2 make a breakthrough discovery in theoretical physics? 20 days ago • 60
Article Follow the White Rabbit: Using Embeddings So You Never Get Lost in Translation 17 days ago • 8
Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 20 days ago • 480
Article Compute and Competition in AI: Different FlOPs for Different Folks 28 days ago • 12
Olmix: A Framework for Data Mixing Throughout LM Development Paper • 2602.12237 • Published 27 days ago • 2
Article Building a Mood-Based Movie Recommendation Engine with Voyage-4-nano, Hugging Face, and MongoDB Atlas Vector Search Feb 8 • 4
Article Introducing Daggr: Chain apps programmatically, inspect visually +3 Jan 29 • 103
compar:IA: The French Government's LLM arena to collect French-language human prompts and preference data Paper • 2602.06669 • Published Feb 6 • 7