Article Unlocking asynchronicity in continuous batching ror, pcuenq, ariG23498 • 8 days ago • 52
Article DeepSeek-V4: a million-token context that agents can actually use burtenshaw • 28 days ago • 46
DFlash Collection Block Diffusion for Flash Speculative Decoding • 21 items • Updated 12 days ago • 117
Article Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 188
Article Accelerate ND-Parallel: A guide to Efficient Multi-GPU Training smohammadi, siro1, winglian, marcsun13, djsaunde • Aug 8, 2025 • 98
Article Gotchas in Tokenizer Behavior Every Developer Should Know qgallouedec • Apr 18, 2025 • 72
Article Training Large Language Models with Interpreter Feedback using WebAssembly axolotl-ai-co • Apr 3, 2025 • 14
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13, 2025 • 101
Llama3-ChatQA-1.5 Collection Llama3-ChatQA-1.5 models excel at conversational question answering (QA) and retrieval-augmented generation (RAG). • 6 items • Updated 2 days ago • 47
Model Merging Collection Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12, 2024 • 254