Devstral 2 Collection A couple of agentic LLMs for software engineering tasks, excelling at using tools to explore codebases, edit multiple files, and power SWE Agents. • 2 items • Updated Mar 2 • 56
Tokenizer Adaptation Collection Collection of research on tokenizers' adaptation to specific domains and/or languages. Special focus on sequence compression directions • 4 items • Updated Jul 6, 2024 • 1
Model Merging Collection Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12, 2024 • 254
💜 Kotlin ML Pack Collection A collection of datasets, fine-tuned models and benchmarks to train your models for perfect Kotlin code generation. • 9 items • Updated Jun 11, 2024 • 27
view article Article Tricks from OpenAI gpt-oss YOU 🫵 can use with transformers +5 ariG23498, sergiopaniego, reach-vb, pcuenq, ArthurZ, SaylorTwift, cyrilvallez • Sep 11, 2025 • 188
TokDrift: When LLM Speaks in Subwords but Code Speaks in Grammar Paper • 2510.14972 • Published Oct 16, 2025 • 35
H-Net Collection The family of hierarchical networks (H-Nets) from https://arxiv.org/abs/2507.07955 • 6 items • Updated Mar 2 • 20
view article Article SmolLM3: smol, multilingual, long-context reasoner +21 eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 780
view article Article Welcome Gemma 2 - Google’s new open LLM +4 philschmid, osanseviero, pcuenq, lewtun, tomaarsen, reach-vb • Jun 27, 2024 • 132
Drag-and-Drop LLMs: Zero-Shot Prompt-to-Weights Paper • 2506.16406 • Published Jun 19, 2025 • 133
view article Article RegMix: Data Mixture as Regression for Language Model Pre-training SivilTaram • Jul 11, 2024 • 16
Spectrum: Targeted Training on Signal to Noise Ratio Paper • 2406.06623 • Published Jun 7, 2024 • 16
view article Article Selective fine-tuning of Language Models with Spectrum anakin87 • Sep 3, 2024 • 36
Can LLMs Generate High-Quality Test Cases for Algorithm Problems? TestCase-Eval: A Systematic Evaluation of Fault Coverage and Exposure Paper • 2506.12278 • Published Jun 13, 2025 • 16
view article Article StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation +7 yuxiang630, cassanof, ganler, YifengDing, StringChaos, harmdevries, lvwerra, arjunguha, lingming • Apr 29, 2024 • 79