Introspective Diffusion Language Models (I-DLM) Collection Model checkpoints for I-DLM. Paper: https://arxiv.org/abs/2604.11035 • 3 items • Updated Apr 14 • 11
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 15 items • Updated 13 days ago • 169
pplx-embed Collection Diffusion-Pretrained Dense and Contextual Embeddings • 10 items • Updated 30 days ago • 100
Article The Optimal Architecture for Small Language Models • codelion • Dec 26, 2025 • 121
Article Tokenization in Transformers v5: Simpler, Clearer, and More Modular • itazap, ariG23498, ArthurZ, sergiopaniego, merve, pcuenq • Dec 18, 2025 • 125
Qwen-Image-Layered: Towards Inherent Editability via Layer Decomposition Paper • 2512.15603 • Published Dec 17, 2025 • 71
Article The Open Evaluation Standard: Benchmarking NVIDIA Nemotron 3 Nano with NeMo Evaluator • nvidia • Dec 17, 2025 • 50
Article Provence: efficient and robust context pruning for retrieval-augmented generation • nadiinchi • Jan 28, 2025 • 26
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published Oct 16, 2025 • 129
Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention Paper • 2510.04212 • Published Oct 5, 2025 • 26
Reactive Transformer (RxT) -- Stateful Real-Time Processing for Event-Driven Reactive Language Models Paper • 2510.03561 • Published Oct 3, 2025 • 25
SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights Paper • 2509.22944 • Published Sep 26, 2025 • 80
Article Introducing Pivotal Token Search (PTS): Targeting Critical Decision Points in LLM Training • codelion • May 17, 2025 • 12
Article Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders • thomwolf, matthieu-lapeyre • Jul 9, 2025 • 803
Article SmolLM3: smol, multilingual, long-context reasoner • eliebak, cmpatino, anton-l, edbeeching, m-ric, nouamanetazi, akseljoonas, guipenedo, hynky, clefourrier, SaylorTwift, kashif, qgallouedec, hlarcher, glutamatt, Xenova, reach-vb, ngxson, craffel, lewtun, loubnabnl, lvwerra, thomwolf • Jul 8, 2025 • 780
Article (LoRA) Fine-Tuning FLUX.1-dev on Consumer Hardware • derekl35, marcsun13, sayakpaul, merve, linoyts • Jun 19, 2025 • 106
Article Gemma 3n fully available in the open-source ecosystem! • ariG23498, pcuenq, sergiopaniego, reach-vb, FL33TW00D-HF, Xenova, Steveeeeeeen, kashif • Jun 26, 2025 • 121