Article Jupyter Agents: training LLMs to reason with notebooks +1 baptistecolle, hannayukhymenko, lvwerra • Sep 10, 2025 • 64
Offline Reinforcement Learning as One Big Sequence Modeling Problem Paper • 2106.02039 • Published Jun 3, 2021 • 2
Article Make LLM Fine-tuning 2x faster with Unsloth and 🤗 TRL danielhanchen • Jan 10, 2024 • 76
Vector Quantized Diffusion Model for Text-to-Image Synthesis Paper • 2111.14822 • Published Nov 29, 2021 • 1
NanoBEIR 🍺 Collection A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 27
Article Preference Tuning LLMs with Direct Preference Optimization Methods +3 kashif, edbeeching, lewtun, lvwerra, osanseviero • Jan 18, 2024 • 83
Visual Representation Learning with Stochastic Frame Prediction Paper • 2406.07398 • Published Jun 11, 2024 • 1
Qwen2.5-Coder Collection Code-specific model series based on Qwen2.5 • 38 items • Updated Mar 2 • 368
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization Paper • 2403.17031 • Published Mar 24, 2024 • 7
Article Illustrating Reinforcement Learning from Human Feedback (RLHF) +2 natolambert, LouisCastricato, lvwerra, Dahoas • Dec 9, 2022 • 413