mimipynb (Mimi)

upvoted a collection 3 months ago

Llama 4

Collection

Llama 4 release • 13 items • Updated Apr 29, 2025 • 739

upvoted an article 8 months ago

Article

Jupyter Agents: training LLMs to reason with notebooks

baptistecolle, hannayukhymenko, lvwerra • Sep 10, 2025 • 66

upvoted a paper 9 months ago

Offline Reinforcement Learning as One Big Sequence Modeling Problem

Paper • 2106.02039 • Published Jun 3, 2021 • 2

upvoted an article 11 months ago

Article

Make LLM Fine-tuning 2x faster with Unsloth and 🤗 TRL

danielhanchen • Jan 10, 2024 • 77

upvoted a paper over 1 year ago

Vector Quantized Diffusion Model for Text-to-Image Synthesis

Paper • 2111.14822 • Published Nov 29, 2021 • 1

upvoted a collection over 1 year ago

NanoBEIR 🍺

Collection

A collection of smaller versions of BEIR datasets with 50 queries and up to 10K documents each. • 13 items • Updated Sep 11, 2024 • 27

upvoted an article over 1 year ago

Article

Preference Tuning LLMs with Direct Preference Optimization Methods

kashif, edbeeching, lewtun, lvwerra, osanseviero • Jan 18, 2024 • 84

upvoted a paper over 1 year ago

Visual Representation Learning with Stochastic Frame Prediction

Paper • 2406.07398 • Published Jun 11, 2024 • 1

upvoted a collection over 1 year ago

Qwen2.5-Coder

Collection

Code-specific model series based on Qwen2.5 • 38 items • Updated Mar 2 • 372

upvoted a paper over 1 year ago

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

Paper • 2403.17031 • Published Mar 24, 2024 • 8

upvoted an article almost 2 years ago

Article

Illustrating Reinforcement Learning from Human Feedback (RLHF)

natolambert, LouisCastricato, lvwerra, Dahoas • Dec 9, 2022 • 418

upvoted an article about 2 years ago

Article

The Technology Behind BLOOM Training

stas • Jul 14, 2022 • 45
