Mike White

seleven11

AI & ML interests

None yet

Organizations

None yet

upvoted 2 papers about 2 months ago

Mixture-of-Depths Attention

Paper • 2603.15619 • Published Mar 16 • 80

Attention Residuals

Paper • 2603.15031 • Published Mar 16 • 184

liked a dataset 3 months ago

LLM360/guru-RL-92k

Viewer • Updated Aug 20, 2025 • 91.9k • 17.6k • 46

liked 2 datasets 4 months ago

omarkamali/wikipedia-monthly

Viewer • Updated Mar 14 • 195M • 11.3k • 64

BAAI/Infinity-Instruct

Viewer • Updated Dec 4, 2025 • 21.9M • 3.85k • 717

liked a dataset 5 months ago

opencsg/Fineweb-Edu-Chinese-V2.1

Viewer • Updated Jan 28 • 958M • 21.3k • 74

liked 2 datasets 6 months ago

HuggingFaceTB/smollm-corpus

Viewer • Updated Sep 6, 2024 • 237M • 57.2k • 454

Leon-Leee/unofficial-pyedu

Viewer • Updated Mar 12, 2025 • 7.68M • 47 • 4

upvoted an article 7 months ago

SmolLM - blazingly fast and remarkably powerful

Article • loubnabnl, anton-l, eliebak • Jul 16, 2024 • 455

liked a Space 7 months ago

The Smol Training Playbook

📚

3.17k

The secrets to building world-class LLMs

liked 3 datasets 7 months ago

upvoted an article 9 months ago

Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment

Article • NormalUhr • Feb 11, 2025 • 120

liked a Space 11 months ago

Predict Memory

🧮

108

Calculate and visualize memory usage for model training

liked a Space about 1 year ago

The Ultra-Scale Playbook

🌌

3.84k

The ultimate guide to training LLM on large GPU Clusters

liked 2 models over 1 year ago

Qwen/Qwen2-7B-Instruct

Text Generation • 8B • Updated Aug 21, 2024 • 698k • 684

Alibaba-NLP/gte-Qwen2-7B-instruct

liked a model almost 2 years ago

Qwen/Qwen2-72B-Instruct

Text Generation • 73B • Updated Oct 8, 2024 • 39.1k • 718

liked a model over 2 years ago

meta-llama/Llama-2-13b-hf

Text Generation • Updated Apr 17, 2024 • 57.7k • 624