FineWeb-HQ Collection • Collection containing the FineWeb-HQ and FineWeb2-HQ quality-filtered datasets and classifier model weights. • 4 items • Updated 10 days ago
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language Paper • 2506.20920 • Published Jun 26, 2025 • 77
Unpacking SDXL Turbo: Interpreting Text-to-Image Models with Sparse Autoencoders Paper • 2410.22366 • Published Oct 28, 2024 • 84
FineWeb: decanting the web for the finest text data at scale 🍷 Generate a curated web-text dataset for LLM training. • Running • Featured • 1.31k