FineWeb: decanting the web for the finest text data at scale 🍷 • Running • Featured • 1.3k • Generate a curated web-text dataset for LLM training
Pdf To Structured Data 🌍 • Paused • Featured • 134 • PDF to Structured Data powered by Google DeepMind Gemini 2.0
distilbert/distilbert-base-uncased-finetuned-sst-2-english Text Classification • 67M • Updated Dec 19, 2023 • 3.77M • 880
sentence-transformers/paraphrase-multilingual-mpnet-base-v2 Sentence Similarity • 0.3B • Updated Aug 19, 2025 • 5.7M • 452
sentence-transformers/all-MiniLM-L6-v2 Sentence Similarity • 22.7M • Updated Mar 6, 2025 • 173M • 4.51k