minpeter's Collections

[Dataset] Pretrain-corpus

Updated Jul 22, 2025

  • PleIAs/common_corpus

    Updated 13 days ago • 69.9k rows • 155k downloads • 400 likes

  • EssentialAI/essential-web-v1.0

    Updated Oct 2, 2025 • 128k downloads • 224 likes

  • HuggingFaceFW/fineweb

    Updated Jul 11, 2025 • 52.5B rows • 934k downloads • 2.8k likes

  • HuggingFaceFW/fineweb-edu

    Updated Jul 11, 2025 • 3.5B rows • 588k downloads • 1.08k likes

  • HuggingFaceFW/fineweb-2

    Updated Oct 27, 2025 • 4.48B rows • 73.8k downloads • 798 likes

  • data-is-better-together/fineweb-c

    Updated Jul 8, 2025 • 88.7k rows • 8.23k downloads • 60 likes

  • allenai/dolmino-mix-1124

    Updated Oct 29, 2025 • 170M rows • 18.2k downloads • 94 likes

  • allenai/dolma

    Updated Apr 17, 2024 • 4.89k downloads • 1.03k likes

  • allenai/olmo-mix-1124

    Updated Aug 19, 2025 • 621M rows • 16.9k downloads • 88 likes

  • mlfoundations/dclm-baseline-1.0

    Updated Jul 22, 2024 • 367k downloads • 269 likes

  • Zyphra/Zyda-2

    Updated Aug 6, 2025 • 100k downloads • 94 likes