DanielGallagherIRE's Collections

Obfuscated FineWeb Edu

updated 9 days ago

A collection of obfuscated versions of a 20B-token sample of the FineWeb Edu dataset.

  • DanielGallagherIRE/fineweb-edu-20B-obfuscation

    Viewer • Updated 24 days ago • 19.4M • 573

  • DanielGallagherIRE/FineWeb-Edu-20B-E1-POS-Removal

    Viewer • Updated 14 days ago • 9.71M • 683

  • DanielGallagherIRE/FineWeb-Edu-20B-E1-Bag-of-Words

    Preview • Updated 20 days ago • 973

  • DanielGallagherIRE/FineWeb-Edu-20B-E1-Mutual-Information

    Viewer • Updated 9 days ago • 9.71M • 2.36k