BramVanroy/CommonCrawl-CreativeCommons
Raw CommonCrawl crawls, annotated with Creative Commons license information
Note: Retains only samples that are also present in FineWeb or FineWeb-2.
Note: Applies strong filters: retains only FineWeb data, removes data under non-commercial licenses, and removes Wiki data.
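The three filters named in the note above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the field names `url`, `license_abbr`, and `in_fineweb`, the function name `strong_filter`, and the matching rules (an `nc` license component, a hostname containing `wiki`) are hypothetical and are not confirmed to match the dataset's actual schema or the author's filtering code.

```python
from typing import Iterable, Iterator
from urllib.parse import urlparse


def strong_filter(records: Iterable[dict]) -> Iterator[dict]:
    """Yield only records that pass all three (hypothetical) filters."""
    for rec in records:
        # 1. Keep only samples flagged as present in FineWeb (assumed field).
        if not rec.get("in_fineweb", False):
            continue
        # 2. Drop non-commercial licenses, e.g. "cc-by-nc" (assumed field and format).
        if "nc" in rec.get("license_abbr", "").lower().split("-"):
            continue
        # 3. Drop Wiki data, approximated here as any hostname containing "wiki".
        hostname = urlparse(rec.get("url", "")).hostname or ""
        if "wiki" in hostname.lower():
            continue
        yield rec


# Toy records, invented purely to exercise the filter.
sample = [
    {"url": "https://example.org/a", "license_abbr": "cc-by", "in_fineweb": True},
    {"url": "https://example.org/b", "license_abbr": "cc-by-nc", "in_fineweb": True},
    {"url": "https://en.wikipedia.org/wiki/X", "license_abbr": "cc-by-sa", "in_fineweb": True},
    {"url": "https://example.org/c", "license_abbr": "cc-by", "in_fineweb": False},
]
kept = list(strong_filter(sample))
kept_urls = [rec["url"] for rec in kept]
```

With these toy records, only the first one passes all three checks; the others are dropped for a non-commercial license, a Wiki hostname, and absence from FineWeb, respectively.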