Update README.md
README.md
CHANGED
@@ -1,19 +1,31 @@
 ---
 title: README
-emoji:
-colorFrom:
+emoji: 🌍
+colorFrom: indigo
 colorTo: purple
 sdk: static
-pinned:
+pinned: true
+short_description: Explore Common Crawl's metadata and experimental datasets
 ---
 
 # Common Crawl
 
 Welcome to the Common Crawl Foundation's Hugging Face page!
 
-We
+We aim to provide metadata and experimental versions of our latest data products here.
 
-
+### Useful Links
 
-
-
+- [Common Crawl's official website](https://commoncrawl.org/)
+- [Our existing statistics webpages](https://commoncrawl.github.io/cc-crawl-statistics/) ([GitHub repo](https://github.com/commoncrawl/cc-crawl-statistics))
+- [AWS infrastructure status page](https://status.commoncrawl.org/)
+
+### Datasets
+
+Explore our datasets hosted on Hugging Face:
+
+- [Common Crawl Statistics](https://huggingface.co/datasets/commoncrawl/statistics)
+- [EOT 2024 Host-Level Logs](https://huggingface.co/datasets/commoncrawl/eot2024_hostlevel_logs)
+- [Common Crawl Citations](https://huggingface.co/datasets/commoncrawl/citations)
+
+We look forward to supporting the research and development community with these resources.