Update README.md
README.md
CHANGED
@@ -14,8 +14,8 @@ So far, we have released:
 
 - [The Common Pile v0.1](https://huggingface.co/collections/common-pile/common-pile-v01-raw-data-6826b454a5a6a445d0b51b37), an 8 TB dataset of text from over 30 diverse sources
 - [Comma v0.1-1T](https://huggingface.co/common-pile/comma-v0.1-1t) and [Comma v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t), 7B parameter LLMs trained on text from the Common Pile v0.1
+- Our paper: [The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text](https://huggingface.co/papers/2506.05209)
 - The [training dataset](https://huggingface.co/datasets/common-pile/comma_v0.1_training_dataset) used to train the Comma v0.1 models
 - Our [code](https://github.com/r-three/common-pile/) for collecting data from each source
-- Our paper: [The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text](https://huggingface.co/papers/2506.05209)
 
 If you're interested in contributing, please [open an issue on GitHub](https://github.com/r-three/common-pile/issues/new)!