craffel HF Staff committed on
Commit 9dda2ee · verified · 1 parent: 02e3168

Update README.md

Files changed (1)
  1. README.md +12 -2
README.md CHANGED
@@ -1,10 +1,20 @@
 ---
 title: README
-emoji: 🦀
+emoji: 🐛
 colorFrom: purple
 colorTo: yellow
 sdk: static
 pinned: false
 ---
 
-Edit this `README.md` markdown file to author your organization card.
+# The Common Pile
+
+We are a group of researchers working together to collect and curate openly licensed and public domain data for training large language models.
+So far, we have released:
+
+- [The Common Pile v0.1](https://huggingface.co/collections/common-pile/common-pile-v01-raw-data-6826b454a5a6a445d0b51b37), an 8 TB dataset of text from over 30 diverse sources
+- [Comma v0.1-1T](https://huggingface.co/common-pile/comma-v0.1-1t) and [Comma-v0.1-2T](https://huggingface.co/common-pile/comma-v0.1-2t), 7B parameter LLMs trained on text from the Common Pile v0.1
+- The [training dataset](https://huggingface.co/datasets/common-pile/comma_v0.1_training_dataset) used to train the Comma v0.1 models
+- Our [code](https://github.com/r-three/common-pile/) for collecting data from each source
+
+If you're interested in contributing, please [open an issue on GitHub](https://github.com/r-three/common-pile/issues/new)!