Update README.md
README.md
CHANGED
@@ -5,4 +5,46 @@ datasets:
 language:
 - en
 pipeline_tag: text-generation
----
+---
+
+### Model Sources
+
+<!-- Provide the basic links for the model. -->
+
+- **Repository:** TBD
+- **Paper:** https://arxiv.org/abs/2309.08351
+
+
+### Model Architecture and Objective
+
+This model is a Pythia-70m architecture trained on OpenWebText-2 using the Contrastive Weight Tying objective.
+
+#### Software
+
+[More Information Needed]
+
+## Citation
+
+**BibTeX:**
+
+```bibtex
+@misc{godey2023headless,
+title={Headless Language Models: Learning without Predicting with Contrastive Weight Tying},
+author={Nathan Godey and Éric de la Clergerie and Benoît Sagot},
+year={2023},
+eprint={2309.08351},
+archivePrefix={arXiv},
+primaryClass={cs.CL}
+}
+```
+
+## Model Card Authors
+
+Nathan Godey
+Eric de la Clergerie
+Benoît Sagot
+
+## Model Card Contact
+
+nathan.godey@inria.fr
+