Signed-off-by: peter szemraj <peterszemraj@gmail.com>
README.md CHANGED

@@ -28,7 +28,7 @@ In our country, we say _"To let 100M parameters model generate python script and
 
 ## Base Model Information
 
-The base model, smol_llama-101M-GQA,
+The base model, smol_llama-101M-GQA, has been pre-trained on a relatively small number of high quality tokens (less than ~20B). It has impressive performance despite its compact size of 101M parameters. Training data for this base model included:
 
 - [JeanKaddour/minipile](https://huggingface.co/datasets/JeanKaddour/minipile)
 - [pszemraj/simple_wikipedia_LM](https://huggingface.co/datasets/pszemraj/simple_wikipedia_LM)
|