Update README.md
README.md
CHANGED
@@ -16,7 +16,7 @@ We built this model to be a small, useful foundation for various tasks. It's a g
 
 ## Pre-training Data
 
-The model was trained on a carefully curated mix of data to build a great foundation:
+The model was trained on a carefully curated mix of data to build a great foundation, trained on approx ~600M tokens:
 
 1. **`fineweb-pro`**: A heavily filtered and refined version of the FineWeb dataset. This provides a strong base in general-purpose language by removing significant noise and low-quality content.
 2. **`fineweb-edu`**: A subset of FineWeb containing educational and instructional content, used to ground the model in well-structured, factual information.