Update README.md
README.md
@@ -18,7 +18,7 @@ Plyx-15M is intended for quick testing, research into data efficiency, and speci
 
 Plyx-15M was trained exclusively on a carefully selected set of premium datasets, prioritizing accuracy and structure.
 
-1. **`fineweb-pro`** This data is a highly refined subset of
+1. **`fineweb-pro`** This data is a highly refined subset of FineWeb. It was aggressively filtered using advanced, automated tools to remove common errors and noise, giving the model a clean understanding of everyday language.
 2. **`fineweb-edu`** Content focused on education and instruction, providing the model with a solid base in clear, organized knowledge.
 3. **`finepdfs`** Specialized knowledge sourced from millions of professional reports and complex documents (PDFs). This ensures the model is exposed to formal, technical writing styles and organized information structures.
 