Update README.md
README.md
CHANGED
@@ -18,9 +18,9 @@ Plyx-15M is intended for quick testing, research into data efficiency, and speci
 
 Plyx-15M was trained exclusively on a carefully selected set of premium datasets, prioritizing accuracy and structure.
 
-1. **`fineweb-pro
-2. **`fineweb-edu
-3. **`finepdfs
+1. **`fineweb-pro`** This data is a highly refined subset of general internet content. It was aggressively filtered using advanced, automated tools to remove common errors and noise, giving the model a clean understanding of everyday language.
+2. **`fineweb-edu`** Content focused on education and instruction, providing the model with a solid base in clear, organized knowledge.
+3. **`finepdfs`** Specialized knowledge sourced from millions of professional reports and complex documents (PDFs). This ensures the model is exposed to formal, technical writing styles and organized information structures.
 
 ### Limitations
 