Commit cceeadf by MultivexAI · verified · 1 Parent(s): c71a696

Update README.md

Files changed (1):
  1. README.md +1 -1
@@ -16,7 +16,7 @@ We built this model to be a small, useful foundation for various tasks. It's a g
 
 ## Pre-training Data
 
-The model was trained on a carefully curated mix of data to build a great foundation:
+The model was trained on a carefully curated mix of data to build a great foundation, trained on approx ~600M tokens:
 
 1. **`fineweb-pro`**: A heavily filtered and refined version of the FineWeb dataset. This provides a strong base in general-purpose language by removing significant noise and low-quality content.
 2. **`fineweb-edu`**: A subset of FineWeb containing educational and instructional content, used to ground the model in well-structured, factual information.