### Key Insights from Evaluation
- **Efficient Training:** The model demonstrates impressive performance relative to its training token count, suggesting an efficient use of resources, aided by training with the Distributed Data Parallel (DDP) technique.
- **Data-Specific Advantage:** Training exclusively on educational data may have given GPT-124M an edge in evaluation metrics like `HellaSwag`.
- **Scaling Considerations:** GPT-3 Small, despite being trained on 300B tokens, does not exhibit proportionally better performance due to scaling limitations.
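The scaling point above can be made concrete with rough arithmetic. This is an illustrative sketch, not a measurement from this repo: the ~20 tokens-per-parameter figure is the commonly cited Chinchilla-optimal ratio, and GPT-3 Small's ~125M parameters trained on 300B tokens are its commonly reported figures.

```python
# Illustrative only: the numbers below are assumptions, not repo measurements.
# Chinchilla suggests roughly 20 training tokens per parameter as
# compute-optimal; GPT-3 Small is commonly reported as ~125M parameters
# trained on 300B tokens.

CHINCHILLA_RATIO = 20.0  # approximate compute-optimal tokens per parameter


def tokens_per_param(n_tokens: float, n_params: float) -> float:
    """Training tokens seen per model parameter."""
    return n_tokens / n_params


ratio = tokens_per_param(300e9, 125e6)  # GPT-3 Small
print(f"GPT-3 Small: {ratio:.0f} tokens/param, "
      f"{ratio / CHINCHILLA_RATIO:.0f}x the Chinchilla-optimal ratio")
# -> GPT-3 Small: 2400 tokens/param, 120x the Chinchilla-optimal ratio
```

Training two orders of magnitude past the compute-optimal ratio is consistent with the diminishing returns noted in the last bullet.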
## Environmental Impact