samkeet committed · Commit 5c2d7dc · verified · 1 Parent(s): be7deb0

Update README.md

Files changed (1): README.md (+3, −6)
README.md CHANGED
@@ -128,12 +128,9 @@ According to Chinchilla’s scaling laws, an optimal token-to-parameter ratio su
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/670142e648894dfbedacacaf/Ne2MYAB2C0yHWFJLjCww3.png)
 
 ### Key Insights from Evaluation
-
-- Efficient Training: The model demonstrates impressive performance relative to its training token count, suggesting an efficient use of resources.
-
-- Data-Specific Advantage: Training exclusively on educational data may have given GPT-124M an edge in evaluation metrics like `HellaSwag`.
-
-- Scaling Considerations: GPT-3 Small, despite being trained on 300B tokens, does not exhibit proportionally better performance due to scaling limitations.
+- **Efficient Training:** The model demonstrates impressive performance relative to its training token count, suggesting efficient use of resources, aided by training with Distributed Data Parallel (DDP).
+- **Data-Specific Advantage:** Training exclusively on educational data may have given GPT-124M an edge in evaluation metrics like `HellaSwag`.
+- **Scaling Considerations:** GPT-3 Small, despite being trained on 300B tokens, does not exhibit proportionally better performance due to scaling limitations.
 
 ## Environmental Impact
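The updated "Efficient Training" bullet credits Distributed Data Parallel (DDP). As a rough illustration of what that wrapper does, here is a minimal sketch of a DDP-wrapped training step in PyTorch; it uses a single-process `gloo` group on CPU purely for demonstration, and the tiny linear model, sizes, and learning rate are placeholders, not the GPT-124M setup described in the README.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process group on CPU purely for illustration; real DDP training
# launches one process per GPU (e.g. via torchrun) with world_size > 1.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = torch.nn.Linear(8, 2)           # placeholder model, not GPT-124M
ddp_model = DDP(model)                  # replicates the model; syncs gradients across ranks
optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.1)

x = torch.randn(4, 8)                   # each rank trains on its own data shard
loss = ddp_model(x).pow(2).mean()       # placeholder loss, not the LM objective
loss.backward()                         # gradients are all-reduced across ranks here
optimizer.step()

dist.destroy_process_group()
```

With more than one process, each rank sees a different shard of the data and `backward()` averages gradients across ranks, so throughput scales with the number of GPUs while the model replicas stay in sync.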