Update README.md
README.md CHANGED
@@ -31,7 +31,6 @@ Meticulously curated **pre-training and SFT data** with rigorous filtering and q
 
 **Ultra-Efficient Training Framework**: Complete end-to-end training framework designed for maximum efficiency:
 - $16,000 total budget for full model training on A100 GPUs ($0.6 per GPU-hour)
-- 45% HFU at 8k context length
 - Built on **MegatronLM** with support for **MoE**, **FP8**, and **long sequence parallelization**
 - Optimized codebase for cost-effective scaling
 
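As a quick sanity check on the figures in this hunk (a sketch, not part of the README): the quoted $16,000 budget at $0.6 per A100 GPU-hour implies roughly 26,667 GPU-hours, and HFU (hardware FLOPs utilization) is achieved throughput divided by the accelerator's peak. The A100 peak and the per-GPU achieved throughput below are illustrative assumptions, not numbers from the source.

```python
# Back-of-the-envelope check of the quoted training budget (illustrative only).
TOTAL_BUDGET_USD = 16_000   # total budget quoted in the README
COST_PER_GPU_HOUR = 0.6     # A100 price quoted in the README ($ per GPU-hour)

gpu_hours = TOTAL_BUDGET_USD / COST_PER_GPU_HOUR
print(f"Implied compute budget: {gpu_hours:,.0f} A100 GPU-hours")
# -> Implied compute budget: 26,667 A100 GPU-hours

# HFU compares sustained per-GPU throughput against the accelerator's peak.
# Assumption: A100 BF16 dense tensor-core peak of ~312 TFLOP/s; the achieved
# figure is a hypothetical placeholder chosen to match the ~45% claim.
A100_PEAK_TFLOPS = 312
achieved_tflops = 140
hfu = achieved_tflops / A100_PEAK_TFLOPS
print(f"HFU: {hfu:.0%}")    # -> HFU: 45%
```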