Adjust title for consistency
README.md
CHANGED
@@ -9,7 +9,7 @@ license: llama3
---
<a href="https://www.gradient.ai" target="_blank"><img src="https://cdn-uploads.huggingface.co/production/uploads/655bb613e8a8971e89944f3e/TSa3V8YpoVagnTYgxiLaO.png" width="200"/></a>
-# Llama-3 70B Gradient
Gradient incorporates your data to deploy autonomous assistants that power critical operations across your business. If you're looking to build custom AI models or agents, send us a message at contact@gradient.ai.
@@ -17,7 +17,6 @@ For more info see our [End-to-end development service for custom LLMs and AI sys
This model extends Llama-3 70B's context length from 8k to > 1048K. It was developed by Gradient, with compute sponsored by [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training by appropriately adjusting RoPE theta. We trained on 34M tokens for this stage, and ~430M tokens total for all stages, which is < 0.003% of Llama-3's original pre-training data.
-
**Approach:**
---
<a href="https://www.gradient.ai" target="_blank"><img src="https://cdn-uploads.huggingface.co/production/uploads/655bb613e8a8971e89944f3e/TSa3V8YpoVagnTYgxiLaO.png" width="200"/></a>
+# Llama-3 70B Instruct Gradient 1048K
Gradient incorporates your data to deploy autonomous assistants that power critical operations across your business. If you're looking to build custom AI models or agents, send us a message at contact@gradient.ai.
This model extends Llama-3 70B's context length from 8k to > 1048K. It was developed by Gradient, with compute sponsored by [Crusoe Energy](https://huggingface.co/crusoeai). It demonstrates that SOTA LLMs can learn to operate on long context with minimal training by appropriately adjusting RoPE theta. We trained on 34M tokens for this stage, and ~430M tokens total for all stages, which is < 0.003% of Llama-3's original pre-training data.
**Approach:**
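Reviewer note: the README context line in this diff states that ~430M training tokens is < 0.003% of Llama-3's original pre-training data. The sketch below is a rough sanity check of that arithmetic. It assumes a pre-training corpus of about 15 trillion tokens, a figure from Meta's public Llama-3 announcement that is not stated in this README. The constant and function names are hypothetical and used only for illustration.

```python
# Rough sanity check of the token-budget claim quoted in the README.
# ASSUMPTION: Llama-3 was pre-trained on roughly 15 trillion tokens
# (publicly announced figure; it does not appear in this README).
PRETRAINING_TOKENS = 15e12

# Figures taken from the README text in the diff.
STAGE_TOKENS = 34e6    # tokens used for this training stage
TOTAL_TOKENS = 430e6   # approximate tokens across all stages


def percent_of_pretraining(tokens: float,
                           pretraining_tokens: float = PRETRAINING_TOKENS) -> float:
    """Return `tokens` as a percentage of the assumed pre-training corpus."""
    return 100.0 * tokens / pretraining_tokens


stage_pct = percent_of_pretraining(STAGE_TOKENS)
total_pct = percent_of_pretraining(TOTAL_TOKENS)

print(f"this stage: {stage_pct:.6f}% of pre-training tokens")
print(f"all stages: {total_pct:.6f}% of pre-training tokens")
print("claim '< 0.003%' holds:", total_pct < 0.003)
```

Under that assumption the total comes to about 0.0029%, which is consistent with the "< 0.003%" wording. A different corpus size would shift the result proportionally.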