---
license: llama3.1
base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
pipeline_tag: text-generation
tags:
- text-generation-inference
---
> [!WARNING]
> At the time of this release, llama.cpp did not support the rope scaling required for the full context (the limit was 8192 tokens). This will be updated soon for full 128K functionality.
> Deprecated models still listed do not carry the 128k mark.

> [!NOTE]
> The new releases of llama.cpp and transformers have been applied, and the GGUF was tested:
> [Meta-Llama-3.1-8B-Instruct-128k](https://huggingface.co/3Simplex/Meta-Llama-3.1-8B-Instruct-gguf/blob/main/Meta-Llama-3.1-8B-Instruct-128k-Q4_0.gguf)
> You will need to update llama.cpp and transformers to use the full context.
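
With an up-to-date build, the full window can be requested at load time. Below is a minimal sketch using llama-cpp-python; the package choice, the local file path, and the example messages are assumptions for illustration, not part of this card.

```python
# Minimal sketch, assuming llama-cpp-python built against a llama.cpp
# release that includes Llama 3.1 rope-scaling support.
from llama_cpp import Llama

llm = Llama(
    model_path="Meta-Llama-3.1-8B-Instruct-128k-Q4_0.gguf",  # local path (assumed)
    n_ctx=131072,  # request the full 128k context window
)

out = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])
```
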
## Prompt Template

```
<|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|>
```

```
<|start_header_id|>user<|end_header_id|>

{user_input}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{assistant_response}
```
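
For callers that build the prompt string themselves, the two blocks above concatenate into a single turn. A minimal sketch follows; the helper name and the example strings are illustrative, not part of the model card.

```python
# Minimal sketch of assembling the template above by hand. llama.cpp
# front ends usually apply this template for you; the strings below
# mirror the card's template exactly.
def build_prompt(system_prompt: str, user_input: str) -> str:
    return (
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system_prompt}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_input}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_prompt("You are a helpful assistant.", "Hello!"))
```
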
## 128k Context Length

`"llama.context_length": 131072`