---
base_model: meta-llama/Llama-3.1-8B
license: mit
pipeline_tag: text-generation
tags:
- Llama-3
- finetune
quantized_by: boapro
datasets:
- boapro/W1
- boapro/W2
- boapro/cyber-code
- boapro/Code-Functions
---
## Llamacpp imatrix Quantizations of meta-llama/Llama-3.1-8B
Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3878">b3878</a> for quantization.
Original model: https://huggingface.co/meta-llama/Llama-3.1-8B
Run it in [LM Studio](https://lmstudio.ai/)
## Prompt format
```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>
{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>
{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
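As a minimal sketch, the template above can be filled in from the shell before handing it to whichever runtime you use. The variable names and the example prompt text are illustrative assumptions, and the line breaks mirror the template exactly as printed in this card.

```shell
# Placeholder values (assumptions for illustration); replace with your own text.
SYSTEM_PROMPT="You are a helpful assistant."
USER_PROMPT="Summarize what a GGUF file is in one sentence."

# Fill the template as printed above: one newline after each header line.
PROMPT="$(printf '<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n%s<|eot_id|><|start_header_id|>user<|end_header_id|>\n%s<|eot_id|><|start_header_id|>assistant<|end_header_id|>' "$SYSTEM_PROMPT" "$USER_PROMPT")"

# Print the finished prompt so it can be inspected or piped elsewhere.
printf '%s\n' "$PROMPT"
```

Using `printf` with `%s` keeps the prompt text out of the format string, so characters such as `%` or backslashes in your own text are passed through unchanged.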
## Downloading using huggingface-cli
First, make sure you have huggingface-cli installed:
```
pip install -U "huggingface_hub[cli]"
```
Then, you can target the specific file you want:
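A minimal sketch of a single-file download follows. It assumes the repository ID is `boapro/WRT_II` (inferred from the local-dir named in this card) and uses a hypothetical file name, so check the repository's file list for the real names before running it.

```shell
# Assumptions: both values below are placeholders, not confirmed names.
REPO_ID="boapro/WRT_II"            # assumed repository ID
QUANT_FILE="WRT_II-Q4_K_M.gguf"    # hypothetical quant file name

# Wrapped in a function so nothing is downloaded until you call it.
# $1 = repository ID, $2 = file name (or glob) to fetch into the current folder.
download_quant() {
  huggingface-cli download "$1" --include "$2" --local-dir ./
}

# Usage:
#   download_quant "$REPO_ID" "$QUANT_FILE"
```

The `--include` filter restricts the download to the matching file, so the other quantizations in the repository are skipped.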
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
You can either specify a new local-dir (boapro/WRT_II) or download them all in place (./).
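The two options could look like the sketch below. The repository ID and the `WRT_II-Q8_0/*` folder pattern are assumptions for illustration only; substitute the folder that actually holds the split files.

```shell
# $1 = repository ID, $2 = glob matching the split files,
# $3 = target folder (optional; defaults to the current folder).
download_split_quant() {
  huggingface-cli download "$1" --include "$2" --local-dir "${3:-./}"
}

# Usage, into a new local-dir (names are hypothetical):
#   download_split_quant "boapro/WRT_II" "WRT_II-Q8_0/*" boapro/WRT_II
# Usage, in place:
#   download_split_quant "boapro/WRT_II" "WRT_II-Q8_0/*"
```

A glob in `--include` fetches every matching part in one call, which avoids listing each split file by hand.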
## Q4_0_X_X
If you're using an ARM chip, the Q4_0_X_X quants should offer a substantial speedup. Check out the Q4_0_4_4 speed comparisons [on the original pull request](https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660).
To check which one would work best for your ARM chip, you can check [AArch64 SoC features](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) (thanks EloyOn!).
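On an ARM Linux machine you can also read the CPU feature flags directly. The sketch below assumes a mapping that this card does not state: `sve` suggests Q4_0_8_8, `i8mm` suggests Q4_0_4_8, and anything else falls back to Q4_0_4_4. Treat the result as a starting point and confirm it against the linked pull request, since details such as SVE vector length are not checked here.

```shell
# Map a space-separated CPU feature list to a Q4_0_X_X variant.
# The mapping is an assumption, not taken from this card.
suggest_q4_0_variant() {
  case " $1 " in
    *" sve "*)  echo "Q4_0_8_8" ;;
    *" i8mm "*) echo "Q4_0_4_8" ;;
    *)          echo "Q4_0_4_4" ;;
  esac
}

# Only query the hardware on 64-bit ARM Linux, where /proc/cpuinfo
# exposes the flags on its "Features" line.
if [ "$(uname -m)" = "aarch64" ]; then
  FEATURES="$(grep -m1 '^Features' /proc/cpuinfo 2>/dev/null | cut -d: -f2)"
  suggest_q4_0_variant "$FEATURES"
fi
```

The function can also be called with a feature list copied from the SoC table, for example `suggest_q4_0_variant "fp asimd i8mm"`.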
If you want to get more into the weeds, you can check out this extremely useful feature chart:
[llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)