---
base_model: meta-llama/Llama-3.1-8B
license: mit
pipeline_tag: text-generation
tags:
- Llama-3
- finetune
quantized_by: boapro
datasets:
- boapro/W1
- boapro/W2
- boapro/cyber-code
- boapro/Code-Functions
---

## Llamacpp imatrix Quantizations of meta-llama/Llama-3.1-8B

Using llama.cpp release b3878 for quantization.

Original model: https://huggingface.co/meta-llama/Llama-3.1-8B

Run them in [LM Studio](https://lmstudio.ai/)

## Prompt format

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
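For illustration, here is the same template with placeholder messages filled in (the messages are examples, not part of the card):

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

Explain what an imatrix quantization is.<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```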
## Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

```
pip install -U "huggingface_hub[cli]"
```

Then, you can target the specific file you want:
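For example, a single quant can be fetched with a command like the one below. This is a sketch: the repo name is taken from the local-dir note further down, and the exact GGUF filename is an assumption; substitute whichever quant you actually want:

```
# Hypothetical filename: replace with the quant you want from this repo
huggingface-cli download boapro/WRT_II --include "WRT_II-Q4_K_M.gguf" --local-dir ./
```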
If the model is bigger than 50GB, it will have been split into multiple files. In order to download them all to a local folder, run:
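A minimal sketch, assuming the split files are grouped in a subfolder named after the quant (the Q8_0 name is illustrative):

```
# Downloads every shard of the (hypothetical) Q8_0 split into a local folder
huggingface-cli download boapro/WRT_II --include "WRT_II-Q8_0/*" --local-dir WRT_II-Q8_0
```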
You can either specify a new local-dir (boapro/WRT_II) or download them all in place (./)

## Q4_0_X_X

If you're using an ARM chip, the Q4_0_X_X quants will give you a substantial speedup. Check out the Q4_0_4_4 speed comparisons [on the original pull request](https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660)

To check which one would work best for your ARM chip, you can consult [AArch64 SoC features](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) (thanks EloyOn!).
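On Linux, one quick way to see which of the relevant CPU extensions your chip reports is to grep /proc/cpuinfo. The flag names asimddp (dot product), i8mm (int8 matrix multiply), and sve are the standard kernel feature strings; which ones matter for each Q4_0_X_X variant is best confirmed against the feature chart below:

```
# Lists any of the ARM extensions relevant to the Q4_0_X_X quants that the kernel reports
grep -o -E 'asimddp|i8mm|sve' /proc/cpuinfo | sort -u
```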
If you want to get more into the weeds, you can check out this extremely useful feature chart: [llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)