---
base_model: meta-llama/Llama-3.1-8B
license: mit
pipeline_tag: text-generation
tags:
- Llama-3
- finetune
quantized_by: boapro
datasets:
- boapro/W1
- boapro/W2
- boapro/cyber-code
- boapro/Code-Functions
---

## Llamacpp imatrix Quantizations of meta-llama/Llama-3.1-8B

Using <a href="https://github.com/ggerganov/llama.cpp/">llama.cpp</a> release <a href="https://github.com/ggerganov/llama.cpp/releases/tag/b3878">b3878</a> for quantization.

Original model: https://huggingface.co/meta-llama/Llama-3.1-8B
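
For reference, imatrix quants like these are typically produced with that release's `llama-imatrix` and `llama-quantize` tools. A minimal sketch, assuming an F16 GGUF conversion of the base model and a calibration text file (the file names are illustrative, not necessarily the exact commands used for this repo):

```
# Build an importance matrix from calibration text, then quantize with it.
./llama-imatrix -m Llama-3.1-8B-F16.gguf -f calibration.txt -o imatrix.dat
./llama-quantize --imatrix imatrix.dat Llama-3.1-8B-F16.gguf Llama-3.1-8B-Q4_K_M.gguf Q4_K_M
```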

Run them in [LM Studio](https://lmstudio.ai/)

## Prompt format

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system_prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
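
To run a single completion against this template with llama.cpp's `llama-cli`, you can fill in the placeholders yourself. A minimal sketch, assuming a bash shell and an illustrative GGUF file name (bash `$'...'` quoting turns the `\n` escapes into real newlines):

```
./llama-cli -m WRT_II-Q4_K_M.gguf \
  -p $'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>\n\nHello!<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n'
```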

## Downloading using huggingface-cli

First, make sure you have huggingface-cli installed:

```
pip install -U "huggingface_hub[cli]"
```

Then, you can target the specific file you want:
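
For example (the repo name comes from this card; the exact GGUF file name depends on which quant you pick, so `WRT_II-Q4_K_M.gguf` is illustrative):

```
huggingface-cli download boapro/WRT_II --include "WRT_II-Q4_K_M.gguf" --local-dir ./
```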

If the model is bigger than 50GB, it will have been split into multiple files. To download them all to a local folder, run:
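
For example, assuming the split files follow the usual `<name>-<quant>/*` layout (names illustrative):

```
huggingface-cli download boapro/WRT_II --include "WRT_II-Q8_0/*" --local-dir WRT_II-Q8_0
```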

You can either specify a new local-dir (boapro/WRT_II) or download them all in place (./).

## Q4_0_X_X

If you're using an ARM chip, the Q4_0_X_X quants will give a substantial speedup. Check out the Q4_0_4_4 speed comparisons [on the original pull request](https://github.com/ggerganov/llama.cpp/pull/5780#pullrequestreview-21657544660).

To see which one would work best for your ARM chip, you can check [AArch64 SoC features](https://gpages.juszkiewicz.com.pl/arm-socs-table/arm-socs.html) (thanks EloyOn!).
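
On Linux you can also read the relevant flags straight from the kernel. A minimal sketch; the feature-to-quant mapping (`asimddp`/dotprod for Q4_0_4_4, `i8mm` for Q4_0_4_8, `sve` for Q4_0_8_8) follows the llama.cpp feature matrix linked below:

```
# List the AArch64 CPU features relevant to picking a Q4_0_X_X quant.
grep -o -E 'asimddp|i8mm|sve' /proc/cpuinfo | sort -u
```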

If you want to get more into the weeds, you can check out this extremely useful feature chart:

[llama.cpp feature matrix](https://github.com/ggerganov/llama.cpp/wiki/Feature-matrix)