Vui Seng Chua committed · 769e2c6 · Parent(s): 73d74ae
Revise README.md

README.md
This repo contains binaries of weights quantized with [OpenVINO](https://github.com/openvinotoolkit/openvino_notebooks/blob/main/notebooks/254-llm-chatbot/254-llm-chatbot.ipynb).

| LLM             | ratio | group_size |
|-----------------|-------|------------|
| llama-2-chat-7b | 0.8   | 128        |
| mistral-7b      | 0.6   | 64         |
| gemma-2b-it     | 0.6   | 64         |

Notes:
* ratio=0.8 means 80% of the FC (linear) layers are weight-quantized to 4-bit; the remaining layers are kept in 8-bit.
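The `group_size` column can be illustrated with a minimal sketch of group-wise symmetric quantization: each group of `group_size` consecutive weights shares one scale, so smaller groups track the local weight range more closely. This is a simplified, hypothetical illustration only, not the actual OpenVINO/NNCF implementation; the function name and example values are invented for the sketch.

```python
# Hypothetical sketch: group-wise symmetric integer quantization.
# Each group of `group_size` consecutive weights shares one scale
# derived from the group's largest absolute value.
def quantize_groupwise(weights, group_size, num_bits=4):
    qmax = 2 ** (num_bits - 1) - 1  # 7 for symmetric int4
    quantized = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # One shared scale per group (fall back to 1.0 for an all-zero group).
        scale = max(abs(w) for w in group) / qmax or 1.0
        quantized.extend(round(w / scale) for w in group)
    return quantized

# Toy example: two groups of two weights, each with its own scale.
q = quantize_groupwise([0.5, -0.2, 0.1, 0.7], group_size=2)
```

A larger `group_size` (e.g. 128 for llama-2-chat-7b above) amortizes the per-group scale over more weights, saving metadata at some cost in accuracy, while 64 keeps the scales more local.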