Commit ae9ad79
Parent(s): 2ee211e

Specify the correct llama model name
README.md CHANGED
@@ -3,7 +3,7 @@ license: mit
 ---
 
 # llama-2-7b-chat_q4_quantized_cpp
-- This model contains the 4-bit quantized version of [llama2](https://github.com/facebookresearch/llama) model in cpp.
+- This model contains the 4-bit quantized version of [llama2-7B-chat](https://github.com/facebookresearch/llama) model in cpp.
 - This can be run on a local cpu system as a cpp module *(instructions for the same are given below)*.
 - As for the testing, the model has been tested on `Linux(Ubuntu)` os with `12 GB RAM` and `core i5 processor`.
 - The performance is `roughly` **907.46 ms per token**, **1.10 tokens per second**
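The run instructions the README refers to sit further down the file, outside this hunk. A minimal sketch of what running such a 4-bit quantized model on CPU with llama.cpp typically looks like, assuming a built llama.cpp checkout and a hypothetical quantized weights file name:

```sh
# Minimal sketch, not the README's actual instructions.
# The model file name below is an assumption; use whatever the
# repository ships as its q4-quantized weights.
./main -m ./models/llama-2-7b-chat.q4_0.bin \
       -p "Hello, how are you?" \
       -n 128 \
       -t 4   # CPU threads; tune to the machine (e.g. a core i5)
```

At roughly 907.46 ms per token, throughput works out to about 1/0.90746 ≈ 1.10 tokens per second, so the two quoted figures are consistent.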