Commit d881779
Parent(s): c8da55e
Update README.md

README.md CHANGED
@@ -75,9 +75,13 @@ We benchmarked the Llama 2 7B and 13B with 4-bit quantization on NVIDIA GeForce
 | Llama 2 13B | N/A | 90.7 | 115.8 |
 
 ```shell
-
-
-
+pip install nvidia-ml-py
+```
+
+```bash
+python profile_generation.py \
+    --model-path /path/to/your/model \
+    --concurrency 1 8 --prompt-tokens 0 512 --completion-tokens 2048 512
 ```
 
 ## 4-bit Weight Quantization