## Quantization Reproduction

> [!NOTE]
> In order to quantize Llama 3.1 70B Instruct using AutoGPTQ, you will need to use an instance with at least enough CPU RAM to fit the whole model, i.e. ~140GiB, and an NVIDIA GPU with 40GiB of VRAM to quantize it.

In order to quantize Llama 3.1 70B Instruct with GPTQ in INT4, you need to install the following packages: