### How to use these quants
The documentation for [exllamav3](https://github.com/turboderp-org/exllamav3/) is your best bet here, as well as that of [TabbyAPI](https://github.com/theroyallab/tabbyAPI) or [Text Generation Web UI (oobabooga)](https://github.com/oobabooga/text-generation-webui). In short:
* You need sufficient VRAM to fit the model and your context cache. I give some pointers above that may be helpful.
* At this point, your GPUs need to be NVIDIA. AMD/ROCm, Intel, and offloading to system RAM are not currently supported.
* You will need a software package capable of loading exllamav3 models. I'm still somewhat partial to oobabooga, but TabbyAPI is another popular option. Follow the documentation for your choice in order to get yourself set up.
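To get a feel for whether a given quant plus context cache will fit, a back-of-the-envelope estimate can help. The sketch below is a rough illustration only: the function name, the formulas, and the example model shape (a hypothetical 70B model at 4.0 bits per weight) are my assumptions, not exllamav3's actual memory accounting, and real usage will be higher due to activation buffers and backend overhead.

```python
# Rough VRAM estimate for a quantized model plus its KV cache.
# Illustrative assumptions only -- not exllamav3's real accounting.

def estimate_vram_gb(n_params_b, bits_per_weight,
                     n_layers, n_kv_heads, head_dim,
                     context_len, cache_bits=16):
    """Return an approximate VRAM footprint in GiB."""
    # Quantized weights: parameter count (billions) * bits, in bytes.
    weight_bytes = n_params_b * 1e9 * bits_per_weight / 8
    # KV cache: two tensors (K and V) per layer, per cached token.
    kv_bytes = (2 * n_layers * n_kv_heads * head_dim
                * context_len * cache_bits / 8)
    return (weight_bytes + kv_bytes) / 1024**3

# Hypothetical 70B model, 4.0 bpw, GQA with 8 KV heads, 32k context.
print(round(estimate_vram_gb(70, 4.0, 80, 8, 128, 32768), 1))
```

Numbers like these only bound the problem from below, but they are usually enough to tell you whether a quant is plausible on your hardware before you download tens of gigabytes.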