Update README.md
This is a model collection of mostly larger LLMs quantized to 2-bit with the novel QuIP#-inspired approach in llama.cpp.
Sometimes both XS and XXS variants are available.
Note that for some larger models, like Qwen-72B-based models, the default context length might be too large for most GPUs, so you have to reduce it yourself in textgen-webui via the n_ctx setting.
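
If you load these GGUF files outside textgen-webui, the same reduction can be done directly in llama-cpp-python. This is only a minimal sketch: the file name is a placeholder for whichever quant you downloaded, and 8192 is just an example value, not a recommendation.

```python
# Minimal sketch, assuming llama-cpp-python and a downloaded 2-bit GGUF.
# The file name below is a placeholder; use the actual file from this repo.
from llama_cpp import Llama

llm = Llama(
    model_path="senku-70b.IQ2_XS.gguf",  # placeholder file name
    n_ctx=8192,        # reduced context window so the KV cache fits in VRAM
    n_gpu_layers=-1,   # offload all layers to the GPU
)

out = llm("Q: Why reduce the context length? A:", max_tokens=64)
print(out["choices"][0]["text"])
```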
RoPE scaling for scaled models like LongAlpaca or YaRN should be 8; set compress_pos_emb accordingly.
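
Outside textgen-webui, linear RoPE scaling is usually exposed as a frequency scale factor rather than a compression factor. The sketch below assumes compress_pos_emb = 8 corresponds to rope_freq_scale = 1/8 in llama.cpp loaders; the file name and context size are placeholders.

```python
# Minimal sketch for a linearly scaled model (e.g. a LongAlpaca-based quant),
# assuming compress_pos_emb = 8 maps to rope_freq_scale = 1/8.
from llama_cpp import Llama

llm = Llama(
    model_path="longalpaca-70b.IQ2_XXS.gguf",  # placeholder file name
    n_ctx=32768,            # placeholder extended context; reduce if VRAM is tight
    rope_freq_scale=0.125,  # 1/8, the equivalent of compress_pos_emb = 8
    n_gpu_layers=-1,
)
```

YaRN-finetuned models may already carry their own RoPE parameters in the GGUF metadata, so check the model card before overriding them.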
### Overview
- Senku-70b