Update README.md
README.md CHANGED
@@ -20,6 +20,13 @@ Refer to the [original model card](https://huggingface.co/facebook/cwm) for more
 - Layer Offload **64**
 - Context Length **~50k**

+## Fitting on 24gb in LMStudio @Q4_0
+- Flash attention **ENABLED**
+- K Cache Quant type **Q4_0**
+- V Cache Quant type **Q4_0**
+- Layer Offload **64**
+- Context Length **131072**
+
 ## Use with llama.cpp
 Install llama.cpp through brew (works on Mac and Linux)
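
A rough illustration of why the Q4_0 K/V cache settings in this change matter at a 131072-token context. This is a sketch under stated assumptions, not a measurement: the architecture numbers (`N_LAYERS`, `N_KV_HEADS`, `HEAD_DIM`) are hypothetical placeholders rather than values from the model card, and the estimate assumes full attention in every layer (no sliding window). The only fixed facts used are that an F16 value takes 2 bytes and that a ggml Q4_0 block stores 32 values in 18 bytes.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_value):
    """Estimate combined K and V cache size in bytes for full attention."""
    # Factor 2: each layer keeps one K tensor and one V tensor.
    values = 2 * n_layers * n_kv_heads * head_dim * n_ctx
    return values * bytes_per_value


F16_BYTES = 2.0
# ggml Q4_0 block: 32 values = 16 bytes of 4-bit quants + 2-byte fp16 scale.
Q4_0_BYTES = 18 / 32

# Hypothetical architecture, chosen only for illustration.
N_LAYERS, N_KV_HEADS, HEAD_DIM = 64, 8, 128
N_CTX = 131072  # context length from the settings in this change

f16_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, F16_BYTES) / 2**30
q4_gib = kv_cache_bytes(N_LAYERS, N_KV_HEADS, HEAD_DIM, N_CTX, Q4_0_BYTES) / 2**30

print(f"F16 K/V cache:  {f16_gib:.1f} GiB")
print(f"Q4_0 K/V cache: {q4_gib:.1f} GiB")
```

Under these placeholder numbers the quantized cache is about 3.6 times smaller than the F16 one. The real footprint depends on the model's actual attention layout and on the memory taken by the weights themselves.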