---
license: apache-2.0
base_model:
- deepseek-ai/DeepSeek-R1
---

# Q4_K Quant of DeepSeek-R1 for the MLA fork pull request

## Requires this custom build of llama.cpp:

https://github.com/ggerganov/llama.cpp/pull/11446

**IMPORTANT NOTE**

If you try to load this with the `main` branch of llama.cpp, you'll see an error like this:

```
load_tensors: loading model tensors, this can take a while... (mmap = true)
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 1147, got 1025
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/mount/checkpoints/DeepSeek-R1-11446-Q2_K-00001-of-00030.gguf'
srv load_model: failed to load model, '/mount/checkpoints/DeepSeek-R1-11446-Q2_K-00001-of-00030.gguf'
srv operator(): operator(): cleaning up before exit...
main: exiting due to model loading error
terminate called without an active exception
Aborted (core dumped)
```
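To avoid that, build llama.cpp from the PR branch. A minimal sketch, assuming a plain CPU build; the local branch name `mla-11446` is just an illustrative label, and you may want to add your usual CUDA/Metal flags to the cmake step:

```bash
# fetch and build llama.cpp with PR #11446 applied
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git fetch origin pull/11446/head:mla-11446   # GitHub exposes PR heads under pull/<id>/head
git checkout mla-11446
cmake -B build
cmake --build build --config Release
```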
There's a Q3_K_M version here: [daydream-org/DeepSeek-R1-GGUF-11446](https://huggingface.co/daydream-org/DeepSeek-R1-GGUF-11446)

Created using the script below by [evshiron](https://huggingface.co/evshiron):

```bash
export WORK_DIR=$(pwd)
python3 -m venv venv
source venv/bin/activate
pip3 install -U "huggingface_hub[cli]"

# the fp8 checkpoints are around 700GB
mkdir checkpoints
huggingface-cli download --resume-download --local-dir checkpoints/DeepSeek-R1 deepseek-ai/DeepSeek-R1

# evshiron's fork of llama.cpp includes PR #11446 plus changes that allow converting the fp8 HF checkpoints to bf16 GGUF directly using triton(-cpu), without intermediate checkpoints
git clone https://github.com/evshiron/llama.cpp --recursive
pushd llama.cpp
pip3 install -r requirements/requirements-convert_hf_to_gguf.txt
cmake -B build
cmake --build build --config Release
popd

# install triton-cpu for cpu-only dequant
git clone https://github.com/triton-lang/triton-cpu --recursive
pushd triton-cpu
pip3 install ninja cmake wheel pybind11
MAX_JOBS=32 pip3 install -e python
popd

# hopefully this works; the conversion takes an hour or more depending on your hardware, and the bf16 checkpoints are around 1.3TB
# the dequant process may take more than 64GB RAM, but should be doable within 360GB RAM
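# note: the converter writes the split bf16 .gguf files into checkpoints/DeepSeek-R1, alongside the original fp8 files (hence the mv below)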
python3 llama.cpp/convert_hf_to_gguf.py --outtype bf16 --split-max-size 50G checkpoints/DeepSeek-R1

# removing the fp8 checkpoints gives us 700GB back
mkdir checkpoints/DeepSeek-R1-BF16
mv checkpoints/DeepSeek-R1/*.gguf checkpoints/DeepSeek-R1-BF16
rm -r checkpoints/DeepSeek-R1

# then use llama-quantize to make the quants you want, Q4_K_M should be around 400GB?
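# --keep-split should write the quantized output in the same number of shards as the bf16 input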
./llama.cpp/build/bin/llama-quantize --keep-split checkpoints/DeepSeek-R1-BF16/<THE_FIRST_OF_DeepSeek-R1-BF16_GGUF>.gguf Q4_K_M
```
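Once you have a build from the PR branch, serving the finished quant looks something like the sketch below; the model path is a placeholder for the first split of your quantized output, and the context size and port are arbitrary:

```bash
# point llama-server at the first split; the remaining shards should be picked up automatically
./llama.cpp/build/bin/llama-server \
  -m <THE_FIRST_OF_DeepSeek-R1-Q4_K_M_GGUF>.gguf \
  -c 4096 --port 8080
```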
The whole process took 16 hours on an EC2 instance, so I figured I'd share it.

Script Credit/Source: [daydream-org/DeepSeek-R1-GGUF-11446](https://huggingface.co/daydream-org/DeepSeek-R1-GGUF-11446/discussions/1#67a327570051a98a96ded9e6)