|
|
--- |
|
|
license: apache-2.0 |
|
|
base_model: |
|
|
- deepseek-ai/DeepSeek-R1 |
|
|
--- |
|
|
|
|
|
# ⚠️ MODEL FILES REMOVED |
|
|
|
|
|
**The GGUF files for this model have been deleted due to storage limitations.** |
|
|
|
|
|
This model card is maintained as a historical reference because it is referenced in various GitHub issues and documentation. |
|
|
|
|
|
|
|
|
## Q4_K Quant of DeepSeek-R1 for the MLA fork pull request
|
|
|
|
|
## Requires this custom build of llama.cpp: |
|
|
|
|
|
https://github.com/ggerganov/llama.cpp/pull/11446 |
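
If you want to build the PR branch yourself instead of using a prebuilt fork, something like the following should work. This is a minimal sketch that assumes GitHub still exposes the `pull/11446/head` ref and that the standard CMake build applies:

```bash
# fetch the MLA pull request on top of upstream llama.cpp
# (assumes the pull/11446/head ref is still fetchable from GitHub)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git fetch origin pull/11446/head:pr-11446
git checkout pr-11446

# standard CMake release build, same as for the main branch
cmake -B build
cmake --build build --config Release
```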
|
|
|
|
|
**IMPORTANT NOTE**
|
|
|
|
|
If you try to load this model with the `main` branch of llama.cpp, you'll see an error like this:
|
|
|
|
|
``` |
|
|
load_tensors: loading model tensors, this can take a while... (mmap = true) |
|
|
llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 1147, got 1025 |
|
|
llama_model_load_from_file_impl: failed to load model |
|
|
common_init_from_params: failed to load model '/mount/checkpoints/DeepSeek-R1-11446-Q2_K-00001-of-00030.gguf' |
|
|
srv load_model: failed to load model, '/mount/checkpoints/DeepSeek-R1-11446-Q2_K-00001-of-00030.gguf' |
|
|
srv operator(): operator(): cleaning up before exit... |
|
|
main: exiting due to model loading error |
|
|
terminate called without an active exception |
|
|
Aborted (core dumped) |
|
|
``` |
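
Before loading a several-hundred-GB model, it can be worth confirming which build you are actually running. llama.cpp binaries accept a `--version` flag that prints the commit they were built from; the path below is an assumption based on the build steps in this card:

```bash
# print the commit the binary was built from;
# it should correspond to the PR/fork build, not an upstream main-branch build
./llama.cpp/build/bin/llama-cli --version
```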
|
|
|
|
|
There's a Q3_K_M version here: [daydream-org/DeepSeek-R1-GGUF-11446](https://huggingface.co/daydream-org/DeepSeek-R1-GGUF-11446) |
|
|
|
|
|
These quants were created using the script below, written by [evshiron](https://huggingface.co/evshiron):
|
|
|
|
|
```bash
|
|
export WORK_DIR=$(pwd) |
|
|
python3 -m venv venv |
|
|
source venv/bin/activate |
|
|
pip3 install -U "huggingface_hub[cli]" |
|
|
|
|
|
# the fp8 checkpoints are around 700GB |
|
|
mkdir checkpoints |
|
|
huggingface-cli download --resume-download --local-dir checkpoints/DeepSeek-R1 deepseek-ai/DeepSeek-R1 |
|
|
|
|
|
# my fork of llama.cpp, which includes PR #11446 plus changes that allow converting the fp8 HF checkpoints
# to a bf16 GGUF directly using triton(-cpu), without needing intermediate checkpoints
|
|
git clone https://github.com/evshiron/llama.cpp --recursive |
|
|
pushd llama.cpp |
|
|
pip3 install -r requirements/requirements-convert_hf_to_gguf.txt |
|
|
cmake -B build |
|
|
cmake --build build --config Release |
|
|
popd |
|
|
|
|
|
# install triton-cpu for cpu-only dequant |
|
|
git clone https://github.com/triton-lang/triton-cpu --recursive |
|
|
pushd triton-cpu |
|
|
pip3 install ninja cmake wheel pybind11 |
|
|
MAX_JOBS=32 pip3 install -e python |
|
|
popd |
|
|
|
|
|
# hopefully this just works; it takes an hour or more depending on your hardware, and the bf16 checkpoints are around 1.3TB
|
|
# the dequant process may take more than 64GB RAM, but should be doable within 360GB RAM |
|
|
python3 llama.cpp/convert_hf_to_gguf.py --outtype bf16 --split-max-size 50G checkpoints/DeepSeek-R1 |
|
|
|
|
|
# removing the fp8 checkpoints gives us 700GB back |
|
|
mkdir checkpoints/DeepSeek-R1-BF16 |
|
|
mv checkpoints/DeepSeek-R1/*.gguf checkpoints/DeepSeek-R1-BF16 |
|
|
rm -r checkpoints/DeepSeek-R1 |
|
|
|
|
|
# then use llama-quantize to make the quants you want; Q4_K_M should be around 400GB?
|
|
./llama.cpp/build/bin/llama-quantize --keep-split checkpoints/DeepSeek-R1-BF16/<THE_FIRST_OF_DeepSeek-R1-BF16_GGUF>.gguf Q4_K_M |
|
|
``` |
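
Once a quant exists, it can be loaded with the custom build. The sketch below is illustrative only: the output file name is a placeholder, and the flags are generic llama-server options. Pointing `-m` at the first split is enough, since llama.cpp picks up the remaining shards automatically:

```bash
# serve the quant with the PR/fork build of llama.cpp
# (replace the placeholder with the first split produced by llama-quantize)
./llama.cpp/build/bin/llama-server \
  -m checkpoints/<FIRST_SPLIT_OF_THE_Q4_K_M_GGUF>.gguf \
  -c 4096 \
  --host 0.0.0.0 --port 8080
```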
|
|
|
|
|
It took 16 hours on an EC2 instance, so I figured I'd share it.
|
|
|
|
|
Script Credit/Source: [daydream-org/DeepSeek-R1-GGUF-11446](https://huggingface.co/daydream-org/DeepSeek-R1-GGUF-11446/discussions/1#67a327570051a98a96ded9e6) |