Not working?
How are you using this model? I tried text-generation-webui and koboldcpp, and it doesn't load in either. I see that in the llama.cpp issues, support for it is only just being added.
I've got the same issue, it just doesn't load.
19:21:56-731601 INFO Loading "DeepSeek-R1-Distill-Qwen-14B-Q8_0.gguf"
19:21:57-055526 INFO llama.cpp weights detected: "models\DeepSeek-R1-Distill-Qwen-14B-Q8_0.gguf"
llama_model_load_from_file: using device CUDA0 (NVIDIA GeForce RTX 4070 Ti SUPER) - 15089 MiB free
llama_model_load: error loading model: tensor 'blk.46.ffn_gate.weight' data is not within the file bounds, model is corrupted or incomplete
llama_model_load_from_file: failed to load model
19:21:57-108380 ERROR Failed to load the model.
Traceback (most recent call last):
File "X:\Ai\Text-Generation-Webui\modules\ui_model_menu.py", line 214, in load_model_wrapper
shared.model, shared.tokenizer = load_model(selected_model, loader)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "X:\Ai\Text-Generation-Webui\modules\models.py", line 90, in load_model
output = load_func_map[loader](model_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "X:\Ai\Text-Generation-Webui\modules\models.py", line 280, in llamacpp_loader
model, tokenizer = LlamaCppModel.from_pretrained(model_file)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "X:\Ai\Text-Generation-Webui\modules\llamacpp_model.py", line 111, in from_pretrained
result.model = Llama(**params)
^^^^^^^^^^^^^^^
File "X:\Ai\Text-Generation-Webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\llama.py", line 369, in __init__
internals.LlamaModel(
File "X:\Ai\Text-Generation-Webui\installer_files\env\Lib\site-packages\llama_cpp_cuda_tensorcores\_internals.py", line 56, in __init__
raise ValueError(f"Failed to load model from file: {path_model}")
ValueError: Failed to load model from file: models\DeepSeek-R1-Distill-Qwen-14B-Q8_0.gguf
Exception ignored in: <function LlamaCppModel.__del__ at 0x000001E5804FF920>
Traceback (most recent call last):
File "X:\Ai\Text-Generation-Webui\modules\llamacpp_model.py", line 62, in __del__
del self.model
^^^^^^^^^^
AttributeError: 'LlamaCppModel' object has no attribute 'model'
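Since the loader's message mentions a "corrupted or incomplete" file, one thing worth ruling out is a truncated download. The following is only a minimal sketch: it assumes a little-endian GGUF file of version 2 or later (where the header counts are 64-bit), and `expected_bytes` is a hypothetical parameter you would fill in from the file size shown in the repository's file listing. It checks the header and the size only; it does not prove that every tensor offset lies inside the file, and it says nothing about other causes of a failed load.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header.

    Assumes GGUF v2 or later, little-endian: 4-byte magic, uint32 version,
    uint64 tensor count, uint64 metadata key/value count.
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count


def size_matches(path, expected_bytes):
    """Compare the local file size with the size listed for the download.

    expected_bytes is supplied by the caller (hypothetical input).
    """
    return Path(path).stat().st_size == expected_bytes
```

If the size matches and the header parses, an incomplete download becomes a less likely explanation for the error.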
Text generation webui relies on llama-cpp-python, which hasn't been updated to support the DeepSeek distills yet.
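To see which llama-cpp-python build an environment actually has, something like the sketch below may help. It only reports installed versions; the thread does not say which release adds support, so the numbers still have to be compared against the llama-cpp-python release notes. The first distribution name is the upstream package; the other two are assumed names for the CUDA wheels that text-generation-webui bundles (the traceback shows the import package `llama_cpp_cuda_tensorcores`), so adjust them if your install differs.

```python
from importlib import metadata

# Distribution names to probe. Only the first is the upstream package name;
# the other two are assumptions about how the bundled CUDA wheels are named.
CANDIDATES = (
    "llama-cpp-python",
    "llama_cpp_python_cuda",
    "llama_cpp_python_cuda_tensorcores",
)


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(candidates=CANDIDATES):
    """Map each candidate distribution name to its version (or None)."""
    return {name: installed_version(name) for name in candidates}


for name, version in report().items():
    print(f"{name}: {version or 'not installed'}")
```

Run it with the Python interpreter from the web UI's own environment (the `installer_files\env` folder seen in the traceback), since a system-wide Python may have a different set of packages.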