---
license: apache-2.0
library_name: gguf
tags:
  - gguf
  - llama.cpp
  - quantized
base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
---

# deepseek-ai-deepseek-r1-distill-llama-8b-f16

This repository contains GGUF model files converted from [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B).

## Model Details

### Quantization Information

- **F16**: Half precision (16-bit floating point)
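To gauge disk and memory requirements: F16 stores each weight in 2 bytes, so an 8B-parameter model works out to roughly 15 GiB on disk. A minimal sketch (the parameter count below is an approximation, not taken from this repository):

```python
import numpy as np

params = 8.03e9  # approximate parameter count of an 8B Llama model (assumption)
bytes_per_weight = np.dtype(np.float16).itemsize  # F16 = 2 bytes per weight
size_gib = params * bytes_per_weight / 1024**3
print(f"{size_gib:.1f} GiB")  # ~15.0 GiB
```

Lower-bit quantizations (Q8_0, Q4_K_M, etc.) shrink this further at some cost in quality; this repository ships only the F16 conversion.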

## Usage

### With llama.cpp

```bash
# Download the model
huggingface-cli download Kaleemullah/deepseek-ai-deepseek-r1-distill-llama-8b-f16 deepseek-ai-deepseek-r1-distill-llama-8b-f16.gguf --local-dir ./models

# Run inference
./llama-cli -m ./models/deepseek-ai-deepseek-r1-distill-llama-8b-f16.gguf -p "Your prompt here"
```
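After downloading, you can sanity-check that the file is a valid GGUF container: every GGUF file begins with the 4-byte magic `GGUF`. A minimal sketch (the file paths here are stand-ins, not files from this repository):

```python
from pathlib import Path

def looks_like_gguf(path: str) -> bool:
    """Check the 4-byte GGUF magic at the start of the file."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Demo with a stand-in file; in real use, point at the downloaded .gguf
Path("demo.gguf").write_bytes(b"GGUF" + b"\x00" * 12)
print(looks_like_gguf("demo.gguf"))  # True
```

A truncated or interrupted download will typically fail this check or fail to load in llama.cpp with a magic-number error.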

### With Python (llama-cpp-python)

```python
from llama_cpp import Llama

# Load the model
llm = Llama(
    model_path="./models/deepseek-ai-deepseek-r1-distill-llama-8b-f16.gguf",
    n_ctx=2048,       # Context window
    n_gpu_layers=-1,  # Offload all layers to GPU if available
)

# Generate text
output = llm("Your prompt here", max_tokens=100)
print(output["choices"][0]["text"])
```

### With Ollama

```bash
# Create a Modelfile
echo 'FROM ./models/deepseek-ai-deepseek-r1-distill-llama-8b-f16.gguf' > Modelfile

# Create the model
ollama create my-model -f Modelfile

# Run the model
ollama run my-model
```
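Beyond the bare `FROM` line, a Modelfile can also pin default sampling settings. A sketch with illustrative values (the DeepSeek-R1 model card suggests temperatures around 0.5–0.7 for R1-series models; tune for your workload):

```
FROM ./models/deepseek-ai-deepseek-r1-distill-llama-8b-f16.gguf

# Illustrative defaults, not values shipped with this repository
PARAMETER temperature 0.6
PARAMETER num_ctx 4096
```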

## Model Architecture

This is a GGUF conversion of deepseek-ai/DeepSeek-R1-Distill-Llama-8B, a Llama-architecture model distilled from DeepSeek-R1. The F16 file keeps the weights at half precision, so output quality matches the original checkpoint while remaining loadable by any GGUF-compatible runtime.

## License

This model inherits the license from the original model: deepseek-ai/DeepSeek-R1-Distill-Llama-8B.

## Citation

If you use this model, please cite the original model:

```bibtex
@misc{deepseek-ai-DeepSeek-R1-Distill-Llama-8B,
  author = {DeepSeek-AI},
  title = {deepseek-ai/DeepSeek-R1-Distill-Llama-8B},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B}
}
```

## Converted with

This model was converted using llama.cpp's `convert_hf_to_gguf.py` script.
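For reference, a conversion along these lines reproduces an F16 GGUF from the original checkpoint (paths and output filename are illustrative, and the exact commands used for this repository are not recorded here):

```shell
# Fetch the original weights
huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Llama-8B \
    --local-dir ./DeepSeek-R1-Distill-Llama-8B

# Convert to GGUF at F16 precision (run from a llama.cpp checkout)
python convert_hf_to_gguf.py ./DeepSeek-R1-Distill-Llama-8B \
    --outtype f16 \
    --outfile deepseek-ai-deepseek-r1-distill-llama-8b-f16.gguf
```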


**Note:** GGUF models are compatible with llama.cpp, Ollama, LM Studio, and other GGUF-compatible inference engines.