SandLogicTechnologies/DeepSeek-R1-Distill-Llama-8B-GGUF


SandLogicTechnologies committed on Jan 29, 2025

Commit ea280b8 · verified · 1 Parent(s): a034120

Create README.md


README.md +72 -0


	@@ -0,0 +1,72 @@

+---
+language:
+- en
+base_model:
+- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
+tags:
+- Llama
+- EdgeAI
+---
+# DeepSeek-R1-Distill-Llama-8B Quantized Models
+This repository contains Q4_KM and Q5_KM quantized GGUF versions of the [DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) model, intended for efficient deployment with limited loss of output quality.
+Discover our full range of quantized language models by visiting our [SandLogic Lexicon HuggingFace](https://huggingface.co/SandLogicTechnologies). To learn more about our company and services, check out our website at [SandLogic](https://www.sandlogic.com/).
+## Model Description
+These models are quantized versions of DeepSeek-R1-Distill-Llama-8B, a distilled 8B-parameter model based on the Llama architecture. The original model demonstrates that reasoning patterns from larger models can be effectively distilled into smaller architectures.
+### Available Quantized Versions
+1. **Q4_KM Version**
+   - 4-bit k-quant quantization (llama.cpp's `Q4_K_M` type)
+   - Approximately 4 GB model size
+   - Good balance between model size and output quality
+   - Recommended for resource-constrained environments
+2. **Q5_KM Version**
+   - 5-bit k-quant quantization (llama.cpp's `Q5_K_M` type)
+   - Approximately 5 GB model size
+   - Higher precision than Q4 while maintaining significant size reduction
+   - Recommended when higher accuracy is needed
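The approximate sizes listed above follow from simple arithmetic: parameter count times bits per weight, divided by 8 bits per byte. The sketch below is a rough estimate only. It assumes a nominal 8e9 parameters and exactly 4 or 5 bits per weight, and `estimate_size_gb` is a hypothetical helper, not part of any library. Actual GGUF files tend to be somewhat larger, because k-quants store per-block scales and keep some tensors at higher precision.

```python
# Rough size estimate: parameters * bits per weight / 8 bits per byte.
# Assumptions: a nominal 8e9 parameters and a flat 4 or 5 bits per weight.

def estimate_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Return the approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

N_PARAMS = 8e9  # nominal "8B"; the exact parameter count differs slightly

# FP16 is included only as a reference point for the unquantized weights.
for name, bits in [("Q4_KM", 4), ("Q5_KM", 5), ("FP16", 16)]:
    print(f"{name}: ~{estimate_size_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the 4-bit variant needs about a quarter of the storage of 16-bit weights, which is why it is the usual choice for memory-constrained devices.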
+## Usage
+```bash
+pip install llama-cpp-python
+```
+Please refer to the llama-cpp-python [documentation](https://llama-cpp-python.readthedocs.io/en/latest/) to install with GPU support.
+### Basic Text Completion
+Here's an example demonstrating how to use the high-level API for basic text completion:
+```python
+from llama_cpp import Llama
+llm = Llama(
+    model_path="model/path/", # Replace with the path to the downloaded .gguf file
+    verbose=False,
+    # n_gpu_layers=-1, # Uncomment to use GPU acceleration
+    # n_ctx=2048, # Uncomment to increase the context window
+)
+output = llm(
+    "Q: Name the planets in the solar system? A: ", # Prompt
+    max_tokens=32, # Generate up to 32 tokens
+    stop=["Q:", "\n"], # Stop generating just before a new question
+    echo=False # Don't echo the prompt in the output
+)
+print(output["choices"][0]["text"])
+```
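DeepSeek-R1 distilled models typically emit their reasoning inside `<think>...</think>` tags before the final answer, so it can help to separate the two when post-processing generated text. The following is a minimal sketch under that assumption. `split_reasoning` is a hypothetical helper, not part of llama-cpp-python, and the sample string is made up for illustration rather than real model output.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split generated text into (reasoning, answer).

    Assumes the reasoning, if present, is wrapped in <think>...</think>.
    When no such block is found, the whole text is treated as the answer.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()  # everything after the closing tag
    return reasoning, answer

# Illustrative input only; not actual model output.
sample = "<think>\nMercury is closest to the Sun.\n</think>\nMercury"
reasoning, answer = split_reasoning(sample)
print("Reasoning:", reasoning)
print("Answer:", answer)
```

In practice the helper would be applied to `output["choices"][0]["text"]`. A larger `max_tokens` value and no `"\n"` stop sequence are likely needed for the model to finish its reasoning block.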
+## License
+This model inherits the license of the original DeepSeek-R1-Distill-Llama-8B model. Please refer to the original model's license for usage terms and conditions.
+## Acknowledgments
+We thank the DeepSeek AI team for open-sourcing their distilled models and demonstrating that smaller models can achieve impressive performance through effective distillation techniques.