---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Llama-8B
pipeline_tag: text-generation
tags:
- llama
- conversational
---
# DeepSeek-R1-Distill-Llama-8B-Stateful-CoreML

This repository contains a CoreML conversion of the DeepSeek-R1-Distill-Llama-8B model optimized for Apple Silicon devices. The conversion features stateful key-value (KV) caching for efficient text generation.
## Model Description

[DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) is an 8-billion-parameter language model from the DeepSeek-AI team. It is built on the Llama architecture and distilled to retain much of the teacher model's performance at a reduced parameter count.

This CoreML conversion provides:
- Full compatibility with Apple Silicon devices (M1, M2, and M3 series)
- Stateful inference with KV caching for efficient text generation
- Optimized performance for on-device deployment
|
## Technical Specifications

- **Base Model**: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
- **Parameters**: 8 billion
- **Context Length**: Configurable (default: 64 tokens, expandable subject to memory constraints)
- **Precision**: FP16
- **File Format**: .mlpackage
- **Deployment Target**: macOS 15+
- **Architecture**: Stateful LLM with key-value caching
- **Input Features**: Flexible input size with dynamic shape handling
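Because the KV cache is preallocated to the fixed context length while the query length varies, each call needs a causal mask sized to the full context. The exact mask layout the converted model expects is an assumption here; a typical additive fp16 causal mask for stateful decoding can be built like this:

```python
import numpy as np

def make_causal_mask(query_len: int, past_len: int, context_len: int) -> np.ndarray:
    """Build an additive causal mask for one stateful decode step.

    Row i of the new query may attend to all `past_len` cached tokens plus
    new tokens 0..i; blocked positions get a large negative value that
    zeroes them out after softmax.
    """
    mask = np.full((1, 1, query_len, context_len), np.float16(-65504.0), dtype=np.float16)
    for i in range(query_len):
        mask[0, 0, i, : past_len + i + 1] = 0.0
    return mask

# A 4-token prompt against an empty cache, then a single-token decode step:
prompt_mask = make_causal_mask(query_len=4, past_len=0, context_len=64)
step_mask = make_causal_mask(query_len=1, past_len=4, context_len=64)
```

After the prompt pass, every subsequent call feeds one new token, so `query_len` is 1 and `past_len` grows by one per step.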
|
## Key Features

- **Stateful Inference**: A custom SliceUpdateKeyValueCache maintains conversation state between inference calls, significantly improving generation speed.
- **Dynamic Input Shapes**: Variable input lengths are supported through a RangeDim specification.
- **Optimized Memory Usage**: The key-value cache is managed efficiently to minimize memory footprint.
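The SliceUpdateKeyValueCache implementation itself is not shown in this card. As an illustration of the slice-update idea — preallocate one fixed-size buffer and write each step's keys/values into the next free slice instead of reallocating — here is a minimal NumPy sketch; all names and shapes are assumptions, not the actual class:

```python
import numpy as np

class SliceUpdateKVCache:
    """Minimal sketch of a slice-update KV cache (names/shapes illustrative).

    A fixed buffer of shape (layers, heads, context_len, head_dim) is
    allocated once; each step writes new keys/values in place, which is
    what lets CoreML keep the cache as persistent model state.
    """

    def __init__(self, num_layers=2, num_heads=2, context_len=64, head_dim=8):
        shape = (num_layers, num_heads, context_len, head_dim)
        self.k = np.zeros(shape, dtype=np.float16)
        self.v = np.zeros(shape, dtype=np.float16)
        self.pos = 0  # number of tokens cached so far

    def update(self, layer, new_k, new_v):
        """Write (heads, n_new, head_dim) keys/values at the current position."""
        end = self.pos + new_k.shape[1]
        self.k[layer, :, self.pos:end] = new_k
        self.v[layer, :, self.pos:end] = new_v
        # Return views over all cached tokens, including the new slice.
        return self.k[layer, :, :end], self.v[layer, :, :end]

    def advance(self, n_new):
        """Advance once per forward pass, after every layer has updated."""
        self.pos += n_new

# Cache a 4-token prompt in layer 0, then advance:
cache = SliceUpdateKVCache()
k, v = cache.update(0, np.ones((2, 4, 8), np.float16), np.ones((2, 4, 8), np.float16))
cache.advance(4)
```

Because the buffers are mutated in place, the same tensors can be registered as CoreML state and survive across `predict` calls.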
|
## Implementation Details

This conversion utilizes:
- A custom KvCacheStateLlamaForCausalLM wrapper around the Hugging Face Transformers implementation
- CoreML's state management capabilities for maintaining KV caches between inference calls
- Proper buffer registration to ensure state persistence
- Dynamic tensor shapes to accommodate various input and context lengths
|
## Usage

The model can be loaded and used with CoreML in your Swift or Python projects:

```python
import coremltools as ct

# Load the model (stateful models require macOS 15+)
model = ct.models.MLModel("DeepSeek-R1-Distill-Llama-8B.mlpackage")

# Create a fresh KV-cache state for this generation session
state = model.make_state()

# Prepare inputs for inference (token IDs and causal mask)
# ...

# Run inference, passing the state so the KV cache persists across calls
output = model.predict({
    "inputIds": input_ids,
    "causalMask": causal_mask
}, state=state)
```
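A full generation run repeats that `predict` call, feeding back only the newest token once the prompt is cached. The loop below is a framework-agnostic sketch: `predict_fn` stands in for the model call, and the input/output layout (`(1, n)` token IDs, `(1, 1, n, context)` additive mask, `(1, n, vocab)` logits) is an assumption that must be checked against the converted model:

```python
import numpy as np

def greedy_decode(predict_fn, prompt_ids, max_new_tokens, context_len=64, eos_id=None):
    """Greedy decoding loop for a stateful model.

    predict_fn(token_ids, causal_mask) -> logits of shape (1, n, vocab).
    The KV cache is assumed to live in external state, so after the
    prompt pass each call carries only the single newest token.
    """
    ids = list(prompt_ids)
    tokens = np.array([ids], dtype=np.int32)

    for step in range(max_new_tokens):
        past = 0 if step == 0 else len(ids) - 1
        n = tokens.shape[1]
        # Additive causal mask: row i may see past + i + 1 positions.
        mask = np.full((1, 1, n, context_len), -1e4, dtype=np.float16)
        for i in range(n):
            mask[0, 0, i, : past + i + 1] = 0.0
        logits = predict_fn(tokens, mask)
        next_id = int(np.argmax(logits[0, -1]))
        ids.append(next_id)
        if next_id == eos_id or len(ids) >= context_len:
            break
        tokens = np.array([[next_id]], dtype=np.int32)  # feed back one token
    return ids
```

With the CoreML model, `predict_fn` would wrap `model.predict({...}, state=state)` and pull the logits out of the returned dictionary; a new state object starts a fresh conversation.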
| |
|
## Conversion Process

The model was converted using CoreML Tools with the following steps:
1. Loading the original model from Hugging Face
2. Wrapping it with custom state management
3. Tracing with PyTorch's JIT
4. Converting to CoreML format with state specifications
5. Saving in the .mlpackage format
|
## Requirements

To use this model:
- Apple Silicon Mac (M1/M2/M3 series)
- macOS 15 or later
- At least 16GB of RAM recommended
|
## Limitations

- The model requires significant memory for inference, especially with longer contexts
- Performance is highly dependent on the device's Neural Engine capabilities
- The default configuration supports a context length of 64 tokens, but this can be adjusted
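The memory cost of extending the context can be estimated from the architecture. Assuming Llama-3.1-8B-class dimensions (32 layers, 8 KV heads via grouped-query attention, head dimension 128 — assumptions to verify against the actual model config), an fp16 KV cache costs:

```python
# Hypothetical Llama-3.1-8B-class dimensions; verify against the real config.
num_layers = 32
num_kv_heads = 8      # grouped-query attention
head_dim = 128
bytes_fp16 = 2

# Keys AND values, per token, across all layers:
bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_fp16

def kv_cache_mib(context_len):
    return bytes_per_token * context_len / 2**20

print(bytes_per_token)       # 131072 bytes = 128 KiB per token
print(kv_cache_mib(64))      # 8.0 MiB at the default 64-token context
print(kv_cache_mib(4096))    # 512.0 MiB at a 4096-token context
```

Under these assumptions the cache itself is small; the fp16 weights (roughly 16GB for 8B parameters) dominate the memory budget, which is why 16GB of RAM is the practical floor.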
|
## License

This model conversion inherits the license of the original DeepSeek-R1-Distill-Llama-8B model.
## Acknowledgments

- [DeepSeek-AI](https://github.com/deepseek-ai) for creating and releasing the original model
- [Hugging Face](https://huggingface.co/) for hosting the model and providing the Transformers library
- Apple for developing the CoreML framework
|
## Citation

If you use this model in your research, please cite both the original DeepSeek model and this conversion.