---
datasets:
- zwhe99/DeepMath-103K
language:
- en
---

## Quick Start

This repository contains remote code and weights for a **Native Sparse Attention** distillation of [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), distilled on mathematical reasoning data.

### Installation

To use this model, please ensure the following dependencies are installed:

#### 1. Install the required sparse attention library from our custom fork:

```bash
pip install git+https://github.com/fnite1604/native-sparse-attention-pytorch.git
```
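
If you want to confirm the fork installed correctly before loading the model, a quick import check works. Note that the module name `native_sparse_attention_pytorch` is an assumption based on the repository name; adjust it if the fork exposes a different package.

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if `module_name` can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None

# Module name assumed from the repo name; verify against the installed fork.
if is_installed("native_sparse_attention_pytorch"):
    print("native-sparse-attention-pytorch is available")
else:
    print("native-sparse-attention-pytorch is missing; rerun the pip install above")
```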

#### 2. Install other standard dependencies:

Install the Transformers library and PyTorch if they are not already present in your environment:

```bash
pip install transformers torch
```

Note: We recommend using Python 3.8+ and PyTorch 2.0+ for compatibility.
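
The version minimums above can be checked at startup. The sketch below uses a simplified dotted-number comparison rather than PyPA's full version specification, and it strips local suffixes such as `+cu121` that `torch.__version__` may carry:

```python
import sys

def meets_minimum(version: str, minimum: str) -> bool:
    """Compare dotted numeric versions, e.g. '2.1.0' >= '2.0'.

    Simplified sketch: ignores pre-release tags and local build suffixes.
    """
    clean = version.split("+")[0]  # drop local suffixes like '+cu121'

    def parse(v: str) -> list[int]:
        return [int(p) for p in v.split(".") if p.isdigit()]

    return parse(clean) >= parse(minimum)

python_version = ".".join(str(p) for p in sys.version_info[:3])
print("Python 3.8+:", meets_minimum(python_version, "3.8"))
# For PyTorch, after `import torch`: meets_minimum(torch.__version__, "2.0")
```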

### Example Usage

A `quick_start.py` script is included to help you get started with inference:

```bash
python quick_start.py
```

This will load the model and generate text from a predefined prompt using Native Sparse Attention.
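
For reference, the loading path such a script typically follows is the standard Transformers remote-code flow; this is a hedged sketch, not the repository's actual `quick_start.py`. The repo id `"your-org/model-name"` is a placeholder for this model's Hugging Face id, and the prompt is illustrative.

```python
def run_quick_start(repo_id: str, prompt: str) -> str:
    """Sketch of a remote-code inference flow (assumed, not the shipped script)."""
    # Imports deferred so the sketch can be read without the heavy deps installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        trust_remote_code=True,  # required: the sparse-attention code ships with the repo
        torch_dtype=torch.bfloat16,
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

# Example call (downloads weights, so it is not executed here):
# print(run_quick_start("your-org/model-name", "Solve: what is 17 * 23?"))
```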