---
license: apache-2.0
datasets:
- zwhe99/DeepMath-103K
language:
- en
---

## Quick Start

This repository contains remote code and weights for a **Native Sparse Attention** distillation of [DeepSeek-R1-Distill-Qwen-1.5B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B), distilled on mathematical reasoning data. Our parameter naming scheme refers to the **parameter count of the teacher model**.

### Installation

To use this model, please ensure the following dependencies are installed:

#### Install the required Native Sparse Attention library from our custom fork:

```bash
pip install git+https://github.com/fnite1604/native-sparse-attention-pytorch.git
```

#### Install standard dependencies:

```bash
pip install transformers torch ...
```

Note: We recommend using the latest stable release of PyTorch (currently 2.7.0) with CUDA 12.6 and the latest available version of Transformers.

### Example Usage

A `quick_start.py` script is included to help you get started with inference:

```bash
python quick_start.py
```

This will load the model and generate text from a predefined prompt (`"What is 1 + 1?"`) using our Native Sparse Attention-enabled reasoning model.
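If you prefer to call the model directly rather than running the script, the sketch below shows the same flow using the standard `transformers` API. The repo id placeholder and generation settings are illustrative assumptions, not values taken from `quick_start.py`; `trust_remote_code=True` is required because this repository ships remote code.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer


def generate(model_id: str, prompt: str, max_new_tokens: int = 256) -> str:
    """Load the model (its NSA attention module is remote code) and return a completion."""
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,  # required: the attention implementation ships as remote code
        torch_dtype="auto",
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated text.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Usage (downloads weights on first call; substitute this repository's id):
# print(generate("<repo-id>", "What is 1 + 1?"))
```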