Add pipeline tag, library name, and base_model metadata #1
opened by nielsr (HF Staff)

README.md (CHANGED)
@@ -4,7 +4,11 @@ tags:
 - 3-bit
 - Quantization
 - Pseudo-Quantization
+pipeline_tag: text-generation
+library_name: transformers
+base_model: deepseek-ai/DeepSeek-R1-Distill-Llama-70B
 ---
+
 # QuantLRM-R1-Llama-70B-3-bit
 
 3-bit quantized `DeepSeek-R1-Distill-Llama-70B` based on [QuantLRM](https://www.arxiv.org/abs/2602.02581), a state-of-the-art quantization method of large reasoning models via fine-tuning signals
@@ -15,15 +19,12 @@ This is the pseudo-quantized model (weights are dequantized back to full-precisi
 
 ### Model Description
 
-<!-- Provide a longer summary of what this model is. -->
-
 
 - **Developed by:** Nan Zhang (njz5124@psu.edu)
 - **Model type:** 3-bit pseudo-quantized version of `DeepSeek-R1-Distill-Llama-70B`
 
 ### Model Sources
 
-<!-- Provide the basic links for the model. -->
 
 - **Repository:** https://github.com/psunlpgroup/QuantLRM
 - **Paper:** https://www.arxiv.org/abs/2602.02581
@@ -31,7 +32,6 @@ This is the pseudo-quantized model (weights are dequantized back to full-precisi
 
 ## Uses
 
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
 
 This model is designed to be used with `vLLM` due to its inference optimization. Please use the tokenizer of `deepseek-ai/DeepSeek-R1-Distill-Llama-70B`.
 
@@ -49,7 +49,6 @@ This model achieves 2.12% improvement (based on average scores of various reason
 
 ## Citation
 
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
 
 **BibTeX:**
 
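The README describes this checkpoint as "pseudo-quantized": weights are snapped to a 3-bit grid and then dequantized back to full precision, so the file ships as ordinary float tensors. The sketch below illustrates that round-trip with plain round-to-nearest absmax quantization in NumPy. It is only an illustration of the pseudo-quantization concept, not the QuantLRM method itself (which chooses quantization settings via fine-tuning signals, per the paper); the function name and the per-tensor scale are my own choices.

```python
import numpy as np

def pseudo_quantize_3bit(w: np.ndarray) -> np.ndarray:
    """Round-to-nearest 3-bit pseudo-quantization of a weight tensor.

    Weights are mapped onto a symmetric 3-bit integer grid (at most
    2**3 = 8 levels) and immediately dequantized, so the result is a
    full-precision tensor that only takes 3-bit-representable values.
    NOTE: plain RTN for illustration, not QuantLRM's actual procedure.
    """
    bits = 3
    qmax = 2 ** (bits - 1) - 1                      # 3 -> codes in [-4, 3]
    max_abs = float(np.abs(w).max())
    scale = max_abs / qmax if max_abs > 0 else 1.0  # per-tensor absmax scale
    codes = np.clip(np.round(w / scale), -(qmax + 1), qmax)  # integer codes
    return codes * scale                            # dequantize back

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
w_dq = pseudo_quantize_3bit(w)
```

With a per-tensor scale the round-trip error of each weight is bounded by half a quantization step (`scale / 2`), which is why pseudo-quantized checkpoints can stand in for the real low-bit deployment when measuring accuracy.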
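The Uses section says to run the model with `vLLM` and to use the tokenizer of `deepseek-ai/DeepSeek-R1-Distill-Llama-70B`. A minimal offline-inference sketch of that setup is below; the local checkpoint path is a placeholder for wherever this repo is downloaded, and the sampling settings are illustrative, not recommendations from the card. (Serving a 70B model this way needs substantial GPU memory.)

```python
from vllm import LLM, SamplingParams

# Load this pseudo-quantized checkpoint, but take the tokenizer from the
# base model, as the model card instructs.
llm = LLM(
    model="./QuantLRM-R1-Llama-70B-3-bit",  # placeholder: local path or hub id
    tokenizer="deepseek-ai/DeepSeek-R1-Distill-Llama-70B",
)

params = SamplingParams(temperature=0.6, max_tokens=1024)  # illustrative values
outputs = llm.generate(["How many primes are there below 100?"], params)
print(outputs[0].outputs[0].text)
```

Passing `tokenizer=` explicitly avoids vLLM falling back to any tokenizer files bundled with the quantized repo.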