---
license: mit
datasets:
- fka/awesome-chatgpt-prompts
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
base_model:
- Qwen/Qwen2.5-1.5B-Instruct
pipeline_tag: text-generation
---

# Quantized Qwen2.5-1.5B-Instruct

This repository contains 8-bit and 4-bit quantized versions of the Qwen2.5-1.5B-Instruct model, produced with GPTQ. Quantization significantly reduces the model's size and memory footprint, enabling faster inference on resource-constrained devices while maintaining reasonable performance.

## Model Description

Qwen2.5-1.5B-Instruct is an instruction-tuned language model developed by the Qwen team. These quantized versions offer a more efficient way to deploy and run it.

## Quantization Details

* **Quantization Method:** GPTQ (Generative Pre-trained Transformer Quantization)
* **Quantization Bits:** 8-bit and 4-bit versions available.
* **Dataset:** The model was quantized using a subset of the "fka/awesome-chatgpt-prompts" dataset as calibration data (see the reproduction sketch below).

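A minimal sketch of how a comparable quantization could be reproduced with the `transformers` GPTQ integration is shown below. The calibration-set construction and output directory are illustrative assumptions, not the exact recipe used for this repository:

```python
# Hypothetical reproduction sketch: the calibration-set handling and
# output paths are assumptions, not the exact recipe used here.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "Qwen/Qwen2.5-1.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Build a small calibration set from the prompts dataset.
prompts = load_dataset("fka/awesome-chatgpt-prompts", split="train")
calibration = [row["prompt"] for row in prompts]

# bits=4 produces the 4-bit variant; use bits=8 for the 8-bit variant.
gptq_config = GPTQConfig(bits=4, dataset=calibration, tokenizer=tokenizer)

# Quantization runs during from_pretrained when a GPTQConfig is passed.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=gptq_config,
)

model.save_pretrained("Qwen2.5-1.5B-Instruct-GPTQ-4bit")
tokenizer.save_pretrained("Qwen2.5-1.5B-Instruct-GPTQ-4bit")
```
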
## Usage

To use the quantized models, follow these steps:

**Install Dependencies:**

```bash
pip install transformers accelerate bitsandbytes auto-gptq optimum
```

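**Load and Generate:**

The snippet below is a minimal loading sketch; the repository id is a placeholder, so substitute the actual repo (and revision) of the 4-bit or 8-bit variant you want:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id; point this at the actual 4-bit or 8-bit checkpoint.
repo_id = "your-username/Qwen2.5-1.5B-Instruct-GPTQ-4bit"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

messages = [
    {"role": "user", "content": "Give me a short introduction to large language models."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
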
## Performance

The quantized models offer a significant reduction in size and memory usage compared to the original model. While quantization can cause a slight drop in output quality, the trade-off is often worthwhile for deployment on devices with limited resources. A rough way to check the memory savings is shown below.

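This sketch uses the `get_memory_footprint()` helper from `transformers`; the repo id is again a placeholder, and exact numbers depend on the variant loaded and your hardware:

```python
from transformers import AutoModelForCausalLM

# Placeholder repo id; use the quantized checkpoint you actually loaded.
model = AutoModelForCausalLM.from_pretrained(
    "your-username/Qwen2.5-1.5B-Instruct-GPTQ-4bit",
    device_map="auto",
)

# Reports the memory taken by the model's parameters and buffers.
print(f"Memory footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")
```
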
## Disclaimer

These quantized models are provided for research and experimentation purposes. We do not guarantee their performance or suitability for specific applications.

## Acknowledgements

* **Qwen:** For developing the original Qwen2.5-1.5B-Instruct model.
* **Hugging Face:** For providing the platform and tools for model sharing and quantization.
* **GPTQ Authors:** For developing the GPTQ quantization method.