# Qwen2.5-7B-Instruct-4bit

## Overview
This repository contains a 4-bit quantized version of the Qwen2.5-7B-Instruct model.
The quantization was performed using the bitsandbytes library with NF4 (4-bit NormalFloat) format to ensure high precision while significantly reducing the VRAM footprint.
## Model Details
- Developed by: Qwen Team (Quantized by Pxsoone)
- Architecture: Qwen2.5 (Causal Language Model)
- Quantization Method: `bitsandbytes` 4-bit (NF4)
- Compute Precision: `bfloat16` or `float16`
- VRAM Required: ~5.5GB-6GB (ideal for 8GB GPUs)
- Base Model: Qwen/Qwen2.5-7B-Instruct
## Key Improvements in Qwen2.5
Qwen2.5 brings significant advancements over previous versions:
- Broader knowledge and stronger coding and mathematical capabilities.
- Improved instruction following.
- Support for long contexts (up to 128K tokens, though quantization may affect this slightly).
- Multilingual support (English, Russian, Chinese, and many more).
## Usage
To run this model, install `transformers`, `bitsandbytes`, and `accelerate`:

```bash
pip install -U transformers bitsandbytes accelerate
```