---
library_name: transformers
tags: []
---
# Qwen-Ar-GEC-4bit
This is a 4-bit quantized version of **[Qwen-Ar-GEC](https://huggingface.co/CUAIStudents/Qwen-Ar-GEC)**. It has a smaller on-disk footprint and requires significantly less GPU VRAM.

For usage examples, please refer to the original **[Qwen-Ar-GEC](https://huggingface.co/CUAIStudents/Qwen-Ar-GEC)** model card.
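To get a feel for the VRAM savings, here is a rough back-of-the-envelope estimate of weight memory at different precisions. The parameter count below is a hypothetical figure for illustration only (the actual size is listed on the original model card), and NF4 with quantization constants is approximated at roughly 4.5 bits per parameter:

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory for model weights in GiB (weights only,
    ignoring activations and the KV cache)."""
    return num_params * bits_per_param / 8 / 2**30

params = 7_000_000_000  # hypothetical 7B-parameter model, for illustration

bf16_gib = weight_memory_gib(params, 16)   # full bf16 weights
nf4_gib = weight_memory_gib(params, 4.5)   # NF4 incl. quantization constants (approx.)

print(f"bf16 weights: ~{bf16_gib:.1f} GiB")
print(f"4-bit NF4 weights: ~{nf4_gib:.1f} GiB")
```

The exact ratio depends on block size and whether double quantization is enabled, but 4-bit loading cuts weight memory to roughly a quarter to a third of the bf16 footprint.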
Both models are functionally identical, but when loading the 4-bit version you may need to include the following configuration:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization settings matching how the model was quantized
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "Abdo-Alshoki/qwen-ar-gec-v2-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```
⚠️ Note: The checkpoint is already quantized. Passing the configuration explicitly ensures it is loaded correctly and runs as expected.