|
|
--- |
|
|
library_name: transformers |
|
|
tags: [] |
|
|
--- |
|
|
|
|
|
# Qwen-Ar-GEC-4bit |
|
|
|
|
|
|
|
|
|
|
This is a quantized version of **[Qwen-Ar-GEC](https://huggingface.co/CUAIStudents/Qwen-Ar-GEC)**. |
|
|
It has a smaller on-disk footprint and requires significantly less GPU VRAM at inference time.
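As a rough illustration of the VRAM savings, the sketch below estimates the weight memory for bfloat16 versus 4-bit NF4 weights. The 7B parameter count is a placeholder assumption for illustration only, not the actual size of Qwen-Ar-GEC:

```python
# Back-of-envelope estimate of weight-memory savings from 4-bit quantization.
# The parameter count below is a hypothetical placeholder; substitute the
# real count for your model.
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

params = 7_000_000_000  # hypothetical 7B-parameter model

bf16 = weight_memory_gib(params, 16)   # full bfloat16 weights
nf4 = weight_memory_gib(params, 4.5)   # NF4 ~4 bits plus quantization overhead

print(f"bf16: {bf16:.1f} GiB, 4-bit: {nf4:.1f} GiB (~{bf16 / nf4:.1f}x smaller)")
```

Note this covers weights only; activations, the KV cache, and CUDA overhead add to the total at runtime.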
|
|
|
|
|
For usage examples, please refer to the original **[Qwen-Ar-GEC](https://huggingface.co/CUAIStudents/Qwen-Ar-GEC)** model card. |
|
|
Both models are functionally identical, but when loading the 4-bit version you may need to include the following configuration: |
|
|
|
|
|
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "Abdo-Alshoki/qwen-ar-gec-v2-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```
|
|
|
|
|
⚠️ Note: The model weights are already quantized. Passing this configuration does not re-quantize them; it ensures the checkpoint is loaded correctly and runs as expected.