---
library_name: transformers
tags: []
---
# Qwen-Ar-GEC-4bit
This is a 4-bit quantized version of **[Qwen-Ar-GEC](https://huggingface.co/CUAIStudents/Qwen-Ar-GEC)**.
It has a smaller memory footprint and is optimized for GPU VRAM efficiency.
For usage examples, please refer to the original **[Qwen-Ar-GEC](https://huggingface.co/CUAIStudents/Qwen-Ar-GEC)** model card.
Both models are functionally identical, but when loading the 4-bit version you may need to include the following configuration:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization for additional VRAM savings
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "Abdo-Alshoki/qwen-ar-gec-v2-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",  # place layers across available GPUs/CPU automatically
    torch_dtype=torch.bfloat16,
)
```
⚠️ Note: The model is already quantized. Including the configuration ensures it is loaded correctly and runs as expected.