---
library_name: transformers
tags: []
---
# Qwen-Ar-GEC-4bit
This is a 4-bit quantized version of **[Qwen-Ar-GEC](https://huggingface.co/CUAIStudents/Qwen-Ar-GEC)**. It has a smaller on-disk footprint and requires significantly less GPU VRAM.

For usage examples, please refer to the original **[Qwen-Ar-GEC](https://huggingface.co/CUAIStudents/Qwen-Ar-GEC)** model card.
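To get a feel for the VRAM savings, here is a rough back-of-the-envelope estimate of weight memory at different precisions. The parameter count below is a hypothetical figure for illustration only (the actual size is listed on the original model card), and NF4 with quantization constants is approximated at roughly 4.5 bits per parameter:

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory for model weights in GiB (weights only,
    ignoring activations and the KV cache)."""
    return num_params * bits_per_param / 8 / 2**30

params = 7_000_000_000  # hypothetical 7B-parameter model, for illustration

bf16_gib = weight_memory_gib(params, 16)   # full bf16 weights
nf4_gib = weight_memory_gib(params, 4.5)   # NF4 incl. quantization constants (approx.)

print(f"bf16 weights: ~{bf16_gib:.1f} GiB")
print(f"4-bit NF4 weights: ~{nf4_gib:.1f} GiB")
```

The exact ratio depends on block size and whether double quantization is enabled, but 4-bit loading cuts weight memory to roughly a quarter to a third of the bf16 footprint.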
Both models are functionally identical, but when loading the 4-bit version you may need to include the following configuration:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization settings matching how the model was quantized
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_name = "Abdo-Alshoki/qwen-ar-gec-v2-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)
```
⚠️ Note: The checkpoint is already quantized. Passing the configuration explicitly ensures it is loaded correctly and runs as expected.