---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
tags:
- quantization
- neural-compressor
- qat
- quantization-aware-training
- qwen3
library_name: transformers
pipeline_tag: text-generation
---

# Qwen3-0.6B Quantized with QAT

This model is a quantized version of [`Qwen/Qwen3-0.6B`](https://huggingface.co/Qwen/Qwen3-0.6B), produced with **Quantization-Aware Training (QAT)** using Intel Neural Compressor.

## Model Details

- **Base Model**: Qwen/Qwen3-0.6B
- **Quantization Method**: Quantization-Aware Training (QAT)
- **Framework**: Intel Neural Compressor
- **Model Size**: smaller than the full-precision original (quantized weights use fewer bytes per parameter)
- **Performance**: QAT is designed to retain most of the base model's quality while improving efficiency

## Benefits

✅ **Smaller model size** - reduced storage requirements

✅ **Faster inference** - optimized for deployment

✅ **Lower memory usage** - more efficient resource utilization

✅ **Maintained quality** - QAT preserves model performance

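The storage savings can be estimated from the parameter count and the bytes used per weight. A back-of-envelope sketch, assuming an FP16 baseline and INT8 quantized weights (the card does not state the target precision, so treat these numbers as illustrative):

```python
# Rough storage estimate for a 0.6B-parameter model.
# Assumption: FP16 baseline (2 bytes/param) vs. INT8 weights
# (1 byte/param); actual savings depend on which layers are quantized.
PARAMS = 0.6e9

def size_gb(bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes."""
    return PARAMS * bytes_per_param / 1e9

print(f"FP16: {size_gb(2):.1f} GB")  # FP16: 1.2 GB
print(f"INT8: {size_gb(1):.1f} GB")  # INT8: 0.6 GB
```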
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model and its tokenizer
model = AutoModelForCausalLM.from_pretrained("Thomaschtl/qwen3-0.6b-qat-test")
tokenizer = AutoTokenizer.from_pretrained("Thomaschtl/qwen3-0.6b-qat-test")

# Generate text
prompt = "The future of AI is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Quantization Details

- **Training Method**: Quantization-Aware Training
- **Optimizer**: AdamW
- **Learning Rate**: 5e-5
- **Batch Size**: 2
- **Epochs**: 1 (demo configuration)

## Technical Info

This model was quantized using Intel Neural Compressor's QAT approach, which:

1. Simulates quantization during training (fake-quantization in the forward pass)
2. Allows the model weights to adapt to quantization error
3. Typically maintains better accuracy than post-training quantization

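The "simulates quantization" step above can be illustrated with a minimal fake-quantization round trip. This is a generic INT8 sketch, not Intel Neural Compressor's actual implementation:

```python
def fake_quant(x: float, scale: float, zero_point: int = 0,
               qmin: int = -128, qmax: int = 127) -> float:
    """Quantize-dequantize round trip: this is the rounding error the
    forward pass sees during QAT, so training can adapt to it."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))       # clamp to the INT8 range
    return (q - zero_point) * scale   # dequantize back to float

# A weight of 0.237 with scale 0.01 round-trips to ~0.24; during QAT
# the optimizer learns weights that tolerate this discretization.
print(fake_quant(0.237, scale=0.01))
```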
## Citation

If you use this model, please cite:

```bibtex
@misc{qwen3-qat,
  title={Qwen3-0.6B Quantized with QAT},
  author={Thomaschtl},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Thomaschtl/qwen3-0.6b-qat-test}
}
```

## License

This model is distributed under the same license as the base model (Apache 2.0).