---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
tags:
- quantization
- neural-compressor
- qat
- quantization-aware-training
- qwen3
library_name: transformers
pipeline_tag: text-generation
---

# Qwen3-0.6B Quantized with QAT

This model is a quantized version of [`Qwen/Qwen3-0.6B`](https://huggingface.co/Qwen/Qwen3-0.6B), produced with **Quantization-Aware Training (QAT)** using Intel Neural Compressor.

## Model Details

- **Base Model**: Qwen/Qwen3-0.6B
- **Quantization Method**: Quantization-Aware Training (QAT)
- **Framework**: Intel Neural Compressor
- **Model Size**: smaller than the full-precision original (quantized weights use fewer bytes per parameter)
- **Performance**: QAT is designed to retain most of the base model's quality while improving efficiency

## Benefits

✅ **Smaller model size** - reduced storage requirements

✅ **Faster inference** - optimized for deployment

✅ **Lower memory usage** - more efficient resource utilization

✅ **Maintained quality** - QAT preserves model performance

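The storage savings can be estimated from the parameter count and the bytes used per weight. A back-of-envelope sketch, assuming an FP16 baseline and INT8 quantized weights (the card does not state the target precision, so treat these numbers as illustrative):

```python
# Rough storage estimate for a 0.6B-parameter model.
# Assumption: FP16 baseline (2 bytes/param) vs. INT8 weights
# (1 byte/param); actual savings depend on which layers are quantized.
PARAMS = 0.6e9

def size_gb(bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes."""
    return PARAMS * bytes_per_param / 1e9

print(f"FP16: {size_gb(2):.1f} GB")  # FP16: 1.2 GB
print(f"INT8: {size_gb(1):.1f} GB")  # INT8: 0.6 GB
```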
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model and its tokenizer
model = AutoModelForCausalLM.from_pretrained("Thomaschtl/qwen3-0.6b-qat-test")
tokenizer = AutoTokenizer.from_pretrained("Thomaschtl/qwen3-0.6b-qat-test")

# Generate text
prompt = "The future of AI is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Quantization Details

- **Training Method**: Quantization-Aware Training
- **Optimizer**: AdamW
- **Learning Rate**: 5e-5
- **Batch Size**: 2
- **Epochs**: 1 (demo configuration)

## Technical Info

This model was quantized using Intel Neural Compressor's QAT approach, which:

1. Simulates quantization during training (fake-quantization in the forward pass)
2. Allows the model weights to adapt to quantization error
3. Typically maintains better accuracy than post-training quantization

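The "simulates quantization" step above can be illustrated with a minimal fake-quantization round trip. This is a generic INT8 sketch, not Intel Neural Compressor's actual implementation:

```python
def fake_quant(x: float, scale: float, zero_point: int = 0,
               qmin: int = -128, qmax: int = 127) -> float:
    """Quantize-dequantize round trip: this is the rounding error the
    forward pass sees during QAT, so training can adapt to it."""
    q = round(x / scale) + zero_point
    q = max(qmin, min(qmax, q))       # clamp to the INT8 range
    return (q - zero_point) * scale   # dequantize back to float

# A weight of 0.237 with scale 0.01 round-trips to ~0.24; during QAT
# the optimizer learns weights that tolerate this discretization.
print(fake_quant(0.237, scale=0.01))
```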
## Citation

If you use this model, please cite:

```bibtex
@misc{qwen3-qat,
  title={Qwen3-0.6B Quantized with QAT},
  author={Thomaschtl},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Thomaschtl/qwen3-0.6b-qat-test}
}
```

## License

This model is distributed under the same license as the base model (Apache 2.0).