---
license: apache-2.0
base_model: Qwen/Qwen3-0.6B
tags:
- quantization
- neural-compressor
- qat
- quantization-aware-training
- qwen3
library_name: transformers
pipeline_tag: text-generation
---

# Qwen3-0.6B Quantized with QAT

This model is a quantized version of `Qwen/Qwen3-0.6B` using **Quantization Aware Training (QAT)** with Intel Neural Compressor.

## 🚀 Model Details

- **Base Model**: Qwen/Qwen3-0.6B
- **Quantization Method**: Quantization Aware Training (QAT)
- **Framework**: Intel Neural Compressor
- **Model Size**: Significantly reduced from the original
- **Performance**: Maintains quality while improving efficiency

## 📊 Benefits

- ✅ **Smaller model size** - Reduced storage requirements
- ✅ **Faster inference** - Optimized for deployment
- ✅ **Lower memory usage** - More efficient resource utilization (see the quick check after this list)
- ✅ **Maintained quality** - QAT preserves model performance

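A minimal sketch for sanity-checking the size and memory claims yourself. It loads both checkpoints (expect downloads on first run), and what it reports depends on the dtype the published checkpoint actually stores:

```python
from transformers import AutoModelForCausalLM

def param_megabytes(model) -> float:
    """Total parameter memory in MB (numel x bytes per element)."""
    return sum(p.numel() * p.element_size() for p in model.parameters()) / 1e6

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
quant = AutoModelForCausalLM.from_pretrained("Thomaschtl/qwen3-0.6b-qat-test")

print(f"base:      {param_megabytes(base):.1f} MB")
print(f"quantized: {param_megabytes(quant):.1f} MB")
```
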
## 💻 Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the quantized model and its tokenizer
model = AutoModelForCausalLM.from_pretrained("Thomaschtl/qwen3-0.6b-qat-test")
tokenizer = AutoTokenizer.from_pretrained("Thomaschtl/qwen3-0.6b-qat-test")

# Generate text (max_new_tokens bounds the completion, not the prompt)
prompt = "The future of AI is"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100, do_sample=True, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## ⚙️ Quantization Details

- **Training Method**: Quantization Aware Training
- **Optimizer**: AdamW
- **Learning Rate**: 5e-5
- **Batch Size**: 2
- **Epochs**: 1 (demo configuration; see the sketch after this list)

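For reference, a minimal sketch of how these hyperparameters map onto Intel Neural Compressor's QAT workflow (`QuantizationAwareTrainingConfig` plus `prepare_compression`, following the neural-compressor 2.x documentation; verify against your installed version). The training texts below are toy placeholders, since the actual training data for this model is not documented here:

```python
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer
from neural_compressor import QuantizationAwareTrainingConfig
from neural_compressor.training import prepare_compression

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Toy stand-in corpus; the real training set is not documented on this card.
texts = ["The future of AI is", "Quantization aware training lets weights adapt"] * 4
enc = tokenizer(texts, return_tensors="pt", padding=True)

conf = QuantizationAwareTrainingConfig()        # default QAT settings
compression_manager = prepare_compression(model, conf)
compression_manager.callbacks.on_train_begin()  # inserts fake-quant ops
model = compression_manager.model

optimizer = AdamW(model.parameters(), lr=5e-5)  # values from the list above

model.train()
for epoch in range(1):                          # 1 epoch (demo configuration)
    for i in range(0, len(texts), 2):           # batch size 2
        batch = {k: v[i : i + 2] for k, v in enc.items()}
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

compression_manager.callbacks.on_train_end()    # finalizes the quantized model
```
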
## 🔧 Technical Info

This model was quantized using Intel Neural Compressor's QAT approach, which:

1. Simulates quantization during training (see the illustration below)
2. Allows model weights to adapt to quantization error
3. Maintains better accuracy than post-training quantization

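To make step 1 concrete, here is a generic fake-quantization illustration in plain PyTorch, using a straight-through estimator. It shows the idea only and is not the exact op Neural Compressor inserts:

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Forward pass: round x onto a num_bits integer grid.
    Backward pass: let gradients flow straight through, so the
    weights can keep adapting to the rounding error."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max().clamp(min=1e-8) / qmax
    q = (x / scale).round().clamp(-qmax - 1, qmax) * scale
    # Straight-through estimator: q in forward, identity in backward.
    return x + (q - x).detach()

w = torch.randn(4, 4, requires_grad=True)
fake_quantize(w).sum().backward()
print(w.grad)  # all ones: the rounding did not block the gradient
```
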
## 📝 Citation

If you use this model, please cite:

```bibtex
@misc{qwen3-qat,
  title={Qwen3-0.6B Quantized with QAT},
  author={Thomaschtl},
  year={2025},
  publisher={Hugging Face},
  url={https://huggingface.co/Thomaschtl/qwen3-0.6b-qat-test}
}
```

## ⚖️ License

This model follows the same license as the base model (Apache 2.0).