# Qwen2.5-0.5B - AWQ Quantized Model

This is a 4-bit AWQ-quantized version of Qwen/Qwen2.5-0.5B, optimized for GPU serving.
## Model Details

- Base model: Qwen/Qwen2.5-0.5B
- Quantization: AWQ, 4-bit
- Compatible frameworks: vLLM, TGI, Transformers
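As a rough sense of what 4-bit quantization buys: a 0.5B-parameter model at FP16 needs about 1 GB just for weights, while 4-bit weights need about a quarter of that. This back-of-envelope sketch ignores quantization scales/zero-points and any layers left unquantized, so treat the numbers as lower bounds rather than the exact checkpoint size.

```python
def weight_bytes(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights alone (no activations, no KV cache)."""
    return n_params * bits_per_weight / 8

params = 0.5e9  # parameter count taken from the model name, not the checkpoint

fp16_gb = weight_bytes(params, 16) / 1e9  # ~1.0 GB
awq4_gb = weight_bytes(params, 4) / 1e9   # ~0.25 GB
print(f"FP16 weights: {fp16_gb:.2f} GB, AWQ 4-bit weights: {awq4_gb:.2f} GB")
```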
## Usage with vLLM

```shell
vllm serve BondingAI/Qwen2.5-0.5B-awq-4bit
```
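Once the server is up, `vllm serve` exposes an OpenAI-compatible HTTP API (by default on `http://localhost:8000/v1`). The sketch below builds and sends a request to the `/v1/completions` endpoint using only the standard library; the host, port, and sampling parameters are assumptions to adjust for your deployment.

```python
import json
import urllib.request

def build_completion_request(prompt: str,
                             model: str = "BondingAI/Qwen2.5-0.5B-awq-4bit") -> dict:
    # Payload shape for the OpenAI-compatible /v1/completions endpoint.
    return {"model": model, "prompt": prompt, "max_tokens": 64, "temperature": 0.7}

def query(url: str, payload: dict) -> dict:
    # POST the JSON payload and decode the JSON response.
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires the server started above to be running):
# result = query("http://localhost:8000/v1/completions",
#                build_completion_request("Hello, my name is"))
# print(result["choices"][0]["text"])
```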
## Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("BondingAI/Qwen2.5-0.5B-awq-4bit", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("BondingAI/Qwen2.5-0.5B-awq-4bit")

# Generate a short completion
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## License

Please refer to the original Qwen/Qwen2.5-0.5B model card for licensing information.