# Cydonia-24B-v4.3-AWQ

AWQ 4-bit quantization of TheDrummer/Cydonia-24B-v4.3.
## Quantization Details
- Method: AWQ (Activation-aware Weight Quantization)
- Bits: 4-bit
- Group size: 128
- Version: GEMM
- Zero point: True
- Model size: ~14 GB (vs ~48 GB FP16)
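As a sanity check, the sizes above follow roughly from parameter count × bit width. The sketch below is an approximation only: it ignores the per-group scales and zero points that AWQ stores alongside the weights, as well as layers that typically stay unquantized (e.g. embeddings), which is why the real 4-bit checkpoint lands nearer 14 GB than 12 GB.

```python
# Back-of-the-envelope checkpoint size: params * bits / 8 bytes.
# Approximate -- excludes quantization metadata and unquantized layers.
def model_size_gb(params: float, bits: int) -> float:
    return params * bits / 8 / 1e9

fp16 = model_size_gb(24e9, 16)  # ~48 GB
awq4 = model_size_gb(24e9, 4)   # ~12 GB before quantization overhead
print(f"FP16: ~{fp16:.0f} GB, AWQ 4-bit: ~{awq4:.0f} GB")
```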
## Usage
Works with vLLM, Transformers, and other AWQ-compatible inference engines.
```bash
# vLLM
vllm serve Irvollo/Cydonia-24B-v4.3-AWQ --quantization awq_marlin --dtype float16
```

```python
# AutoAWQ
from awq import AutoAWQForCausalLM

model = AutoAWQForCausalLM.from_quantized("Irvollo/Cydonia-24B-v4.3-AWQ")
```
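Once `vllm serve` is running, the model is exposed through vLLM's OpenAI-compatible HTTP API (by default at `http://localhost:8000/v1/chat/completions`). A minimal request body looks like the sketch below; the prompt text is just an illustration, and the endpoint details assume the serve command above with default settings.

```python
import json

# Request body for vLLM's OpenAI-compatible /v1/chat/completions endpoint.
# The model name must match the repo passed to `vllm serve`.
payload = {
    "model": "Irvollo/Cydonia-24B-v4.3-AWQ",
    "messages": [{"role": "user", "content": "Write a one-line scene opener."}],
    "max_tokens": 64,
}
body = json.dumps(payload)
# Send with any HTTP client, e.g.:
#   curl http://localhost:8000/v1/chat/completions \
#     -H "Content-Type: application/json" -d "$body"
print(body)
```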
## Original Model
Cydonia v4.3 by TheDrummer — a Mistral Small 3.1 24B fine-tune optimized for roleplay and creative writing.
## Hardware Requirements
- Minimum: 16 GB VRAM (e.g. RTX 4080, RTX 4060 Ti 16 GB) for short contexts
- Recommended: 24 GB VRAM (e.g. RTX 4090) for comfortable KV cache headroom
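The 24 GB recommendation comes from the KV cache growing with context length on top of the ~14 GB of weights. The estimate below uses architecture numbers assumed from Mistral Small's published config (40 layers, 8 KV heads, head dimension 128, FP16 cache); check them against the model's `config.json` before relying on the result.

```python
# Rough FP16 KV-cache footprint for a grouped-query-attention model.
# Layer/head counts are assumptions for Mistral Small 3.1 24B, not verified.
def kv_cache_gb(tokens: int, layers: int = 40, kv_heads: int = 8,
                head_dim: int = 128, bytes_per_elem: int = 2) -> float:
    # Both keys and values are cached per layer, hence the factor of 2.
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens / 1e9

print(f"32k-token context: ~{kv_cache_gb(32_768):.1f} GB of KV cache")
```

Under these assumptions a 32k-token context adds roughly 5 GB, which together with the weights sits comfortably inside 24 GB but is tight on a 16 GB card.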
## Model tree for tacodevs/Cydonia-24B-v4.3-AWQ

- Base model: mistralai/Mistral-Small-3.1-24B-Base-2503
- Fine-tune: TheDrummer/Cydonia-24B-v4.3