Cydonia-24B-v4.3-AWQ

AWQ 4-bit quantization of TheDrummer/Cydonia-24B-v4.3.

Quantization Details

  • Method: AWQ (Activation-aware Weight Quantization)
  • Bits: 4-bit
  • Group size: 128
  • Version: GEMM
  • Zero point: True
  • Model size: ~14 GB (vs ~48 GB FP16)
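For intuition about what these settings mean, here is a minimal NumPy sketch of group-wise asymmetric (zero-point) 4-bit quantization. It is illustrative only: real AWQ additionally applies activation-aware per-channel scaling to protect salient weights before quantizing, which this sketch omits.

```python
import numpy as np

def quantize_group(w, bits=4):
    """Asymmetric quantization of one weight group with a zero point."""
    qmax = 2**bits - 1                          # 15 for 4-bit
    scale = (w.max() - w.min()) / qmax
    zero = np.round(-w.min() / scale)           # zero point maps w.min() -> 0
    q = np.clip(np.round(w / scale) + zero, 0, qmax).astype(np.int32)
    return q, scale, zero

def dequantize_group(q, scale, zero):
    return (q - zero) * scale

# With group size 128, each run of 128 weights in a row stores its own
# (scale, zero) pair alongside the packed 4-bit integers.
rng = np.random.default_rng(0)
w = rng.standard_normal(128).astype(np.float32)
q, s, z = quantize_group(w)
w_hat = dequantize_group(q, s, z)
```

Smaller groups give finer-grained scales (better accuracy) at the cost of more metadata; 128 is the common trade-off.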

Usage

Works with vLLM, Transformers, and other AWQ-compatible inference engines.

```shell
# vLLM (Marlin AWQ kernels; use --quantization awq on GPUs without Marlin support)
vllm serve Irvollo/Cydonia-24B-v4.3-AWQ --quantization awq_marlin --dtype float16
```

```python
# Transformers (requires the autoawq package to be installed)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "Irvollo/Cydonia-24B-v4.3-AWQ", device_map="auto"
)
```
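Once the vLLM server is running, it exposes an OpenAI-compatible HTTP endpoint. A minimal stdlib-only client sketch follows; the localhost URL and sampling parameters are assumptions to adjust for your deployment.

```python
import json
import urllib.request

# Assumed default address of the vLLM server started above.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload for the served model."""
    return {
        "model": "Irvollo/Cydonia-24B-v4.3-AWQ",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.8,
    }

def send(payload: dict) -> dict:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires the server to be running):
# reply = send(build_request("Write a short scene set in a rainy city."))
# print(reply["choices"][0]["message"]["content"])
```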

Original Model

Cydonia v4.3 by TheDrummer — a Mistral Small 3.1 24B fine-tune optimized for roleplay and creative writing.

Hardware Requirements

  • Minimum: 16 GB VRAM (e.g., RTX 4080, RTX 4060 Ti 16 GB, RTX A4000) for short contexts
  • Recommended: 24 GB VRAM (e.g., RTX 3090/4090) for comfortable KV cache headroom
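The headroom figures can be sanity-checked with a back-of-envelope estimate: quantized weights plus the fp16 KV cache. The layer and head counts below are assumptions based on Mistral Small 3.1 24B's architecture (40 layers, 8 KV heads, head dim 128); verify them against the model's config.json.

```python
def kv_cache_bytes(tokens, layers=40, kv_heads=8, head_dim=128, dtype_bytes=2):
    """Per-request KV cache size: a K and a V tensor per layer, per token.

    Architecture numbers are assumptions for Mistral Small 3.1 24B;
    check config.json for the actual values.
    """
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

weights_gb = 14  # approximate 4-bit AWQ weight size from the card above
for ctx in (8_192, 32_768):
    total_gb = weights_gb + kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>6} tokens -> ~{total_gb:.1f} GB total")
```

Under these assumptions a 32k-token context adds roughly 5 GB of KV cache on top of the weights, which is why 24 GB is the comfortable tier.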