Cydonia-24B-v4.3-AWQ

AWQ 4-bit quantization of TheDrummer/Cydonia-24B-v4.3.

Quantization Details

  • Method: AWQ (Activation-aware Weight Quantization)
  • Bits: 4-bit
  • Group size: 128
  • Version: GEMM
  • Zero point: True
  • Model size: ~14 GB (vs ~48 GB FP16)
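For intuition about what these settings mean, here is a minimal NumPy sketch of group-wise asymmetric (zero-point) 4-bit quantization. It is illustrative only: real AWQ additionally applies activation-aware per-channel scaling to protect salient weights before quantizing, which this sketch omits.

```python
import numpy as np

def quantize_group(w, bits=4):
    """Asymmetric quantization of one weight group with a zero point."""
    qmax = 2**bits - 1                          # 15 for 4-bit
    scale = (w.max() - w.min()) / qmax
    zero = np.round(-w.min() / scale)           # zero point maps w.min() -> 0
    q = np.clip(np.round(w / scale) + zero, 0, qmax).astype(np.int32)
    return q, scale, zero

def dequantize_group(q, scale, zero):
    return (q - zero) * scale

# With group size 128, each run of 128 weights in a row stores its own
# (scale, zero) pair alongside the packed 4-bit integers.
rng = np.random.default_rng(0)
w = rng.standard_normal(128).astype(np.float32)
q, s, z = quantize_group(w)
w_hat = dequantize_group(q, s, z)
```

Smaller groups give finer-grained scales (better accuracy) at the cost of more metadata; 128 is the common trade-off.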

Usage

Works with vLLM, Transformers, and other AWQ-compatible inference engines.

```shell
# vLLM (Marlin AWQ kernels; use --quantization awq on GPUs without Marlin support)
vllm serve Irvollo/Cydonia-24B-v4.3-AWQ --quantization awq_marlin --dtype float16
```

```python
# Transformers (requires the autoawq package to be installed)
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(
    "Irvollo/Cydonia-24B-v4.3-AWQ", device_map="auto"
)
```
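Once the vLLM server is running, it exposes an OpenAI-compatible HTTP endpoint. A minimal stdlib-only client sketch follows; the localhost URL and sampling parameters are assumptions to adjust for your deployment.

```python
import json
import urllib.request

# Assumed default address of the vLLM server started above.
API_URL = "http://localhost:8000/v1/chat/completions"

def build_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completion payload for the served model."""
    return {
        "model": "Irvollo/Cydonia-24B-v4.3-AWQ",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.8,
    }

def send(payload: dict) -> dict:
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Example (requires the server to be running):
# reply = send(build_request("Write a short scene set in a rainy city."))
# print(reply["choices"][0]["message"]["content"])
```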

Original Model

Cydonia v4.3 by TheDrummer — a Mistral Small 3.1 24B fine-tune optimized for roleplay and creative writing.

Hardware Requirements

  • Minimum: 16 GB VRAM (e.g., RTX 4080, RTX 4060 Ti 16 GB, RTX A4000) for short contexts
  • Recommended: 24 GB VRAM (e.g., RTX 3090/4090) for comfortable KV cache headroom
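The headroom figures can be sanity-checked with a back-of-envelope estimate: quantized weights plus the fp16 KV cache. The layer and head counts below are assumptions based on Mistral Small 3.1 24B's architecture (40 layers, 8 KV heads, head dim 128); verify them against the model's config.json.

```python
def kv_cache_bytes(tokens, layers=40, kv_heads=8, head_dim=128, dtype_bytes=2):
    """Per-request KV cache size: a K and a V tensor per layer, per token.

    Architecture numbers are assumptions for Mistral Small 3.1 24B;
    check config.json for the actual values.
    """
    return 2 * layers * kv_heads * head_dim * dtype_bytes * tokens

weights_gb = 14  # approximate 4-bit AWQ weight size from the card above
for ctx in (8_192, 32_768):
    total_gb = weights_gb + kv_cache_bytes(ctx) / 2**30
    print(f"{ctx:>6} tokens -> ~{total_gb:.1f} GB total")
```

Under these assumptions a 32k-token context adds roughly 5 GB of KV cache on top of the weights, which is why 24 GB is the comfortable tier.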