CyberSecQwen-4B-AWQ

4-bit AWQ quantized version of CyberSecQwen-4B.

Quantization

Parameter Value
Method AWQ (group_size=128, zero_point=True)
Weight precision 4-bit
Compute dtype float16
Calibration samples 320 CTI-Bench prompts (256 RCM + 64 MCQ, chat-template formatted)
Quantization tool autoawq
Calibration hardware Modal A100

CTI-Bench Evaluation

Evaluated under the Foundation-Sec-8B protocol:

  • Temperature 0.3, max_tokens 512, concurrency 32
  • 5 independent trials, zero-shot (no system prompt)
  • vLLM v0.20.1 with awq_marlin kernel on Modal L4 GPU
Task AWQ 4-bit GGUF Q4_K_M FP16 Reference
CTI-MCQ (2,500 items) 0.5921 ± 0.0083 0.5368 ± 0.0048 0.5868 ± 0.0029
CTI-RCM (1,000 items) 0.5814 ± 0.0025 0.6254 ± 0.0063 0.6664 ± 0.0023

Key findings:

  • CTI-MCQ: AWQ 4-bit matches or slightly exceeds FP16 performance (+0.5 points). Better than GGUF Q4_K_M.
  • CTI-RCM: AWQ 4-bit degrades by 8.5 percentage points vs FP16. GGUF Q4_K_M does better on this task (-4.1 pts).
  • AWQ is best for MCQ (general language), GGUF is best for RCM (task-specific classification).

Trial results

CTI-MCQ

Trial Seed Accuracy
1 42 0.6016
2 43 0.5984
3 44 0.5936
4 45 0.5780
5 46 0.5888

CTI-MCQ

Trial Seed Accuracy
1 42 0.6016
2 43 0.5984
3 44 0.5936
4 45 0.5780
5 46 0.5888

CTI-RCM

Trial Seed Accuracy
1 42 0.5790
2 43 0.5830
3 44 0.5790
4 45 0.5840
5 46 0.5820

Quantization variants

Variant CTI-MCQ CTI-RCM Size Engine
AWQ 4-bit 0.5921 0.5814 2.7 GB vLLM
GGUF Q4_K_M 0.5368 0.6254 2.5 GB llama.cpp

Choose AWQ for MCQ/general chat, GGUF for vulnerability classification.

Usage with vLLM

vllm serve ree2raz/CyberSecQwen-4B-AWQ --quantization awq_marlin --dtype float16

Model Size

Format Size
Original FP16 ~8 GB
AWQ 4-bit ~2.7 GB

Citation

@misc{{cybersecqwen2026,
  title  = {{CyberSecQwen-4B: A Compact CTI Specialist Fine-Tuned from Qwen3-4B-Instruct-2507 on AMD MI300X}},
  author = {{Mulia, Samuel}},
  year   = {{2026}},
  publisher = {{Hugging Face}},
  url    = {{https://huggingface.co/athena129/CyberSecQwen-4B}}
}}

Evaluation Infrastructure

GitHub repository — Modal scripts for quantization + evaluation.

Downloads last month
162
Safetensors
Model size
4B params
Tensor type
I32
·
F16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for ree2raz/CyberSecQwen-4B-AWQ

Paper for ree2raz/CyberSecQwen-4B-AWQ