# Qwen2.5 - Internal Audit Q&A (Quantized GGUF)
This repository contains quantized GGUF-format variants of a fine-tuned Qwen 2.5 model, specialized for question answering (Q&A) on internal audit data.
These models are optimized for efficient deployment in environments using llama.cpp, llama-cpp-python, or compatible inference servers (e.g., llama-server, text-generation-webui).
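As a minimal sketch of such a deployment, the snippet below loads one of the quantized variants with llama-cpp-python and asks an audit question. The filename, system prompt, and context size are illustrative assumptions; `chat_format="chatml"` matches the ChatML structure described later in this card.

```python
# Sketch: querying the model via llama-cpp-python (pip install llama-cpp-python).
# model_path and the system prompt are assumptions; pick any variant from the
# table below that fits your RAM budget.

def build_messages(question: str) -> list[dict]:
    """Construct a chat-style request for the audit Q&A model."""
    return [
        {"role": "system", "content": "You are an internal audit assistant."},
        {"role": "user", "content": question},
    ]

def answer(question: str, model_path: str = "model-Q4_K_M.gguf") -> str:
    # Imported lazily: requires the GGUF file to be downloaded locally.
    from llama_cpp import Llama

    llm = Llama(model_path=model_path, n_ctx=4096, chat_format="chatml")
    result = llm.create_chat_completion(messages=build_messages(question))
    return result["choices"][0]["message"]["content"]
```

The same messages list can be sent unchanged to an OpenAI-compatible endpoint exposed by `llama-server`.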
## Fine-Tuning Overview
- Base Model: Qwen2.5 7B
- Fine-Tuning Task: Instruction-based Q&A on internal audit reports, policies, and compliance logs
- Training Data: ~100k entries from anonymized internal audit datasets (private & proprietary)
- Format: Chat-style instruction tuning with questions and detailed answers
## Quantized Variants
| Filename | Quantization | Description |
|---|---|---|
| model-Q3_K_M.gguf | Q3_K_M | 3-bit quantization, low memory footprint |
| model-Q4_K_M.gguf | Q4_K_M | 4-bit, good performance and efficiency |
| model-Q5_K_M.gguf | Q5_K_M | 5-bit, balance between performance and quality |
| model-Q6_K.gguf | Q6_K | 6-bit, high quality, higher RAM usage |
| model-Q8_0.gguf | Q8_0 | 8-bit, near original model fidelity |
| model-fp16.gguf | FP16 | Full precision, highest quality, requires GPU |
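To pick a variant, a rough file-size estimate helps: bytes are approximately parameter count times bits per weight divided by 8. The K-quant formats mix block sizes internally, so real file sizes differ somewhat; treat this back-of-the-envelope helper as order-of-magnitude guidance only.

```python
# Rough GGUF file-size estimate for a 7B-parameter model.
# K-quants (Q3_K_M, Q4_K_M, ...) use mixed bit widths per block,
# so actual files will deviate from these approximations.

PARAMS_7B = 7_000_000_000

def approx_size_gb(bits_per_weight: float, params: int = PARAMS_7B) -> float:
    """Approximate model file size in GiB for a given bit width."""
    return params * bits_per_weight / 8 / 1024**3
```

For example, `approx_size_gb(4)` gives roughly 3.3 GiB and `approx_size_gb(16)` roughly 13 GiB, which is why the FP16 file is recommended only where ample memory is available.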
## ChatML Format

### Token structure
Each message in the conversation is wrapped like this:

```text
<|im_start|>{role}
{message content}
<|im_end|>
```
- `{role}` is usually `system`, `user`, or `assistant`
- The `<|im_start|>` and `<|im_end|>` tokens clearly define message boundaries, letting the model interpret dialogue turns
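The wrapping described above can be sketched as a small prompt-builder, useful when calling raw completion APIs that do not apply a chat template for you (the function name is an illustrative assumption):

```python
# Minimal ChatML prompt builder: wraps each message in <|im_start|>/<|im_end|>
# tokens and leaves an open assistant turn for the model to complete.

IM_START, IM_END = "<|im_start|>", "<|im_end|>"

def format_chatml(messages: list[dict]) -> str:
    """Render a list of {'role', 'content'} dicts as a ChatML prompt string."""
    parts = [
        f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n"
        for m in messages
    ]
    # Open the assistant turn; generation should stop at <|im_end|>.
    parts.append(f"{IM_START}assistant\n")
    return "".join(parts)
```

When using this with a raw completion endpoint, set `<|im_end|>` as a stop sequence so the model does not run past its own turn.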