Qwen2.5 – Internal Audit Q&A (Quantized GGUF)

This repository contains quantized GGUF-format variants of a fine-tuned Qwen2.5 model, specialized for question answering (Q&A) on internal audit data.

These models are optimized for efficient deployment in environments using llama.cpp, llama-cpp-python, or compatible inference servers (e.g., llama-server, text-generation-webui).
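
For example, a minimal sketch of loading one of the quantized variants with llama-cpp-python (the file name, context size, and messages are placeholders; adjust them to the variant and hardware you use):

```python
# Minimal sketch: load a quantized variant with llama-cpp-python and ask a question.
# File name and settings are placeholders; any variant from the table below works the same way.
from llama_cpp import Llama

llm = Llama(
    model_path="model-Q4_K_M.gguf",  # placeholder: path to the downloaded GGUF file
    n_ctx=4096,                      # placeholder context window
    chat_format="chatml",            # Qwen2.5 models use the ChatML template (see below)
)

response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are an internal audit Q&A assistant."},
        {"role": "user", "content": "Summarize the key findings of the procurement audit."},
    ],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])
```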

Fine-Tuning Overview

  • Base Model: Qwen/Qwen2.5-7B
  • Fine-Tuning Task: Instruction-based Q&A on internal audit reports, policies, and compliance logs
  • Training Data: ~100k entries from anonymized internal audit datasets (private & proprietary)
  • Format: Chat-style instruction tuning with questions and detailed answers (see the sketch after this list)
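
The training records themselves are private, but a purely hypothetical entry in this chat-style format could look like the sketch below (the question and answer are invented for illustration only):

```python
# Hypothetical training record in chat-message form; the content is invented
# for illustration and does not come from the proprietary dataset.
example_record = {
    "messages": [
        {"role": "system", "content": "You answer questions about internal audit reports."},
        {"role": "user", "content": "What control weakness was noted in vendor onboarding?"},
        {"role": "assistant", "content": "The audit found that vendor background checks were not "
                                         "consistently documented before contract approval and "
                                         "recommended adding a mandatory checklist step."},
    ]
}
```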

๐Ÿ—ƒ๏ธ Quantized Variants

| Filename | Quantization | Description |
|---|---|---|
| model-Q3_K_M.gguf | Q3_K_M | 3-bit quantization, low memory footprint |
| model-Q4_K_M.gguf | Q4_K_M | 4-bit, good balance of performance and efficiency |
| model-Q5_K_M.gguf | Q5_K_M | 5-bit, balance between performance and quality |
| model-Q6_K.gguf | Q6_K | 6-bit, high quality, higher RAM usage |
| model-Q8_0.gguf | Q8_0 | 8-bit, near-original model fidelity |
| model-fp16.gguf | FP16 | 16-bit (unquantized), highest quality; GPU recommended |
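
As a rough guide for choosing a variant, the sketch below estimates the weight footprint from nominal bits per weight. This is only an approximation: k-quants mix block precisions (so real files run slightly larger), GGUF files carry metadata, and the KV cache and runtime overhead come on top.

```python
# Back-of-the-envelope memory estimate per quantization level.
# Uses nominal bits per weight only; actual GGUF sizes are somewhat larger.
N_PARAMS = 7.6e9  # approximate parameter count of Qwen2.5-7B (assumption)

nominal_bits = {"Q3_K_M": 3, "Q4_K_M": 4, "Q5_K_M": 5, "Q6_K": 6, "Q8_0": 8, "FP16": 16}

for name, bits in nominal_bits.items():
    gib = N_PARAMS * bits / 8 / 1024**3
    print(f"{name}: ~{gib:.1f} GiB of weights, plus KV cache and runtime overhead")
```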

ChatML Format

Token structure

Each message in the conversation is wrapped like this:

<|im_start|>{role}
{message content}
<|im_end|>
  • {role} is usually system, user, or assistant
  • These <|im_start|> / <|im_end|> delimiters mark message boundaries so the model can interpret dialogue turns
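
When calling the raw completion API instead of the chat helper shown earlier, the prompt can be assembled by hand. A minimal sketch (the messages are placeholders):

```python
# Assemble a ChatML prompt string from a list of messages.
# The trailing "<|im_start|>assistant\n" cues the model to generate the answer.
def build_chatml_prompt(messages):
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You answer questions about internal audit reports."},
    {"role": "user", "content": "Which departments were covered in the latest compliance review?"},
])

# With llama-cpp-python, pass the string to the raw completion call and stop at the end token:
# output = llm(prompt, max_tokens=512, stop=["<|im_end|>"])
```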