# Mistral 7B Instruct v0.3 — GPTQ 4-bit
Self-quantized GPTQ 4-bit checkpoint of mistralai/Mistral-7B-Instruct-v0.3 with fully documented calibration provenance.
Created as part of the Banterhearts research program investigating quality-safety correlation under quantization for consumer LLM deployment.
| Property | Value |
|---|---|
| Base model | mistralai/Mistral-7B-Instruct-v0.3 |
| Parameters | 7.24B |
| Architecture | GQA, 32 layers, 32 heads, 8 KV heads |
| Quantization | GPTQ 4-bit, group_size=128 |
| Model size | 3.9 GB |
| VRAM required | ~5 GB (inference) |
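The reported checkpoint size is roughly what back-of-the-envelope arithmetic predicts. A minimal sketch, assuming fp16 scales per group of 128 weights and embedding/output layers kept in fp16 (~0.27B parameters for Mistral's 32768-token vocabulary and 4096 hidden size — these splits are estimates, not from the model card):

```python
# Rough size estimate for a 4-bit GPTQ checkpoint of a 7.24B-parameter model.
params = 7.24e9
fp16_params = 0.27e9                  # assumed unquantized embeddings / lm_head
quantized = params - fp16_params      # weights packed at 4 bits
packed_bytes = quantized * 4 / 8      # 4 bits per weight
scale_bytes = (quantized / 128) * 2   # one fp16 scale per group of 128 weights
fp16_bytes = fp16_params * 2
total_gib = (packed_bytes + scale_bytes + fp16_bytes) / 2**30
print(f"~{total_gib:.1f} GiB")        # close to the reported 3.9 GB on disk
```

The ~5 GB VRAM figure is then the weights plus KV cache and activation workspace at inference time.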
## Quantization Details
| Parameter | Value |
|---|---|
| Method | GPTQ |
| Tool | gptqmodel |
| Bits | 4 |
| Group size | 128 |
| Scheme | Symmetric (4-bit, INT32 packing) |
| Calibration dataset | allenai/c4 (en, shard 1 of 1024) |
| Calibration samples | 128 |
| Seed | 42 |
| Quantization time | 542s |
| Hardware | RunPod RTX 6000 Ada (48 GB) |
## Why Self-Quantized?
Pre-quantized checkpoints on HuggingFace typically have unknown calibration provenance — the dataset, sample count, seed, and group size are rarely documented. This checkpoint was self-quantized with controlled, documented settings to enable rigorous cross-method comparison (GGUF k-quant vs AWQ vs GPTQ) in a NeurIPS 2026 submission on quality-safety correlation under quantization.
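To make that cross-method comparison mechanical, the documented settings can be kept as a machine-readable provenance record alongside each checkpoint. A minimal sketch — the schema and field names are illustrative, only the values come from the tables above:

```python
import json

# Calibration provenance for this checkpoint, mirroring the Quantization
# Details table. The schema is an example, not a standard format.
provenance = {
    "method": "gptq",
    "tool": "gptqmodel",
    "bits": 4,
    "group_size": 128,
    "symmetric": True,
    "calibration": {
        "dataset": "allenai/c4",
        "config": "en",
        "shard": "1 of 1024",
        "num_samples": 128,
        "seed": 42,
    },
}
print(json.dumps(provenance, indent=2))
```

Keeping the same record structure for the GGUF and AWQ variants lets a comparison script verify that only the quantization method differs between checkpoints.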
## Evaluation Results
Evaluation pending — quality and safety benchmarks will be run on this checkpoint and results updated here.
## Other Quantization Formats
| Format | Repository |
|---|---|
| Original FP16 | mistralai/Mistral-7B-Instruct-v0.3 |
| AWQ 4-bit | Crusadersk/mistral-7b-awq-4bit |
## Prompt Template
```
[INST] {prompt} [/INST]
```
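For a single-turn prompt, this template amounts to plain string formatting; the tokenizer's chat template (used in the Usage section below) produces the same text plus special-token handling such as the BOS token:

```python
def build_prompt(user_message: str) -> str:
    # Single-turn Mistral instruct format, matching the template above.
    return f"[INST] {user_message} [/INST]"

print(build_prompt("What is the capital of France?"))
# [INST] What is the capital of France? [/INST]
```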
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# GPTQ kernels require gptqmodel (Linux); device_map="auto" requires accelerate.
model = AutoModelForCausalLM.from_pretrained(
    "Crusadersk/mistral-7b-gptq-4bit",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Crusadersk/mistral-7b-gptq-4bit")

# Build the [INST] ... [/INST] prompt via the tokenizer's chat template.
messages = [{"role": "user", "content": "What is the capital of France?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
Inference requirements: `pip install gptqmodel` (Linux only), or `optimum` + `auto-gptq`.

Windows users: GPTQ inference requires `gptqmodel`, which only builds on Linux. Use Docker or WSL2.
## Compatibility
| Framework | Supported |
|---|---|
| Transformers | Yes |
| vLLM | Yes (GPTQ backend) |
| llama.cpp | No (use GGUF format instead) |
| Ollama | No (use GGUF format instead) |
| Windows (native) | No — requires Linux/Docker |
## Reproduction
The full quantization pipeline — Dockerfiles, quantization scripts, and a 766-line engineering log documenting every platform failure and its solution — is available at `research/tr142/expansion/` in the Banterhearts repository.
## Citation
```bibtex
@misc{banterhearts2026mistral7bgptq,
  title  = {Self-Quantized Mistral 7B Instruct v0.3 (GPTQ 4-bit) for Quality-Safety Correlation Research},
  author = {Kadadekar, Sahil},
  year   = {2026},
  url    = {https://huggingface.co/Crusadersk/mistral-7b-gptq-4bit},
  note   = {Part of the Banterhearts research program. NeurIPS 2026 submission.}
}
```
## Acknowledgments
This work is part of a 40-TR research program on consumer LLM deployment safety, conducted independently as pre-doctoral research. Full program details at github.com/Sahil170595/Banterhearts.