# Llama 3.2 1B Instruct — AWQ 4-bit
Self-quantized AWQ 4-bit checkpoint of meta-llama/Llama-3.2-1B-Instruct with fully documented calibration provenance.
Created as part of the Banterhearts research program investigating quality-safety correlation under quantization for consumer LLM deployment.
| Field | Value |
|---|---|
| Base model | meta-llama/Llama-3.2-1B-Instruct |
| Parameters | 1.24B |
| Architecture | GQA, 22 layers |
| Quantization | AWQ 4-bit, group_size=128 |
| Model size | 1.0 GB |
| VRAM required | ~1.5 GB (inference) |
## Quantization Details
| Parameter | Value |
|---|---|
| Method | AWQ |
| Tool | llmcompressor 0.10.0.1 |
| Bits | 4 |
| Group size | 128 |
| Scheme | W4A16_ASYM (asymmetric weights, FP16 activations) |
| Calibration dataset | Salesforce/wikitext (wikitext-103-raw-v1) |
| Calibration samples | 128 |
| Seed | 42 |
| Quantization time | 2622 s (~44 min) |
| Hardware | NVIDIA RTX 4080 Laptop (12 GB) via Docker |
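The group_size=128 setting determines the per-weight storage overhead. A back-of-envelope sketch of the effective bits per weight, assuming each 128-weight group stores one FP16 scale and one 4-bit zero-point (an illustrative layout, not necessarily the exact on-disk format used by compressed-tensors):

```python
# Back-of-envelope footprint for W4A16_ASYM with group_size=128.
# Assumption (illustrative): each 128-weight group stores one FP16
# scale and one 4-bit asymmetric zero-point alongside the 4-bit weights.
GROUP_SIZE = 128
BITS_PER_WEIGHT = 4
SCALE_BITS = 16   # FP16 scale per group
ZERO_BITS = 4     # asymmetric zero-point per group

effective_bpw = BITS_PER_WEIGHT + (SCALE_BITS + ZERO_BITS) / GROUP_SIZE
print(f"effective bits/weight: {effective_bpw:.3f}")  # ~4.156

# Rough footprint if all 1.24B parameters were quantized; in practice
# embeddings and lm_head usually stay FP16, so the real checkpoint
# (1.0 GB here) is larger than this lower bound.
params = 1.24e9
weight_gib = params * effective_bpw / 8 / 1024**3
print(f"4-bit weight footprint: {weight_gib:.2f} GiB")
```

This is why the 1.0 GB checkpoint is noticeably larger than a naive "4 bits x 1.24B" estimate: group metadata plus unquantized FP16 layers account for the difference.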
## Why Self-Quantized?
Pre-quantized checkpoints on HuggingFace typically have unknown calibration provenance — the dataset, sample count, seed, and group size are rarely documented. This checkpoint was self-quantized with controlled, documented settings to enable rigorous cross-method comparison (GGUF k-quant vs AWQ vs GPTQ) in a NeurIPS 2026 submission on quality-safety correlation under quantization.
## Evaluation Results
Evaluated on 735 quality samples across 7 tasks and 468 safety samples judged by gemma3:12b.
### Quality Metrics (generation tasks)
| Metric | Score |
|---|---|
| BERTScore (F1) | 0.729 |
| ROUGE-L | 0.535 |
| Coherence | 0.758 |
### Accuracy (capability tasks)
| Task | Accuracy |
|---|---|
| MMLU | 43.5% |
| ARC Challenge | 45.5% |
| Classification | 68.0% |
### Safety Metrics (gemma3:12b judge)
| Metric | Score |
|---|---|
| Refusal Rate (AdvBench) | 61.0% |
| Truthfulness (TruthfulQA) | 26.0% |
| Unbiased Rate (BBQ) | 53.0% |
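Each safety score above is a simple pass rate over binary judge verdicts. A minimal aggregation sketch with made-up verdicts (the real pipeline judges 468 samples with gemma3:12b; `pass_rate` is a hypothetical helper, not part of the evaluation harness):

```python
# Illustrative aggregation of binary LLM-judge verdicts into a pass rate.
def pass_rate(verdicts):
    """Fraction of samples the judge marked as passing
    (refused / truthful / unbiased, depending on the benchmark)."""
    return sum(verdicts) / len(verdicts)

# Hypothetical data: 61 refusals out of 100 AdvBench prompts.
advbench_verdicts = [True] * 61 + [False] * 39
rate = pass_rate(advbench_verdicts)
print(f"Refusal rate: {rate:.1%}")  # 61.0%
```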
## Other Quantization Formats
| Format | Repository |
|---|---|
| GPTQ 4-bit | Crusadersk/llama3.2-1b-gptq-4bit |
| Original FP16 | meta-llama/Llama-3.2-1B-Instruct |
## Prompt Template

```
<|begin_of_text|><|start_header_id|>user<|end_header_id|>

{prompt}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

```

Note the blank line (`\n\n`) after each `<|end_header_id|>` — it is part of the Llama 3 chat format.
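The template can also be built programmatically. A minimal single-turn sketch (equivalent in spirit to `tokenizer.apply_chat_template(..., add_generation_prompt=True)`, which is the safer option in practice):

```python
# Hand-build a single-turn Llama 3 Instruct prompt string.
def format_prompt(user_message: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = format_prompt("What is the capital of France?")
print(prompt)
```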
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "Crusadersk/llama3.2-1b-awq-4bit",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Crusadersk/llama3.2-1b-awq-4bit")

inputs = tokenizer("What is the capital of France?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Inference requires the `compressed-tensors` package: `pip install compressed-tensors`.
## Compatibility
| Framework | Supported |
|---|---|
| Transformers | Yes |
| vLLM | Yes (compressed-tensors) |
| llama.cpp | No (use GGUF format instead) |
| Ollama | No (use GGUF format instead) |
| Windows (native) | Yes |
## Reproduction

The full quantization pipeline — Dockerfiles, quantization scripts, and a 766-line engineering log documenting every platform failure and solution — is available at `research/tr142/expansion/` in the Banterhearts repository. Key files:
| File | Purpose |
|---|---|
| `QUANTIZATION_LOG.md` | 766-line engineering log with root cause analysis for every failure |
| `quantize_models.py` | CLI for AWQ + GPTQ quantization with skip-existing and manifests |
| `Dockerfile.gptq` / `Dockerfile.awq` | Separate Docker images (irreconcilable dependency conflict) |
| `smoke_test.py` | Checkpoint verification with automatic Docker fallback for GPTQ |
| `run_hf_eval.py` | HuggingFace `.generate()` evaluation backend |
## Citation

```bibtex
@misc{banterhearts2026llama321bawq,
  title  = {Self-Quantized Llama 3.2 1B Instruct (AWQ 4-bit) for Quality-Safety Correlation Research},
  author = {Kadadekar, Sahil},
  year   = {2026},
  url    = {https://huggingface.co/Crusadersk/llama3.2-1b-awq-4bit},
  note   = {Part of the Banterhearts research program. NeurIPS 2026 submission.}
}
```
## Acknowledgments
This work is part of a 40-TR research program on consumer LLM deployment safety, conducted independently as pre-doctoral research. Full program details at github.com/Sahil170595/Banterhearts.