# llama-3.2-3b-bitsandbytes-4bit-nf4

This repository contains a quantized model artifact produced as part of a graduation project.
## Model Details
- Technique: BitsAndBytes
- Quantization: NF4 (4-bit)
- Base model: meta-llama/Llama-3.2-3B-Instruct
- Export date: 2026-03-24
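
The following is a minimal sketch of how a 4-bit NF4 export like this one can be produced with bitsandbytes through transformers. The compute dtype, double quantization, and output path are assumptions and are not recorded in this card; only the base model and NF4 quant type come from the details above.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "meta-llama/Llama-3.2-3B-Instruct"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NF4 data type, as stated above
    bnb_4bit_compute_dtype=torch.bfloat16,   # assumed compute dtype
    bnb_4bit_use_double_quant=True,          # assumed; small extra memory saving
)

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Persist the quantized checkpoint (requires transformers/bitsandbytes versions
# that support serializing 4-bit weights); the folder name is illustrative.
model.save_pretrained("quantized/4bit-nf4")
tokenizer.save_pretrained("quantized/4bit-nf4")
```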
## Benchmark Summary
| Metric | Original | Quantized |
|---|---|---|
| Model size (GB) | 5.98 | 2.05 |
| Avg inference (sec) | 29.59 | 3.83 |
| Tokens/sec | 3.38 | 26.13 |
| Perplexity | 41.4043 | 37.4797 |
## Comparison Highlights
- Speedup: ~7.7x (29.59 s → 3.83 s average inference, derived from the table above)
- Memory reduction: N/A (no memory figures were recorded)
- Disk/model size reduction: ~66% (5.98 GB → 2.05 GB, derived from the table above)
## Benchmark Notes
- The numbers above are copied from the local benchmark_results JSON in this project.
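
The project's actual benchmark script is not included in this card, so the sketch below is only an illustration of how a perplexity figure like the one in the table can be measured; the evaluation text, context length, and chunking strategy are assumptions.

```python
import math
import torch

def perplexity(model, tokenizer, text, max_len=2048):
    """Chunked perplexity of `model` on `text`, using non-overlapping windows."""
    ids = tokenizer(text, return_tensors="pt").input_ids.to(model.device)
    total_nll, total_tokens = 0.0, 0
    for start in range(0, ids.size(1), max_len):
        chunk = ids[:, start:start + max_len]
        if chunk.size(1) < 2:   # need at least one predicted token
            break
        with torch.no_grad():
            out = model(chunk, labels=chunk)
        # out.loss is the mean NLL over the (len - 1) shifted target tokens
        total_nll += out.loss.item() * (chunk.size(1) - 1)
        total_tokens += chunk.size(1) - 1
    return math.exp(total_nll / total_tokens)
```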
## Local Source
- Quantized folder: Advanced-Techniques/MixedPrecision/quantized/4bit-nf4
- Benchmark JSON: Advanced-Techniques/MixedPrecision/benchmark_results/bitsandbytes_benchmark.json
## Usage
Load this model with a library and runtime that support the quantization technique used here: bitsandbytes 4-bit (NF4) checkpoints are loaded through transformers with bitsandbytes installed and typically require a CUDA-capable GPU.
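
A minimal loading sketch, assuming the repository id matches this card's title; verify the id and your transformers/bitsandbytes versions before use.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "emreyigitozturk/llama-3.2-3b-bitsandbytes-4bit-nf4"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
# The 4-bit quantization settings are read from the saved config, so no
# BitsAndBytesConfig needs to be passed here.
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize what NF4 quantization does."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```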
## Limitations
- This model card is auto-generated from project files.
- You should validate quality, safety, and license compatibility before public release.