llama-3.2-3b-bitsandbytes-4bit-nf4

This repository contains a quantized model artifact produced as part of a graduation project.

Model Details

  • Technique: BitsAndBytes
  • Quantization: NF4 (4-bit)
  • Base model: meta-llama/Llama-3.2-3B-Instruct
  • Export date: 2026-03-24
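
For reference, below is a minimal sketch of how an artifact like this can be produced with `transformers` and `bitsandbytes`. The compute dtype and the output folder name are assumptions; this card does not record the exact export script.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "meta-llama/Llama-3.2-3B-Instruct"

# NF4 4-bit config matching the details above; the compute dtype here is an
# assumption, since the card does not state which one was used.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Save the quantized weights locally (4-bit bitsandbytes serialization is
# supported in recent transformers releases). Folder name is illustrative.
out_dir = "quantized/4bit-nf4"
model.save_pretrained(out_dir)
tokenizer.save_pretrained(out_dir)
```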

Benchmark Summary

| Metric              | Original | Quantized |
|---------------------|----------|-----------|
| Model size (GB)     | 5.98     | 2.05      |
| Avg inference (sec) | 29.59    | 3.83      |
| Tokens/sec          | 3.38     | 26.13     |
| Perplexity          | 41.4043  | 37.4797   |

Comparison Highlights

  • Speedup: ~7.7x (avg inference 29.59 s → 3.83 s, derived from the table above)
  • Memory reduction: not recorded in the local benchmark
  • Disk/model size reduction: ~65.7% (5.98 GB → 2.05 GB)

Benchmark Notes

  • The numbers above are copied from the local benchmark_results JSON in this project.

Local Source

  • Quantized folder: Advanced-Techniques/MixedPrecision/quantized/4bit-nf4
  • Benchmark JSON: Advanced-Techniques/MixedPrecision/benchmark_results/bitsandbytes_benchmark.json
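
A minimal sketch of deriving the highlight ratios from the benchmark file referenced above. The JSON field names are assumptions; adapt them to the actual schema of the file.

```python
import json

# Path copied from the "Local Source" section of this card.
path = "Advanced-Techniques/MixedPrecision/benchmark_results/bitsandbytes_benchmark.json"
with open(path) as f:
    results = json.load(f)

# Field names below are assumptions, not taken from this card.
orig, quant = results["original"], results["quantized"]
speedup = orig["avg_inference_sec"] / quant["avg_inference_sec"]
size_reduction = 100 * (1 - quant["model_size_gb"] / orig["model_size_gb"])
print(f"Speedup: {speedup:.2f}x, disk size reduction: {size_reduction:.1f}%")
```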

Usage

Load the model with Hugging Face transformers together with bitsandbytes, the library and runtime that match the NF4 4-bit quantization used in this repo; a minimal example follows.
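
A minimal loading sketch, assuming bitsandbytes and a recent transformers release are installed; a serialized 4-bit checkpoint carries its own quantization config, so none needs to be passed at load time.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "emreyigitozturk/llama-3.2-3b-bitsandbytes-4bit-nf4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The saved checkpoint already includes the BitsAndBytes NF4 config in its
# config.json, so no extra quantization_config argument is needed here.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "Explain NF4 quantization in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```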

Limitations

  • This model card is auto-generated from project files.
  • You should validate quality, safety, and license compatibility before public release.