Llama-3.2-3B-2bit-gguf

Description

This is meta-llama/Llama-3.2-3B converted to GGUF format and quantized to Q2_K (2-bit), for use with llama.cpp, Ollama, LM Studio, and other GGUF-compatible tools.

Base Model: meta-llama/Llama-3.2-3B
Format: GGUF
Quantization: Q2_K
Created with: QuantLLM

Usage

With llama.cpp

# Download the model
huggingface-cli download QuantLLM/Llama-3.2-3B-2bit-gguf Llama-3.2-3B-2bit-gguf.Q2_K.gguf --local-dir .

# Run with llama.cpp
./llama-cli -m Llama-3.2-3B-2bit-gguf.Q2_K.gguf -p "Hello, how are you?" -n 128
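
If the model fails to load, it can help to confirm that the downloaded file really is a complete GGUF file. The sketch below reads the fixed-size header and assumes the little-endian GGUF v2+ layout (4-byte magic, uint32 version, uint64 tensor count, uint64 metadata key/value count). The helper name read_gguf_header is hypothetical and is not part of llama.cpp.

```python
import struct
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header.

    Assumes the little-endian v2+ layout; raises ValueError otherwise.
    """
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # uint32 version, uint64 tensor count, uint64 metadata key/value count
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:24])
    return version, tensor_count, kv_count


if __name__ == "__main__":
    model_path = Path("Llama-3.2-3B-2bit-gguf.Q2_K.gguf")
    if model_path.exists():
        print(read_gguf_header(model_path))
```

A ValueError here usually points to a truncated or mislabelled download rather than a problem with llama.cpp itself.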

With Ollama

# Create a Modelfile
echo 'FROM ./Llama-3.2-3B-2bit-gguf.Q2_K.gguf' > Modelfile

# Create the model
ollama create llama-3.2-3b-2bit-gguf -f Modelfile

# Run
ollama run llama-3.2-3b-2bit-gguf
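
The one-line Modelfile above can be extended with Ollama PARAMETER lines. The sketch below generates the Modelfile text from Python. The function name render_modelfile is hypothetical, and the temperature and num_ctx values are illustrative assumptions, not settings recommended for this model.

```python
def render_modelfile(gguf_path, parameters=None):
    """Build Modelfile text: a FROM line plus optional PARAMETER lines."""
    lines = [f"FROM {gguf_path}"]
    for name, value in (parameters or {}).items():
        lines.append(f"PARAMETER {name} {value}")
    return "\n".join(lines) + "\n"


# Illustrative values only; tune them for your own use case.
modelfile_text = render_modelfile(
    "./Llama-3.2-3B-2bit-gguf.Q2_K.gguf",
    {"temperature": 0.7, "num_ctx": 4096},
)
print(modelfile_text)
```

Saving the printed text as Modelfile and running the ollama create command shown above should register the model with those defaults.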

With LM Studio

1. Download the .gguf file from this repository
2. Open LM Studio and go to the Models tab
3. Click "Add Model" and select the downloaded file
4. Start chatting!

With Python (llama-cpp-python)

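from llama_cpp import Llama

# from_pretrained downloads the file from the Hugging Face Hub
# (requires the huggingface-hub package in addition to llama-cpp-python)
llm = Llama.from_pretrained(
    repo_id="QuantLLM/Llama-3.2-3B-2bit-gguf",
    filename="Llama-3.2-3B-2bit-gguf.Q2_K.gguf",
)

# echo=True returns the prompt together with the generated continuation
output = llm(
    "Write a story about a robot:",
    max_tokens=256,
    echo=True,
)
print(output["choices"][0]["text"])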

Model Details

Property	Value
Base Model	meta-llama/Llama-3.2-3B
Format	GGUF
Quantization	Q2_K
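License	apache-2.0 (as declared for this repository; the base model meta-llama/Llama-3.2-3B is released under the Llama 3.2 Community License)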
Created	2025-12-20

Quantization Details

Type: Q2_K
Bits: 2-bit
Description: Smallest file size and lowest output quality of the quantizations listed below

Available Quantizations

Quantization	Bits	Use Case
Q2_K	2-bit	Minimum size, experimental
Q3_K_M	3-bit	Very constrained environments
Q4_K_M	4-bit	Recommended for most users
Q5_K_M	5-bit	Higher quality, more memory
Q6_K	6-bit	Near-original quality
Q8_0	8-bit	Best quality, largest size
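
To get a rough feel for how the bit widths in the table translate into file size, the sketch below computes a lower bound from the nominal bits per weight. It assumes 3e9 parameters (taken from the "3B" in the model name) and treats every weight as using exactly the nominal width. Real GGUF files are larger, because the quantization formats also store per-block scales and keep some tensors at higher precision, and running the model needs additional memory for the context.

```python
# Nominal bit widths copied from the table above.
NOMINAL_BITS = {
    "Q2_K": 2,
    "Q3_K_M": 3,
    "Q4_K_M": 4,
    "Q5_K_M": 5,
    "Q6_K": 6,
    "Q8_0": 8,
}

# Assumption: "3B" read as 3e9 parameters; the exact count differs slightly.
N_PARAMS = 3_000_000_000


def nominal_size_gb(n_params, bits_per_weight):
    """Lower-bound size in GB (1 GB = 1e9 bytes) at exactly bits_per_weight bits per weight."""
    return n_params * bits_per_weight / 8 / 1e9


for name, bits in NOMINAL_BITS.items():
    print(f"{name:7s} >= {nominal_size_gb(N_PARAMS, bits):.2f} GB")
```

Treat these numbers as a floor when comparing quantizations, not as the size of the files actually published.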

About QuantLLM

This model was converted using QuantLLM - the ultra-fast LLM quantization and export library.

from quantllm import turbo

# Load and quantize any model
model = turbo("meta-llama/Llama-3.2-3B")

# Export to any format
model.export("gguf", quantization="Q2_K")

⭐ Star us on GitHub!
