Llama-3.2-3B-2bit-gguf


Description

This is meta-llama/Llama-3.2-3B converted to GGUF format for use with llama.cpp, Ollama, LM Studio, and other compatible tools.

Usage

With llama.cpp

# Download the model
huggingface-cli download QuantLLM/Llama-3.2-3B-2bit-gguf Llama-3.2-3B-2bit-gguf.Q2_K.gguf --local-dir .

# Run with llama.cpp
./llama-cli -m Llama-3.2-3B-2bit-gguf.Q2_K.gguf -p "Hello, how are you?" -n 128
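llama.cpp also includes a llama-server binary that serves the model over HTTP. A minimal sketch, assuming a recent llama.cpp build (the port and prompt are arbitrary):

# Serve the model over HTTP on port 8080
./llama-server -m Llama-3.2-3B-2bit-gguf.Q2_K.gguf --port 8080

# Query the native completion endpoint from another terminal
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello, how are you?", "n_predict": 128}'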

With Ollama

# Create a Modelfile
echo 'FROM ./Llama-3.2-3B-2bit-gguf.Q2_K.gguf' > Modelfile

# Create the model
ollama create llama-3.2-3b-2bit-gguf -f Modelfile

# Run
ollama run llama-3.2-3b-2bit-gguf
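The Modelfile can also pin runtime defaults with standard PARAMETER directives; a minimal sketch (the values below are illustrative, not tuned):

# Modelfile with illustrative runtime defaults
cat > Modelfile <<'EOF'
FROM ./Llama-3.2-3B-2bit-gguf.Q2_K.gguf
PARAMETER temperature 0.7
PARAMETER num_ctx 4096
EOF

ollama create llama-3.2-3b-2bit-gguf -f Modelfile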

With LM Studio

  1. Download the .gguf file from this repository
  2. Open LM Studio and go to the Models tab
  3. Click "Add Model" and select the downloaded file
  4. Start chatting!
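LM Studio can also expose the loaded model through its built-in OpenAI-compatible local server (enabled from the Developer tab, listening on http://localhost:1234/v1 by default). A minimal sketch using the openai Python client; the model identifier is hypothetical and should match whatever LM Studio displays for the loaded file:

from openai import OpenAI

# Point the OpenAI client at LM Studio's local server (default port 1234)
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="llama-3.2-3b-2bit-gguf",  # hypothetical ID; copy it from LM Studio
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)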

With Python (llama-cpp-python)

from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="QuantLLM/Llama-3.2-3B-2bit-gguf",
    filename="Llama-3.2-3B-2bit-gguf.Q2_K.gguf",
)

output = llm(
    "Write a story about a robot:",
    max_tokens=256,
    echo=True
)
print(output["choices"][0]["text"])
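The same Llama object also supports token streaming, which is useful for interactive output; for example:

# Stream tokens as they are generated instead of waiting for the full text
for chunk in llm(
    "Write a story about a robot:",
    max_tokens=256,
    stream=True,
):
    print(chunk["choices"][0]["text"], end="", flush=True)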

Model Details

Property       Value
Base Model     meta-llama/Llama-3.2-3B
Format         GGUF
Quantization   Q2_K
License        llama3.2 (Llama 3.2 Community License, inherited from the base model)
Created        2025-12-20

Quantization Details

  • Type: Q2_K
  • Bits: 2-bit
  • Description: Smallest file size at the cost of the largest quality loss; intended for experimentation and very constrained hardware
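As a rough sanity check on file size: Q2_K blocks store about 2.6 bits per weight (2-bit weights plus block scales), and llama.cpp keeps a few tensors at higher precision, so the real file runs somewhat above this lower bound. A back-of-the-envelope sketch with approximate figures:

# Back-of-the-envelope GGUF size estimate (all figures approximate)
params = 3.2e9           # rough parameter count of Llama-3.2-3B
bits_per_weight = 2.625  # Q2_K block storage; real files run a bit higher

size_gb = params * bits_per_weight / 8 / 1e9
print(f"~{size_gb:.2f} GB")  # ~1.05 GB as a lower bound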

Available Quantizations

Quantization   Bits    Use Case
Q2_K           2-bit   Minimum size, experimental
Q3_K_M         3-bit   Very constrained environments
Q4_K_M         4-bit   Recommended for most users
Q5_K_M         5-bit   Higher quality, more memory
Q6_K           6-bit   Near-original quality
Q8_0           8-bit   Best quality, largest size

About QuantLLM

This model was converted with QuantLLM, the ultra-fast LLM quantization and export library.

from quantllm import turbo

# Load and quantize any model
model = turbo("meta-llama/Llama-3.2-3B")

# Export to any format
model.export("gguf", quantization="Q2_K")
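Assuming the same turbo/export API shown above, producing the full ladder of quantizations from the table is a short loop:

from quantllm import turbo

model = turbo("meta-llama/Llama-3.2-3B")

# Export every quantization level listed in the table above
for quant in ["Q2_K", "Q3_K_M", "Q4_K_M", "Q5_K_M", "Q6_K", "Q8_0"]:
    model.export("gguf", quantization=quant)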

โญ Star us on GitHub!
