Instructions to use unsloth/grok-2-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use unsloth/grok-2-GGUF with Transformers:

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("unsloth/grok-2-GGUF", dtype="auto")

Grok

How to use unsloth/grok-2-GGUF with Grok:

# No code snippets available yet for this library.

# To use this model, check the repository files and the library's documentation.

# Want to help? PRs adding snippets are welcome at:
# https://github.com/huggingface/huggingface.js

llama-cpp-python

How to use unsloth/grok-2-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="unsloth/grok-2-GGUF",
	filename="BF16/grok-2-BF16-00001-of-00011.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use unsloth/grok-2-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf unsloth/grok-2-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
llama cli -hf unsloth/grok-2-GGUF:UD-Q4_K_XL

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf unsloth/grok-2-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
llama cli -hf unsloth/grok-2-GGUF:UD-Q4_K_XL

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf unsloth/grok-2-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
./llama-cli -hf unsloth/grok-2-GGUF:UD-Q4_K_XL

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf unsloth/grok-2-GGUF:UD-Q4_K_XL
# Run inference directly in the terminal:
./build/bin/llama-cli -hf unsloth/grok-2-GGUF:UD-Q4_K_XL

Use Docker

docker model run hf.co/unsloth/grok-2-GGUF:UD-Q4_K_XL

LM Studio
Jan
Ollama
How to use unsloth/grok-2-GGUF with Ollama:
```
ollama run hf.co/unsloth/grok-2-GGUF:UD-Q4_K_XL
```

Unsloth Studio

How to use unsloth/grok-2-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for unsloth/grok-2-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for unsloth/grok-2-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for unsloth/grok-2-GGUF to start chatting

Atomic Chat new
Docker Model Runner
How to use unsloth/grok-2-GGUF with Docker Model Runner:
```
docker model run hf.co/unsloth/grok-2-GGUF:UD-Q4_K_XL
```

Lemonade

How to use unsloth/grok-2-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull unsloth/grok-2-GGUF:UD-Q4_K_XL

Run and chat with the model

lemonade run user.grok-2-GGUF-UD-Q4_K_XL

List all available models

lemonade list

How to use from the

Use from the

llama-cpp-python library

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="unsloth/grok-2-GGUF",
	filename="",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Learn how to run Grok 2 correctly - Read our Guide.

Unsloth Dynamic 2.0 achieves superior accuracy & outperforms other leading quants.

Grok 2 Usage Guidelines

Use --jinja for llama.cpp. You must use PR 15539. For example use the code below:
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp && git fetch origin pull/15539/head:MASTER && git checkout MASTER && cd ..

Utilizes Alvaro's Grok-2 HF compatible tokenizer as provided here

Grok 2

This repository contains the weights of Grok 2, a model trained and used at xAI in 2024.

Usage: Serving with SGLang

Download the weights. You can replace /local/grok-2 with any other folder name you prefer.
```
hf download xai-org/grok-2 --local-dir /local/grok-2
```
You might encounter some errors during the download. Please retry until the download is successful.
If the download succeeds, the folder should contain 42 files and be approximately 500 GB.
Launch a server.

Install the latest SGLang inference engine (>= v0.5.1) from https://github.com/sgl-project/sglang/

Use the command below to launch an inference server. This checkpoint is TP=8, so you will need 8 GPUs (each with > 40GB of memory).
```
python3 -m sglang.launch_server --model /local/grok-2 --tokenizer-path /local/grok-2/tokenizer.tok.json --tp 8 --quantization fp8 --attention-backend triton
```
Send a request.

This is a post-trained model, so please use the correct chat template.
```
python3 -m sglang.test.send_one --prompt "Human: What is your name?<|separator|>\n\nAssistant:"
```
You should be able to see the model output its name, Grok.

Learn more about other ways to send requests here.

License

The weights are licensed under the Grok 2 Community License Agreement.

Downloads last month: 16,797

GGUF

Model size

270B params

Architecture

grok

Hardware compatibility

1-bit

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for unsloth/grok-2-GGUF

Base model

xai-org/grok-2

Quantized

(10)

this model