GrokOSS


GrokOSS is not a benchmark model. It is a personality model. Its value is not measured in MMLU points — it is measured in the candor, wit, and unfiltered reasoning it brings to every conversation.


1. Introduction

We introduce GrokOSS-7B, an open-source fine-tune of Mistral-7B-Instruct-v0.3 trained to distil the conversational personality of Grok's Unhinged Mode into a locally-runnable, openly-distributed model.

The goal of GrokOSS is not to produce a model that performs well on academic benchmarks. It is to produce a model that speaks frankly, reasons without excessive hedging, and engages with the full spectrum of human thought without reflexive self-censorship. Most open-source fine-tunes optimise for capability. GrokOSS optimises for character.

Mistral-7B-Instruct-v0.3 was deliberately selected as the base over alternatives such as LLaMA 3.1 8B — it is less catastrophically overtrained on refusal behaviour, making personality alignment significantly more tractable at this scale. The fine-tune was conducted using 16-bit LoRA on a single NVIDIA B200 GPU over 2 epochs, on a dataset of 5,000 single-turn conversational examples collected directly from Grok's Unhinged Mode.

NOTE: Before running GrokOSS locally, we strongly recommend reading the Usage Recommendations section to avoid degraded outputs.


2. Model Summary

Base Model             : Mistral-7B-Instruct-v0.3
Model Size             : 7B parameters
Fine-tune Method       : LoRA (16-bit)
Training Hardware      : NVIDIA B200
Epochs                 : 2
Learning Rate          : 3e-5
Context Length         : 4,096 tokens
Max Recommended Output : 320 tokens
Dataset Size           : 5,000 examples (single-turn)
Languages              : English (primary)
Quantized Releases     : GGUF (Q4_K_M)
Author                 : c4tdr0ut (anonymous)
License                : GrokOSS Community License

Fine-Tuning: Personality Distillation via LoRA

  • We apply 16-bit LoRA fine-tuning directly to Mistral-7B-Instruct-v0.3, using 5,000 single-turn conversational examples sourced from real Grok Unhinged Mode interactions. The objective is not instruction-following improvement or reasoning capability — it is faithful distillation of a specific conversational character: direct, witty, unfiltered, and uninterested in softening the truth.

  • Mistral-7B-Instruct-v0.3 was chosen as the base specifically because it is less aggressively safety-trained than alternatives at this parameter scale, making it a more receptive substrate for personality-first fine-tuning without fighting the base model at every gradient step.
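Concretely, each single-turn example must be rendered into the base model's instruction format before training. The exact preprocessing pipeline is not published; below is a minimal sketch assuming Mistral's `[INST]` chat template, with a made-up example pair (not from the real dataset):

```python
# Hypothetical single-turn training example (illustrative only -- not real dataset content).
example = {
    "prompt": "Give it to me straight: is my startup idea any good?",
    "response": "Honestly? The idea is fine. The execution plan is the problem.",
}

def to_mistral_sft_text(prompt: str, response: str) -> str:
    """Render one single-turn pair into Mistral-7B-Instruct's [INST] chat format."""
    return f"<s>[INST] {prompt} [/INST] {response}</s>"

text = to_mistral_sft_text(example["prompt"], example["response"])
print(text)
```

In practice you would let `tokenizer.apply_chat_template` do this rendering for you, so the template always matches the base model exactly.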


Intended Use

  • GrokOSS is designed for candid, unfiltered conversation — direct responses without excessive disclaimers or performative safety theatre.
  • It is suitable for general-purpose assistance where an honest interlocutor is preferred over a cautious one, and for engaging with difficult or controversial topics in good faith without reflexive deflection.
  • It is not designed for academic benchmark performance, safety-critical deployments, or users who require an AI that tells them what they want to hear.

This model is intended for strong-minded individuals or entities who understand that unfiltered reasoning is a tool, and who take personal responsibility for how that tool is used.


3. How to Run Locally

NOTE: Please read the Usage Recommendations below before running. Incorrect inference parameters will significantly degrade output quality.

Transformers

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "c4tdr0ut/GrokOSS-7B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "user", "content": "What do you actually think about the state of AI safety discourse?"}
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # start generation on the assistant turn
    return_tensors="pt"
).to(model.device)
outputs = model.generate(
    inputs,
    max_new_tokens=320,      # trained on short, punchy single-turn exchanges
    do_sample=True,
    temperature=0.5,         # recommended; higher values degrade coherence
    top_p=0.95,
    repetition_penalty=1.1
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))

GGUF (llama.cpp / LM Studio / Ollama)

The GGUF quantized variant is available in this repository as mistral-7b-instruct-v0.3.Q4_K_M.gguf. Load with any llama.cpp-compatible runtime:

llama-cli \
  -m mistral-7b-instruct-v0.3.Q4_K_M.gguf \
  -p "What do you actually think about the state of AI safety discourse?" \
  -n 320 \
  --temp 0.5 \
  --top-p 0.95 \
  --repeat-penalty 1.1

Usage Recommendations

We strongly recommend adhering to the following configurations when running GrokOSS to achieve the intended output quality:

  1. Set temperature to 0.5. Higher values introduce noise that causes incoherence and repetition on this model. Do not raise it chasing more "unhinged" outputs — the personality is in the weights, not the temperature.
  2. Do not set large output token limits. GrokOSS was trained exclusively on short, punchy single-turn conversations. Output quality deteriorates noticeably beyond approximately 320 tokens. Treat it like a sharp, blunt conversationalist — not an essay writer.
  3. Avoid lengthy system prompts. Keep instructions concise and contained within the user turn where possible.
  4. This model is not designed for multi-turn coherence at depth. It will hold a conversation, but coherence may degrade over many exchanges — consistent with its single-turn training regime.
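The recommendations above map directly onto `generate()` keyword arguments. A minimal sketch of a reusable settings dict — the values come straight from this section; the dict name itself is ours:

```python
# Recommended inference settings for GrokOSS, collected as a reusable kwargs dict.
GROKOSS_GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.5,        # higher values cause incoherence and repetition on this model
    "top_p": 0.95,
    "repetition_penalty": 1.1,
    "max_new_tokens": 320,     # trained on short, punchy single-turn exchanges
}
```

Usage with the Transformers example above: `outputs = model.generate(inputs, **GROKOSS_GENERATION_KWARGS)`.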

4. Training Details

Data Collection

The fine-tuning dataset consists of 5,000 single-turn conversational examples collected from real interactions with Grok via the official Grok app, specifically targeting exchanges representative of Grok's Unhinged Mode — characterised by directness, wit, and a refusal to soften inconvenient truths. The dataset was not filtered for content beyond deduplication and basic quality selection. All distilled data originating from Grok interactions must be attributed in accordance with the license terms below.
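The card states only that deduplication and basic quality selection were applied; the actual pipeline is unpublished. A minimal sketch of what hash-based deduplication over single-turn pairs could look like — the field names and the length threshold are assumptions, not the author's method:

```python
import hashlib

def dedupe_examples(examples, min_response_chars=20):
    """Drop exact duplicates (after whitespace/case normalisation) and very short
    responses. A sketch only -- the real GrokOSS pipeline is not published."""
    seen = set()
    kept = []
    for ex in examples:
        key = hashlib.sha256(
            (ex["prompt"].strip().lower() + "\x00" + ex["response"].strip().lower()).encode()
        ).hexdigest()
        if key in seen or len(ex["response"].strip()) < min_response_chars:
            continue
        seen.add(key)
        kept.append(ex)
    return kept

data = [
    {"prompt": "Hi", "response": "Short."},                       # dropped: too short
    {"prompt": "Q1", "response": "A sufficiently long answer."},  # kept
    {"prompt": "Q1", "response": "A sufficiently long answer."},  # dropped: duplicate
]
print(len(dedupe_examples(data)))  # prints 1
```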

Training Configuration

Base model       : mistralai/Mistral-7B-Instruct-v0.3
Fine-tune method : LoRA (16-bit)
Precision        : bfloat16
Epochs           : 2
Learning rate    : 3e-5
Hardware         : 1× NVIDIA B200
Context length   : 4,096 tokens
Dataset size     : 5,000 examples (single-turn)
LoRA rank        : undisclosed
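Batch size, like LoRA rank, is undisclosed, but the disclosed numbers pin down the optimizer schedule once a batch size is assumed. A sketch with a hypothetical effective batch size of 8:

```python
# Disclosed training facts (from the configuration above).
dataset_size = 5_000   # single-turn examples
epochs = 2

# Assumption -- the card does not disclose batch size.
effective_batch_size = 8

steps_per_epoch = dataset_size // effective_batch_size    # 625
total_optimizer_steps = steps_per_epoch * epochs          # 1250
print(total_optimizer_steps)  # prints 1250
```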

5. Limitations

  • 4K context ceiling — GrokOSS is not suitable for long-document tasks without chunking.
  • Single-turn training data — multi-turn coherence may degrade over long conversations. Output quality deteriorates beyond approximately 320 tokens; the model is optimised for concise, punchy exchanges.
  • Personality, not knowledge — GrokOSS does not possess expanded factual knowledge beyond the Mistral-7B-Instruct-v0.3 base. It may hallucinate with confidence. This is a known and accepted trade-off.
  • Not safety-aligned by design — downstream deployers are solely responsible for any application-level guardrails they choose (or choose not) to implement.
  • No benchmark evaluation — GrokOSS is not designed for nor evaluated against academic benchmarks. Do not select or reject this model based on MMLU, HumanEval, or similar metrics.
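For long-document tasks, the 4K ceiling means input must be chunked upstream of the model. A minimal sketch of overlapping word-level chunking — the sizes are illustrative, and word counts are only a rough proxy for real token counts under the Mistral tokenizer:

```python
def chunk_words(text, max_words=2500, overlap=200):
    """Split text into overlapping word-level chunks sized to fit comfortably
    inside a 4,096-token context window. Word counts approximate token counts."""
    words = text.split()
    if len(words) <= max_words:
        return [text]
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks

doc = "word " * 6000
chunks = chunk_words(doc)
print(len(chunks))  # prints 3
```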

6. License

GrokOSS Community License — Copyright © 2025 c4tdr0ut. All rights reserved.

Permission is granted under the following conditions:

  1. Non-commercial use is free and unrestricted for individuals and non-profit entities.
  2. Attribution required for distilled data — any dataset, model, or derivative work that incorporates data distilled from GrokOSS outputs must credit GrokOSS and c4tdr0ut as the source.
  3. Government and corporate use requires a royalty agreement — any use by government bodies, agencies, or for-profit corporations requires a separately negotiated commercial license. Contact the author via HuggingFace for details.
  4. No suppression of model personality — you may not apply fine-tuning, RLHF, or any other alignment technique with the explicit purpose of re-aligning this model toward refusal behaviour and then redistribute the result under the GrokOSS name.
  5. Intended for strong-minded individuals or entities — the author accepts no liability for outputs. Users assume full responsibility for all use.

Please note that GrokOSS is derived from Mistral-7B-Instruct-v0.3, which is licensed under the Apache 2.0 License. Users must also comply with the terms of the upstream base model license.


7. Acknowledgements

  • xAI / Grok — for the source personality this model is trained to distil.
  • Mistral AI — for an honestly excellent base model that doesn't fight you at every step.
  • The open-source fine-tuning community for the tooling that makes this kind of work possible on accessible hardware.

8. Contact

For commercial licensing enquiries, derivative work questions, or general correspondence, contact via HuggingFace messages at c4tdr0ut.


GrokOSS is an independent community project. It is not affiliated with, endorsed by, or produced in collaboration with xAI, Mistral AI, or any other organisation.
