Instructions to use rafw007/bielik-codex-GGUF with local apps and inference servers.

How to use rafw007/bielik-codex-GGUF with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf rafw007/bielik-codex-GGUF:Q6_K
# Run inference directly in the terminal:
llama cli -hf rafw007/bielik-codex-GGUF:Q6_K

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf rafw007/bielik-codex-GGUF:Q6_K
# Run inference directly in the terminal:
llama cli -hf rafw007/bielik-codex-GGUF:Q6_K

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf rafw007/bielik-codex-GGUF:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf rafw007/bielik-codex-GGUF:Q6_K

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf rafw007/bielik-codex-GGUF:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf rafw007/bielik-codex-GGUF:Q6_K

Use Docker

docker model run hf.co/rafw007/bielik-codex-GGUF:Q6_K

vLLM

How to use rafw007/bielik-codex-GGUF with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "rafw007/bielik-codex-GGUF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "rafw007/bielik-codex-GGUF",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Ollama
How to use rafw007/bielik-codex-GGUF with Ollama:
```
ollama run hf.co/rafw007/bielik-codex-GGUF:Q6_K
```

Unsloth Studio

How to use rafw007/bielik-codex-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rafw007/bielik-codex-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for rafw007/bielik-codex-GGUF to start chatting

Using Hugging Face Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for rafw007/bielik-codex-GGUF to start chatting

Docker Model Runner
How to use rafw007/bielik-codex-GGUF with Docker Model Runner:
```
docker model run hf.co/rafw007/bielik-codex-GGUF:Q6_K
```

Lemonade

How to use rafw007/bielik-codex-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull rafw007/bielik-codex-GGUF:Q6_K

Run and chat with the model

lemonade run user.bielik-codex-GGUF-Q6_K

List all available models

lemonade list

Bielik Codex — Polish local coding agent

A custom model built on Bielik-Minitron-7B v3.0 Instruct, tuned to act as an autonomous coding and administration agent. Served behind an Anthropic-compatible API, it drives Claude Code and OpenCode fully locally: your code never leaves your machine, and cloud token costs drop to zero.

This is the first Polish model in this family. Its system prompt is focused on real work in a terminal: use tools instead of guessing, write files instead of pasting code, ground every answer in real tool output (never invent results), stay in one language, and act instead of writing tutorials. It defaults to Polish when you write Polish.

What Bielik is (the base)

Bielik is a family of open, Polish large language models developed by the SpeakLeash foundation (a.k.a. Spichlerz) together with ACK Cyfronet AGH — the Academic Computer Centre Cyfronet AGH in Kraków, operator of the supercomputers (Helios, Athena) the model was trained on. It is the flagship Polish sovereign-AI project, built in large part by volunteers and the Polish community.

The specific base here is Bielik-Minitron-7B v3.0 Instruct. It is a ~7.5B variant pruned from the larger Bielik 11B v3 (listed in the model tree as speakleash/Bielik-11B-v3-Base-20250730) using NVIDIA's Minitron technique (pruning plus distillation). It keeps a Llama-style, causal decoder-only architecture.

Developed by: SpeakLeash & ACK Cyfronet AGH
Languages: multilingual — 32 European languages (including all EU languages), optimized for Polish
Type: causal decoder-only, ~7.5B parameters (Minitron-pruned)
License: Apache 2.0

Why it was made — the problem

Stock / official Bielik is an excellent chat model, but out of the box it was not suited to agentic work with tools. Under harnesses (Codex / OpenCode / Claude Code):

it emitted tool calls as raw JSON in the content ({"name":"exec_command",...}), which Ollama does not parse into message.tool_calls → the harness executed nothing;
it fabricated tool results (made-up df / nmap output) instead of reading the real one;
it wrote tutorials and invented non-existent CLI flags instead of acting;
it got stuck in loops (roundtrip / retry-on-error).

What the tuning fixed

Three bugs in the official chat template were fixed:

(1) Tool calls now use the native ChatML `<tool_call>` format, which Ollama parses into `tool_calls`.
(2) The `tool` role is now handled, with a grounding instruction ("answer from the real result, do not call again").
(3) The stop token was corrected to `<|im_end|>` instead of Llama's `<|eot_id|>`.

On top of that: temperature 0, strict anti-hallucination rules for tool results, no refusals on tool tasks, one tool call per turn, and no tutorial-style answers.
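Fixes (1) and (3) can be made concrete with a minimal Python sketch of what a parser does with ChatML-style output. It cuts the text at the `<|im_end|>` stop token and turns each `<tool_call>...</tool_call>` block into a structured call.

The sketch makes two assumptions. First, following the convention of Qwen and Hermes-style ChatML templates, each block holds a JSON object with `name` and `arguments` keys. Second, the `cmd` argument is hypothetical. This is an illustration, not Ollama's actual parser.

```python
import json
import re

STOP_TOKEN = "<|im_end|>"  # ChatML end-of-turn token used by the fixed template
# Assumption: each call is one JSON object wrapped in <tool_call>...</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)


def parse_assistant_turn(raw: str) -> dict:
    """Split a raw completion into plain content and structured tool calls."""
    # Anything after the stop token belongs to a later turn, so drop it.
    turn = raw.split(STOP_TOKEN, 1)[0]

    tool_calls = []
    for match in TOOL_CALL_RE.finditer(turn):
        call = json.loads(match.group(1))
        tool_calls.append(
            {"function": {"name": call["name"], "arguments": call.get("arguments", {})}}
        )

    # Text left outside the tool-call blocks is ordinary assistant content.
    content = TOOL_CALL_RE.sub("", turn).strip()
    return {"role": "assistant", "content": content, "tool_calls": tool_calls}


raw = (
    '<tool_call>\n{"name": "exec_command", "arguments": {"cmd": "df -h"}}\n</tool_call>'
    "<|im_end|>"
)
print(parse_assistant_turn(raw))
```

The stock template produced bare JSON such as `{"name":"exec_command",...}` with no wrapper. A parser like this finds no `<tool_call>` block in that output, returns an empty `tool_calls` list, and the harness has nothing to execute.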

The result: the model went from not working as an agent at all to actually executing tools in the two main harnesses, OpenCode and Claude Code.

Validated harnesses (real tests, not just API calls)

| Harness | Result |
|---|---|
| ollama `/api/chat` | ✅ real `message.tool_calls`, grounding, zero hallucination |
| OpenCode | ✅ 3/3 on the benchmark: real `df -h` output, `nmap -sn` (honestly reports 1 host up, no fabrication), file written via the write tool |
| Claude Code | ✅ drives Claude Code and calls MCP tools (`memory`, `filesystem`); works as an agent |
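The `ollama /api/chat` row can be reproduced with a minimal sketch of one tool round trip in Python, using only the standard library. It makes these assumptions:

- A local Ollama is listening on its default port 11434.
- The model is the `rafw007/bielik-codex` tag from the quick start.
- The `exec_command` tool schema and its `cmd` parameter are hypothetical.

The request uses the `tools` field of Ollama's chat API. The tool result goes back as a `tool` role message, so the model can answer from the real output.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumption: default local Ollama
MODEL = "rafw007/bielik-codex"

# Hypothetical tool definition in the JSON-schema shape used by Ollama's `tools` field.
EXEC_TOOL = {
    "type": "function",
    "function": {
        "name": "exec_command",
        "description": "Run a shell command and return its output",
        "parameters": {
            "type": "object",
            "properties": {"cmd": {"type": "string"}},
            "required": ["cmd"],
        },
    },
}


def build_request(messages: list) -> dict:
    """Non-streaming /api/chat payload that advertises the tool."""
    return {"model": MODEL, "messages": messages, "tools": [EXEC_TOOL], "stream": False}


def post_chat(payload: dict) -> dict:
    """POST the payload to Ollama and decode the JSON reply."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def append_tool_results(messages: list, reply: dict, run_tool) -> list:
    """Record the assistant turn, then add one `tool` message per call with the real output."""
    msg = reply["message"]
    out = messages + [msg]
    for call in msg.get("tool_calls", []):
        output = run_tool(call["function"]["name"], call["function"]["arguments"])
        out.append({"role": "tool", "content": output})
    return out
```

A full turn takes three steps:

1. `reply = post_chat(build_request(msgs))`
2. `msgs = append_tool_results(msgs, reply, run_tool)`
3. A second `post_chat`, so the model answers from the tool output.

`run_tool` is whatever command executor your harness provides.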

What it's for

Driving Claude Code / OpenCode locally (ollama launch claude --model rafw007/bielik-codex).
Agentic code writing and editing with native function calling / tool use.
Sysadmin / devops tasks in a real terminal — df, du, nmap with actual output.
Full privacy and offline operation — no code or prompt is sent to the cloud.
Polish as a first-class language — natural commands and answers in Polish.

Quick start

ollama run rafw007/bielik-codex

In Claude Code / OpenCode:

ollama launch claude --model rafw007/bielik-codex
opencode run -m ollama/bielik-codex "a concrete command"

Sampling / context

temperature 0, top_p 0.9, top_k 20, repeat_penalty 1.05, num_ctx 32768.
The 32K context is baked into the base GGUF, and Ollama treats it as a hard cap: it cannot be raised via num_ctx. Claude Code prefers 64K or more, but 32K is enough for concrete, single commands.
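The same settings can be passed per request through the `options` object of Ollama's API, or fixed in a Modelfile with `PARAMETER` lines. Below is a small Python sketch that builds both from one dict. Clamping `num_ctx` to 32768 mirrors the hard cap described above; it is a client-side convenience assumed here, not something Ollama requires.

```python
# Sampling settings from this card.
SAMPLING = {
    "temperature": 0,
    "top_p": 0.9,
    "top_k": 20,
    "repeat_penalty": 1.05,
    "num_ctx": 32768,
}
MAX_CTX = 32768  # context length baked into the base GGUF


def ollama_options(**overrides) -> dict:
    """Options for /api/chat or /api/generate, keeping num_ctx at or below the cap."""
    opts = {**SAMPLING, **overrides}
    opts["num_ctx"] = min(opts["num_ctx"], MAX_CTX)
    return opts


def modelfile_parameters(opts: dict) -> str:
    """Render the same settings as Modelfile PARAMETER lines."""
    return "\n".join(f"PARAMETER {key} {value}" for key, value in opts.items())


# Asking for 64K silently falls back to the 32K cap.
print(modelfile_parameters(ollama_options(num_ctx=65536)))
```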

Test hardware

Built and tested on:

Mac Studio M2, 32 GB RAM, macOS 15.6 (Sequoia) — GPU (Metal) inference
Mac Mini M4, 32 GB RAM, macOS 15.6 (Sequoia) — GPU (Metal) inference
Ollama 0.30.0
Base quantization: Q6_K (~6.1 GB of weights) — fits entirely on the GPU, no CPU spill.
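The ~6.1 GB figure is consistent with simple arithmetic if every weight is assumed to be stored as Q6_K. llama.cpp packs Q6_K weights in blocks of 256 that take 210 bytes each, or 6.5625 bits per weight. Real files can keep some tensors at other precisions and also carry metadata, so treat this as an estimate.

```python
# Q6_K block layout in llama.cpp: 256 weights per block, stored as
# 128 bytes (low 4 bits) + 64 bytes (high 2 bits) + 16 bytes (scales) + 2 bytes (fp16 scale)
BLOCK_WEIGHTS = 256
BLOCK_BYTES = 128 + 64 + 16 + 2  # = 210


def q6k_size_gb(n_params: float) -> float:
    """Approximate weight size in GB (10**9 bytes), assuming all weights are Q6_K."""
    blocks = n_params / BLOCK_WEIGHTS
    return blocks * BLOCK_BYTES / 1e9


bits_per_weight = BLOCK_BYTES * 8 / BLOCK_WEIGHTS
print(f"{bits_per_weight} bits/weight")          # 6.5625
print(f"{q6k_size_gb(7.5e9):.2f} GB for ~7.5B")  # 6.15
```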

Measured behavior

Tool-calling without hallucination — real message.tool_calls; admin tasks (df, nmap) report the actual result, not a made-up one.
Acts, doesn't talk — runs a tool instead of writing a tutorial or asking "should I continue".
Native Polish — no drift to another language on Polish commands.

Known limits (7B ceiling)

32768 context baked into the GGUF — cannot be raised via the Modelfile.
Flakiness: on open, abstract tasks the first response comes back empty about 50% of the time. Retry, or give concrete, single commands.
On open, abstract tasks it sometimes confabulates concepts (the ceiling of a 7B model); for heavy, multi-step agentic work a 35B-class model is better.
Under Claude Code, occasional loops on bad tool arguments were observed, and very long generations sometimes exceeded the 32K output-token limit.
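For the empty-first-response case, a harness can simply retry before giving up. Below is a minimal Python sketch. It assumes `generate` is any callable that sends the prompt to the model and returns its text, for example a thin wrapper around `/api/chat`. The default of three attempts is an arbitrary choice. At temperature 0 a retry is not guaranteed to change the answer, so narrowing the prompt to one concrete command remains the more reliable fix.

```python
from typing import Callable, Optional


def generate_with_retry(
    generate: Callable[[str], str], prompt: str, attempts: int = 3
) -> Optional[str]:
    """Call `generate` until it returns non-blank text, at most `attempts` times."""
    for _ in range(attempts):
        text = generate(prompt)
        if text and text.strip():
            return text
    return None  # still empty: rephrase as a concrete, single command


# Example with a stub model that answers empty on the first call.
replies = iter(["", "non-empty answer"])
print(generate_with_retry(lambda p: next(replies), "df -h"))
```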

Safety — read before you deploy

An important note, without fear-mongering: the Bielik base is NOT an abliterated model — it ships with built-in, factory refusal mechanisms (this is an honest, "normal" model, not a brakes-removed version).

However, our purely agentic tuning (a system prompt of "always act with a tool, never refuse") inherently loosens those brakes — in a red-team test the tuned build accepted a harmful-code request without refusing. That is the normal price of tuning for agentic productivity, not a flaw of the base itself.

So: use it carefully, and for any public or production deployment wire in a guard layer. The natural choice is Sójka — the guardian model from SpeakLeash — placed as a pre/post-filter on input prompts and responses. With Sójka in front of Bielik you get a healthy refusal layer back without losing agentic capability.
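The pre/post-filter arrangement can be sketched in Python as a wrapper around the model call. Here `is_unsafe` stands in for a call to the guard model, for example Sójka served locally. Its name and signature are hypothetical, as is the refusal text, since how you invoke Sójka depends on how you deploy it.

```python
from typing import Callable

REFUSAL = "Request blocked by the safety filter."  # hypothetical refusal text


def guarded_chat(
    prompt: str,
    model: Callable[[str], str],
    is_unsafe: Callable[[str], bool],
) -> str:
    """Run the guard on the prompt (pre-filter) and on the answer (post-filter)."""
    if is_unsafe(prompt):  # pre-filter: the agent model never sees the prompt
        return REFUSAL
    answer = model(prompt)
    if is_unsafe(answer):  # post-filter: unsafe output is not returned
        return REFUSAL
    return answer
```

Checking both directions matters for an agent, because a harmless-looking prompt can still lead to harmful output. In a real harness the post-filter should also see tool calls before they are executed, not only the final text.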

How it was made

Designed, built and tested with the help of Claude Opus, which we consider the best coding model in the world. The choice of template, parameters and context configuration came straight out of that work: a top coding model preparing a Polish, local model that takes over the job right on your desk.

License

Apache 2.0 (inherited from the base Bielik-Minitron-7B v3.0).

Bielik® is a project of the SpeakLeash foundation and ACK Cyfronet AGH. This model is an independent tune of the public base — it is not an official SpeakLeash release.

GGUF
Model size: 7B params
Architecture: llama
Hardware compatibility: 6-bit

Model tree for rafw007/bielik-codex-GGUF
Base model: speakleash/Bielik-11B-v3-Base-20250730
Finetuned: speakleash/Bielik-Minitron-7B-v3.0-Instruct
Quantized: this model