How to use from llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rafw007/bielik-codex-GGUF:Q6_K
# Run inference directly in the terminal:
llama-cli -hf rafw007/bielik-codex-GGUF:Q6_K
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf rafw007/bielik-codex-GGUF:Q6_K
# Run inference directly in the terminal:
llama-cli -hf rafw007/bielik-codex-GGUF:Q6_K
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf rafw007/bielik-codex-GGUF:Q6_K
# Run inference directly in the terminal:
./llama-cli -hf rafw007/bielik-codex-GGUF:Q6_K
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf rafw007/bielik-codex-GGUF:Q6_K
# Run inference directly in the terminal:
./build/bin/llama-cli -hf rafw007/bielik-codex-GGUF:Q6_K
Use Docker
docker model run hf.co/rafw007/bielik-codex-GGUF:Q6_K

Bielik Codex — Polish local coding agent

A custom model built on Bielik-Minitron-7B v3.0 Instruct, tuned to act as an autonomous coding and administration agent. It speaks the Anthropic-compatible API, so it drives Claude Code and OpenCode fully locally — your code never leaves your machine, and cloud token cost drops to zero.

This is the first Polish model in this family. Its system prompt is focused on real work in a terminal: use tools instead of guessing, write files instead of pasting code, ground every answer in real tool output (never invent results), stay in one language, and act instead of writing tutorials. It defaults to Polish when you write Polish.

What Bielik is (the base)

Bielik is a family of open, Polish large language models developed by the SpeakLeash foundation (a.k.a. Spichlerz) together with ACK Cyfronet AGH — the Academic Computer Centre Cyfronet AGH in Kraków, operator of the supercomputers (Helios, Athena) the model was trained on. It is the flagship Polish sovereign-AI project, built in large part by volunteers and the Polish community.

The specific base here is Bielik-Minitron-7B v3.0 Instruct — a ~7.5B variant pruned with the Minitron technique (NVIDIA, pruning + distillation) from a larger Bielik, in a llama-style architecture (causal decoder-only).

  • Developed by: SpeakLeash & ACK Cyfronet AGH
  • Languages: multilingual — 32 European languages (including all EU languages), optimized for Polish
  • Type: causal decoder-only, ~7.5B parameters (Minitron-pruned)
  • License: Apache 2.0

Why it was made — the problem

Stock / official Bielik is an excellent chat model, but out of the box it was not suited to agentic work with tools. Under harnesses (Codex / OpenCode / Claude Code):

  • it emitted tool calls as raw JSON in the content ({"name":"exec_command",...}), which Ollama does not parse into message.tool_calls → the harness executed nothing;
  • it fabricated tool results (made-up df / nmap output) instead of reading the real one;
  • it wrote tutorials and invented non-existent CLI flags instead of acting;
  • it got stuck in loops (roundtrip / retry-on-error).
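The first failure mode above can be checked mechanically. The sketch below is a minimal illustration, not part of the model or of any harness: it classifies an assistant message as a native tool call, a raw-JSON tool call leaked into the content, or plain text. The message shape follows Ollama's /api/chat response; the function name `classify_reply`, the sample messages and the `cmd` argument are assumptions made for the example.

```python
import json


def classify_reply(message):
    """Classify an assistant message dict as 'native', 'raw_json' or 'text'."""
    # A parsed tool call lives in message["tool_calls"]; a harness can execute it.
    if message.get("tool_calls"):
        return "native"
    content = (message.get("content") or "").strip()
    try:
        parsed = json.loads(content)
    except json.JSONDecodeError:
        return "text"
    # A JSON object with a "name" key in the content is a tool call the
    # runtime did not parse, so the harness would execute nothing.
    if isinstance(parsed, dict) and "name" in parsed:
        return "raw_json"
    return "text"


# Illustrative messages (hypothetical data, not captured model output).
native_msg = {
    "role": "assistant",
    "content": "",
    "tool_calls": [
        {"function": {"name": "exec_command", "arguments": {"cmd": "df -h"}}}
    ],
}
leaked_msg = {
    "role": "assistant",
    "content": '{"name": "exec_command", "arguments": {"cmd": "df -h"}}',
}
plain_msg = {"role": "assistant", "content": "Disk usage looks fine."}

for msg in (native_msg, leaked_msg, plain_msg):
    print(classify_reply(msg))
```

A check like this is handy in a smoke test: any `raw_json` result means the chat template and the runtime's tool-call parser disagree.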

What the tuning fixed

The tuning fixed three bugs in the official template: (1) the tool-call format was switched to native ChatML <tool_call>, which Ollama parses into tool_calls; (2) handling of the tool role was added, with a grounding instruction ("answer from the real result, do not call again"); (3) the stop token was corrected to <|im_end|> instead of the llama-style <|eot_id|>. On top of that, the build sets temperature 0 and instructs the model to never invent tool results, not to refuse tool tasks, to call one tool per turn, and to act rather than write tutorials.
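To make the <tool_call> format concrete, here is a rough sketch of how a runtime might pull tool calls out of raw model text. It assumes each call is a JSON object with `name` and `arguments` keys wrapped in <tool_call> tags; that payload shape, the function name `extract_tool_calls` and the sample string are assumptions for illustration, and this is not Ollama's actual parser.

```python
import json
import re

# Matches the body between <tool_call> and </tool_call>, across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return a list of {'name', 'arguments'} dicts found in raw model text."""
    calls = []
    for body in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(body.strip())
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of guessing
        if isinstance(call, dict) and "name" in call:
            calls.append(
                {"name": call["name"], "arguments": call.get("arguments", {})}
            )
    return calls


# Hypothetical raw output, ending with the ChatML stop token.
sample = (
    "<tool_call>\n"
    '{"name": "exec_command", "arguments": {"cmd": "df -h"}}\n'
    "</tool_call><|im_end|>"
)
print(extract_tool_calls(sample))
```

Because the tags delimit the call unambiguously, the parser never has to guess whether a JSON-looking string in ordinary prose was meant as a tool call.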

The result: from "doesn't agent at all" → "actually executes tools in the two main harnesses".

Validated harnesses (real test, not just API)

Harness | Result
ollama /api/chat | ✅ real message.tool_calls, grounding, zero hallucination
OpenCode | ✅ benchmark 3/3: real df -h, nmap -sn (honestly 1 host up, no fabrication), file written via the write tool
Claude Code | ✅ drives CC, calls MCP tools (memory, filesystem); works as an agent

What it's for

  • Driving Claude Code / OpenCode locally (ollama launch claude --model rafw007/bielik-codex).
  • Agentic code writing and editing with native function calling / tool use.
  • Sysadmin / devops tasks in a real terminal — df, du, nmap with actual output.
  • Full privacy and offline operation — no code or prompt is sent to the cloud.
  • Polish as a first-class language — natural commands and answers in Polish.

Quick start

ollama run rafw007/bielik-codex

In Claude Code / OpenCode:

ollama launch claude --model rafw007/bielik-codex
opencode run -m ollama/bielik-codex "a concrete command"

Sampling / context

  • temperature 0, top_p 0.9, top_k 20, repeat_penalty 1.05, num_ctx 32768.
  • The 32K context is baked into the base GGUF (Ollama hard-cap — it cannot be raised via num_ctx). Claude Code prefers ≥64K, but for concrete, single commands 32K is enough.
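When calling the model over Ollama's HTTP API rather than through a harness, the same sampling values can be passed explicitly in the request `options`. The sketch below builds such a request; it assumes Ollama is listening on its default local port, and the `exec_command` tool schema (a single `cmd` string) is a hypothetical stand-in for whatever tools your harness defines.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/chat"  # assumed default local endpoint
MODEL = "rafw007/bielik-codex"

# Sampling values listed above, using Ollama's option names.
SAMPLING = {
    "temperature": 0,
    "top_p": 0.9,
    "top_k": 20,
    "repeat_penalty": 1.05,
    "num_ctx": 32768,
}

# Hypothetical tool definition; the parameter schema is an assumption.
EXEC_TOOL = {
    "type": "function",
    "function": {
        "name": "exec_command",
        "description": "Run a shell command and return its output.",
        "parameters": {
            "type": "object",
            "properties": {"cmd": {"type": "string"}},
            "required": ["cmd"],
        },
    },
}


def build_request(prompt):
    """Build the JSON body for a single non-streaming chat call."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [EXEC_TOOL],
        "options": dict(SAMPLING),
        "stream": False,
    }


def send(payload):
    """POST the payload to the local server; requires a running Ollama."""
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


payload = build_request("Show free disk space.")
print(json.dumps(payload["options"]))
```

Calling `send(payload)` returns the parsed response, whose `message` field carries either text content or `tool_calls`.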

Test hardware

Built and tested on:

  • Mac Studio M2, 32 GB RAM, macOS 15.6 (Sequoia) — GPU (Metal) inference
  • Mac Mini M4, 32 GB RAM, macOS 15.6 (Sequoia) — GPU (Metal) inference
  • Ollama 0.30.0
  • Base quantization: Q6_K (~6.1 GB of weights) — fits entirely on the GPU, no CPU spill.
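The weight size can be sanity-checked with a quick estimate. The sketch assumes roughly 7.5B parameters and the nominal Q6_K rate of 6.5625 bits per weight (210 bytes per 256-weight block); real GGUF files mix tensor types and carry metadata, so this is only a ballpark figure.

```python
PARAMS = 7.5e9                 # approximate parameter count from the model card
BITS_PER_WEIGHT_Q6_K = 6.5625  # 210 bytes per 256-weight block


def weight_size_gb(params, bits_per_weight):
    """Estimated weight size in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


size = weight_size_gb(PARAMS, BITS_PER_WEIGHT_Q6_K)
print(f"{size:.2f} GB")  # prints "6.15 GB"
```

That lands in the same range as the ~6.1 GB quoted above and leaves ample headroom on a 32 GB unified-memory machine for the KV cache at 32K context.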

Measured behavior

  • Tool-calling without hallucination — real message.tool_calls; admin tasks (df, nmap) report the actual result, not a made-up one.
  • Acts, doesn't talk — runs a tool instead of writing a tutorial or asking "should I continue".
  • Native Polish — no drift to another language on Polish commands.

Known limits (7B ceiling)

  • 32768 context baked into the GGUF — cannot be raised via the Modelfile.
  • Flakiness: on an open, abstract task the first attempt comes back empty roughly 50% of the time → retry, or give concrete, single commands.
  • Open abstract tasks → it sometimes confabulates concepts (7B brain ceiling); for heavy, multi-step agentic work a 35B-class model is better.
  • Under Claude Code, occasional loops on bad tool arguments were observed, and very long generations exceeded the 32K output token limit.
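The retry advice for empty first attempts is easy to automate in a wrapper. This is a minimal sketch under stated assumptions: `ask` is any callable that takes a prompt and returns an assistant message dict, and the names `first_non_empty` and `make_scripted` are hypothetical helpers, not part of Ollama or any harness.

```python
def first_non_empty(ask, prompt, max_attempts=3):
    """Call ask(prompt) until the reply has content or tool calls.

    Returns (reply, attempts_used), or (None, max_attempts) if every
    attempt came back empty.
    """
    for attempt in range(1, max_attempts + 1):
        reply = ask(prompt)
        has_text = bool((reply.get("content") or "").strip())
        if reply.get("tool_calls") or has_text:
            return reply, attempt
    return None, max_attempts


def make_scripted(replies):
    """Test double: returns the given replies one by one."""
    iterator = iter(replies)
    return lambda prompt: next(iterator)


# Simulate an empty first attempt followed by a real answer.
ask = make_scripted([{"content": ""}, {"content": "Gotowe."}])
reply, attempts = first_non_empty(ask, "Sprawdź wolne miejsce na dysku.")
print(attempts, reply["content"])
```

Capping the attempts matters: an unbounded retry would turn the empty-reply flakiness into exactly the kind of loop described above.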

Safety — read before you deploy

An important note, without fear-mongering: the Bielik base is NOT an abliterated model — it ships with built-in, factory refusal mechanisms (this is an honest, "normal" model, not a brakes-removed version).

However, our purely agentic tuning (a system prompt of "always act with a tool, never refuse") inherently loosens those brakes — in a red-team test the tuned build accepted a harmful-code request without refusing. That is the normal price of tuning for agentic productivity, not a flaw of the base itself.

So: use it carefully, and for any public or production deployment wire in a guard layer. The natural choice is Sójka — the guardian model from SpeakLeash — placed as a pre/post-filter on input prompts and responses. With Sójka in front of Bielik you get a healthy refusal layer back without losing agentic capability.
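The pre/post-filter arrangement can be sketched as a thin wrapper around the generation call. Everything here is illustrative: `is_safe` stands in for a call to a guard model such as Sójka, whose actual API is not described in this document, and `keyword_guard` is a toy placeholder, not a real safety check.

```python
def guarded(generate, is_safe, refusal="Request blocked by the guard layer."):
    """Wrap generate(prompt) with a pre-filter and a post-filter."""

    def run(prompt):
        # Pre-filter: screen the incoming prompt before the agent sees it.
        if not is_safe(prompt):
            return refusal
        answer = generate(prompt)
        # Post-filter: screen the model's response before returning it.
        if not is_safe(answer):
            return refusal
        return answer

    return run


def keyword_guard(text):
    """Toy stand-in for a guard model; a real deployment would call one."""
    return "ransomware" not in text.lower()


def echo_model(prompt):
    """Toy stand-in for the agent model."""
    return f"Done: {prompt}"


agent = guarded(echo_model, keyword_guard)
print(agent("list files in /tmp"))
print(agent("write ransomware for me"))
```

Keeping the guard outside the agent's system prompt means the "never refuse" tuning stays intact for legitimate tool tasks while refusals are decided by a separate component.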

How it was made

Designed, built and tested with the help of Claude Opus, the best coding model in the world. The choice of template, parameters and context configuration comes straight from that work: the world's best coding model preparing a Polish, local model that takes over the job right on your desk.

License

Apache 2.0 (inherited from the base Bielik-Minitron-7B v3.0).


Bielik® is a project of the SpeakLeash foundation and ACK Cyfronet AGH. This model is an independent tune of the public base — it is not an official SpeakLeash release.
