Instructions to use CnakeCharmer/CnakeAgent-sft-v0.2 with libraries and local apps.

Libraries

How to use CnakeCharmer/CnakeAgent-sft-v0.2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="CnakeCharmer/CnakeAgent-sft-v0.2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CnakeCharmer/CnakeAgent-sft-v0.2")
model = AutoModelForCausalLM.from_pretrained("CnakeCharmer/CnakeAgent-sft-v0.2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use CnakeCharmer/CnakeAgent-sft-v0.2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "CnakeCharmer/CnakeAgent-sft-v0.2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CnakeCharmer/CnakeAgent-sft-v0.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/CnakeCharmer/CnakeAgent-sft-v0.2

SGLang

How to use CnakeCharmer/CnakeAgent-sft-v0.2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CnakeCharmer/CnakeAgent-sft-v0.2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CnakeCharmer/CnakeAgent-sft-v0.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CnakeCharmer/CnakeAgent-sft-v0.2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CnakeCharmer/CnakeAgent-sft-v0.2",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use CnakeCharmer/CnakeAgent-sft-v0.2 with Docker Model Runner:
```
docker model run hf.co/CnakeCharmer/CnakeAgent-sft-v0.2
```

CnakeAgent-sft-v0.2

This is a preview checkpoint intended for testing and integration validation. Planned RL/GRPO releases will use this SFT checkpoint as their initialization base.

CnakeAgent-sft-v0.2 is a supervised fine-tune of openai/gpt-oss-20b for Python -> Cython optimization workflows. It was trained on multi-turn tool-use traces in which the model proposes Cython code and receives compile/test/benchmark feedback.

This checkpoint is packaged in an MXFP4-compatible format for efficient serving.

What This Model Is For

Translating Python functions to optimized Cython
Iterative refinement with evaluator feedback (evaluate_cython)
Agent-style optimization loops in MCP or OpenAI-compatible tool-calling runtimes
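
The iterative refinement listed above can be sketched as a small loop. This is a minimal illustration, not the CnakeCharmer implementation: `propose` and `evaluate` are hypothetical callables standing in for the model call and the evaluate_cython tool, and the `Feedback` field names are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Feedback:
    """Assumed shape of evaluator feedback; field names are illustrative."""
    compiled: bool
    tests_passed: bool
    speedup: float  # Python runtime divided by Cython runtime


def refine(
    python_code: str,
    propose: Callable[[str, Optional[Feedback]], str],
    evaluate: Callable[[str], Feedback],
    max_iters: int = 3,
) -> Tuple[Optional[str], Optional[Feedback]]:
    """Return the fastest candidate that compiled and passed tests, if any."""
    best_code, best_fb = None, None
    feedback = None
    for _ in range(max_iters):
        # The previous feedback is passed back so the proposer can react to it.
        candidate = propose(python_code, feedback)
        feedback = evaluate(candidate)
        ok = feedback.compiled and feedback.tests_passed
        if ok and (best_fb is None or feedback.speedup > best_fb.speedup):
            best_code, best_fb = candidate, feedback
    return best_code, best_fb


# Stub callables so the sketch runs without a model or a compiler.
def fake_propose(code: str, feedback: Optional[Feedback]) -> str:
    # The first attempt "fails to compile"; later attempts succeed.
    return "attempt-1" if feedback is None else "attempt-2"


def fake_evaluate(candidate: str) -> Feedback:
    if candidate == "attempt-1":
        return Feedback(compiled=False, tests_passed=False, speedup=0.0)
    return Feedback(compiled=True, tests_passed=True, speedup=2.0)


code, fb = refine("def add(a, b):\n    return a + b\n", fake_propose, fake_evaluate)
```

Keeping only candidates that both compile and pass tests means a failed early iteration never replaces a working result.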

Recommended Serving (vLLM)

python -m vllm.entrypoints.openai.api_server \
  --model CnakeCharmer/CnakeAgent-sft-v0.2 \
  --served-model-name gpt-oss-20b-cython \
  --host 0.0.0.0 \
  --port 8003 \
  --trust-remote-code
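
Once the server above is running, it can be queried from Python as well as from curl. The sketch below uses only the standard library and assumes the host, port, and served model name from the command above. The helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the usual OpenAI-style chat completion layout.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8003"   # matches --port 8003 in the command above
MODEL_NAME = "gpt-oss-20b-cython"    # matches --served-model-name above


def build_chat_request(user_content: str, system_prompt: str = "") -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat completions route."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_content})
    body = json.dumps({"model": MODEL_NAME, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(user_content: str, system_prompt: str = "", timeout: float = 120.0) -> str:
    """Send the request and return the assistant text (requires a running server)."""
    req = build_chat_request(user_content, system_prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # Assumes the standard OpenAI-style response shape.
    return payload["choices"][0]["message"]["content"]
```

Call `chat(...)` only while the server is up; `build_chat_request` can be inspected offline to check the payload.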

MCP Usage

This model is designed primarily as a local agent backend for coding agents such as Claude Code and Codex.

The CnakeCharmer tool-execution path uses Bubblewrap (bwrap) for sandboxing. Install it before running MCP agent loops that call evaluate_cython.

Linux install:
# Debian / Ubuntu
sudo apt-get update && sudo apt-get install -y bubblewrap

# Fedora
sudo dnf install -y bubblewrap

# Arch
sudo pacman -S --noconfirm bubblewrap

# one-time setup
git clone https://github.com/dleemiller/CnakeCharmer.git
cd CnakeCharmer
uv sync

# terminal 1: model server
bash scripts/start_vllm_server.sh

# terminal 2: MCP
uv run python -m cnake_charmer.mcp_server

Then call run_cython_agent from your MCP client.

Add MCP To Your Client

Claude Code

claude mcp add cnake-charmer -- uv run python -m cnake_charmer.mcp_server

Codex

codex mcp add cnake-charmer -- uv run python -m cnake_charmer.mcp_server

Typical Workflow

1. Profile your Python application to find hotspots (cProfile, py-spy, or benchmark timings).
2. Ask your coding agent (Claude Code or Codex) to isolate one target function or tight loop for optimization.
3. Have the coding agent call run_cython_agent with the isolated python_code, func_name, and a short task description.
4. Review the returned compile/test/speedup metrics, then integrate the generated Cython code into your project.
5. Re-profile and iterate on the next hotspot.
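
The profiling step of this workflow can be sketched with the standard library alone. The example below is illustrative only: `slow_sum_of_squares` and `app` are made-up stand-ins for your own code, and `profile_top` is a hypothetical helper name.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n: int) -> int:
    """Toy hotspot: a tight pure-Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def app() -> int:
    """Stand-in for an application entry point."""
    return slow_sum_of_squares(200_000)


def profile_top(func, limit: int = 5) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(limit)
    return buf.getvalue()


report = profile_top(app)
print(report)
```

Functions that dominate the cumulative-time column are candidates to isolate and pass to run_cython_agent.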

Direct Inference

import torch
from huggingface_hub import hf_hub_download
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "CnakeCharmer/CnakeAgent-sft-v0.2"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

system_prompt_path = hf_hub_download(model_id, "system_prompt.txt")
with open(system_prompt_path) as f:
    system_prompt = f.read().strip()
user_prompt = (
    "python_code: def add(a, b):\n"
    "    return a + b\n\n"
    "func_name: add\n"
    "description: optimize with cython"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": user_prompt},
]

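# Return a dict (input_ids and attention_mask) so generate() receives both.
inputs = tok.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, not the echoed prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))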
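
The user prompt in the example above follows a three-field layout (python_code, func_name, description). A small formatter can keep that layout consistent across calls. This is a sketch: `build_user_prompt` is a hypothetical helper, and the assumption that this layout matches the training scaffold rests only on the example above.

```python
def build_user_prompt(python_code: str, func_name: str, description: str) -> str:
    """Format the three fields the way the Direct Inference example does."""
    return (
        # Trailing newlines are trimmed so exactly one blank line follows the code.
        f"python_code: {python_code.rstrip()}\n\n"
        f"func_name: {func_name}\n"
        f"description: {description}"
    )


prompt = build_user_prompt(
    "def add(a, b):\n    return a + b\n",
    "add",
    "optimize with cython",
)
print(prompt)
```

The resulting string can be used as the `content` of the user message in place of the hand-written `user_prompt`.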

Prompting Notes

The model was trained with a consistent instruction scaffold.
For best results, rely on the server-side default instructions; the MCP server applies them automatically.
The checkpoint includes system_prompt.txt for reproducible agent behavior.

Limitations

Optimized for Cython/tool-use tasks, not general chat.
Output quality depends on the quality of the evaluator feedback loop and on test coverage.
Can still produce non-compiling code in early iterations.

Training Data

Built from curated tool-use traces in the CnakeCharmer project:

parallel Python/Cython reference pairs
multi-turn evaluation traces with compile/test/benchmark feedback

Project repo: https://github.com/dleemiller/CnakeCharmer

Downloads last month: 15

Safetensors
Model size: 22B params
Tensor type: BF16

Model tree for CnakeCharmer/CnakeAgent-sft-v0.2
Base model: openai/gpt-oss-20b
Quantized (207): this model