Instructions to use CnakeCharmer/CnakeAgent-sft-v0.3 with libraries, inference providers, notebooks, and local apps.

Libraries

How to use CnakeCharmer/CnakeAgent-sft-v0.3 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="CnakeCharmer/CnakeAgent-sft-v0.3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result; a bare call only displays output in a REPL or notebook
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("CnakeCharmer/CnakeAgent-sft-v0.3")
model = AutoModelForCausalLM.from_pretrained("CnakeCharmer/CnakeAgent-sft-v0.3")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use CnakeCharmer/CnakeAgent-sft-v0.3 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "CnakeCharmer/CnakeAgent-sft-v0.3"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CnakeCharmer/CnakeAgent-sft-v0.3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
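
The same request can be issued from Python instead of curl. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM, and the response is assumed to follow the usual OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request equivalent to the curl call (hypothetical helper)."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant text (needs a running server)."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the server answers in the OpenAI chat-completions format.
    return body["choices"][0]["message"]["content"]
```

With the server from the step above running, `chat("http://localhost:8000", "CnakeCharmer/CnakeAgent-sft-v0.3", "What is the capital of France?")` should return the reply text. For a server on another port, only `base_url` changes.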

Use Docker

docker model run hf.co/CnakeCharmer/CnakeAgent-sft-v0.3

SGLang

How to use CnakeCharmer/CnakeAgent-sft-v0.3 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "CnakeCharmer/CnakeAgent-sft-v0.3" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CnakeCharmer/CnakeAgent-sft-v0.3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "CnakeCharmer/CnakeAgent-sft-v0.3" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "CnakeCharmer/CnakeAgent-sft-v0.3",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use CnakeCharmer/CnakeAgent-sft-v0.3 with Docker Model Runner:
```
docker model run hf.co/CnakeCharmer/CnakeAgent-sft-v0.3
```

CnakeAgent-sft-v0.3

CnakeAgent-sft-v0.3 is a supervised fine-tune of openai/gpt-oss-20b for Python → Cython optimization workflows. It is trained on multi-turn tool-use traces where the model proposes Cython code, receives compile/test/benchmark feedback from an evaluate_cython tool, and iteratively refines its output.

This checkpoint is packaged in MXFP4 format for efficient single-GPU serving.

What This Model Is For

Translating Python functions to optimized Cython (.pyx)
Iterative refinement with evaluator feedback (evaluate_cython)
Agent-style optimization loops in MCP or OpenAI-compatible Responses API runtimes

Termination Convention

This checkpoint terminates each rollout with a native final-channel turn:

<|start|>assistant<|channel|>final<|message|>{final cython code}<|return|>

There is no synthetic finish tool: the model emits its final answer in the final channel and ends with <|return|>. This is compatible with the OpenAI Responses API agent-loop convention, in which the orchestrator continues while the model emits evaluate_cython calls and exits when no tool call is parsed.
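
The convention above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the CnakeCharmer orchestrator: `extract_final` parses a raw rollout string that still contains the special tokens, while `run_loop` works on already-parsed turns. The turn shape (a dict with an optional `tool_call` key and a `content` key), the `generate` and `evaluate_cython` callables, and the `max_turns` cap are all hypothetical.

```python
import re
from typing import Callable, Optional

# Matches the final-channel turn shown above; DOTALL lets the code span lines.
FINAL_TURN = re.compile(
    r"<\|start\|>assistant<\|channel\|>final<\|message\|>(.*?)<\|return\|>",
    re.DOTALL,
)


def extract_final(raw: str) -> Optional[str]:
    """Return the final-channel payload, or None if no final turn is present."""
    match = FINAL_TURN.search(raw)
    return match.group(1) if match else None


def run_loop(
    generate: Callable[[list], dict],
    evaluate_cython: Callable[[dict], str],
    messages: list,
    max_turns: int = 8,
) -> Optional[str]:
    """Continue while the model calls the tool; stop at the first turn without a call."""
    for _ in range(max_turns):
        turn = generate(messages)
        call = turn.get("tool_call")
        if call is None:
            # No tool call parsed: treat this turn as the final answer.
            return turn.get("content")
        report = evaluate_cython(call)
        messages.append({"role": "assistant", "tool_call": call})
        messages.append({"role": "tool", "content": report})
    return None  # iteration cap reached before a final answer
```

Returning `None` at the cap keeps "the model never finished" distinguishable from a real final answer.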

Recommended Serving (vLLM)

python -m vllm.entrypoints.openai.api_server \
  --model CnakeCharmer/CnakeAgent-sft-v0.3 \
  --served-model-name gpt-oss-20b-cython \
  --host 0.0.0.0 \
  --port 8003 \
  --trust-remote-code
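
Because the command sets `--served-model-name`, clients would normally pass `gpt-oss-20b-cython` (not the repository id) in the `model` field of their requests. One way to confirm which name the server exposes is to query the OpenAI-compatible `/v1/models` endpoint. A minimal sketch follows; the helper names are illustrative, and the host and port are taken from the command above.

```python
import json
import urllib.request


def served_model_ids(models_response: dict) -> list:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in models_response.get("data", [])]


def fetch_served_model_ids(base_url: str = "http://localhost:8003", timeout: float = 10.0) -> list:
    """Query a running server for the model names it serves (needs the server up)."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as response:
        return served_model_ids(json.load(response))
```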

MCP Usage

This model is designed as a local agent backend for code tools such as Claude Code and Codex. The CnakeCharmer tool-execution path uses Bubblewrap (bwrap) for sandboxing.
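
For readers unfamiliar with Bubblewrap, the sketch below shows roughly what a sandboxed invocation could look like when built from Python. It is not the CnakeCharmer configuration, which is not described here: the helper name `bwrap_command` and the choice of flags are assumptions for illustration, although each flag is a standard `bwrap` option.

```python
def bwrap_command(workdir: str, inner: list) -> list:
    """Build a bwrap argv that runs `inner` with a read-only root and no network.

    Hypothetical policy for illustration; the real tool may use different flags.
    """
    return [
        "bwrap",
        "--ro-bind", "/", "/",        # whole filesystem visible, read-only
        "--dev", "/dev",              # minimal /dev
        "--proc", "/proc",            # fresh /proc
        "--tmpfs", "/tmp",            # private scratch space
        "--bind", workdir, workdir,   # only the work directory is writable
        "--chdir", workdir,
        "--unshare-net",              # no network access inside the sandbox
        "--die-with-parent",          # kill the sandbox if the caller exits
        *inner,
    ]
```

The resulting list can be passed to `subprocess.run` on a Linux host with `bwrap` installed, for example `subprocess.run(bwrap_command("/srv/job", ["python3", "run_tests.py"]), timeout=120)`, where the directory and script name are placeholders.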

Tool Schema

The model expects exactly one tool, evaluate_cython, with three required string arguments: code, python_code, test_code. The tool returns a plain-text report with sections: Compilation, Annotation, Tests, Benchmark.
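
Written as an OpenAI Chat Completions style function definition, the schema described above might look like the following sketch. Only the tool name, the three argument names, and their string types come from this card; the description text and the `validate_arguments` helper are illustrative additions, and per-argument descriptions are left out because the card does not spell them out.

```python
EVALUATE_CYTHON_TOOL = {
    "type": "function",
    "function": {
        "name": "evaluate_cython",
        # Description wording is an assumption based on the card's summary.
        "description": "Compile, test, and benchmark Cython code; returns a plain-text report.",
        "parameters": {
            "type": "object",
            "properties": {
                "code": {"type": "string"},
                "python_code": {"type": "string"},
                "test_code": {"type": "string"},
            },
            "required": ["code", "python_code", "test_code"],
        },
    },
}


def validate_arguments(arguments: dict) -> list:
    """Return a list of problems with a parsed tool call; empty means it is valid."""
    problems = []
    for name in EVALUATE_CYTHON_TOOL["function"]["parameters"]["required"]:
        if name not in arguments:
            problems.append(f"missing argument: {name}")
        elif not isinstance(arguments[name], str):
            problems.append(f"argument is not a string: {name}")
    return problems
```

Checking arguments before executing the tool lets an orchestrator return a clear error to the model instead of failing inside the sandbox.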

Eval Results

End-to-end evaluation on 50 unseen problems (held out from SFT training), running the full agent loop through evaluate_cython (real bwrap-sandboxed compile + test + benchmark, not stubbed):

Metric	Value
Compiled	43/50 (86%)
Correct (compiled + tests pass)	40/50 (80%)
Tool calls per problem (avg)	3.2
Speedup median (correct)	11.1×
Speedup mean (correct)	18.9×
Speedup max	98.3×
Annotation quality (mean)	0.86

By difficulty:

Difficulty	Correct	Median Speedup
easy	22/28 (79%)	10.8×
medium	10/13 (77%)	29.6×
hard	8/9 (89%)	11.1×
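
As a quick consistency check, the per-difficulty counts can be recomputed against the overall figures. The short script below uses only numbers copied from the two tables above and assumes percentages are rounded to the nearest whole number.

```python
# Counts copied from the "By difficulty" table: (correct, attempted).
by_difficulty = {"easy": (22, 28), "medium": (10, 13), "hard": (8, 9)}


def percent(correct: int, attempted: int) -> int:
    """Whole-number percentage, rounded to the nearest integer."""
    return round(100 * correct / attempted)


total_correct = sum(correct for correct, _ in by_difficulty.values())
total_attempted = sum(attempted for _, attempted in by_difficulty.values())

for name, (correct, attempted) in by_difficulty.items():
    print(f"{name}: {correct}/{attempted} ({percent(correct, attempted)}%)")
print(f"overall: {total_correct}/{total_attempted} ({percent(total_correct, total_attempted)}%)")
```

The per-difficulty rows sum to 40 correct out of 50 problems, matching the "Correct" row of the overall table.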

Pattern B format compliance was separately verified at 90% (45/50) on a stubbed validation pass; the remaining 10% are iteration-cap hits on hard problems, where the model still wanted more evaluate_cython calls when the loop terminated.

Safetensors
Model size: 22B params
Tensor type: BF16

Model tree for CnakeCharmer/CnakeAgent-sft-v0.3
Base model: openai/gpt-oss-20b
Quantized (207): this model