CnakeAgent-sft-v0.3

CnakeAgent-sft-v0.3 is a supervised fine-tune of openai/gpt-oss-20b for Python → Cython optimization workflows. It is trained on multi-turn tool-use traces where the model proposes Cython code, receives compile/test/benchmark feedback from an evaluate_cython tool, and iteratively refines its output.

This checkpoint is packaged in MXFP4 format for efficient single-GPU serving.

What This Model Is For

  • Translating Python functions to optimized Cython (.pyx)
  • Iterative refinement with evaluator feedback (evaluate_cython)
  • Agent-style optimization loops in MCP or OpenAI-compatible Responses API runtimes

Termination Convention

This checkpoint terminates each rollout with a native final-channel turn:

<|start|>assistant<|channel|>final<|message|>{final cython code}<|return|>

There is no synthetic finish tool — the model emits its final answer in the final channel and ends with <|return|>. Compatible with the modern OpenAI Responses API agent-loop convention: the orchestrator continues while the model emits evaluate_cython calls and exits when no tool call is parsed.

Recommended Serving (vLLM)

python -m vllm.entrypoints.openai.api_server \
  --model CnakeCharmer/CnakeAgent-sft-v0.3 \
  --served-model-name gpt-oss-20b-cython \
  --host 0.0.0.0 \
  --port 8003 \
  --trust-remote-code

MCP Usage

This model is designed as a local agent backend for code tools such as Claude Code and Codex. The CnakeCharmer tool-execution path uses Bubblewrap (bwrap) for sandboxing.

Tool Schema

The model expects exactly one tool, evaluate_cython, with three required string arguments: code, python_code, test_code. The tool returns a plain-text report with sections: Compilation, Annotation, Tests, Benchmark.

Eval Results

End-to-end evaluation on 50 unseen problems (held out from SFT training), running the full agent loop through evaluate_cython (real bwrap-sandboxed compile + test + benchmark, not stubbed):

Metric Value
Compiled 43/50 (86%)
Correct (compiled + tests pass) 40/50 (80%)
Tool calls per problem (avg) 3.2
Speedup median (correct) 11.1×
Speedup mean (correct) 18.9×
Speedup max 98.3×
Annotation quality (mean) 0.86

By difficulty:

Difficulty Correct Median Speedup
easy 22/28 (79%) 10.8×
medium 10/13 (77%) 29.6×
hard 8/9 (89%) 11.1×

Pattern B format compliance separately verified at 90% (45/50) on a stubbed validation pass; remaining 10% are iteration-cap hits on hard problems where the model still wanted more evaluate_cython calls when the loop terminated.

Downloads last month
17
Safetensors
Model size
22B params
Tensor type
BF16
·
U8
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for CnakeCharmer/CnakeAgent-sft-v0.3

Quantized
(207)
this model