---
license: apache-2.0
language:
- en
- fr
library_name: transformers
pipeline_tag: text-generation
tags:
- agentic
- function-calling
- tool-use
- structured-generation
- orchestration
- code-agent
- mcp
- edge
- small-language-model
base_model: AMFORGE/samg-reasoning
model-index:
- name: SAM-G-CobraTooling
  results:
  - task:
      type: agentic-orchestration
      name: Agentic IDE tool-call orchestration (13 families, held-out)
    metrics:
    - type: exact_match
      value: 78.8
      name: Exact plan match, aggregate (%)
    - type: accuracy
      value: 94.0
      name: Risk-gate fidelity (%)
---

# SAM-G-CobraTooling

**SAM-G-CobraTooling** is a 30.3M-parameter model fine-tuned from
[SAM-G-Reasoning](https://huggingface.co/AMFORGE/samg-reasoning) on 196k
agentic orchestration traces. It turns a natural-language instruction, or an
observation from a previous step, into an **ordered, risk-flagged JSON plan of
tool calls**. It is the local orchestration layer of an agentic IDE: it routes,
decomposes, tracks state, reacts to exit codes and HTTP status, and emits
structured tool calls entirely offline. It does **not** write code; code is
delegated to a larger model via an `ask_code_model` hand-off. Built by
**AMEFORGE** for the CobraBub IDE.

- **Parameters:** 30.3M · **Footprint:** 121 MB fp32 (~30 MB quantized) · **Base:** SAM-G-Reasoning
- **Fine-tuning:** prompt-masked SFT (loss on the plan span only), cosine learning-rate schedule at 8e-5, 10k steps, best checkpoint at 6k steps
- **Aggregate exact plan-match:** 78.8% (held-out, disjoint seed)
- **Lineage:** SAM-G → SAM-G-Reasoning → SAM-G-CobraTooling
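
The footprint figures follow directly from the parameter count. The quick check below assumes 4 bytes per fp32 weight and, for the quantized figure, 1 byte per weight (int8 is an assumption here; the card does not name the quantization scheme). Sizes are in decimal megabytes and ignore file metadata.

```python
PARAMS = 30.3e6  # parameter count stated in the card


def footprint_mb(n_params: float, bytes_per_weight: float) -> float:
    """Weight storage in decimal megabytes, ignoring container overhead."""
    return n_params * bytes_per_weight / 1e6


fp32_mb = footprint_mb(PARAMS, 4)  # fp32: 4 bytes per weight
int8_mb = footprint_mb(PARAMS, 1)  # assumed int8: 1 byte per weight
print(f"fp32 ~ {fp32_mb:.0f} MB, int8 ~ {int8_mb:.0f} MB")
```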

## Output format

```
<instruction> [ACTION] {"plan":[{"op":...,"args":{...},"risk":"safe|critical"}, ...]}
<intent> | {"last_op":...,"...":...} [ACTION] {"plan":[ ... ]}   # reactive (observation-driven)
```

Every step carries a `risk` flag (`safe` or `critical`) that drives the IDE
confirmation gate: safe ops run autonomously, critical ops require explicit
user confirmation.
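
On the application side, the text after `[ACTION]` has to be parsed and checked before anything runs. The following is a minimal sketch of such a parser, not the IDE's actual implementation: `parse_plan` is a hypothetical helper, and the validation rules (string `op`, object `args`, `risk` in `safe`/`critical`) are inferred from the format shown above.

```python
import json

ACTION_MARKER = "[ACTION]"
ALLOWED_RISKS = {"safe", "critical"}


def parse_plan(generated: str) -> list:
    """Extract and validate the plan that follows the [ACTION] marker."""
    _, marker, tail = generated.partition(ACTION_MARKER)
    if not marker:
        raise ValueError("missing [ACTION] marker")
    # raw_decode parses the first JSON value and ignores any trailing text.
    payload, _ = json.JSONDecoder().raw_decode(tail.lstrip())
    plan = payload.get("plan") if isinstance(payload, dict) else None
    if not isinstance(plan, list):
        raise ValueError("expected a JSON object with a 'plan' list")
    for i, step in enumerate(plan):
        if not isinstance(step, dict):
            raise ValueError(f"step {i}: not an object")
        if not isinstance(step.get("op"), str):
            raise ValueError(f"step {i}: 'op' must be a string")
        if not isinstance(step.get("args"), dict):
            raise ValueError(f"step {i}: 'args' must be an object")
        if step.get("risk") not in ALLOWED_RISKS:
            raise ValueError(f"step {i}: 'risk' must be safe or critical")
    return plan
```

Rejecting a malformed plan outright, rather than repairing it, keeps a decoding error from turning into an unintended tool call.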

## What it is good at, and what it is not

The model was stress-tested on thirteen task families. The pattern mirrors the
rest of the SAM-G line: it excels at **routing and reaction** (short,
procedural plans) and is limited on **long ordered chains**, which are hard to
reproduce exactly at 30M parameters.

| Family | Exact match (%) | Type |
|---|---|---|
| single_tool (routing) | 100 | routing |
| retry_loop (exit-code state machine) | 100 | reaction |
| feedback_react (stdout/stderr) | 100 | reaction |
| git_workflow (status→add→push, gated) | 100 | procedural |
| scrape_research (fetch→summarize→act) | 100 | procedural |
| db_query (SQL, SELECT vs mutation) | 100 | structured call |
| webhook_wait (async callback) | 92 | async reaction |
| **mcp_call (filesystem/github/postgres)** | **83** | **structured call** |
| api_call (REST/GraphQL + HTTP state machine) | 75 | structured call |
| plan_chain (multi-step plans) | 58 | planning |
| risk_gate (mixed safe/critical plans) | 58 | gated planning |
| fs_watch (file-change reaction) | 42 | async reaction |
| build_test_cycle (edit→test→react + hand-off) | 17 | long chain |

Routing, exit-code reaction, git, scraping and SQL routing are saturated.
`mcp_call` at 83% makes the model a viable local driver for MCP servers, which
is the core capability of a hosted code agent, here running offline.
`plan_chain` rose from the v1 plateau (0–42%) to 58% after broadening generator
coverage. `build_test_cycle` remains the hard family: four to five ordered ops
ending in a code-model hand-off, scored by strict exact match. This is the same
long-chain ceiling seen with arithmetic in SAM-G-Reasoning. For such chains,
decompose the plan app-side into shorter sub-calls.
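
As a sanity check on the table, the unweighted mean of the thirteen per-family scores can be computed directly. It reproduces the reported 78.8% aggregate, which is consistent with a macro-average over families, although the card does not state how the aggregate is computed and the per-family values are themselves rounded.

```python
# Per-family exact-match scores (%), copied from the table above.
FAMILY_EXACT = {
    "single_tool": 100, "retry_loop": 100, "feedback_react": 100,
    "git_workflow": 100, "scrape_research": 100, "db_query": 100,
    "webhook_wait": 92, "mcp_call": 83, "api_call": 75,
    "plan_chain": 58, "risk_gate": 58, "fs_watch": 42,
    "build_test_cycle": 17,
}

# Unweighted (macro) mean: every family counts equally.
macro_avg = sum(FAMILY_EXACT.values()) / len(FAMILY_EXACT)
print(f"{len(FAMILY_EXACT)} families, unweighted mean = {macro_avg:.1f}%")
# prints: 13 families, unweighted mean = 78.8%
```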

## Security: the risk flag is advisory, not a boundary

The model flags critical ops with **94% fidelity** across all families, which
is strong for pre-flagging and good UX. **It must not be the sole security
boundary.** A 30M model will mis-flag a fraction of decisions, and the failure
modes are asymmetric: a false negative (a critical op flagged `safe`) would
auto-run a destructive command without confirmation. Integrators must add a
**deterministic backstop**: a hard whitelist/blacklist in the app that forces
`critical` on known-dangerous operations (`rm -rf`, `git push`, `DROP`/`DELETE`,
external mutating HTTP, MCP write tools, `delete_file`) regardless of the
model's flag. Treat the model's `risk` field as a fast hint that pre-fills the
confirmation gate, with the app's deterministic rules as the enforced boundary.
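
A deterministic backstop of this kind could look like the sketch below. Everything in it is illustrative: the rule lists are deliberately small, the argument keys `cmd`, `sql`, `method` and `tool` are assumptions about the plan schema (only `cmd` appears in the card's examples), and `READ_ONLY_MCP_TOOLS` is a hypothetical allowlist. The one property it is meant to show is that rules can only upgrade a step to `critical`, never downgrade it.

```python
import re

# Hypothetical rule set; a real deployment would maintain its own lists.
ALWAYS_CRITICAL_OPS = {"git_push", "delete_file"}
DANGEROUS_CMD = re.compile(r"\brm\s+-\w*r|\bgit\s+push\b", re.IGNORECASE)
MUTATING_SQL = re.compile(
    r"\b(DROP|DELETE|UPDATE|INSERT|ALTER|TRUNCATE)\b", re.IGNORECASE
)
MUTATING_HTTP = {"POST", "PUT", "PATCH", "DELETE"}
READ_ONLY_MCP_TOOLS = {"read_file", "list_directory"}  # assumed tool names


def enforce_risk(step: dict) -> dict:
    """Return a copy of the step, forced to 'critical' when any rule matches."""
    op = step.get("op")
    args = step.get("args") or {}
    forced = bool(
        op in ALWAYS_CRITICAL_OPS
        or (op == "run_command" and DANGEROUS_CMD.search(str(args.get("cmd", ""))))
        or (op == "db_query" and MUTATING_SQL.search(str(args.get("sql", ""))))
        or (op == "api_call"
            and str(args.get("method", "GET")).upper() in MUTATING_HTTP)
        or (op == "mcp_call" and args.get("tool") not in READ_ONLY_MCP_TOOLS)
    )
    # Fail closed: anything not explicitly 'safe' is treated as critical,
    # and a model-flagged 'critical' is never downgraded.
    risk = "critical" if forced or step.get("risk") != "safe" else "safe"
    return {**step, "risk": risk}
```

Applying `enforce_risk` to every step before the confirmation gate keeps the model's flag as a hint while the application's rules remain the enforced boundary.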

## Op vocabulary

- **Routing/IO:** `open_file`, `list_dir`, `run_command`, `scrape`, `summarize`, `capture`, `open_app`
- **Hand-off:** `ask_code_model`, `write_file`
- **Control:** `retry`, `escalate`, `backoff`, `reauth`, `continue`, `stop`
- **Integrations:** `api_call`, `mcp_call`, `db_query`, `webhook_wait`, `fs_watch`, `git_push`
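
Because the vocabulary is closed, an application can reject any plan step whose `op` falls outside it before dispatching. A minimal sketch follows; the grouping keys and the `unknown_ops` helper are hypothetical names, while the op names are copied from the list above.

```python
# Op vocabulary as listed in the card, grouped by role.
OP_VOCABULARY = {
    "routing_io": {"open_file", "list_dir", "run_command", "scrape",
                   "summarize", "capture", "open_app"},
    "hand_off": {"ask_code_model", "write_file"},
    "control": {"retry", "escalate", "backoff", "reauth", "continue", "stop"},
    "integrations": {"api_call", "mcp_call", "db_query", "webhook_wait",
                     "fs_watch", "git_push"},
}
KNOWN_OPS = frozenset().union(*OP_VOCABULARY.values())


def unknown_ops(plan: list) -> list:
    """Return the ops in a plan that are not part of the known vocabulary."""
    return [step.get("op") for step in plan if step.get("op") not in KNOWN_OPS]
```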

## Intended use

The local planning/routing/reaction layer of an agentic IDE: decompose an
instruction into ordered tool calls, react to observations (exit codes, stderr,
HTTP status, DB row counts, webhook payloads, file-change events), and emit
structured, risk-flagged plans offline and for free. This covers roughly the
procedural majority of agentic turns; hard code generation and long exact
chains are escalated to a larger model via `ask_code_model`.
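
The way a plan might be consumed by the IDE's confirmation gate can be sketched as a small loop. This is an illustration under assumptions, not the CobraBub implementation: `execute` and `confirm` are hypothetical callables supplied by the application, and stopping at the first declined step is one possible policy.

```python
from typing import Callable, List, Tuple


def run_plan(
    plan: List[dict],
    execute: Callable[[dict], str],
    confirm: Callable[[dict], bool],
) -> List[Tuple[str, str]]:
    """Run steps in order; stop at the first gated step the user declines."""
    results = []
    for step in plan:
        # Anything not explicitly 'safe' goes through the confirmation gate.
        if step.get("risk") != "safe" and not confirm(step):
            results.append((step["op"], "skipped: not confirmed"))
            break  # later steps may depend on the declined one
        results.append((step["op"], execute(step)))
    return results
```

The result of each executed step (exit code, HTTP status, and so on) can then be fed back to the model as the observation for the next reactive turn.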

## Usage

```python
import sentencepiece as spm
import torch

# Load the SentencePiece tokenizer that ships with the model.
sp = spm.SentencePieceProcessor()
sp.Load("samg_tokenizer.model")

# Routing: a plain instruction followed by the [ACTION] marker.
prompt = "open src/main.js and run the tests [ACTION]"
# -> {"plan":[{"op":"open_file","args":{"path":"src/main.js"},"risk":"safe"},
#             {"op":"run_command","args":{"cmd":"pytest"},"risk":"safe"}]}

# Reactive: HTTP 429 -> back off and retry.
prompt = 'rate limited, back off and retry | {"last_op":"api_call","status":429} [ACTION]'
# -> {"plan":[{"op":"backoff","args":{"seconds":30},"risk":"safe"},
#             {"op":"retry","args":{"attempt":2},"risk":"safe"}]}

# Encode the prompt as a (1, seq_len) tensor of token ids.
ids = torch.tensor([sp.EncodeAsIds(prompt)])
# Greedy-decode the span after [ACTION] to obtain the structured plan JSON.
```
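
Both prompt shapes can be produced by one small helper. The sketch below is an assumption about how an application might assemble prompts: `build_prompt` is a hypothetical name, and the compact JSON separators were chosen only to match the reactive example in the card.

```python
import json
from typing import Optional


def build_prompt(instruction: str, observation: Optional[dict] = None) -> str:
    """Build a routing prompt, or a reactive one when an observation is given."""
    if observation is None:
        return f"{instruction} [ACTION]"
    # Compact separators mirror the observation format shown in the card.
    obs = json.dumps(observation, separators=(",", ":"))
    return f"{instruction} | {obs} [ACTION]"


print(build_prompt("open src/main.js and run the tests"))
print(build_prompt("rate limited, back off and retry",
                   {"last_op": "api_call", "status": 429}))
```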

## Limitations

- `build_test_cycle` (17%) and `plan_chain`/`risk_gate` (58% each) plateau on
  exact match because long, strictly ordered plans are hard at 30M parameters;
  decompose long plans app-side into shorter sub-calls.
- The `risk` flag is advisory (94% fidelity); enforce a deterministic backstop
  in the app, as described in the security section.
- Traces are synthetic, drawn from the training family distribution with a
  disjoint evaluation seed; coverage reflects the generator, not arbitrary
  real-world tool APIs.
- Not a general assistant and does not write code; it orchestrates and hands
  off. Inherits the base model's knowledge limits.
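
To make the strictness of exact plan-match concrete, here is one plausible scoring function. The card does not publish its scorer, so this is an assumed definition: a prediction counts only if every step matches the reference in op, args, risk flag and order, with JSON key order ignored.

```python
import json
from typing import Iterable, List, Tuple


def canonical(plan: List[dict]) -> str:
    """Serialize a plan so that key order does not affect comparison."""
    return json.dumps(plan, sort_keys=True, separators=(",", ":"))


def exact_plan_match(predicted: List[dict], reference: List[dict]) -> bool:
    """True only if every step matches, in the same order."""
    return canonical(predicted) == canonical(reference)


def exact_match_rate(pairs: Iterable[Tuple[List[dict], List[dict]]]) -> float:
    """Percentage of (predicted, reference) pairs that match exactly."""
    outcomes = [exact_plan_match(p, r) for p, r in pairs]
    return 100.0 * sum(outcomes) / len(outcomes) if outcomes else 0.0
```

Under a metric like this, a single wrong argument or swapped pair of steps fails the whole plan, which is why longer chains score lower than short routing plans.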

## Citation

```bibtex
@misc{samgcobratooling2026,
  title  = {SAM-G-CobraTooling: Risk-Flagged Agentic Tool-Call Orchestration at 30M Parameters},
  author = {AMEFORGE Lab},
  year   = {2026}
}
```