Synoema-Coder-1.5B Tools (C12)
A LoRA fine-tune of unsloth/Qwen2.5-1.5B-Instruct (1.5B parameters) that turns it into an agentic coding model for the
Synoema programming language: it writes Synoema, type-checks it,
runs it, searches a corpus, and self-corrects on errors, all through MCP tools.
- 🌐 Website: https://synoema.tech
- 🤖 This model: https://huggingface.co/delimitter/synoema-coder-1.5b-tools-v12
- 📚 Training corpus (dataset): https://huggingface.co/datasets/delimitter/synoema-coder-3b-tools-corpus
🏆 Result: 100% (28/28) on the Synoema agentic tool-use benchmark
Scored on the corrected agentic harness: the model is driven turn by turn (generation
stops at `<|im_end|>`), and real tool results are injected between turns. These are actual
`sno check` / `sno run` outputs from the live Synoema compiler, never mocked. A task passes only
if the model genuinely completes it end to end (e.g. multi-write self-correction: write broken
code → observe the type error → rewrite a valid fix → type-check passes).
| Capability | Tasks | Pass |
|---|---|---|
| Write + typecheck + run | TU1–TU3, TU5, TU10 | ✅ |
| Search → write → run | TU6, TU9, TU20 | ✅ |
| Multi-write self-correction (if/else → ternary) | TU4, TU13 | ✅ |
| Language features (ADT, HOF, pattern match, cons) | TU11, TU14–TU19, TU23, TU29 | ✅ |
| List comprehensions | TU12, TU26 | ✅ |
| Nested ternary (fizzbuzz) | TU22, TU30 | ✅ |
| Total | 28 | 28/28 |
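The turn-by-turn protocol described above can be sketched roughly as follows. This is a minimal illustration, not the actual evaluation harness: `run_episode`, the `generate` callable, and the `tools` mapping are hypothetical names introduced here. It assumes `generate` returns one assistant turn (already cut at `<|im_end|>`) as a message dict with optional OpenAI-style `tool_calls`, and that each entry in `tools` wraps a real command such as `sno check`.

```python
import json


def run_episode(generate, tools, messages, max_turns=8):
    """Drive the model one turn at a time, injecting real tool output between turns.

    generate(history) -> assistant message dict (hypothetical interface)
    tools: {tool_name: callable(**kwargs) -> str} (hypothetical interface)
    """
    history = list(messages)
    for _ in range(max_turns):
        turn = generate(history)          # one assistant turn, stopped at <|im_end|>
        history.append(turn)
        calls = turn.get("tool_calls") or []
        if not calls:                     # no tool requested: the episode is over
            break
        for call in calls:
            name = call["function"]["name"]
            args = json.loads(call["function"]["arguments"])
            result = tools[name](**args)  # executed for real, never mocked
            history.append(
                {"role": "tool", "tool_call_id": call["id"], "content": result}
            )
    return history
```

A pass/fail check would then inspect the returned history, for instance confirming that the last type-check result reports success.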
What is Synoema?
Synoema is an LLM-native programming language and runtime designed so that models can write it reliably:
- BPE-aligned operators: every operator maps to exactly one `cl100k_base` token.
- Ternary instead of if/else: `? cond -> a : b` (nestable).
- GBNF grammar for constrained decoding (structural-correctness guarantee).
- Cranelift JIT + WebAssembly compile targets.
- MCP server exposing `file_write`, `file_read`, `sno_typecheck`, `sno_run`, `search_corpus`.
- Contract annotations (`requires` / `ensures`) for formal verification.
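One way to check the single-token claim for a given set of operators is to encode each one and count the resulting tokens. The sketch below takes the encoder as a parameter so it does not depend on a particular tokenizer library; `single_token_violations` is a hypothetical helper name, not part of Synoema's tooling.

```python
from typing import Callable, Iterable, Sequence


def single_token_violations(
    operators: Iterable[str],
    encode: Callable[[str], Sequence[int]],
) -> dict:
    """Return {operator: token_count} for every operator that is not exactly one token."""
    violations = {}
    for op in operators:
        n = len(encode(op))
        if n != 1:
            violations[op] = n
    return violations
```

With the `tiktoken` package installed, `encode` could be `tiktoken.get_encoding("cl100k_base").encode`; an empty result would mean every operator passed in is a single token.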
Model details
| Property | Value |
|---|---|
| Base model | unsloth/Qwen2.5-1.5B-Instruct |
| Parameters | 1.5B |
| Method | QLoRA (4-bit NF4 + LoRA), merged to fp16 for GGUF |
| LoRA | r=16, alpha=32 |
| Sequence length | 1024 |
| Epochs / cycle | 3 |
| Training corpus | ~18k tool-use + codegen examples — every example passes sno check + sno run |
| Cycle | C12 (sequential "carousel": each cycle warm-starts from the best previous adapter, then trains on the corpus plus targeted examples for the prior cycle's failures) |
| Hardware | AMD RX 7900 GRE 16GB (ROCm + unsloth) |
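The sequential carousel described in the Cycle row can be read as the loop below. It is only a schematic: `carousel`, `train`, `evaluate`, and `make_targeted_examples` are hypothetical names standing in for the real training, benchmark, and data-generation steps, which are not detailed here.

```python
def carousel(corpus, cycles, train, evaluate, make_targeted_examples, adapter=None):
    """Run sequential training cycles, each warm-started from the best adapter so far.

    train(start_adapter, examples) -> new adapter
    evaluate(adapter) -> (score, failed_task_ids)
    make_targeted_examples(failed_task_ids) -> extra training examples
    """
    best_adapter, best_score = adapter, float("-inf")
    failures = []
    for _ in range(cycles):
        # Corpus plus targeted examples for whatever the previous cycle got wrong.
        examples = list(corpus) + list(make_targeted_examples(failures))
        candidate = train(best_adapter, examples)
        score, failures = evaluate(candidate)
        if score > best_score:
            best_adapter, best_score = candidate, score
    return best_adapter, best_score
```

Keeping the best adapter rather than the latest one means a regressed cycle does not become the starting point for the next.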
GGUF files (llama.cpp / Ollama / LM Studio)
| File | Quant | Size | Notes |
|---|---|---|---|
| `synoema-coder-1.5b-tools-v12.Q4_K_M.gguf` | Q4_K_M | 940 MB | smallest, recommended for local use |
| `synoema-coder-1.5b-tools-v12.Q8_0.gguf` | Q8_0 | 2 GB | near-lossless |
| `synoema-coder-1.5b-tools-v12.f16.gguf` | F16 | 3 GB | full precision |
```bash
# llama.cpp
llama-cli -hf delimitter/synoema-coder-1.5b-tools-v12 --hf-file synoema-coder-1.5b-tools-v12.Q4_K_M.gguf -p "Write quicksort in Synoema to src/qs.sno and run it."

# Ollama
ollama run hf.co/delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M
```
Usage — Transformers + PEFT (adapter)
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the base model and its tokenizer.
base = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-1.5B-Instruct", device_map="auto")
tok = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-1.5B-Instruct")

# Attach the LoRA adapter from this repository on top of the base model.
model = PeftModel.from_pretrained(base, "delimitter/synoema-coder-1.5b-tools-v12")
```
Prompt format is ChatML. The system prompt used at training/eval:

```
<|im_start|>system
You are sno-code, a Synoema coding agent. Use tools to write and verify code.<|im_end|>
<|im_start|>user
Write `square x = x * x` with `main = square 9` to src/square.sno, typecheck and run it.<|im_end|>
<|im_start|>assistant
```
The model emits OpenAI-style `tool_calls` for `file_write`, `sno_typecheck`, `sno_run`,
`file_read`, `search_corpus`; feed real tool results back as `tool` turns.
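For reference, the ChatML layout used in this section can be assembled by hand as in the sketch below. `build_chatml` is a hypothetical helper introduced for illustration; in practice the tokenizer's `apply_chat_template` method would normally produce the prompt, and the rendering of tool turns is not covered here.

```python
SYSTEM_PROMPT = (
    "You are sno-code, a Synoema coding agent. Use tools to write and verify code."
)


def build_chatml(user_message: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Render a system + user exchange in ChatML, ending with an open assistant turn."""
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )
```

Generation should then stop at the next `<|im_end|>`, which closes the assistant turn.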
Synoema language quick reference
```
maxOf x y = ? x > y -> x : y                 -- ternary (NO if/then/else)
fact 0 = 1                                   -- pattern matching
fact n = n * fact (n - 1)
evens xs = [x | x <- xs, x % 2 == 0]         -- list comprehension
sumList xs = foldl (\acc x -> acc + x) 0 xs  -- higher-order functions
Direction = North | South | East | West      -- ADT
opposite North = South
main = qsort [3 1 4 1 5]                     -- lists are SPACE-separated
```
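For readers more at home in Python, the first few definitions in the quick reference correspond roughly to the following. This is an informal reading based only on the snippets shown here, not a statement about Synoema's semantics (its numeric types or evaluation order, for example); the Python function names are chosen for this illustration.

```python
from functools import reduce


def max_of(x, y):
    return x if x > y else y                      # ? x > y -> x : y


def fact(n):
    return 1 if n == 0 else n * fact(n - 1)       # two pattern-matching clauses


def evens(xs):
    return [x for x in xs if x % 2 == 0]          # list comprehension


def sum_list(xs):
    return reduce(lambda acc, x: acc + x, xs, 0)  # left fold with an accumulator
```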
License
Apache-2.0 (same as the Qwen2.5 base model). Synoema © Andrey Bubnov — https://synoema.tech