How to use delimitter/synoema-coder-1.5b-tools-v12 with libraries and local apps.

Libraries

PEFT

How to use delimitter/synoema-coder-1.5b-tools-v12 with PEFT:

from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-1.5B-Instruct-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "delimitter/synoema-coder-1.5b-tools-v12")

llama-cpp-python

How to use delimitter/synoema-coder-1.5b-tools-v12 with llama-cpp-python:

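# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF file from the Hub (requires huggingface-hub) and load it.
llm = Llama.from_pretrained(
    repo_id="delimitter/synoema-coder-1.5b-tools-v12",
    filename="synoema-coder-1.5b-tools-v12.Q4_K_M.gguf",
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
# Print only the assistant's reply text.
print(response["choices"][0]["message"]["content"])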

Local Apps

llama.cpp

How to use delimitter/synoema-coder-1.5b-tools-v12 with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M

Use Docker

docker model run hf.co/delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M

vLLM

How to use delimitter/synoema-coder-1.5b-tools-v12 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "delimitter/synoema-coder-1.5b-tools-v12"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "delimitter/synoema-coder-1.5b-tools-v12",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
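
The same OpenAI-compatible request can be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are illustrative and not part of vLLM, and the response is assumed to follow the usual OpenAI chat-completions layout.

```python
import json
import urllib.request


def build_chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the chat-completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumes the OpenAI-style response layout: choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example call, assuming the server started above is listening on port 8000:
# chat("http://localhost:8000/v1",
#      "delimitter/synoema-coder-1.5b-tools-v12",
#      "What is the capital of France?")
```

Pointing `base_url` at `http://localhost:8080/v1` should work the same way against a local `llama-server`, since both expose an OpenAI-compatible API.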

Use Docker

docker model run hf.co/delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M

Ollama

How to use delimitter/synoema-coder-1.5b-tools-v12 with Ollama:
```
ollama run hf.co/delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M
```

Unsloth Studio

How to use delimitter/synoema-coder-1.5b-tools-v12 with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for delimitter/synoema-coder-1.5b-tools-v12 to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for delimitter/synoema-coder-1.5b-tools-v12 to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for delimitter/synoema-coder-1.5b-tools-v12 to start chatting

Pi

How to use delimitter/synoema-coder-1.5b-tools-v12 with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use delimitter/synoema-coder-1.5b-tools-v12 with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M

Run Hermes

hermes

Docker Model Runner

How to use delimitter/synoema-coder-1.5b-tools-v12 with Docker Model Runner:
```
docker model run hf.co/delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M
```

Lemonade

How to use delimitter/synoema-coder-1.5b-tools-v12 with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M

Run and chat with the model

lemonade run user.synoema-coder-1.5b-tools-v12-Q4_K_M

List all available models

lemonade list

Synoema-Coder-1.5B Tools (C12)

A LoRA fine-tune of the 1.5B-parameter unsloth/Qwen2.5-1.5B-Instruct that turns it into an agentic coding model for the Synoema programming language: it writes Synoema, type-checks it, runs it, searches a corpus, and self-corrects on errors, all through MCP tools.

🌐 Website: https://synoema.tech
🤖 This model: https://huggingface.co/delimitter/synoema-coder-1.5b-tools-v12
📚 Training corpus (dataset): https://huggingface.co/datasets/delimitter/synoema-coder-3b-tools-corpus

🏆 Result: 100% (28/28) on the Synoema agentic tool-use benchmark

Scored on the corrected agentic harness: the model is driven turn by turn (generation stops at <|im_end|>), and real tool results are injected between turns. These are actual `sno check` / `sno run` outputs from the live Synoema compiler, never mocked. A task passes only if the model genuinely completes it end to end (e.g. multi-write self-correction: write broken code → observe the type error → rewrite a valid fix → type-check passes).
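
The turn-by-turn loop described above can be sketched roughly as follows. This is not the actual harness: `run_episode`, `passed_self_correction`, the `generate` and `run_tool` callables, and the "error" substring check are all hypothetical stand-ins chosen for illustration.

```python
from typing import Any, Callable, Dict, List

Message = Dict[str, Any]


def run_episode(
    generate: Callable[[List[Message]], Message],
    run_tool: Callable[[str, str], str],
    task_prompt: str,
    max_turns: int = 8,
) -> List[Message]:
    """Drive one task turn by turn, injecting tool output between turns."""
    messages: List[Message] = [{"role": "user", "content": task_prompt}]
    for _ in range(max_turns):
        # One assistant turn; `generate` is assumed to stop at <|im_end|>.
        reply = generate(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            break  # no tool requested, so the episode is over
        for call in calls:
            name = call["function"]["name"]
            # In a real harness `run_tool` executes the live tool, never a mock.
            output = run_tool(name, call["function"]["arguments"])
            messages.append({"role": "tool", "name": name, "content": output})
    return messages


def passed_self_correction(messages: List[Message]) -> bool:
    """True if a failing type-check is later followed by a clean one."""
    saw_error = False
    for message in messages:
        if message["role"] != "tool" or message.get("name") != "sno_typecheck":
            continue
        # Assumption: a failed check mentions "error" somewhere in its output.
        if "error" in message["content"].lower():
            saw_error = True
        elif saw_error:
            return True
    return False
```

Passing the model and the tool runner in as callables keeps the loop itself independent of any particular inference backend or compiler binding.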

Capability	Tasks	Pass
Write + typecheck + run	TU1–TU3, TU5, TU10	✅
Search → write → run	TU6, TU9, TU20	✅
Multi-write self-correction (if/else → ternary)	TU4, TU13	✅
Language features (ADT, HOF, pattern match, cons)	TU11, TU14–TU19, TU23, TU29	✅
List comprehensions	TU12, TU26	✅
Nested ternary (fizzbuzz)	TU22, TU30	✅
Total	28	28/28

What is Synoema?

Synoema is an LLM-native programming language and runtime designed so that models can write it reliably:

BPE-aligned operators — every operator maps to exactly one cl100k_base token.
Ternary instead of if/else — ? cond -> a : b (nestable).
GBNF grammar for constrained decoding (structural-correctness guarantee).
Cranelift JIT + WebAssembly compile targets.
MCP server exposing file_write, file_read, sno_typecheck, sno_run, search_corpus.
Contract annotations (requires / ensures) for formal verification.

Model details

Property	Value
Base model	`unsloth/Qwen2.5-1.5B-Instruct`
Parameters	1.5B
Method	QLoRA (4-bit NF4 + LoRA), merged to fp16 for GGUF
LoRA	r=16, alpha=32
Sequence length	1024
Epochs / cycle	3
Training corpus	~18k tool-use + codegen examples — every example passes `sno check` + `sno run`
Cycle	C12 (sequential "carousel": each cycle warm-starts from the best previous adapter, then trains on the corpus plus targeted examples for the prior cycle's failures)
Hardware	AMD RX 7900 GRE 16 GB (ROCm + unsloth)
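
To make the "LoRA r=16, alpha=32 ... merged to fp16" rows concrete, here is a toy sketch of what merging an adapter into a base weight means, assuming the standard LoRA update W + (alpha / r) · B·A. The matrices are tiny made-up values and the helper names are illustrative; real merging is done per layer by the training library, not by code like this.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, sufficient for a toy example."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight: Matrix, lora_a: Matrix, lora_b: Matrix, r: int, alpha: float) -> Matrix:
    """Return W + (alpha / r) * (B @ A), the merged weight."""
    scale = alpha / r
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# With the settings in the table, the scaling factor is 32 / 16 = 2.0.
assert 32 / 16 == 2.0

# Toy rank-1 example (values are arbitrary).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [[1.0, 2.0]]      # shape r x d_in  = 1 x 2
lora_b = [[0.5], [0.0]]    # shape d_out x r = 2 x 1
merged = merge_lora(weight, lora_a, lora_b, r=1, alpha=2.0)
print(merged)  # [[2.0, 2.0], [0.0, 1.0]]
```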

GGUF files (llama.cpp / Ollama / LM Studio)

File	Quant	Size	Notes
`synoema-coder-1.5b-tools-v12.Q4_K_M.gguf`	Q4_K_M	940 MB	smallest, recommended for local use
`synoema-coder-1.5b-tools-v12.Q8_0.gguf`	Q8_0	2 GB	near-lossless
`synoema-coder-1.5b-tools-v12.f16.gguf`	F16	3 GB	full precision
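
As a rough sanity check on the sizes in the table, file size can be approximated as parameters × bits per weight / 8. The sketch below assumes the nominal 1.5B parameter count and uses 16 and 8.5 bits per weight for F16 and Q8_0; the 4.85 figure for Q4_K_M is only an approximation, since that quantization mixes block types.

```python
def estimate_gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate file size in GB, ignoring metadata and per-tensor overhead."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 1.5e9  # nominal parameter count from the model details

# Bits per weight; the Q4_K_M value is an approximate, assumed average.
BITS_PER_WEIGHT = {"F16": 16.0, "Q8_0": 8.5, "Q4_K_M": 4.85}

for quant, bpw in BITS_PER_WEIGHT.items():
    print(f"{quant}: ~{estimate_gguf_size_gb(N_PARAMS, bpw):.2f} GB")
```

The estimates land close to the listed sizes once the table's rounding is taken into account.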

# llama.cpp
llama-cli -hf delimitter/synoema-coder-1.5b-tools-v12 --hf-file synoema-coder-1.5b-tools-v12.Q4_K_M.gguf -p "Write quicksort in Synoema to src/qs.sno and run it."

# Ollama
ollama run hf.co/delimitter/synoema-coder-1.5b-tools-v12:Q4_K_M

Usage — Transformers + PEFT (adapter)

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained("unsloth/Qwen2.5-1.5B-Instruct", device_map="auto")
tok  = AutoTokenizer.from_pretrained("unsloth/Qwen2.5-1.5B-Instruct")
model = PeftModel.from_pretrained(base, "delimitter/synoema-coder-1.5b-tools-v12")

The prompt format is ChatML. The system prompt used for training and evaluation, followed by an example user turn:

<|im_start|>system
You are sno-code, a Synoema coding agent. Use tools to write and verify code.<|im_end|>
<|im_start|>user
Write `square x = x * x` with `main = square 9` to src/square.sno, typecheck and run it.<|im_end|>
<|im_start|>assistant
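
For readers assembling prompts by hand, the layout above can be produced with a small helper. This is a minimal sketch; `to_chatml` is an illustrative name, and in practice the tokenizer's `apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` would normally be used instead.

```python
from typing import Dict, List

SYSTEM_PROMPT = "You are sno-code, a Synoema coding agent. Use tools to write and verify code."


def to_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render role/content messages in ChatML, optionally opening an assistant turn."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    if add_generation_prompt:
        # Leave the assistant turn open so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


prompt = to_chatml([
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Write `square x = x * x` with `main = square 9` "
                                "to src/square.sno, typecheck and run it."},
])
print(prompt)
```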

The model emits OpenAI-style tool_calls for file_write, sno_typecheck, sno_run, file_read, search_corpus; feed real tool results back as tool turns.
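
Feeding tool results back might look like the sketch below. It assumes the usual OpenAI message shape (`tool_calls[i].function.name` plus a JSON-encoded `arguments` string, answered by `role: "tool"` messages). The argument names `path` and `content` are guesses, not the documented tool schema. Only the two file tools are implemented; `sno_typecheck` and `sno_run` would need handlers that call the real compiler.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict, List


def make_file_tools(root: Path) -> Dict[str, Callable[..., str]]:
    """Handlers for file_write / file_read, sandboxed under `root`."""

    def file_write(path: str, content: str) -> str:
        target = root / path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        return f"wrote {len(content)} characters to {path}"

    def file_read(path: str) -> str:
        return (root / path).read_text(encoding="utf-8")

    return {"file_write": file_write, "file_read": file_read}


def dispatch_tool_calls(
    assistant_message: Dict[str, Any],
    tools: Dict[str, Callable[..., str]],
) -> List[Dict[str, Any]]:
    """Run each requested tool and return the tool turns to append to the chat."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        name = call["function"]["name"]
        args = json.loads(call["function"]["arguments"])
        handler = tools.get(name)
        content = handler(**args) if handler else f"unknown tool: {name}"
        results.append({"role": "tool", "tool_call_id": call["id"], "content": content})
    return results


# Usage with a hand-written assistant message standing in for model output.
workdir = Path(tempfile.mkdtemp())
assistant_message = {
    "role": "assistant",
    "tool_calls": [{
        "id": "call_0",
        "type": "function",
        "function": {
            "name": "file_write",
            "arguments": json.dumps({"path": "src/square.sno",
                                     "content": "square x = x * x\nmain = square 9\n"}),
        },
    }],
}
tool_turns = dispatch_tool_calls(assistant_message, make_file_tools(workdir))
print(tool_turns[0]["content"])
```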

Synoema language quick reference

maxOf x y = ? x > y -> x : y            -- ternary (NO if/then/else)
fact 0 = 1                              -- pattern matching
fact n = n * fact (n - 1)
evens xs = [x | x <- xs, x % 2 == 0]    -- list comprehension
sumList xs = foldl (\acc x -> acc + x) 0 xs   -- higher-order functions
Direction = North | South | East | West       -- ADT
opposite North = South
main = qsort [3 1 4 1 5]                -- lists are SPACE-separated
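
For readers more familiar with Python, the definitions above correspond roughly to the following. This is only an analogy, assuming Haskell-like semantics for `foldl`, `%`, and pattern matching; the `opposite` cases other than North are filled in by analogy and do not appear in the reference.

```python
from enum import Enum
from functools import reduce
from typing import List


def max_of(x: int, y: int) -> int:
    return x if x > y else y                  # ? x > y -> x : y


def fact(n: int) -> int:
    return 1 if n == 0 else n * fact(n - 1)   # fact 0 = 1; fact n = n * fact (n - 1)


def evens(xs: List[int]) -> List[int]:
    return [x for x in xs if x % 2 == 0]      # [x | x <- xs, x % 2 == 0]


def sum_list(xs: List[int]) -> int:
    return reduce(lambda acc, x: acc + x, xs, 0)   # foldl (\acc x -> acc + x) 0 xs


class Direction(Enum):                        # Direction = North | South | East | West
    NORTH = "North"
    SOUTH = "South"
    EAST = "East"
    WEST = "West"


def opposite(d: Direction) -> Direction:
    # Only `opposite North = South` is given above; the rest is assumed.
    return {
        Direction.NORTH: Direction.SOUTH,
        Direction.SOUTH: Direction.NORTH,
        Direction.EAST: Direction.WEST,
        Direction.WEST: Direction.EAST,
    }[d]


print(max_of(3, 7), fact(5), evens([3, 1, 4, 1, 5]), sum_list([3, 1, 4, 1, 5]))
```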

License

Evaluation results

28-task agentic eval (self-reported): 28/28 (1.000)