---
license: apache-2.0
language:
  - en
tags:
  - tool-calling
  - function-calling
  - prism
  - synalux
  - memory-augmented
  - LoRA
  - Q4_K_M
base_model: Qwen/Qwen3-32B
pipeline_tag: text-generation
---

# Prism Coder 32B — Tool-Routing Model

A fine-tune of Qwen3-32B that routes user requests to the correct Prism Memory tool. It chooses among 17 tools plus a NO_TOOL abstention and is evaluated across 9 categories.

## What this model does

Routes natural-language requests to the correct Prism Memory tool (`session_save_ledger`, `session_load_context`, `knowledge_search`, etc.). This is a **classifier**: it decides which tool to call and is not a general-purpose coding or clinical assistant.

## What this model does NOT do

- General code generation (not trained on code)
- Clinical note writing (not trained on clinical data)
- Codebase understanding (does not know Synalux internals)
- General reasoning beyond base Qwen3-32B capability

## Performance

| Metric | Score | Notes |
|--------|-------|-------|
| eval_300 strict (model only) | **292/300 (97.3%)** | Model's raw accuracy |
| eval_300 strict (with post-processing) | **300/300 (100%)** | 8 cases fixed by validate_tool_call regex layer |
| 3-seed validation | 300/300 on each of 3 seeds | With post-processing |
| Average latency | 1.4 s | Apple M5 Max |
| Context window | 16,384 tokens | |

The eval harness includes a `validate_tool_call` post-processing layer that remaps 8 edge cases the model gets wrong (e.g., "repair links" → backfill_links, "log a milestone" → save_experience). Without this layer, raw model accuracy is 97.3%.
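
The post-processing step described above can be sketched roughly as follows. This is a minimal illustration, not the harness's actual code: the function signature, the regex patterns, and the toy cases are all assumptions, and only the two remappings ("repair links" → backfill_links, "log a milestone" → save_experience) come from this card.

```python
import re

# Hypothetical override table. Only the two target tools are named in the
# model card; the regex patterns are illustrative guesses.
OVERRIDES = [
    (re.compile(r"\brepair\b.*\blinks?\b", re.IGNORECASE), "backfill_links"),
    (re.compile(r"\blog\b.*\bmilestone\b", re.IGNORECASE), "save_experience"),
]


def validate_tool_call(prompt: str, predicted_tool: str) -> str:
    """Return the override tool if a pattern matches, else the model's pick."""
    for pattern, tool in OVERRIDES:
        if pattern.search(prompt):
            return tool
    return predicted_tool


def strict_accuracy(cases, use_validator: bool) -> float:
    """Fraction of (prompt, predicted, expected) cases that match exactly."""
    correct = 0
    for prompt, predicted, expected in cases:
        final = validate_tool_call(prompt, predicted) if use_validator else predicted
        correct += final == expected
    return correct / len(cases)


# Toy cases (made up for illustration): (prompt, raw model output, expected tool).
toy_cases = [
    ("Please repair links in my notes", "knowledge_search", "backfill_links"),
    ("Log a milestone: shipped v2", "session_save_ledger", "save_experience"),
    ("Search my knowledge base for OAuth", "knowledge_search", "knowledge_search"),
    ("What's 2 + 2?", "NO_TOOL", "NO_TOOL"),
]

print(strict_accuracy(toy_cases, use_validator=False))  # 0.5
print(strict_accuracy(toy_cases, use_validator=True))   # 1.0
```

Scoring the same predictions with and without the validator is what separates the two eval_300 rows in the table: the raw figure measures the model alone, while the post-processed figure measures the model plus the override rules.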

## Training

- **Base**: Qwen/Qwen3-32B (4-bit quantized for training via MLX)
- **Method**: LoRA SFT (rank 16, scale 20.0, applied to 8 of 64 layers), repeated over 14 iterative rounds
- **Training data**: eval_300 prompt→tool routing examples only. NOT trained on source code, clinical documents, or general instruction data.
- **Quantization**: Q4_K_M via llama.cpp (18 GB)
- **Hardware**: Apple M5 Max 48 GB unified memory
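
For readers unfamiliar with LoRA, the sketch below shows what the rank and scale settings control. It assumes the scale multiplies the low-rank update directly; this card does not say which weight matrices were adapted, so the tiny matrices and the 4096-wide projection are placeholders, not Qwen3-32B's actual shapes.

```python
def matvec(x, w):
    """Multiply row vector x (length n) by matrix w (n rows, m columns)."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def lora_forward(x, w, a, b, scale):
    """Frozen base projection plus a scaled low-rank update: x@W + scale * (x@A)@B."""
    base = matvec(x, w)
    update = matvec(matvec(x, a), b)
    return [y + scale * z for y, z in zip(base, update)]


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters one adapter adds: A is d_in x rank, B is rank x d_out."""
    return rank * (d_in + d_out)


x = [1.0, 2.0]
w = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (identity, for illustration)
a = [[1.0], [0.0]]             # rank-1 down-projection
b_zero = [[0.0, 0.0]]          # B starts at zero, so the adapter is a no-op
b_trained = [[0.5, 0.0]]       # hypothetical value after training

print(lora_forward(x, w, a, b_zero, scale=20.0))     # [1.0, 2.0]
print(lora_forward(x, w, a, b_trained, scale=20.0))  # [11.0, 2.0]
print(lora_param_count(4096, 4096, rank=16))         # 131072
```

Because only A and B are trained, each adapted matrix adds `rank * (d_in + d_out)` parameters, which is why a rank-16 adapter on a handful of layers can be trained on a single machine.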

## Upcoming

A stacked LoRA adapter (layers 1-16) trained on the Synalux codebase, clinical protocols, and Prism Memory internals is in progress. The goal is to add code understanding and clinical capability without affecting routing accuracy.

## Usage

```bash
ollama pull dcostenco/prism-coder:32b
```
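
Once the model is pulled, one way to query it from Python is through Ollama's local HTTP chat endpoint. The sketch below uses only the standard library. The helper names `build_payload` and `route` are hypothetical, and this card does not document the prompt or reply format the model expects, so treat the single user message as an assumption.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL = "dcostenco/prism-coder:32b"  # tag from the pull command above


def build_payload(user_request: str) -> dict:
    """Build a non-streaming chat request carrying a single user message."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": user_request}],
        "stream": False,
    }


def route(user_request: str, url: str = OLLAMA_CHAT_URL) -> str:
    """Send the request to a running Ollama server and return the raw reply text."""
    body = json.dumps(build_payload(user_request)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=120) as resp:
        return json.loads(resp.read())["message"]["content"]
```

With the Ollama server running, `route("Save this session to the ledger")` returns the model's raw reply as a string. Inspect that output before writing a parser, since the exact reply format is not specified here.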

## Model Family

| Model | Size | eval_300 (raw) | eval_300 (with post-processing) |
|-------|------|---------------|-------------------------------|
| prism-coder:1b7 | 2.2 GB | 100% | 100% |
| prism-coder:4b | 2.5 GB | 100% | 100% |
| prism-coder:14b | 9.0 GB | ~97% | 99.7% |
| **prism-coder:32b** | **18 GB** | **97.3%** | **100%** |

## License

Apache 2.0

## Author

[Synalux](https://synalux.com)