---
license: apache-2.0
language:
- en
tags:
- tool-calling
- function-calling
- prism
- synalux
- memory-augmented
- LoRA
- Q4_K_M
base_model: Qwen/Qwen3-32B
pipeline_tag: text-generation
---
# Prism Coder 32B — Tool-Routing Model
A fine-tune of Qwen3-32B that routes user requests to the correct Prism Memory tool. It chooses among 17 tools plus a NO_TOOL abstention and is evaluated across 9 categories.
## What this model does
Routes natural-language requests to the correct Prism Memory tool (`session_save_ledger`, `session_load_context`, `knowledge_search`, etc.). This is a **classifier**: it decides which tool to call and is not a general-purpose coding or clinical assistant.
## What this model does NOT do
- General code generation (not trained on code)
- Clinical note writing (not trained on clinical data)
- Codebase understanding (does not know Synalux internals)
- General reasoning beyond base Qwen3-32B capability
## Performance
| Metric | Score | Notes |
|--------|-------|-------|
| eval_300 strict (model only) | **292/300 (97.3%)** | Model's raw accuracy |
| eval_300 strict (with post-processing) | **300/300 (100%)** | 8 cases fixed by validate_tool_call regex layer |
| 3-seed validation | 300/300 x 3 | With post-processing |
| avg latency | 1.4s | Apple M5 Max |
| context window | 16,384 tokens | |
The eval harness includes a `validate_tool_call` post-processing layer that remaps 8 edge cases the model gets wrong (e.g., "repair links" → backfill_links, "log a milestone" → save_experience). Without this layer, raw model accuracy is 97.3%.
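To make the post-processing step concrete, here is a minimal sketch of what a regex remap layer of this kind could look like. Only the two phrase-to-tool pairs ("repair links" → `backfill_links`, "log a milestone" → `save_experience`) come from this card. The function signature, the `REMAP_RULES` table, and the exact regex patterns are assumptions for illustration and are not the harness's actual code.

```python
import re

# Hypothetical remap table. The two phrase -> tool pairs are taken from the
# model card; the regex patterns themselves are illustrative guesses.
REMAP_RULES = [
    (re.compile(r"\brepair\s+links\b", re.IGNORECASE), "backfill_links"),
    (re.compile(r"\blog\s+a\s+milestone\b", re.IGNORECASE), "save_experience"),
]


def validate_tool_call(prompt: str, predicted_tool: str) -> str:
    """Return a remapped tool if a rule matches the prompt, else the model's own prediction."""
    for pattern, tool in REMAP_RULES:
        if pattern.search(prompt):
            return tool  # a known edge case: override the model
    return predicted_tool  # no rule matched: trust the model


# A prompt matching a rule is overridden; anything else passes through unchanged.
remapped = validate_tool_call("Please repair links in the graph", "knowledge_search")
untouched = validate_tool_call("What did we do yesterday?", "session_load_context")
```

A layer like this overrides the model only on prompts that match a rule, so the raw 97.3% figure remains the measure of what the model does on its own.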
## Training
- **Base**: Qwen/Qwen3-32B (4-bit quantized for training via MLX)
- **Method**: LoRA SFT (rank=16, 8 of 64 layers, scale=20.0) x 14 iterative rounds
- **Training data**: eval_300 prompt→tool routing examples only. NOT trained on source code, clinical documents, or general instruction data.
- **Quantization**: Q4_K_M via llama.cpp (18 GB)
- **Hardware**: Apple M5 Max 48 GB unified memory
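For readers unfamiliar with the `rank` and `scale` hyperparameters listed above, the following sketch shows the usual LoRA forward pass on a toy layer: the frozen weight `W` is left alone and a low-rank update `B(Ax)` is added, multiplied by `scale`. It assumes the scale multiplies the update directly. Which projections were adapted and the exact formula used in training are not stated in this card, and the tiny matrices are made up for illustration.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def lora_forward(W, A, B, x, scale=20.0):
    """Compute W x + scale * B (A x) for a single input vector x.

    W: frozen base weight, shape (d_out, d_in)
    A: trainable down-projection, shape (rank, d_in)
    B: trainable up-projection, shape (d_out, rank)
    """
    base = matvec(W, x)               # frozen base projection
    update = matvec(B, matvec(A, x))  # low-rank path through rank-sized bottleneck
    return [b + scale * u for b, u in zip(base, update)]


# Toy example with d_in = d_out = 2 and rank = 1 (the card uses rank = 16).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
x = [1.0, 2.0]

# With B initialised to zero the adapter is a no-op and the base output is returned.
y_init = lora_forward(W, A, [[0.0], [0.0]], x)

# After training B is non-zero and the scaled update shifts the output.
y_tuned = lora_forward(W, A, [[0.5], [0.0]], x)
```

Because only `A` and `B` are trained, and only on 8 of the 64 layers, the number of trainable parameters stays small relative to the 32B base model.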
## Upcoming
A stacked LoRA adapter (layers 1-16) trained on the Synalux codebase, clinical protocols, and Prism Memory internals is in progress. It is intended to add code understanding and clinical capability without affecting routing accuracy.
## Usage
```bash
ollama pull dcostenco/prism-coder:32b
```
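Once the model is pulled, it can be queried through Ollama's local HTTP API. The sketch below sends a single user message to the `/api/chat` endpoint on Ollama's default port and returns the reply text. This card does not document the system prompt or output format the router expects, so the bare user message and the helper names `build_chat_payload` and `route_request` are assumptions for illustration.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's default local endpoint
MODEL_TAG = "dcostenco/prism-coder:32b"              # tag from the pull command above


def build_chat_payload(user_request: str) -> dict:
    """Build a non-streaming chat request body for Ollama."""
    return {
        "model": MODEL_TAG,
        "messages": [{"role": "user", "content": user_request}],
        "stream": False,  # ask for one JSON object instead of a stream of chunks
    }


def route_request(user_request: str) -> str:
    """Send the request to a running Ollama server and return the model's reply text."""
    body = json.dumps(build_chat_payload(user_request)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["message"]["content"]
```

With an Ollama server running locally, `route_request("Save a summary of this session")` returns the model's raw routing output, which can then be passed through whatever post-processing the caller applies.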
## Model Family
| Model | Size | eval_300 (raw) | eval_300 (with post-processing) |
|-------|------|---------------|-------------------------------|
| prism-coder:1b7 | 2.2 GB | 100% | 100% |
| prism-coder:4b | 2.5 GB | 100% | 100% |
| prism-coder:14b | 9.0 GB | ~97% | 99.7% |
| **prism-coder:32b** | **18 GB** | **97.3%** | **100%** |
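The 32b row can be cross-checked against the counts in the Performance section: 292 of 300 cases correct from the model alone, plus the 8 cases fixed by post-processing. The helper name `accuracy_percent` is made up for this quick arithmetic check.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Convert a correct/total count into a percentage rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


total_cases = 300
raw_correct = 292            # model only, from the Performance table
fixed_by_postprocessing = 8  # cases remapped by validate_tool_call

raw_accuracy = accuracy_percent(raw_correct, total_cases)
post_accuracy = accuracy_percent(raw_correct + fixed_by_postprocessing, total_cases)

print(f"raw: {raw_accuracy}%  with post-processing: {post_accuracy}%")
```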
## License
Apache 2.0
## Author
[Synalux](https://synalux.com)