Instructions to use dcostenco/prism-coder-1.7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use dcostenco/prism-coder-1.7b with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="dcostenco/prism-coder-1.7b",
	filename="prism-aac-1b7-q4km.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps Settings

llama.cpp

How to use dcostenco/prism-coder-1.7b with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf dcostenco/prism-coder-1.7b:Q8_0
# Run inference directly in the terminal:
llama-cli -hf dcostenco/prism-coder-1.7b:Q8_0

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf dcostenco/prism-coder-1.7b:Q8_0
# Run inference directly in the terminal:
llama-cli -hf dcostenco/prism-coder-1.7b:Q8_0

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf dcostenco/prism-coder-1.7b:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf dcostenco/prism-coder-1.7b:Q8_0

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf dcostenco/prism-coder-1.7b:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf dcostenco/prism-coder-1.7b:Q8_0

Use Docker

docker model run hf.co/dcostenco/prism-coder-1.7b:Q8_0

LM Studio
Jan
Ollama
How to use dcostenco/prism-coder-1.7b with Ollama:
```
ollama run hf.co/dcostenco/prism-coder-1.7b:Q8_0
```

Unsloth Studio

How to use dcostenco/prism-coder-1.7b with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for dcostenco/prism-coder-1.7b to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for dcostenco/prism-coder-1.7b to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for dcostenco/prism-coder-1.7b to start chatting

How to use dcostenco/prism-coder-1.7b with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf dcostenco/prism-coder-1.7b:Q8_0

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "dcostenco/prism-coder-1.7b:Q8_0"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use dcostenco/prism-coder-1.7b with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf dcostenco/prism-coder-1.7b:Q8_0

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default dcostenco/prism-coder-1.7b:Q8_0

Run Hermes

hermes

Docker Model Runner
How to use dcostenco/prism-coder-1.7b with Docker Model Runner:
```
docker model run hf.co/dcostenco/prism-coder-1.7b:Q8_0
```

Lemonade

How to use dcostenco/prism-coder-1.7b with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull dcostenco/prism-coder-1.7b:Q8_0

Run and chat with the model

lemonade run user.prism-coder-1.7b-Q8_0

List all available models

lemonade list

prism-coder-1.7b

File size: 3,913 Bytes

04be453
 
a586bd2
04be453
6ee010d
 
625b3be
6ee010d
a586bd2
625b3be
8e12d5c
04be453
 
625b3be
04be453
625b3be
 
8e12d5c
625b3be
8e12d5c
625b3be
8e12d5c
 
 
625b3be
 
 
 
 
 
 
 
 
 
 
a82167d
6ee010d
 
625b3be
6ee010d
8e12d5c
 
625b3be
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
a82167d
625b3be
 
 
a82167d
8e12d5c
a82167d
625b3be
 
 
 
 
a82167d
8e12d5c
6ee010d
 
8e12d5c
625b3be
6ee010d
 
625b3be

---
language: en
license: apache-2.0
tags:
  - tool-routing
  - function-calling
  - prism-coder
  - qwen3
  - gguf
  - synalux
base_model: Qwen/Qwen3-1.7B
---

# prism-coder:1b7 — 17-Tool Memory Agent (Always-Fits Tier)

Fine-tuned Qwen3-1.7B for full Prism Memory tool routing in the [Prism Coder](https://ollama.com/dcostenco/prism-coder) system.
Primary deployment: **any device** via llama.cpp GGUF — the ultra-lightweight tier.

## eval_300 Benchmark — swe43 (Current)

**300/300 × 3 shuffled runs = 100.0%, 0 flaky**

| Category | Count | Description | Accuracy |
|----------|------:|-------------|:--------:|
| natural_phrasing | 50 | Natural language → correct tool | 100% |
| adversarial_trap | 70 | Coding/CS questions → plain text (no tool) | 100% |
| disambiguation | 40 | Ambiguous session vs knowledge ops | 100% |
| edge_case | 25 | Self-description, capability queries → plain text | 100% |
| verifier | 25 | Verify-then-act chains | 100% |
| param_extraction | 25 | Extract project/query from prompt | 100% |
| cascade | 25 | Multi-step tool chains | 100% |
| multi_intent | 20 | Compound instructions | 100% |
| abstention | 20 | Greetings, math, creative requests → plain text | 100% |

300 test cases, 3 shuffled runs, temperature=0, 0 hallucinations across all runs.

## Tools

Routes to 17 Prism Memory tools + knows when NOT to call any tool:

| Tool | Trigger |
|------|---------|
| `session_load_context` | Load/resume project context, "starting fresh" |
| `session_save_ledger` | Log/record completed work |
| `session_save_handoff` | Create handoff note for next session |
| `session_search_memory` | Recall prior discussions |
| `session_forget_memory` | Delete a memory entry |
| `session_health_check` | Check session system health |
| `session_compact_ledger` | Compact/prune session ledger |
| `session_export_memory` | Export session data |
| `session_task_route` | Route task: local vs cloud |
| `session_save_experience` | Save a notable experience |
| `session_synthesize_edges` | Build session graph edges |
| `session_backfill_links` | Repair dangling session links |
| `knowledge_search` | Search stored knowledge base |
| `knowledge_forget` | Remove a knowledge entry |
| `knowledge_upvote` | Upvote knowledge entry |
| `knowledge_downvote` | Downvote knowledge entry |
| `knowledge_set_retention` | Set retention policy |

**Abstains (plain text)** for: coding questions, CS concepts, arithmetic, greetings, capability queries, creative requests, general knowledge.

## Version History

| Version | eval_300 | Notes |
|---------|---------|-------|
| swe43 | **300/300 × 3 runs = 100.0%** | Fresh rank=32 LoRA + `<think>` routing, Q8_0 GGUF |
| swe30 | 280/300 = 93.3% | Q8_0 first round (fixed Q4KM quantization erasure) |
| v43l | 203/300 = 67.7% | Baseline before SWE training |
| v42 | 100% BFCL 6-tool | Previous 6-tool routing model |

## Key Training Insights

- **Q8_0 quantization required** — Q4KM erased LoRA deltas for soft abstain patterns (87%→93% at R30)
- **Adapter saturation** — After 39 cumulative rounds at rank=8, adapter was saturated. Fresh rank=32 on R39-merged base broke plateau in one round (93.3%→99.7%)  
- **`<think>` routing blocks** — Added CoT reasoning to abstain examples activates Qwen3's pretrained thinking circuit, providing explicit gradient path for the routing decision

## Model Details

- **Base**: Qwen/Qwen3-1.7B → merged through 43 SWE training rounds
- **Format**: GGUF Q8_0 (2.2 GB)
- **Context**: 8,192 tokens
- **Final adapter**: MLX LoRA rank=32, all 28 layers, LR=3e-6→8e-7, 1,267 train rows/round
- **Total training**: 43 rounds of cumulative SFT + 4 fresh rank=32 rounds

## Usage

```bash
ollama pull dcostenco/prism-coder:1b7
ollama run dcostenco/prism-coder:1b7
```

Or via the [Synalux Prism MCP server](https://github.com/dcostenco/prism-mcp) which routes tool calls automatically.