Instructions to use dcostenco/prism-coder-1.7b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use dcostenco/prism-coder-1.7b with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="dcostenco/prism-coder-1.7b", filename="prism-aac-1b7-q4km.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- llama.cpp
How to use dcostenco/prism-coder-1.7b with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf dcostenco/prism-coder-1.7b:Q8_0 # Run inference directly in the terminal: llama-cli -hf dcostenco/prism-coder-1.7b:Q8_0
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf dcostenco/prism-coder-1.7b:Q8_0 # Run inference directly in the terminal: llama-cli -hf dcostenco/prism-coder-1.7b:Q8_0
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf dcostenco/prism-coder-1.7b:Q8_0 # Run inference directly in the terminal: ./llama-cli -hf dcostenco/prism-coder-1.7b:Q8_0
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf dcostenco/prism-coder-1.7b:Q8_0 # Run inference directly in the terminal: ./build/bin/llama-cli -hf dcostenco/prism-coder-1.7b:Q8_0
Use Docker
docker model run hf.co/dcostenco/prism-coder-1.7b:Q8_0
- LM Studio
- Jan
- Ollama
How to use dcostenco/prism-coder-1.7b with Ollama:
ollama run hf.co/dcostenco/prism-coder-1.7b:Q8_0
- Unsloth Studio
How to use dcostenco/prism-coder-1.7b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for dcostenco/prism-coder-1.7b to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for dcostenco/prism-coder-1.7b to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for dcostenco/prism-coder-1.7b to start chatting
- Pi
How to use dcostenco/prism-coder-1.7b with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf dcostenco/prism-coder-1.7b:Q8_0
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "dcostenco/prism-coder-1.7b:Q8_0" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use dcostenco/prism-coder-1.7b with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf dcostenco/prism-coder-1.7b:Q8_0
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default dcostenco/prism-coder-1.7b:Q8_0
Run Hermes
hermes
- Docker Model Runner
How to use dcostenco/prism-coder-1.7b with Docker Model Runner:
docker model run hf.co/dcostenco/prism-coder-1.7b:Q8_0
- Lemonade
How to use dcostenco/prism-coder-1.7b with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull dcostenco/prism-coder-1.7b:Q8_0
Run and chat with the model
lemonade run user.prism-coder-1.7b-Q8_0
List all available models
lemonade list
File size: 3,913 Bytes
04be453 a586bd2 04be453 6ee010d 625b3be 6ee010d a586bd2 625b3be 8e12d5c 04be453 625b3be 04be453 625b3be 8e12d5c 625b3be 8e12d5c 625b3be 8e12d5c 625b3be a82167d 6ee010d 625b3be 6ee010d 8e12d5c 625b3be a82167d 625b3be a82167d 8e12d5c a82167d 625b3be a82167d 8e12d5c 6ee010d 8e12d5c 625b3be 6ee010d 625b3be | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 | ---
language: en
license: apache-2.0
tags:
- tool-routing
- function-calling
- prism-coder
- qwen3
- gguf
- synalux
base_model: Qwen/Qwen3-1.7B
---
# prism-coder:1b7 β 17-Tool Memory Agent (Always-Fits Tier)
Fine-tuned Qwen3-1.7B for full Prism Memory tool routing in the [Prism Coder](https://ollama.com/dcostenco/prism-coder) system.
Primary deployment: **any device** via llama.cpp GGUF β the ultra-lightweight tier.
## eval_300 Benchmark β swe43 (Current)
**300/300 Γ 3 shuffled runs = 100.0%, 0 flaky**
| Category | Count | Description | Accuracy |
|----------|------:|-------------|:--------:|
| natural_phrasing | 50 | Natural language β correct tool | 100% |
| adversarial_trap | 70 | Coding/CS questions β plain text (no tool) | 100% |
| disambiguation | 40 | Ambiguous session vs knowledge ops | 100% |
| edge_case | 25 | Self-description, capability queries β plain text | 100% |
| verifier | 25 | Verify-then-act chains | 100% |
| param_extraction | 25 | Extract project/query from prompt | 100% |
| cascade | 25 | Multi-step tool chains | 100% |
| multi_intent | 20 | Compound instructions | 100% |
| abstention | 20 | Greetings, math, creative requests β plain text | 100% |
300 test cases, 3 shuffled runs, temperature=0, 0 hallucinations across all runs.
## Tools
Routes to 17 Prism Memory tools + knows when NOT to call any tool:
| Tool | Trigger |
|------|---------|
| `session_load_context` | Load/resume project context, "starting fresh" |
| `session_save_ledger` | Log/record completed work |
| `session_save_handoff` | Create handoff note for next session |
| `session_search_memory` | Recall prior discussions |
| `session_forget_memory` | Delete a memory entry |
| `session_health_check` | Check session system health |
| `session_compact_ledger` | Compact/prune session ledger |
| `session_export_memory` | Export session data |
| `session_task_route` | Route task: local vs cloud |
| `session_save_experience` | Save a notable experience |
| `session_synthesize_edges` | Build session graph edges |
| `session_backfill_links` | Repair dangling session links |
| `knowledge_search` | Search stored knowledge base |
| `knowledge_forget` | Remove a knowledge entry |
| `knowledge_upvote` | Upvote knowledge entry |
| `knowledge_downvote` | Downvote knowledge entry |
| `knowledge_set_retention` | Set retention policy |
**Abstains (plain text)** for: coding questions, CS concepts, arithmetic, greetings, capability queries, creative requests, general knowledge.
## Version History
| Version | eval_300 | Notes |
|---------|---------|-------|
| swe43 | **300/300 Γ 3 runs = 100.0%** | Fresh rank=32 LoRA + `<think>` routing, Q8_0 GGUF |
| swe30 | 280/300 = 93.3% | Q8_0 first round (fixed Q4KM quantization erasure) |
| v43l | 203/300 = 67.7% | Baseline before SWE training |
| v42 | 100% BFCL 6-tool | Previous 6-tool routing model |
## Key Training Insights
- **Q8_0 quantization required** β Q4KM erased LoRA deltas for soft abstain patterns (87%β93% at R30)
- **Adapter saturation** β After 39 cumulative rounds at rank=8, adapter was saturated. Fresh rank=32 on R39-merged base broke plateau in one round (93.3%β99.7%)
- **`<think>` routing blocks** β Added CoT reasoning to abstain examples activates Qwen3's pretrained thinking circuit, providing explicit gradient path for the routing decision
## Model Details
- **Base**: Qwen/Qwen3-1.7B β merged through 43 SWE training rounds
- **Format**: GGUF Q8_0 (2.2 GB)
- **Context**: 8,192 tokens
- **Final adapter**: MLX LoRA rank=32, all 28 layers, LR=3e-6β8e-7, 1,267 train rows/round
- **Total training**: 43 rounds of cumulative SFT + 4 fresh rank=32 rounds
## Usage
```bash
ollama pull dcostenco/prism-coder:1b7
ollama run dcostenco/prism-coder:1b7
```
Or via the [Synalux Prism MCP server](https://github.com/dcostenco/prism-mcp) which routes tool calls automatically.
|