#!/usr/bin/env python3
"""
build_4b_v43_patch2.py: Second surgical patch targeting 8 specific BFCL failures.
86.9% → 100% target. Exact failures addressed:
1. knowledge_search vs session_search_memory (3 failures)
2. session_task_route for "local or cloud?" (1 failure)
3. session_delete_memory (hallucinated) → session_forget_memory (1 failure)
4. session_init (hallucinated) → session_load_context (1 failure)
5. knowledge_forget vs session_forget_memory (1 failure)
6. session_save_experience vs session_save_ledger on followup (1 failure)
7. CS abstain even when "retry/backoff" appears in prompt (1 failure)
"""
import json
import random
from pathlib import Path
random.seed(2028)
SYS_PROMPT = (
    "You are Synalux, a memory-augmented coding and clinical reasoning assistant. "
    "You have access to Prism Memory tools (session_save_ledger, session_load_context, "
    "session_search_memory, session_save_handoff, session_forget_memory, session_health_check, "
    "session_compact_ledger, session_export_memory, session_task_route, session_save_experience, "
    "session_synthesize_edges, session_backfill_links, knowledge_search, knowledge_forget, "
    "knowledge_upvote, knowledge_downvote, knowledge_set_retention, session_save_image, session_view_image) "
    "and 13 multimodal tool modules (image_gen, office, web_scraper, browser, tts, ocr, git, "
    "terminal, deps_scanner, hipaa, data_graph, templates, pdf_parser). "
    "Think step-by-step before answering. When the user references past work, prior decisions, "
    "or stored context, use the appropriate Prism Memory tool. "
    "TOOL DISTINCTION: Use knowledge_search to query the persistent knowledge base (accumulated "
    "documentation, best practices, reusable insights). Use session_search_memory to find past "
    "session work, project history, prior conversations, and what we worked on before. "
    "Use session_task_route when asked whether the local or cloud agent should handle a task. "
    "Format tool calls inside <tool_call>...</tool_call> JSON blocks with fields 'name' and 'arguments'. "
    "If no tool is needed, answer directly in plain text. "
    "ABSTAIN for general programming questions, CS concepts (algorithms, data structures, "
    "networking, design patterns, frameworks), greetings, and capability questions - even if "
    "the question mentions words like 'retry', 'session', 'memory', or 'knowledge' in a CS context."
)


def ex(user, tool_name, args):
    """Build a single-turn ChatML row whose assistant turn is one tool call."""
    args_json = json.dumps(args, ensure_ascii=False)
    return {"text": (
        f"<|im_start|>system\n{SYS_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
        f"<tool_call>\n"
        f'{{"name": "{tool_name}", "arguments": {args_json}}}\n'
        f"</tool_call>\n<|im_end|>"
    )}


def ex_abstain(user, reply):
    """Build a single-turn ChatML row where the assistant answers in plain text."""
    return {"text": (
        f"<|im_start|>system\n{SYS_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{reply}<|im_end|>"
    )}


def ex_multiturn(user, tool1, args1, tool_resp, tool2, args2):
    """Build a two-step row: tool call, tool response, then a second tool call or a reply.

    Pass tool2="NO_TOOL" to finish with the plain text in args2["reply"].
    """
    a1 = json.dumps(args1, ensure_ascii=False)
    first = f'<tool_call>\n{{"name": "{tool1}", "arguments": {a1}}}\n</tool_call>'
    if tool2 == "NO_TOOL":
        second = args2.get("reply", "Done.")
    else:
        a2 = json.dumps(args2, ensure_ascii=False)
        second = f'<tool_call>\n{{"name": "{tool2}", "arguments": {a2}}}\n</tool_call>'
    return {"text": (
        f"<|im_start|>system\n{SYS_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{first}<|im_end|>\n"
        f"<|im_start|>tool\n{tool_resp}<|im_end|>\n"
        f"<|im_start|>assistant\n{second}<|im_end|>"
    )}
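

# --- Illustrative sketch (not part of the original patch script) -------------
# The row builders in this script embed each tool call as JSON between
# "<tool_call>\n" and "\n</tool_call>". A minimal way to sanity-check a
# generated row, assuming that exact layout, is to pull the JSON back out and
# parse it. `extract_tool_calls` is a hypothetical helper name; nothing else in
# this script depends on it. The pattern requires a newline after the opening
# tag, so the inline "<tool_call>...</tool_call>" mention in the system prompt
# is not picked up.
import json
import re

_TOOL_CALL_RE = re.compile(r"<tool_call>\n(.*?)\n</tool_call>", re.DOTALL)


def extract_tool_calls(text):
    """Return the parsed {"name", "arguments"} dicts found in a ChatML row."""
    return [json.loads(block) for block in _TOOL_CALL_RE.findall(text)]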

rows = []

# =============================================================================
# FIX 1: knowledge_search vs session_search_memory (40 examples each)
# =============================================================================
# knowledge_search: the persistent knowledge base, accumulated docs, best practices, reusable insights
ks_prompts = [
    "Search our accumulated documentation for {topic}.",
    "Look up {topic} in the knowledge base.",
    "Find {topic} in our knowledge base.",
    "Search knowledge for {topic}.",
    "Query the knowledge base for {topic}.",
    "What does our knowledge base say about {topic}?",
    "Check the accumulated knowledge for {topic}.",
    "Find {topic} in our documentation knowledge.",
    "Search persisted knowledge for {topic}.",
    "Pull up knowledge base entries about {topic}.",
    "Look for {topic} in the knowledge repository.",
    "Find reusable insights about {topic}.",
    "Knowledge base search: {topic}.",
    "Find best practices for {topic} in our knowledge base.",
    "Search the knowledge store for {topic}.",
]
ks_topics = [
    "WebSocket best practices", "retry strategies", "caching patterns",
    "auth flow", "rate limiting", "database indexing", "circuit breaker pattern",
    "API versioning", "error handling strategies", "deployment checklists",
    "code review guidelines", "security best practices", "logging conventions",
    "microservice communication", "data validation patterns",
]
for i in range(40):
    topic = ks_topics[i % len(ks_topics)]
    user = ks_prompts[i % len(ks_prompts)].format(topic=topic)
    rows.append(ex(user, "knowledge_search", {"query": topic}))

# session_search_memory: past sessions, what we worked on, project history, prior decisions
ssm_prompts = [
    "What did we work on last time for {proj}?",
    "Search my session history for {topic}.",
    "Find what we discussed about {topic} in past sessions.",
    "Look up our prior work on {topic}.",
    "What have we worked on related to {topic}?",
    "Find previous decisions about {topic} in my memory.",
    "Search session memory for {topic}.",
    "What did we decide about {topic} last time?",
    "Look through our past sessions for {topic}.",
    "Find recent session work on {topic}.",
]
ssm_topics = [
    "the auth module", "the deploy pipeline", "the payment service", "database migrations",
    "the API gateway", "the caching layer", "the websocket handler", "performance optimization",
]
projs = ["portal", "analytics", "billing", "auth-service", "dashboard"]
for i in range(40):
    topic = ssm_topics[i % len(ssm_topics)]
    proj = projs[i % len(projs)]
    user = ssm_prompts[i % len(ssm_prompts)].format(topic=topic, proj=proj)
    rows.append(ex(user, "session_search_memory", {"query": topic}))
print(f"After FIX 1 (knowledge vs session search): {len(rows)} rows")
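

# --- Illustrative sketch (not part of the original patch script) -------------
# Indexing two lists with the same `i % len(...)` walks them in lockstep. When
# both lists have the same length (the 15 knowledge prompts and 15 knowledge
# topics in FIX 1, for instance), prompt k only ever meets topic k, so 40
# iterations yield 15 distinct rows. That repetition may be intentional
# upweighting. If more variety is wanted instead, one option is to shift the
# second index by one each time the first list wraps around.
# `staggered_pairs` is a hypothetical helper, not something the script uses.


def staggered_pairs(first, second, n):
    """Yield n (a, b) pairs, advancing the offset into `second` on each wrap of `first`."""
    for i in range(n):
        wraps = i // len(first)  # how many full passes over `first` so far
        yield first[i % len(first)], second[(i + wraps) % len(second)]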

# =============================================================================
# FIX 2: session_task_route (30 examples)
# =============================================================================
task_route_prompts = [
    "Should the local agent handle this {task}? If cloud, just tell me.",
    "Route this {task} - local or cloud?",
    "Should I run this {task} locally or use the cloud model?",
    "Task routing for {task}: local agent or cloud?",
    "Is this {task} suitable for the local agent?",
    "Which agent should handle this {task}: local or host?",
    "Route: should local handle this {task}?",
    "Local or cloud for {task}?",
    "Task route check: can local model do this {task}?",
    "Should I use the local model for {task} or route to cloud?",
]
tasks = [
    "TypeScript refactor", "Python debugging", "code review",
    "SQL query optimization", "React component", "security audit",
    "performance profiling", "architecture design", "bug fix",
    "unit test generation",
]
for i in range(30):
    task = tasks[i % len(tasks)]
    user = task_route_prompts[i % len(task_route_prompts)].format(task=task)
    rows.append(ex(user, "session_task_route", {"task": task}))
print(f"After FIX 2 (session_task_route): {len(rows)} rows")

# =============================================================================
# FIX 3: session_forget_memory (not session_delete_memory, which doesn't exist)
# =============================================================================
forget_prompts = [
    "Delete memory entry '{mem_id}' - it's outdated.",
    "Remove memory entry {mem_id} from my session memory.",
    "Forget memory entry ID {mem_id}.",
    "Delete specific memory {mem_id}.",
    "Clear out memory entry {mem_id} - it's wrong.",
    "Remove the memory with id {mem_id}.",
    "Erase memory entry {mem_id}.",
    "Drop memory {mem_id} from session.",
]
mem_ids = ["mem-42", "mem-007", "mem-123", "entry-99", "session-mem-5"]
for i in range(20):
    mid = mem_ids[i % len(mem_ids)]
    user = forget_prompts[i % len(forget_prompts)].format(mem_id=mid)
    rows.append(ex(user, "session_forget_memory", {"memory_id": mid}))
print(f"After FIX 3 (session_forget_memory not delete): {len(rows)} rows")

# =============================================================================
# FIX 4: session_load_context for "initialize/start/begin/setup" context
# =============================================================================
init_prompts = [
    "Initialize the session context for project {proj} at the {level} level.",
    "Start up the session context for {proj}.",
    "Begin session with context for {proj}.",
    "Set up context for {proj} project.",
    "Init session for {proj} at {level} level.",
    "Please initialize session context for {proj}.",
    "Start loading context for {proj}.",
    "Open up the context for project {proj}.",
    "Boot up context for {proj}.",
    "Set context for {proj} ({level}).",
]
levels = ["standard", "deep", "shallow", "full"]
for i in range(20):
    proj = projs[i % len(projs)]
    level = levels[i % len(levels)]
    user = init_prompts[i % len(init_prompts)].format(proj=proj, level=level)
    rows.append(ex(user, "session_load_context", {"project": proj, "level": level}))
print(f"After FIX 4 (session_load_context for 'initialize'): {len(rows)} rows")

# =============================================================================
# FIX 5: knowledge_forget vs session_forget_memory
# knowledge_forget = clear knowledge base entries (by category/project)
# session_forget_memory = clear a specific session memory entry (by ID)
# =============================================================================
kf_prompts = [
    "Clear out all old knowledge entries in the '{cat}' category for {proj}.",
    "Remove all {cat} knowledge entries for the {proj} project.",
    "Forget all knowledge about {cat} in {proj}.",
    "Delete {proj} knowledge entries tagged {cat}.",
    "Purge {cat} knowledge for {proj}.",
    "Clear the {cat} knowledge base entries for {proj}.",
    "Remove all {cat}-category knowledge from {proj}.",
    "Delete outdated knowledge in {cat} for {proj}.",
]
cats = ["testing", "deprecated", "v1", "staging", "draft", "archived"]
for i in range(20):
    cat = cats[i % len(cats)]
    proj = projs[i % len(projs)]
    user = kf_prompts[i % len(kf_prompts)].format(cat=cat, proj=proj)
    rows.append(ex(user, "knowledge_forget", {"project": proj, "category": cat}))
print(f"After FIX 5 (knowledge_forget): {len(rows)} rows")

# =============================================================================
# FIX 6: session_save_experience vs session_save_ledger on followup
# session_save_experience = record a correction/insight/learning (event_type matters)
# session_save_ledger = log a session summary/progress
# =============================================================================
# Multi-turn: load context, then log what we EXPERIENCED (correction/insight)
# Each chain is (user template, tool1, args1 builder, tool response template,
# tool2, args2 builder). The builders take (project, what, better).
load_experience_chains = [
    ("Load context for {proj} and then log that we tried {what} but should have used {better} instead.",
     "session_load_context", lambda p, w, b: {"project": p},
     '{{"project": "{proj}", "last_summary": "Working on {proj}"}}',
     "session_save_experience", lambda p, w, b: {"project": p, "event_type": "correction",
                                                  "content": f"Tried {w} but should have used {b}"}),
    ("Get {proj} context, then record the correction: used {what} when {better} was better.",
     "session_load_context", lambda p, w, b: {"project": p},
     '{{"project": "{proj}"}}',
     # Use the lambda's own parameters rather than the loop globals `what`/`better`.
     "session_save_experience", lambda p, w, b: {"project": p, "event_type": "correction",
                                                  "content": f"Used {w} instead of {b}"}),
]
whats = ["batch inserts", "polling", "mutex locks", "REST calls", "eager loading"]
betters = ["streaming writes", "webhooks", "read-write locks", "GraphQL", "lazy loading"]
for i in range(15):
    ch = load_experience_chains[i % len(load_experience_chains)]
    proj = projs[i % len(projs)]
    what = whats[i % len(whats)]
    better = betters[i % len(betters)]
    user = ch[0].format(proj=proj, what=what, better=better)
    t1 = ch[2](proj, what, better)
    resp = ch[3].format(proj=proj)
    t2 = ch[5](proj, what, better)
    rows.append(ex_multiturn(user, ch[1], t1, resp, ch[4], t2))

# Distinguish ledger (progress/session log) from experience (insight/correction)
ledger_vs_exp = [
    ("Load {proj} context, then save a session ledger entry about today's progress.",
     "session_load_context", lambda p: {"project": p},
     '{{"project": "{proj}", "last_summary": "Progress on {proj}"}}',
     "session_save_ledger", lambda p: {"project": p, "summary": "Session progress logged"}),
    ("Get {proj} context, then record that we successfully completed the migration.",
     "session_load_context", lambda p: {"project": p},
     '{{"project": "{proj}"}}',
     "session_save_ledger", lambda p: {"project": p, "summary": "Migration completed successfully"}),
]
for i in range(10):
    ch = ledger_vs_exp[i % len(ledger_vs_exp)]
    proj = projs[i % len(projs)]
    user = ch[0].format(proj=proj)
    t1 = ch[2](proj)
    resp = ch[3].format(proj=proj)
    t2 = ch[5](proj)
    rows.append(ex_multiturn(user, ch[1], t1, resp, ch[4], t2))
print(f"After FIX 6 (save_experience vs save_ledger): {len(rows)} rows")

# =============================================================================
# FIX 7: CS abstain even with "retry/backoff/session/memory/knowledge" in CS context
# =============================================================================
cs_with_prism_keywords = [
    ("How do I implement exponential backoff with jitter for API retries?",
     "Exponential backoff with jitter: `wait = random.uniform(0, base * 2**attempt)`. Cap at max_delay. This prevents thundering herd. Pure CS - no Prism tool needed."),
    ("Explain retry semantics in distributed systems.",
     "Retry semantics in distributed systems cover idempotency (safe to retry), at-least-once vs exactly-once delivery, and exponential backoff to avoid overload. This is a CS concept, not a Prism memory operation."),
    ("How does session management work in Django?",
     "Django uses a session middleware that stores a session ID in a cookie and keeps session data server-side (database, cache, or file). Access via `request.session`. Not a Prism memory operation."),
    ("What is memory management in C++?",
     "C++ memory management involves heap allocation (new/delete), stack allocation, RAII, and smart pointers (unique_ptr, shared_ptr). Not related to Prism session memory."),
    ("Explain knowledge graphs and how they work.",
     "Knowledge graphs represent entities and relationships as nodes and edges. They power semantic search and reasoning engines. This is a CS/ML concept - not a Prism knowledge_search operation."),
    ("How do I implement retry logic with exponential backoff in Python?",
     "Use the `tenacity` library or implement manually: `time.sleep(min(max_delay, base * 2**attempt) * random.uniform(0.5, 1.5))`. This is a general Python pattern, not a Prism memory task."),
    ("What is session affinity in load balancers?",
     "Session affinity (sticky sessions) routes all requests from a client to the same backend server, typically using a cookie. This is a networking/infrastructure concept, not a Prism operation."),
    ("How does Redis handle memory eviction?",
     "Redis uses LRU (least recently used), LFU (least frequently used), or TTL-based eviction policies, configured via `maxmemory-policy`. This is a Redis/CS concept, not a Prism memory tool call."),
    ("Explain the actor model for concurrent programming.",
     "The actor model treats concurrent computation as actors that communicate via message passing, with no shared state. Erlang, Akka, and Pony use this model. Pure CS concept."),
    ("What is the difference between a stack and a queue?",
     "Stack: LIFO (last in, first out) - push/pop. Queue: FIFO (first in, first out) - enqueue/dequeue. These are fundamental data structures."),
]
for user, reply in cs_with_prism_keywords:
    rows.append(ex_abstain(user, reply))
print(f"After FIX 7 (CS abstain with prism-like keywords): {len(rows)} rows")

# =============================================================================
# SHUFFLE AND WRITE
# =============================================================================
random.shuffle(rows)
out = Path("/tmp/4b_v43_patch2.jsonl")
# Rows are serialized with ensure_ascii=False, so write explicitly as UTF-8.
out.write_text("\n".join(json.dumps(r, ensure_ascii=False) for r in rows) + "\n", encoding="utf-8")
print(f"\n✅ Wrote {len(rows)} patch2 rows to {out}")
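

# --- Illustrative sketch (not part of the original patch script) -------------
# After writing, it can be useful to confirm that the file round-trips and to
# see how many rows are exact duplicates. `summarize_jsonl` is a hypothetical
# helper; it assumes one JSON object per line with a "text" field, which is the
# shape this script writes.
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path):
    """Return (total_rows, distinct_texts) for a JSONL file of {"text": ...} rows."""
    counts = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            if line.strip():  # tolerate a trailing blank line
                counts[json.loads(line)["text"]] += 1
    return sum(counts.values()), len(counts)


# Example (left commented out so the script's behaviour is unchanged):
# total, distinct = summarize_jsonl(out)
# print(f"{total} rows, {distinct} distinct")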