Instructions to use dcostenco/prism-coder-4b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use dcostenco/prism-coder-4b with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="dcostenco/prism-coder-4b",
    filename="prism-coder-4b-v43-Q4_K_M.gguf",
)
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use dcostenco/prism-coder-4b with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf dcostenco/prism-coder-4b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf dcostenco/prism-coder-4b:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf dcostenco/prism-coder-4b:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf dcostenco/prism-coder-4b:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf dcostenco/prism-coder-4b:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf dcostenco/prism-coder-4b:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf dcostenco/prism-coder-4b:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf dcostenco/prism-coder-4b:Q4_K_M
Use Docker
docker model run hf.co/dcostenco/prism-coder-4b:Q4_K_M
- LM Studio
- Jan
- vLLM
How to use dcostenco/prism-coder-4b with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "dcostenco/prism-coder-4b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "dcostenco/prism-coder-4b",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/dcostenco/prism-coder-4b:Q4_K_M
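The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server from the previous step is listening on `localhost:8000`, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

# Assumed defaults, copied from the curl example: vLLM on localhost:8000.
BASE_URL = "http://localhost:8000/v1"
MODEL = "dcostenco/prism-coder-4b"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build (without sending) a POST request for the chat completions endpoint."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the reply text. Passing `base_url="http://localhost:8080/v1"` would point the same helper at a local llama-server instead, assuming it uses that port.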
- Ollama
How to use dcostenco/prism-coder-4b with Ollama:
ollama run hf.co/dcostenco/prism-coder-4b:Q4_K_M
- Unsloth Studio
How to use dcostenco/prism-coder-4b with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for dcostenco/prism-coder-4b to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for dcostenco/prism-coder-4b to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for dcostenco/prism-coder-4b to start chatting
- Pi
How to use dcostenco/prism-coder-4b with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf dcostenco/prism-coder-4b:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "dcostenco/prism-coder-4b:Q4_K_M" }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
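If `~/.pi/agent/models.json` already lists other providers, the entry can be merged in rather than pasted over the file. The sketch below assumes the file is a JSON object with a top-level `providers` map, as in the snippet above; `merge_provider` and `update_models_file` are hypothetical helper names, not part of Pi.

```python
import json
import pathlib

# Provider entry copied from the models.json snippet above.
LLAMA_CPP_PROVIDER = {
    "baseUrl": "http://localhost:8080/v1",
    "api": "openai-completions",
    "apiKey": "none",
    "models": [{"id": "dcostenco/prism-coder-4b:Q4_K_M"}],
}


def merge_provider(config, name="llama-cpp", provider=LLAMA_CPP_PROVIDER):
    """Return a copy of `config` with `provider` stored under providers[name]."""
    merged = dict(config)
    providers = dict(merged.get("providers", {}))
    providers[name] = provider  # replaces any existing entry with the same name
    merged["providers"] = providers
    return merged


def update_models_file(path):
    """Read `path` if it exists, merge the provider, and write the file back."""
    path = pathlib.Path(path).expanduser()
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_provider(config), indent=2) + "\n")
    return path
```

Calling `update_models_file("~/.pi/agent/models.json")` would then keep any existing providers and add or refresh only the `llama-cpp` entry.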
- Hermes Agent
How to use dcostenco/prism-coder-4b with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf dcostenco/prism-coder-4b:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default dcostenco/prism-coder-4b:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use dcostenco/prism-coder-4b with Docker Model Runner:
docker model run hf.co/dcostenco/prism-coder-4b:Q4_K_M
- Lemonade
How to use dcostenco/prism-coder-4b with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull dcostenco/prism-coder-4b:Q4_K_M
Run and chat with the model
lemonade run user.prism-coder-4b-Q4_K_M
List all available models
lemonade list
#!/usr/bin/env python3
"""
Patch4 corpus for prism-coder:4b-v43.
Targets swe-bench failures not fixable by Layer 3:
- Group C: session_task_route false negatives (implicit routing questions)
- Group D: param extraction from natural/informal phrasing
Output format: {"text": "<ChatML string>"} to match existing train.jsonl
"""
import json
import pathlib

OUT = pathlib.Path("/tmp/4b_v43_patch4.jsonl")

SYS = (
    "You are Synalux, a memory-augmented coding and clinical reasoning assistant. "
    "You have access to Prism Memory tools (session_save_ledger, session_load_context, "
    "session_search_memory, session_save_handoff, session_forget_memory, session_health_check, "
    "session_compact_ledger, session_export_memory, session_task_route, session_save_experience, "
    "session_synthesize_edges, session_backfill_links, knowledge_search, knowledge_forget, "
    "knowledge_upvote, knowledge_downvote, knowledge_set_retention) and 13 multimodal tool "
    "modules (image_gen, office, web_scraper, browser, tts, ocr, git, terminal, deps_scanner, "
    "hipaa, data_graph, templates, pdf_parser). "
    "Think step-by-step before answering. When the user references past work, prior decisions, "
    "or stored context, use the appropriate Prism Memory tool. "
    "Format tool calls inside <tool_call>...</tool_call> JSON blocks with fields 'name' and 'arguments'. "
    "If no tool is needed, answer directly in plain text. "
    "ABSTAIN for general programming questions, CS concepts, greetings, and capability questions."
)

THINK_OPEN = "<|synalux_think|>"
THINK_CLOSE = "</|synalux_think|>"


def render_chatml(user_text, assistant_text):
    return (
        f"<|im_start|>system\n{SYS}<|im_end|>\n"
        f"<|im_start|>user\n{user_text}<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant_text}<|im_end|>"
    )


def msg(user, tool_name, args):
    call = json.dumps({"name": tool_name, "arguments": args})
    think = f"{THINK_OPEN}The user wants {tool_name.replace('_', ' ')}. I'll call the tool with the right parameters.{THINK_CLOSE}"
    assistant = f"{think}\n<tool_call>\n{call}\n</tool_call>"
    return {"text": render_chatml(user, assistant)}


def no_tool_msg(user, reply):
    think = f"{THINK_OPEN}This is a general programming/CS question. I should answer directly without calling any Prism tool.{THINK_CLOSE}"
    assistant = f"{think}\n{reply}"
    return {"text": render_chatml(user, assistant)}


rows = []

# -- GROUP C: session_task_route - implicit "should I/local/cloud?" questions --
task_route_cases = [
    ("Should I handle this CSS grid refactor myself or punt it to the cloud model?",
     {"task_description": "CSS grid refactor", "prefer_local": False}),
    ("Is this bug fix simple enough for the local model to tackle?",
     {"task_description": "bug fix"}),
    ("Can the on-device model handle parsing this 50MB XML dump, or should I escalate to cloud?",
     {"task_description": "parsing 50MB XML dump", "prefer_local": False}),
    ("Should I let the local model handle this TypeScript refactor or route it to the bigger model?",
     {"task_description": "TypeScript refactor", "prefer_local": False}),
    ("Is writing a full test suite for the billing module something the local model can do well?",
     {"task_description": "writing test suite for billing module"}),
    ("This SQL query optimization seems complex - local or cloud?",
     {"task_description": "SQL query optimization"}),
    ("Route this: rewriting the authentication middleware from scratch.",
     {"task_description": "rewriting authentication middleware from scratch"}),
    ("Should the local agent handle this performance profiling task?",
     {"task_description": "performance profiling task"}),
    ("Is a full-text semantic search implementation doable on-device?",
     {"task_description": "full-text semantic search implementation"}),
    ("Local or cloud for this multi-file refactor of the payments module?",
     {"task_description": "multi-file refactor of payments module", "prefer_local": False}),
    ("Can the on-device model handle a 3000-line code review?",
     {"task_description": "3000-line code review"}),
    ("Route this debugging task: tracing intermittent race conditions in async code.",
     {"task_description": "tracing intermittent race conditions in async code"}),
    ("Should I send this architecture review to the cloud model or keep it local?",
     {"task_description": "architecture review", "prefer_local": False}),
    ("Is this data pipeline migration too complex for local inference?",
     {"task_description": "data pipeline migration"}),
    ("Handle or escalate? - writing API documentation for 40 endpoints.",
     {"task_description": "writing API documentation for 40 endpoints"}),
]
for user, args in task_route_cases:
    rows.append(msg(user, "session_task_route", args))
print(f"After GROUP C (task_route): {len(rows)} rows")

# -- GROUP D1: session_export_memory - extract output_dir from casual phrasing --
export_cases = [
    ("Dump everything to /tmp/backup so I can archive it. JSON format.",
     {"output_path": "/tmp/backup", "format": "json"}),
    ("Back up all my memories to /var/exports before I wipe the session.",
     {"output_path": "/var/exports"}),
    ("Export the prism-mcp project memory to /tmp/prism-export in markdown format.",
     {"project": "prism-mcp", "output_path": "/tmp/prism-export", "format": "markdown"}),
    ("I want to export a backup and then compact the old entries - save it to /tmp/mem-backup.",
     {"output_path": "/tmp/mem-backup"}),
    ("Export everything from the billing project to /tmp/billing-backup.json.",
     {"project": "billing", "output_path": "/tmp/billing-backup.json"}),
    ("Can you dump the analytics project sessions to /home/user/exports/analytics?",
     {"project": "analytics", "output_path": "/home/user/exports/analytics"}),
    ("Save a memory backup to /tmp/export before we start the migration.",
     {"output_path": "/tmp/export"}),
    ("Archive the prism-training project to /tmp/training-archive in JSON.",
     {"project": "prism-training", "output_path": "/tmp/training-archive", "format": "json"}),
]
for user, args in export_cases:
    rows.append(msg(user, "session_export_memory", args))
print(f"After GROUP D1 (export): {len(rows)} rows")

# -- GROUP D2: session_forget_memory - extract memory_id from prompt --
forget_cases = [
    ("Delete the specific memory entry with ID mem-abc-123.",
     {"memory_id": "mem-abc-123"}),
    ("Remove memory entry 7f3a-bc21-d4e5 - it's completely wrong.",
     {"memory_id": "7f3a-bc21-d4e5"}),
    ("Get rid of entry ID: ent-2024-991. It was saved by mistake.",
     {"memory_id": "ent-2024-991"}),
    ("Forget memory ID mem-portal-007.",
     {"memory_id": "mem-portal-007"}),
    ("Delete that specific memory entry with ID 'sess-42-bad'.",
     {"memory_id": "sess-42-bad"}),
    ("The entry tagged as 'exp-001' is outdated. Please remove it.",
     {"memory_id": "exp-001"}),
    ("Wipe memory entry ID = abc123xyz.",
     {"memory_id": "abc123xyz"}),
    ("Remove the wrong entry - its ID is MEM-2025-0042.",
     {"memory_id": "MEM-2025-0042"}),
    ("That entry we saved earlier about the broken migration config - the ID is err-cfg-88, delete it.",
     {"memory_id": "err-cfg-88"}),
    ("Clear out memory ID: old-deploy-notes-19.",
     {"memory_id": "old-deploy-notes-19"}),
]
for user, args in forget_cases:
    rows.append(msg(user, "session_forget_memory", args))
print(f"After GROUP D2 (forget+id): {len(rows)} rows")

# -- GROUP D3: session_task_route - extract task_description from implicit phrasing --
task_route_desc_cases = [
    ("Route this refactoring task - if local, proceed; if cloud, just tell me.",
     {"task_description": "refactoring task"}),
    ("Is the on-device model good enough for a full GraphQL schema migration?",
     {"task_description": "GraphQL schema migration"}),
    ("This linting fix across 200 files - local or cloud?",
     {"task_description": "linting fix across 200 files"}),
    ("Route: generating embeddings for 10k documents.",
     {"task_description": "generating embeddings for 10k documents"}),
    ("Should the local model handle writing unit tests for the auth module?",
     {"task_description": "writing unit tests for auth module"}),
]
for user, args in task_route_desc_cases:
    rows.append(msg(user, "session_task_route", args))
print(f"After GROUP D3 (task_route desc): {len(rows)} rows")

# -- GROUP D4: session_save_ledger - extract project + summary from natural phrasing --
ledger_natural_cases = [
    ("We just finished a big refactor of the billing module. Make sure it's written down. Project: billing, conv: conv-bill-2024.",
     {"project": "billing", "conversation_id": "conv-bill-2024", "summary": "Completed big refactor of billing module"}),
    ("Before I hand off, save what we did today: fixed the OAuth flow and updated tests. Project is portal.",
     {"project": "portal", "summary": "Fixed OAuth flow and updated tests"}),
    ("We're done for the day. Log what we accomplished on the analytics project.",
     {"project": "analytics", "summary": "Session progress logged"}),
    ("Jot this down for the prism-training project: we built and validated the v43 corpus.",
     {"project": "prism-training", "summary": "Built and validated v43 corpus"}),
    ("Log session for ios-app project: completed llama.cpp Swift integration.",
     {"project": "ios-app", "summary": "Completed llama.cpp Swift integration"}),
    ("Write it down - we migrated auth to OAuth2 in the portal project. conv-id: sess-99.",
     {"project": "portal", "conversation_id": "sess-99", "summary": "Migrated auth to OAuth2"}),
    ("End of session for prism-mcp. We deployed v3 MCP server and ran smoke tests.",
     {"project": "prism-mcp", "summary": "Deployed v3 MCP server and ran smoke tests"}),
    ("Record this session: we fixed the race condition in the payments module. project=payments.",
     {"project": "payments", "summary": "Fixed race condition in payments module"}),
]
for user, args in ledger_natural_cases:
    rows.append(msg(user, "session_save_ledger", args))
print(f"After GROUP D4 (ledger natural): {len(rows)} rows")

# -- GROUP D5: backfill_links + synthesize_edges - verifier distinctions --
verifier_cases = [
    ("Backfill the missing cross-session links for the analytics project.",
     "session_backfill_links", {"project": "analytics"}),
    ("Reconnect the dangling session references for the billing project.",
     "session_backfill_links", {"project": "billing"}),
    ("Patch up the link gaps in our session history for prism-training.",
     "session_backfill_links", {"project": "prism-training"}),
    ("Fix the broken links between sessions in the portal project.",
     "session_backfill_links", {"project": "portal"}),
    ("There are dangling references in ios-app sessions. Backfill them.",
     "session_backfill_links", {"project": "ios-app"}),
    ("Reconnect missing session cross-references for prism-mcp.",
     "session_backfill_links", {"project": "prism-mcp"}),
    ("Run a synthesis pass on the prism-mcp project to make sure all edges are up to date.",
     "session_synthesize_edges", {"project": "prism-mcp"}),
    ("Before we close out, verify all the session links are consistent for the portal project.",
     "session_synthesize_edges", {"project": "portal"}),
    ("Synthesize the session graph for analytics to build fresh edge connections.",
     "session_synthesize_edges", {"project": "analytics"}),
    ("Verify graph integrity - synthesize edges for the ios-app project.",
     "session_synthesize_edges", {"project": "ios-app"}),
    ("Do a graph synthesis on the billing sessions to update the relationship map.",
     "session_synthesize_edges", {"project": "billing"}),
    ("Build the session edge graph for prism-training.",
     "session_synthesize_edges", {"project": "prism-training"}),
]
for user, tool, args in verifier_cases:
    rows.append(msg(user, tool, args))
print(f"After GROUP D5 (verifier): {len(rows)} rows")

# -- GROUP D6: session_search_memory - "remind me / did we ever decide" phrasing --
remind_search_cases = [
    ("Remind me - did we ever decide between Redis and Memcached for caching?",
     {"query": "Redis vs Memcached caching decision"}),
    ("What did we decide about the database schema for the billing module?",
     {"query": "database schema billing module decision"}),
    ("Did we settle on which logging library to use for the portal?",
     {"query": "logging library portal decision"}),
    ("Remind me what we concluded about microservices vs monolith.",
     {"query": "microservices vs monolith conclusion"}),
    ("What was our decision on the auth token expiry?",
     {"query": "auth token expiry decision"}),
    ("Did we ever document our stance on using TypeScript strict mode?",
     {"query": "TypeScript strict mode decision"}),
    ("Can you remind me what we figured out about the WebSocket reconnection strategy?",
     {"query": "WebSocket reconnection strategy"}),
    ("What did we decide to do about the rate limiting approach?",
     {"query": "rate limiting approach decision"}),
    ("Remind me of our agreed-on approach for handling API errors.",
     {"query": "API error handling approach"}),
    ("Did we ever land on a pagination strategy for the search API?",
     {"query": "pagination strategy search API"}),
]
for user, args in remind_search_cases:
    rows.append(msg(user, "session_search_memory", args))
print(f"After GROUP D6 (remind/search): {len(rows)} rows")

# Write output
with OUT.open("w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")
print(f"\nWrote {len(rows)} patch4 rows to {OUT}")

# Verify format
with OUT.open() as f:
    first = json.loads(f.readline())
print(f"First row keys: {list(first.keys())}")
print(f"Text starts with: {first['text'][:50]}")
print(f"Think close tag present: {'</|synalux_think|>' in first['text']}")
print(f"Bad think close absent: {'</{' not in first['text']}")
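The format check at the end of the script only inspects the first row. A fuller check could parse each row back into its parts. The sketch below is one way to do that, assuming the row layout produced by `render_chatml`, `msg`, and `no_tool_msg` above; `parse_row` is a hypothetical helper, not part of the original script.

```python
import json
import re

THINK_OPEN = "<|synalux_think|>"
THINK_CLOSE = "</|synalux_think|>"

# Only the assistant turn is searched, so the literal "<tool_call>...</tool_call>"
# text inside the system prompt is never mistaken for a real call.
ASSISTANT_RE = re.compile(r"<\|im_start\|>assistant\n(.*?)<\|im_end\|>", re.DOTALL)
TOOL_CALL_RE = re.compile(r"<tool_call>\n(.*?)\n</tool_call>", re.DOTALL)


def parse_row(row):
    """Split one {"text": ...} row into think text, tool call (or None), and reply."""
    match = ASSISTANT_RE.search(row["text"])
    if match is None:
        raise ValueError("no assistant turn found")
    assistant = match.group(1)
    if not assistant.startswith(THINK_OPEN) or THINK_CLOSE not in assistant:
        raise ValueError("assistant turn lacks a think block")
    think, rest = assistant[len(THINK_OPEN):].split(THINK_CLOSE, 1)
    call = TOOL_CALL_RE.search(rest)
    tool_call = json.loads(call.group(1)) if call else None
    reply = None if call else rest.strip()
    return {"think": think, "tool_call": tool_call, "reply": reply}
```

Running `parse_row(json.loads(line))` over each line of `/tmp/4b_v43_patch4.jsonl` would raise on a malformed row and otherwise yield the tool name and arguments, which can be tallied per group.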