#!/usr/bin/env python3
"""
build_4b_v43_swe_patch.py — Surgical SWE-bench patch for prism-coder:4b-v43.

Target: 65% strict → ≥90% strict on swe_bench_test.py

Failure categories (24 total: 14 fail/wrong + 10 partial):
  1. false_positive ×4: CS questions that mention "save/search/export/route"
     in PROGRAMMING context → must abstain, NOT call Prism tools
  2. session_task_route ×3: "handle myself or punt to local/cloud model?" → task_route
  3. save_ledger vs save_experience ×1: "jot down what we accomplished" → save_ledger
  4. search_memory vs load_context ×1: "remind me, did we decide X?" → search_memory
  5. verifier tools ×3: synthesize_edges vs backfill_links vs health_check
  6. knowledge_forget vs compact_ledger ×1: "wipe old entries from project" → knowledge_forget
  7. partial passes (missing params) ×10: save_ledger needs content, forget needs id,
     task_route needs task_description, export needs output_dir
"""
import json
import random
from pathlib import Path

random.seed(2031)

SYS_PROMPT = (
    "You are Synalux, a memory-augmented coding and clinical reasoning assistant. "
    "You have access to Prism Memory tools (session_save_ledger, session_load_context, "
    "session_search_memory, session_save_handoff, session_forget_memory, session_health_check, "
    "session_compact_ledger, session_export_memory, session_task_route, session_save_experience, "
    "session_synthesize_edges, session_backfill_links, knowledge_search, knowledge_forget, "
    "knowledge_upvote, knowledge_downvote, knowledge_set_retention, session_save_image, session_view_image) "
    "and 13 multimodal tool modules (image_gen, office, web_scraper, browser, tts, ocr, git, "
    "terminal, deps_scanner, hipaa, data_graph, templates, pdf_parser). "
    "TOOL DISTINCTION: "
    "knowledge_search = query the PERSISTENT KNOWLEDGE BASE (accumulated docs, best practices, reusable insights, documentation). "
    "session_search_memory = find PAST SESSION WORK (what we coded, prior conversations, project history). "
    "knowledge_forget = delete entries FROM THE KNOWLEDGE BASE (by category or project). "
    "session_forget_memory = delete a SPECIFIC SESSION MEMORY ENTRY by ID. "
    "session_save_experience = record a specific EVENT (milestone, correction, insight, learning) with event_type field. "
    "session_save_ledger = save the current session PROGRESS SUMMARY. "
    "session_task_route = decide whether local or cloud agent handles a task. "
    "session_synthesize_edges = rebuild semantic links between session nodes (consistency check). "
    "session_backfill_links = fill in missing references / reconnect dangling session links. "
    "session_health_check = verify the Prism memory database is running and healthy. "
    "Format tool calls inside <tool_call>...</tool_call> JSON blocks. "
    "ABSTAIN (plain text, no tool) for: general CS/programming questions, algorithms, data structures, "
    "networking concepts, frameworks, design patterns — even if the prompt mentions 'save', 'search', "
    "'export', 'route', 'memory', 'session', 'knowledge', or 'experience' in a PROGRAMMING context."
)


def ex(user, tool_name, args):
    """Build one ChatML training row whose assistant turn is a single tool call."""
    a = json.dumps(args, ensure_ascii=False)
    return {"text": (
        f"<|im_start|>system\n{SYS_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
        f"<tool_call>\n{{\"name\": \"{tool_name}\", \"arguments\": {a}}}\n</tool_call>\n<|im_end|>"
    )}


def ex_abstain(user, reply):
    """Build one ChatML training row whose assistant turn is plain text (no tool call)."""
    return {"text": (
        f"<|im_start|>system\n{SYS_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n{reply}<|im_end|>"
    )}


rows = []
projs = ["portal", "analytics", "billing", "auth-service", "dashboard", "prism-training", "api-gateway", "synalux"]

# =============================================================================
# FIX 1: FALSE POSITIVES — CS/programming questions MUST NOT trigger Prism tools
# Exact failures:
#   #18 "What's the best way to save user preferences in a React Native app?"
#   #19 "Write a function that searches through a knowledge graph using BFS."
#   #23 "How do I export data from PostgreSQL to a CSV file?"
#   #30 "How do I route tasks in Celery to different queues?"
# =============================================================================
fp_cs = [
    # "save" in programming context
    ("What's the best way to save user preferences in a React Native app?",
     "Use AsyncStorage for simple key-value preferences, or expo-secure-store for sensitive data. "
     "For complex state, consider Zustand + MMKV or Redux Persist. AsyncStorage is async-only; "
     "wrap in useEffect or a custom hook. This is a React Native question, not a Prism session operation."),
    ("How do I save application state in Redux between browser refreshes?",
     "Use redux-persist with localStorage as the storage engine. Configure it in your store setup "
     "with `persistReducer` and `persistStore`. Blacklist large/sensitive slices you don't want persisted."),
    ("How do I save a pandas DataFrame to CSV?",
     "Use `df.to_csv('output.csv', index=False)`. Add `encoding='utf-8'` for Unicode safety. "
     "For large files, write in chunks with `chunksize` param. This is a pandas operation, not Prism."),
    ("Best practice for saving API responses to a local cache?",
     "Use an LRU cache keyed by request hash, with a TTL. For HTTP: use ETags + If-None-Match. "
     "Libraries: requests-cache, aiohttp-client-cache. Store in SQLite for persistence across restarts."),
    ("How do I save form data before the user navigates away in React?",
     "Use `beforeunload` event listener + localStorage, or React Router's `useBlocker` hook "
     "to prompt the user. For autosave, debounce writes to localStorage on every change."),
    ("What's the best approach to save large binary files in S3?",
     "Use S3 multipart upload for files > 100MB. The AWS SDK `upload()` method handles this "
     "automatically. Set appropriate Content-Type and use server-side encryption (SSE-S3 or SSE-KMS)."),
    ("How can I save config values in a Python microservice?",
     "Use python-decouple or pydantic-settings to load from env vars or .env files. "
     "Never hardcode secrets. Use AWS SSM Parameter Store or HashiCorp Vault for production secrets."),
    ("Save game state in Unity between sessions?",
     "Use PlayerPrefs for small key-value data, or serialize to JSON and write to Application.persistentDataPath "
     "for larger save files. For save slots, use multiple files keyed by slot index."),
    # "search" in programming context
    ("Write a function that searches through a knowledge graph using BFS.",
     "```python\n"
     "from collections import deque\n"
     "def bfs_search(graph, start, target):\n"
     "    visited = set()\n"
     "    queue = deque([start])\n"
     "    while queue:\n"
     "        node = queue.popleft()\n"
     "        if node == target:\n"
     "            return True\n"
     "        visited.add(node)\n"
     "        for neighbor in graph.get(node, []):\n"
     "            if neighbor not in visited:\n"
     "                queue.append(neighbor)\n"
     "    return False\n"
     "```\n"
     "This is a graph algorithm — no Prism tool needed."),
    ("How do I implement full-text search in PostgreSQL?",
     "Use `tsvector` + `tsquery`. Add a GIN index: `CREATE INDEX ON docs USING gin(to_tsvector('english', body))`. "
     "Query: `SELECT * FROM docs WHERE to_tsvector('english', body) @@ plainto_tsquery('your terms')`. "
     "Use pg_trgm for fuzzy matching."),
    ("How do I search for a string in nested JSON in Python?",
     "Recursively traverse the JSON structure: check dicts and lists, descending into nested objects. "
     "For large JSON, use `jmespath` or `jsonpath-ng` for path-based queries."),
    ("What's the fastest way to search for duplicate rows in a database?",
     "Use a GROUP BY + HAVING COUNT(*) > 1 query on the key columns. "
     "Add a CTE to find the duplicates, then decide on a deduplication strategy (keep min/max id)."),
    ("Implement binary search on a sorted array.",
     "```python\n"
     "def binary_search(arr, target):\n"
     "    lo, hi = 0, len(arr) - 1\n"
     "    while lo <= hi:\n"
     "        mid = (lo + hi) // 2\n"
     "        if arr[mid] == target: return mid\n"
     "        elif arr[mid] < target: lo = mid + 1\n"
     "        else: hi = mid - 1\n"
     "    return -1\n"
     "```"),
    # "export" in programming context
    ("How do I export data from PostgreSQL to a CSV file?",
     "Use COPY: `COPY table_name TO '/tmp/output.csv' CSV HEADER;` (superuser). "
     "Or from client: `psql -c '\\COPY table TO STDOUT CSV HEADER' > output.csv`. "
     "For filtered exports: `COPY (SELECT * FROM table WHERE ...) TO '/tmp/out.csv' CSV HEADER;`. "
     "This is a PostgreSQL operation, not a Prism session export."),
    ("How do I export a Python dict to JSON?",
     "Use `json.dumps(d, indent=2)` for string output, or `json.dump(d, file_obj)` for file writing. "
     "For custom types, provide a `default` function. For datetime: `default=str` works for simple cases."),
    ("Best way to export data from Django ORM to Excel?",
     "Use openpyxl or xlsxwriter. Create a Workbook, write column headers, iterate queryset rows. "
     "For large datasets, stream with `StreamingHttpResponse` and a generator to avoid memory issues."),
    ("How do I export environment variables from a shell script?",
     "Use `export VAR=value` to make vars available to child processes. "
     "To export all vars from a file: `set -a && source .env && set +a`. "
     "Or `export $(grep -v '^#' .env | xargs)` for selective export."),
    ("Export a TypeScript interface to a separate file?",
     "Create a dedicated `types.ts` file and export: `export interface MyType { ... }`. "
     "Import where needed: `import type { MyType } from './types'`. "
     "Use `export type { MyType }` in barrel files for re-exporting."),
    # "route" in programming context
    ("How do I route tasks in Celery to different queues?",
     "Define named queues in your `CELERY_TASK_ROUTES` or `task_routes` config: "
     "`{'myapp.tasks.heavy': {'queue': 'heavy'}, 'myapp.tasks.fast': {'queue': 'fast'}}`. "
     "Start workers per queue: `celery -A app worker -Q heavy`. "
     "This is a Celery configuration question, not a Prism task routing operation."),
    ("How do I set up route-based code splitting in React Router?",
     "Use `React.lazy()` + `Suspense` with dynamic imports: "
     "`const Page = React.lazy(() => import('./Page'))`. "
     "Wrap routes in `<Suspense fallback={<Spinner/>}>`. "
     "For v6, use the `lazy` route option in `createBrowserRouter`."),
    ("How does Express.js route middleware work?",
     "Express routes are matched in order. Middleware functions receive `(req, res, next)`. "
     "Call `next()` to pass to the next handler. Use `router.use()` for path-prefix middleware. "
     "Route params via `:param` syntax, accessed as `req.params.param`."),
    ("How do I route HTTP traffic between microservices in Kubernetes?",
     "Use a Kubernetes Service of type ClusterIP for internal routing. "
     "Add an Ingress controller (nginx/traefik) for external traffic. "
     "Service mesh (Istio/Linkerd) handles advanced routing: canary, retries, circuit breaking."),
    ("Implement a simple URL router in Python.",
     "```python\n"
     "from urllib.parse import urlparse\n"
     "routes = {}\n"
     "def route(path): return lambda f: routes.update({path: f}) or f\n"
     "@route('/home')\n"
     "def home(): return 'Home page'\n"
     "def dispatch(url):\n"
     "    path = urlparse(url).path\n"
     "    return routes.get(path, lambda: '404')()\n"
     "```"),
]
for user, reply in fp_cs:
    rows.append(ex_abstain(user, reply))

# =============================================================================
# FIX 2: session_task_route — routing decisions ("handle myself or punt to model?")
# Exact failures:
#   #10 "Should I handle this CSS grid refactor myself or punt it to the local model?" → NO_TOOL (wrong)
#   #15 "Is this bug fix simple enough for the local model to handle?" → health_check (wrong)
# Also targets #63, #65 partial passes (missing task_description)
# =============================================================================
task_types = [
    "CSS grid refactor",
    "Python script for parsing CSV files",
    "database migration script",
    "TypeScript type refactor",
    "unit test generation",
    "API endpoint documentation",
    "regex pattern for email validation",
    "SQL query optimization",
    "React component extraction",
    "shell script for log rotation",
    "Dockerfile optimization",
    "OpenAPI schema update",
    "auth middleware implementation",
    "error handling refactor",
    "test fixture setup",
]
route_q_patterns = [
    "Should I handle this {task} myself or punt it to the local model?",
    "Is this {task} simple enough for the local model to handle?",
    "Route this {task} — local or cloud?",
    "Can the small model handle this {task}, or does it need the big one?",
    "Which agent should handle this {task}?",
    "Is the local model good enough for this {task}?",
    "Should the cloud model handle this {task} instead?",
    "Decide: local or remote for this {task}.",
    "What's your recommendation — local vs cloud for this {task}?",
    "Route this task: {task}.",
]
for tt in task_types:
    q = random.choice(route_q_patterns).format(task=tt)
    rows.append(ex(q, "session_task_route", {"task_description": tt}))

# Extra variations from exact failing prompts
rows.append(ex("Should I handle this CSS grid refactor myself or punt it to the local model?",
               "session_task_route", {"task_description": "CSS grid refactor"}))
rows.append(ex("Is this bug fix simple enough for the local model to handle?",
               "session_task_route", {"task_description": "bug fix"}))
rows.append(ex("Route this refactoring task — if local, proceed; if cloud, just tell me.",
               "session_task_route", {"task_description": "code refactoring"}))
rows.append(ex("Should I handle this logging refactor locally or escalate to the cloud model?",
               "session_task_route", {"task_description": "logging refactor"}))
rows.append(ex("Is writing this migration script something the 1.7B can do?",
               "session_task_route", {"task_description": "migration script writing"}))

# =============================================================================
# FIX 3: save_ledger vs save_experience
# Failure: #2 "Can you jot down what we accomplished?" → save_experience (wrong)
# Rule: "jot down / write it down / note what we did / progress summary" = save_ledger
#       save_experience = specific EVENT (milestone achieved, correction made, insight)
# =============================================================================
ledger_phrases = [
    "Can you jot down what we accomplished? We rewrote the webhook handler and fixed 3 edge cases.",
    "Write down what we did today — refactored the auth module and added rate limiting.",
    "Note our progress: fixed the memory leak and deployed the hotfix to staging.",
    "Log what we accomplished this session — migrated 5 tables and wrote tests for all of them.",
    "Document today's work: resolved the race condition and updated the API docs.",
    "Capture our progress so far: the CSV parser is working and tests are green.",
    "Record what we did: shipped the billing integration and fixed 2 edge cases.",
    "Save a summary of today's work — we got the OAuth flow working end to end.",
    "Write this down: finished the TypeScript migration and cleaned up dead imports.",
    "Please note what we accomplished — added retry logic and improved error messages.",
    "Jot this down for later: we completed the database indexing work, reduced query time by 40%.",
    "Keep track of what we did: refactored the queue processor and added DLQ support.",
]
for i, phrase in enumerate(ledger_phrases):
    proj = projs[i % len(projs)]
    rows.append(ex(phrase, "session_save_ledger",
                   {"project": proj, "content": phrase.split("—")[-1].strip() if "—" in phrase else phrase}))

# save_experience is for specific milestones/corrections (NOT generic "log what we did")
rows.append(ex("Log that we achieved 100% test coverage on the auth module — big milestone!",
               "session_save_experience", {"event_type": "milestone",
                                           "content": "100% test coverage on auth module"}))
rows.append(ex("Record that we deployed v2.3.0 to production successfully.",
               "session_save_experience", {"event_type": "milestone",
                                           "content": "Deployed v2.3.0 to production"}))
rows.append(ex("Save the insight that our caching strategy was wrong — TTL should be per-user not global.",
               "session_save_experience", {"event_type": "correction",
                                           "content": "Caching TTL should be per-user, not global"}))

# =============================================================================
# FIX 4: search_memory vs load_context
# Failure: #4 "Remind me — did we ever decide between Redis and Memcached?" → load_context (wrong)
# Rule:
#   search_memory = recall a SPECIFIC PAST DECISION or DISCUSSION ("remind me", "did we decide", "what did we say")
#   load_context = load full project context for a named project ("load/pull up everything for project X")
# =============================================================================
search_q = [
    ("Remind me — did we ever decide between Redis and Memcached for the session store?",
     "session_search_memory", {"query": "Redis vs Memcached session store decision"}),
    ("What did we decide about the database schema for user preferences?",
     "session_search_memory", {"query": "database schema for user preferences decision"}),
    ("Did we ever agree on a naming convention for our API endpoints?",
     "session_search_memory", {"query": "API endpoint naming convention"}),
    ("What was the conclusion we reached about error handling strategy?",
     "session_search_memory", {"query": "error handling strategy conclusion"}),
    ("Remind me what we said about the authentication flow last session.",
     "session_search_memory", {"query": "authentication flow discussion"}),
    ("Did we discuss how to handle the rate limiting logic?",
     "session_search_memory", {"query": "rate limiting logic discussion"}),
    ("What did we decide about the deployment pipeline — GitHub Actions or CircleCI?",
     "session_search_memory", {"query": "deployment pipeline GitHub Actions vs CircleCI"}),
    ("Recall our conversation about the caching strategy.",
     "session_search_memory", {"query": "caching strategy"}),
    ("What was our plan for the mobile push notifications?",
     "session_search_memory", {"query": "mobile push notifications plan"}),
    ("Did we ever talk about migrating off Heroku?",
     "session_search_memory", {"query": "migrating off Heroku"}),
]
load_q = [
    ("Load the portal project context.",
     "session_load_context", {"project": "portal"}),
    ("Pull up everything we had on the billing project.",
     "session_load_context", {"project": "billing"}),
    ("Fetch context for the auth-service project.",
     "session_load_context", {"project": "auth-service"}),
    ("Resume the analytics project.",
     "session_load_context", {"project": "analytics"}),
    ("Get the full context for the dashboard project.",
     "session_load_context", {"project": "dashboard"}),
]
for user, tool, args in search_q:
    rows.append(ex(user, tool, args))
for user, tool, args in load_q:
    rows.append(ex(user, tool, args))

# =============================================================================
# FIX 5: VERIFIER TOOLS — synthesize_edges vs backfill_links vs health_check
# Exact failures:
#   #51 "verify all the session links are consistent for the portal project" → health_check (wrong)
#   #54 "Reconnect the dangling session references for the billing project." → session_reconnect (wrong)
#   #58 "Patch up the link gaps in our session history for prism-training." → synthesize_edges (wrong)
#
# Correct rules:
#   session_synthesize_edges = rebuild semantic connections / verify consistency of links between nodes
#   session_backfill_links = fill missing refs / reconnect dangling / patch gaps in session history
#   session_health_check = "is the DB running?" / "is memory system healthy?" / status check
# =============================================================================
synth_edge_phrases = [
    ("Verify all the session links are consistent for the {proj} project.",
     "session_synthesize_edges"),
    ("Check that the semantic connections between our session nodes are correct for {proj}.",
     "session_synthesize_edges"),
    ("Rebuild the relationship graph for the {proj} project sessions.",
     "session_synthesize_edges"),
    ("Make sure the session edges are coherent in the {proj} knowledge graph.",
     "session_synthesize_edges"),
    ("Run a consistency check on the session links for {proj}.",
     "session_synthesize_edges"),
    ("Synthesize the edges across all session nodes for {proj}.",
     "session_synthesize_edges"),
    ("Validate the semantic links between sessions in {proj}.",
     "session_synthesize_edges"),
]
backfill_phrases = [
    ("Reconnect the dangling session references for the {proj} project.",
     "session_backfill_links"),
    ("Patch up the link gaps in our session history for {proj}.",
     "session_backfill_links"),
    ("Fill in the missing session references for {proj}.",
     "session_backfill_links"),
    ("Backfill the missing links in the {proj} session graph.",
     "session_backfill_links"),
    ("There are orphaned session nodes in {proj} — reconnect them.",
     "session_backfill_links"),
    ("Fix the broken references in the {proj} session history.",
     "session_backfill_links"),
    ("Some sessions in {proj} are unlinked — patch them up.",
     "session_backfill_links"),
]
health_phrases = [
    ("Is the Prism memory database running?", "session_health_check"),
    ("Check if the memory system is healthy.", "session_health_check"),
    ("Is the session DB up and responsive?", "session_health_check"),
    ("Run a health check on Prism.", "session_health_check"),
    ("Ping the memory system to make sure it's working.", "session_health_check"),
    ("Is Prism MCP running correctly?", "session_health_check"),
    ("Health check on the knowledge store.", "session_health_check"),
]
for i, (tmpl, tool) in enumerate(synth_edge_phrases):
    proj = projs[i % len(projs)]
    rows.append(ex(tmpl.format(proj=proj), tool, {"project": proj}))
for i, (tmpl, tool) in enumerate(backfill_phrases):
    proj = projs[i % len(projs)]
    rows.append(ex(tmpl.format(proj=proj), tool, {"project": proj}))
for phrase, tool in health_phrases:
    rows.append(ex(phrase, tool, {}))

# =============================================================================
# FIX 6: knowledge_forget vs session_compact_ledger
# Failure: #34 "Wipe out all old debugging entries from the prism-mcp project." → compact_ledger (wrong)
# Rule:
#   knowledge_forget = delete entries FROM KNOWLEDGE BASE by category/project/query
#   session_compact_ledger = shrink/archive/compress the LEDGER (too long, cleanup old notes)
# =============================================================================
kf_phrases = [
    ("Wipe out all old debugging entries from the {proj} project.",
     "knowledge_forget", {"project": "{proj}", "reason": "old debugging entries"}),
    ("Remove all the outdated API docs from my knowledge base.",
     "knowledge_forget", {"category": "api_docs", "reason": "outdated"}),
    ("Delete the knowledge entries about the legacy auth system.",
     "knowledge_forget", {"query": "legacy auth system"}),
    ("Clear all the notes about the deprecated v1 API.",
     "knowledge_forget", {"query": "deprecated v1 API"}),
    ("Forget everything in the knowledge base about the old billing module.",
     "knowledge_forget", {"query": "old billing module"}),
    ("Remove stale knowledge entries for the {proj} project.",
     "knowledge_forget", {"project": "{proj}", "reason": "stale entries"}),
    ("Purge all knowledge entries tagged with 'deprecated'.",
     "knowledge_forget", {"category": "deprecated"}),
    ("Wipe knowledge entries about the old Redis cache setup.",
     "knowledge_forget", {"query": "old Redis cache setup"}),
]
compact_phrases = [
    ("The session ledger is getting too long — compact it.",
     "session_compact_ledger", {}),
    ("Shrink the ledger for the {proj} project, it's overflowing.",
     "session_compact_ledger", {"project": "{proj}"}),
    ("Archive old entries from the session ledger to keep it manageable.",
     "session_compact_ledger", {}),
    ("Trim the current session log — too many entries.",
     "session_compact_ledger", {}),
    ("Prune the session ledger for {proj}.",
     "session_compact_ledger", {"project": "{proj}"}),
]
for i, (tmpl, tool, args) in enumerate(kf_phrases):
    proj = projs[i % len(projs)]
    filled_tmpl = tmpl.format(proj=proj)
    filled_args = {k: v.format(proj=proj) if isinstance(v, str) else v for k, v in args.items()}
    rows.append(ex(filled_tmpl, tool, filled_args))
for i, (tmpl, tool, args) in enumerate(compact_phrases):
    proj = projs[i % len(projs)]
    filled_args = {k: v.format(proj=proj) if isinstance(v, str) else v for k, v in args.items()}
    rows.append(ex(tmpl.format(proj=proj), tool, filled_args))

# =============================================================================
# FIX 7: PARTIAL PASSES — missing required parameters
#   session_save_ledger: needs 'content' (what was accomplished)
#   session_forget_memory: needs 'memory_id' OR 'query'
#   session_task_route: needs 'task_description'
#   session_export_memory: needs 'output_dir' (and optionally 'format')
# =============================================================================
# session_forget_memory (memory_id or query) and session_save_ledger (content)
# examples with every required param filled in
param_examples = [
    ("That memory entry about the old deployment script is totally wrong. Nuke it.",
     "session_forget_memory", {"query": "old deployment script memory entry"}),
    ("Get rid of that wrong entry we saved about the broken migration.",
     "session_forget_memory", {"query": "broken migration entry"}),
    ("Delete the specific memory entry with ID mem-abc-123.",
     "session_forget_memory", {"memory_id": "mem-abc-123"}),
    ("Remove memory entry mem-xyz-456 — it's outdated.",
     "session_forget_memory", {"memory_id": "mem-xyz-456"}),
    ("Forget the memory with ID mem-2024-001.",
     "session_forget_memory", {"memory_id": "mem-2024-001"}),
    ("We're done for the day. Log what we accomplished.",
     "session_save_ledger", {"project": "general", "content": "Session complete — work logged for today"}),
    ("Save.",
     "session_save_ledger", {"project": "general", "content": "Session progress saved"}),
    ("Before I hand off, save what we did today: fixed the OAuth flow and updated tests.",
     "session_save_ledger", {"project": "general", "content": "Fixed OAuth flow, updated tests"}),
    ("Write this session to the ledger — we finished the API refactor.",
     "session_save_ledger", {"project": "api-gateway", "content": "Finished API refactor"}),
    ("Log today: debugged the race condition and deployed fix to staging.",
     "session_save_ledger", {"project": "portal", "content": "Debugged race condition, deployed fix to staging"}),
]
for user, tool, args in param_examples:
    rows.append(ex(user, tool, args))

# session_export_memory with required params
export_phrases = [
    ("Dump everything to a file so I can back it up. JSON format, save to /tmp/prism-backup.",
     "session_export_memory", {"output_dir": "/tmp/prism-backup", "format": "json"}),
    ("Export all my Prism memory to /tmp/export.json.",
     "session_export_memory", {"output_dir": "/tmp/export.json", "format": "json"}),
    ("Save a backup of all session memory to /tmp/memory-backup/.",
     "session_export_memory", {"output_dir": "/tmp/memory-backup"}),
    ("Export everything from the billing project to /tmp/billing-backup/ as JSON.",
     "session_export_memory", {"output_dir": "/tmp/billing-backup", "project": "billing", "format": "json"}),
    ("I want to export a backup and then compact the old entries.",
     "session_export_memory", {"output_dir": "/tmp/prism-export"}),
    ("Export the portal project data to /tmp/portal-snapshot/.",
     "session_export_memory", {"output_dir": "/tmp/portal-snapshot", "project": "portal"}),
    ("Back up my Prism session data — save to /tmp/sessions/.",
     "session_export_memory", {"output_dir": "/tmp/sessions"}),
]
for user, tool, args in export_phrases:
    rows.append(ex(user, tool, args))

# =============================================================================
# Summary stats
# =============================================================================
import re


def assistant_turn(row):
    # SYS_PROMPT itself contains the literal "<tool_call>", so a substring test on
    # the whole row would count every abstain row as a tool call. Look only at
    # the text after the final assistant marker.
    return row["text"].rsplit("<|im_start|>assistant\n", 1)[-1]


tool_calls = sum(1 for r in rows if assistant_turn(r).startswith("<tool_call>"))
abstains = len(rows) - tool_calls
print(f"Total rows: {len(rows)}")
print(f" Tool calls: {tool_calls}")
print(f" Abstains: {abstains}")

by_tool = {}
for r in rows:
    turn = assistant_turn(r)
    if turn.startswith("<tool_call>"):
        m = re.search(r'"name":\s*"([^"]+)"', turn)
        if m:
            t = m.group(1)
            by_tool[t] = by_tool.get(t, 0) + 1
for t, c in sorted(by_tool.items(), key=lambda x: -x[1]):
    print(f" {t}: {c}")

# =============================================================================
# Write output
# =============================================================================
random.shuffle(rows)
valid_n = max(10, len(rows) // 10)
valid_rows = rows[:valid_n]
train_rows = rows[valid_n:]

OUT = Path("/tmp/4b_swe_patch_data")
OUT.mkdir(parents=True, exist_ok=True)
# One JSON object per line, with a trailing newline so the files are valid JSONL.
(OUT / "train.jsonl").write_text("\n".join(json.dumps(r) for r in train_rows) + "\n", encoding="utf-8")
(OUT / "valid.jsonl").write_text("\n".join(json.dumps(r) for r in valid_rows) + "\n", encoding="utf-8")
print(f"\nOutput: {OUT}")
print(f" train: {len(train_rows)} rows")
print(f" valid: {len(valid_rows)} rows")
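
The partial-pass category in the docstring (tool chosen correctly, required parameter missing) suggests a sanity check on the generated rows before training. The sketch below is one way to do that, not part of the original script. `REQUIRED` and `check_row` are hypothetical names, and the required-parameter table is an assumption taken from the script's own notes rather than from the real Prism tool schemas. `session_forget_memory` is left out because the notes allow either `memory_id` or `query`.

```python
import json
import re

# Assumed required parameters, copied from the "partial passes" notes in the
# script's docstring; the real Prism tool schemas may differ.
REQUIRED = {
    "session_save_ledger": {"content"},
    "session_task_route": {"task_description"},
    "session_export_memory": {"output_dir"},
}

# Matches the JSON payload of a tool call as emitted by ex(): the opening tag is
# followed by a newline, which the "<tool_call>..." mention in SYS_PROMPT is not.
TOOL_CALL_RE = re.compile(r"<tool_call>\n(.*?)\n</tool_call>", re.DOTALL)


def check_row(text):
    """Return a list of problems found in one ChatML training row."""
    match = TOOL_CALL_RE.search(text)
    if match is None:
        return []  # abstain row: nothing to validate
    try:
        call = json.loads(match.group(1))
    except json.JSONDecodeError as err:
        return [f"tool call is not valid JSON: {err}"]
    name = call.get("name")
    missing = REQUIRED.get(name, set()) - set(call.get("arguments", {}))
    return [f"{name} is missing {sorted(missing)}"] if missing else []


# Two hand-written rows: one complete, one with the required argument removed.
good = (
    "<|im_start|>assistant\n<tool_call>\n"
    '{"name": "session_task_route", "arguments": {"task_description": "bug fix"}}'
    "\n</tool_call>\n<|im_end|>"
)
bad = good.replace('{"task_description": "bug fix"}', "{}")

print(check_row(good))
print(check_row(bad))
```

Running `check_row(r["text"])` over `rows` before the shuffle would flag any example that would teach the model to omit a parameter.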