
dcostenco committed 6 days ago

Commit

79a2a50

verified ·

1 Parent(s): 18db54d

Add training/orchestrate_4b_to_100.sh

Files changed (1)

training/orchestrate_4b_to_100.sh +270 -0

training/orchestrate_4b_to_100.sh ADDED

	@@ -0,0 +1,270 @@

+#!/bin/bash
+# orchestrate_4b_to_100.sh — Drive prism-coder:4b to ≥90% strict on eval_300.py.
+#
+# Pipeline per round:
+#   1. Build patch corpus (analyze_swe_failures.py on previous eval300 report)
+#   2. MLX LoRA fine-tune (resume from current adapter, decreasing LR)
+#   3. Merge → GGUF F16 → Q4_K_M → Ollama register
+#   4. eval_300.py (300-case unified eval) → check strict% gate
+#   5. Loop or promote (single gate, no separate BFCL check)
+#
+# Pre-condition: SWE swe1 patch is already running (or has been applied).
+# This script starts from whatever the CURRENT adapter+Ollama state is,
+# runs a baseline SWE eval, then patches until ≥90% strict (EVAL_GATE).
+set -euo pipefail
+TRAINING_DIR="$HOME/synalux-private/prism-training"
+LLAMA_CPP=~/llama.cpp
+cd "$TRAINING_DIR"
+log()  { echo "[4b-orch] $(date '+%H:%M:%S') $*" | tee -a "$ORCH_LOG"; }
+fail() { log "FATAL: $*"; exit 1; }
+# ── Config ──────────────────────────────────────────────────────────────────
+ADAPTER_DIR="/tmp/4b_v43_adapter"
+BASE_GGUF_DIR="$HOME/prism/training/models"
+ORCH_LOG="$HOME/prism/training/logs/orch_4b_to_100.log"
+EVAL_GATE=0.90      # 90% strict target on eval_300.py unified eval
+MAX_ROUNDS=20
+# Resumption: set >0 to skip STAGE 1 and start main loop at this round
+RESUME_ROUND=${RESUME_ROUND:-0}
+RESUME_STRICT=${RESUME_STRICT:-0}
+RESUME_TAG=${RESUME_TAG:-""}
+# LR schedule — rounds 17-20 use 5e-6/200 (stable regime for final polish)
+LRS=(3e-5 2e-5 1.5e-5 1e-5 8e-6 5e-6 8e-6 5e-6 8e-6 5e-6 5e-6 5e-6 5e-6 5e-6 8e-6 5e-6 5e-6 5e-6 5e-6 5e-6)
+ITERS=(300 250 200 150 120 100 150 150 200 200 200 200 200 200 300 200 200 200 200 200)
+mkdir -p "$HOME/prism/training/logs"
+echo "=== orchestrate_4b_to_100 start ===" | tee "$ORCH_LOG"
+echo "Start: $(date)" | tee -a "$ORCH_LOG"
+# ── Helpers ──────────────────────────────────────────────────────────────────
+wait_for_process() {
+    local pid=$1 desc=$2
+    log "Waiting for $desc (PID $pid) to finish..."
+    while kill -0 "$pid" 2>/dev/null; do sleep 20; done
+    log "$desc finished."
+}
+eval300_strict_pct() {
+    local report=$1
+    python3 -c "
+import json, sys
+r = json.load(open('$report'))
+print(f\"{r['summary']['strict_pct']*100:.1f}\")
+" 2>/dev/null || echo "0"
+}
+merge_and_register() {
+    local tag=$1 gguf_q4=$2
+    log "Merging LoRA → GGUF → Ollama ($tag)..."
+    local gguf_f16="${gguf_q4/q4km/f16}"
+    local merged_dir="/tmp/4b_merged_${tag}"
+    python3 merge_4b_v43.py \
+        --base /Users/admin/.cache/huggingface/hub/models--Qwen--Qwen3-4B/snapshots/1cfa9a7208912126459214e8b04321603b3df60c \
+        --adapter "$ADAPTER_DIR" \
+        --out "$merged_dir" 2>&1 | tee -a "$ORCH_LOG"
+    python3 "$LLAMA_CPP/convert_hf_to_gguf.py" \
+        "$merged_dir" --outfile "$gguf_f16" --outtype f16 \
+        2>&1 | tee -a "$ORCH_LOG" | tail -3
+    [ -f "$gguf_f16" ] || fail "F16 GGUF not produced for $tag"
+    "$LLAMA_CPP/build/bin/llama-quantize" "$gguf_f16" "$gguf_q4" Q4_K_M \
+        2>&1 | tee -a "$ORCH_LOG" | tail -3
+    [ -f "$gguf_q4" ] || fail "Q4KM GGUF not produced for $tag"
+    rm -f "$gguf_f16"
+    rm -rf "$merged_dir"
+    log "Q4KM: $(ls -lh "$gguf_q4")"
+    ollama rm "$tag" 2>/dev/null || true
+    MODELFILE=$(mktemp)
+    cat > "$MODELFILE" << MEOF
+FROM $gguf_q4
+PARAMETER temperature 0
+PARAMETER num_ctx 8192
+PARAMETER num_predict 256
+PARAMETER stop "<|im_end|>"
+PARAMETER stop "<|endoftext|>"
+MEOF
+    ollama create "$tag" -f "$MODELFILE" 2>&1 | tee -a "$ORCH_LOG"
+    rm "$MODELFILE"
+    log "Registered: $tag"
+}
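+run_eval300() {
+    local tag=$1 report=$2
+    log "eval_300 eval: $tag..."
+    # Remove any report left by a previous round so a failed run cannot be scored as this one.
+    rm -f results/eval300_report.json
+    python3 eval_300.py --model "$tag" \
+        2>&1 | tee "${report%.json}.log" || true
+    [ -f results/eval300_report.json ] || fail "eval_300 produced no report for $tag"
+    cp results/eval300_report.json "$report"
+}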
+train_patch() {
+    local round=$1 data_dir=$2 lr=$3 iters=$4
+    log "Training round $round: iters=$iters lr=$lr data=$data_dir..."
+    mlx_lm.lora \
+        --model "Qwen/Qwen3-4B" \
+        --train \
+        --data "$data_dir" \
+        --adapter-path "$ADAPTER_DIR" \
+        --resume-adapter-file "$ADAPTER_DIR/adapters.safetensors" \
+        --num-layers 16 \
+        --batch-size 2 \
+        --grad-checkpoint \
+        --iters "$iters" \
+        --val-batches 10 \
+        --learning-rate "$lr" \
+        --steps-per-report 10 \
+        --steps-per-eval 100 \
+        --save-every 100 \
+        --max-seq-length 2048 \
+        --seed $((2031 + round)) \
+        2>&1 | tee -a "$HOME/prism/training/logs/train_4b_swe_r${round}.log"
+}
+# ── STAGE 0: Wait for current SWE patch (swe1) to complete ────────────────
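+# pgrep can match several processes; keep only the first PID so kill -0 gets a single argument.
+SWE1_PID=$(pgrep -f "train_4b_v43_swe_patch" | head -n 1 || true)
+LAUNCH_PID=$(pgrep -f "launch_4b_swe_after_1b7" | head -n 1 || true)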
+if [ -n "$SWE1_PID" ]; then
+    log "SWE patch 1 training still running (PID $SWE1_PID) — waiting..."
+    wait_for_process "$SWE1_PID" "SWE patch 1 training"
+elif [ -n "$LAUNCH_PID" ]; then
+    log "Auto-launcher still running (PID $LAUNCH_PID) — waiting..."
+    wait_for_process "$LAUNCH_PID" "4B SWE auto-launcher"
+else
+    log "No SWE patch 1 process found — assuming already complete or not started."
+fi
+# Short settle time after training
+sleep 10
+# ── STAGE 1: Baseline SWE after patch1 ────────────────────────────────────
+REPORT_DIR="$HOME/prism/training/results/4b_swe"
+mkdir -p "$REPORT_DIR"
+if [ "$RESUME_ROUND" -gt 0 ]; then
+    log "=== RESUMING from round $RESUME_ROUND (tag: $RESUME_TAG, strict: $RESUME_STRICT%) ==="
+    CURRENT_TAG="$RESUME_TAG"
+    PREV_REPORT="$REPORT_DIR/eval300_r$((RESUME_ROUND - 1)).json"
+    PREV_STRICT="$RESUME_STRICT"
+    STRICT="$RESUME_STRICT"
+    ROUND="$RESUME_ROUND"
+else
+    log "=== STAGE 1: Baseline SWE eval (post-patch1) ==="
+    # Check if prism-coder:4b-swe1 exists (patch1 output tag)
+    CURRENT_TAG="prism-coder:4b-swe1"
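+    # Exact match on the NAME column: avoids matching 4b-swe10 and avoids a grep -q SIGPIPE under pipefail.
+    if ! ollama list | awk -v t="$CURRENT_TAG" '$1 == t { found = 1 } END { exit !found }'; then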
+        log "prism-coder:4b-swe1 not in Ollama — running merge for current adapter..."
+        CURRENT_TAG="prism-coder:4b-swe0"
+        merge_and_register "$CURRENT_TAG" "$BASE_GGUF_DIR/qwen3-4b-swe0-q4km.gguf"
+    fi
+    run_eval300 "$CURRENT_TAG" "$REPORT_DIR/eval300_r0.json"
+    STRICT=$(eval300_strict_pct "$REPORT_DIR/eval300_r0.json")
+    log "Round 0 eval_300 strict: ${STRICT}%"
+    # Check if already at gate
+    if python3 -c "import sys; sys.exit(0 if float('${STRICT}') >= ${EVAL_GATE} * 100 else 1)" 2>/dev/null; then
+        log "Already at eval gate (${STRICT}% ≥ $(python3 -c "print(${EVAL_GATE}*100)")%)!"
+        log "Promoting prism-coder:4b..."
+        # Resolve the GGUF before removing the existing tag; `|| true` keeps
+        # set -e/pipefail from aborting when the glob matches nothing.
+        BEST_GGUF=$(ls -t "$BASE_GGUF_DIR"/qwen3-4b-*q4km.gguf 2>/dev/null | head -1 || true)
+        [ -n "$BEST_GGUF" ] || fail "No Q4_K_M GGUF found in $BASE_GGUF_DIR; cannot promote $CURRENT_TAG"
+        ollama rm prism-coder:4b 2>/dev/null || true
+        MODELFILE=$(mktemp)
+        printf "FROM %s\nPARAMETER temperature 0\nPARAMETER num_ctx 8192\nPARAMETER num_predict 256\nPARAMETER stop \"<|im_end|>\"\nPARAMETER stop \"<|endoftext|>\"\n" "$BEST_GGUF" > "$MODELFILE"
+        ollama create prism-coder:4b -f "$MODELFILE"
+        rm "$MODELFILE"
+        log "SUCCESS: prism-coder:4b promoted from $CURRENT_TAG"
+        exit 0
+    fi
+    # ── MAIN LOOP ─────────────────────────────────────────────────────────────
+    PREV_REPORT="$REPORT_DIR/eval300_r0.json"
+    PREV_STRICT="$STRICT"
+    ROUND=1
+fi
+while [ "$ROUND" -le "$MAX_ROUNDS" ]; do
+    log "=== ROUND $ROUND ==="
+    LR_IDX=$((ROUND - 1))
+    LR="${LRS[$LR_IDX]:-5e-6}"
+    IT="${ITERS[$LR_IDX]:-100}"
+    # Build patch corpus from previous round's failures
+    DATA_DIR="/tmp/4b_swe_r${ROUND}_data"
+    log "Building patch corpus from failures in round $((ROUND-1))..."
+    python3 analyze_swe_failures.py \
+        --report "$PREV_REPORT" \
+        --version "r${ROUND}" \
+        --out "$DATA_DIR" \
+        --seed $((2031 + ROUND)) \
+        2>&1 | tee -a "$ORCH_LOG"
+    TRAIN_ROWS=$(wc -l < "$DATA_DIR/train.jsonl")
+    log "Patch corpus: $TRAIN_ROWS train rows"
+    if [ "$TRAIN_ROWS" -lt 20 ]; then
+        log "Too few failures to patch ($TRAIN_ROWS rows) — stopping loop."
+        break
+    fi
+    # Train
+    train_patch "$ROUND" "$DATA_DIR" "$LR" "$IT"
+    # Merge + register
+    TAG="prism-coder:4b-swe${ROUND}"
+    GGUF_Q4="$BASE_GGUF_DIR/qwen3-4b-swe${ROUND}-q4km.gguf"
+    merge_and_register "$TAG" "$GGUF_Q4"
+    # eval_300 unified eval
+    REPORT="$REPORT_DIR/eval300_r${ROUND}.json"
+    run_eval300 "$TAG" "$REPORT"
+    STRICT=$(eval300_strict_pct "$REPORT")
+    log "Round $ROUND eval_300 strict: ${STRICT}% (prev: ${PREV_STRICT}%)"
+    CURRENT_TAG="$TAG"
+    PREV_REPORT="$REPORT"
+    PREV_STRICT="$STRICT"
+    # Check eval gate
+    if python3 -c "import sys; sys.exit(0 if float('${STRICT}') >= ${EVAL_GATE} * 100 else 1)" 2>/dev/null; then
+        log "Eval gate PASSED: ${STRICT}% ≥ $(python3 -c "print(${EVAL_GATE}*100)")%"
+        log "Promoting prism-coder:4b..."
+        ollama rm prism-coder:4b 2>/dev/null || true
+        MODELFILE=$(mktemp)
+        printf "FROM %s\nPARAMETER temperature 0\nPARAMETER num_ctx 8192\nPARAMETER num_predict 256\nPARAMETER stop \"<|im_end|>\"\nPARAMETER stop \"<|endoftext|>\"\n" "$GGUF_Q4" > "$MODELFILE"
+        ollama create prism-coder:4b -f "$MODELFILE"
+        rm "$MODELFILE"
+        log "SUCCESS: prism-coder:4b promoted (eval_300 ${STRICT}%)"
+        echo ""
+        echo "=== PROMOTED prism-coder:4b ==="
+        echo "  eval_300 strict: ${STRICT}%"
+        echo "  Source tag:      $CURRENT_TAG"
+        echo "  GGUF:            $GGUF_Q4"
+        echo ""
+        echo "Next: run 3-seed validation before syncing to Ollama Hub:"
+        echo "  python3 ~/synalux-private/prism-training/eval_300.py --model prism-coder:4b --runs 3"
+        exit 0
+    fi
+    ROUND=$((ROUND + 1))
+done
+# ── Exhausted rounds ─────────────────────────────────────────────────────────
+echo ""
+echo "=== STOPPED WITHOUT PROMOTION: $(date) ===" | tee -a "$ORCH_LOG"
+echo "Latest: $CURRENT_TAG at eval_300 strict ${PREV_STRICT}%" | tee -a "$ORCH_LOG"
+echo "" | tee -a "$ORCH_LOG"
+echo "Manual analysis needed. Remaining failures in: $PREV_REPORT" | tee -a "$ORCH_LOG"
+echo "To continue:" | tee -a "$ORCH_LOG"
+echo "  python3 analyze_swe_failures.py --report $PREV_REPORT --version manual" | tee -a "$ORCH_LOG"
+echo "  # Review /tmp/4b_swe_manual_data/train.jsonl and add handcrafted examples" | tee -a "$ORCH_LOG"
+echo "  # Then re-run this script" | tee -a "$ORCH_LOG"
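
The script starts python3 twice per gate check, once for the comparison and once to format the threshold in the log line. As a rough sketch only, the same comparison could be done in a single awk call. `gate_passed` is a hypothetical helper name that does not appear in this commit. It assumes strict% arrives as a percentage string (the form `eval300_strict_pct` prints) and the gate as a fraction (the form `EVAL_GATE` uses).

```shell
# Hypothetical helper (not in the commit): succeed when strict% meets the gate.
#   $1  strict percentage, e.g. "91.3" (non-numeric input is treated as 0)
#   $2  gate as a fraction, e.g. 0.90
gate_passed() {
    awk -v s="$1" -v g="$2" 'BEGIN { exit !(s + 0 >= g * 100) }'
}

# Usage sketch with illustrative values:
if gate_passed "91.3" "0.90"; then
    echo "gate passed"
else
    echo "gate not met"
fi
```

In the orchestrator this would stand in for the `python3 -c "import sys; sys.exit(...)"` conditions, called as `gate_passed "$STRICT" "$EVAL_GATE"`. An unreadable report, which `eval300_strict_pct` reports as `0`, still fails the gate.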