dcostenco committed 6 days ago

Commit 0dfe3d5 (verified) · 1 parent: 7d5ab03

Add training/build_4b_v43_corpus.py


Files changed (1)

training/build_4b_v43_corpus.py +238 -0

training/build_4b_v43_corpus.py ADDED

	@@ -0,0 +1,238 @@

+#!/usr/bin/env python3
+"""
+build_4b_v43_corpus.py — Corpus builder for Prism Coder 4B v43.
+Same source mix as 14B v44 (topfive_v2, combined_aac_full, layer3, grounded_recall).
+Outputs two files for mlx_lm.lora: train.jsonl + valid.jsonl in a target directory.
+Required sources:
+  - topfive_v2.train.7b.jsonl       v2 mix (40% AAC / 12% abstention / 12% safety / 36% tool-use)
+  - combined_aac_full.jsonl          57k clinical AAC corpus, subsampled to 7000
+  - layer3_corpus.jsonl              45 rows × 5 oversample — MANDATORY for 3-layer AAC arch
+  - grounded_recall_corpus.jsonl     40 rows × 5 oversample — cascade/verifier compatibility
+Usage:
+    python3 build_4b_v43_corpus.py [--out-dir DIR] [--valid-frac 0.05] [--seed 42]
+Hard-gate audit runs before writing. Exits 1 if any hard gate fails; the safety/abstention check only warns.
+"""
+import argparse
+import json
+import random
+import sys
+from pathlib import Path
+BASE_DIR    = Path("/Users/admin/synalux-private/prism-training/data/topfive")
+PRISM_DATA  = Path("/Users/admin/prism/training/data")
+SOURCES = {
+    "v2_base":         BASE_DIR / "topfive_v2.train.7b.jsonl",
+    "combined_aac":    BASE_DIR / "combined_aac_full.jsonl",
+    "layer3":          BASE_DIR / "layer3_corpus.jsonl",
+    "grounded_recall": PRISM_DATA / "grounded_recall_corpus.jsonl",
+}
+AAC_FULL_SUBSAMPLE  = 7000
+LAYER3_OVERSAMPLE   = 5
+GROUNDED_OVERSAMPLE = 5
+GATE = {
+    "min_total":      20_000,
+    "min_aac_frac":   0.35,
+    "min_layer3":     40,
+    "min_grounded":   80,
+    "min_tool_calls": 5_000,
+    "min_safety":     500,
+}
+def messages_to_chatml(messages: list) -> str:
+    parts = []
+    for m in messages:
+        role = m.get("role", "user")
+        content = m.get("content", "")
+        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
+    return "\n".join(parts)
+def normalize(row: dict) -> dict:
+    if "text" in row:
+        return {"text": row["text"], "_bucket": row.get("_bucket", ""), "source": row.get("source", "")}
+    if "messages" in row:
+        return {
+            "text": messages_to_chatml(row["messages"]),
+            "_bucket": row.get("_bucket", ""),
+            "source": row.get("source", ""),
+        }
+    return row
+def load_jsonl(path: Path) -> list[dict]:
+    if not path.exists():
+        print(f"FATAL: {path} missing — aborting", file=sys.stderr)
+        sys.exit(1)
+    rows = []
+    with path.open(encoding="utf-8") as f:
+        for line in f:
+            line = line.strip()
+            if line:
+                try:
+                    rows.append(json.loads(line))
+                except json.JSONDecodeError:
+                    # Malformed lines are skipped silently; they are not counted or reported.
+                    pass
+    return rows
+def oversample(rows: list[dict], factor: int, rng: random.Random) -> list[dict]:
+    out = []
+    for _ in range(factor):
+        cycle = rows.copy()
+        rng.shuffle(cycle)
+        out.extend(cycle)
+    return out
+def main():
+    p = argparse.ArgumentParser()
+    p.add_argument("--out-dir", type=Path, default=Path("/tmp/4b_v43_data"),
+                   help="Output directory — will contain train.jsonl and valid.jsonl")
+    p.add_argument("--valid-frac", type=float, default=0.05,
+                   help="Fraction of data held out for validation (default 0.05)")
+    p.add_argument("--seed", type=int, default=42)
+    args = p.parse_args()
+    rng = random.Random(args.seed)
+    print("=== Prism Coder 4B v43 Corpus Builder ===\n")
+    print("Checking source files...")
+    for name, path in SOURCES.items():
+        if not path.exists():
+            print(f"  FATAL: {name} missing: {path}", file=sys.stderr)
+            sys.exit(1)
+        print(f"  OK: {name} ({path.name})")
+    print()
+    # 1. v2 base mix
+    v2_rows = [normalize(r) for r in load_jsonl(SOURCES["v2_base"])]
+    print(f"v2 base mix:          {len(v2_rows):>6} rows")
+    # 2. combined_aac_full subsample
+    aac_full = load_jsonl(SOURCES["combined_aac"])
+    rng.shuffle(aac_full)
+    aac_sample = [normalize(r) for r in aac_full[:AAC_FULL_SUBSAMPLE]]
+    for r in aac_sample:
+        r["_bucket"] = "aac"
+        r["source"] = r.get("source") or "combined_aac_full"  # normalize() sets "" when missing, so setdefault would be a no-op
+    print(f"combined_aac subsamp: {len(aac_sample):>6} rows (from {len(aac_full)} available)")
+    # 3. layer3 oversample (MANDATORY)
+    layer3_base = load_jsonl(SOURCES["layer3"])
+    layer3_rows = oversample([normalize(r) for r in layer3_base], LAYER3_OVERSAMPLE, rng)
+    for r in layer3_rows:
+        r["_bucket"] = "layer3"
+        r["source"] = r.get("source") or "layer3_corpus"  # normalize() sets "" when missing, so setdefault would be a no-op
+    print(f"layer3 (×{LAYER3_OVERSAMPLE}):          {len(layer3_rows):>6} rows")
+    # 4. grounded_recall oversample
+    gr_base = load_jsonl(SOURCES["grounded_recall"])
+    gr_rows = oversample([normalize(r) for r in gr_base], GROUNDED_OVERSAMPLE, rng)
+    for r in gr_rows:
+        r["_bucket"] = "grounded_recall"
+        r["source"] = r.get("source") or "grounded_recall_corpus"  # normalize() sets "" when missing, so setdefault would be a no-op
+    print(f"grounded_recall (×{GROUNDED_OVERSAMPLE}): {len(gr_rows):>6} rows")
+    # 5. Compose + shuffle
+    all_rows = v2_rows + aac_sample + layer3_rows + gr_rows
+    rng.shuffle(all_rows)
+    print(f"\nTotal before filter:  {len(all_rows):>6} rows")
+    # 6. Audit (on all_rows while bucket tags still present)
+    print("\n=== CORPUS AUDIT (mandatory — failures are fatal) ===")
+    failed = False
+    total = len(all_rows)
+    if total < GATE["min_total"]:
+        print(f"  FAIL: total {total} < {GATE['min_total']}")
+        failed = True
+    else:
+        print(f"  OK:   total {total} >= {GATE['min_total']}")
+    aac_count = sum(1 for r in all_rows if r.get("_bucket") == "aac")
+    aac_frac  = aac_count / total if total else 0.0  # guard against empty sources (total == 0)
+    print(f"  AAC fraction: {aac_frac:.1%}  ({aac_count} rows)")
+    if aac_frac < GATE["min_aac_frac"]:
+        print(f"  FAIL: AAC fraction {aac_frac:.1%} < {GATE['min_aac_frac']:.1%}")
+        failed = True
+    else:
+        print(f"  OK:   AAC fraction >= {GATE['min_aac_frac']:.1%}")
+    layer3_count = sum(1 for r in all_rows if "[LAYER3" in r.get("text", ""))
+    print(f"  Layer3 examples: {layer3_count}")
+    if layer3_count < GATE["min_layer3"]:
+        print(f"  FAIL: layer3 {layer3_count} < {GATE['min_layer3']}")
+        failed = True
+    else:
+        print(f"  OK:   layer3 >= {GATE['min_layer3']}")
+    gr_count = sum(1 for r in all_rows if r.get("_bucket") == "grounded_recall")
+    print(f"  Grounded recall rows: {gr_count}")
+    if gr_count < GATE["min_grounded"]:
+        print(f"  FAIL: grounded_recall {gr_count} < {GATE['min_grounded']}")
+        failed = True
+    else:
+        print(f"  OK:   grounded_recall >= {GATE['min_grounded']}")
+    # "tool_call" also matches "<tool_call>", so a single substring test covers both forms.
+    tool_count = sum(1 for r in all_rows if "tool_call" in r.get("text", ""))
+    print(f"  Tool-call SFT rows: {tool_count}")
+    if tool_count < GATE["min_tool_calls"]:
+        print(f"  FAIL: tool_calls {tool_count} < {GATE['min_tool_calls']}")
+        failed = True
+    else:
+        print(f"  OK:   tool_calls >= {GATE['min_tool_calls']}")
+    safety_count = sum(1 for r in all_rows if any(
+        t in r.get("text", "") for t in ["abstain", "cannot", "refuse", "I should not", "harmful"]
+    ))
+    print(f"  Safety/abstention rows (approx): {safety_count}")
+    if safety_count < GATE["min_safety"]:
+        print(f"  WARN: safety {safety_count} < {GATE['min_safety']}")
+    else:
+        print(f"  OK:   safety/abstention present")
+    print("\n=== Composition summary ===")
+    for bucket in ["aac", "abstention", "safety", "tool_use", "layer3", "grounded_recall"]:
+        n   = sum(1 for r in all_rows if r.get("_bucket") == bucket)
+        pct = n / total * 100 if total else 0
+        print(f"  {bucket:>20}: {n:>6} ({pct:5.1f}%)")
+    if failed:
+        print("\nFATAL: Corpus audit failed — DO NOT use this corpus for training")
+        sys.exit(1)
+    # 7. Strip metadata, write train/valid split
+    final = [{"text": r["text"]} for r in all_rows if r.get("text", "").strip()]
+    rng.shuffle(final)
+    n_valid = max(1, int(len(final) * args.valid_frac))
+    valid_rows = final[:n_valid]
+    train_rows = final[n_valid:]
+    args.out_dir.mkdir(parents=True, exist_ok=True)
+    train_path = args.out_dir / "train.jsonl"
+    valid_path = args.out_dir / "valid.jsonl"
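+    # Write UTF-8 explicitly: ensure_ascii=False emits raw non-ASCII characters.
+    with train_path.open("w", encoding="utf-8") as f:
+        for row in train_rows:
+            f.write(json.dumps(row, ensure_ascii=False) + "\n")
+    with valid_path.open("w", encoding="utf-8") as f:
+        for row in valid_rows:
+            f.write(json.dumps(row, ensure_ascii=False) + "\n")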
+    print(f"\n✅ All gates passed — corpus ready")
+    print(f"  Train: {len(train_rows):>6} rows → {train_path}")
+    print(f"  Valid: {len(valid_rows):>6} rows → {valid_path}")
+    print(f"\nNext: bash /Users/admin/synalux-private/prism-training/train_4b_v43_local.sh")
+if __name__ == "__main__":
+    main()
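
To see what the two helper functions in this file produce, here is a minimal standalone sketch. `messages_to_chatml` and `oversample` are copied from the diff above; the sample chat row and the three-row toy corpus are hypothetical and exist only for illustration, since the real source corpora are not part of this commit.

```python
import random


def messages_to_chatml(messages: list) -> str:
    """Copied from the diff: flatten chat messages into one ChatML string."""
    parts = []
    for m in messages:
        role = m.get("role", "user")
        content = m.get("content", "")
        parts.append(f"<|im_start|>{role}\n{content}<|im_end|>")
    return "\n".join(parts)


def oversample(rows: list, factor: int, rng: random.Random) -> list:
    """Copied from the diff: repeat `rows` `factor` times, reshuffling each pass."""
    out = []
    for _ in range(factor):
        cycle = rows.copy()
        rng.shuffle(cycle)
        out.extend(cycle)
    return out


# Hypothetical sample row in the "messages" shape that normalize() accepts.
sample = {
    "messages": [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": "hello"},
    ]
}

chatml = messages_to_chatml(sample["messages"])
print(chatml)
# <|im_start|>user
# hi<|im_end|>
# <|im_start|>assistant
# hello<|im_end|>

# Hypothetical toy corpus: 3 rows oversampled 5 times with a fixed seed.
rows = [{"text": f"row-{i}"} for i in range(3)]
repeated = oversample(rows, 5, random.Random(42))
print(len(repeated))  # 15
```

Because every pass is a full shuffled copy, each source row appears exactly `factor` times in the output, so the oversampled buckets grow by a fixed multiple. Only the order depends on the seed.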