dcostenco committed 24 days ago

Commit 4ed0ef7 (verified)

1 parent: 3277b4e

Add model card: 14B v18coder-base, BFCL V4 in progress, sibling to 7B

Files changed (1)

README.md (added) +129 -0

	@@ -0,0 +1,129 @@

+---
+language:
+  - en
+  - es
+  - fr
+  - pt
+  - de
+  - zh
+  - ja
+  - ko
+  - ru
+  - ar
+  - ro
+  - uk
+license: apache-2.0
+base_model: Qwen/Qwen2.5-Coder-14B-Instruct
+pipeline_tag: text-generation
+library_name: transformers
+tags:
+  - qwen2
+  - function-calling
+  - tool-use
+  - aac
+  - accessibility
+  - prism
+  - synalux
+  - bfcl
+  - conversational
+---
+# Prism-Coder 14B — Function Calling + AAC Sibling (32K context)
+A fine-tune of **Qwen2.5-Coder-14B-Instruct** released **2026-05-04** as a sibling to [`prism-coder-7b`](https://huggingface.co/dcostenco/prism-coder-7b). The Synalux portal automatically routes paid-tier, medium-length AAC (augmentative and alternative communication) queries to this model, which keeps inference on the portal's own cloud GPU pool at $0 marginal cost compared with Claude/Gemini.
+## Sibling positioning
+| Model | Use case | Context | RAM (Q4) |
+|---|---|---|---|
+| `prism-coder-7b` | iPad consumer AAC, free portal tier | 32K | ~5 GB |
+| **`prism-coder-14b`** | **Mac/desktop AAC, paid portal tier (medium queries)** | **32K** | **~9 GB** |
+| `prism-coder-32b` (in flight, Phase 1) | Synalux cloud paid-tier complex queries | 32K | ~20 GB |
+## Eval (Prism internal, 3 runs, 0% standard deviation)
+| Metric | Score |
+|---|---|
+| BFCL (Prism 64-test) | 85.9% |
+| AAC realigned | 46/48 (95.8%) |
+| Caregiver targeted | 18/20 |
+| Emergency QA | 13/13 |
+| Text correction | 14/15 |
+| Translation | 8/8 |
+| Ask AI | 5/5 |
+The 14B was NOT explicitly trained on AAC data (its training data focused on BFCL/tool calling). Its high AAC scores are emergent, stemming from Qwen2.5-Coder-14B-Instruct's strong instruction tuning plus format transfer from the BFCL training. The 7B sibling, which does include AAC SFT data, edges out the 14B on caregiver targeted (20/20 vs 18/20) but not on general reasoning.
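To compare the rows of the eval table on one scale, the pass counts can be converted to percentages. The sketch below uses only the counts reported in the table. The BFCL pass count it recovers is an inference from "85.9% of 64 tests", not a figure stated in the card.

```python
# Counts copied from the eval table: (passed, total).
results = {
    "AAC realigned": (46, 48),
    "Caregiver targeted": (18, 20),
    "Emergency QA": (13, 13),
    "Text correction": (14, 15),
    "Translation": (8, 8),
    "Ask AI": (5, 5),
}


def pct(passed: int, total: int) -> float:
    """Pass rate as a percentage, rounded to one decimal place."""
    return round(100 * passed / total, 1)


rates = {name: pct(p, t) for name, (p, t) in results.items()}

# The BFCL row gives only a percentage over 64 tests. Find every pass count
# that rounds to 85.9% (an inference, not a number from the card).
bfcl_passed = [k for k in range(65) if pct(k, 64) == 85.9]

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print("BFCL pass counts consistent with 85.9% of 64:", bfcl_passed)
```

Only one integer pass count is consistent with the reported BFCL percentage, so the table is internally coherent on that row.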
+## Berkeley BFCL V4 (in progress)
+A handler integration PR is open at [`ShishirPatil/gorilla#1332`](https://github.com/ShishirPatil/gorilla/pull/1332), adding support for `prism-coder-14b-FC` alongside the 7B/32B/72B variants. A self-run with the official Berkeley toolkit is in progress; numbers will be appended once it completes.
+## Use cases
+### Synalux portal — paid tier
+Tier-aware routing dispatches:
+- **Simple AAC queries** → 7B local (cheap, fast)
+- **Medium queries (5-40 words)** → **14B local (this model)**: stronger reasoning at $0 marginal cost
+- **Complex queries** → Claude Opus / Haiku per tier
+This routing alone is estimated to save $190K-210K/year at 10K-user scale vs all-cloud routing.
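The dispatch rule above can be sketched as a word-count router. This is a minimal illustration, not the Synalux implementation: the card gives only the 5-40 word band for medium queries, so treating anything shorter as "simple" and anything longer as "complex" is an assumption, and the `Route` enum and `route_query` function are hypothetical names. The real router is also tier-aware, which this sketch ignores.

```python
from enum import Enum


class Route(Enum):
    LOCAL_7B = "prism-coder-7b"
    LOCAL_14B = "prism-coder-14b"
    CLOUD = "cloud"


# Medium band taken from the card (5-40 words). The boundaries for "simple"
# and "complex" are assumed to sit directly below and above that band.
MEDIUM_MIN_WORDS = 5
MEDIUM_MAX_WORDS = 40


def route_query(text: str) -> Route:
    """Pick a backend from the whitespace-delimited word count of the query."""
    n_words = len(text.split())
    if n_words < MEDIUM_MIN_WORDS:
        return Route.LOCAL_7B
    if n_words <= MEDIUM_MAX_WORDS:
        return Route.LOCAL_14B
    return Route.CLOUD


print(route_query("I want water"))
print(route_query("Please tell my caregiver I need help now"))
```

Word count is a crude proxy for difficulty. It is used here only because it is the one criterion the card states.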
+### Self-hosted Mac / desktop AAC
+The Q4_K_M GGUF (~9 GB) fits on an M2/M3/M4 Mac with ≥16 GB of RAM and runs at 15-30 tok/s, which is comfortable for AAC turns.
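The ~9 GB figure can be sanity-checked with back-of-the-envelope arithmetic. The parameter count and the average bits per weight below are assumptions, not numbers from the card: roughly 14.7 billion parameters for the Qwen2.5 14B architecture and roughly 4.85 bits per weight for Q4_K_M.

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumed values (not stated in the card).
N_PARAMS = 14.7e9
Q4_K_M_BITS_PER_WEIGHT = 4.85

weights_gb = gguf_size_gb(N_PARAMS, Q4_K_M_BITS_PER_WEIGHT)
print(f"Estimated Q4_K_M weights: {weights_gb:.1f} GB")
```

The estimate covers the weights only. The KV cache, runtime buffers, and the operating system need memory on top of that, which is consistent with the card's 16 GB recommendation.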
+## Format
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+tok = AutoTokenizer.from_pretrained("dcostenco/prism-coder-14b")
+m = AutoModelForCausalLM.from_pretrained(
+    "dcostenco/prism-coder-14b",
+    torch_dtype=torch.bfloat16,
+    device_map="auto",
+)
+prompt = tok.apply_chat_template(
+    [{"role": "user", "content": "Add 'eat apples' to the food category."}],
+    tokenize=False,
+    add_generation_prompt=True,
+)
+inputs = tok(prompt, return_tensors="pt").to(m.device)
+# Enable sampling explicitly so that the temperature setting takes effect.
+out = m.generate(**inputs, max_new_tokens=160, do_sample=True, temperature=0.3)
+print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
+```
+For Ollama users, a Q4_K_M GGUF is available via the `prism-coder:14b` tag in the Synalux ops fleet.
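The card does not show what the decoded output looks like for a function-calling request. The base Qwen2.5 chat template wraps each call in `<tool_call>` tags around a JSON object with `name` and `arguments` keys. The sketch below assumes this fine-tune keeps that format, which the card does not confirm. The sample string and the `add_phrase` function name are hypothetical.

```python
import json
import re

# Matches the body of each <tool_call>...</tool_call> block (assumed format).
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> list[dict]:
    """Return every well-formed tool call found in the decoded model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


# Hypothetical model output for the "eat apples" prompt.
sample = (
    "<tool_call>\n"
    '{"name": "add_phrase", "arguments": {"phrase": "eat apples", "category": "food"}}\n'
    "</tool_call>"
)
calls = extract_tool_calls(sample)
print(calls[0]["name"], calls[0]["arguments"])
```

Plain text with no tags yields an empty list, so a caller can fall back to treating the output as an ordinary reply.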
+## Training
+- Base: `Qwen/Qwen2.5-Coder-14B-Instruct`
+- Method: DoRA SFT (resumed from base 14B SFT checkpoint-5000)
+- Adapter: r=128, alpha=256, lora_dropout=0.05
+- Schedule: 1 epoch, LR 1e-5 cosine, warmup 5%
+- Data: glaive-function-calling-v2 + ToolACE + xlam-function-calling-60k + internal v17.1 BFCL (60K rows subsampled, Hammer-style 24% function-masked)
+- Compute: H100×2 on Modal, ~10h total
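To make the schedule concrete, here is a minimal sketch of a cosine learning-rate curve with 5% warmup and a peak of 1e-5. Several details are assumptions, since the card does not specify them: linear warmup, decay to zero, and the 1000-step total. The last line computes the usual LoRA scaling factor alpha / r from the adapter settings, assuming standard rather than rank-stabilized scaling.

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float = 1e-5,
          warmup_frac: float = 0.05) -> float:
    """Linear warmup to peak_lr, then cosine decay to zero (assumed shape)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * peak_lr * (1 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; the card does not state the step count

for step in (0, 49, 50, 525, 999):
    print(step, f"{lr_at(step, TOTAL_STEPS):.2e}")

# Adapter scaling from the card's settings: alpha / r = 256 / 128.
LORA_SCALE = 256 / 128
print("LoRA scale:", LORA_SCALE)
```

With these assumptions the rate reaches its peak at the end of warmup, passes half the peak midway through the decay phase, and approaches zero at the final step.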
+## License
+Apache 2.0. Free for research and commercial use.
+## Citation
+```bibtex
+@misc{prism-coder-14b-2026,
+  title         = {Prism-Coder 14B: Function Calling + AAC Sibling Fine-Tune of Qwen2.5-Coder-14B},
+  author        = {Synalux AI / Dmitri Costenco},
+  year          = {2026},
+  month         = {May},
+  url           = {https://huggingface.co/dcostenco/prism-coder-14b},
+  note          = {Sibling 7B model: https://huggingface.co/dcostenco/prism-coder-7b. PR: https://github.com/ShishirPatil/gorilla/pull/1332.}
+}
+```
+## Related
+- 7B sibling: [`dcostenco/prism-coder-7b`](https://huggingface.co/dcostenco/prism-coder-7b)
+- Berkeley BFCL V4 PR: [`ShishirPatil/gorilla#1332`](https://github.com/ShishirPatil/gorilla/pull/1332)
+- Synalux portal: [synalux.ai](https://synalux.ai)
+- PrismAAC consumer app: [github.com/dcostenco/prism-aac](https://github.com/dcostenco/prism-aac)