metadata
language: en
license: apache-2.0
tags:
  - tool-routing
  - function-calling
  - prism-aac
  - qwen3
  - gguf
base_model: Qwen/Qwen3-8B

prism-coder:8b - Tool Routing Model (iOS Tier)

Fine-tuned Qwen3-8B for 6-tool routing in the Prism AAC system. Primary deployment: iOS/edge via llama.cpp GGUF.

Versions

| Version | File | BFCL | Notes |
|---------|------|------|-------|
| v31 | qwen3-8b-v31-q4km.gguf | 95.1% | Surgical smem/know boundary + save fixes |
| v30 | qwen3-8b-v30-q4km.gguf | 95.0% | Routing corpus v36_1b7 |

BFCL Routing Benchmark (v31)

  • 95.1%: 3-seed mean (seeds 2027/2028/2029), 100 cases each
  • Eval: MLX inference, greedy (temp=0), Qwen3 thinking suppressed
  • Gate: ≥90% = deploy
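The 3-seed mean and the deploy gate can be sketched as follows. The per-seed correct counts are placeholders for illustration only (the card reports just the mean, not per-seed scores), and `seed_accuracy` and `passes_gate` are hypothetical helper names rather than part of any released eval harness.

```python
from statistics import mean

CASES_PER_SEED = 100  # from the card: 100 cases per seed
DEPLOY_GATE = 0.90    # from the card: >=90% means deploy


def seed_accuracy(correct: int, total: int = CASES_PER_SEED) -> float:
    """Fraction of cases where the routed tool matched the expected tool."""
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct / total


def passes_gate(per_seed_correct: dict, gate: float = DEPLOY_GATE) -> tuple:
    """Return (mean accuracy across seeds, whether the mean clears the gate)."""
    accuracies = [seed_accuracy(c) for c in per_seed_correct.values()]
    avg = mean(accuracies)
    return avg, avg >= gate


# Placeholder counts, NOT the reported per-seed results.
placeholder_counts = {2027: 91, 2028: 93, 2029: 92}
avg, ok = passes_gate(placeholder_counts)
print(f"mean={avg:.3f} deploy={ok}")
```

Averaging per-seed accuracies (rather than pooling all cases) matches the "3-seed mean" wording; with equal case counts per seed the two give the same number.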

Tools

  1. session_load_context: load/fetch/resume project context
  2. session_save_ledger: note/log/remember/record
  3. session_save_handoff: handoff/relay/next-agent transition
  4. session_compact_ledger: compact/archive ledger
  5. session_search_memory: recall past sessions/conversations
  6. knowledge_search: search stored notes/knowledge base
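A caller will usually want to reject any routed name outside these six tools. Below is a minimal sketch, assuming the model emits a JSON object with `name` and `arguments` keys. The card does not specify the output format, so that shape and the `parse_tool_call` helper are hypothetical.

```python
import json

# The six tool names listed in the card.
PRISM_TOOLS = frozenset({
    "session_load_context",
    "session_save_ledger",
    "session_save_handoff",
    "session_compact_ledger",
    "session_search_memory",
    "knowledge_search",
})


def parse_tool_call(raw: str) -> tuple:
    """Parse a model response and return (tool_name, arguments).

    Assumes a JSON object like {"name": "...", "arguments": {...}}.
    Raises ValueError for malformed output or an unknown tool name.
    """
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"response is not valid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise ValueError("response must be a JSON object")
    name = payload.get("name")
    if not isinstance(name, str) or name not in PRISM_TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    arguments = payload.get("arguments", {})
    if not isinstance(arguments, dict):
        raise ValueError("arguments must be a JSON object")
    return name, arguments
```

Failing closed on unknown names keeps a misrouted request from reaching a tool that does not exist; how the caller recovers (retry, escalate to a larger tier) is left open here.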

Cascade Role

iOS fallback tier. Desktop cascade uses 14B → 32B → cloud Claude. 8B handles edge/offline scenarios where RAM < 6GB.
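The tier split might be expressed as a small selector. This is a sketch under stated assumptions: the card only says that the 8B model covers edge/offline cases with RAM < 6GB and that desktop escalates 14B → 32B → cloud Claude. The function name, the tier labels, and dropping the cloud tier when offline are illustrative choices, and the trigger for escalating between desktop tiers is not described in the card.

```python
EDGE_RAM_LIMIT_GB = 6.0  # from the card: 8B covers devices with RAM < 6GB

# Desktop escalation order from the card; the labels are illustrative.
DESKTOP_CASCADE = ("14b", "32b", "cloud-claude")


def candidate_tiers(ram_gb: float, online: bool) -> tuple:
    """Return the model tiers to try, in order, for a given device.

    Assumption: low-RAM devices use only the 8B tier, and an offline
    desktop simply skips the cloud tier.
    """
    if ram_gb < EDGE_RAM_LIMIT_GB:
        return ("8b",)
    if not online:
        return tuple(t for t in DESKTOP_CASCADE if not t.startswith("cloud"))
    return DESKTOP_CASCADE
```

For example, `candidate_tiers(4.0, online=True)` yields only the 8B tier, while a 32 GB desktop with network access gets the full desktop cascade.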

Usage (Ollama)

ollama pull dcostenco/prism-coder:8b-v30
ollama run dcostenco/prism-coder:8b-v30
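Once the model is pulled, it can also be queried programmatically through Ollama's local HTTP API. The sketch below assumes an Ollama server on its default port (11434) and uses the non-streaming `/api/generate` endpoint. The helper names are hypothetical, and the prompt format the model expects for routing is not documented in this card, so the prompt is passed through as-is.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL_TAG = "dcostenco/prism-coder:8b-v30"          # tag from the commands above


def build_request(prompt: str, model: str = MODEL_TAG) -> dict:
    """Build a non-streaming generate payload with greedy decoding."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        # temperature 0 mirrors the greedy setting used in the eval
        "options": {"temperature": 0},
    }


def route(prompt: str, url: str = OLLAMA_URL, timeout: float = 60.0) -> str:
    """Send the prompt to a running Ollama server and return the raw text reply."""
    body = json.dumps(build_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]
```

Calling `route("Remember that the staging deploy moved to Friday.")` returns the model's raw text, which the caller still has to parse and validate against the tool list.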

Training

  • Base: Qwen3-8B (MLX 4-bit)
  • Framework: MLX-LM LoRA (8 layers, batch 2, grad-checkpoint)
  • v31 data: 361 train / 41 valid (targeted smem/know boundary augmentations)
  • v31 LR: 3e-6 (surgical, 200 iters)
  • Peak memory: 7.0 GB
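A quick arithmetic check on how much data the v31 run covers: with batch size 2 and 200 iterations, the model sees 400 examples, which is roughly 1.11 passes over the 361-example train split. The sketch assumes no gradient accumulation and one batch per iteration; neither is stated in the card.

```python
TRAIN_EXAMPLES = 361  # v31 train split size from the card
BATCH_SIZE = 2        # from the card
ITERS = 200           # from the card

# Assumption: one batch per iteration, no gradient accumulation.
examples_seen = ITERS * BATCH_SIZE
epochs = examples_seen / TRAIN_EXAMPLES

print(f"{examples_seen} examples seen, about {epochs:.2f} passes over the train split")
```

Barely more than one pass at a 3e-6 learning rate is consistent with the "surgical" description: a small nudge to the smem/know boundary rather than a full retrain.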