dcostenco committed on
Commit 9ae524d · verified · 1 parent: 0fab056

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +15 -0
README.md CHANGED
@@ -37,6 +37,21 @@ Primary deployment: **iOS and edge devices** via llama.cpp GGUF.
  Eval: MLX inference + thinking, temperature=0, 3-seed mean.
  Gate: ≥90% = deploy.
 
+ ## Cascade Benchmark (May 2026)
+
+ Full desktop cascade: **14b → 32b → Claude Opus** (102 cases × 3 seeds)
+
+ | Metric | Result |
+ |--------|--------|
+ | Cascade accuracy | **100.0%** (mean, 3 seeds) |
+ | Opus-solo baseline | 98.3% |
+ | Δ vs Opus | **+1.7 pp** |
+ | Traffic served by 14b | **99%** (101/102 cases avg) |
+ | Traffic escalated to 32b | 1% (1/102 avg) |
+ | Traffic reaching Opus API | **0%** |
+
+ The fine-tuned cascade outperforms Claude Opus on `edge` (+16.7%) and `know` (+14.3%).
+
  ## Version History
 
  | Version | BFCL | Notes |
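
The added section reports how traffic splits across the **14b → 32b → Claude Opus** cascade but does not say how escalation is decided. The following is a minimal sketch of one way such a cascade could be wired, not the author's implementation. It assumes each tier either returns an answer or returns `None` to hand the prompt to the next tier. `Tier`, `run_cascade`, `traffic_share`, and the stub lambdas are hypothetical names introduced only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence, Tuple


@dataclass
class Tier:
    name: str
    # Hypothetical model call: returns an answer, or None to signal "escalate".
    answer: Callable[[str], Optional[str]]


def run_cascade(tiers: Sequence[Tier], prompt: str) -> Tuple[str, Optional[str]]:
    """Try tiers in order and return (tier_name, answer) from the first that answers."""
    for tier in tiers:
        result = tier.answer(prompt)
        if result is not None:
            return tier.name, result
    # No tier produced an answer.
    return "unserved", None


def traffic_share(served_by: Sequence[str]) -> Dict[str, float]:
    """Percentage of prompts served by each tier, rounded to one decimal place."""
    counts = Counter(served_by)
    total = len(served_by)
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Stub tiers standing in for the real models (assumption: the smallest tier
# declines exactly one of 102 prompts, mirroring the reported 101/102 split).
tiers = [
    Tier("14b", lambda p: None if p == "hard" else "ok"),
    Tier("32b", lambda p: "ok"),
    Tier("opus", lambda p: "ok"),
]
prompts = ["easy"] * 101 + ["hard"]
served = [run_cascade(tiers, p)[0] for p in prompts]
print(traffic_share(served))  # {'14b': 99.0, '32b': 1.0}
```

With these stubs the last tier is never called, which corresponds to the 0% Opus API traffic row in the table. The headline delta is the plain difference of the two accuracy figures, 100.0 − 98.3 = 1.7.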