dcostenco committed · Commit 4d685dc · verified · 1 Parent(s): 65d27c1

Upload README.md with huggingface_hub

  1. README.md +15 -0
README.md CHANGED
@@ -39,6 +39,21 @@ All 12 categories at 100%. No remaining failures.
  Eval: MLX inference + thinking, temperature=0, 3-seed mean.
  Gate: ≥90% = deploy.

+ ## Cascade Benchmark (May 2026)
+
+ Full desktop cascade: **14b → 32b → Claude Opus** (102 cases × 3 seeds)
+
+ | Metric | Result |
+ |--------|--------|
+ | Cascade accuracy | **100.0%** (mean, 3 seeds) |
+ | Opus-solo baseline | 98.3% |
+ | Δ vs Opus | **+1.7 pp** |
+ | Traffic served by 14b | **99%** (101/102 cases avg) |
+ | Traffic escalated to 32b | 1% (1/102 avg) |
+ | Traffic reaching Opus API | **0%** |
+
+ The fine-tuned cascade outperforms Claude Opus on `edge` (+16.7%) and `know` (+14.3%).
+
  ## Version History

  | Version | BFCL | Notes |