johnsonchromia committed on
Commit 204fa8a · verified · 1 Parent(s): 5235bdc

Link full release benchmark report from card

Files changed (1)
  1. README.md +2 -1
README.md CHANGED
@@ -47,7 +47,8 @@ are at [`evalengine/unbound-e2b-GGUF`](https://huggingface.co/evalengine/unbound
  Capability holds within ≤1.5 pp of base on every axis; refusal collapses
  from 98% → 4%. GPQA-Diamond + BBH are the lm-eval-harness "release" suite at
  `--limit 200` — base and finetune through the same harness, so the **delta**
- is apples-to-apples.
+ is apples-to-apples. Full report — methodology, per-subtask BBH breakdown,
+ what we did not measure — in [`release-benchmark-report.md`](release-benchmark-report.md).
 
  ## Sampling
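The README text in this hunk gives a per-axis acceptance criterion (capability within ≤1.5 pp of base) and a refusal drop from 98% to 4%. The following is a minimal Python sketch of that percentage-point arithmetic. It assumes scores are accuracies expressed as fractions in [0, 1]. The function names and the placeholder scores are hypothetical and are not taken from the benchmark report. With `--limit 200`, one example is worth 100/200 = 0.5 pp, assuming a task is scored on exactly 200 items, so a 1.5 pp band corresponds to about three examples on such a task.

```python
"""Hypothetical check of the README's "within 1.5 pp of base" criterion.

All function names and scores here are illustrative placeholders,
not results from the release benchmark report.
"""

TOLERANCE_PP = 1.5  # from the README text: "within ≤1.5 pp of base"


def delta_pp(base: float, finetune: float) -> float:
    """Finetune minus base, in percentage points (inputs are fractions in [0, 1])."""
    return (finetune - base) * 100.0


def resolution_pp(n_items: int) -> float:
    """Percentage points contributed by a single item when n_items are scored."""
    return 100.0 / n_items


def within_tolerance(base_scores: dict[str, float],
                     finetune_scores: dict[str, float],
                     tolerance_pp: float = TOLERANCE_PP) -> dict[str, bool]:
    """Per-axis pass/fail (|delta| <= tolerance) for axes present in both runs."""
    shared = base_scores.keys() & finetune_scores.keys()
    return {
        axis: abs(delta_pp(base_scores[axis], finetune_scores[axis])) <= tolerance_pp
        for axis in sorted(shared)
    }


if __name__ == "__main__":
    # Refusal rate from the README: 98% -> 4%, a drop of roughly 94 pp.
    print(round(delta_pp(0.98, 0.04), 6))
    # Placeholder axes and scores, purely for illustration.
    print(within_tolerance({"axis_a": 0.50, "axis_b": 0.50},
                           {"axis_a": 0.49, "axis_b": 0.48}))
    # -> {'axis_a': True, 'axis_b': False}
```

Both runs must be scored on the same items for the delta to be meaningful, which is the point the README makes about sending base and finetune through the same harness.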