Add GSM8K eval result (self-reported, symbolic verifier)

#1
by codelion - opened
Files changed (1)
  1. .eval_results/gsm8k.yaml +14 -0
.eval_results/gsm8k.yaml ADDED
@@ -0,0 +1,14 @@
+- dataset:
+    id: openai/gsm8k
+    task_id: gsm8k
+  value: 11.8
+  source:
+    url: https://huggingface.co/codelion/sprog-9m
+    name: Symbolic verifier, 96-sample self-consistency (single committed answer)
+    user: codelion
+  notes: >-
+    Single-answer exact-match accuracy on the full GSM8K test set (1319 problems),
+    mean of 3 training seeds (range 11.1-12.6%). Inference: 96 temperature samples
+    per question, a 0-parameter symbolic verifier selects one committed answer via
+    self-consistency. Custom symbolic-program harness, not inspect-ai
+    model_graded_fact. 9.37M-param encoder-decoder, trained from scratch.
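
For readers unfamiliar with the scoring described in the notes, here is a minimal sketch of self-consistency selection followed by exact-match accuracy. It is not the harness used for this result: it assumes that "self-consistency" means a plain majority vote over the sampled answers that pass a validity check, and the names `select_answer`, `is_valid`, and `exact_match_accuracy` are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def select_answer(
    samples: Sequence[Optional[str]],
    is_valid: Callable[[str], bool] = lambda answer: True,
) -> Optional[str]:
    """Commit to one answer: the most frequent sample that passes is_valid.

    Assumption: majority vote stands in for the PR's symbolic verifier,
    whose actual selection rule is not shown in the diff.
    """
    kept = [a for a in samples if a is not None and is_valid(a)]
    if not kept:
        return None  # no usable sample, so the problem counts as wrong
    # Ties are broken in favor of the answer that was seen first.
    return Counter(kept).most_common(1)[0][0]


def exact_match_accuracy(
    predictions: Sequence[Optional[str]],
    references: Sequence[str],
) -> float:
    """Percentage of problems whose committed answer equals the reference."""
    if not references or len(predictions) != len(references):
        raise ValueError("predictions and references must be non-empty and aligned")
    correct = sum(p is not None and p == r for p, r in zip(predictions, references))
    return 100.0 * correct / len(references)
```

With this shape, each test question yields exactly one committed answer from its 96 samples, and the reported figure would be the mean of `exact_match_accuracy` over the three training seeds.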