Tags: Question Answering, MLX, sprog, math, gsm8k, symbolic, math-word-problems, from-scratch, Eval Results (legacy), Eval Results
Instructions to use codelion/sprog-9m with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use codelion/sprog-9m with MLX:

    # Download the model from the Hub
    pip install "huggingface_hub[hf_xet]"
    huggingface-cli download --local-dir sprog-9m codelion/sprog-9m

- Notebooks
  - Google Colab
  - Kaggle
- Local Apps
  - LM Studio
Add GSM8K eval result (self-reported, symbolic verifier) #1, opened by codelion
- .eval_results/gsm8k.yaml +14 -0
.eval_results/gsm8k.yaml (new file, 14 lines added):

    - dataset:
        id: openai/gsm8k
        task_id: gsm8k
      value: 11.8
      source:
        url: https://huggingface.co/codelion/sprog-9m
        name: Symbolic verifier, 96-sample self-consistency (single committed answer)
        user: codelion
      notes: >-
        Single-answer exact-match accuracy on the full GSM8K test set (1319 problems),
        mean of 3 training seeds (range 11.1-12.6%). Inference: 96 temperature samples
        per question, a 0-parameter symbolic verifier selects one committed answer via
        self-consistency. Custom symbolic-program harness, not inspect-ai
        model_graded_fact. 9.37M-param encoder-decoder, trained from scratch.
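The notes describe the answer-selection rule only at a high level. Below is a minimal sketch of one plausible reading of "a verifier selects one committed answer via self-consistency":

1. Drop sampled candidates that the verifier rejects.
2. Commit the most frequent surviving answer.
3. Score the committed answers by exact match.

This is not the sprog harness. The names `commit_answer`, `exact_match_accuracy`, `verify`, and `looks_numeric` are hypothetical, and this page does not describe what checks the real symbolic verifier performs.

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def commit_answer(
    samples: Sequence[Optional[str]],
    verify: Callable[[str], bool],
) -> Optional[str]:
    """Pick a single committed answer from sampled candidates.

    Hypothetical stand-in for the card's symbolic verifier: candidates that are
    None (e.g. a sampled program that failed to run) or that `verify` rejects
    are dropped, then the most frequent surviving answer wins. Ties go to the
    answer seen first, since Counter.most_common orders equal counts by first
    occurrence.
    """
    kept = [s for s in samples if s is not None and verify(s)]
    if not kept:
        return None  # abstain; counted as incorrect under exact match
    answer, _count = Counter(kept).most_common(1)[0]
    return answer


def exact_match_accuracy(
    committed: Sequence[Optional[str]], gold: Sequence[str]
) -> float:
    """Percentage of problems whose committed answer equals the gold answer.

    Assumption: answers are compared as plain strings with no numeric
    normalization (e.g. "18.0" and "18" are treated as different).
    """
    if len(committed) != len(gold):
        raise ValueError("committed and gold must have the same length")
    correct = sum(c is not None and c == g for c, g in zip(committed, gold))
    return 100.0 * correct / len(gold)


def looks_numeric(s: str) -> bool:
    """Toy verifier for illustration only: accept strings that parse as numbers."""
    try:
        float(s)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    print(commit_answer(["18", "18", "x", None, "20", "18"], looks_numeric))  # -> 18
    print(round(exact_match_accuracy(["18", None, "3"], ["18", "5", "4"]), 2))  # -> 33.33
```

In the reported setup, each `samples` list would hold the 96 temperature samples for one question. The 11.8 figure would then correspond to this kind of accuracy averaged over the 3 training seeds.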