NextTerm-440M / eval_results.txt
OEIS-Eval-Neo:
NextTerm-440M - 34.43%
NextTerm-47M - 29.49%
Qwen3-0.6B - 18.44%
Qwen3-1.7B - 20.77%
Qwen3-4B - 23.74%
Qwen3-8B - 24.62%
Qwen3-14B - 26.00%
Ryskina & Knight (2021):
NextTerm-440M - 52.63%
NextTerm-47M - 70.18%
16-Shot Bitstring:
NextTerm-440M - 32.00%
NextTerm-47M - 27.88%
M1 Competition 111 MAPE (macro; canonical greedy; lower is better):
Naive2 - 17.7987
NextTerm-440M - 17.6239
NextTerm-47M - 18.7621
Qwen3-0.6B - 22.7984
Qwen3-1.7B - 22.2411
Qwen3-4B - 19.1731
Qwen3-8B - 18.4027
Qwen3-14B - 17.9837
M1 Competition 111 by frequency (macro MAPE):
Naive2 - monthly 17.3871 / quarterly 19.5958 / yearly 17.1314
NextTerm-440M - monthly 18.3407 / quarterly 19.0475 / yearly 13.5498
NextTerm-47M - monthly 21.2719 / quarterly 16.0270 / yearly 13.3741
Qwen3-0.6B - monthly 24.7585 / quarterly 21.7989 / yearly 17.2835
Qwen3-1.7B - monthly 24.3821 / quarterly 22.5869 / yearly 14.5642
Qwen3-4B - monthly 20.6455 / quarterly 19.6394 / yearly 13.6308
Qwen3-8B - monthly 20.0289 / quarterly 17.7249 / yearly 13.6534
Qwen3-14B - monthly 19.4006 / quarterly 18.1729 / yearly 12.9486
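For readers who want to reproduce the M1 numbers above, here is a minimal sketch of a macro-averaged MAPE. It assumes "macro" means that MAPE is computed per series and then averaged with equal weight per series, and that values are reported in percent; the file itself does not define either, and the function names `mape` and `macro_mape` are hypothetical.

```python
def mape(actual, forecast):
    """Mean absolute percentage error of one series, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    # Each term is |actual - forecast| / |actual|; actual values must be non-zero.
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


def macro_mape(series_pairs):
    """Unweighted mean of per-series MAPE (assumed meaning of "macro")."""
    scores = [mape(actual, forecast) for actual, forecast in series_pairs]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    pairs = [
        ([100.0, 200.0], [110.0, 180.0]),  # per-series MAPE: 10%
        ([50.0], [40.0]),                  # per-series MAPE: 20%
    ]
    print(round(macro_mape(pairs), 4))
```

Under this reading, a short series counts as much as a long one, which differs from pooling all forecast points before averaging.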
Polynomial continuation evals (accuracy; 200 samples per k):
Arithmetic (k=2-25): NextTerm-440M - 94.38%; NextTerm-47M - 94.15%
Quadratic (k=3-25): NextTerm-440M - 86.39%; NextTerm-47M - 81.07%
Cubic (k=4-25): NextTerm-440M - 75.20%; NextTerm-47M - 37.43%
Quartic (k=5-25): NextTerm-440M - 67.83%; NextTerm-47M - 15.17%
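The sketch below shows one way a polynomial continuation eval of this shape could be built. It is an illustration, not the evaluation code behind the numbers above. It assumes k is the number of terms shown and that the prediction is scored by exact match on term k+1; the coefficient ranges and the names `poly_sequence`, `next_by_differences`, and `accuracy` are hypothetical. The finite-difference predictor stands in for a model call so the example is self-contained.

```python
import random


def poly_sequence(coeffs, n_terms):
    """Values of sum(c_i * x**i) for x = 0 .. n_terms - 1."""
    return [sum(c * x**i for i, c in enumerate(coeffs)) for x in range(n_terms)]


def next_by_differences(terms):
    """Extrapolate one term from a finite-difference table.

    Exact whenever the terms come from a polynomial of degree < len(terms).
    """
    rows = [list(terms)]
    while len(rows[-1]) > 1:
        prev = rows[-1]
        rows.append([b - a for a, b in zip(prev, prev[1:])])
    # The next term is the sum of the last entry of every difference row.
    return sum(row[-1] for row in rows)


def accuracy(predict, degree, k, samples=200, seed=0):
    """Exact-match accuracy of `predict` on random degree-`degree` sequences."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Hypothetical coefficient ranges; the leading coefficient is non-zero.
        coeffs = [rng.randint(-9, 9) for _ in range(degree)]
        coeffs.append(rng.choice([-3, -2, -1, 1, 2, 3]))
        seq = poly_sequence(coeffs, k + 1)
        hits += predict(seq[:k]) == seq[k]
    return hits / samples


if __name__ == "__main__":
    for name, degree in [("Arithmetic", 1), ("Quadratic", 2), ("Cubic", 3), ("Quartic", 4)]:
        print(name, accuracy(next_by_differences, degree, k=degree + 1))
```

The smallest k in each row of the results (2, 3, 4, 5) matches degree + 1, the minimum number of terms that determines a polynomial of that degree, which is why the sketch uses `k=degree + 1` as its starting point.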