KEURAL-ALPHA MODEL BENCHMARK REPORT

Model: mkd-ai/keural-alpha

Parameters: ~1B (GPT-NeoX architecture)

Date: 2026-03-03

Device: CUDA (GPU)

Benchmark: lm-eval (EleutherAI LM Evaluation Harness)
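
For readers who want to rerun the evaluation, the sketch below assembles an lm-eval command line for the four tasks in this report. It only builds and prints the command; nothing is executed. The helper name build_lm_eval_command, the batch size, and the device string are assumptions made for illustration, and the report does not state the few-shot settings or harness version that were actually used.

```python
import shlex

def build_lm_eval_command(model_id, tasks, device="cuda:0", batch_size=8):
    """Return the argv list for an lm-eval run (hypothetical helper; runs nothing)."""
    return [
        "lm_eval",
        "--model", "hf",                            # Hugging Face transformers backend
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),
        "--device", device,                         # assumed device string
        "--batch_size", str(batch_size),            # assumed batch size
    ]

# Task names as they appear in the results table of this report.
TASKS = ["arc_challenge", "arc_easy", "hellaswag", "winogrande"]

cmd = build_lm_eval_command("mkd-ai/keural-alpha", TASKS)
print(shlex.join(cmd))
```

The printed line can be pasted into a shell where the harness is installed; results may differ from the table if the settings differ.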

TASK RESULTS

Task            Accuracy   Norm. Acc.   Stderr    Samples   Assessment
arc_challenge   24.74%     27.47%       ±1.26%    1,172     Poor
arc_easy        56.48%     49.66%       ±1.02%    2,376     Good
hellaswag       38.37%     48.89%       ±0.49%    10,042    Decent
winogrande      53.28%     -            ±1.40%    1,267     Good
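
The Stderr column is consistent with a binomial standard error on the raw accuracy, sqrt(p * (1 - p) / n). The sketch below recomputes it from the Accuracy and Samples columns. This is an assumption about how the figures arise; the harness may use a slightly different estimator (for example an n - 1 denominator), which makes no visible difference at these sample sizes.

```python
import math

# (raw accuracy as a fraction, number of samples), copied from the results table.
RESULTS = {
    "arc_challenge": (0.2474, 1172),
    "arc_easy": (0.5648, 2376),
    "hellaswag": (0.3837, 10042),
    "winogrande": (0.5328, 1267),
}

def binomial_stderr(p, n):
    """Standard error of a proportion p estimated from n independent samples."""
    return math.sqrt(p * (1.0 - p) / n)

# Standard error per task, expressed in percentage points.
stderr_pct = {task: 100.0 * binomial_stderr(p, n) for task, (p, n) in RESULTS.items()}

for task, se in stderr_pct.items():
    print(f"{task:14s} ±{se:.2f}%")
```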

SUMMARY STATISTICS

Overall Average Accuracy: 43.22% (mean of raw accuracy over the four tasks)
Average vs Random Baseline: +18.22 percentage points (against a 25% baseline)
Best Performance: ARC-Easy (56.48%)
Worst Performance: ARC-Challenge (24.74%)
Standard Deviation: ±12.87%
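
The summary figures can be recomputed from the four raw accuracies in the results table. The sketch below uses Python's statistics module. The report does not say whether its standard deviation is the population or the sample form, so both are printed; neither is claimed to be the one the report used.

```python
import statistics

# Raw accuracy (percent) per task, copied from the results table.
ACCURACY = {
    "arc_challenge": 24.74,
    "arc_easy": 56.48,
    "hellaswag": 38.37,
    "winogrande": 53.28,
}

values = list(ACCURACY.values())

mean_acc = statistics.mean(values)
pop_std = statistics.pstdev(values)     # population form, divides by n
sample_std = statistics.stdev(values)   # sample form, divides by n - 1
best_task = max(ACCURACY, key=ACCURACY.get)
worst_task = min(ACCURACY, key=ACCURACY.get)

print(f"mean accuracy:      {mean_acc:.2f}%")
print(f"margin over 25%:    {mean_acc - 25.0:+.2f} points")
print(f"population std dev: {pop_std:.2f}")
print(f"sample std dev:     {sample_std:.2f}")
print(f"best / worst:       {best_task} / {worst_task}")
```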

PERFORMANCE ANALYSIS

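STRENGTHS:
- Strong on easy reasoning tasks (ARC-Easy: 56.48%, against a 25% random baseline)
- Commonsense reasoning above chance (Winogrande: 53.28%, against a 50% random baseline)
- Decent sentence completion (HellaSwag: 48.89% normalized accuracy)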

WEAKNESSES:
- Struggles with difficult science questions (ARC-Challenge: 24.74%)
- Raw accuracy on ARC-Challenge is marginally below the 25% random baseline, though within one standard error of it (±1.26%)
- High variance across tasks (12.87% std)
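
To judge how far each score sits from random guessing, the gap to chance can be divided by the reported standard error. The sketch below does this with the figures from the results table and the chance levels from the Random Guess row of the baseline table. The 1.96 cutoff (a conventional two-sided 95% threshold) and the helper name z_vs_chance are choices made for this illustration, not something the report specifies.

```python
# (raw accuracy %, reported stderr %, chance level %), copied from the report's tables.
REPORTED = {
    "arc_challenge": (24.74, 1.26, 25.0),
    "arc_easy": (56.48, 1.02, 25.0),
    "hellaswag": (38.37, 0.49, 25.0),
    "winogrande": (53.28, 1.40, 50.0),
}

def z_vs_chance(acc, stderr, chance):
    """Number of standard errors between the score and the chance level."""
    return (acc - chance) / stderr

z_scores = {task: z_vs_chance(*row) for task, row in REPORTED.items()}

for task, z in z_scores.items():
    # 1.96 is an assumed threshold for this sketch.
    verdict = "differs from chance" if abs(z) > 1.96 else "indistinguishable from chance"
    print(f"{task:14s} z = {z:+6.2f}  ({verdict})")
```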

COMPARISON TO BASELINES

Model          Size   ARC-E   ARC-C   Hella   Wino    Avg
keural-alpha   1.0B   56.5%   24.7%   48.9%   53.3%   43.2%
Random Guess   -      25.0%   25.0%   25.0%   50.0%   25.0%
GPT-2          0.1B   48.0%   22.0%   40.0%   50.0%   38.0%
GPT-Neo 1.3B   1.3B   52.0%   26.0%   48.0%   52.0%   44.5%
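
As a consistency check on the table, the sketch below recomputes each row's mean from its four task columns. The values are copied from the table as printed. The keural-alpha HellaSwag entry is the normalized accuracy while the summary average uses raw accuracy, and the report does not say how the Avg column was formed, so the recomputed means need not match it.

```python
import statistics

# Task columns in table order: ARC-E, ARC-C, Hella, Wino (percent).
ROWS = {
    "keural-alpha": [56.5, 24.7, 48.9, 53.3],
    "Random Guess": [25.0, 25.0, 25.0, 50.0],
    "GPT-2": [48.0, 22.0, 40.0, 50.0],
    "GPT-Neo 1.3B": [52.0, 26.0, 48.0, 52.0],
}

# Avg column exactly as printed in the report.
PRINTED_AVG = {
    "keural-alpha": 43.2,
    "Random Guess": 25.0,
    "GPT-2": 38.0,
    "GPT-Neo 1.3B": 44.5,
}

row_means = {name: statistics.mean(cols) for name, cols in ROWS.items()}

for name, mean in row_means.items():
    print(f"{name:13s} recomputed {mean:5.2f}%   printed {PRINTED_AVG[name]:4.1f}%")
```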

Rank by Avg column: 2nd of 4 rows (above GPT-2 and the random baseline, below GPT-Neo 1.3B)

FINAL VERDICT

OVERALL GRADE: B (Good for model size)

Response Quality: 3/5
Reasoning Ability: 3/5
Knowledge: 2/5
Speed/Efficiency: 4/5

RECOMMENDATION:
- Suitable for: Simple Q&A, basic reasoning, educational contexts
- Not suitable for: Complex science, difficult reasoning, expert domains

================================================================================
