KEURAL-ALPHA MODEL BENCHMARK REPORT

Model: mkd-ai/keural-alpha

Parameters: ~1B (GPT-NeoX architecture)

Date: 2026-03-03

Device: CUDA (GPU)

Benchmark: lm-eval (EleutherAI LM Evaluation Harness)
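A run like this one could be repeated through the harness's Python entry point. The sketch below assumes lm-eval v0.4 or later, where `lm_eval.simple_evaluate` and the `hf` model type are available. The few-shot count, batch size, and device index are assumptions, since this report does not state them, and the helper names are illustrative.

```python
# Sketch only: values marked "assumed" are not stated in this report.
TASKS = ["arc_challenge", "arc_easy", "hellaswag", "winogrande"]


def build_eval_kwargs(model_id="mkd-ai/keural-alpha"):
    """Collect keyword arguments for lm_eval.simple_evaluate."""
    return {
        "model": "hf",                          # Hugging Face transformers backend
        "model_args": f"pretrained={model_id}",
        "tasks": TASKS,
        "num_fewshot": 0,                       # assumed: zero-shot
        "batch_size": 8,                        # assumed
        "device": "cuda:0",                     # assumed GPU index
    }


def run_benchmark():
    """Run the four tasks; needs lm-eval installed and a CUDA GPU."""
    import lm_eval  # imported lazily so this file loads without lm-eval

    results = lm_eval.simple_evaluate(**build_eval_kwargs())
    return results["results"]  # dict of per-task metric dicts
```

Calling `run_benchmark()` downloads the model and datasets, so it is left uncalled here.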

TASK RESULTS

Task	Accuracy	Norm.Acc	Stderr	Samples	Assessment
arc_challenge	24.74%	27.47%	±1.26%	1,172	Poor
arc_easy	56.48%	49.66%	±1.02%	2,376	Good
hellaswag	38.37%	48.89%	±0.49%	10,042	Decent
winogrande	53.28%	-	±1.40%	1,267	Good
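The Stderr column is consistent with a binomial standard error on the raw accuracy, sqrt(p(1-p)/n). A minimal sketch, assuming that formula (the harness's own estimator may differ slightly, for example by dividing by n-1); the function name is illustrative:

```python
import math


def binomial_stderr_pct(accuracy_pct, n_samples):
    """Standard error of an accuracy, in percentage points.

    Assumes each sample is an independent correct/incorrect trial.
    """
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_samples)


# Raw accuracy (%) and sample count per task, copied from the table above.
TABLE = {
    "arc_challenge": (24.74, 1172),
    "arc_easy": (56.48, 2376),
    "hellaswag": (38.37, 10042),
    "winogrande": (53.28, 1267),
}

for task, (acc, n) in TABLE.items():
    print(f"{task}: ±{binomial_stderr_pct(acc, n):.2f}%")
```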

SUMMARY STATISTICS

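Overall Average Accuracy: 43.22%
Average Random Baseline: 31.25% (25% for ARC-Easy, ARC-Challenge, and HellaSwag; 50% for Winogrande)
Average vs Random Baseline: +11.97 percentage points
Best Performance: ARC-Easy (56.48%)
Worst Performance: ARC-Challenge (24.74%)
Standard Deviation Across Tasks: 12.67 percentage points (population)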
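The summary figures can be recomputed from the four raw accuracies. The sketch below assumes chance levels of 25% for the four-option tasks and 50% for winogrande, and reports both the population and the sample standard deviation, since the report does not say which one it uses.

```python
import statistics

# Raw accuracy (%) per task, from TASK RESULTS.
accuracy = {
    "arc_challenge": 24.74,
    "arc_easy": 56.48,
    "hellaswag": 38.37,
    "winogrande": 53.28,
}
# Assumed chance accuracy (%) per task.
chance = {
    "arc_challenge": 25.0,
    "arc_easy": 25.0,
    "hellaswag": 25.0,
    "winogrande": 50.0,
}

acc_values = list(accuracy.values())
mean_acc = statistics.mean(acc_values)
mean_chance = statistics.mean(list(chance.values()))
margin = mean_acc - mean_chance                  # percentage points above chance
std_population = statistics.pstdev(acc_values)   # divides by n
std_sample = statistics.stdev(acc_values)        # divides by n - 1

print(f"mean accuracy:     {mean_acc:.2f}%")
print(f"mean chance level: {mean_chance:.2f}%")
print(f"margin over chance: {margin:+.2f} points")
print(f"std (population):  {std_population:.2f}")
print(f"std (sample):      {std_sample:.2f}")
```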

PERFORMANCE ANALYSIS

STRENGTHS:
Well above chance on easy reasoning tasks (ARC-Easy: 56.48% vs. 25% random)
Modest edge on commonsense reasoning (Winogrande: 53.28% vs. 50% random)
Decent sentence completion (HellaSwag: 48.89% normalized, 38.37% raw, vs. 25% random)

WEAKNESSES:
At chance level on difficult science questions (ARC-Challenge: 24.74% ± 1.26% vs. 25% random)
Wide spread across tasks (12.67 percentage points population standard deviation)
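Whether a score is meaningfully above chance depends on its standard error as well as its value. As a rough check, the sketch below expresses each raw accuracy as a number of standard errors above its chance level. It assumes an approximately normal sampling distribution, chance levels of 25% and 50%, and a cutoff of 2 standard errors; all three are illustrative choices, not part of the report.

```python
# (accuracy %, stderr %, assumed chance %) per task, from TASK RESULTS.
SCORES = {
    "arc_challenge": (24.74, 1.26, 25.0),
    "arc_easy": (56.48, 1.02, 25.0),
    "hellaswag": (38.37, 0.49, 25.0),
    "winogrande": (53.28, 1.40, 50.0),
}


def z_vs_chance(accuracy_pct, stderr_pct, chance_pct):
    """Number of standard errors by which accuracy exceeds chance."""
    return (accuracy_pct - chance_pct) / stderr_pct


z_scores = {task: z_vs_chance(*row) for task, row in SCORES.items()}

for task, z in z_scores.items():
    # Cutoff of 2.0 is an assumed rule of thumb.
    verdict = "above chance" if z > 2.0 else "not distinguishable from chance"
    print(f"{task}: z = {z:+.2f} ({verdict})")
```

On these numbers only arc_challenge falls under the cutoff; winogrande clears it narrowly.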

COMPARISON TO BASELINES

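Model	Size	ARC-E	ARC-C	Hella	Wino	Avg
keural-alpha	1.0B	56.5%	24.7%	48.9%	53.3%	45.9%
Random Guess	-	25.0%	25.0%	25.0%	50.0%	31.3%
GPT-2	0.1B	48.0%	22.0%	40.0%	50.0%	40.0%
GPT-Neo 1.3B	1.3B	52.0%	26.0%	48.0%	52.0%	44.5%

Avg is the mean of the four task columns. The keural-alpha Hella value is normalized accuracy; with raw HellaSwag accuracy (38.4%) its average is 43.2%, as in SUMMARY STATISTICS. The baseline rows do not state which HellaSwag metric they report.

Rank: 1st out of 4 by the Avg column; 2nd out of 4 (above GPT-2, below GPT-Neo 1.3B) if raw HellaSwag accuracy is used for keural-alpha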
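The ranking can be checked by averaging the four task columns of each row and sorting. A small sketch using the values as printed in the comparison table; the function name is illustrative, and the baseline figures are taken at face value rather than re-measured.

```python
# Task columns (%) in table order: ARC-E, ARC-C, Hella, Wino.
ROWS = {
    "keural-alpha": [56.5, 24.7, 48.9, 53.3],
    "Random Guess": [25.0, 25.0, 25.0, 50.0],
    "GPT-2": [48.0, 22.0, 40.0, 50.0],
    "GPT-Neo 1.3B": [52.0, 26.0, 48.0, 52.0],
}


def rank_by_mean(rows):
    """Return (name, mean) pairs sorted from highest to lowest mean."""
    means = {name: sum(vals) / len(vals) for name, vals in rows.items()}
    return sorted(means.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_mean(ROWS)
for position, (name, mean) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {mean:.2f}%")
```

Swapping the keural-alpha Hella entry for the raw accuracy (38.4) shows how sensitive the ordering is to the choice of metric.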

FINAL VERDICT

OVERALL GRADE: B (Good for model size)

Response Quality: 3/5
Reasoning Ability: 3/5
Knowledge: 2/5
Speed/Efficiency: 4/5

RECOMMENDATION:
Suitable for: Simple Q&A, basic reasoning, educational contexts
Not suitable for: Complex science, difficult reasoning, expert domains


Format: Safetensors

Model size: 1B params

Tensor type: BF16
