KEURAL-ALPHA MODEL BENCHMARK REPORT
Model: mkd-ai/keural-alpha
Parameters: ~1B (GPT-NeoX architecture)
Date: 2026-03-03
Device: CUDA (GPU)
Benchmark: lm-eval (EleutherAI LM Evaluation Harness)
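A run like this one could be reproduced through the harness's Python entry point. The sketch below is a minimal example under stated assumptions: the `hf` model backend, zero-shot prompting, and default batch size are all guesses, since the report does not state them. `run_benchmark` and `TASKS` are hypothetical names introduced here for illustration.

```python
# Task names as they appear in the results table.
TASKS = ["arc_challenge", "arc_easy", "hellaswag", "winogrande"]


def run_benchmark(model_id="mkd-ai/keural-alpha", device="cuda", num_fewshot=0):
    """Evaluate model_id on TASKS and return the per-task metrics dict.

    num_fewshot=0 is an assumption; the report does not give the setting used.
    """
    # Imported lazily so this file can be loaded without lm-eval installed.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",                            # Hugging Face transformers backend
        model_args=f"pretrained={model_id}",
        tasks=TASKS,
        num_fewshot=num_fewshot,
        device=device,
    )
    # One entry per task, holding accuracy and standard-error metrics.
    return results["results"]
```

Calling `run_benchmark()` downloads the model and datasets, so it needs network access and enough GPU memory for a ~1B-parameter model.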
TASK RESULTS
| Task | Accuracy | Norm. Acc. | Stderr (Acc.) | Samples | Assessment |
|---|---|---|---|---|---|
| arc_challenge | 24.74% | 27.47% | ±1.26% | 1,172 | Poor |
| arc_easy | 56.48% | 49.66% | ±1.02% | 2,376 | Good |
| hellaswag | 38.37% | 48.89% | ±0.49% | 10,042 | Decent |
| winogrande | 53.28% | - | ±1.40% | 1,267 | Good |
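The standard errors in the table are consistent with a binomial standard error on the raw accuracy, sqrt(p(1-p)/n). The sketch below checks this from the table's own numbers. It assumes the stderr column refers to raw (not normalized) accuracy; `RESULTS` and `binomial_stderr` are illustrative names, not part of lm-eval.

```python
import math

# (raw accuracy, sample count) per task, copied from the results table.
RESULTS = {
    "arc_challenge": (0.2474, 1172),
    "arc_easy": (0.5648, 2376),
    "hellaswag": (0.3837, 10042),
    "winogrande": (0.5328, 1267),
}


def binomial_stderr(acc, n):
    """Standard error of a proportion acc estimated from n independent samples."""
    return math.sqrt(acc * (1.0 - acc) / n)


# Standard error per task, in percentage points.
STDERR_PCT = {
    task: 100.0 * binomial_stderr(acc, n) for task, (acc, n) in RESULTS.items()
}

for task, se in STDERR_PCT.items():
    print(f"{task:14s} ±{se:.2f}%")
```

Because the error shrinks with the square root of the sample count, HellaSwag (10,042 samples) has the tightest estimate and Winogrande (1,267 samples) the loosest.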
SUMMARY STATISTICS
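- Overall Average Accuracy: 43.22% (unweighted mean of the four raw accuracies)
- Average vs. Random Baseline: +18.22 percentage points over a 25% four-choice baseline (Winogrande is two-choice, so its chance level is 50%)
- Best Performance: ARC-Easy (56.48%)
- Worst Performance: ARC-Challenge (24.74%)
- Standard Deviation Across Tasks: ±12.67 percentage points (population standard deviation of the four raw accuracies)
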
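The summary figures can be recomputed from the four raw accuracies. The sketch below assumes an unweighted mean and a population standard deviation, and it assigns each task its own chance level (25% for the four-choice tasks, 50% for Winogrande). Treating every ARC question as four-choice is itself an approximation. Under those assumptions the mean reproduces 43.22%, the population standard deviation comes out at roughly 12.67 (the sample standard deviation, dividing by n-1, would be about 14.63), and the average margin over per-task chance is about 11.97 percentage points.

```python
from statistics import mean, pstdev

# Raw accuracies (percent) from the task results table.
ACC = {
    "arc_challenge": 24.74,
    "arc_easy": 56.48,
    "hellaswag": 38.37,
    "winogrande": 53.28,
}

# Assumed chance level per task (percent): four options, except two for Winogrande.
CHANCE = {
    "arc_challenge": 25.0,
    "arc_easy": 25.0,
    "hellaswag": 25.0,
    "winogrande": 50.0,
}

avg = mean(ACC.values())       # unweighted mean across tasks
spread = pstdev(ACC.values())  # population standard deviation across tasks

# Margin over chance, task by task, then averaged.
margin = {task: ACC[task] - CHANCE[task] for task in ACC}
avg_margin = mean(margin.values())

print(f"mean accuracy:          {avg:.2f}%")
print(f"std across tasks:       {spread:.2f}")
print(f"mean margin over chance: {avg_margin:+.2f} points")
```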
PERFORMANCE ANALYSIS
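STRENGTHS:
- Strong on easy reasoning tasks (ARC-Easy: 56%)
- Good commonsense reasoning (Winogrande: 53%, against a 50% chance level)
- Decent sentence completion (HellaSwag: 49% normalized accuracy)

WEAKNESSES:
- Struggles with difficult science questions (ARC-Challenge: 25%)
- At chance level on challenging reasoning: ARC-Challenge accuracy (24.74% ±1.26%) is statistically indistinguishable from the 25% random baseline
- High variance across tasks (standard deviation of about 12.67 percentage points)
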
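Whether a score is meaningfully above or below random can be judged by dividing its margin over chance by its standard error. The sketch below does this with the table's numbers. The chance levels (25% for four-choice tasks, 50% for Winogrande) and the helper name `z_vs_chance` are assumptions introduced for illustration.

```python
# (accuracy %, stderr %, assumed chance level %) per task, from the results table.
SCORES = {
    "arc_challenge": (24.74, 1.26, 25.0),
    "arc_easy": (56.48, 1.02, 25.0),
    "hellaswag": (38.37, 0.49, 25.0),
    "winogrande": (53.28, 1.40, 50.0),
}


def z_vs_chance(acc, stderr, chance):
    """Number of standard errors by which acc differs from the chance level."""
    return (acc - chance) / stderr


Z = {task: z_vs_chance(*vals) for task, vals in SCORES.items()}

for task, z in Z.items():
    # |z| below about 2 means the score cannot be told apart from guessing.
    verdict = "indistinguishable from chance" if abs(z) < 2 else "differs from chance"
    print(f"{task:14s} z = {z:+6.2f}  ({verdict})")
```

On these numbers ARC-Challenge sits well within one standard error of chance, Winogrande is a little over two standard errors above it, and ARC-Easy and HellaSwag are far above it.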
COMPARISON TO BASELINES
| Model | Size | ARC-E | ARC-C | Hella | Wino | Avg |
|---|---|---|---|---|---|---|
| keural-alpha | 1.0B | 56.5% | 24.7% | 48.9% | 53.3% | 43.2% |
| Random Guess | - | 25.0% | 25.0% | 25.0% | 50.0% | 25.0% |
| GPT-2 | 0.1B | 48.0% | 22.0% | 40.0% | 50.0% | 38.0% |
| GPT-Neo 1.3B | 1.3B | 52.0% | 26.0% | 48.0% | 52.0% | 44.5% |
Rank: 2nd out of 4 by Avg (above GPT-2 and Random Guess, below GPT-Neo 1.3B)
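The ordering can be checked directly from the comparison table. The sketch below sorts the rows by the listed Avg column, which places keural-alpha second of the four rows, and also recomputes each row's mean from its four task columns. It assumes Avg is meant to be an unweighted row mean. On that assumption the recomputed means are 45.85 for keural-alpha (its Hella cell is the normalized accuracy, while the listed 43.2% matches the mean of the raw accuracies), 31.25 for Random Guess, and 40.0 for GPT-2, so the listed Avg is not a plain row mean in those rows.

```python
# model -> ([ARC-E, ARC-C, Hella, Wino], listed Avg), copied from the table.
ROWS = {
    "keural-alpha": ([56.5, 24.7, 48.9, 53.3], 43.2),
    "Random Guess": ([25.0, 25.0, 25.0, 50.0], 25.0),
    "GPT-2": ([48.0, 22.0, 40.0, 50.0], 38.0),
    "GPT-Neo 1.3B": ([52.0, 26.0, 48.0, 52.0], 44.5),
}

# Ranking by the Avg column as listed, best first.
ranking = sorted(ROWS, key=lambda model: ROWS[model][1], reverse=True)

# Mean recomputed from the four task columns of each row.
row_means = {model: sum(scores) / len(scores) for model, (scores, _) in ROWS.items()}

for position, model in enumerate(ranking, start=1):
    listed = ROWS[model][1]
    print(f"{position}. {model:13s} listed {listed:5.1f}  recomputed {row_means[model]:5.2f}")
```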
FINAL VERDICT
OVERALL GRADE: B (Good for model size)
- Response Quality: 3/5
- Reasoning Ability: 3/5
- Knowledge: 2/5
- Speed/Efficiency: 4/5

RECOMMENDATION:
- Suitable for: simple Q&A, basic reasoning, educational contexts
- Not suitable for: complex science, difficult reasoning, expert domains

================================================================================
