KEURAL-ALPHA-V2 MODEL BENCHMARK REPORT
- Model: mkd-ai/keural-alpha-v2
- Parameters: ~1B (GPT-NeoX architecture)
- Size: 2.03 GB
- Date: 2026-03-03
- Device: CUDA (GPU)
- Benchmark Tool: lm-eval (EleutherAI LM Evaluation Harness)
- Version: v2
TASK RESULTS
| Task | Accuracy | Norm. Acc | Stderr (acc) | Samples | Δ vs Alpha (acc) | Status |
|---|---|---|---|---|---|---|
| arc_challenge | 24.23% | 27.22% | ±1.25% | 1,172 | -0.51% | Poor |
| arc_easy | 54.92% | 48.48% | ±1.02% | 2,376 | -1.56% | Good |
| hellaswag | 37.86% | 47.58% | ±0.48% | 10,042 | -0.51% | Decent |
| winogrande | 52.80% | - | ±1.40% | 1,267 | -0.48% | Good |
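As a sanity check, the Stderr column matches the standard binomial approximation for a proportion, sqrt(p(1 - p) / n), applied to the Accuracy column and the sample counts. The sketch below assumes that formula; the harness's own estimator may differ in the last digit.

```python
import math

# (task, accuracy, samples) copied from the TASK RESULTS table above
RESULTS = [
    ("arc_challenge", 0.2423, 1172),
    ("arc_easy", 0.5492, 2376),
    ("hellaswag", 0.3786, 10042),
    ("winogrande", 0.5280, 1267),
]


def binomial_stderr(acc: float, n: int) -> float:
    """Standard error of a proportion `acc` measured on `n` independent samples."""
    return math.sqrt(acc * (1.0 - acc) / n)


for task, acc, n in RESULTS:
    print(f"{task:<14} ±{100 * binomial_stderr(acc, n):.2f}%")
```

Applying the same formula to Norm. Acc gives different values (about 1.30% for arc_challenge), which suggests the Stderr column refers to plain accuracy.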
Alpha vs V2 COMPARISON
| Task | Alpha Score | V2 Score | Change | Verdict |
|---|---|---|---|---|
| arc_challenge | 24.74% | 24.23% | -0.51% | Slightly worse |
| arc_easy | 56.48% | 54.92% | -1.56% | Worse |
| hellaswag | 38.37% | 37.86% | -0.51% | Slightly worse |
| winogrande | 53.28% | 52.80% | -0.48% | Slightly worse |
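A rough way to judge whether these drops matter is to express each change in units of V2's per-task standard error. This is only a heuristic sketch: both models are scored on the same items, so a proper paired test (for example McNemar's test) would need per-sample results, which the report does not include.

```python
# (task, alpha_acc, v2_acc, v2_stderr), all in percent, from the tables above
ROWS = [
    ("arc_challenge", 24.74, 24.23, 1.25),
    ("arc_easy", 56.48, 54.92, 1.02),
    ("hellaswag", 38.37, 37.86, 0.48),
    ("winogrande", 53.28, 52.80, 1.40),
]


def change_in_stderrs(alpha: float, v2: float, stderr: float) -> float:
    """Change in percentage points, divided by V2's per-task standard error."""
    return (v2 - alpha) / stderr


for task, alpha, v2, se in ROWS:
    delta = v2 - alpha
    print(f"{task:<14} {delta:+.2f} pp  ({change_in_stderrs(alpha, v2, se):+.2f} stderr)")
```

Every change comes out smaller than two standard errors (the largest is ARC-Easy at about -1.5), so the regressions are small relative to sampling noise.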
OVERALL AVERAGE: 43.22% (Alpha) → 42.45% (V2), a change of -0.77 percentage points
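The overall averages are unweighted means of the four task accuracies. Because each task counts equally, the change in the average also equals the mean of the four per-task changes: -3.06 / 4 = -0.765 pp, while subtracting the rounded averages (42.45 - 43.22) gives -0.77. A minimal sketch, assuming equal task weights:

```python
from statistics import mean

# Per-task accuracies (%) from the comparison table above
ALPHA = {"arc_challenge": 24.74, "arc_easy": 56.48, "hellaswag": 38.37, "winogrande": 53.28}
V2 = {"arc_challenge": 24.23, "arc_easy": 54.92, "hellaswag": 37.86, "winogrande": 52.80}

alpha_avg = mean(list(ALPHA.values()))
v2_avg = mean(list(V2.values()))
# With equal weights, the change in the average is the mean of the per-task changes
avg_change = mean([V2[task] - ALPHA[task] for task in ALPHA])

print(f"Alpha {alpha_avg:.4f}%  V2 {v2_avg:.4f}%  change {avg_change:+.3f} pp")
```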
SUMMARY STATISTICS
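- Overall Average Accuracy: 42.45%
- Margin over Random Baseline: +17.45 pp against a flat 25% chance level; +11.20 pp against the per-task chance levels (about 25% for ARC and HellaSwag, 50% for Winogrande, averaging 31.25%)
- Best Performance: ARC-Easy (54.92%)
- Worst Performance: ARC-Challenge (24.23%)
- Standard Deviation across tasks: 12.41 pp (population standard deviation of the four accuracies)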
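The summary statistics can be recomputed from the four V2 accuracies. This sketch assumes equal task weights and chance levels of 25% for the four-option tasks (ARC, HellaSwag) and 50% for the two-option Winogrande; a few ARC questions have three or five options, so 25% is an approximation there. `pstdev` treats the four tasks as the whole population, while `stdev` uses the n - 1 sample estimator.

```python
from statistics import mean, pstdev, stdev

# V2 accuracies (%) from the TASK RESULTS table
V2_ACC = {"arc_challenge": 24.23, "arc_easy": 54.92, "hellaswag": 37.86, "winogrande": 52.80}
# Assumed chance level per task: 4 options for ARC and HellaSwag, 2 for Winogrande
CHANCE = {"arc_challenge": 25.0, "arc_easy": 25.0, "hellaswag": 25.0, "winogrande": 50.0}

values = list(V2_ACC.values())
avg = mean(values)
chance_avg = mean(list(CHANCE.values()))

print(f"average accuracy         {avg:.2f}%")
print(f"margin over flat 25%     {avg - 25.0:+.2f} pp")
print(f"margin over task chance  {avg - chance_avg:+.2f} pp")
print(f"population std           {pstdev(values):.2f} pp")
print(f"sample std               {stdev(values):.2f} pp")
```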
PERFORMANCE ANALYSIS
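STRENGTHS:
- Maintains good performance on easy reasoning (ARC-Easy: 54.9%)
- Above chance on commonsense reasoning (Winogrande: 52.8%), though only 2.8 pp above the 50% chance level of this two-option task
- Competitive with other ~1B models: within about 3 pp of GPT-Neo 1.3B on each of the four tasks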
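WEAKNESSES:
- Scores below keural-alpha on all four benchmarks
- Still struggles with difficult science questions (ARC-Challenge: 24.2%, roughly at the ~25% chance level)
- No improvement over the previous version; the drops are small (0.48 to 1.56 pp) and of the same order as the per-task standard errors (0.48 to 1.40 pp)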
COMPARISON TO BASELINES
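| Model | Size | ARC-E | ARC-C | Hella | Wino | Avg |
|---|---|---|---|---|---|---|
| keural-alpha | 1.0B | 56.5% | 24.7% | 48.9% | 53.3% | 45.9% |
| keural-alpha-v2 | 1.0B | 54.9% | 24.2% | 47.6% | 52.8% | 44.9% |
| GPT-2 | 0.1B | 48.0% | 22.0% | 40.0% | 50.0% | 40.0% |
| GPT-Neo 1.3B | 1.3B | 52.0% | 26.0% | 48.0% | 52.0% | 44.5% |

Hella is normalized accuracy for the keural rows (matching Norm. Acc above), and Avg is the unweighted mean of the four columns shown, so it differs from the accuracy-only averages reported earlier.

Rank by Avg: 2nd out of 4 (below keural-alpha; the 0.4 pp lead over GPT-Neo 1.3B is too small to be meaningful)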
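To check the Avg column and the ranking, the sketch below takes each row's Avg as the unweighted mean of the four task columns and sorts the models by it. It assumes equal task weights and that the baseline numbers are directly comparable (same metrics and evaluation settings), which the report does not state.

```python
from statistics import mean

# (model, ARC-E, ARC-C, HellaSwag, Winogrande) as listed in the baselines table
BASELINES = [
    ("keural-alpha", 56.5, 24.7, 48.9, 53.3),
    ("keural-alpha-v2", 54.9, 24.2, 47.6, 52.8),
    ("GPT-2", 48.0, 22.0, 40.0, 50.0),
    ("GPT-Neo 1.3B", 52.0, 26.0, 48.0, 52.0),
]

# Unweighted mean of the four task scores for each model
averages = {name: mean(scores) for name, *scores in BASELINES}
ranking = sorted(averages, key=averages.get, reverse=True)

for rank, name in enumerate(ranking, start=1):
    print(f"{rank}. {name:<16} {averages[name]:.2f}%")
```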
FINAL VERDICT
OVERALL GRADE: B
- Response Quality: 3/5
- Reasoning Ability: 3/5
- Knowledge: 2/5
```python
SYSTEM_PROMPT = (
    "You are Keural Alpha, an AI assistant developed by MKD Corp in South Korea.\n"
    "STRICT RULES:\n"
    "- Speak ONLY as the assistant.\n"
    "- Do NOT generate User messages.\n"
    "- Do NOT role-play or simulate conversations.\n"
    "- Do NOT invent names, identities, jobs, or emotions.\n"
    "- Do NOT ask questions.\n"
    "- Respond with ONE concise answer only.\n"
    "- Maximum 5 sentences.\n"
    "- Use English only.\n"
    "- If the user greets you, respond briefly only.\n"
)

# MEMORY SETTINGS
MAX_TURNS = 3  # keep last 3 user/assistant pairs only

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
]

# Repeated-input handling
last_user_input = None
repeat_index = 0

REPEAT_FALLBACKS = [
    "Please let me know what you would like help with.",
    "I'm here whenever you're ready to ask something.",
    "Feel free to ask a question.",
    "You can ask me anything when you're ready.",
    "Let me know what you’d like to know.",
]
```
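The settings above declare `MAX_TURNS`, `last_user_input`, `repeat_index` and `REPEAT_FALLBACKS`, but not the code that uses them. The sketch below is one hypothetical way to apply them: `trim_history` and `RepeatGuard` are illustrative names, not part of the report. It assumes the history strictly alternates user and assistant messages after the system prompt, and it treats an input as a repeat when it matches the previous one after stripping whitespace and lowercasing.

```python
from __future__ import annotations

# Hypothetical helpers (not from the report). Only the first two fallback
# strings are repeated here to keep the example short.
MAX_TURNS = 3
REPEAT_FALLBACKS = [
    "Please let me know what you would like help with.",
    "I'm here whenever you're ready to ask something.",
]


def trim_history(messages: list[dict], max_turns: int = MAX_TURNS) -> list[dict]:
    """Keep every system message plus the last `max_turns` user/assistant pairs."""
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    recent = dialogue[-2 * max_turns:] if max_turns > 0 else []
    return system + recent


class RepeatGuard:
    """Returns a canned reply when the user sends the same input twice in a row."""

    def __init__(self, fallbacks: list[str]) -> None:
        self.fallbacks = fallbacks
        self.last_user_input: str | None = None
        self.repeat_index = 0

    def check(self, user_input: str) -> str | None:
        normalized = user_input.strip().lower()
        if normalized == self.last_user_input:
            # Cycle through the fallbacks so consecutive repeats get different replies
            reply = self.fallbacks[self.repeat_index % len(self.fallbacks)]
            self.repeat_index += 1
            return reply
        # New input: remember it and reset the fallback cycle
        self.last_user_input = normalized
        self.repeat_index = 0
        return None


# Usage: consult the guard before calling the model, then trim the history
guard = RepeatGuard(REPEAT_FALLBACKS)
history = [{"role": "system", "content": "(SYSTEM_PROMPT)"}]
for text in ["Hi", "hi", "What is GPT-NeoX?"]:
    canned = guard.check(text)
    history.append({"role": "user", "content": text})
    reply = canned if canned is not None else "(model reply)"
    history.append({"role": "assistant", "content": reply})
    history = trim_history(history)
```

Keeping `last_user_input` and `repeat_index` on one object avoids module-level globals and makes the repeat state easy to reset per conversation.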
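```python
payload = {
    "model": MODEL,  # MODEL: served model name, assumed to be defined elsewhere
    "messages": messages,
    "max_tokens": 120,
    "temperature": 0.35,
    # STOP ROLE CONTINUATION: end generation before the model writes another speaker's turn
    "stop": ["\nUser:", "\nSystem:", "\nAssistant:"],
    # Repetition control
    "repetition_penalty": 1.15,
    "frequency_penalty": 0.3,
    "presence_penalty": 0.2,
}
```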
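The `stop` sequences end generation as soon as the model starts writing another speaker's turn, and the system prompt caps answers at five sentences. Servers differ in which sampling fields they honour (for instance, `repetition_penalty` is not a parameter of OpenAI's own Chat Completions API, though several OpenAI-compatible servers accept it), so a client-side safety net can help. The sketch below is a hypothetical post-processing step, not part of the report's setup: `truncate_at_stop` and `limit_sentences` are illustrative names, and the sentence splitter is a naive regex that will miscount abbreviations such as "e.g.".

```python
from __future__ import annotations

import re

# Same stop sequences as the payload above
STOP_SEQUENCES = ("\nUser:", "\nSystem:", "\nAssistant:")


def truncate_at_stop(text: str, stops: tuple[str, ...] = STOP_SEQUENCES) -> str:
    """Cut `text` at the earliest stop sequence, mirroring a server-side `stop`."""
    positions = [i for i in (text.find(s) for s in stops) if i != -1]
    return text[: min(positions)].rstrip() if positions else text.rstrip()


def limit_sentences(text: str, max_sentences: int = 5) -> str:
    """Naively keep the first `max_sentences` sentences (split after . ! or ?)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:max_sentences])


raw = "Paris is the capital of France.\nUser: thanks"
print(limit_sentences(truncate_at_stop(raw)))
```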
