KEURAL-ALPHA-V2 MODEL BENCHMARK REPORT

Model: mkd-ai/keural-alpha-v2
Parameters: ~1B (GPT-NeoX architecture)
Size: 2.03 GB
Date: 2026-03-03
Device: CUDA (GPU)
Benchmark Tool: lm-eval (EleutherAI LM Evaluation Harness)
Version: v2 (updated 27 days ago)

TASK RESULTS

Task           Accuracy   Norm. Acc   Stderr    Samples   vs Alpha   Status
arc_challenge  24.23%     27.22%      ±1.25%    1,172     -0.51%     Poor
arc_easy       54.92%     48.48%      ±1.02%    2,376     -1.56%     Good
hellaswag      37.86%     47.58%      ±0.48%    10,042    -0.51%     Decent
winogrande     52.80%     -           ±1.40%    1,267     -0.48%     Good

ALPHA vs V2 COMPARISON

Task           Alpha Score   V2 Score   Change    Verdict
arc_challenge  24.74%        24.23%     -0.51%    Slightly worse
arc_easy       56.48%        54.92%     -1.56%    Worse
hellaswag      38.37%        37.86%     -0.51%    Slightly worse
winogrande     53.28%        52.80%     -0.48%    Slightly worse

OVERALL AVERAGE: 43.22% (Alpha) → 42.45% (V2) = -0.77%
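The overall averages and the -0.77% delta follow directly from the per-task scores in the comparison table. A minimal check, with the values hard-coded from this report:

```python
# Per-task accuracy from the Alpha vs V2 comparison table (percent).
alpha = {"arc_challenge": 24.74, "arc_easy": 56.48,
         "hellaswag": 38.37, "winogrande": 53.28}
v2 = {"arc_challenge": 24.23, "arc_easy": 54.92,
      "hellaswag": 37.86, "winogrande": 52.80}

avg_alpha = sum(alpha.values()) / len(alpha)  # ≈ 43.22
avg_v2 = sum(v2.values()) / len(v2)           # ≈ 42.45
delta = avg_v2 - avg_alpha                    # ≈ -0.77
```

Every per-task change is smaller than that task's reported stderr, which is consistent with the "no significant improvement" verdict below.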

SUMMARY STATISTICS

Overall Average Accuracy: 42.45%
Average vs Random Baseline: +17.45% (assuming a uniform 25% random baseline across tasks)
Best Performance: ARC-Easy (54.92%)
Worst Performance: ARC-Challenge (24.23%)
Standard Deviation: ±12.34%
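The quoted spread is close to the population standard deviation of the four accuracy scores (≈12.4); a quick check, with values taken from the task-results table:

```python
scores = [24.23, 54.92, 37.86, 52.80]  # ARC-C, ARC-E, HellaSwag, Winogrande

mean = sum(scores) / len(scores)                          # ≈ 42.45
var = sum((s - mean) ** 2 for s in scores) / len(scores)  # population variance
std = var ** 0.5                                          # ≈ 12.4 (report: ±12.34)

best = max(scores)   # 54.92 (ARC-Easy)
worst = min(scores)  # 24.23 (ARC-Challenge)
```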

PERFORMANCE ANALYSIS

STRENGTHS:
- Maintains good performance on easy reasoning (ARC-Easy: 55%)
- Decent commonsense reasoning (Winogrande: 53%)
- Competitive with other 1B models

WEAKNESSES:
- Worse than Alpha on all benchmarks
- Still struggles with difficult science (ARC-C: 24%)
- No significant improvement over previous version

COMPARISON TO BASELINES

Model            Size   ARC-E   ARC-C   Hella   Wino    Avg
keural-alpha     1.0B   56.5%   24.7%   48.9%   53.3%   43.2%
keural-alpha-v2  1.0B   54.9%   24.2%   47.6%   52.8%   42.4%
GPT-2            0.1B   48.0%   22.0%   40.0%   50.0%   38.0%
GPT-Neo 1.3B     1.3B   52.0%   26.0%   48.0%   52.0%   44.5%


Rank: 3rd out of 4 by average score (below Alpha and GPT-Neo 1.3B, above GPT-2)
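The ranking can be read straight off the Avg column of the baseline table; a small sketch, with the averages hard-coded from this report:

```python
# Avg column from the baseline comparison table (percent).
models = {
    "keural-alpha": 43.2,
    "keural-alpha-v2": 42.4,
    "GPT-2": 38.0,
    "GPT-Neo 1.3B": 44.5,
}

ranking = sorted(models, key=models.get, reverse=True)
rank_v2 = ranking.index("keural-alpha-v2") + 1  # 3 of 4
```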

FINAL VERDICT

OVERALL GRADE: B

Response Quality: 3/5
Reasoning Ability: 3/5
Knowledge: 2/5

--

SYSTEM_PROMPT = (
    "You are Keural Alpha, an AI assistant developed by MKD Corp in South Korea.\n"
    "STRICT RULES:\n"
    "- Speak ONLY as the assistant.\n"
    "- Do NOT generate User messages.\n"
    "- Do NOT role-play or simulate conversations.\n"
    "- Do NOT invent names, identities, jobs, or emotions.\n"
    "- Do NOT ask questions.\n"
    "- Respond with ONE concise answer only.\n"
    "- Maximum 5 sentences.\n"
    "- Use English only.\n"
    "- If the user greets you, respond briefly only.\n"
)

MEMORY SETTINGS

MAX_TURNS = 3 # keep last 3 user/assistant pairs only

messages = [ {"role": "system", "content": SYSTEM_PROMPT} ]
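How the MAX_TURNS window is enforced is not shown in the card; a minimal sketch of one way to do it (the `trim_history` helper is an assumption, not part of the original code) that keeps the system message plus the last 3 user/assistant pairs:

```python
MAX_TURNS = 3  # keep last 3 user/assistant pairs only

def trim_history(messages, max_turns=MAX_TURNS):
    """Keep the system message plus the last `max_turns` user/assistant pairs."""
    system, rest = messages[:1], messages[1:]
    return system + rest[-2 * max_turns:]

# Example: 5 turns recorded, only the last 3 survive trimming.
msgs = [{"role": "system", "content": "..."}]
for i in range(5):
    msgs.append({"role": "user", "content": f"q{i}"})
    msgs.append({"role": "assistant", "content": f"a{i}"})

msgs = trim_history(msgs)  # 1 system message + 3 most recent pairs
```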

REPEATED-INPUT HANDLING

last_user_input = None
repeat_index = 0

REPEAT_FALLBACKS = [
    "Please let me know what you would like help with.",
    "I'm here whenever you're ready to ask something.",
    "Feel free to ask a question.",
    "You can ask me anything when you're ready.",
    "Let me know what you'd like to know.",
]
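The card does not show how these counters drive the fallback list; a hedged sketch of one plausible wiring (the `handle_repeat` helper is hypothetical, and the list is shortened here for brevity):

```python
REPEAT_FALLBACKS = [
    "Please let me know what you would like help with.",
    "I'm here whenever you're ready to ask something.",
    "Feel free to ask a question.",
]

last_user_input = None
repeat_index = 0

def handle_repeat(user_input):
    """Return a canned fallback when the user repeats themselves, else None."""
    global last_user_input, repeat_index
    if user_input == last_user_input:
        reply = REPEAT_FALLBACKS[repeat_index % len(REPEAT_FALLBACKS)]
        repeat_index += 1  # cycle to the next fallback on the next repeat
        return reply
    last_user_input = user_input
    repeat_index = 0
    return None

handle_repeat("hi")  # None: first occurrence, goes to the model
handle_repeat("hi")  # first fallback
handle_repeat("hi")  # second fallback (cycles through the list)
```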

--

payload = {
    "model": MODEL,
    "messages": messages,
    "max_tokens": 120,
    "temperature": 0.35,

    #  STOP ROLE CONTINUATION
    "stop": ["\nUser:", "\nSystem:", "\nAssistant:"],

    # repetition control
    "repetition_penalty": 1.15,
    "frequency_penalty": 0.3,
    "presence_penalty": 0.2
}
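The "stop" list cuts generation the moment the model starts a new role line, which backs up the system prompt's "do not generate User messages" rule. The same truncation can be sketched locally (this helper is an illustration, not part of the payload):

```python
STOP_SEQUENCES = ["\nUser:", "\nSystem:", "\nAssistant:"]

def truncate_at_stop(text, stops=STOP_SEQUENCES):
    """Cut `text` at the earliest stop sequence, if any appears."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

raw = "The capital is Seoul.\nUser: thanks\nAssistant: You're welcome."
truncate_at_stop(raw)  # "The capital is Seoul."
```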

--

MODEL FILES

Downloads last month: 30
Format: Safetensors
Model size: 1B params
Tensor type: BF16

Model tree for mkd-ai/keural-alpha-v2: 1 quantization available