KEURAL-ALPHA-V2 MODEL BENCHMARK REPORT

Model: mkd-ai/keural-alpha-v2
Parameters: ~1B (GPT-NeoX architecture)
Size: 2.03 GB
Date: 2026-03-03
Device: CUDA (GPU)
Benchmark Tool: lm-eval (EleutherAI LM Evaluation Harness)
Version: v2 (Updated 27 days ago)

TASK RESULTS

Task	Accuracy	Norm.Acc	Stderr	Samples	vs Alpha	Status
arc_challenge	24.23%	27.22%	±1.25%	1,172	-0.51%	Poor
arc_easy	54.92%	48.48%	±1.02%	2,376	-1.56%	Good
hellaswag	37.86%	47.58%	±0.48%	10,042	-0.51%	Decent
winogrande	52.80%	-	±1.40%	1,267	-0.48%	Good
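
The report does not list the exact lm-eval invocation, so the following is only a sketch of how a comparable run could be launched. The task names come from the table above; the `dtype`, device string, batch size, and the helper name `build_lm_eval_command` are assumptions, and the few-shot setting is left at the harness default because the report does not state it.

```python
import shlex

# Task names are taken from the results table; everything else is an assumption.
TASKS = ["arc_challenge", "arc_easy", "hellaswag", "winogrande"]


def build_lm_eval_command(model_id: str, device: str = "cuda:0", batch_size: int = 8) -> list[str]:
    """Build an lm-eval CLI argument list for the four tasks in the table."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id},dtype=bfloat16",
        "--tasks", ",".join(TASKS),
        "--device", device,
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch needs no GPU.
    print(shlex.join(build_lm_eval_command("mkd-ai/keural-alpha-v2")))
```

The printed command can be pasted into a shell where lm-eval is installed; scores may differ slightly from the table depending on harness version and batch size.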

ALPHA VS V2 COMPARISON

Task	Alpha Score	V2 Score	Change	Verdict
arc_challenge	24.74%	24.23%	-0.51%	Slightly worse
arc_easy	56.48%	54.92%	-1.56%	Worse
hellaswag	38.37%	37.86%	-0.51%	Slightly worse
winogrande	53.28%	52.80%	-0.48%	Slightly worse

OVERALL AVERAGE: 43.22% (Alpha) → 42.45% (V2), a change of -0.77 percentage points
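
The overall average is the unweighted mean of the four raw accuracies. A short sketch that recomputes it from the comparison table (the function names are illustrative, not part of any tool used for the report):

```python
# Raw accuracy (%) per task, copied from the comparison table.
ALPHA = {"arc_challenge": 24.74, "arc_easy": 56.48, "hellaswag": 38.37, "winogrande": 53.28}
V2 = {"arc_challenge": 24.23, "arc_easy": 54.92, "hellaswag": 37.86, "winogrande": 52.80}


def mean_accuracy(scores: dict[str, float]) -> float:
    """Unweighted mean over tasks (each task counts equally, regardless of sample count)."""
    return sum(scores.values()) / len(scores)


def per_task_change(old: dict[str, float], new: dict[str, float]) -> dict[str, float]:
    """Change in percentage points for every task present in `old`."""
    return {task: new[task] - old[task] for task in old}


if __name__ == "__main__":
    alpha_avg = mean_accuracy(ALPHA)
    v2_avg = mean_accuracy(V2)
    print(f"Alpha: {alpha_avg:.2f}%  V2: {v2_avg:.2f}%  change: {v2_avg - alpha_avg:+.2f} points")
    for task, delta in per_task_change(ALPHA, V2).items():
        print(f"{task:14s} {delta:+.2f}")
```

Because the mean is unweighted, HellaSwag's 10,042 samples carry the same weight as ARC-Challenge's 1,172.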

SUMMARY STATISTICS

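Overall Average Accuracy: 42.45%
Average vs Random Baseline: +17.45 points (assuming a 25% chance level for every task; Winogrande is a two-way choice with a 50% chance level)
Best Performance: ARC-Easy (54.92%)
Worst Performance: ARC-Challenge (24.23%)
Standard Deviation: ±12.34%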
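
These summary figures can be recomputed from the per-task accuracies. The sketch below does so and also compares the mean against per-task chance levels; the chance levels are assumptions (nominal four-way choice for ARC and HellaSwag, two-way for Winogrande), and `summarize` is a hypothetical helper.

```python
import statistics

# V2 raw accuracy (%) per task, copied from the task results table.
V2_ACC = {"arc_challenge": 24.23, "arc_easy": 54.92, "hellaswag": 37.86, "winogrande": 52.80}

# Assumed chance levels (%): nominal 4-way choice for ARC and HellaSwag, 2-way for Winogrande.
CHANCE = {"arc_challenge": 25.0, "arc_easy": 25.0, "hellaswag": 25.0, "winogrande": 50.0}


def summarize(acc: dict[str, float], chance: dict[str, float]) -> dict[str, object]:
    """Return mean, population standard deviation, margins over chance, best and worst task."""
    values = list(acc.values())
    mean_acc = statistics.fmean(values)
    return {
        "mean": mean_acc,
        "pstdev": statistics.pstdev(values),
        "margin_vs_uniform_25": mean_acc - 25.0,
        "margin_vs_task_chance": mean_acc - statistics.fmean(list(chance.values())),
        "best": max(acc, key=acc.get),
        "worst": min(acc, key=acc.get),
    }


if __name__ == "__main__":
    for key, value in summarize(V2_ACC, CHANCE).items():
        print(f"{key}: {value}")
```

Under these assumptions the margin over chance is smaller than against a flat 25% baseline, because Winogrande is a two-way choice.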

PERFORMANCE ANALYSIS

STRENGTHS:
- Maintains good performance on easy reasoning (ARC-Easy: 55%)
- Decent commonsense reasoning (Winogrande: 53%, against a 50% chance level for this two-way task)
- Competitive with other ~1B models (see COMPARISON TO BASELINES)

WEAKNESSES:
- Scores below Alpha on all four benchmarks (by 0.48 to 1.56 points)
- Still struggles with difficult science questions (ARC-C: 24%, close to the roughly 25% chance level)
- No significant improvement over the previous version
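
To put the per-task drops in context, they can be compared with the reported standard errors. The sketch below is a rough check only: it assumes Alpha has the same standard error as V2 on each task and treats the two runs as independent, neither of which the report states. Since both models are scored on the same items, a paired test would be more precise.

```python
import math

# Change vs Alpha (percentage points) and V2 stderr (%), copied from the task results table.
DELTA = {"arc_challenge": -0.51, "arc_easy": -1.56, "hellaswag": -0.51, "winogrande": -0.48}
STDERR = {"arc_challenge": 1.25, "arc_easy": 1.02, "hellaswag": 0.48, "winogrande": 1.40}


def z_scores(delta: dict[str, float], stderr: dict[str, float]) -> dict[str, float]:
    """z = delta / stderr_of_difference.

    Assumption: both models share the same stderr and the runs are independent,
    so the stderr of the difference is sqrt(2) * stderr.
    """
    return {task: delta[task] / (math.sqrt(2) * stderr[task]) for task in delta}


def significant_tasks(delta: dict[str, float], stderr: dict[str, float], threshold: float = 1.96) -> list[str]:
    """Tasks whose |z| exceeds the threshold (1.96 is roughly a two-sided 95% level)."""
    return [task for task, z in z_scores(delta, stderr).items() if abs(z) > threshold]


if __name__ == "__main__":
    for task, z in z_scores(DELTA, STDERR).items():
        print(f"{task:14s} z = {z:+.2f}")
    print("significant:", significant_tasks(DELTA, STDERR))
```

Under these assumptions none of the four changes crosses the 1.96 threshold, so the drops are consistent with evaluation noise as well as with a real regression.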

COMPARISON TO BASELINES

Model	Size	ARC-E	ARC-C	Hella	Wino	Avg
keural-alpha	1.0B	56.5%	24.7%	48.9%	53.3%	43.2%
keural-alpha-v2	1.0B	54.9%	24.2%	47.6%	52.8%	42.4%
GPT-2	0.1B	48.0%	22.0%	40.0%	50.0%	38.0%
GPT-Neo 1.3B	1.3B	52.0%	26.0%	48.0%	52.0%	44.5%

Note: for the two Keural rows, Avg matches the raw-accuracy averages reported above (43.22% and 42.45%), whereas the Hella column for keural-alpha-v2 shows normalized accuracy (47.58%). These Avg values are therefore lower than the mean of the four columns shown.

Rank: 3rd out of 4 by Avg (below Alpha and GPT-Neo 1.3B, above GPT-2)
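
The rank can be checked by sorting the baseline table. The sketch below ranks the models by the listed Avg column and, for comparison, by the mean of the four task columns as printed; the helper names are illustrative.

```python
# (ARC-E, ARC-C, Hella, Wino, listed Avg) in %, copied from the baseline table.
ROWS = {
    "keural-alpha": (56.5, 24.7, 48.9, 53.3, 43.2),
    "keural-alpha-v2": (54.9, 24.2, 47.6, 52.8, 42.4),
    "GPT-2": (48.0, 22.0, 40.0, 50.0, 38.0),
    "GPT-Neo 1.3B": (52.0, 26.0, 48.0, 52.0, 44.5),
}


def listed_avg(row: tuple) -> float:
    """The Avg value exactly as printed in the table."""
    return row[4]


def column_mean(row: tuple) -> float:
    """Mean of the four task columns as printed in the table."""
    return sum(row[:4]) / 4


def rank_by(score_fn) -> list[str]:
    """Model names sorted from best to worst under the given scoring function."""
    return sorted(ROWS, key=lambda name: score_fn(ROWS[name]), reverse=True)


if __name__ == "__main__":
    print("by listed Avg: ", rank_by(listed_avg))
    print("by column mean:", rank_by(column_mean))
```

The two orderings can differ when a row's Avg was not computed from the columns shown next to it, so it is worth stating which one a rank refers to.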

FINAL VERDICT

OVERALL GRADE: B

Response Quality: 3/5
Reasoning Ability: 3/5
Knowledge: 2/5

SYSTEM_PROMPT = (
    "You are Keural Alpha, an AI assistant developed by MKD Corp in South Korea.\n"
    "STRICT RULES:\n"
    "- Speak ONLY as the assistant.\n"
    "- Do NOT generate User messages.\n"
    "- Do NOT role-play or simulate conversations.\n"
    "- Do NOT invent names, identities, jobs, or emotions.\n"
    "- Do NOT ask questions.\n"
    "- Respond with ONE concise answer only.\n"
    "- Maximum 5 sentences.\n"
    "- Use English only.\n"
    "- If the user greets you, respond briefly only.\n"
)

MEMORY SETTINGS

MAX_TURNS = 3 # keep last 3 user/assistant pairs only

messages = [ {"role": "system", "content": SYSTEM_PROMPT} ]
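
The snippet above sets `MAX_TURNS` but does not show how the history is trimmed. One possible way to do it, as a sketch: `trim_history` is a hypothetical helper, and it assumes the non-system messages strictly alternate between user and assistant.

```python
MAX_TURNS = 3  # value from the memory settings above


def trim_history(messages: list[dict], max_turns: int = MAX_TURNS) -> list[dict]:
    """Keep the system message(s) plus the last `max_turns` user/assistant pairs.

    Assumes the non-system messages alternate user, assistant, user, assistant, ...
    """
    system = [m for m in messages if m["role"] == "system"]
    dialogue = [m for m in messages if m["role"] != "system"]
    # A slice of [-0:] would keep everything, so handle zero turns explicitly.
    kept = dialogue[-2 * max_turns:] if max_turns > 0 else []
    return system + kept


if __name__ == "__main__":
    history = [{"role": "system", "content": "placeholder system prompt"}]
    for i in range(5):
        history.append({"role": "user", "content": f"question {i}"})
        history.append({"role": "assistant", "content": f"answer {i}"})
    for message in trim_history(history):
        print(message["role"], "|", message["content"])
```

Calling `trim_history(messages)` before each request keeps the prompt short, which matters for a ~1B model with a limited context budget.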

Repeated-input handling

last_user_input = None
repeat_index = 0

REPEAT_FALLBACKS = [
    "Please let me know what you would like help with.",
    "I'm here whenever you're ready to ask something.",
    "Feel free to ask a question.",
    "You can ask me anything when you're ready.",
    "Let me know what you'd like to know.",
]
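
The logic that uses `last_user_input`, `repeat_index`, and `REPEAT_FALLBACKS` is not shown. A minimal sketch of how they might fit together follows; the `RepeatGuard` class, the case-insensitive comparison, and resetting the index on new input are all assumptions.

```python
REPEAT_FALLBACKS = [
    "Please let me know what you would like help with.",
    "I'm here whenever you're ready to ask something.",
    "Feel free to ask a question.",
    "You can ask me anything when you're ready.",
    "Let me know what you'd like to know.",
]


class RepeatGuard:
    """Return a canned reply when the user sends the same input twice in a row."""

    def __init__(self, fallbacks: list[str]):
        self.fallbacks = list(fallbacks)
        self.last_user_input = None
        self.repeat_index = 0

    def check(self, user_input: str):
        """Return a fallback string for a repeated input, or None for a new one."""
        normalized = user_input.strip().lower()  # assumption: ignore case and outer whitespace
        if normalized == self.last_user_input:
            reply = self.fallbacks[self.repeat_index % len(self.fallbacks)]
            self.repeat_index += 1
            return reply
        # New input: remember it and restart the fallback cycle (an assumption).
        self.last_user_input = normalized
        self.repeat_index = 0
        return None


if __name__ == "__main__":
    guard = RepeatGuard(REPEAT_FALLBACKS)
    for text in ["hello", "hello", "Hello ", "what is BF16?"]:
        print(repr(text), "->", guard.check(text))
```

When `check` returns `None`, the input would be sent to the model as usual; otherwise the fallback is shown and no request is made.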

payload = {
    "model": MODEL,
    "messages": messages,
    "max_tokens": 120,
    "temperature": 0.35,

    # stop role continuation
    "stop": ["\nUser:", "\nSystem:", "\nAssistant:"],

    # repetition control
    "repetition_penalty": 1.15,
    "frequency_penalty": 0.3,
    "presence_penalty": 0.2,
}
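
The `stop` list asks the server to end generation before the model starts writing another speaker's turn. If the serving endpoint ignores or only partially honors stop sequences, the same cut can be applied on the client. A sketch, with `truncate_at_stop` as a hypothetical helper:

```python
# Same stop sequences as in the payload above.
STOP_SEQUENCES = ("\nUser:", "\nSystem:", "\nAssistant:")


def truncate_at_stop(text: str, stops=STOP_SEQUENCES) -> str:
    """Cut `text` at the earliest stop sequence, if any, and strip trailing whitespace."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut].rstrip()


if __name__ == "__main__":
    raw = "BF16 is a 16-bit floating-point format.\nUser: thanks\nAssistant: You're welcome."
    print(truncate_at_stop(raw))
```

This is a fallback only; it does not save generation time the way a server-side stop does.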

Safetensors: model size 1B params, tensor type BF16
