---
license: apache-2.0
base_model: Qwen/Qwen2.5-3B-Instruct
tags:
  - mlp-surgery
  - finetuned
  - reasoning
language:
  - en
datasets:
  - Malum0x/openhermes2.5-Perplexity_filtered_top30
---

mlp-surgery: broken baseline (Qwen2.5-3B)

The "broken" baseline used as input for the mlp-surgery project. Don't use this model for downstream tasks: it underperforms the base model on both math (GSM8K) and general reasoning (ARC Challenge). It is published only so that the experiment is reproducible.

What it is

Qwen2.5-3B-Instruct with a LoRA fine-tune on the perplexity-filtered top 30% of OpenHermes 2.5 (from the sister project Perplexity-weighted-selective-finetuning). The LoRA adapter is merged into the base weights, so the published checkpoint is a standalone model.
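
For readers unfamiliar with what "merged into the base weights" means, here is a minimal sketch of the standard LoRA merge, W' = W + (alpha / r) * B A, on a toy matrix. The shapes, rank, and alpha are illustrative assumptions and not this model's actual hyperparameters, which the card does not state. In practice a library such as PEFT performs this per layer (for example via `merge_and_unload()`); whether that exact call was used here is also an assumption.

```python
# Minimal sketch of merging a LoRA update into a base weight matrix.
# Shapes, rank, and alpha are illustrative assumptions, not this model's values.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, lora_b, lora_a, alpha, rank):
    """Return W + (alpha / rank) * (B @ A) without modifying the inputs."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy 2x2 base weight with a rank-1 adapter (B is 2x1, A is 1x2).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, -0.5]]

merged = merge_lora(W, B, A, alpha=2.0, rank=1)
print(merged)
```

After merging, the adapter matrices are no longer needed at inference time, which is why the checkpoint loads like any ordinary model.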

Eval

Evaluated with lm-eval: GSM8K (flexible-extract, 5-shot) and ARC Challenge (acc_norm, 0-shot), with no chat template, batch_size 8, and a single seed (2026-05-07).
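
The settings above could be reproduced with invocations along the following lines. This is a sketch, not the author's exact command: the repository id `Malum0x/mlp-surgery-broken` is a hypothetical placeholder, and the helper `build_lm_eval_cmd` is introduced here for illustration only. The script only builds and prints the commands; it does not run the evaluation.

```python
import shlex

# Hypothetical repo id (assumption); replace with the actual model path.
MODEL_ID = "Malum0x/mlp-surgery-broken"

# (task name, few-shot count) pairs matching the settings described above.
TASKS = [("gsm8k", 5), ("arc_challenge", 0)]


def build_lm_eval_cmd(model_id, task, num_fewshot, batch_size=8):
    """Build an lm_eval CLI invocation as an argument list (not executed here)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={model_id}",
        "--tasks", task,
        "--num_fewshot", str(num_fewshot),
        "--batch_size", str(batch_size),
        # --apply_chat_template is deliberately omitted: no chat template was used.
    ]


commands = [build_lm_eval_cmd(MODEL_ID, task, shots) for task, shots in TASKS]
for cmd in commands:
    print(shlex.join(cmd))
```

When reading the output of such a run, the numbers to compare against this card are the flexible-extract filter for GSM8K and acc_norm for ARC Challenge.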

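| Model | GSM8K (flexible-extract, 5-shot) | ARC Challenge (acc_norm, 0-shot) |
|---|---|---|
| Base (Qwen2.5-3B-Instruct) | 63.15% | 48.12% |
| After SFT (broken) | 61.64% | 45.22% |
| Restore top 5 | 63.00% | 45.73% |
| Restore top 15 | 63.46% | 46.50% |
| Restore top 30 | 64.29% | 48.55% |
| Restore specificity top 10 | 61.64% | 45.22% |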

This model is the "After SFT (broken)" row.
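
To make the gaps easier to read, the following sketch computes each row's difference from the base model in percentage points. The scores are copied from the table above; the helper name `deltas_vs_base` is introduced here for illustration.

```python
# Scores copied from the table above (percent accuracy): (GSM8K, ARC Challenge).
RESULTS = {
    "Base (Qwen2.5-3B-Instruct)": (63.15, 48.12),
    "After SFT (broken)": (61.64, 45.22),
    "Restore top 5": (63.00, 45.73),
    "Restore top 15": (63.46, 46.50),
    "Restore top 30": (64.29, 48.55),
    "Restore specificity top 10": (61.64, 45.22),
}

BASE = "Base (Qwen2.5-3B-Instruct)"


def deltas_vs_base(results, base=BASE):
    """Return {model: (gsm8k_delta, arc_delta)} in percentage points, rounded to 2 places."""
    base_gsm, base_arc = results[base]
    return {
        name: (round(gsm - base_gsm, 2), round(arc - base_arc, 2))
        for name, (gsm, arc) in results.items()
        if name != base
    }


for name, (d_gsm, d_arc) in deltas_vs_base(RESULTS).items():
    print(f"{name:<28} GSM8K {d_gsm:+.2f}  ARC {d_arc:+.2f}")
```

With these numbers, the "After SFT (broken)" row sits 1.51 points below the base model on GSM8K and 2.90 points below on ARC Challenge. Because the evaluation used a single seed, differences of this size should be read with caution.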

Companion models

Code: https://github.com/Malum0x/mlp-surgery