--- license: apache-2.0 language: - en base_model: Qwen/Qwen3-8B pipeline_tag: text-generation library_name: transformers tags: - ablation-study - scientific-reasoning - post-training - qwen3 --- # ABForge-Qwen3-8B-Task2 An **ABForge** model for **Task 2: Ablation Plan Generation**. ABForge is a post-training pipeline for paper-grounded ablation design. This checkpoint is post-trained with the full ABForge pipeline: supervised fine-tuning from `Qwen/Qwen3-8B` followed by rubric-guided GRPO (**SFT → GRPO**). ## Task Given a paper's context and a goal, the model produces a detailed, controlled **ablation experiment design plan** (objective, setup, variants, fixed protocols and metrics). ## Training data SFT on `train/sft_task2_37019.jsonl`, then GRPO on `train/RL_task2_30K.jsonl`, from [`SlowGuess/abforge-data`](https://huggingface.co/datasets/SlowGuess/abforge-data) (derived from CC-licensed research papers). Evaluation uses the held-out **AblationBench** split (`eval/ablationbench_200.jsonl`) of the same dataset. ## Related models (Task 2) - [`SlowGuess/ABForge-Qwen3-8B-Task2`](https://huggingface.co/SlowGuess/ABForge-Qwen3-8B-Task2) (this model) - [`SlowGuess/ABForge-Qwen3-8B-Task2-SFT`](https://huggingface.co/SlowGuess/ABForge-Qwen3-8B-Task2-SFT) - [`SlowGuess/ABForge-Qwen3-8B-Task2-RL`](https://huggingface.co/SlowGuess/ABForge-Qwen3-8B-Task2-RL) ## Evaluation Reproduce AblationBench evaluation with the [`SlowGuess/Abforge_1`](https://github.com/SlowGuess/Abforge_1) code: ```bash git clone https://github.com/SlowGuess/Abforge_1 && cd Abforge_1 huggingface-cli download SlowGuess/abforge-data --repo-type dataset --local-dir data export MODEL_PATH=SlowGuess/ABForge-Qwen3-8B-Task2 # 1. Generate predictions on AblationBench python run_inference_local.py --task 2 \ --input data/eval/ablationbench_200.jsonl \ --output preds.jsonl \ --model-path "$MODEL_PATH" --dtype bf16 --max-new-tokens 4096 # 2. Score against the fixed AblationBench rubric (Claude judge) export ANTHROPIC_API_KEY= python scripts/eval_task2_claude_rubric_v2.py --input preds.jsonl --output scored.jsonl ``` ## Links - Dataset: [`SlowGuess/abforge-data`](https://huggingface.co/datasets/SlowGuess/abforge-data) - Code: [`SlowGuess/Abforge_1`](https://github.com/SlowGuess/Abforge_1) ## Citation ```bibtex @misc{abforge, title = {ABForge: A Post-Training Pipeline for Paper-Grounded Ablation Design}, author = {ABForge authors}, year = {2026}, howpublished = {\url{https://github.com/SlowGuess/Abforge_1}} } ```