# Debate GRPO Iter2 Group B - Checkpoint 300
LoRA checkpoint from offline GRPO training on debate structure tasks (SKELETON_BUILD, EVIDENCE_SELECT).
## Training Details
- Base Model: Qwen/Qwen2.5-7B-Instruct (merged with Group A improvements)
- Training Type: Offline GRPO with precomputed logprobs
- Group: B (Structure - SKELETON_BUILD, EVIDENCE_SELECT)
- Checkpoint: Step 300, Epoch 2
- Samples: 4,157
- Total Steps Planned: 825 (5 epochs)
## Metrics at Checkpoint
- Loss: -0.0145
- Accuracy: 50.0%
- Clip Fraction: 10.0%
- Mean Ratio: 0.834
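The clip fraction and mean ratio above are standard PPO-style diagnostics over the importance ratio between the current policy and the precomputed (offline) logprobs. As a minimal sketch only — not the actual training code for this checkpoint — an offline GRPO step with group-normalized advantages might compute them like this (all names here are hypothetical):

```python
import torch

def grpo_offline_loss(new_logprobs, old_logprobs, rewards, clip_eps=0.2):
    """Clipped surrogate loss over one group of sampled completions.

    new_logprobs, old_logprobs: per-completion summed token logprobs (1-D tensors);
    old_logprobs are the precomputed offline values. rewards: scalar per completion.
    """
    # Group-relative advantage: normalize rewards within the sampled group.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    # Importance ratio against the precomputed behavior-policy logprobs.
    ratio = torch.exp(new_logprobs - old_logprobs)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    # PPO-style pessimistic objective (negated for gradient descent).
    loss = -torch.min(ratio * adv, clipped * adv).mean()
    # Diagnostics analogous to the checkpoint metrics above.
    clip_frac = ((ratio - 1.0).abs() > clip_eps).float().mean()
    return loss, ratio.mean(), clip_frac
```

A mean ratio below 1.0 (0.834 at this checkpoint) indicates the current policy assigns lower probability to the sampled completions than the behavior policy that generated the offline logprobs did.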
## Resume Instructions
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load base (canonical with Group A merged)
base = AutoModelForCausalLM.from_pretrained("path/to/canonical")
model = PeftModel.from_pretrained(base, "debaterhub/debate-grpo-iter2-groupB-checkpoint")
```
## Framework versions
- PEFT 0.15.2
- Transformers 4.47.1
- PyTorch 2.5.1