# Debate Qwen 32B - Iter3 GRPO-D

Fine-tuned Qwen2.5-32B-Instruct for IPDA (International Public Debate Association) debate.
## Model Description
This is iteration 3 of our debate model, trained through multiple rounds of:
- SFT (Supervised Fine-Tuning) on high-quality debate samples
- GRPO (Group Relative Policy Optimization) on debate performance metrics
### Training Pipeline

```
Base: Qwen2.5-32B-Instruct
├── GRPO Group B (evidence/impacts)
├── SFT Group C (warrant/clash)
├── GRPO Group C
├── SFT Group D (theory/framework)
└── GRPO Group D ← This checkpoint
```
## Performance

The model was trained against IPDA debate rubric scores across five speech types:
- AC (Affirmative Constructive)
- NC (Negative Constructive)
- 1AR (First Affirmative Rebuttal)
- NR (Negative Rebuttal)
- 2AR (Second Affirmative Rebuttal)
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" shards the 32B model across available GPUs (requires accelerate)
model = AutoModelForCausalLM.from_pretrained(
    "dgonier/debate-qwen-32b-iter3-grpoD",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dgonier/debate-qwen-32b-iter3-grpoD")
```
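As a Qwen2.5 derivative, the model expects ChatML-formatted prompts. In practice `tokenizer.apply_chat_template(messages, add_generation_prompt=True)` handles this; the sketch below (with a hypothetical `build_chatml_prompt` helper and made-up message contents) just illustrates the layout that call produces:

```python
def build_chatml_prompt(messages):
    """Render a list of {role, content} dicts into ChatML (hypothetical helper;
    use tokenizer.apply_chat_template in real code)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages]
    # Open an assistant turn so the model generates the reply
    parts.append("<|im_start|>assistant\n")
    return "".join(parts)

messages = [
    {"role": "system", "content": "You are an IPDA debate assistant."},
    {"role": "user", "content": "Deliver a 1AR on the resolution."},
]
prompt = build_chatml_prompt(messages)
```

The resulting string can be tokenized and passed to `model.generate` directly.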
Or with vLLM:

```shell
vllm serve dgonier/debate-qwen-32b-iter3-grpoD --tensor-parallel-size 4
```
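`vllm serve` exposes an OpenAI-compatible API (by default on port 8000). A minimal request sketch, assuming the default base URL and hypothetical sampling parameters not specified in this card:

```python
import json
import urllib.request

def build_request(prompt, base_url="http://localhost:8000/v1"):
    """Build an HTTP request for vLLM's /chat/completions endpoint
    (hypothetical helper; the openai client library works equally well)."""
    body = json.dumps({
        "model": "dgonier/debate-qwen-32b-iter3-grpoD",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,  # assumed value, tune per speech length
    }).encode()
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = build_request("Give a 1AR outline on the resolution.")
# urllib.request.urlopen(req) returns the completion once the server is up.
```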
## Related

- Dataset: `dgonier/ipda-2ar-golden-samples`
- Training data: `dgonier/ipda-judge-adaptation-grpo`