Debate Qwen 32B - Iter3 GRPO-D

Fine-tuned Qwen2.5-32B-Instruct for IPDA (International Public Debate Association) debate.

Model Description

This is iteration 3 of our debate model, trained through alternating rounds of:

  • SFT (Supervised Fine-Tuning) on high-quality debate samples
  • GRPO (Group Relative Policy Optimization) against debate performance metrics (sketched below)
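
For context, GRPO samples a group of completions for the same prompt and normalizes each completion's reward against the group's mean and standard deviation to form advantages. A minimal sketch of that normalization (illustrative only, not the training code used for this checkpoint):

import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # rewards: (num_prompts, group_size) rubric-based rewards, one row per prompt's sampled group.
    mean = rewards.mean(dim=1, keepdim=True)
    std = rewards.std(dim=1, keepdim=True)
    # Completions scoring above their group's average receive positive advantage.
    return (rewards - mean) / (std + eps)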

Training Pipeline

Base: Qwen2.5-32B-Instruct
  └── GRPO Group B (evidence/impacts)
      └── SFT Group C (warrant/clash)
          └── GRPO Group C
              └── SFT Group D (theory/framework)
                  └── GRPO Group D ← This checkpoint

Performance

Training rewards were derived from IPDA debate rubric scores across five speech types (an illustrative aggregation sketch follows the list):

  • AC (Affirmative Constructive)
  • NC (Negative Constructive)
  • 1AR (First Affirmative Rebuttal)
  • NR (Negative Rebuttal)
  • 2AR (Second Affirmative Rebuttal)
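
The exact rubric-to-reward mapping is not published with this card. Purely as an illustration, per-speech rubric scores could be collapsed into a single scalar reward like this (the function name and equal weighting are hypothetical):

SPEECH_TYPES = ["AC", "NC", "1AR", "NR", "2AR"]

def debate_reward(rubric_scores: dict[str, float]) -> float:
    # rubric_scores: normalized (0-1) IPDA rubric score per speech type; equal weighting is an assumption.
    return sum(rubric_scores[s] for s in SPEECH_TYPES) / len(SPEECH_TYPES)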

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# The checkpoint is stored in BF16; device_map="auto" shards the 32B model across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "dgonier/debate-qwen-32b-iter3-grpoD", torch_dtype=torch.bfloat16, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("dgonier/debate-qwen-32b-iter3-grpoD")
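
A minimal generation sketch follows; the resolution text is a placeholder, and the prompt wording is an assumption since the card does not specify a prompt format:

messages = [{"role": "user", "content": "Resolution: <your resolution here>. Deliver the Affirmative Constructive (AC)."}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output = model.generate(inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))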

Or with vLLM:

vllm serve dgonier/debate-qwen-32b-iter3-grpoD --tensor-parallel-size 4
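
The vLLM server exposes an OpenAI-compatible API (on port 8000 by default), so any OpenAI client can query it; a sketch:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="dgonier/debate-qwen-32b-iter3-grpoD",
    messages=[{"role": "user", "content": "Deliver a Negative Constructive (NC) on the resolution of your choice."}],
)
print(response.choices[0].message.content)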
