Debate ORPO Iteration 12

LoRA adapter for IPDA (International Public Debate Association) style debate generation.

Model Description

This is the final iteration (12) of iterative ORPO training for debate. The model generates complete IPDA debates including:

Affirmative Constructive (AC)
Negative Constructive (NC)
Cross-Examination (CX)
Rebuttals (1AR, 1NR, 2AR, 2NR)

Base Model: Qwen/Qwen3-30B-A3B

Training Details

Method: Iterative ORPO (Odds Ratio Preference Optimization)
Iterations: 12 rounds of self-improvement
Judge: Claude Sonnet 4 with debate rubric
LoRA Rank: 32
LoRA Alpha: 64
Target Modules: q_proj, k_proj, v_proj, o_proj

Training Progression

Iteration	Mean Score	Best Score	Zero Rate	Pairs
1	0.198	0.85	18.2%	644
6	0.295	0.91	14.2%	64
8	0.303	0.93	13.8%	66
12	0.285	0.89	13.5%	228

Mean score improved from 0.198 to 0.303 (+53%)
Zero-score rate decreased from 18.2% to 13.5%
2,600 discovered arguments in argument book

Usage

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load base model
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-30B-A3B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-30B-A3B")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "debaterhub/debate-orpo-iter12")