# Qwen 2.5 7B SoftLabel
A LoRA adapter for Qwen 2.5 7B Instruct, fine-tuned with a KL-divergence loss against soft probability distributions from a Bayesian Graded Response Model (GRM) teacher for Big Five personality prediction.
Unlike standard cross-entropy on hard labels, KL-divergence training preserves the teacher's uncertainty, producing calibrated probability estimates over 5-point Likert responses.
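As a minimal sketch of this objective (tensor names and shapes are illustrative, not the actual training code), the student's log-probabilities over the five answer options are matched to the teacher's soft targets:

```python
import torch
import torch.nn.functional as F

def soft_label_kl_loss(student_logits: torch.Tensor,
                       teacher_probs: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) over the Likert options, averaged per batch item.

    student_logits: (batch, 5) raw scores over the responses "1".."5"
    teacher_probs:  (batch, 5) soft targets from the Bayesian GRM teacher
    """
    # F.kl_div expects log-probabilities as its first argument
    student_log_probs = F.log_softmax(student_logits, dim=-1)
    # reduction="batchmean" sums the pointwise KL and divides by batch size
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
```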
## Training
| Setting | Value |
|---|---|
| Base model | Qwen 2.5 7B Instruct |
| Loss | KL divergence (batchmean) |
| Precision | bf16 |
| Infrastructure | University cluster (SLURM), 4x NVIDIA RTX A6000 48GB |
## Data
- 11,250 train / 1,250 valid / 3,125 test episodes
- Each episode: a multi-turn IPIP-50 personality questionnaire with soft-label targets over responses 1–5 (an illustrative record is sketched below)
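The card does not spell out the episode schema, so purely as an illustration (the field names here are hypothetical), a single soft-label training record might look like:

```python
# Hypothetical record layout for one IPIP-50 item; the field names
# ("messages", "soft_label") are illustrative, not the dataset's schema.
example = {
    "messages": [
        {"role": "user",
         "content": "I am the life of the party. Answer 1 (disagree) to 5 (agree)."},
    ],
    # Teacher's soft distribution over Likert responses 1..5; the student
    # is trained to match it with the KL loss sketched above.
    "soft_label": [0.05, 0.15, 0.40, 0.30, 0.10],
}
```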
## Hyperparameters
| Hyperparameter | Value |
|---|---|
| LoRA r / alpha / dropout | 16 / 16 / 0.05 |
| Target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Learning rate | 1.5e-4 (cosine schedule, 100 warmup steps) |
| Effective batch size | 32 (2 per GPU x 4 GPUs x 4 grad accum) |
| Max epochs | 3 (early stopping, patience=5) |
| Optimizer | AdamW (fused, weight decay 0.01) |
| Max sequence length | 4096 |
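The table translates directly into PEFT/Transformers configuration. The sketch below is a reconstruction from the reported values (the output path is a placeholder), not the original training script:

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qwen2.5-7b-softlabel",  # placeholder path
    learning_rate=1.5e-4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    per_device_train_batch_size=2,      # 2 x 4 GPUs x 4 accum = 32 effective
    gradient_accumulation_steps=4,
    num_train_epochs=3,
    weight_decay=0.01,
    optim="adamw_torch_fused",
    bf16=True,
)
```

Early stopping (patience=5) would sit on top of this via `transformers.EarlyStoppingCallback`, and the 4096-token limit is applied at tokenization time.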
## Results
| Metric | Value |
|---|---|
| Best eval loss (KL div) | 0.000756 |
| Final train loss | 0.0008 |
| Best checkpoint | Step 1000 |
| Test accuracy | – |
| Teacher ceiling | 51.28% |
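Since the adapter is meant to produce calibrated distributions rather than a single answer, a usage sketch (the adapter path and prompt are placeholders) reads the probabilities for "1".."5" off the next-token logits:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
model = PeftModel.from_pretrained(base, "path/to/softlabel-adapter")  # placeholder

prompt = "I am the life of the party. Answer with a number from 1 to 5: "
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    logits = model(**inputs).logits[0, -1]  # logits for the next token

# Token ids of the answer strings "1".."5", renormalized over the five options
option_ids = [tokenizer.encode(str(i), add_special_tokens=False)[0]
              for i in range(1, 6)]
probs = torch.softmax(logits[option_ids], dim=-1)
print({i + 1: round(p.item(), 3) for i, p in enumerate(probs)})
```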