Slim model card: KL div method, hyperparams, infrastructure
README.md
CHANGED
@@ -10,101 +10,46 @@ tags:
 - personality-prediction
 - big-five
 - bayesian-grm
-datasets:
-- custom
 ---
 
 # Qwen 2.5 7B SoftLabel
 
-LoRA adapter
-
-| | |
-|---|---|
 | **Base model** | [Qwen 2.5 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
-| **Teacher** | Bayesian GRM with IPIP-50 items |
-| **Test accuracy** | TBD |
-| **Teacher ceiling** | 51.28% |
-
-## Training Details
 
 ### Data
 
-- **Test set**: 3,125 episodes
-- Each episode contains ~10 IPIP-50 questionnaire items with soft probability targets over responses 1-5
 ### Hyperparameters
 
-| | |
-|---|---|
-| LoRA
-| Warmup steps | 100 |
-| Weight decay | 0.01 |
-| Optimizer | AdamW (fused) |
-| Max grad norm | 1.0 |
-| Precision | bf16 |
-| Per-GPU batch size | 2 |
-| Gradient accumulation | 4 |
-| Effective batch size | 64 (2 x 8 GPUs x 4) |
-| Max epochs | 3 (with early stopping) |
-| Early stopping patience | 5 evaluations |
 | Max sequence length | 4096 |
-| Seed | 3407 |
 
-### Results
-
-| | |
-|---|---|
-| Best checkpoint | Step 1000 |
 | Best eval loss (KL div) | 0.000756 |
-| Final train loss | 0.0008 |
-
-- **Hardware**: NVIDIA RTX A6000 48GB x 8
-- **Infrastructure**: University cluster (SLURM, 8x A6000)
-- **Attention**: SDPA (auto-dispatch to FlashAttention/math backend)
-- **Gradient checkpointing**: Enabled (use_reentrant=False)
-- **DDP**: 8-way data parallel (NCCL backend)
-
-## Usage
-
-```python
-from transformers import AutoModelForCausalLM, AutoTokenizer
-from peft import PeftModel
-
-base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
-model = PeftModel.from_pretrained(base_model, "DavidL123/qwen-2.5-7b-SoftLabel")
-tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
-```
-
-## How It Works
-
-1. **Bayesian GRM Teacher**: A psychometric model estimates posterior probability distributions over 5-point Likert responses for each IPIP-50 personality item
-2. **Soft Labels**: Instead of hard labels (e.g., "the answer is 4"), the target is a full distribution (e.g., [0.02, 0.08, 0.15, 0.50, 0.25])
-3. **KL Divergence Loss**: The LLM is trained to minimize KL(teacher || student) over the 5 answer tokens, producing calibrated probability estimates
-4. **Evaluation**: At test time, the model's argmax prediction over the 5 tokens is compared to the teacher's argmax (teacher ceiling: 51.28%)
-
-## Citation
-
-```
-@misc{levy2026personality,
-  title={Personality Prediction via Soft-Label Fine-Tuning with Bayesian Psychometric Teachers},
-  author={David Levy},
-  year={2026},
-}
-```
+LoRA adapter fine-tuned with **KL divergence** against soft probability distributions from a Bayesian Graded Response Model (GRM) teacher for Big Five personality prediction.
+
+Unlike standard cross-entropy (hard labels), KL divergence training preserves the teacher's uncertainty, producing calibrated probability estimates over 5-point Likert responses.
+
+## Training
+
+| | |
+|---|---|
 | **Base model** | [Qwen 2.5 7B Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) |
+| **Loss** | KL Divergence (batchmean) |
+| **Precision** | bf16 |
+| **Infrastructure** | University cluster (SLURM) — 4x NVIDIA RTX A6000 48GB |
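The KL-divergence objective described above can be sketched as follows. This is a minimal illustration, assuming the student's logits have already been gathered at the five Likert answer tokens; `soft_label_kl_loss` and the tensor names are hypothetical, not taken from the actual training code.

```python
import torch
import torch.nn.functional as F

def soft_label_kl_loss(answer_logits: torch.Tensor,
                       teacher_probs: torch.Tensor) -> torch.Tensor:
    """KL(teacher || student) with batchmean reduction.

    answer_logits: (batch, 5) student logits at the tokens for "1".."5".
    teacher_probs: (batch, 5) soft targets from the Bayesian GRM teacher.
    """
    # F.kl_div expects log-probabilities as input and computes
    # target * (log(target) - input) pointwise, then reduces.
    student_log_probs = F.log_softmax(answer_logits, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Example: teacher distribution concentrated on response 4.
logits = torch.tensor([[0.1, 0.2, 0.5, 2.0, 1.0]])
teacher = torch.tensor([[0.02, 0.08, 0.15, 0.50, 0.25]])
loss = soft_label_kl_loss(logits, teacher)
```

With `reduction="batchmean"` the summed divergence is divided by the batch size, matching the "KL Divergence (batchmean)" loss named in the card.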
 
 ### Data
 
+- 11,250 train / 1,250 valid / 3,125 test episodes
+- Each episode: multi-turn IPIP-50 personality questionnaire with soft label targets over responses 1–5
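As an illustration of the episode format described here, a single questionnaire item with its soft target might look like the record below. The field names are assumptions for illustration, not the dataset's actual schema; the example distribution is the one quoted elsewhere in the card.

```python
# Hypothetical episode record; keys are illustrative, not the real schema.
episode = {
    "items": [
        {
            "text": "I am the life of the party.",  # an IPIP-50 item
            "soft_target": [0.02, 0.08, 0.15, 0.50, 0.25],  # P(response = 1..5)
        },
        # ...more questionnaire items in the same episode
    ],
}

# Each soft target is a full probability distribution over the five responses.
probs = episode["items"][0]["soft_target"]
```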
 ### Hyperparameters
 
+| | |
+|---|---|
+| LoRA r / alpha / dropout | 16 / 16 / 0.05 |
+| Target modules | q, k, v, o, gate, up, down proj |
+| Learning rate | 1.5e-4 (cosine schedule, 100 warmup steps) |
+| Effective batch size | 32 (2 per-GPU x 4 GPUs x 4 grad accum) |
+| Max epochs | 3 (early stopping, patience=5) |
+| Optimizer | AdamW fused (weight decay 0.01) |
 | Max sequence length | 4096 |
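The LoRA settings in the table can be expressed as a PEFT adapter config. This is a sketch assuming the standard PEFT/Transformers module names for Qwen 2.5's attention and MLP projections, not the card's actual training script.

```python
from peft import LoraConfig

# Sketch of the adapter config implied by the hyperparameter table above.
# Module names are an assumption based on Qwen 2.5's projection layers.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```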
 
+### Results
+
+| | |
+|---|---|
 | Best eval loss (KL div) | 0.000756 |
+| Final train loss | 0.0008 |
+| Best checkpoint | Step 1000 |
+| Test accuracy | — |
+| Teacher ceiling | 51.28% |
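The accuracy metric implied by the teacher ceiling, the student's argmax over the five Likert responses compared against the teacher's argmax, can be sketched as below; the function name and tensor layout are assumptions for illustration.

```python
import torch

def argmax_agreement(student_probs: torch.Tensor,
                     teacher_probs: torch.Tensor) -> float:
    """Fraction of items where the student's most likely response
    (argmax over the 5 Likert options) matches the teacher's."""
    student_pred = student_probs.argmax(dim=-1)
    teacher_pred = teacher_probs.argmax(dim=-1)
    return (student_pred == teacher_pred).float().mean().item()
```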