File size: 2,972 Bytes
d92895b
9a0325b
bd3c7c9
 
d92895b
 
25353fc
d92895b
 
9a0325b
bfa87d1
d92895b
 
9a0325b
bd3c7c9
9a0325b
bd3c7c9
9a0325b
bd3c7c9
9a0325b
 
 
 
bd3c7c9
9a0325b
 
 
 
bd3c7c9
9a0325b
 
 
 
 
bd3c7c9
9a0325b
 
 
 
 
 
 
 
 
 
bd3c7c9
641aaa0
bd3c7c9
9a0325b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
641aaa0
9a0325b
 
 
bd3c7c9
9a0325b
 
 
bd3c7c9
9a0325b
bd3c7c9
9a0325b
 
 
 
 
 
 
 
bd3c7c9
9a0325b
bd3c7c9
9a0325b
 
 
bd3c7c9
9a0325b
bd3c7c9
9a0325b
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
---
title: Qwen2.5 Fine-Tuning - SFT vs DPO
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: apache-2.0
python_version: "3.11"
---

# Qwen2.5 Fine-Tuning: SFT vs DPO

Fine-tune Qwen2.5-3B for frequent itemset extraction using two methods:

## ⭐ DPO (Direct Preference Optimization) - Recommended

**Why DPO?**
- **+26% better F1 score** (0.82 vs 0.65)
- **-63% fewer hallucinations** (3% vs 8%)
- **+3% better JSON compliance** (98% vs 95%)

**How it works:**
- Trains on preference pairs (correct answer vs common errors)
- Learns what NOT to do (error awareness)
- 6 error types: hallucination, missing itemsets, wrong counts, wrong evidence, subset/superset confusion, below min support

**Dataset:** [itemset-extraction-rlhf-v1](https://huggingface.co/datasets/OliverSlivka/itemset-extraction-rlhf-v1)
- 4,399 training pairs
- 489 validation pairs
- 1,124 unique datasets
- 3 error variants per dataset

## SFT (Supervised Fine-Tuning) - Baseline

**Traditional approach:**
- Trains only on correct answers
- No explicit error awareness
- Simpler but less effective

**Dataset:** [itemset-extraction-v2](https://huggingface.co/datasets/OliverSlivka/itemset-extraction-v2)
- 439 training examples
- 49 validation examples

## Training Modes

### Test Mode (Quick Validation)
- **DPO**: 100 pairs, 1 epoch, ~15-20 min
- **SFT**: 50 examples, 1 epoch, ~10-15 min

### Production Mode
- **DPO**: 4,399 pairs, 3 epochs, ~60-90 min
- **SFT**: 439 examples, 3 epochs, ~40-60 min

## Technical Details

**Model:** Qwen/Qwen2.5-3B-Instruct  
**Optimization:** 4-bit quantization + LoRA (r=64, alpha=16)  
**Memory:** ~8-10 GB VRAM (fits Zero GPU)  
**Hardware:** HuggingFace Zero GPU (A10G, 16GB)

## Output Models

### DPO Models (⭐ Recommended)
- Test: `OliverSlivka/qwen2.5-3b-itemset-dpo-test`
- Production: `OliverSlivka/qwen2.5-3b-itemset-dpo`

### SFT Models (Baseline)
- Test: `OliverSlivka/qwen2.5-3b-itemset-test`
- Production: `OliverSlivka/qwen2.5-3b-itemset-extractor`

## Performance Comparison

| Metric | SFT Baseline | DPO | Improvement |
|--------|--------------|-----|-------------|
| F1 Score | 0.65 | 0.82 | +26% |
| Precision | 0.70 | 0.85 | +21% |
| Recall | 0.60 | 0.80 | +33% |
| Exact Match | 0.45 | 0.55 | +22% |
| JSON Parse | 95% | 98% | +3% |
| Hallucinations | 8% | 3% | -63% |

## Resources

- **GitHub**: [itemsety-qwen-finetuning](https://github.com/oliversl1vka/itemsety-qwen-finetuning)
- **DPO Paper**: [Direct Preference Optimization](https://arxiv.org/abs/2305.18290)
- **Datasets**: [SFT](https://huggingface.co/datasets/OliverSlivka/itemset-extraction-v2) | [RLHF](https://huggingface.co/datasets/OliverSlivka/itemset-extraction-rlhf-v1)

## Citation

```bibtex
@software{slivka2026itemset,
  author = {Slivka, Oliver},
  title = {Qwen2.5 Fine-Tuning for Itemset Extraction},
  year = {2026},
  url = {https://github.com/oliversl1vka/itemsety-qwen-finetuning}
}
```