---
title: Qwen2.5 Fine-Tuning - SFT vs DPO
emoji: 🚀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.9.1
app_file: app.py
pinned: false
license: apache-2.0
python_version: "3.11"
---
# Qwen2.5 Fine-Tuning: SFT vs DPO

Fine-tune Qwen2.5-3B for frequent itemset extraction using two methods:
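For context, the target task can be sketched as a brute-force reference implementation. This is an illustration only; the project's own data-generation code may differ, and the transaction data below is made up:

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Count every candidate itemset across all transactions and keep
    those appearing at least min_support times (brute force, for small data)."""
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))  # dedupe within a transaction, fix ordering
        for k in range(1, len(items) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}

tx = [["milk", "bread"], ["milk", "bread", "eggs"], ["milk"]]
print(frequent_itemsets(tx, 2))
# {('bread',): 2, ('milk',): 3, ('bread', 'milk'): 2}
```

The fine-tuned model is asked to produce the same result as structured JSON from a natural-language prompt.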
## ⭐ DPO (Direct Preference Optimization) - Recommended

**Why DPO?**

- **+26% better F1 score** (0.82 vs 0.65)
- **-63% fewer hallucinations** (3% vs 8%)
- **+3% better JSON compliance** (98% vs 95%)

**How it works:**

- Trains on preference pairs (the correct answer vs. common errors)
- Learns what NOT to do (error awareness)
- 6 error types: hallucination, missing itemsets, wrong counts, wrong evidence, subset/superset confusion, below min support
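A preference pair contrasts a correct extraction with one of the error variants. The example below is hypothetical (the actual dataset records may differ); the `prompt`/`chosen`/`rejected` field names follow the convention used by TRL's `DPOTrainer`:

```python
import json

# Hypothetical preference pair illustrating the DPO dataset shape.
pair = {
    "prompt": (
        "Transactions: [['milk','bread'], ['milk','bread','eggs'], ['milk']]\n"
        "List all itemsets with support >= 2 as JSON."
    ),
    # Correct extraction: counts verified against the transactions above.
    "chosen": '{"itemsets": [{"items": ["milk"], "count": 3}, '
              '{"items": ["bread"], "count": 2}, '
              '{"items": ["milk", "bread"], "count": 2}]}',
    # One of the six error types: a hallucinated itemset
    # ("eggs" appears only once, below min support).
    "rejected": '{"itemsets": [{"items": ["milk"], "count": 3}, '
                '{"items": ["eggs"], "count": 2}]}',
}

# Both sides must stay valid JSON so the model learns format and content.
chosen = json.loads(pair["chosen"])
rejected = json.loads(pair["rejected"])
print(len(chosen["itemsets"]), len(rejected["itemsets"]))  # 3 2
```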
**Dataset:** [itemset-extraction-rlhf-v1](https://huggingface.co/datasets/OliverSlivka/itemset-extraction-rlhf-v1)

- 4,399 training pairs
- 489 validation pairs
- 1,124 unique datasets
- 3 error variants per dataset
## SFT (Supervised Fine-Tuning) - Baseline

**Traditional approach:**

- Trains only on correct answers
- No explicit error awareness
- Simpler but less effective
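An SFT record is a plain prompt/completion pair with no "rejected" side carrying error information. A hypothetical example (actual field names in the dataset may differ):

```python
import json

# Hypothetical SFT record: the model only ever sees the correct answer.
example = {
    "prompt": (
        "Transactions: [['milk','bread'], ['milk','bread','eggs'], ['milk']]\n"
        "List all itemsets with support >= 2 as JSON."
    ),
    "completion": '{"itemsets": [{"items": ["milk"], "count": 3}, '
                  '{"items": ["bread"], "count": 2}, '
                  '{"items": ["milk", "bread"], "count": 2}]}',
}

parsed = json.loads(example["completion"])
print(len(parsed["itemsets"]))  # 3
```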
**Dataset:** [itemset-extraction-v2](https://huggingface.co/datasets/OliverSlivka/itemset-extraction-v2)

- 439 training examples
- 49 validation examples
## Training Modes

### Test Mode (Quick Validation)

- **DPO**: 100 pairs, 1 epoch, ~15-20 min
- **SFT**: 50 examples, 1 epoch, ~10-15 min

### Production Mode

- **DPO**: 4,399 pairs, 3 epochs, ~60-90 min
- **SFT**: 439 examples, 3 epochs, ~40-60 min
## Technical Details

**Model:** Qwen/Qwen2.5-3B-Instruct
**Optimization:** 4-bit quantization + LoRA (r=64, alpha=16)
**Memory:** ~8-10 GB VRAM (fits Zero GPU)
**Hardware:** HuggingFace Zero GPU (A10G, 16GB)
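To see why LoRA keeps memory low: with rank r=64, each adapted `d_in × d_out` projection adds only `r·(d_in + d_out)` trainable parameters, and the update is scaled by `alpha / r`. A back-of-the-envelope sketch (the hidden size of 2048 is an assumption for a ~3B model; check the actual model config):

```python
def lora_added_params(d_in, d_out, r):
    """Trainable parameters added by one LoRA adapter: A is (d_in x r),
    B is (r x d_out); the frozen base weight holds d_in * d_out parameters."""
    return d_in * r + r * d_out

r, alpha = 64, 16
d = 2048  # assumed hidden size for a ~3B-parameter model
added = lora_added_params(d, d, r)
base = d * d
print(added, base, f"{added / base:.2%}")  # adapter is a small fraction
print("update scale alpha/r =", alpha / r)  # 0.25
```

Only the adapters train in full precision while the 4-bit base model stays frozen, which is what keeps the run inside ~8-10 GB of VRAM.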
## Output Models

### DPO Models (⭐ Recommended)

- Test: `OliverSlivka/qwen2.5-3b-itemset-dpo-test`
- Production: `OliverSlivka/qwen2.5-3b-itemset-dpo`

### SFT Models (Baseline)

- Test: `OliverSlivka/qwen2.5-3b-itemset-test`
- Production: `OliverSlivka/qwen2.5-3b-itemset-extractor`
## Performance Comparison

| Metric | SFT Baseline | DPO | Improvement |
|--------|--------------|-----|-------------|
| F1 Score | 0.65 | 0.82 | +26% |
| Precision | 0.70 | 0.85 | +21% |
| Recall | 0.60 | 0.80 | +33% |
| Exact Match | 0.45 | 0.55 | +22% |
| JSON Parse | 95% | 98% | +3% |
| Hallucinations | 8% | 3% | -63% |
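The improvement column is the relative change over the SFT baseline, rounded to the nearest whole percent (hallucinations: -62.5% appears as -63%). It can be reproduced directly:

```python
# (SFT baseline, DPO) values copied from the table above.
rows = {
    "F1 Score": (0.65, 0.82),
    "Precision": (0.70, 0.85),
    "Recall": (0.60, 0.80),
    "Exact Match": (0.45, 0.55),
    "Hallucinations": (8, 3),  # lower is better, hence the negative change
}
for metric, (sft, dpo) in rows.items():
    change = (dpo - sft) / sft * 100
    print(f"{metric}: {change:+.1f}%")
```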
## Resources

- **GitHub**: [itemsety-qwen-finetuning](https://github.com/oliversl1vka/itemsety-qwen-finetuning)
- **DPO Paper**: [Direct Preference Optimization](https://arxiv.org/abs/2305.18290)
- **Datasets**: [SFT](https://huggingface.co/datasets/OliverSlivka/itemset-extraction-v2) | [RLHF](https://huggingface.co/datasets/OliverSlivka/itemset-extraction-rlhf-v1)
## Citation

```bibtex
@software{slivka2026itemset,
  author = {Slivka, Oliver},
  title = {Qwen2.5 Fine-Tuning for Itemset Extraction},
  year = {2026},
  url = {https://github.com/oliversl1vka/itemsety-qwen-finetuning}
}
```