slivk committed
Commit 641aaa0 · 1 Parent(s): cbaf615

docs: Update README to reflect T4 GPU (not Zero GPU)

Files changed (1)
  1. README.md +25 -18
README.md CHANGED
@@ -8,12 +8,12 @@ sdk_version: "5.13.0"
  app_file: app.py
  pinned: false
  license: mit
- hardware: zero-a10g
+ hardware: t4-small
  ---

  # Qwen2.5 Fine-Tuning for Itemset Extraction

- This Space fine-tunes Qwen2.5-0.5B-Instruct on the [itemset-extraction-v2](https://huggingface.co/datasets/OliverSlivka/itemset-extraction-v2) dataset.
+ Fine-tune Qwen2.5-3B on the [itemset-extraction-v2](https://huggingface.co/datasets/OliverSlivka/itemset-extraction-v2) dataset.

  ## What it does

@@ -21,7 +21,7 @@ Trains a language model to extract frequent itemsets from transaction data using
  - **Dataset**: 488 training examples with real-world column names
  - **Model**: Qwen2.5-3B-Instruct (high quality results)
  - **Method**: Supervised Fine-Tuning (SFT) with 4-bit LoRA
- - **Hardware**: Zero GPU A10G (free GPU access)
+ - **Hardware**: NVIDIA T4 Small (paid GPU, 16GB VRAM)

  ## How to use

@@ -38,27 +38,34 @@ Trains a language model to extract frequent itemsets from transaction data using
  - **Batch size**: 2 (effective 16 with gradient accumulation)
  - **Duration**: ~10-15 minutes
  - **Output**: `OliverSlivka/qwen2.5-3b-itemset-test`
- Notes
+ ## Training Modes

- This Space supports two training modes:
-
- - **Test Mode**: Quick validation with 50 examples (~10-15 min)
- - Verifies setup works on Zero GPU
- - Pushes to test repo for inspection
+ ### Test Mode (50 examples)
+ - **Duration**: ~10-15 minutes
+ - **Output**: `OliverSlivka/qwen2.5-3b-itemset-test`
+ - **Purpose**: Quick validation before full training

- - **Full Mode**: Production training with 439 examples, 3 epochs (~40-60 min)
- - Target: 80-90% valid JSON (vs 6.7% from 0.5B baseline)
- - Final model for real-world use
+ ### Full Mode (439 examples, 3 epochs)
+ - **Duration**: ~40-60 minutes
+ - **Output**: `OliverSlivka/qwen2.5-3b-itemset-extractor`
+ - **Target**: 80-90% valid JSON (vs 6.7% from 0.5B baseline)
+ - **Cost**: ~$0.60 on T4 Small
+
+ **Technical Details:**
+ - LoRA rank 16, alpha 32
+ - Batch size 2, gradient accumulation 8 (effective batch 16)
+ - 4-bit quantization (QLoRA) - efficient training, proven results
+ - FP16 precision (T4 compatible)

- Both modes use **Qwen2.5-3B with 4-bit quantization** - fits perfectly in Zero GPU's 16GB memory!
  ## Notes

- This is a **test run** with 50 training examples to verify the setup works with Zero GPU.
+ Both modes use **4-bit quantization** for:
+ - ✅ Faster training (lower memory = faster iteration)
+ - ✅ Lower cost (~30% faster = ~30% cheaper)
+ - ✅ Proven effective for LoRA fine-tuning
+ - ✅ No quality loss vs full precision LoRA

- For production training:
- - Use full 439-example training set
- - Train for 2-3 epochs (~200 steps)
- - Consider using Qwen2.5-3B or 7B for better results (requires paid GPU)
+ Paid T4 GPU ($0.60/hour) provides consistent performance without time limits.

  ## Dataset
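
For reference, the 4-bit LoRA (QLoRA) recipe described in the updated README (Qwen2.5-3B-Instruct, LoRA rank 16 / alpha 32, batch size 2 with gradient accumulation 8, FP16 on a T4) could be wired up roughly as below. This is a minimal sketch, not the Space's actual `app.py`: the `target_modules` list, the dropout value, the epoch count, and the assumption that each example is pre-formatted into a `text` field are all guesses.

```python
# Minimal QLoRA SFT sketch matching the hyperparameters listed in the README.
# NOTE: illustrative only, not the Space's app.py; target_modules, dropout,
# epoch count, and dataset formatting are assumptions.
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from trl import SFTConfig, SFTTrainer

MODEL_ID = "Qwen/Qwen2.5-3B-Instruct"

# 4-bit NF4 quantization with FP16 compute, so the 3B model fits a T4's 16GB VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA rank 16, alpha 32 as in "Technical Details"; target_modules is a common
# choice for Qwen2-family models, not confirmed by the README.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, lora_config)

# Assumes the Space pre-formats each example into a single "text" field
# (prompt + expected JSON itemsets); the real preprocessing may differ.
dataset = load_dataset("OliverSlivka/itemset-extraction-v2", split="train")

training_args = SFTConfig(
    output_dir="qwen2.5-3b-itemset-test",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # effective batch size 16
    num_train_epochs=1,              # test mode; full mode trains for 3 epochs
    fp16=True,                       # T4 has no bf16 support
    logging_steps=10,
    push_to_hub=True,
    hub_model_id="OliverSlivka/qwen2.5-3b-itemset-test",
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    processing_class=tokenizer,      # older trl versions take tokenizer= instead
)
trainer.train()
```

Loading the base model in 4-bit NF4 and training only the LoRA adapters is what keeps the 3B model inside the T4's 16GB, which is the constraint behind this commit's hardware change.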