Commit 9925b01 (verified) · 1 parent: 43e6264 · committed by eeeeeeeeeeeeee3

Upload MEMORY_REDUCTION_GUIDE.md with huggingface_hub

# Memory Reduction Guide - 20% Reduction Applied

## Overview
The training configuration has been optimized to reduce system memory requirements by approximately 20% while maintaining training quality.

## Changes Applied

### 1. Resolution Reduction (Biggest Impact)
- **Original:** 1288x1288
- **New:** 1152x1152
- **Memory Impact:** ~20% reduction in activation memory
- **Rationale:** Image area reduced from 1,658,944 to 1,327,104 pixels (~80% of original)
- **Note:** 1152 is divisible by 64, maintaining efficiency
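
As a quick sanity check on the ~80% figure, the pixel-count arithmetic can be reproduced directly:

```python
# Activation memory scales roughly with input pixel count, so
# comparing image areas approximates the savings.
original_pixels = 1288 * 1288  # 1,658,944
reduced_pixels = 1152 * 1152   # 1,327,104

ratio = reduced_pixels / original_pixels
print(f"pixel ratio: {ratio:.2f}")                     # 0.80
print(f"approx. activation savings: {1 - ratio:.0%}")  # 20%
```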

### 2. Multi-Scale Training Disabled
- **Original:** `multi_scale: true`
- **New:** `multi_scale: false`
- **Memory Impact:** Significant savings, since each image is no longer processed at several scales
- **Rationale:** Single-scale training avoids holding activations for multiple resolutions at once

### 3. Expanded Scales Disabled
- **Original:** `expanded_scales: true`
- **New:** `expanded_scales: false`
- **Memory Impact:** Additional memory savings
- **Rationale:** Removes the memory overhead of expanded-scale processing

### 4. Data Loading Optimizations
- **num_workers:** 2 → 1 (halves the number of worker processes)
- **pin_memory:** true → false (frees page-locked RAM)
- **prefetch_factor:** default → 1 (fewer batches buffered ahead of time)
- **persistent_workers:** false (workers are torn down between epochs)
- **Memory Impact:** Reduces the data-loading memory footprint
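
A back-of-the-envelope sketch of the prefetch buffer these settings control. The sizes assume float32 RGB tensors at the new 1152x1152 resolution (PyTorch's per-worker `prefetch_factor` defaults to 2); the real footprint depends on the dataset pipeline and augmentations:

```python
# Approximate RAM held by batches buffered ahead of the training loop.
BYTES_PER_IMAGE = 1152 * 1152 * 3 * 4  # float32 RGB, ~15.2 MB per image
BATCH_SIZE = 2

def prefetch_ram_mb(num_workers: int, prefetch_factor: int) -> float:
    """Rough RAM estimate for batches buffered by DataLoader workers."""
    buffered_batches = num_workers * prefetch_factor
    return buffered_batches * BATCH_SIZE * BYTES_PER_IMAGE / 2**20

before = prefetch_ram_mb(num_workers=2, prefetch_factor=2)  # old settings
after = prefetch_ram_mb(num_workers=1, prefetch_factor=1)   # new settings
print(f"prefetch buffer: ~{before:.0f} MB -> ~{after:.0f} MB")
```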

### 5. Gradient Accumulation Adjusted
- **Original:** 16 steps
- **New:** 20 steps
- **Rationale:** Raises the effective batch size (2 × 20 = 40 vs. 2 × 16 = 32) without increasing memory
- **Note:** Gradient accumulation adds computation time per optimizer step, not memory
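
The accumulation scheme can be sketched with plain numbers standing in for gradient tensors (the function names here are illustrative, not from the actual training script):

```python
# Gradients from `accum_steps` micro-batches are summed before one
# optimizer step, so the effective batch size is batch_size * accum_steps
# while peak memory stays that of a single micro-batch.
def effective_batch_size(batch_size: int, accum_steps: int) -> int:
    return batch_size * accum_steps

def run_accumulation(micro_batch_grads: list[float], accum_steps: int) -> list[float]:
    """Return the summed gradient applied at each optimizer step."""
    updates, running = [], 0.0
    for i, grad in enumerate(micro_batch_grads, start=1):
        running += grad              # accumulate instead of stepping
        if i % accum_steps == 0:
            updates.append(running)  # optimizer.step() would fire here
            running = 0.0            # ...followed by optimizer.zero_grad()
    return updates

print(effective_batch_size(2, 20))  # 40, up from 2 * 16 = 32
```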

## Expected Memory Reduction

### Total Reduction: ~20-25%

Breakdown:
- Resolution reduction: ~20% of activation memory
- Multi-scale disabled: ~5-10% additional savings
- Data loading optimizations: ~3-5% additional savings
- **Total: ~20-25% system memory reduction** (below the simple sum, since the resolution cut applies only to activation memory)
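
These percentages do not add linearly, because the resolution cut applies only to the activation slice of total memory. A toy model with assumed (not measured) memory shares shows how the pieces still land in the ~20-25% range:

```python
# Illustrative only: the 40 GB total and the 70% activation share are
# assumptions, not measurements from the actual training run.
total_gb = 40.0
activations_gb = 28.0  # assumed resolution-dependent share
other_gb = total_gb - activations_gb

new_total = activations_gb * 0.80 + other_gb  # resolution: -20% of activations
new_total *= 1 - 0.07                         # multi-scale off: ~5-10%
new_total *= 1 - 0.04                         # data loading: ~3-5%

reduction = 1 - new_total / total_gb
print(f"estimated total reduction: ~{reduction:.0%}")  # ~23%
```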

## Training Quality Impact

### Minimal Impact Expected
- **Resolution:** 1152 is still high resolution, sufficient for ball detection
- **Multi-scale:** Disabled, but single-scale training is standard and effective
- **Batch size:** Maintained at 2 (same as original)
- **Effective batch size:** Increased to 40 (from 32) via gradient accumulation

## Usage

### Option 1: Use the Low-Memory Config
```bash
cd /workspace/soccer_cv_ball
./scripts/resume_training_low_memory.sh
```

### Option 2: Manual Command
```bash
cd /workspace/soccer_cv_ball
python scripts/train_ball.py \
    --config configs/resume_20_epochs_low_memory.yaml \
    --output-dir models
```

## Monitoring Memory Usage

### Check System Memory
```bash
# Monitor system RAM
free -h

# Monitor GPU memory (if using a GPU)
nvidia-smi
```

### If Still Running Out of Memory

Further reductions are possible:
1. **Reduce resolution further:** 1152 → 1024 (an additional ~21% cut in activation memory)
2. **Reduce batch size:** 2 → 1 (halves per-step memory, but slows training)
3. **Disable gradient accumulation:** Shrinks the effective batch size; saves only a little memory, since accumulation itself is cheap
4. **Enable gradient checkpointing:** Trades compute for memory (if supported)
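
The arithmetic behind option 1, under the same assumption that activation memory scales with pixel count:

```python
# Dropping the input from 1152x1152 to 1024x1024 removes roughly
# another fifth of the activation memory.
current_pixels = 1152 * 1152
further_pixels = 1024 * 1024

extra_savings = 1 - further_pixels / current_pixels
print(f"additional activation savings: ~{extra_savings:.0%}")  # ~21%
```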

## Configuration File

The low-memory configuration is saved at:
- `configs/resume_20_epochs_low_memory.yaml`
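
For reference, the low-memory overrides in that file would look roughly like this. This is a sketch: the key names are assumed to mirror the settings listed above, not copied from the actual file:

```yaml
# Assumed key names; check the real file for the exact schema.
resolution: 1152          # was 1288
multi_scale: false        # was true
expanded_scales: false    # was true
num_workers: 1            # was 2
pin_memory: false         # was true
prefetch_factor: 1        # was default (2 per worker)
persistent_workers: false
grad_accum_steps: 20      # was 16; effective batch 2 x 20 = 40
batch_size: 2             # unchanged
```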

## Comparison

| Setting | Original | Low Memory | Reduction |
|---------|----------|------------|-----------|
| Resolution | 1288 | 1152 | ~20% (pixel area) |
| Multi-scale | true | false | - |
| Expanded scales | true | false | - |
| num_workers | 2 | 1 | 50% |
| pin_memory | true | false | - |
| prefetch_factor | default | 1 | - |
| grad_accum_steps | 16 | 20 | - (raises effective batch to 40) |

## Notes

- The checkpoint loads correctly despite the resolution change (RF-DETR handles this)
- Training resumes from epoch 20
- All other training parameters (learning rate, optimizer, etc.) remain the same
- Model architecture unchanged (RF-DETR Base)