eeeeeeeeeeeeee3 committed on
Commit
68479a3
·
verified ·
1 Parent(s): a42160b

Upload IMPLEMENTATION_GUIDE.md with huggingface_hub

  1. IMPLEMENTATION_GUIDE.md +171 -0
IMPLEMENTATION_GUIDE.md ADDED
@@ -0,0 +1,171 @@
1
+ # Implementation Guide: Revised Small Object Optimization Strategy
2
+
3
+ ## Quick Start
4
+
5
+ ### Step 1: Fix Inference Thresholds (Do This First - 5 minutes)
6
+
7
+ **File**: `src/perception/local_detector.py`
8
+
9
+ Change line 17:
10
+ ```python
11
+ # FROM:
12
+ def __init__(self, model_path: str, confidence_threshold: float = 0.5, device: str = None):
13
+
14
+ # TO:
15
+ def __init__(self, model_path: str, confidence_threshold: float = 0.05, device: str = None):
16
+ ```
17
+
18
+ **Why**: Average detection confidence is ~0.140, far below the 0.5 threshold, so all detections are filtered out. Lowering the threshold to 0.05 fixes the 0% SAHI recall issue.
19
+
20
+ **Also Check**: Ensure the SAHI inference scripts use `confidence_threshold=0.05` for the ball class.
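
To illustrate why the old default suppressed everything, here is a minimal sketch of threshold filtering. The `Detection` dataclass, the `filter_detections` helper, and the sample scores are hypothetical and are not taken from `local_detector.py`; the scores are chosen only to sit near the ~0.140 average reported above.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical detection record; not the real detector output type."""
    label: str
    score: float


def filter_detections(detections, confidence_threshold):
    """Keep only detections whose score meets the threshold."""
    return [d for d in detections if d.score >= confidence_threshold]


# Illustrative scores clustered around the reported ~0.140 average.
candidates = [
    Detection("ball", 0.09),
    Detection("ball", 0.14),
    Detection("ball", 0.19),
    Detection("ball", 0.03),
]

kept_old = filter_detections(candidates, confidence_threshold=0.5)
kept_new = filter_detections(candidates, confidence_threshold=0.05)
print(len(kept_old), len(kept_new))
```

With these sample scores the 0.5 threshold keeps nothing, while 0.05 keeps every candidate except the 0.03 one. A lower threshold also admits more false positives, so downstream filtering or tracking may need to absorb them.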
21
+
22
+ ---
23
+
24
+ ### Step 2: After Epoch 40 Completes - Start Phase 1
25
+
26
+ **Action**: Switch to domain adaptation config
27
+
28
+ **Command**:
29
+ ```bash
30
+ cd /workspace/soccer_cv_ball
31
+
32
+ # Update config: Set start_epoch: 40 in resume_with_domain_adaptation.yaml
33
+ # Then run:
34
+ python scripts/train_ball.py \
35
+ --config configs/resume_with_domain_adaptation.yaml \
36
+ --output-dir models \
37
+ --resume models/checkpoint.pth
38
+ ```
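
The "update config" step can be scripted rather than done by hand. The sketch below assumes `start_epoch` is a top-level key written as `start_epoch: <int>` at the start of a line; the real layout of `resume_with_domain_adaptation.yaml` is not shown in this guide, and both the `set_start_epoch` helper and the sample config text are hypothetical.

```python
import re


def set_start_epoch(config_text: str, epoch: int) -> str:
    """Return config_text with the top-level `start_epoch` value replaced.

    Assumes the key appears at the start of a line as `start_epoch: <int>`.
    Raises ValueError if the key is missing instead of silently doing nothing.
    """
    pattern = re.compile(r"^(start_epoch:[ \t]*)\d+", flags=re.MULTILINE)
    new_text, count = pattern.subn(lambda m: f"{m.group(1)}{epoch}", config_text)
    if count == 0:
        raise ValueError("no top-level `start_epoch` key found")
    return new_text


# Hypothetical config contents, used only to exercise the helper.
example = "epochs: 50\nstart_epoch: 0  # resume point\nbatch_size: 2\n"
updated = set_start_epoch(example, 40)
print(updated)
```

A plain-text substitution keeps comments and key order intact, which a YAML load-and-dump round trip would not guarantee. If the key is nested in the real config, the pattern would need adjusting.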
39
+
40
+ **Expected**: Small objects mAP should improve from 0.598 to 0.63-0.65 over 5 epochs
41
+
42
+ **Monitor**: Run evaluation after epoch 45:
43
+ ```bash
44
+ python scripts/comprehensive_training_evaluation.py configs/resume_with_domain_adaptation.yaml
45
+ ```
46
+
47
+ ---
48
+
49
+ ### Step 3: After Epoch 45 - Start Phase 1.5
50
+
51
+ **Action**: Switch to high-resolution config with gradient accumulation
52
+
53
+ **Command**:
54
+ ```bash
55
+ # Update config: Set start_epoch: 45 in resume_with_highres_gradaccum.yaml
56
+ # Then run:
57
+ python scripts/train_ball.py \
58
+ --config configs/resume_with_highres_gradaccum.yaml \
59
+ --output-dir models \
60
+ --resume models/checkpoints/checkpoint_epoch_45_lightweight.pth
61
+ ```
62
+
63
+ **Expected**: Small objects mAP should improve from 0.63 to 0.65-0.66
64
+
65
+ **Monitor GPU Memory**:
66
+ ```bash
67
+ watch -n 1 nvidia-smi
68
+ ```
69
+
70
+ If an out-of-memory (OOM) error occurs, reduce `batch_size` to 1 and increase `grad_accum_steps` to 32 to keep the effective batch size at about 32.
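
The arithmetic behind this fallback is that the effective batch size is the per-step batch multiplied by the number of accumulation steps. The sketch below assumes a single GPU; the helper names are hypothetical and are not part of `train_ball.py` or RF-DETR.

```python
def effective_batch_size(batch_size: int, grad_accum_steps: int) -> int:
    """Samples contributing to each optimizer step on a single device."""
    return batch_size * grad_accum_steps


def accum_steps_for(target_effective: int, batch_size: int) -> int:
    """Smallest number of accumulation steps that reaches the target."""
    return -(-target_effective // batch_size)  # ceiling division


print(effective_batch_size(batch_size=1, grad_accum_steps=32))
print(accum_steps_for(target_effective=32, batch_size=1))
```

Accumulation lowers peak activation memory because only `batch_size` samples are resident at a time, at the cost of more forward and backward passes per optimizer step.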
71
+
72
+ ---
73
+
74
+ ## Important Notes
75
+
76
+ ### RF-DETR Augmentation Handling
77
+
78
+ **Critical**: RF-DETR's `train()` function may have its own augmentation system. The `augmentation` section in the config might only be used for:
79
+ - Mosaic preprocessing (handled by `train_ball.py` before RF-DETR)
80
+ - Custom transforms (if RF-DETR supports them)
81
+
82
+ **Action Required**:
83
+ 1. Check RF-DETR documentation/source for augmentation parameters
84
+ 2. If RF-DETR doesn't support motion blur/noise via config, we may need to:
85
+ - Preprocess the dataset with augmentations (offline)
86
+ - Or modify RF-DETR's internal augmentation pipeline
87
+
88
+ **Workaround**: The domain adaptation config includes augmentation settings. If RF-DETR doesn't use them, we can:
89
+ 1. Preprocess training images with motion blur/noise (create augmented dataset)
90
+ 2. Or modify the data loader to apply augmentations on-the-fly
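
As a rough sketch of what offline or on-the-fly augmentation could look like, the following applies horizontal motion blur and Gaussian noise to a grayscale image stored as nested lists. It is dependency-free for illustration only; a real pipeline would more likely use NumPy or OpenCV. All function names, the kernel size, the noise sigma, and the noise probability are assumptions; only the 0.5 motion blur probability comes from this guide.

```python
import random


def horizontal_motion_blur(image, kernel_size=5):
    """Average each pixel with its horizontal neighbours (edges are clamped)."""
    half = kernel_size // 2
    width = len(image[0])
    blurred = []
    for row in image:
        new_row = []
        for x in range(width):
            window = [row[min(max(x + dx, 0), width - 1)]
                      for dx in range(-half, half + 1)]
            new_row.append(sum(window) / len(window))
        blurred.append(new_row)
    return blurred


def add_gaussian_noise(image, sigma=5.0, rng=None):
    """Add zero-mean Gaussian noise and clip to the 0-255 range."""
    rng = rng or random.Random()
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row]
            for row in image]


def augment(image, blur_prob=0.5, noise_prob=0.5, rng=None):
    """Apply each augmentation independently with its own probability."""
    rng = rng or random.Random()
    if rng.random() < blur_prob:
        image = horizontal_motion_blur(image)
    if rng.random() < noise_prob:
        image = add_gaussian_noise(image, rng=rng)
    return image


# A 3x7 dark frame with one bright "ball" pixel in the middle.
frame = [[0.0] * 7 for _ in range(3)]
frame[1][3] = 255.0
blurred = horizontal_motion_blur(frame, kernel_size=5)
print(blurred[1])
```

The blur smears the single bright pixel across its five-pixel neighbourhood, which is roughly how a fast-moving ball appears in broadcast footage. If the blur is strong, the bounding-box annotations may need to be widened to match.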
91
+
92
+ ### Confidence Threshold Fix
93
+
94
+ The inference threshold fix (0.5 → 0.05) is **critical** and should be done immediately, even before Phase 1 training starts. This enables detection of valid but low-confidence candidates.
95
+
96
+ ---
97
+
98
+ ## Expected Timeline
99
+
100
+ | Date/Event | Action | Expected Result |
101
+ |------------|--------|-----------------|
102
+ | **Now** | Fix inference thresholds | Enables detection |
103
+ | **After Epoch 40** | Start Phase 1 (Domain Adaptation) | 0.598 → 0.63-0.65 |
104
+ | **After Epoch 45** | Start Phase 1.5 (High-Res) | 0.63 → 0.65-0.66 |
105
+ | **After Epoch 50** | Start Phase 3 (Multi-scale) | 0.65 → 0.67-0.68 |
106
+ | **Target** | Small objects mAP **0.67-0.70** | ✅ **Achieved** if each phase meets its expected result |
107
+
108
+ ---
109
+
110
+ ## Verification Checklist
111
+
112
+ After each phase, verify:
113
+
114
+ - [ ] Small objects mAP improved
115
+ - [ ] Overall mAP maintained (>0.68)
116
+ - [ ] No overfitting (val loss not increasing)
117
+ - [ ] SAHI recall > 0% (if Phase 2 completed)
118
+ - [ ] GPU memory usage acceptable
119
+ - [ ] Training loss decreasing
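
The metric-based items in this checklist can be checked automatically between phases. The sketch below is one possible shape for that check; the `check_phase` helper and the dictionary keys are hypothetical, and apart from the 0.598 and 0.63 small objects mAP values, the sample numbers are placeholders rather than measured results. SAHI recall and GPU memory are left out because they come from separate tools.

```python
def check_phase(prev: dict, curr: dict, min_overall_map: float = 0.68) -> list:
    """Return the checklist items that failed (empty list means all passed)."""
    failures = []
    if curr["small_map"] <= prev["small_map"]:
        failures.append("small objects mAP did not improve")
    if curr["overall_map"] <= min_overall_map:
        failures.append("overall mAP not maintained")
    if curr["val_loss"] > prev["val_loss"]:
        failures.append("val loss increasing (possible overfitting)")
    if curr["train_loss"] >= prev["train_loss"]:
        failures.append("training loss not decreasing")
    return failures


# Placeholder metrics for two consecutive evaluations.
before = {"small_map": 0.598, "overall_map": 0.69, "val_loss": 1.20, "train_loss": 1.10}
after = {"small_map": 0.63, "overall_map": 0.70, "val_loss": 1.18, "train_loss": 1.05}
print(check_phase(before, after))
```

Returning the list of failed items, rather than a single boolean, makes it clear which troubleshooting section to consult next.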
120
+
121
+ ---
122
+
123
+ ## Troubleshooting
124
+
125
+ ### If Domain Adaptation Config Doesn't Work
126
+
127
+ **Issue**: RF-DETR may not use augmentation config directly
128
+
129
+ **Solution**:
130
+ 1. Check RF-DETR source code for augmentation parameters
131
+ 2. If needed, preprocess dataset offline with augmentations
132
+ 3. Or modify data loader to apply augmentations
133
+
134
+ ### If High-Resolution Causes OOM
135
+
136
+ **Issue**: Even with gradient accumulation, OOM occurs
137
+
138
+ **Solution**:
139
+ 1. Reduce `batch_size` to 1
140
+ 2. Increase `grad_accum_steps` to 32 (maintains effective batch ~32)
141
+ 3. If still OOM, keep resolution at 1120 and focus on domain adaptation
142
+
143
+ ### If Metrics Don't Improve
144
+
145
+ **Issue**: Domain adaptation not helping
146
+
147
+ **Solution**:
148
+ 1. Verify augmentations are actually being applied
149
+ 2. Check if RF-DETR has built-in augmentations that conflict
150
+ 3. Increase augmentation probabilities (motion blur prob: 0.5 → 0.7)
151
+ 4. Consider TrackNet alternative (Phase 4)
152
+
153
+ ---
154
+
155
+ ## Key Files
156
+
157
+ - **Strategy**: `SMALL_OBJECT_OPTIMIZATION_STRATEGY_REVISED.md`
158
+ - **Summary**: `STRATEGY_UPDATE_SUMMARY.md`
159
+ - **Phase 1 Config**: `configs/resume_with_domain_adaptation.yaml`
160
+ - **Phase 1.5 Config**: `configs/resume_with_highres_gradaccum.yaml`
161
+ - **Phase 3 Config**: `configs/resume_with_multiscale.yaml` (already exists)
162
+
163
+ ---
164
+
165
+ ## Success Criteria
166
+
167
+ **Minimum**: Small objects mAP ≥ 0.65 (from 0.598)
168
+ **Target**: Small objects mAP ≥ 0.70
169
+ **Optimal**: Small objects mAP ≥ 0.75
170
+
171
+ **Current Progress**: +0.071 in 19 epochs (≈ +0.0037/epoch) - **on track!**
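
For a rough sense of scale, the reported progress can be extrapolated linearly to each success criterion. This is a naive sketch: it assumes 0.598 is the current small objects mAP, that the average gain per epoch stays constant, and it ignores the step changes the later phases are expected to bring. Training curves usually flatten, so treat the result as a baseline and not a forecast.

```python
import math

gain = 0.071      # improvement reported above
epochs = 19       # epochs over which it was observed
current = 0.598   # assumed current small objects mAP

rate = gain / epochs  # average gain per epoch


def epochs_to_reach(target: float) -> int:
    """Epochs needed at a constant rate (a simplifying assumption)."""
    return math.ceil((target - current) / rate)


for name, target in [("Minimum", 0.65), ("Target", 0.70), ("Optimal", 0.75)]:
    print(f"{name}: {target:.2f} needs about {epochs_to_reach(target)} more epochs")
```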