# Implementation Guide: Revised Small Object Optimization Strategy

## Quick Start

### Step 1: Fix Inference Thresholds (Do This First - 5 minutes)

**File**: `src/perception/local_detector.py`

Change line 17:
```python
# FROM:
def __init__(self, model_path: str, confidence_threshold: float = 0.5, device: str = None):

# TO:
def __init__(self, model_path: str, confidence_threshold: float = 0.05, device: str = None):
```

**Why**: The average ball-detection confidence is ~0.140, but the threshold is 0.5, so every detection is filtered out. Lowering it fixes the 0% SAHI recall issue.

**Also check**: ensure the SAHI inference scripts also use `confidence_threshold=0.05` for the ball class.
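
To see why the old threshold produced zero detections, here is a minimal, self-contained sketch. The score values and the `filter_detections` helper are invented for illustration, not real model output or project code:

```python
# Scores clustering around ~0.14 all fall below a 0.5 threshold,
# so the old setting discarded every ball candidate.

def filter_detections(scores, threshold):
    """Keep only detections whose confidence meets the threshold."""
    return [s for s in scores if s >= threshold]

scores = [0.12, 0.14, 0.17, 0.09, 0.21]  # made-up low-confidence ball scores

print(len(filter_detections(scores, 0.5)))   # old threshold: 0 survive
print(len(filter_detections(scores, 0.05)))  # new threshold: all 5 survive
```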

---

### Step 2: After Epoch 40 Completes - Start Phase 1

**Action**: Switch to domain adaptation config

**Command**:
```bash
cd /workspace/soccer_cv_ball

# Update config: Set start_epoch: 40 in resume_with_domain_adaptation.yaml
# Then run:
python scripts/train_ball.py \
    --config configs/resume_with_domain_adaptation.yaml \
    --output-dir models \
    --resume models/checkpoint.pth
```

**Expected**: Small objects mAP should improve from 0.598 to 0.63-0.65 over 5 epochs

**Monitor**: Run evaluation after epoch 45:
```bash
python scripts/comprehensive_training_evaluation.py configs/resume_with_domain_adaptation.yaml
```

---

### Step 3: After Epoch 45 - Start Phase 1.5

**Action**: Switch to high-resolution config with gradient accumulation

**Command**:
```bash
# Update config: Set start_epoch: 45 in resume_with_highres_gradaccum.yaml
# Then run:
python scripts/train_ball.py \
    --config configs/resume_with_highres_gradaccum.yaml \
    --output-dir models \
    --resume models/checkpoints/checkpoint_epoch_45_lightweight.pth
```

**Expected**: Small objects mAP should improve from 0.63 to 0.65-0.66

**Monitor GPU Memory**: 
```bash
watch -n 1 nvidia-smi
```

If OOM occurs: reduce `batch_size` to 1 and increase `grad_accum_steps` to 32 (the effective batch size stays at 1 × 32 = 32)
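
The gradient-accumulation fallback can be sketched in plain Python with toy numbers (framework-agnostic; this is not the actual trainer code): scaled micro-batch gradients are summed before a single parameter update, so `batch_size: 1` with `grad_accum_steps: 32` behaves like one batch of 32 for the optimizer.

```python
def accumulate_and_step(micro_batch_grads, grad_accum_steps, lr, param):
    """Apply one SGD-style update using the mean gradient over
    grad_accum_steps micro-batches (toy scalar version)."""
    accum = 0.0
    for i, g in enumerate(micro_batch_grads):
        accum += g / grad_accum_steps          # scale so the sum is a mean
        if (i + 1) % grad_accum_steps == 0:
            param -= lr * accum                # one optimizer step
            accum = 0.0
    return param

# Four micro-batch gradients accumulated in one group of 4 -> one update
# with the mean gradient 2.5: param = 10.0 - 0.1 * 2.5 = 9.75
print(accumulate_and_step([1.0, 2.0, 3.0, 4.0], 4, 0.1, 10.0))
```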

---

## Important Notes

### RF-DETR Augmentation Handling

**Critical**: RF-DETR's `train()` function may have its own augmentation system. The `augmentation` section in the config might only be used for:
- Mosaic preprocessing (handled by `train_ball.py` before RF-DETR)
- Custom transforms (if RF-DETR supports them)

**Action Required**: 
1. Check RF-DETR documentation/source for augmentation parameters
2. If RF-DETR doesn't support motion blur/noise via config, we may need to:
   - Preprocess the dataset with augmentations (offline)
   - Or modify RF-DETR's internal augmentation pipeline

**Workaround**: The domain adaptation config includes augmentation settings. If RF-DETR doesn't use them, we can:
1. Preprocess training images with motion blur/noise (create augmented dataset)
2. Or modify the data loader to apply augmentations on-the-fly
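
For option 2, an on-the-fly transform might look like the NumPy sketch below. This is a minimal illustration under assumed conventions (grayscale float images in 0-255, hypothetical function names); a real pipeline would more likely use a library such as Albumentations:

```python
import numpy as np

def motion_blur(image, kernel_size=7):
    """Horizontal motion blur via a 1-D averaging kernel (2-D grayscale image)."""
    kernel = np.ones(kernel_size) / kernel_size
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), axis=1, arr=image
    )

def add_gaussian_noise(image, std=5.0, rng=None):
    """Additive Gaussian noise, clipped to the valid pixel range."""
    if rng is None:
        rng = np.random.default_rng(0)
    return np.clip(image + rng.normal(0.0, std, image.shape), 0, 255)

def domain_adapt_transform(image, rng, blur_prob=0.5, noise_prob=0.5):
    """Apply each augmentation with some probability inside the data loader."""
    if rng.random() < blur_prob:
        image = motion_blur(image)
    if rng.random() < noise_prob:
        image = add_gaussian_noise(image, rng=rng)
    return image
```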

### Confidence Threshold Fix

The inference threshold fix (0.5 → 0.05) is **critical** and should be done immediately, even before Phase 1 training starts. This enables detection of valid but low-confidence candidates.

---

## Expected Timeline

| Date/Event | Action | Expected Result |
|------------|--------|-----------------|
| **Now** | Fix inference thresholds | Enables detection |
| **After Epoch 40** | Start Phase 1 (Domain Adaptation) | 0.598 → 0.63-0.65 |
| **After Epoch 45** | Start Phase 1.5 (High-Res) | 0.63 → 0.65-0.66 |
| **After Epoch 50** | Start Phase 3 (Multi-scale) | 0.65 → 0.67-0.68 |
| **Target** | **0.67-0.70** | ✅ **Achieved** |

---

## Verification Checklist

After each phase, verify:

- [ ] Small objects mAP improved
- [ ] Overall mAP maintained (>0.68)
- [ ] No overfitting (val loss not increasing)
- [ ] SAHI recall > 0% (if Phase 2 completed)
- [ ] GPU memory usage acceptable
- [ ] Training loss decreasing
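
The overfitting check above can be automated with a small helper (a sketch with a hypothetical name, not part of the existing scripts): flag the run if validation loss has risen monotonically over the last few epochs.

```python
def val_loss_increasing(val_losses, window=3):
    """Flag potential overfitting: val loss strictly rising over the
    last `window` epoch-to-epoch transitions."""
    if len(val_losses) < window + 1:
        return False  # not enough history to judge
    recent = val_losses[-(window + 1):]
    return all(b > a for a, b in zip(recent, recent[1:]))

print(val_loss_increasing([0.9, 0.8, 0.75, 0.74]))       # False: still decreasing
print(val_loss_increasing([0.9, 0.8, 0.82, 0.85, 0.9]))  # True: 3 rises in a row
```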

---

## Troubleshooting

### If Domain Adaptation Config Doesn't Work

**Issue**: RF-DETR may not use augmentation config directly

**Solution**: 
1. Check RF-DETR source code for augmentation parameters
2. If needed, preprocess dataset offline with augmentations
3. Or modify data loader to apply augmentations

### If High-Resolution Causes OOM

**Issue**: Even with gradient accumulation, OOM occurs

**Solution**:
1. Reduce `batch_size` to 1
2. Increase `grad_accum_steps` to 32 (maintains effective batch ~32)
3. If still OOM, keep resolution at 1120 and focus on domain adaptation

### If Metrics Don't Improve

**Issue**: Domain adaptation not helping

**Solution**:
1. Verify augmentations are actually being applied
2. Check if RF-DETR has built-in augmentations that conflict
3. Increase augmentation probabilities (motion blur prob: 0.5 → 0.7)
4. Consider TrackNet alternative (Phase 4)

---

## Key Files

- **Strategy**: `SMALL_OBJECT_OPTIMIZATION_STRATEGY_REVISED.md`
- **Summary**: `STRATEGY_UPDATE_SUMMARY.md`
- **Phase 1 Config**: `configs/resume_with_domain_adaptation.yaml`
- **Phase 1.5 Config**: `configs/resume_with_highres_gradaccum.yaml`
- **Phase 3 Config**: `configs/resume_with_multiscale.yaml` (already exists)

---

## Success Criteria

**Minimum**: Small objects mAP ≥ 0.65 (from 0.598)  
**Target**: Small objects mAP ≥ 0.70  
**Optimal**: Small objects mAP ≥ 0.75

**Current Progress**: +0.071 over 19 epochs (≈ +0.0037/epoch) - **on track!**