# End-to-End Training Pipeline Architecture
## Overview
The training pipeline is split into **two phases** to handle the computational cost of bundle adjustment (BA):
1. **Pre-Processing Phase** (offline, expensive) - Compute BA and oracle uncertainty
2. **Training Phase** (online, fast) - Load pre-computed results and train
## Pipeline Flow
### Phase 1: Pre-Processing (Offline)
**When:** Run once before training (or when data/model changes)
**What it does:**
1. Extract ARKit data (poses, LiDAR) - **FREE**
2. Run DA3 inference (GPU, batchable) - **Moderate cost**
3. Run BA validation (CPU, expensive) - **Only if ARKit quality is poor**
4. Compute oracle uncertainty propagation - **Moderate cost**
5. Save to cache - **Fast disk I/O**
**Time:** ~10-20 minutes per sequence (mostly BA)
**Command:**
```bash
ylff preprocess arkit data/arkit_sequences \
--output-cache cache/preprocessed \
--num-workers 8
```
### Phase 2: Training (Online)
**When:** Run repeatedly during training iterations
**What it does:**
1. Load pre-computed results from cache - **Fast (disk I/O)**
2. Run DA3 inference (current model) - **GPU, fast**
3. Compute uncertainty-weighted loss - **GPU, fast**
4. Backprop & update - **Standard training**
**Time:** ~1-3 seconds per sequence
**Command:**
```bash
ylff train pretrain data/arkit_sequences \
--use-preprocessed \
--preprocessed-cache-dir cache/preprocessed \
--epochs 50
```
## Complete Workflow
### Step 1: Pre-Process All Sequences
```bash
# Pre-process all ARKit sequences (one-time, can run overnight)
ylff preprocess arkit data/arkit_sequences \
--output-cache cache/preprocessed \
--model-name depth-anything/DA3-LARGE \
--num-workers 8 \
--use-lidar \
--prefer-arkit-poses
# This:
# - Extracts ARKit data (free)
# - Runs DA3 inference (GPU)
# - Runs BA only for sequences with poor ARKit tracking
# - Computes oracle uncertainty
# - Saves everything to cache
```
**Output:**
```
cache/preprocessed/
├── sequence_001/
│   ├── oracle_targets.npz        # Best poses/depth (BA or ARKit)
│   ├── uncertainty_results.npz   # Confidence scores, uncertainty
│   ├── arkit_data.npz            # Original ARKit data
│   └── metadata.json             # Sequence info
├── sequence_002/
└── ...
```
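A cached sequence can be loaded with plain NumPy and the standard library. The sketch below follows the directory layout above, but the array keys inside each `.npz` file (`poses`, `confidence`, `num_frames`) are illustrative assumptions, not the project's actual schema:

```python
import json
import tempfile
from pathlib import Path

import numpy as np

def load_preprocessed_sequence(seq_dir):
    """Load one pre-processed sequence from the cache layout above."""
    seq_dir = Path(seq_dir)
    return {
        "oracle_targets": dict(np.load(seq_dir / "oracle_targets.npz")),
        "uncertainty_results": dict(np.load(seq_dir / "uncertainty_results.npz")),
        "arkit_data": dict(np.load(seq_dir / "arkit_data.npz")),
        "metadata": json.loads((seq_dir / "metadata.json").read_text()),
    }

# Demo with a synthetic cache entry (hypothetical keys).
tmp = Path(tempfile.mkdtemp()) / "sequence_001"
tmp.mkdir(parents=True)
np.savez(tmp / "oracle_targets.npz", poses=np.eye(4)[None])
np.savez(tmp / "uncertainty_results.npz", confidence=np.array([0.9]))
np.savez(tmp / "arkit_data.npz", poses=np.eye(4)[None])
(tmp / "metadata.json").write_text(json.dumps({"num_frames": 1}))

seq = load_preprocessed_sequence(tmp)
```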
### Step 2: Train Using Pre-Processed Data
```bash
# Train using pre-computed results (fast iteration)
ylff train pretrain data/arkit_sequences \
--use-preprocessed \
--preprocessed-cache-dir cache/preprocessed \
--epochs 50 \
--lr 1e-4 \
--batch-size 1
```
**What happens:**
1. Loads pre-computed oracle targets and uncertainty from cache
2. Runs DA3 inference with current model
3. Computes uncertainty-weighted loss (continuous confidence)
4. Updates model weights
## Handling Rejection/Failure
### No Binary Rejection
**Key Principle:** All data contributes, just weighted by confidence.
### Continuous Confidence Weighting
**In Loss Function:**
```python
# All pixels/frames contribute, weighted by confidence
loss = confidence * prediction_error
# Low confidence (0.3)  → weight = 0.3 (contributes less)
# High confidence (0.9) → weight = 0.9 (contributes more)
# No hard cutoff - smooth weighting
```
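The weighting above can be made concrete with a small NumPy sketch. Normalizing by the total confidence (so batches with different overall confidence stay on a comparable loss scale) is an assumption, not necessarily the project's exact loss:

```python
import numpy as np

def confidence_weighted_loss(pred, target, confidence):
    """Per-element squared error, weighted by continuous confidence in [0, 1]."""
    error = (pred - target) ** 2
    # Normalize by total confidence so the loss scale is batch-independent.
    return float((confidence * error).sum() / (confidence.sum() + 1e-8))

pred = np.array([1.0, 2.0, 3.0])
target = np.array([1.5, 2.0, 2.0])
conf = np.array([0.3, 0.9, 0.9])  # low-confidence element contributes less

loss = confidence_weighted_loss(pred, target, conf)
```

Note that no element is dropped: the 0.3-confidence pixel still contributes gradient, just scaled down.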
### Failure Scenarios
**BA Failure:**
- Falls back to ARKit poses (if quality is good)
- Lower confidence score (reflects uncertainty)
- Still used for training (just weighted less)
- Model learns from ARKit poses with lower confidence
**Missing LiDAR:**
- Uses BA depth (if available)
- Or geometric consistency only
- Lower confidence score
- Still used for training
**Poor Tracking:**
- Lower confidence score
- Still used for training
- Model learns to handle uncertainty
**Key Insight:** Even "failed" or low-confidence data contributes to training, just with lower weight. This is better than binary rejection because:
- No information loss
- Model learns to handle uncertainty
- Smooth gradient flow (no hard cutoffs)
- Better generalization
## Performance Comparison
### Without Pre-Processing (Current)
**Per Training Iteration:**
- BA computation: ~5-15 min per sequence (CPU, expensive)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- **Total: ~5-15 min per sequence**
**For 100 sequences:**
- One epoch: ~8-25 hours
- 50 epochs: ~17-52 days
### With Pre-Processing (New)
**Pre-Processing (One-Time):**
- BA computation: ~5-15 min per sequence (CPU, expensive)
- Oracle uncertainty: ~10-30 sec per sequence (CPU)
- **Total: ~10-20 min per sequence** (one-time cost)
**Training (Per Iteration):**
- Load cache: ~0.1-1 sec per sequence (disk I/O)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- **Total: ~1-3 sec per sequence**
**For 100 sequences:**
- Pre-processing: ~17-33 hours (one-time)
- One epoch: ~2-5 minutes
- 50 epochs: ~2-4 hours
**Speedup:** roughly 100-900x faster per training iteration.
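The headline speedup follows directly from the per-sequence times quoted above:

```python
# Rough speedup range from the per-sequence times above.
ba_seconds = (5 * 60, 15 * 60)   # BA-in-the-loop: 5-15 min/sequence
cached_seconds = (1, 3)          # cached pipeline: 1-3 sec/sequence

low = ba_seconds[0] / cached_seconds[1]   # worst case: 300 s / 3 s
high = ba_seconds[1] / cached_seconds[0]  # best case:  900 s / 1 s
print(f"speedup: {low:.0f}x - {high:.0f}x")
```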
## Implementation Details
### Pre-Processing Service
**File:** `ylff/services/preprocessing.py`
**Function:** `preprocess_arkit_sequence()`
**Steps:**
1. Extract ARKit data (free)
2. Run DA3 inference (GPU)
3. Decide: ARKit poses (if quality good) or BA (if quality poor)
4. Compute oracle uncertainty propagation
5. Save to cache
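The decision in step 3 can be sketched as follows. All names here (`quality_threshold`, the dict keys, the BA stub) are hypothetical stand-ins for the real service, shown only to illustrate the ARKit-vs-BA branch:

```python
def run_bundle_adjustment(sequence):
    # Stand-in for the real (CPU-heavy) BA solver.
    return sequence["arkit_poses"]

def choose_oracle_poses(sequence, quality_threshold=0.7):
    """Illustrative sketch of step 3: keep ARKit poses when tracking
    quality is good, otherwise fall back to expensive BA."""
    if sequence["tracking_quality"] >= quality_threshold:
        return {"poses": sequence["arkit_poses"], "source": "arkit"}
    # BA is the expensive path; only taken for poor tracking.
    return {"poses": run_bundle_adjustment(sequence), "source": "ba"}

good = {"tracking_quality": 0.9, "arkit_poses": [[1, 0], [0, 1]]}
poor = {"tracking_quality": 0.4, "arkit_poses": [[1, 0], [0, 1]]}
```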
### Preprocessed Dataset
**File:** `ylff/services/preprocessed_dataset.py`
**Class:** `PreprocessedARKitDataset`
**Features:**
- Loads pre-computed oracle targets
- Loads uncertainty results (confidence, covariance)
- Loads ARKit data (for reference)
- Fast disk I/O (no BA computation)
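A minimal map-style dataset over the cache directory might look like the sketch below. This is an assumption about the shape of `PreprocessedARKitDataset`, not its actual implementation; the real class would load the `.npz` files in `__getitem__`:

```python
import tempfile
from pathlib import Path

class PreprocessedDatasetSketch:
    """Map-style dataset over cache/preprocessed/ (illustrative only)."""

    def __init__(self, cache_dir):
        # One subdirectory per sequence, as in the layout above.
        self.seq_dirs = sorted(p for p in Path(cache_dir).iterdir() if p.is_dir())

    def __len__(self):
        return len(self.seq_dirs)

    def __getitem__(self, idx):
        d = self.seq_dirs[idx]
        # The real implementation would np.load oracle_targets.npz etc. here.
        return {"sequence_dir": str(d)}

# Demo against a synthetic cache layout.
root = Path(tempfile.mkdtemp())
(root / "sequence_001").mkdir()
(root / "sequence_002").mkdir()
ds = PreprocessedDatasetSketch(root)
```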
### Training Integration
**File:** `ylff/services/pretrain.py`
**Changes:**
- Detects preprocessed data (checks for `uncertainty_results` in batch)
- Uses `oracle_uncertainty_ensemble_loss()` when available
- Falls back to standard loss for live data (backward compatibility)
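The backward-compatible dispatch can be sketched like this; both loss functions below are hypothetical stand-ins for the real ones, and only the "check for `uncertainty_results` in the batch" pattern comes from the description above:

```python
def uncertainty_loss_stub(pred, target, confidence):
    # Stand-in for oracle_uncertainty_ensemble_loss().
    return sum(c * (p - t) ** 2 for p, t, c in zip(pred, target, confidence))

def standard_loss_stub(pred, target):
    # Stand-in for the standard (unweighted) loss.
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def compute_loss(batch):
    """Use the uncertainty-weighted loss only when pre-processed data
    is present; otherwise fall back for live-data compatibility."""
    if "uncertainty_results" in batch:
        return uncertainty_loss_stub(
            batch["pred"], batch["target"], batch["uncertainty_results"]
        )
    return standard_loss_stub(batch["pred"], batch["target"])
```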
## Usage Examples
### Full Workflow
```bash
# Step 1: Pre-process (one-time, overnight)
ylff preprocess arkit data/arkit_sequences \
--output-cache cache/preprocessed \
--num-workers 8
# Step 2: Train (fast iteration)
ylff train pretrain data/arkit_sequences \
--use-preprocessed \
--preprocessed-cache-dir cache/preprocessed \
--epochs 50
# Step 3: Iterate on training (no re-preprocessing needed)
ylff train pretrain data/arkit_sequences \
--use-preprocessed \
--preprocessed-cache-dir cache/preprocessed \
--epochs 100 \
--lr 5e-5 # Lower LR for fine-tuning
```
### When to Re-Preprocess
Only needed if:
- New sequences are added
- A different DA3 model is used for initial inference
- BA parameters change
- Oracle uncertainty parameters change
**Not needed for:**
- Training hyperparameter changes (LR, batch size, etc.)
- Model architecture changes (same input/output)
- Additional training runs (more epochs, etc.)
## Key Benefits
1. **100-1000x faster training iteration** - No BA during training
2. **Continuous confidence weighting** - No binary rejection
3. **All data contributes** - Low confidence = low weight, not zero
4. **Uncertainty propagation** - Covariance estimates available
5. **Parallelizable pre-processing** - Can process multiple sequences simultaneously
6. **Reusable cache** - Pre-process once, train many times
## Summary
**Pre-Processing:**
- Runs BA and oracle uncertainty computation offline
- Saves results to cache
- One-time cost per dataset
**Training:**
- Loads pre-computed results
- Fast iteration (no BA)
- Uses continuous confidence weighting
- All data contributes (weighted by confidence)
This architecture enables efficient training while using all available oracle sources.