# End-to-End Training Pipeline Architecture

## 🎯 Overview

The training pipeline is split into **two phases** to handle the computational cost of bundle adjustment (BA):

1. **Pre-Processing Phase** (offline, expensive) - Compute BA and oracle uncertainty
2. **Training Phase** (online, fast) - Load pre-computed results and train

## πŸ“Š Pipeline Flow

### Phase 1: Pre-Processing (Offline)

**When:** Run once before training (or when data/model changes)

**What it does:**

1. Extract ARKit data (poses, LiDAR) - **FREE**
2. Run DA3 inference (GPU, batchable) - **Moderate cost**
3. Run BA validation (CPU, expensive) - **Only if ARKit quality is poor**
4. Compute oracle uncertainty propagation - **Moderate cost**
5. Save to cache - **Fast disk I/O**

**Time:** ~10-20 minutes per sequence (mostly BA)

**Command:**

```bash
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --num-workers 8
```

### Phase 2: Training (Online)

**When:** Run repeatedly during training iterations

**What it does:**

1. Load pre-computed results from cache - **Fast (disk I/O)**
2. Run DA3 inference (current model) - **GPU, fast**
3. Compute uncertainty-weighted loss - **GPU, fast**
4. Backprop & update - **Standard training**

**Time:** ~1-3 seconds per sequence

**Command:**

```bash
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50
```

## πŸ”„ Complete Workflow

### Step 1: Pre-Process All Sequences

```bash
# Pre-process all ARKit sequences (one-time, can run overnight)
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --model-name depth-anything/DA3-LARGE \
    --num-workers 8 \
    --use-lidar \
    --prefer-arkit-poses

# This:
# - Extracts ARKit data (free)
# - Runs DA3 inference (GPU)
# - Runs BA only for sequences with poor ARKit tracking
# - Computes oracle uncertainty
# - Saves everything to cache
```

**Output:**

```
cache/preprocessed/
β”œβ”€β”€ sequence_001/
β”‚   β”œβ”€β”€ oracle_targets.npz      # Best poses/depth (BA or ARKit)
β”‚   β”œβ”€β”€ uncertainty_results.npz  # Confidence scores, uncertainty
β”‚   β”œβ”€β”€ arkit_data.npz          # Original ARKit data
β”‚   └── metadata.json           # Sequence info
└── sequence_002/
    └── ...
```
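The cache layout above can be exercised end to end with plain NumPy. This is a minimal, self-contained sketch: it writes one synthetic `sequence_001` entry and reads it back the way training would. The array key names (`poses`, `depth`, `confidence`) and shapes are assumptions for illustration; the real keys are defined by the preprocessing code.

```python
import json
import tempfile
from pathlib import Path

import numpy as np

# Write one synthetic cache entry in the layout above, then read it back.
cache_root = Path(tempfile.mkdtemp()) / "preprocessed"
seq_dir = cache_root / "sequence_001"
seq_dir.mkdir(parents=True)

# Hypothetical array names - the real keys depend on the preprocessing code.
np.savez(seq_dir / "oracle_targets.npz",
         poses=np.eye(4)[None].repeat(10, axis=0),  # (N, 4, 4) camera poses
         depth=np.random.rand(10, 192, 256))        # (N, H, W) depth maps
np.savez(seq_dir / "uncertainty_results.npz",
         confidence=np.random.rand(10, 192, 256))   # per-pixel confidence in [0, 1]
(seq_dir / "metadata.json").write_text(json.dumps({"num_frames": 10}))

# Training-time load is plain disk I/O - no BA computation.
targets = np.load(seq_dir / "oracle_targets.npz")
print(sorted(targets.files), targets["poses"].shape)  # → ['depth', 'poses'] (10, 4, 4)
```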

### Step 2: Train Using Pre-Processed Data

```bash
# Train using pre-computed results (fast iteration)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50 \
    --lr 1e-4 \
    --batch-size 1
```

**What happens:**

1. Loads pre-computed oracle targets and uncertainty from cache
2. Runs DA3 inference with current model
3. Computes uncertainty-weighted loss (continuous confidence)
4. Updates model weights

## 🚫 Handling Rejection/Failure

### No Binary Rejection

**Key Principle:** All data contributes, just weighted by confidence.

### Continuous Confidence Weighting

**In Loss Function:**

```python
# All pixels/frames contribute, weighted elementwise by confidence in [0, 1]
loss = (confidence * prediction_error).mean()

# Low confidence (0.3) → weight 0.3 (contributes less)
# High confidence (0.9) → weight 0.9 (contributes more)
# No hard cutoff - smooth weighting
```

### Failure Scenarios

**BA Failure:**

- βœ… Falls back to ARKit poses (if quality good)
- βœ… Lower confidence score (reflects uncertainty)
- βœ… Still used for training (just weighted less)
- βœ… Model learns from ARKit poses with lower confidence

**Missing LiDAR:**

- βœ… Uses BA depth (if available)
- βœ… Or geometric consistency only
- βœ… Lower confidence score
- βœ… Still used for training

**Poor Tracking:**

- βœ… Lower confidence score
- βœ… Still used for training
- βœ… Model learns to handle uncertainty

**Key Insight:** Even "failed" or low-confidence data contributes to training, just with lower weight. This is better than binary rejection because:

- No information loss
- Model learns to handle uncertainty
- Smooth gradient flow (no hard cutoffs)
- Better generalization
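The contrast with binary rejection can be seen in a tiny NumPy sketch (the error and confidence values are hypothetical):

```python
import numpy as np

# Toy per-sample prediction errors and confidences (hypothetical values).
error = np.array([0.5, 0.2, 0.8, 0.1])
confidence = np.array([0.9, 0.3, 0.1, 0.95])

# Continuous weighting: every sample contributes, scaled by its confidence.
weighted_loss = (confidence * error).mean()

# Binary rejection at a 0.5 threshold: low-confidence samples are dropped
# entirely, losing their information and creating a hard cutoff.
keep = confidence >= 0.5
rejected_loss = error[keep].mean()
```

With continuous weighting the two low-confidence samples still pull on the gradient, just weakly; with a hard threshold they vanish entirely.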

## πŸ“ˆ Performance Comparison

### Without Pre-Processing (Current)

**Per Training Iteration:**

- BA computation: ~5-15 min per sequence (CPU, expensive)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- **Total: ~5-15 min per sequence**

**For 100 sequences:**

- One epoch: ~8-25 hours
- 50 epochs: ~17-52 days

### With Pre-Processing (New)

**Pre-Processing (One-Time):**

- BA computation: ~5-15 min per sequence (CPU, expensive)
- Oracle uncertainty: ~10-30 sec per sequence (CPU)
- **Total: ~10-20 min per sequence** (one-time cost)

**Training (Per Iteration):**

- Load cache: ~0.1-1 sec per sequence (disk I/O)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- **Total: ~1-3 sec per sequence**

**For 100 sequences:**

- Pre-processing: ~17-33 hours (one-time)
- One epoch: ~2-5 minutes
- 50 epochs: ~2-4 hours

**Speedup:** roughly 100-900x faster per training iteration (minutes per sequence down to seconds)!

## πŸ”§ Implementation Details

### Pre-Processing Service

**File:** `ylff/services/preprocessing.py`

**Function:** `preprocess_arkit_sequence()`

**Steps:**

1. Extract ARKit data (free)
2. Run DA3 inference (GPU)
3. Decide: ARKit poses (if quality good) or BA (if quality poor)
4. Compute oracle uncertainty propagation
5. Save to cache
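The five steps above can be sketched as a single function. This is a hypothetical stand-in for `preprocess_arkit_sequence()` with the expensive stages (DA3 inference, BA) stubbed out; the quality threshold, key names, and helper values are assumptions for illustration only.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def preprocess_sequence(seq_dir: Path, cache_dir: Path,
                        arkit_quality_threshold: float = 0.8) -> dict:
    """Hypothetical sketch of the per-sequence preprocessing flow."""
    # 1. Extract ARKit data (poses, LiDAR) - free. Stubbed extraction here.
    arkit = {"poses": np.eye(4)[None], "quality": 0.9}

    # 2. DA3 inference would run here (GPU); omitted in this sketch.

    # 3. Decide pose source: ARKit if tracking quality is good, else run BA.
    if arkit["quality"] >= arkit_quality_threshold:
        poses, source = arkit["poses"], "arkit"
    else:
        poses, source = arkit["poses"], "ba"  # BA refinement would go here

    # 4. Oracle uncertainty propagation (stubbed as uniform confidence).
    confidence = np.full(len(poses), 0.9)

    # 5. Save everything to the cache for fast training-time loading.
    out = cache_dir / seq_dir.name
    out.mkdir(parents=True, exist_ok=True)
    np.savez(out / "oracle_targets.npz", poses=poses)
    np.savez(out / "uncertainty_results.npz", confidence=confidence)
    (out / "metadata.json").write_text(json.dumps({"pose_source": source}))
    return {"pose_source": source, "num_frames": len(poses)}


result = preprocess_sequence(Path(tempfile.mkdtemp()) / "sequence_001",
                             Path(tempfile.mkdtemp()))
```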

### Preprocessed Dataset

**File:** `ylff/services/preprocessed_dataset.py`

**Class:** `PreprocessedARKitDataset`

**Features:**

- Loads pre-computed oracle targets
- Loads uncertainty results (confidence, covariance)
- Loads ARKit data (for reference)
- Fast disk I/O (no BA computation)
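A minimal stand-in for `PreprocessedARKitDataset` illustrates why training-time access is cheap: each item is three small file reads. This sketch builds a tiny synthetic cache so it runs end to end; the file and key names mirror the layout above but are assumptions as far as the real class is concerned.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def _load_npz(path: Path) -> dict:
    with np.load(path) as f:
        return {k: f[k] for k in f.files}


class PreprocessedDataset:
    """Minimal map-style dataset: pure disk I/O, no BA at training time."""

    def __init__(self, cache_dir: Path):
        self.seq_dirs = sorted(p for p in Path(cache_dir).iterdir() if p.is_dir())

    def __len__(self) -> int:
        return len(self.seq_dirs)

    def __getitem__(self, idx: int) -> dict:
        d = self.seq_dirs[idx]
        return {
            "oracle_targets": _load_npz(d / "oracle_targets.npz"),
            "uncertainty_results": _load_npz(d / "uncertainty_results.npz"),
            "metadata": json.loads((d / "metadata.json").read_text()),
        }


# Build a tiny synthetic cache so the sketch is runnable end to end.
root = Path(tempfile.mkdtemp())
seq = root / "sequence_001"
seq.mkdir()
np.savez(seq / "oracle_targets.npz", poses=np.eye(4)[None])
np.savez(seq / "uncertainty_results.npz", confidence=np.array([0.9]))
(seq / "metadata.json").write_text(json.dumps({"num_frames": 1}))

ds = PreprocessedDataset(root)
sample = ds[0]
```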

### Training Integration

**File:** `ylff/services/pretrain.py`

**Changes:**

- Detects preprocessed data (checks for `uncertainty_results` in batch)
- Uses `oracle_uncertainty_ensemble_loss()` when available
- Falls back to standard loss for live data (backward compatibility)
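The dispatch logic can be sketched as follows. This is a simplified stand-in for the real integration (which calls `oracle_uncertainty_ensemble_loss()`); here both branches reduce to a plain weighted or unweighted mean for illustration.

```python
import numpy as np


def compute_loss(batch: dict, prediction_error: np.ndarray) -> float:
    """Use the uncertainty-weighted loss when preprocessed results are
    present in the batch; otherwise fall back to an unweighted loss
    (backward compatibility with live data)."""
    if "uncertainty_results" in batch:
        confidence = batch["uncertainty_results"]["confidence"]
        return float((confidence * prediction_error).mean())
    return float(prediction_error.mean())


err = np.array([0.4, 0.2])
live_loss = compute_loss({}, err)  # no cache -> plain mean
pre_loss = compute_loss(
    {"uncertainty_results": {"confidence": np.array([0.5, 1.0])}}, err)
```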

## πŸ“ Usage Examples

### Full Workflow

```bash
# Step 1: Pre-process (one-time, overnight)
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --num-workers 8

# Step 2: Train (fast iteration)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50

# Step 3: Iterate on training (no re-preprocessing needed)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 100 \
    --lr 5e-5  # Lower LR for fine-tuning
```

### When to Re-Preprocess

Only needed if:

- βœ… New sequences added
- βœ… Different DA3 model used for initial inference
- βœ… BA parameters changed
- βœ… Oracle uncertainty parameters changed

**Not needed for:**

- ❌ Training hyperparameter changes (LR, batch size, etc.)
- ❌ Model architecture changes (same input/output)
- ❌ Training iteration (epochs, etc.)

## πŸŽ“ Key Benefits

1. **100-1000x faster training iteration** - No BA during training
2. **Continuous confidence weighting** - No binary rejection
3. **All data contributes** - Low confidence = low weight, not zero
4. **Uncertainty propagation** - Covariance estimates available
5. **Parallelizable pre-processing** - Can process multiple sequences simultaneously
6. **Reusable cache** - Pre-process once, train many times

## πŸ“Š Summary

**Pre-Processing:**

- Runs BA and oracle uncertainty computation offline
- Saves results to cache
- One-time cost per dataset

**Training:**

- Loads pre-computed results
- Fast iteration (no BA)
- Uses continuous confidence weighting
- All data contributes (weighted by confidence)

This architecture enables efficient training while using all available oracle sources! πŸš€