# End-to-End Training Pipeline Architecture
## Overview
The training pipeline is split into **two phases** to handle the computational cost of bundle adjustment (BA):
1. **Pre-Processing Phase** (offline, expensive) - Compute BA and oracle uncertainty
2. **Training Phase** (online, fast) - Load pre-computed results and train
## Pipeline Flow
### Phase 1: Pre-Processing (Offline)
**When:** Run once before training (or when data/model changes)
**What it does:**
1. Extract ARKit data (poses, LiDAR) - **FREE**
2. Run DA3 inference (GPU, batchable) - **Moderate cost**
3. Run BA validation (CPU, expensive) - **Only if ARKit quality is poor**
4. Compute oracle uncertainty propagation - **Moderate cost**
5. Save to cache - **Fast disk I/O**
**Time:** ~10-20 minutes per sequence (mostly BA)
**Command:**
```bash
ylff preprocess arkit data/arkit_sequences \
--output-cache cache/preprocessed \
--num-workers 8
```
### Phase 2: Training (Online)
**When:** Run repeatedly during training iterations
**What it does:**
1. Load pre-computed results from cache - **Fast (disk I/O)**
2. Run DA3 inference (current model) - **GPU, fast**
3. Compute uncertainty-weighted loss - **GPU, fast**
4. Backprop & update - **Standard training**
**Time:** ~1-3 seconds per sequence
**Command:**
```bash
ylff train pretrain data/arkit_sequences \
--use-preprocessed \
--preprocessed-cache-dir cache/preprocessed \
--epochs 50
```
## Complete Workflow
### Step 1: Pre-Process All Sequences
```bash
# Pre-process all ARKit sequences (one-time, can run overnight)
ylff preprocess arkit data/arkit_sequences \
--output-cache cache/preprocessed \
--model-name depth-anything/DA3-LARGE \
--num-workers 8 \
--use-lidar \
--prefer-arkit-poses
# This:
# - Extracts ARKit data (free)
# - Runs DA3 inference (GPU)
# - Runs BA only for sequences with poor ARKit tracking
# - Computes oracle uncertainty
# - Saves everything to cache
```
**Output:**
```
cache/preprocessed/
├── sequence_001/
│   ├── oracle_targets.npz        # Best poses/depth (BA or ARKit)
│   ├── uncertainty_results.npz   # Confidence scores, uncertainty
│   ├── arkit_data.npz            # Original ARKit data
│   └── metadata.json             # Sequence info
├── sequence_002/
└── ...
```
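A cached sequence can be loaded with plain NumPy and the standard library. The sketch below follows the directory layout above, but the array keys inside each `.npz` file (`poses`, `confidence`, `num_frames`) are illustrative assumptions, not the project's actual schema:

```python
import json
import tempfile
from pathlib import Path

import numpy as np

def load_preprocessed_sequence(seq_dir):
    """Load one pre-processed sequence from the cache layout above."""
    seq_dir = Path(seq_dir)
    return {
        "oracle_targets": dict(np.load(seq_dir / "oracle_targets.npz")),
        "uncertainty_results": dict(np.load(seq_dir / "uncertainty_results.npz")),
        "arkit_data": dict(np.load(seq_dir / "arkit_data.npz")),
        "metadata": json.loads((seq_dir / "metadata.json").read_text()),
    }

# Demo with a synthetic cache entry (hypothetical keys).
tmp = Path(tempfile.mkdtemp()) / "sequence_001"
tmp.mkdir(parents=True)
np.savez(tmp / "oracle_targets.npz", poses=np.eye(4)[None])
np.savez(tmp / "uncertainty_results.npz", confidence=np.array([0.9]))
np.savez(tmp / "arkit_data.npz", poses=np.eye(4)[None])
(tmp / "metadata.json").write_text(json.dumps({"num_frames": 1}))

seq = load_preprocessed_sequence(tmp)
```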
### Step 2: Train Using Pre-Processed Data
```bash
# Train using pre-computed results (fast iteration)
ylff train pretrain data/arkit_sequences \
--use-preprocessed \
--preprocessed-cache-dir cache/preprocessed \
--epochs 50 \
--lr 1e-4 \
--batch-size 1
```
**What happens:**
1. Loads pre-computed oracle targets and uncertainty from cache
2. Runs DA3 inference with current model
3. Computes uncertainty-weighted loss (continuous confidence)
4. Updates model weights
## Handling Rejection/Failure
### No Binary Rejection
**Key Principle:** All data contributes, just weighted by confidence.
### Continuous Confidence Weighting
**In Loss Function:**
```python
# All pixels/frames contribute, weighted by confidence
loss = confidence * prediction_error
# Low confidence (0.3)  → weight = 0.3 (contributes less)
# High confidence (0.9) → weight = 0.9 (contributes more)
# No hard cutoff - smooth weighting
```
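The weighting above can be made concrete with a small NumPy sketch. Normalizing by the total confidence (so batches with different overall confidence stay on a comparable loss scale) is an assumption, not necessarily the project's exact loss:

```python
import numpy as np

def confidence_weighted_loss(pred, target, confidence):
    """Per-element squared error, weighted by continuous confidence in [0, 1]."""
    error = (pred - target) ** 2
    # Normalize by total confidence so the loss scale is batch-independent.
    return float((confidence * error).sum() / (confidence.sum() + 1e-8))

pred = np.array([1.0, 2.0, 3.0])
target = np.array([1.5, 2.0, 2.0])
conf = np.array([0.3, 0.9, 0.9])  # low-confidence element contributes less

loss = confidence_weighted_loss(pred, target, conf)
```

Note that no element is dropped: the 0.3-confidence pixel still contributes gradient, just scaled down.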
### Failure Scenarios
**BA Failure:**
- Falls back to ARKit poses (if quality is good)
- Lower confidence score (reflects uncertainty)
- Still used for training (just weighted less)
- Model learns from ARKit poses with lower confidence
**Missing LiDAR:**
- Uses BA depth (if available)
- Or geometric consistency only
- Lower confidence score
- Still used for training
**Poor Tracking:**
- Lower confidence score
- Still used for training
- Model learns to handle uncertainty
**Key Insight:** Even "failed" or low-confidence data contributes to training, just with lower weight. This is better than binary rejection because:
- No information loss
- Model learns to handle uncertainty
- Smooth gradient flow (no hard cutoffs)
- Better generalization
## Performance Comparison
### Without Pre-Processing (Current)
**Per Training Iteration:**
- BA computation: ~5-15 min per sequence (CPU, expensive)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- **Total: ~5-15 min per sequence**
**For 100 sequences:**
- One epoch: ~8-25 hours
- 50 epochs: ~17-52 days
### With Pre-Processing (New)
**Pre-Processing (One-Time):**
- BA computation: ~5-15 min per sequence (CPU, expensive)
- Oracle uncertainty: ~10-30 sec per sequence (CPU)
- **Total: ~10-20 min per sequence** (one-time cost)
**Training (Per Iteration):**
- Load cache: ~0.1-1 sec per sequence (disk I/O)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- **Total: ~1-3 sec per sequence**
**For 100 sequences:**
- Pre-processing: ~17-33 hours (one-time)
- One epoch: ~2-5 minutes
- 50 epochs: ~2-4 hours
**Speedup:** roughly 100-900x faster per training iteration.
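The headline speedup follows directly from the per-sequence times quoted above:

```python
# Rough speedup range from the per-sequence times above.
ba_seconds = (5 * 60, 15 * 60)   # BA-in-the-loop: 5-15 min/sequence
cached_seconds = (1, 3)          # cached pipeline: 1-3 sec/sequence

low = ba_seconds[0] / cached_seconds[1]   # worst case: 300 s / 3 s
high = ba_seconds[1] / cached_seconds[0]  # best case:  900 s / 1 s
print(f"speedup: {low:.0f}x - {high:.0f}x")
```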
## Implementation Details
### Pre-Processing Service
**File:** `ylff/services/preprocessing.py`
**Function:** `preprocess_arkit_sequence()`
**Steps:**
1. Extract ARKit data (free)
2. Run DA3 inference (GPU)
3. Decide: ARKit poses (if quality good) or BA (if quality poor)
4. Compute oracle uncertainty propagation
5. Save to cache
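The decision in step 3 can be sketched as follows. All names here (`quality_threshold`, the dict keys, the BA stub) are hypothetical stand-ins for the real service, shown only to illustrate the ARKit-vs-BA branch:

```python
def run_bundle_adjustment(sequence):
    # Stand-in for the real (CPU-heavy) BA solver.
    return sequence["arkit_poses"]

def choose_oracle_poses(sequence, quality_threshold=0.7):
    """Illustrative sketch of step 3: keep ARKit poses when tracking
    quality is good, otherwise fall back to expensive BA."""
    if sequence["tracking_quality"] >= quality_threshold:
        return {"poses": sequence["arkit_poses"], "source": "arkit"}
    # BA is the expensive path; only taken for poor tracking.
    return {"poses": run_bundle_adjustment(sequence), "source": "ba"}

good = {"tracking_quality": 0.9, "arkit_poses": [[1, 0], [0, 1]]}
poor = {"tracking_quality": 0.4, "arkit_poses": [[1, 0], [0, 1]]}
```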
### Preprocessed Dataset
**File:** `ylff/services/preprocessed_dataset.py`
**Class:** `PreprocessedARKitDataset`
**Features:**
- Loads pre-computed oracle targets
- Loads uncertainty results (confidence, covariance)
- Loads ARKit data (for reference)
- Fast disk I/O (no BA computation)
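A minimal map-style dataset over the cache directory might look like the sketch below. This is an assumption about the shape of `PreprocessedARKitDataset`, not its actual implementation; the real class would load the `.npz` files in `__getitem__`:

```python
import tempfile
from pathlib import Path

class PreprocessedDatasetSketch:
    """Map-style dataset over cache/preprocessed/ (illustrative only)."""

    def __init__(self, cache_dir):
        # One subdirectory per sequence, as in the layout above.
        self.seq_dirs = sorted(p for p in Path(cache_dir).iterdir() if p.is_dir())

    def __len__(self):
        return len(self.seq_dirs)

    def __getitem__(self, idx):
        d = self.seq_dirs[idx]
        # The real implementation would np.load oracle_targets.npz etc. here.
        return {"sequence_dir": str(d)}

# Demo against a synthetic cache layout.
root = Path(tempfile.mkdtemp())
(root / "sequence_001").mkdir()
(root / "sequence_002").mkdir()
ds = PreprocessedDatasetSketch(root)
```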
### Training Integration
**File:** `ylff/services/pretrain.py`
**Changes:**
- Detects preprocessed data (checks for `uncertainty_results` in batch)
- Uses `oracle_uncertainty_ensemble_loss()` when available
- Falls back to standard loss for live data (backward compatibility)
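The backward-compatible dispatch can be sketched like this; both loss functions below are hypothetical stand-ins for the real ones, and only the "check for `uncertainty_results` in the batch" pattern comes from the description above:

```python
def uncertainty_loss_stub(pred, target, confidence):
    # Stand-in for oracle_uncertainty_ensemble_loss().
    return sum(c * (p - t) ** 2 for p, t, c in zip(pred, target, confidence))

def standard_loss_stub(pred, target):
    # Stand-in for the standard (unweighted) loss.
    return sum((p - t) ** 2 for p, t in zip(pred, target))

def compute_loss(batch):
    """Use the uncertainty-weighted loss only when pre-processed data
    is present; otherwise fall back for live-data compatibility."""
    if "uncertainty_results" in batch:
        return uncertainty_loss_stub(
            batch["pred"], batch["target"], batch["uncertainty_results"]
        )
    return standard_loss_stub(batch["pred"], batch["target"])
```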
## Usage Examples
### Full Workflow
```bash
# Step 1: Pre-process (one-time, overnight)
ylff preprocess arkit data/arkit_sequences \
--output-cache cache/preprocessed \
--num-workers 8
# Step 2: Train (fast iteration)
ylff train pretrain data/arkit_sequences \
--use-preprocessed \
--preprocessed-cache-dir cache/preprocessed \
--epochs 50
# Step 3: Iterate on training (no re-preprocessing needed)
ylff train pretrain data/arkit_sequences \
--use-preprocessed \
--preprocessed-cache-dir cache/preprocessed \
--epochs 100 \
--lr 5e-5 # Lower LR for fine-tuning
```
### When to Re-Preprocess
Only needed if:
- New sequences are added
- A different DA3 model is used for initial inference
- BA parameters change
- Oracle uncertainty parameters change
**Not needed for:**
- Training hyperparameter changes (LR, batch size, etc.)
- Model architecture changes (same input/output)
- Additional training runs (more epochs, etc.)
## Key Benefits
1. **100-1000x faster training iteration** - No BA during training
2. **Continuous confidence weighting** - No binary rejection
3. **All data contributes** - Low confidence = low weight, not zero
4. **Uncertainty propagation** - Covariance estimates available
5. **Parallelizable pre-processing** - Can process multiple sequences simultaneously
6. **Reusable cache** - Pre-process once, train many times
## Summary
**Pre-Processing:**
- Runs BA and oracle uncertainty computation offline
- Saves results to cache
- One-time cost per dataset
**Training:**
- Loads pre-computed results
- Fast iteration (no BA)
- Uses continuous confidence weighting
- All data contributes (weighted by confidence)
This architecture enables efficient training while using all available oracle sources.