
End-to-End Training Pipeline Architecture

🎯 Overview

The training pipeline is split into two phases to handle the computational cost of bundle adjustment (BA):

  1. Pre-Processing Phase (offline, expensive) - Compute BA and oracle uncertainty
  2. Training Phase (online, fast) - Load pre-computed results and train

📊 Pipeline Flow

Phase 1: Pre-Processing (Offline)

When: Run once before training (or when data/model changes)

What it does:

  1. Extract ARKit data (poses, LiDAR) - FREE
  2. Run DA3 inference (GPU, batchable) - Moderate cost
  3. Run BA validation (CPU, expensive) - Only if ARKit quality is poor
  4. Compute oracle uncertainty propagation - Moderate cost
  5. Save to cache - Fast disk I/O

Time: ~10-20 minutes per sequence (mostly BA)

Command:

ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --num-workers 8

Phase 2: Training (Online)

When: Run repeatedly during training iterations

What it does:

  1. Load pre-computed results from cache - Fast (disk I/O)
  2. Run DA3 inference (current model) - GPU, fast
  3. Compute uncertainty-weighted loss - GPU, fast
  4. Backprop & update - Standard training

Time: ~1-3 seconds per sequence

Command:

ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50

🔄 Complete Workflow

Step 1: Pre-Process All Sequences

# Pre-process all ARKit sequences (one-time, can run overnight)
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --model-name depth-anything/DA3-LARGE \
    --num-workers 8 \
    --use-lidar \
    --prefer-arkit-poses

# This:
# - Extracts ARKit data (free)
# - Runs DA3 inference (GPU)
# - Runs BA only for sequences with poor ARKit tracking
# - Computes oracle uncertainty
# - Saves everything to cache

Output:

cache/preprocessed/
├── sequence_001/
│   ├── oracle_targets.npz       # Best poses/depth (BA or ARKit)
│   ├── uncertainty_results.npz  # Confidence scores, uncertainty
│   ├── arkit_data.npz           # Original ARKit data
│   └── metadata.json            # Sequence info
└── sequence_002/
    └── ...
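Reading this layout back is plain `numpy`/`json` I/O. A minimal sketch (the helper name and array keys such as `poses` and `confidence` are illustrative assumptions, not the actual schema):

```python
import json
from pathlib import Path
import numpy as np

def load_preprocessed_sequence(cache_dir, sequence):
    """Read one sequence's cached artifacts from the layout above."""
    seq_dir = Path(cache_dir) / sequence

    def load_npz(name):
        # Materialize the lazily-loaded NpzFile into a plain dict of arrays
        with np.load(seq_dir / name) as f:
            return {k: f[k] for k in f.files}

    return {
        "oracle_targets": load_npz("oracle_targets.npz"),
        "uncertainty_results": load_npz("uncertainty_results.npz"),
        "arkit_data": load_npz("arkit_data.npz"),
        "metadata": json.loads((seq_dir / "metadata.json").read_text()),
    }
```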

Step 2: Train Using Pre-Processed Data

# Train using pre-computed results (fast iteration)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50 \
    --lr 1e-4 \
    --batch-size 1

What happens:

  1. Loads pre-computed oracle targets and uncertainty from cache
  2. Runs DA3 inference with current model
  3. Computes uncertainty-weighted loss (continuous confidence)
  4. Updates model weights

🚫 Handling Rejection/Failure

No Binary Rejection

Key Principle: All data contributes, just weighted by confidence.

Continuous Confidence Weighting

In Loss Function:

# All pixels/frames contribute, weighted by confidence
loss = confidence * prediction_error

# Low confidence (0.3) → weight=0.3 (contributes less)
# High confidence (0.9) → weight=0.9 (contributes more)
# No hard cutoff - smooth weighting
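A runnable sketch of this weighting in NumPy; normalizing by the total confidence is an assumed choice (not confirmed by the source) that keeps the loss scale comparable across batches with different confidence levels:

```python
import numpy as np

def confidence_weighted_loss(prediction_error, confidence):
    """Continuous weighting: every element contributes, scaled by confidence."""
    weighted = confidence * prediction_error
    # Normalize by total confidence so batches full of low-confidence
    # data don't artificially look "easy"
    return float(weighted.sum() / (confidence.sum() + 1e-8))

# Two pixels, different confidence: the confident one dominates,
# but the low-confidence one still contributes.
loss = confidence_weighted_loss(np.array([1.0, 2.0]), np.array([0.9, 0.3]))  # ≈ 1.25
```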

Failure Scenarios

BA Failure:

  • ✅ Falls back to ARKit poses (if quality good)
  • ✅ Lower confidence score (reflects uncertainty)
  • ✅ Still used for training (just weighted less)
  • ✅ Model learns from ARKit poses with lower confidence

Missing LiDAR:

  • ✅ Uses BA depth (if available)
  • ✅ Or geometric consistency only
  • ✅ Lower confidence score
  • ✅ Still used for training

Poor Tracking:

  • ✅ Lower confidence score
  • ✅ Still used for training
  • ✅ Model learns to handle uncertainty

Key Insight: Even "failed" or low-confidence data contributes to training, just with lower weight. This is better than binary rejection because:

  • No information loss
  • Model learns to handle uncertainty
  • Smooth gradient flow (no hard cutoffs)
  • Better generalization
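The contrast with binary rejection can be shown with toy numbers:

```python
import numpy as np

confidence = np.array([0.9, 0.6, 0.4, 0.2])
error = np.ones(4)  # identical errors, differing trust

# Hard cutoff at 0.5 (an illustrative threshold): two samples are
# silenced entirely, and no gradient flows through them.
binary = (confidence > 0.5).astype(float) * error

# Smooth weighting: all four samples contribute, scaled by confidence.
continuous = confidence * error
```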

📈 Performance Comparison

Without Pre-Processing (Current)

Per Training Iteration:

  • BA computation: ~5-15 min per sequence (CPU, expensive)
  • DA3 inference: ~0.5-2 sec per sequence (GPU)
  • Loss computation: ~0.1-0.5 sec per sequence (GPU)
  • Total: ~5-15 min per sequence

For 100 sequences:

  • One epoch: ~8-25 hours
  • 50 epochs: ~17-52 days

With Pre-Processing (New)

Pre-Processing (One-Time):

  • BA computation: ~5-15 min per sequence (CPU, expensive)
  • Oracle uncertainty: ~10-30 sec per sequence (CPU)
  • Total: ~10-20 min per sequence (one-time cost)

Training (Per Iteration):

  • Load cache: ~0.1-1 sec per sequence (disk I/O)
  • DA3 inference: ~0.5-2 sec per sequence (GPU)
  • Loss computation: ~0.1-0.5 sec per sequence (GPU)
  • Total: ~1-3 sec per sequence

For 100 sequences:

  • Pre-processing: ~17-33 hours (one-time)
  • One epoch: ~2-5 minutes
  • 50 epochs: ~2-4 hours

Speedup: 100-1000x faster training iteration!
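A back-of-envelope check of the numbers above, using mid-range estimates:

```python
# Mid-range estimates from the tables above
sequences, epochs = 100, 50
ba_minutes_per_seq = 10        # BA dominates: ~5-15 min per sequence
cached_seconds_per_seq = 2     # with the cache: ~1-3 s per sequence

days_without = sequences * epochs * ba_minutes_per_seq / 60 / 24   # ≈ 35 days
preprocess_hours = sequences * 15 / 60                             # one-time, ≈ 25 h
train_hours = sequences * epochs * cached_seconds_per_seq / 3600   # ≈ 2.8 h
speedup = (ba_minutes_per_seq * 60) / cached_seconds_per_seq       # ≈ 300x per iteration
```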

🔧 Implementation Details

Pre-Processing Service

File: ylff/services/preprocessing.py

Function: preprocess_arkit_sequence()

Steps:

  1. Extract ARKit data (free)
  2. Run DA3 inference (GPU)
  3. Decide: ARKit poses (if quality good) or BA (if quality poor)
  4. Compute oracle uncertainty propagation
  5. Save to cache
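The steps above can be sketched as follows. This is not the real implementation: the injected callables stand in for the actual extraction/inference/BA/uncertainty services, and the `tracking_quality` field and threshold are illustrative assumptions.

```python
import json
from pathlib import Path
import numpy as np

def preprocess_arkit_sequence(seq_dir, cache_dir,
                              extract_arkit, run_da3, run_ba,
                              compute_uncertainty, quality_threshold=0.7):
    """Offline phase sketch: extract, infer, pick oracle, save to cache."""
    arkit = extract_arkit(seq_dir)                       # 1. free
    da3 = run_da3(arkit)                                 # 2. GPU inference
    if arkit["tracking_quality"] >= quality_threshold:   # 3. decide oracle source
        oracle = {"poses": arkit["poses"], "source": "arkit"}
    else:
        oracle = {"poses": run_ba(arkit, da3), "source": "ba"}  # expensive path
    unc = compute_uncertainty(oracle, da3)               # 4. oracle uncertainty
    out = Path(cache_dir) / Path(seq_dir).name           # 5. save to cache
    out.mkdir(parents=True, exist_ok=True)
    np.savez(out / "oracle_targets.npz", poses=oracle["poses"])
    np.savez(out / "uncertainty_results.npz", **unc)
    (out / "metadata.json").write_text(json.dumps({"oracle_source": oracle["source"]}))
    return out
```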

Preprocessed Dataset

File: ylff/services/preprocessed_dataset.py

Class: PreprocessedARKitDataset

Features:

  • Loads pre-computed oracle targets
  • Loads uncertainty results (confidence, covariance)
  • Loads ARKit data (for reference)
  • Fast disk I/O (no BA computation)
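A minimal sketch of what such a map-style dataset can look like; the real `PreprocessedARKitDataset` may differ in field names and return types:

```python
import json
from pathlib import Path
import numpy as np

class PreprocessedARKitDataset:
    """Map-style dataset over the cache directory. Implementing
    __len__/__getitem__ makes it directly usable with a torch DataLoader."""

    def __init__(self, cache_dir):
        # One subdirectory per pre-processed sequence
        self.seq_dirs = sorted(p for p in Path(cache_dir).iterdir() if p.is_dir())

    def __len__(self):
        return len(self.seq_dirs)

    def __getitem__(self, idx):
        d = self.seq_dirs[idx]

        def load_npz(name):
            with np.load(d / name) as f:
                return {k: f[k] for k in f.files}

        return {
            "oracle_targets": load_npz("oracle_targets.npz"),
            "uncertainty_results": load_npz("uncertainty_results.npz"),
            "arkit_data": load_npz("arkit_data.npz"),
            "metadata": json.loads((d / "metadata.json").read_text()),
        }
```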

Training Integration

File: ylff/services/pretrain.py

Changes:

  • Detects preprocessed data (checks for uncertainty_results in batch)
  • Uses oracle_uncertainty_ensemble_loss() when available
  • Falls back to standard loss for live data (backward compatibility)
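The dispatch logic reduces to a key check on the batch. A sketch with the loss functions injected (the real code in `ylff/services/pretrain.py` presumably wires these up directly):

```python
def select_loss(batch, predictions, oracle_uncertainty_ensemble_loss, standard_loss):
    """Choose the loss based on whether the batch came from the cache."""
    if "uncertainty_results" in batch:
        # Preprocessed data detected: use continuous confidence weighting
        return oracle_uncertainty_ensemble_loss(predictions, batch)
    # Live data: fall back to the standard loss (backward compatibility)
    return standard_loss(predictions, batch)
```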

📝 Usage Examples

Full Workflow

# Step 1: Pre-process (one-time, overnight)
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --num-workers 8

# Step 2: Train (fast iteration)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50

# Step 3: Iterate on training (no re-preprocessing needed)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 100 \
    --lr 5e-5  # Lower LR for fine-tuning

When to Re-Preprocess

Only needed if:

  • ✅ New sequences added
  • ✅ Different DA3 model used for initial inference
  • ✅ BA parameters changed
  • ✅ Oracle uncertainty parameters changed

Not needed for:

  • ❌ Training hyperparameter changes (LR, batch size, etc.)
  • ❌ Model architecture changes (same input/output)
  • ❌ Training iteration (epochs, etc.)

🎓 Key Benefits

  1. 100-1000x faster training iteration - No BA during training
  2. Continuous confidence weighting - No binary rejection
  3. All data contributes - Low confidence = low weight, not zero
  4. Uncertainty propagation - Covariance estimates available
  5. Parallelizable pre-processing - Can process multiple sequences simultaneously
  6. Reusable cache - Pre-process once, train many times

📊 Summary

Pre-Processing:

  • Runs BA and oracle uncertainty computation offline
  • Saves results to cache
  • One-time cost per dataset

Training:

  • Loads pre-computed results
  • Fast iteration (no BA)
  • Uses continuous confidence weighting
  • All data contributes (weighted by confidence)

This architecture enables efficient training while using all available oracle sources! 🚀