# End-to-End Training Pipeline Architecture

## Overview

The training pipeline is split into two phases to handle the computational cost of bundle adjustment (BA):

- Pre-Processing Phase (offline, expensive) - compute BA and oracle uncertainty
- Training Phase (online, fast) - load pre-computed results and train
## Pipeline Flow

### Phase 1: Pre-Processing (Offline)

When: Run once before training (or when the data or model changes)

What it does:

- Extract ARKit data (poses, LiDAR) - FREE
- Run DA3 inference (GPU, batchable) - moderate cost
- Run BA validation (CPU, expensive) - only if ARKit quality is poor
- Compute oracle uncertainty propagation - moderate cost
- Save to cache - fast disk I/O

Time: ~10-20 minutes per sequence (mostly BA)

Command:

```bash
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --num-workers 8
```
### Phase 2: Training (Online)

When: Run repeatedly during training iterations

What it does:

- Load pre-computed results from cache - fast (disk I/O)
- Run DA3 inference (current model) - GPU, fast
- Compute uncertainty-weighted loss - GPU, fast
- Backprop & update - standard training

Time: ~1-3 seconds per sequence

Command:

```bash
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50
```
## Complete Workflow

### Step 1: Pre-Process All Sequences

```bash
# Pre-process all ARKit sequences (one-time, can run overnight)
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --model-name depth-anything/DA3-LARGE \
    --num-workers 8 \
    --use-lidar \
    --prefer-arkit-poses

# This:
# - Extracts ARKit data (free)
# - Runs DA3 inference (GPU)
# - Runs BA only for sequences with poor ARKit tracking
# - Computes oracle uncertainty
# - Saves everything to cache
```
Output:

```
cache/preprocessed/
├── sequence_001/
│   ├── oracle_targets.npz        # Best poses/depth (BA or ARKit)
│   ├── uncertainty_results.npz   # Confidence scores, uncertainty
│   ├── arkit_data.npz            # Original ARKit data
│   └── metadata.json             # Sequence info
├── sequence_002/
└── ...
```
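A cache with this layout can be sanity-checked with a small helper. This is purely illustrative, not part of ylff; only the four file names come from the layout above.

```python
# Illustrative helper (not part of ylff): verify that every sequence
# directory in the cache contains the four expected files.
from pathlib import Path

EXPECTED_FILES = {
    "oracle_targets.npz",
    "uncertainty_results.npz",
    "arkit_data.npz",
    "metadata.json",
}

def missing_cache_files(cache_dir: str) -> dict:
    """Map each incomplete sequence name to its set of missing files."""
    problems = {}
    for seq in sorted(Path(cache_dir).iterdir()):
        if not seq.is_dir():
            continue
        missing = EXPECTED_FILES - {p.name for p in seq.iterdir()}
        if missing:
            problems[seq.name] = missing
    return problems
```

An empty result means the cache is complete and training can start.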
### Step 2: Train Using Pre-Processed Data

```bash
# Train using pre-computed results (fast iteration)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50 \
    --lr 1e-4 \
    --batch-size 1
```

What happens:

- Loads pre-computed oracle targets and uncertainty from cache
- Runs DA3 inference with the current model
- Computes uncertainty-weighted loss (continuous confidence)
- Updates model weights
## Handling Rejection/Failure

### No Binary Rejection

Key Principle: All data contributes, just weighted by confidence.

### Continuous Confidence Weighting

In the loss function:

```python
# All pixels/frames contribute, weighted by confidence
loss = confidence * prediction_error

# Low confidence (0.3)  -> weight = 0.3 (contributes less)
# High confidence (0.9) -> weight = 0.9 (contributes more)
# No hard cutoff - smooth weighting
```
### Failure Scenarios

BA failure:

- Falls back to ARKit poses (if their quality is good)
- Lower confidence score (reflects the uncertainty)
- Still used for training, just weighted less
- Model learns from ARKit poses with lower confidence

Missing LiDAR:

- Uses BA depth (if available), or geometric consistency only
- Lower confidence score
- Still used for training

Poor tracking:

- Lower confidence score
- Still used for training
- Model learns to handle uncertainty

Key insight: Even "failed" or low-confidence data contributes to training, just with lower weight. This is better than binary rejection because:

- No information is lost
- The model learns to handle uncertainty
- Gradients flow smoothly (no hard cutoffs)
- Better generalization
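The contrast with binary rejection shows up even in a toy example. The error and confidence values below are invented purely for illustration:

```python
# Toy comparison: binary rejection vs. continuous confidence weighting.
# The error/confidence values are made up for illustration only.
import numpy as np

error = np.array([0.1, 0.5, 2.0])        # per-frame prediction error
confidence = np.array([0.9, 0.6, 0.3])   # oracle confidence in [0, 1]

# Binary rejection: frames below a 0.5 threshold are discarded entirely,
# so the third frame contributes nothing (information loss, hard cutoff).
keep = confidence >= 0.5
rejection_loss = (error * keep).sum() / keep.sum()             # 0.3

# Continuous weighting: every frame contributes, scaled by confidence,
# so gradients still flow through low-confidence frames.
weighted_loss = (confidence * error).sum() / confidence.sum()  # 0.55
```

Under rejection the low-confidence frame vanishes from the gradient; under weighting it still contributes, just down-scaled by its 0.3 confidence.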
## Performance Comparison

### Without Pre-Processing (Current)

Per training iteration:

- BA computation: ~5-15 min per sequence (CPU, expensive)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- Total: ~5-15 min per sequence (dominated by BA)

For 100 sequences:

- One epoch: ~8-25 hours
- 50 epochs: ~17-52 days
### With Pre-Processing (New)

Pre-processing (one-time):

- BA computation: ~5-15 min per sequence (CPU, expensive)
- Oracle uncertainty: ~10-30 sec per sequence (CPU)
- Total: ~10-20 min per sequence (one-time cost)

Training (per iteration):

- Load cache: ~0.1-1 sec per sequence (disk I/O)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- Total: ~1-3 sec per sequence

For 100 sequences:

- Pre-processing: ~17-33 hours (one-time, if run serially)
- One epoch: ~2-5 minutes
- 50 epochs: ~2-4 hours

Speedup: roughly 100-1000x faster per training iteration.
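These figures follow from simple arithmetic, assuming serial pre-processing (with `--num-workers` the one-time cost divides across workers):

```python
# Back-of-envelope check of the numbers above for 100 sequences.
NUM_SEQUENCES = 100

# One-time pre-processing at ~10-20 min per sequence, run serially.
preprocess_hours = [m * NUM_SEQUENCES / 60 for m in (10, 20)]   # ~[16.7, 33.3]

# Training at ~1-3 sec per sequence.
epoch_minutes = [s * NUM_SEQUENCES / 60 for s in (1, 3)]        # ~[1.7, 5.0]
fifty_epochs_hours = [m * 50 / 60 for m in epoch_minutes]       # ~[1.4, 4.2]
```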
## Implementation Details

### Pre-Processing Service

File: ylff/services/preprocessing.py

Function: preprocess_arkit_sequence()

Steps:

- Extract ARKit data (free)
- Run DA3 inference (GPU)
- Decide: ARKit poses (if quality is good) or BA (if quality is poor)
- Compute oracle uncertainty propagation
- Save to cache
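The decide-between-ARKit-and-BA step can be sketched as follows. The dict-based data layout, the injected callables (`da3_infer`, `ba_solve`), and the 0.7 threshold are assumptions for illustration, not the real `preprocess_arkit_sequence()` signature:

```python
# Hedged sketch of the pose-source decision during pre-processing.
# `seq` is a plain dict here; the real service uses ylff's own types.
def choose_oracle_poses(seq, da3_infer, ba_solve, quality_threshold=0.7):
    depth = da3_infer(seq["frames"])                  # GPU inference
    if seq["tracking_quality"] >= quality_threshold:
        poses, used_ba = seq["arkit_poses"], False    # good tracking: skip BA
    else:
        poses, used_ba = ba_solve(seq["frames"], depth), True  # expensive fallback
    return {"poses": poses, "depth": depth, "used_ba": used_ba}
```

Because BA only runs on the poor-tracking branch, most sequences with solid ARKit tracking skip the expensive CPU step entirely.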
### Preprocessed Dataset

File: ylff/services/preprocessed_dataset.py

Class: PreprocessedARKitDataset

Features:

- Loads pre-computed oracle targets
- Loads uncertainty results (confidence, covariance)
- Loads ARKit data (for reference)
- Fast disk I/O (no BA computation)
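A minimal cache-backed dataset might look like the sketch below. The real PreprocessedARKitDataset would typically subclass torch.utils.data.Dataset; the .npz key names used here are assumptions, only the file names come from the cache layout:

```python
# Minimal sketch of a cache-backed dataset (not ylff's actual class).
import json
from pathlib import Path

import numpy as np

class CachedSequenceDataset:
    def __init__(self, cache_dir: str):
        # One subdirectory per pre-processed sequence.
        self.sequences = sorted(p for p in Path(cache_dir).iterdir() if p.is_dir())

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        seq = self.sequences[idx]
        return {
            "oracle": dict(np.load(seq / "oracle_targets.npz")),
            "uncertainty_results": dict(np.load(seq / "uncertainty_results.npz")),
            "arkit": dict(np.load(seq / "arkit_data.npz")),
            "metadata": json.loads((seq / "metadata.json").read_text()),
        }
```

Each `__getitem__` is pure disk I/O; no BA or uncertainty computation happens at training time.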
### Training Integration

File: ylff/services/pretrain.py

Changes:

- Detects preprocessed data (checks for uncertainty_results in the batch)
- Uses oracle_uncertainty_ensemble_loss() when available
- Falls back to the standard loss for live data (backward compatibility)
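The branch described above reduces to a simple dispatch. The two loss functions are injected stand-ins here, not ylff's actual implementations:

```python
# Hedged sketch of the loss selection in pretrain.py: use the
# uncertainty-aware loss when a batch carries pre-computed results,
# otherwise fall back to the standard loss for live data.
def select_loss(batch, prediction, uncertainty_loss, standard_loss):
    if "uncertainty_results" in batch:           # preprocessed batch
        return uncertainty_loss(prediction, batch)
    return standard_loss(prediction, batch)      # live data: backward compatible
```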
## Usage Examples

### Full Workflow

```bash
# Step 1: Pre-process (one-time, overnight)
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --num-workers 8

# Step 2: Train (fast iteration)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50

# Step 3: Iterate on training (no re-preprocessing needed)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 100 \
    --lr 5e-5  # Lower LR for fine-tuning
```
### When to Re-Preprocess

Only needed if:

- New sequences are added
- A different DA3 model is used for the initial inference
- BA parameters change
- Oracle uncertainty parameters change

Not needed for:

- Training hyperparameter changes (LR, batch size, etc.)
- Model architecture changes (same input/output)
- Training iteration changes (number of epochs, etc.)
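These rules could be automated with a small staleness check against each sequence's cached metadata.json. The metadata field names (model_name, ba_params) are assumptions about what the cache records, not a documented schema:

```python
# Illustrative staleness check: does a cached sequence need re-preprocessing?
# The "model_name" / "ba_params" metadata fields are assumed, not documented.
import json
from pathlib import Path

def needs_repreprocess(seq_cache_dir, model_name, ba_params):
    meta_path = Path(seq_cache_dir) / "metadata.json"
    if not meta_path.exists():
        return True                                   # new sequence: no cache yet
    meta = json.loads(meta_path.read_text())
    return (meta.get("model_name") != model_name      # different DA3 model
            or meta.get("ba_params") != ba_params)    # changed BA settings
```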
## Key Benefits

- 100-1000x faster training iterations - no BA during training
- Continuous confidence weighting - no binary rejection
- All data contributes - low confidence means low weight, not zero
- Uncertainty propagation - covariance estimates are available
- Parallelizable pre-processing - multiple sequences can be processed simultaneously
- Reusable cache - pre-process once, train many times
## Summary

Pre-processing:

- Runs BA and oracle uncertainty computation offline
- Saves results to cache
- One-time cost per dataset

Training:

- Loads pre-computed results
- Iterates fast (no BA)
- Uses continuous confidence weighting
- All data contributes (weighted by confidence)

This architecture enables efficient training while using all available oracle sources.