# End-to-End Training Pipeline Architecture

## 🎯 Overview

The training pipeline is split into **two phases** to handle the computational cost of bundle adjustment (BA):

1. **Pre-Processing Phase** (offline, expensive) - Compute BA and oracle uncertainty
2. **Training Phase** (online, fast) - Load pre-computed results and train

## Pipeline Flow

### Phase 1: Pre-Processing (Offline)

**When:** Run once before training (or whenever the data or model changes)

**What it does:**

1. Extract ARKit data (poses, LiDAR) - **FREE**
2. Run DA3 inference (GPU, batchable) - **Moderate cost**
3. Run BA validation (CPU, expensive) - **Only if ARKit quality is poor**
4. Compute oracle uncertainty propagation - **Moderate cost**
5. Save to cache - **Fast disk I/O**

**Time:** ~10-20 minutes per sequence (mostly BA)

**Command:**

```bash
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --num-workers 8
```
### Phase 2: Training (Online)

**When:** Run repeatedly during training iterations

**What it does:**

1. Load pre-computed results from cache - **Fast (disk I/O)**
2. Run DA3 inference (current model) - **GPU, fast**
3. Compute uncertainty-weighted loss - **GPU, fast**
4. Backprop & update - **Standard training**

**Time:** ~1-3 seconds per sequence

**Command:**

```bash
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50
```
## Complete Workflow

### Step 1: Pre-Process All Sequences

```bash
# Pre-process all ARKit sequences (one-time, can run overnight)
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --model-name depth-anything/DA3-LARGE \
    --num-workers 8 \
    --use-lidar \
    --prefer-arkit-poses

# This:
# - Extracts ARKit data (free)
# - Runs DA3 inference (GPU)
# - Runs BA only for sequences with poor ARKit tracking
# - Computes oracle uncertainty
# - Saves everything to cache
```
**Output:**

```
cache/preprocessed/
├── sequence_001/
│   ├── oracle_targets.npz       # Best poses/depth (BA or ARKit)
│   ├── uncertainty_results.npz  # Confidence scores, uncertainty
│   ├── arkit_data.npz           # Original ARKit data
│   └── metadata.json            # Sequence info
├── sequence_002/
└── ...
```
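A sketch of reading one cached sequence back, assuming the layout above. The array keys inside each `.npz` file are illustrative, not the real schema; check the preprocessing service for the actual keys.

```python
import json
from pathlib import Path

import numpy as np


def load_cache_entry(cache_dir, sequence_id):
    """Load one pre-processed sequence from the cache layout above."""
    seq_dir = Path(cache_dir) / sequence_id
    return {
        # Each .npz becomes a plain dict of arrays.
        "oracle": dict(np.load(seq_dir / "oracle_targets.npz")),
        "uncertainty": dict(np.load(seq_dir / "uncertainty_results.npz")),
        "arkit": dict(np.load(seq_dir / "arkit_data.npz")),
        "metadata": json.loads((seq_dir / "metadata.json").read_text()),
    }
```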
### Step 2: Train Using Pre-Processed Data

```bash
# Train using pre-computed results (fast iteration)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50 \
    --lr 1e-4 \
    --batch-size 1
```

**What happens:**

1. Loads pre-computed oracle targets and uncertainty from cache
2. Runs DA3 inference with current model
3. Computes uncertainty-weighted loss (continuous confidence)
4. Updates model weights
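The four steps above can be sketched as a single training step. `run_model` and `update` stand in for the real DA3 forward pass and optimizer step, and the batch keys are assumptions, not the real field names:

```python
import numpy as np


def training_step(batch, run_model, update):
    """One illustrative training iteration over a pre-processed batch."""
    # 1. Pre-computed oracle targets and confidence come from the cache.
    target = batch["oracle_depth"]
    confidence = batch["confidence"]
    # 2. Run DA3 inference with the current model (stubbed here).
    prediction = run_model(batch["images"])
    # 3. Uncertainty-weighted loss: every pixel contributes, scaled by confidence.
    loss = float(np.mean(confidence * (prediction - target) ** 2))
    # 4. Backprop & update, delegated to the caller's framework.
    update(loss)
    return loss
```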
## 🚫 Handling Rejection/Failure

### No Binary Rejection

**Key Principle:** All data contributes, just weighted by confidence.

### Continuous Confidence Weighting

**In the loss function:**

```python
# All pixels/frames contribute, weighted by confidence in [0, 1]:
#   low confidence (0.3)  -> weight 0.3 (contributes less)
#   high confidence (0.9) -> weight 0.9 (contributes more)
# No hard cutoff - smooth weighting.
loss = (confidence * prediction_error).mean()
```
### Failure Scenarios

**BA failure:**

- ✅ Falls back to ARKit poses (if quality good)
- ✅ Lower confidence score (reflects uncertainty)
- ✅ Still used for training (just weighted less)
- ✅ Model learns from ARKit poses with lower confidence

**Missing LiDAR:**

- ✅ Uses BA depth (if available)
- ✅ Or geometric consistency only
- ✅ Lower confidence score
- ✅ Still used for training

**Poor tracking:**

- ✅ Lower confidence score
- ✅ Still used for training
- ✅ Model learns to handle uncertainty

**Key Insight:** Even "failed" or low-confidence data contributes to training, just with lower weight. This is better than binary rejection because:

- No information loss
- Model learns to handle uncertainty
- Smooth gradient flow (no hard cutoffs)
- Better generalization
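A small numeric illustration of the difference, using made-up error and confidence values:

```python
import numpy as np

# Per-frame squared errors and their confidence scores (made-up numbers).
errors = np.array([0.2, 0.4, 1.5, 3.0])
confidence = np.array([0.9, 0.8, 0.4, 0.1])

# Binary rejection: frames below a threshold are dropped entirely,
# so two of the four frames contribute nothing and their information is lost.
keep = confidence >= 0.5
binary_loss = errors[keep].mean()

# Continuous weighting: every frame contributes in proportion to its
# confidence, so low-quality frames still provide (small) gradient signal.
weighted_loss = (confidence * errors).sum() / confidence.sum()
```

The weighted loss still reflects the low-confidence frames, while the binary version silently ignores them.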
## Performance Comparison

### Without Pre-Processing (Current)

**Per training iteration:**

- BA computation: ~5-15 min per sequence (CPU, expensive)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- **Total: ~5-15 min per sequence**

**For 100 sequences:**

- One epoch: ~8-25 hours
- 50 epochs: ~17-52 days

### With Pre-Processing (New)

**Pre-processing (one-time):**

- BA computation: ~5-15 min per sequence (CPU, expensive)
- Oracle uncertainty: ~10-30 sec per sequence (CPU)
- **Total: ~10-20 min per sequence** (one-time cost)

**Training (per iteration):**

- Load cache: ~0.1-1 sec per sequence (disk I/O)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- **Total: ~1-3 sec per sequence**

**For 100 sequences:**

- Pre-processing: ~17-33 hours (one-time)
- One epoch: ~2-5 minutes
- 50 epochs: ~2-4 hours

**Speedup:** roughly 100-1000x faster per training iteration
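These figures can be sanity-checked with a few lines of arithmetic; the per-sequence ranges are taken directly from the lists above:

```python
# Back-of-the-envelope check of the epoch-time figures above (100 sequences).
NUM_SEQUENCES = 100

# Per-sequence iteration cost in seconds: BA-dominated without the cache,
# cache-load plus DA3 inference with it.
no_cache_sec = (5 * 60, 15 * 60)   # ~5-15 min
cached_sec = (1, 3)                # ~1-3 sec

epoch_no_cache_hours = tuple(t * NUM_SEQUENCES / 3600 for t in no_cache_sec)  # ~8.3-25 h
epoch_cached_minutes = tuple(t * NUM_SEQUENCES / 60 for t in cached_sec)      # ~1.7-5 min

# Per-iteration speedup: best- and worst-case pairing of the two ranges.
speedup = (no_cache_sec[0] / cached_sec[1], no_cache_sec[1] / cached_sec[0])  # (100, 900)
```

The worst-to-best pairing gives roughly 100x to 900x, consistent with the ballpark figure quoted above.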
## 🔧 Implementation Details

### Pre-Processing Service

**File:** `ylff/services/preprocessing.py`

**Function:** `preprocess_arkit_sequence()`

**Steps:**

1. Extract ARKit data (free)
2. Run DA3 inference (GPU)
3. Decide: ARKit poses (if quality good) or BA (if quality poor)
4. Compute oracle uncertainty propagation
5. Save to cache
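An illustrative skeleton of these steps. The real implementation lives in `ylff/services/preprocessing.py`; every callable, batch key, and threshold below is a stand-in, not the actual API:

```python
def preprocess_sequence_sketch(sequence, run_da3, run_ba, quality_threshold=0.7):
    """Sketch of the five pre-processing steps; not the real implementation."""
    arkit = sequence["arkit"]                               # 1. extract ARKit data (free)
    depth = run_da3(sequence["images"])                     # 2. DA3 inference (GPU)
    if arkit["tracking_quality"] >= quality_threshold:      # 3. quality gate:
        poses = arkit["poses"]                              #    keep ARKit poses...
    else:
        poses = run_ba(sequence["images"], arkit["poses"])  #    ...or fall back to BA
    confidence = arkit["tracking_quality"]                  # 4. placeholder uncertainty score
    return {"poses": poses, "depth": depth, "confidence": confidence}  # 5. caller writes cache
```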
### Preprocessed Dataset

**File:** `ylff/services/preprocessed_dataset.py`

**Class:** `PreprocessedARKitDataset`

**Features:**

- Loads pre-computed oracle targets
- Loads uncertainty results (confidence, covariance)
- Loads ARKit data (for reference)
- Fast disk I/O (no BA computation)

### Training Integration

**File:** `ylff/services/pretrain.py`

**Changes:**

- Detects preprocessed data (checks for `uncertainty_results` in batch)
- Uses `oracle_uncertainty_ensemble_loss()` when available
- Falls back to standard loss for live data (backward compatibility)
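The dispatch described above might look roughly like this; `oracle_loss` stands in for `oracle_uncertainty_ensemble_loss()`, and only the batch-key check follows the convention stated in the text:

```python
def select_loss(batch, oracle_loss, standard_loss):
    """Route a batch to the right loss function (illustrative)."""
    # Pre-processed batches carry cached uncertainty results...
    if "uncertainty_results" in batch:
        return oracle_loss(batch)
    # ...while live batches fall back to the standard loss (backward compatible).
    return standard_loss(batch)
```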
## Usage Examples

### Full Workflow

```bash
# Step 1: Pre-process (one-time, overnight)
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --num-workers 8

# Step 2: Train (fast iteration)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50

# Step 3: Iterate on training (no re-preprocessing needed)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 100 \
    --lr 5e-5  # Lower LR for fine-tuning
```

### When to Re-Preprocess

Only needed if:

- ✅ New sequences added
- ✅ Different DA3 model used for initial inference
- ✅ BA parameters changed
- ✅ Oracle uncertainty parameters changed

**Not needed for:**

- ❌ Training hyperparameter changes (LR, batch size, etc.)
- ❌ Model architecture changes (same input/output)
- ❌ Training iteration (epochs, etc.)
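A cache-validity check along these lines could automate the decision. The metadata fields compared here (`model_name`, `ba_params`, `oracle_params`) are illustrative, not the real `metadata.json` schema:

```python
import json
from pathlib import Path


def needs_repreprocess(seq_dir, current_config):
    """Decide whether a cached sequence must be re-preprocessed (illustrative)."""
    meta_path = Path(seq_dir) / "metadata.json"
    if not meta_path.exists():  # new sequence: no cache entry yet
        return True
    cached = json.loads(meta_path.read_text())
    # Re-preprocess on any change that affects the cached results;
    # training hyperparameters are deliberately not part of this check.
    return any(cached.get(k) != current_config.get(k)
               for k in ("model_name", "ba_params", "oracle_params"))
```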
## Key Benefits

1. **~100-1000x faster training iteration** - No BA during training
2. **Continuous confidence weighting** - No binary rejection
3. **All data contributes** - Low confidence = low weight, not zero
4. **Uncertainty propagation** - Covariance estimates available
5. **Parallelizable pre-processing** - Can process multiple sequences simultaneously
6. **Reusable cache** - Pre-process once, train many times

## Summary

**Pre-processing:**

- Runs BA and oracle uncertainty computation offline
- Saves results to cache
- One-time cost per dataset

**Training:**

- Loads pre-computed results
- Fast iteration (no BA)
- Uses continuous confidence weighting
- All data contributes (weighted by confidence)

This architecture enables efficient training while using all available oracle sources!