# End-to-End Training Pipeline Architecture

## Overview

The training pipeline is split into two phases to handle the computational cost of bundle adjustment (BA):

- Pre-Processing Phase (offline, expensive) - compute BA and oracle uncertainty
- Training Phase (online, fast) - load pre-computed results and train
## Pipeline Flow

### Phase 1: Pre-Processing (Offline)

When: Run once before training (or when the data or model changes)

What it does:

- Extract ARKit data (poses, LiDAR) - FREE
- Run DA3 inference (GPU, batchable) - moderate cost
- Run BA validation (CPU, expensive) - only if ARKit quality is poor
- Compute oracle uncertainty propagation - moderate cost
- Save to cache - fast disk I/O

Time: ~10-20 minutes per sequence (mostly BA)

Command:

```bash
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --num-workers 8
```
### Phase 2: Training (Online)

When: Run repeatedly during training iterations

What it does:

- Load pre-computed results from cache - fast (disk I/O)
- Run DA3 inference (current model) - GPU, fast
- Compute uncertainty-weighted loss - GPU, fast
- Backprop & update - standard training

Time: ~1-3 seconds per sequence

Command:

```bash
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50
```
## Complete Workflow

### Step 1: Pre-Process All Sequences

```bash
# Pre-process all ARKit sequences (one-time, can run overnight)
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --model-name depth-anything/DA3-LARGE \
    --num-workers 8 \
    --use-lidar \
    --prefer-arkit-poses

# This:
# - Extracts ARKit data (free)
# - Runs DA3 inference (GPU)
# - Runs BA only for sequences with poor ARKit tracking
# - Computes oracle uncertainty
# - Saves everything to cache
```
Output:

```
cache/preprocessed/
├── sequence_001/
│   ├── oracle_targets.npz        # Best poses/depth (BA or ARKit)
│   ├── uncertainty_results.npz   # Confidence scores, uncertainty
│   ├── arkit_data.npz            # Original ARKit data
│   └── metadata.json             # Sequence info
├── sequence_002/
└── ...
```
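A cache with this layout can be sanity-checked with a small helper. This is purely illustrative, not part of ylff; only the four file names come from the layout above.

```python
# Illustrative helper (not part of ylff): verify that every sequence
# directory in the cache contains the four expected files.
from pathlib import Path

EXPECTED_FILES = {
    "oracle_targets.npz",
    "uncertainty_results.npz",
    "arkit_data.npz",
    "metadata.json",
}

def missing_cache_files(cache_dir: str) -> dict:
    """Map each incomplete sequence name to its set of missing files."""
    problems = {}
    for seq in sorted(Path(cache_dir).iterdir()):
        if not seq.is_dir():
            continue
        missing = EXPECTED_FILES - {p.name for p in seq.iterdir()}
        if missing:
            problems[seq.name] = missing
    return problems
```

An empty result means the cache is complete and training can start.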
### Step 2: Train Using Pre-Processed Data

```bash
# Train using pre-computed results (fast iteration)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50 \
    --lr 1e-4 \
    --batch-size 1
```

What happens:

- Loads pre-computed oracle targets and uncertainty from cache
- Runs DA3 inference with the current model
- Computes uncertainty-weighted loss (continuous confidence)
- Updates model weights
## Handling Rejection/Failure

### No Binary Rejection

Key Principle: All data contributes, just weighted by confidence.

### Continuous Confidence Weighting

In the loss function:

```python
# All pixels/frames contribute, weighted by confidence
loss = confidence * prediction_error

# Low confidence (0.3)  -> weight = 0.3 (contributes less)
# High confidence (0.9) -> weight = 0.9 (contributes more)
# No hard cutoff - smooth weighting
```
### Failure Scenarios

BA failure:

- Falls back to ARKit poses (if their quality is good)
- Lower confidence score (reflects the uncertainty)
- Still used for training, just weighted less
- Model learns from ARKit poses with lower confidence

Missing LiDAR:

- Uses BA depth (if available), or geometric consistency only
- Lower confidence score
- Still used for training

Poor tracking:

- Lower confidence score
- Still used for training
- Model learns to handle uncertainty

Key insight: Even "failed" or low-confidence data contributes to training, just with lower weight. This is better than binary rejection because:

- No information is lost
- The model learns to handle uncertainty
- Gradients flow smoothly (no hard cutoffs)
- Better generalization
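The contrast with binary rejection shows up even in a toy example. The error and confidence values below are invented purely for illustration:

```python
# Toy comparison: binary rejection vs. continuous confidence weighting.
# The error/confidence values are made up for illustration only.
import numpy as np

error = np.array([0.1, 0.5, 2.0])        # per-frame prediction error
confidence = np.array([0.9, 0.6, 0.3])   # oracle confidence in [0, 1]

# Binary rejection: frames below a 0.5 threshold are discarded entirely,
# so the third frame contributes nothing (information loss, hard cutoff).
keep = confidence >= 0.5
rejection_loss = (error * keep).sum() / keep.sum()             # 0.3

# Continuous weighting: every frame contributes, scaled by confidence,
# so gradients still flow through low-confidence frames.
weighted_loss = (confidence * error).sum() / confidence.sum()  # 0.55
```

Under rejection the low-confidence frame vanishes from the gradient; under weighting it still contributes, just down-scaled by its 0.3 confidence.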
## Performance Comparison

### Without Pre-Processing (Current)

Per training iteration:

- BA computation: ~5-15 min per sequence (CPU, expensive)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- Total: ~5-15 min per sequence (dominated by BA)

For 100 sequences:

- One epoch: ~8-25 hours
- 50 epochs: ~17-52 days
### With Pre-Processing (New)

Pre-processing (one-time):

- BA computation: ~5-15 min per sequence (CPU, expensive)
- Oracle uncertainty: ~10-30 sec per sequence (CPU)
- Total: ~10-20 min per sequence (one-time cost)

Training (per iteration):

- Load cache: ~0.1-1 sec per sequence (disk I/O)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- Total: ~1-3 sec per sequence

For 100 sequences:

- Pre-processing: ~17-33 hours (one-time, if run serially)
- One epoch: ~2-5 minutes
- 50 epochs: ~2-4 hours

Speedup: roughly 100-1000x faster per training iteration.
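These figures follow from simple arithmetic, assuming serial pre-processing (with `--num-workers` the one-time cost divides across workers):

```python
# Back-of-envelope check of the numbers above for 100 sequences.
NUM_SEQUENCES = 100

# One-time pre-processing at ~10-20 min per sequence, run serially.
preprocess_hours = [m * NUM_SEQUENCES / 60 for m in (10, 20)]   # ~[16.7, 33.3]

# Training at ~1-3 sec per sequence.
epoch_minutes = [s * NUM_SEQUENCES / 60 for s in (1, 3)]        # ~[1.7, 5.0]
fifty_epochs_hours = [m * 50 / 60 for m in epoch_minutes]       # ~[1.4, 4.2]
```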
## Implementation Details

### Pre-Processing Service

File: ylff/services/preprocessing.py

Function: preprocess_arkit_sequence()

Steps:

- Extract ARKit data (free)
- Run DA3 inference (GPU)
- Decide: ARKit poses (if quality is good) or BA (if quality is poor)
- Compute oracle uncertainty propagation
- Save to cache
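The decide-between-ARKit-and-BA step can be sketched as follows. The dict-based data layout, the injected callables (`da3_infer`, `ba_solve`), and the 0.7 threshold are assumptions for illustration, not the real `preprocess_arkit_sequence()` signature:

```python
# Hedged sketch of the pose-source decision during pre-processing.
# `seq` is a plain dict here; the real service uses ylff's own types.
def choose_oracle_poses(seq, da3_infer, ba_solve, quality_threshold=0.7):
    depth = da3_infer(seq["frames"])                  # GPU inference
    if seq["tracking_quality"] >= quality_threshold:
        poses, used_ba = seq["arkit_poses"], False    # good tracking: skip BA
    else:
        poses, used_ba = ba_solve(seq["frames"], depth), True  # expensive fallback
    return {"poses": poses, "depth": depth, "used_ba": used_ba}
```

Because BA only runs on the poor-tracking branch, most sequences with solid ARKit tracking skip the expensive CPU step entirely.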
### Preprocessed Dataset

File: ylff/services/preprocessed_dataset.py

Class: PreprocessedARKitDataset

Features:

- Loads pre-computed oracle targets
- Loads uncertainty results (confidence, covariance)
- Loads ARKit data (for reference)
- Fast disk I/O (no BA computation)
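A minimal cache-backed dataset might look like the sketch below. The real PreprocessedARKitDataset would typically subclass torch.utils.data.Dataset; the .npz key names used here are assumptions, only the file names come from the cache layout:

```python
# Minimal sketch of a cache-backed dataset (not ylff's actual class).
import json
from pathlib import Path

import numpy as np

class CachedSequenceDataset:
    def __init__(self, cache_dir: str):
        # One subdirectory per pre-processed sequence.
        self.sequences = sorted(p for p in Path(cache_dir).iterdir() if p.is_dir())

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        seq = self.sequences[idx]
        return {
            "oracle": dict(np.load(seq / "oracle_targets.npz")),
            "uncertainty_results": dict(np.load(seq / "uncertainty_results.npz")),
            "arkit": dict(np.load(seq / "arkit_data.npz")),
            "metadata": json.loads((seq / "metadata.json").read_text()),
        }
```

Each `__getitem__` is pure disk I/O; no BA or uncertainty computation happens at training time.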
### Training Integration

File: ylff/services/pretrain.py

Changes:

- Detects preprocessed data (checks for uncertainty_results in the batch)
- Uses oracle_uncertainty_ensemble_loss() when available
- Falls back to the standard loss for live data (backward compatibility)
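The branch described above reduces to a simple dispatch. The two loss functions are injected stand-ins here, not ylff's actual implementations:

```python
# Hedged sketch of the loss selection in pretrain.py: use the
# uncertainty-aware loss when a batch carries pre-computed results,
# otherwise fall back to the standard loss for live data.
def select_loss(batch, prediction, uncertainty_loss, standard_loss):
    if "uncertainty_results" in batch:           # preprocessed batch
        return uncertainty_loss(prediction, batch)
    return standard_loss(prediction, batch)      # live data: backward compatible
```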
## Usage Examples

### Full Workflow

```bash
# Step 1: Pre-process (one-time, overnight)
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --num-workers 8

# Step 2: Train (fast iteration)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50

# Step 3: Iterate on training (no re-preprocessing needed)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 100 \
    --lr 5e-5  # Lower LR for fine-tuning
```
### When to Re-Preprocess

Only needed if:

- New sequences are added
- A different DA3 model is used for the initial inference
- BA parameters change
- Oracle uncertainty parameters change

Not needed for:

- Training hyperparameter changes (LR, batch size, etc.)
- Model architecture changes (same input/output)
- Training iteration changes (number of epochs, etc.)
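These rules could be automated with a small staleness check against each sequence's cached metadata.json. The metadata field names (model_name, ba_params) are assumptions about what the cache records, not a documented schema:

```python
# Illustrative staleness check: does a cached sequence need re-preprocessing?
# The "model_name" / "ba_params" metadata fields are assumed, not documented.
import json
from pathlib import Path

def needs_repreprocess(seq_cache_dir, model_name, ba_params):
    meta_path = Path(seq_cache_dir) / "metadata.json"
    if not meta_path.exists():
        return True                                   # new sequence: no cache yet
    meta = json.loads(meta_path.read_text())
    return (meta.get("model_name") != model_name      # different DA3 model
            or meta.get("ba_params") != ba_params)    # changed BA settings
```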
## Key Benefits

- 100-1000x faster training iterations - no BA during training
- Continuous confidence weighting - no binary rejection
- All data contributes - low confidence means low weight, not zero
- Uncertainty propagation - covariance estimates are available
- Parallelizable pre-processing - multiple sequences can be processed simultaneously
- Reusable cache - pre-process once, train many times
## Summary

Pre-processing:

- Runs BA and oracle uncertainty computation offline
- Saves results to cache
- One-time cost per dataset

Training:

- Loads pre-computed results
- Iterates fast (no BA)
- Uses continuous confidence weighting
- All data contributes (weighted by confidence)

This architecture enables efficient training while using all available oracle sources.