# End-to-End Training Pipeline Architecture

## 🎯 Overview

The training pipeline is split into **two phases** to handle the computational cost of bundle adjustment (BA):

1. **Pre-Processing Phase** (offline, expensive) - Compute BA and oracle uncertainty
2. **Training Phase** (online, fast) - Load pre-computed results and train

## Pipeline Flow

### Phase 1: Pre-Processing (Offline)

**When:** Run once before training (or whenever the data or model changes)

**What it does:**

1. Extract ARKit data (poses, LiDAR) - **FREE**
2. Run DA3 inference (GPU, batchable) - **Moderate cost**
3. Run BA validation (CPU, expensive) - **Only if ARKit quality is poor**
4. Compute oracle uncertainty propagation - **Moderate cost**
5. Save to cache - **Fast disk I/O**

**Time:** ~10-20 minutes per sequence (mostly BA)

**Command:**

```bash
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --num-workers 8
```
### Phase 2: Training (Online)

**When:** Run repeatedly during training iterations

**What it does:**

1. Load pre-computed results from cache - **Fast (disk I/O)**
2. Run DA3 inference (current model) - **GPU, fast**
3. Compute uncertainty-weighted loss - **GPU, fast**
4. Backprop & update - **Standard training**

**Time:** ~1-3 seconds per sequence

**Command:**

```bash
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50
```
## Complete Workflow

### Step 1: Pre-Process All Sequences

```bash
# Pre-process all ARKit sequences (one-time, can run overnight)
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --model-name depth-anything/DA3-LARGE \
    --num-workers 8 \
    --use-lidar \
    --prefer-arkit-poses

# This:
# - Extracts ARKit data (free)
# - Runs DA3 inference (GPU)
# - Runs BA only for sequences with poor ARKit tracking
# - Computes oracle uncertainty
# - Saves everything to cache
```
**Output:**

```
cache/preprocessed/
├── sequence_001/
│   ├── oracle_targets.npz       # Best poses/depth (BA or ARKit)
│   ├── uncertainty_results.npz  # Confidence scores, uncertainty
│   ├── arkit_data.npz           # Original ARKit data
│   └── metadata.json            # Sequence info
├── sequence_002/
└── ...
```
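A sketch of reading one cached sequence back, assuming the layout above. The array keys inside each `.npz` file are illustrative, not the real schema; check the preprocessing service for the actual keys.

```python
import json
from pathlib import Path

import numpy as np


def load_cache_entry(cache_dir, sequence_id):
    """Load one pre-processed sequence from the cache layout above."""
    seq_dir = Path(cache_dir) / sequence_id
    return {
        # Each .npz becomes a plain dict of arrays.
        "oracle": dict(np.load(seq_dir / "oracle_targets.npz")),
        "uncertainty": dict(np.load(seq_dir / "uncertainty_results.npz")),
        "arkit": dict(np.load(seq_dir / "arkit_data.npz")),
        "metadata": json.loads((seq_dir / "metadata.json").read_text()),
    }
```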
### Step 2: Train Using Pre-Processed Data

```bash
# Train using pre-computed results (fast iteration)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50 \
    --lr 1e-4 \
    --batch-size 1
```

**What happens:**

1. Loads pre-computed oracle targets and uncertainty from cache
2. Runs DA3 inference with current model
3. Computes uncertainty-weighted loss (continuous confidence)
4. Updates model weights
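The four steps above can be sketched as a single training step. `run_model` and `update` stand in for the real DA3 forward pass and optimizer step, and the batch keys are assumptions, not the real field names:

```python
import numpy as np


def training_step(batch, run_model, update):
    """One illustrative training iteration over a pre-processed batch."""
    # 1. Pre-computed oracle targets and confidence come from the cache.
    target = batch["oracle_depth"]
    confidence = batch["confidence"]
    # 2. Run DA3 inference with the current model (stubbed here).
    prediction = run_model(batch["images"])
    # 3. Uncertainty-weighted loss: every pixel contributes, scaled by confidence.
    loss = float(np.mean(confidence * (prediction - target) ** 2))
    # 4. Backprop & update, delegated to the caller's framework.
    update(loss)
    return loss
```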
## 🚫 Handling Rejection/Failure

### No Binary Rejection

**Key Principle:** All data contributes, just weighted by confidence.

### Continuous Confidence Weighting

**In the loss function:**

```python
# All pixels/frames contribute, weighted by confidence in [0, 1]:
#   low confidence (0.3)  -> weight 0.3 (contributes less)
#   high confidence (0.9) -> weight 0.9 (contributes more)
# No hard cutoff - smooth weighting.
loss = (confidence * prediction_error).mean()
```
### Failure Scenarios

**BA failure:**

- ✅ Falls back to ARKit poses (if quality good)
- ✅ Lower confidence score (reflects uncertainty)
- ✅ Still used for training (just weighted less)
- ✅ Model learns from ARKit poses with lower confidence

**Missing LiDAR:**

- ✅ Uses BA depth (if available)
- ✅ Or geometric consistency only
- ✅ Lower confidence score
- ✅ Still used for training

**Poor tracking:**

- ✅ Lower confidence score
- ✅ Still used for training
- ✅ Model learns to handle uncertainty

**Key Insight:** Even "failed" or low-confidence data contributes to training, just with lower weight. This is better than binary rejection because:

- No information loss
- Model learns to handle uncertainty
- Smooth gradient flow (no hard cutoffs)
- Better generalization
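A small numeric illustration of the difference, using made-up error and confidence values:

```python
import numpy as np

# Per-frame squared errors and their confidence scores (made-up numbers).
errors = np.array([0.2, 0.4, 1.5, 3.0])
confidence = np.array([0.9, 0.8, 0.4, 0.1])

# Binary rejection: frames below a threshold are dropped entirely,
# so two of the four frames contribute nothing and their information is lost.
keep = confidence >= 0.5
binary_loss = errors[keep].mean()

# Continuous weighting: every frame contributes in proportion to its
# confidence, so low-quality frames still provide (small) gradient signal.
weighted_loss = (confidence * errors).sum() / confidence.sum()
```

The weighted loss still reflects the low-confidence frames, while the binary version silently ignores them.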
## Performance Comparison

### Without Pre-Processing (Current)

**Per training iteration:**

- BA computation: ~5-15 min per sequence (CPU, expensive)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- **Total: ~5-15 min per sequence**

**For 100 sequences:**

- One epoch: ~8-25 hours
- 50 epochs: ~17-52 days

### With Pre-Processing (New)

**Pre-processing (one-time):**

- BA computation: ~5-15 min per sequence (CPU, expensive)
- Oracle uncertainty: ~10-30 sec per sequence (CPU)
- **Total: ~10-20 min per sequence** (one-time cost)

**Training (per iteration):**

- Load cache: ~0.1-1 sec per sequence (disk I/O)
- DA3 inference: ~0.5-2 sec per sequence (GPU)
- Loss computation: ~0.1-0.5 sec per sequence (GPU)
- **Total: ~1-3 sec per sequence**

**For 100 sequences:**

- Pre-processing: ~17-33 hours (one-time)
- One epoch: ~2-5 minutes
- 50 epochs: ~2-4 hours

**Speedup:** roughly 100-1000x faster per training iteration
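These figures can be sanity-checked with a few lines of arithmetic; the per-sequence ranges are taken directly from the lists above:

```python
# Back-of-the-envelope check of the epoch-time figures above (100 sequences).
NUM_SEQUENCES = 100

# Per-sequence iteration cost in seconds: BA-dominated without the cache,
# cache-load plus DA3 inference with it.
no_cache_sec = (5 * 60, 15 * 60)   # ~5-15 min
cached_sec = (1, 3)                # ~1-3 sec

epoch_no_cache_hours = tuple(t * NUM_SEQUENCES / 3600 for t in no_cache_sec)  # ~8.3-25 h
epoch_cached_minutes = tuple(t * NUM_SEQUENCES / 60 for t in cached_sec)      # ~1.7-5 min

# Per-iteration speedup: best- and worst-case pairing of the two ranges.
speedup = (no_cache_sec[0] / cached_sec[1], no_cache_sec[1] / cached_sec[0])  # (100, 900)
```

The worst-to-best pairing gives roughly 100x to 900x, consistent with the ballpark figure quoted above.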
## 🔧 Implementation Details

### Pre-Processing Service

**File:** `ylff/services/preprocessing.py`

**Function:** `preprocess_arkit_sequence()`

**Steps:**

1. Extract ARKit data (free)
2. Run DA3 inference (GPU)
3. Decide: ARKit poses (if quality good) or BA (if quality poor)
4. Compute oracle uncertainty propagation
5. Save to cache
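An illustrative skeleton of these steps. The real implementation lives in `ylff/services/preprocessing.py`; every callable, batch key, and threshold below is a stand-in, not the actual API:

```python
def preprocess_sequence_sketch(sequence, run_da3, run_ba, quality_threshold=0.7):
    """Sketch of the five pre-processing steps; not the real implementation."""
    arkit = sequence["arkit"]                               # 1. extract ARKit data (free)
    depth = run_da3(sequence["images"])                     # 2. DA3 inference (GPU)
    if arkit["tracking_quality"] >= quality_threshold:      # 3. quality gate:
        poses = arkit["poses"]                              #    keep ARKit poses...
    else:
        poses = run_ba(sequence["images"], arkit["poses"])  #    ...or fall back to BA
    confidence = arkit["tracking_quality"]                  # 4. placeholder uncertainty score
    return {"poses": poses, "depth": depth, "confidence": confidence}  # 5. caller writes cache
```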
### Preprocessed Dataset

**File:** `ylff/services/preprocessed_dataset.py`

**Class:** `PreprocessedARKitDataset`

**Features:**

- Loads pre-computed oracle targets
- Loads uncertainty results (confidence, covariance)
- Loads ARKit data (for reference)
- Fast disk I/O (no BA computation)

### Training Integration

**File:** `ylff/services/pretrain.py`

**Changes:**

- Detects preprocessed data (checks for `uncertainty_results` in batch)
- Uses `oracle_uncertainty_ensemble_loss()` when available
- Falls back to standard loss for live data (backward compatibility)
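The dispatch described above might look roughly like this; `oracle_loss` stands in for `oracle_uncertainty_ensemble_loss()`, and only the batch-key check follows the convention stated in the text:

```python
def select_loss(batch, oracle_loss, standard_loss):
    """Route a batch to the right loss function (illustrative)."""
    # Pre-processed batches carry cached uncertainty results...
    if "uncertainty_results" in batch:
        return oracle_loss(batch)
    # ...while live batches fall back to the standard loss (backward compatible).
    return standard_loss(batch)
```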
## Usage Examples

### Full Workflow

```bash
# Step 1: Pre-process (one-time, overnight)
ylff preprocess arkit data/arkit_sequences \
    --output-cache cache/preprocessed \
    --num-workers 8

# Step 2: Train (fast iteration)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 50

# Step 3: Iterate on training (no re-preprocessing needed)
ylff train pretrain data/arkit_sequences \
    --use-preprocessed \
    --preprocessed-cache-dir cache/preprocessed \
    --epochs 100 \
    --lr 5e-5  # Lower LR for fine-tuning
```

### When to Re-Preprocess

Only needed if:

- ✅ New sequences added
- ✅ Different DA3 model used for initial inference
- ✅ BA parameters changed
- ✅ Oracle uncertainty parameters changed

**Not needed for:**

- ❌ Training hyperparameter changes (LR, batch size, etc.)
- ❌ Model architecture changes (same input/output)
- ❌ Training iteration (epochs, etc.)
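A cache-validity check along these lines could automate the decision. The metadata fields compared here (`model_name`, `ba_params`, `oracle_params`) are illustrative, not the real `metadata.json` schema:

```python
import json
from pathlib import Path


def needs_repreprocess(seq_dir, current_config):
    """Decide whether a cached sequence must be re-preprocessed (illustrative)."""
    meta_path = Path(seq_dir) / "metadata.json"
    if not meta_path.exists():  # new sequence: no cache entry yet
        return True
    cached = json.loads(meta_path.read_text())
    # Re-preprocess on any change that affects the cached results;
    # training hyperparameters are deliberately not part of this check.
    return any(cached.get(k) != current_config.get(k)
               for k in ("model_name", "ba_params", "oracle_params"))
```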
## Key Benefits

1. **~100-1000x faster training iteration** - No BA during training
2. **Continuous confidence weighting** - No binary rejection
3. **All data contributes** - Low confidence = low weight, not zero
4. **Uncertainty propagation** - Covariance estimates available
5. **Parallelizable pre-processing** - Can process multiple sequences simultaneously
6. **Reusable cache** - Pre-process once, train many times

## Summary

**Pre-processing:**

- Runs BA and oracle uncertainty computation offline
- Saves results to cache
- One-time cost per dataset

**Training:**

- Loads pre-computed results
- Fast iteration (no BA)
- Uses continuous confidence weighting
- All data contributes (weighted by confidence)

This architecture enables efficient training while using all available oracle sources!