	---
	title: YLFF Training
	emoji: 🚀
	colorFrom: blue
	colorTo: purple
	sdk: docker
	app_port: 7860
	---

	# You Learn From Failure (YLFF)

	Geometric Consistency First: Training Visual Geometry Models with BA Supervision

	## Overview

	YLFF is a unified framework for training geometrically accurate depth estimation models using Bundle Adjustment (BA) and LiDAR as oracle teachers. Unlike traditional approaches that prioritize perceptual quality, YLFF treats geometric consistency as a first-order goal.

	### Core Philosophy

	Geometric Accuracy > Perceptual Quality

	- Multi-view geometric consistency is the primary objective (not just regularization)
	- Absolute scale accuracy is critical for metric depth estimation
	- Multi-view pose consistency is essential for 3D reconstruction
	- Teacher-student learning provides stability during training

	## End-to-End Pipeline

	The complete YLFF pipeline from data collection to trained model:

	```mermaid
	flowchart TD
	Start([Start: Data Collection]) --> Upload[Upload ARKit Sequences]
	Upload --> Extract[Extract ARKit Data<br/>Poses, LiDAR, Intrinsics]

	Extract --> Preprocess{Pre-Processing Phase<br/>Offline, Expensive}

	Preprocess --> DA3Infer[Run DA3 Inference<br/>Initial Predictions]
	DA3Infer --> QualityCheck{ARKit Quality<br/>Check}

	QualityCheck -->|High Quality<br/>≥ 0.8| UseARKit[Use ARKit Poses<br/>Skip BA]
	QualityCheck -->|Low Quality<br/>< 0.8| RunBA[Run BA Validation<br/>Refine Poses]

	UseARKit --> OracleUncertainty[Compute Oracle Uncertainty<br/>Confidence Maps]
	RunBA --> OracleUncertainty

	OracleUncertainty --> SelectTargets[Select Oracle Targets<br/>BA or ARKit Poses]
	SelectTargets --> Cache[Save to Cache<br/>oracle_targets.npz<br/>uncertainty_results.npz]

	Cache --> TrainingPhase{Training Phase<br/>Online, Fast}

	TrainingPhase --> LoadCache[Load Pre-Computed<br/>Oracle Results]
	LoadCache --> LoadModel[Load/Resume Model<br/>Student + Teacher]

	LoadModel --> TrainingLoop[Training Loop]

	TrainingLoop --> Forward[Forward Pass<br/>Student Model Inference]
	Forward --> ComputeLoss[Compute Geometric Losses<br/>Multi-view: 3.0<br/>Absolute Scale: 2.5<br/>Pose: 2.0<br/>Gradient: 1.0<br/>Teacher: 0.5]

	ComputeLoss --> Backward[Backward Pass<br/>Gradient Computation]
	Backward --> ClipGrad[Gradient Clipping<br/>Max Norm: 1.0]
	ClipGrad --> Update[Update Weights<br/>AdamW Optimizer]

	Update --> UpdateTeacher[Update Teacher Model<br/>EMA Decay: 0.999]
	UpdateTeacher --> Scheduler[Update Learning Rate<br/>Cosine Annealing]

	Scheduler --> Checkpoint{Checkpoint<br/>Interval?}

	Checkpoint -->|Every N Steps| SaveCheckpoint[Save Checkpoint<br/>Periodic + Best + Latest]
	Checkpoint -->|Continue| LogMetrics[Log Metrics<br/>W&B / Console]

	SaveCheckpoint --> LogMetrics
	LogMetrics --> EpochComplete{Epoch<br/>Complete?}

	EpochComplete -->|No| TrainingLoop
	EpochComplete -->|Yes| MoreEpochs{More<br/>Epochs?}

	MoreEpochs -->|Yes| TrainingLoop
	MoreEpochs -->|No| SaveFinal[Save Final Checkpoint<br/>Final Model State]

	SaveFinal --> Evaluate[Evaluate Model<br/>BA Agreement]
	Evaluate --> Results[Training Results<br/>Metrics & Checkpoints]

	Results --> Resume{Resume<br/>Training?}
	Resume -->|Yes| LoadCheckpoint[Load Checkpoint<br/>latest_checkpoint.pt]
	LoadCheckpoint --> LoadModel
	Resume -->|No| End([End: Trained Model])

	style Preprocess fill:#e1f5ff
	style TrainingPhase fill:#fff4e1
	style ComputeLoss fill:#ffe1f5
	style SaveCheckpoint fill:#e1ffe1
	style Evaluate fill:#f5e1ff
	```

	### Pipeline Stages

	#### 1. Data Collection & Upload

	- Input: ARKit sequences (video + metadata.json)
	- Extract: Poses, LiDAR depth, camera intrinsics
	- Output: Structured ARKit data

	#### 2. Pre-Processing Phase (Offline)

	- DA3 Inference: Initial depth/pose predictions (GPU)
	- Quality Check: Evaluate ARKit tracking quality
	- BA Validation: Run only if ARKit quality < threshold (CPU, expensive)
	- Oracle Uncertainty: Compute confidence maps from multiple sources
	- Cache Results: Save oracle targets and uncertainty to disk
	- Time: ~10-20 min per sequence (one-time cost)
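The quality gate in this phase can be sketched as follows. This is a minimal illustration, not the code in `ylff/services/preprocessing.py`: the `SequenceInfo` class and both function names are hypothetical, and the quality score is assumed to lie in [0, 1] with the 0.8 cut-off shown in the flowchart.

```python
from dataclasses import dataclass


@dataclass
class SequenceInfo:
    """Hypothetical per-sequence summary; field names are illustrative."""
    name: str
    arkit_quality: float  # assumed to lie in [0, 1]


def select_pose_source(seq, min_quality=0.8):
    """Return "arkit" when tracking quality meets the threshold, else "ba"."""
    return "arkit" if seq.arkit_quality >= min_quality else "ba"


def plan_preprocessing(seqs, min_quality=0.8):
    """Split sequences into those that skip BA and those that need it."""
    plan = {"arkit": [], "ba": []}
    for seq in seqs:
        plan[select_pose_source(seq, min_quality)].append(seq.name)
    return plan


if __name__ == "__main__":
    demo = [SequenceInfo("kitchen", 0.93), SequenceInfo("hallway", 0.41)]
    print(plan_preprocessing(demo))
```

Only the sequences in the `"ba"` bucket pay the expensive CPU cost, which is why the threshold has a large effect on total preprocessing time.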

	#### 3. Training Phase (Online)

	- Load Cache: Fast disk I/O of pre-computed results
	- Model Loading: Load or resume from checkpoint (student + teacher)
	- Training Loop:
	  - Forward pass through student model
	  - Compute geometric losses (primary objective)
	  - Backward pass with gradient clipping
	  - Update weights (AdamW optimizer)
	  - Update teacher model (EMA)
	  - Update learning rate (cosine scheduler)
	- Checkpointing: Save periodic, best, and latest checkpoints
	- Logging: Metrics to W&B and console
	- Time: ~1-3 sec per sequence (100-1000x faster than BA)
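Two steps of the loop above, gradient clipping and the cosine learning-rate schedule, are easy to show in plain Python. This is a sketch of the underlying arithmetic only; a PyTorch training loop would normally rely on `torch.nn.utils.clip_grad_norm_` and `torch.optim.lr_scheduler.CosineAnnealingLR`. The `warmup_steps` and `min_lr` parameters are assumptions, not documented YLFF settings.

```python
import math


def clip_grad_norm(grads, max_norm=1.0):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm."""
    total = math.sqrt(sum(g * g for g in grads))
    if total <= max_norm or total == 0.0:
        return list(grads)
    scale = max_norm / total
    return [g * scale for g in grads]


def cosine_lr(step, total_steps, base_lr=2e-4, warmup_steps=0, min_lr=0.0):
    """Linear warmup (optional) followed by cosine annealing to min_lr."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(1.0, max(0.0, progress))
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(clip_grad_norm([3.0, 4.0]))      # norm 5.0 is rescaled to norm 1.0
    for step in (0, 250, 500, 750, 1000):
        print(step, cosine_lr(step, total_steps=1000))
```

Clipping rescales the whole gradient vector rather than each entry, so the update direction is preserved while its magnitude is bounded.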

	#### 4. Evaluation & Resumption

	- Evaluation: Test model agreement with BA
	- Resume: Load checkpoint to continue training
	- Final Model: Best checkpoint saved for deployment
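The "periodic, best, and latest" checkpoint policy can be sketched as a small decision function. This is an assumption about how the three kinds interact, not the actual implementation: `latest_checkpoint.pt` and `best_model.pt` are the file names used elsewhere in this README, while the periodic file name, the default interval, and the rule that "latest" is refreshed on every call are hypothetical.

```python
def checkpoints_to_write(step, loss, best_loss, interval=1000):
    """Return (file names to write after this step, updated best loss)."""
    names = ["latest_checkpoint.pt"]  # refreshed every call so training can resume
    if step > 0 and step % interval == 0:
        names.append(f"checkpoint_step_{step}.pt")  # periodic snapshot (name is illustrative)
    if loss < best_loss:
        names.append("best_model.pt")  # new lowest loss seen so far
        best_loss = loss
    return names, best_loss


if __name__ == "__main__":
    best = float("inf")
    for step, loss in [(999, 0.9), (1000, 0.7), (1001, 0.8)]:
        names, best = checkpoints_to_write(step, loss, best)
        print(step, names)
```

Keeping "latest" separate from "best" means a resumed run continues from the most recent weights, while deployment uses the weights with the lowest recorded loss.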

	## Key Features

	### 🎯 Unified Training Approach

	- Single Training Service: `ylff/services/ylff_training.py` consolidates all training methods
	- DINOv2 Backbone: Teacher-student paradigm with EMA teacher for stable training
	- DA3 Techniques: Depth-ray representation, multi-resolution training
	- Geometric Losses: Multi-view consistency, absolute scale, pose accuracy as primary objectives

	### 📊 Two-Phase Pipeline

	1. Pre-Processing Phase (offline, expensive)
	   - Compute BA validation and oracle uncertainty
	   - Cache results for fast training iteration
	   - Can be parallelized across sequences
	2. Training Phase (online, fast)
	   - Load pre-computed oracle results
	   - Train with geometric losses as primary objective
	   - 100-1000x faster than computing BA during training

	### 🔧 Core Components

	- BA Validation: Validate model predictions using COLMAP Bundle Adjustment
	- ARKit Integration: Process ARKit data with ground truth poses and LiDAR depth
	- Oracle Uncertainty: Continuous confidence weighting (not binary rejection)
	- Geometric Losses: Multi-view consistency, absolute scale, pose reprojection error
	- Unified Training: Single training service with geometric consistency first

	## Installation

	### Basic Installation

	```bash
	# Clone repository
	git clone <repository-url>
	cd ylff

	# Create virtual environment
	python -m venv .venv
	source .venv/bin/activate # On Windows: .venv\Scripts\activate

	# Install package
	pip install -e .

	# Install optional dependencies
	pip install -e ".[gui]" # For GUI visualization
	```

	### BA Pipeline Setup

	For BA validation, you need additional dependencies:

	```bash
	# Install BA pipeline dependencies
	bash scripts/bin/setup_ba_pipeline.sh

	# Or manually:
	pip install pycolmap
	# Install hloc from source (see docs/SETUP.md)
	# Install LightGlue from source (see docs/SETUP.md)
	```

	See `docs/SETUP.md` for detailed installation instructions.

	## Quick Start

	### 1. Pre-Process ARKit Sequences

	```bash
	# Pre-process ARKit sequences (offline, can run overnight)
	ylff preprocess arkit data/arkit_sequences \
	--output-cache cache/preprocessed \
	--model-name depth-anything/DA3-LARGE \
	--num-workers 8 \
	--prefer-arkit-poses
	```

	This runs BA for sequences whose ARKit tracking quality is too low, computes oracle uncertainty for every sequence, and caches the results.

	### 2. Train with Unified Service

	```bash
	# Train using pre-computed results (fast iteration)
	ylff train unified cache/preprocessed \
	--model-name depth-anything/DA3-LARGE \
	--epochs 200 \
	--lr 2e-4 \
	--batch-size 32 \
	--checkpoint-dir checkpoints \
	--use-wandb
	```

	Or use the Python API:

	```python
	from pathlib import Path

	from ylff.services.ylff_training import train_ylff
	from ylff.services.preprocessed_dataset import PreprocessedARKitDataset

	# Load preprocessed dataset
	dataset = PreprocessedARKitDataset(
	    cache_dir="cache/preprocessed",
	    arkit_sequences_dir="data/arkit_sequences",
	    load_images=True,
	)

	# `da3_model` must be a DA3 model instance loaded before this call
	# Train with unified service
	metrics = train_ylff(
	    model=da3_model,
	    dataset=dataset,
	    epochs=200,
	    lr=2e-4,
	    batch_size=32,
	    loss_weights={
	        'geometric_consistency': 3.0,  # PRIMARY GOAL
	        'absolute_scale': 2.5,  # CRITICAL
	        'pose_geometric': 2.0,  # ESSENTIAL
	    },
	    use_wandb=True,
	    checkpoint_dir=Path("checkpoints"),
	)
	```

	### 3. Validate Sequences

	```bash
	# Validate a sequence of images
	ylff validate sequence path/to/images \
	--model-name depth-anything/DA3-LARGE \
	--accept-threshold 2.0 \
	--reject-threshold 30.0 \
	--output results.json
	```
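One plausible reading of the two thresholds is a three-way split on the disagreement between the model and BA. The sketch below assumes exactly that and nothing more: the README does not state the unit of the error or what the CLI does with sequences that fall between the thresholds, so the function name and the `"review"` label are hypothetical.

```python
def classify_sequence(error, accept_threshold=2.0, reject_threshold=30.0):
    """Bucket a sequence by its model-vs-BA error (unit not specified here)."""
    if accept_threshold >= reject_threshold:
        raise ValueError("accept_threshold must be below reject_threshold")
    if error < accept_threshold:
        return "accept"   # model already agrees with BA
    if error > reject_threshold:
        return "reject"   # disagreement too large to trust
    return "review"       # in between; label is illustrative


if __name__ == "__main__":
    for err in (0.5, 12.0, 45.0):
        print(err, classify_sequence(err))
```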

	### 4. Evaluate Model

	```bash
	# Evaluate model agreement with BA
	ylff eval ba-agreement path/to/test/sequences \
	--model-name depth-anything/DA3-LARGE \
	--checkpoint checkpoints/best_model.pt \
	--threshold 2.0
	```

	## Training Approach

	### Unified Training Service

	YLFF uses a single, unified training service (`ylff/services/ylff_training.py`) that:

	1. Uses DINOv2's teacher-student paradigm as the backbone
	   - EMA teacher provides stable targets
	   - Layer-wise learning rate decay
	   - Cosine scheduler with warmup
	2. Incorporates DA3 techniques
	   - Depth-ray representation (if available)
	   - Multi-resolution training support
	   - Scale normalization
	3. Treats geometric consistency as first-order goal
	   - Multi-view geometric consistency: weight 3.0 (PRIMARY)
	   - Absolute scale loss: weight 2.5 (CRITICAL)
	   - Pose geometric loss: weight 2.0 (ESSENTIAL)
	   - Gradient loss: weight 1.0 (DA3 technique)
	   - Teacher-student consistency: weight 0.5 (STABILITY)
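The weights above combine into a single scalar objective. A minimal sketch, assuming the total is a plain weighted sum of the five terms (the helper names are illustrative, and the real service may normalize or schedule the weights differently):

```python
DEFAULT_LOSS_WEIGHTS = {
    "geometric_consistency": 3.0,
    "absolute_scale": 2.5,
    "pose_geometric": 2.0,
    "gradient_loss": 1.0,
    "teacher_consistency": 0.5,
}


def combine_losses(components, weights=DEFAULT_LOSS_WEIGHTS):
    """Weighted sum of loss terms; every component must have a weight."""
    unknown = set(components) - set(weights)
    if unknown:
        raise KeyError(f"no weight for: {sorted(unknown)}")
    return sum(weights[name] * value for name, value in components.items())


def weight_shares(weights=DEFAULT_LOSS_WEIGHTS):
    """Fraction of the total weight assigned to each term."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


if __name__ == "__main__":
    print(combine_losses({name: 1.0 for name in DEFAULT_LOSS_WEIGHTS}))
    print(weight_shares())
```

With these defaults the three geometric terms carry 7.5 of the 9.0 total weight, about 83%, which is what "geometric consistency first" means in practice.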

	### Experiment Tracking & Ablations

	YLFF integrates Weights & Biases (W&B) for comprehensive experiment tracking and ablation studies:

	Logged Configuration (per run):

	- Training hyperparameters: `epochs`, `lr`, `batch_size`, `ema_decay`
	- Loss weights: All component weights (geometric_consistency, absolute_scale, pose_geometric, gradient_loss, teacher_consistency)
	- Model configuration: Task type, device, precision (FP16/BF16)

	Logged Metrics (per step):

	- Loss Components: All individual loss terms tracked separately
	  - `total_loss`: Overall training loss
	  - `geometric_consistency`: Multi-view consistency loss
	  - `absolute_scale`: Absolute depth scale loss
	  - `pose_geometric`: Pose reprojection error loss
	  - `gradient_loss`: Depth gradient loss
	  - `teacher_consistency`: Teacher-student consistency loss
	- Training State: `step`, `epoch`, `lr` (learning rate over time)

	Ablation Study Support:

	- Compare runs: Filter by hyperparameters (loss weights, learning rate, etc.)
	- Track component contributions: See how each loss component evolves
	- Hyperparameter sweeps: Use W&B sweeps to systematically explore configurations
	- Reproducibility: All hyperparameters logged in config for exact reproduction

	Example Ablation Workflow:

	```bash
	# Run 1: Baseline (default geometric-first weights)
	ylff train unified cache/preprocessed \
	--epochs 200 \
	--use-wandb \
	--wandb-project ylff-ablations \
	--wandb-name baseline-geometric-first

	# Run 2: Ablation: Lower geometric consistency weight
	ylff train unified cache/preprocessed \
	--epochs 200 \
	--use-wandb \
	--wandb-project ylff-ablations \
	--wandb-name ablation-lower-geo-weight \
	--loss-weight-geometric-consistency 1.0 # vs default 3.0

	# Run 3: Ablation: No teacher-student consistency
	ylff train unified cache/preprocessed \
	--epochs 200 \
	--use-wandb \
	--wandb-project ylff-ablations \
	--wandb-name ablation-no-teacher \
	--loss-weight-teacher-consistency 0.0 # Disable teacher loss

	# Compare in W&B dashboard:
	# - Filter by project: "ylff-ablations"
	# - Compare loss curves across runs
	# - Analyze which loss components matter most
	```

	W&B Dashboard Features:

	- Parallel coordinates plot: Visualize hyperparameter relationships
	- Loss curves: Compare training dynamics across ablations
	- Component analysis: See contribution of each loss term
	- Best run identification: Automatically identify best configurations

	### Suggested Ablation Studies

	Based on YLFF's architecture, here are key ablation experiments to validate our design choices:

	#### 1. Loss Weight Ablations (Geometric Consistency First)

	Question: How critical is treating geometric consistency as a first-order goal?

	```python
	from ylff.services.ylff_training import train_ylff
	from ylff.services.preprocessed_dataset import PreprocessedARKitDataset

	# Baseline: Geometric-first (default)
	train_ylff(
	    model=model,
	    dataset=dataset,
	    epochs=200,
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	    loss_weights={
	        'geometric_consistency': 3.0,  # PRIMARY GOAL
	        'absolute_scale': 2.5,
	        'pose_geometric': 2.0,
	        'gradient_loss': 1.0,
	        'teacher_consistency': 0.5,
	    },
	)

	# Ablation 1: Equal weights (traditional approach)
	train_ylff(
	    model=model,
	    dataset=dataset,
	    epochs=200,
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	    loss_weights={
	        'geometric_consistency': 1.0,  # Equal weight
	        'absolute_scale': 1.0,
	        'pose_geometric': 1.0,
	        'gradient_loss': 1.0,
	        'teacher_consistency': 0.5,
	    },
	)

	# Ablation 2: Perceptual-first (reverse priority)
	train_ylff(
	    model=model,
	    dataset=dataset,
	    epochs=200,
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	    loss_weights={
	        'geometric_consistency': 0.5,  # Lower priority
	        'absolute_scale': 0.5,
	        'pose_geometric': 0.5,
	        'gradient_loss': 3.0,  # Emphasize smoothness
	        'teacher_consistency': 0.5,
	    },
	)

	# Ablation 3: Remove geometric consistency entirely
	train_ylff(
	    model=model,
	    dataset=dataset,
	    epochs=200,
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	    loss_weights={
	        'geometric_consistency': 0.0,  # Disabled
	        'absolute_scale': 2.5,
	        'pose_geometric': 2.0,
	        'gradient_loss': 1.0,
	        'teacher_consistency': 0.5,
	    },
	)
	```

	Metrics to Compare:

	- Final geometric consistency loss
	- BA agreement (reprojection error)
	- Absolute scale accuracy (vs LiDAR)
	- Multi-view reconstruction quality

	#### 2. Teacher-Student Ablation

	Question: Does EMA teacher provide training stability and better convergence?

	```python
	# Baseline: With EMA teacher (default ema_decay=0.999)
	train_ylff(
	    model=model,
	    dataset=dataset,
	    epochs=200,
	    ema_decay=0.999,
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	)

	# Ablation 1: No teacher-student (ema_decay=0.0)
	train_ylff(
	    model=model,
	    dataset=dataset,
	    epochs=200,
	    ema_decay=0.0,  # With a standard EMA, decay 0.0 makes the teacher a copy of the student
	    loss_weights={
	        'geometric_consistency': 3.0,
	        'absolute_scale': 2.5,
	        'pose_geometric': 2.0,
	        'gradient_loss': 1.0,
	        'teacher_consistency': 0.0,  # Disable teacher loss
	    },
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	)

	# Ablation 2: Faster teacher updates (ema_decay=0.99)
	train_ylff(
	    model=model,
	    dataset=dataset,
	    epochs=200,
	    ema_decay=0.99,  # Faster updates
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	)

	# Ablation 3: Slower teacher updates (ema_decay=0.9999)
	train_ylff(
	    model=model,
	    dataset=dataset,
	    epochs=200,
	    ema_decay=0.9999,  # Slower updates
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	)
	```

	Metrics to Compare:

	- Training stability (loss variance)
	- Convergence speed
	- Final model quality
	- Teacher-student consistency loss

	#### 3. Oracle Source Ablation (BA vs ARKit)

	Question: How much does BA refinement improve over ARKit poses?

	```bash
	# Baseline: Use BA when ARKit quality < 0.8 (default)
	ylff preprocess arkit data/arkit_sequences \
	--output-cache cache/preprocessed-ba \
	--prefer-arkit-poses --min-arkit-quality 0.8

	ylff train unified cache/preprocessed-ba \
	--use-wandb --wandb-project ylff-ablations

	# Ablation 1: Always use ARKit (no BA, faster preprocessing)
	ylff preprocess arkit data/arkit_sequences \
	--output-cache cache/preprocessed-arkit-only \
	--prefer-arkit-poses --min-arkit-quality 0.0

	ylff train unified cache/preprocessed-arkit-only \
	--use-wandb --wandb-project ylff-ablations

	# Ablation 2: Always use BA (expensive but highest quality)
	ylff preprocess arkit data/arkit_sequences \
	--output-cache cache/preprocessed-ba-always \
	--prefer-arkit-poses --min-arkit-quality 1.0 # BA unless a sequence scores a perfect 1.0 (the gate is >= in the flowchart)

	ylff train unified cache/preprocessed-ba-always \
	--use-wandb --wandb-project ylff-ablations
	```

	Metrics to Compare:

	- Pose accuracy (reprojection error)
	- Training data quality (confidence scores)
	- Final model performance
	- Preprocessing time cost

	#### 4. Uncertainty Weighting Ablation

	Question: Does confidence-weighted loss improve training vs uniform weighting?

	```bash
	# Baseline: With uncertainty weighting (default)
	# Uses depth_confidence and pose_confidence from preprocessing

	# Ablation: Uniform weighting (ignore uncertainty)
	# Modify preprocessing to set all confidence = 1.0
	# Or modify loss computation to ignore confidence maps
	```

	Metrics to Compare:

	- Loss on high-confidence vs low-confidence regions
	- Model performance on uncertain scenes
	- Training stability
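The difference between the two settings comes down to how per-pixel errors are averaged. A minimal sketch on flat lists, assuming confidences in [0, 1] and normalization by the sum of weights (the function name and the normalization choice are assumptions, not taken from `ylff/utils/oracle_losses.py`):

```python
def weighted_l1(pred, target, confidence=None):
    """Mean absolute depth error, optionally weighted per pixel by confidence."""
    if confidence is None:
        confidence = [1.0] * len(pred)  # uniform weighting (the ablation setting)
    if not (len(pred) == len(target) == len(confidence)):
        raise ValueError("pred, target and confidence must have equal length")
    weight_sum = sum(confidence)
    if weight_sum == 0.0:
        return 0.0  # no trusted pixels, so no supervision signal
    total = sum(c * abs(p - t) for p, t, c in zip(pred, target, confidence))
    return total / weight_sum


if __name__ == "__main__":
    pred = [1.0, 2.0, 5.0]
    target = [1.0, 2.5, 3.0]
    conf = [1.0, 1.0, 0.0]  # the third pixel's oracle depth is not trusted
    print("confidence-weighted:", weighted_l1(pred, target, conf))
    print("uniform:", weighted_l1(pred, target))
```

In this toy example the untrusted pixel dominates the uniform loss but contributes nothing to the weighted one, which is the effect the ablation is meant to measure.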

	#### 5. Multi-View Consistency Ablation

	Question: How many views are needed for effective geometric consistency?

	```python
	# Baseline: Variable views (2-18, default from dataset)
	train_ylff(
	    model=model,
	    dataset=dataset,  # Uses all available views
	    epochs=200,
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	)

	# Ablation 1: Single view only (disable geometric consistency)
	train_ylff(
	    model=model,
	    dataset=single_view_dataset,  # Modified dataset with 1 view
	    epochs=200,
	    loss_weights={
	        'geometric_consistency': 0.0,  # Disabled (needs 2+ views)
	        'absolute_scale': 2.5,
	        'pose_geometric': 2.0,
	        'gradient_loss': 1.0,
	        'teacher_consistency': 0.5,
	    },
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	)

	# Ablations 2-5: Fixed N views
	# Modify dataset to sample exactly N views per sequence
	# Compare: 2 views, 5 views, 10 views, 18 views
	```

	Metrics to Compare:

	- Geometric consistency loss
	- Multi-view reconstruction accuracy
	- Training efficiency (more views = slower)

	#### 6. DA3 Techniques Ablation

	Question: Which DA3 techniques contribute most?

	```python
	# Baseline: All DA3 techniques enabled
	train_ylff(
	    model=model,
	    dataset=dataset,
	    epochs=200,
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	)

	# Ablation 1: No gradient loss (DA3 edge preservation)
	train_ylff(
	    model=model,
	    dataset=dataset,
	    epochs=200,
	    loss_weights={
	        'geometric_consistency': 3.0,
	        'absolute_scale': 2.5,
	        'pose_geometric': 2.0,
	        'gradient_loss': 0.0,  # Disabled
	        'teacher_consistency': 0.5,
	    },
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	)

	# Ablation 2: No depth-ray representation
	# Use model that outputs separate depth + poses instead of depth-ray
	# (Requires different model architecture)

	# Ablation 3: Fixed resolution (no multi-resolution training)
	# Modify dataset to use fixed resolution instead of variable
	```

	Metrics to Compare:

	- Depth edge quality (gradient loss ablation)
	- Training efficiency (multi-resolution ablation)
	- Model generalization

	#### 7. Preprocessing Phase Ablation

	Question: How much does the two-phase pipeline improve training efficiency?

	```bash
	# Baseline: With preprocessing (fast training)
	ylff preprocess arkit data/arkit_sequences --output-cache cache/preprocessed
	ylff train unified cache/preprocessed \
	--use-wandb --wandb-project ylff-ablations \
	--wandb-name baseline-with-preprocessing

	# Ablation: Live BA during training (slow but no preprocessing)
	# This would require modifying training to compute BA on-the-fly
	# Compare: Training time per epoch, total training time
	```

	Metrics to Compare:

	- Training time per epoch
	- Total training time
	- Model quality (should be similar, since preprocessing only caches the same supervision ahead of time)

	#### 8. Loss Component Contribution Analysis

	Question: Which loss component contributes most to final model quality?

	Run systematic sweeps using W&B sweeps or Python script:

	```yaml
	# sweep_config.yaml
	program: train_ablation_sweep.py
	method: grid
	parameters:
	  loss_weight_geometric_consistency:
	    values: [0.0, 1.0, 2.0, 3.0, 4.0]
	  loss_weight_absolute_scale:
	    values: [0.0, 1.0, 2.0, 2.5, 3.0]
	  loss_weight_pose_geometric:
	    values: [0.0, 1.0, 2.0, 3.0]
	  loss_weight_gradient_loss:
	    values: [0.0, 0.5, 1.0, 1.5]
	  loss_weight_teacher_consistency:
	    values: [0.0, 0.25, 0.5, 0.75, 1.0]
	```

	```python
	# train_ablation_sweep.py
	import wandb
	from ylff.services.ylff_training import train_ylff

	# `model` and `dataset` must be created before this point (see Quick Start)
	wandb.init()
	config = wandb.config

	train_ylff(
	    model=model,
	    dataset=dataset,
	    epochs=200,
	    loss_weights={
	        'geometric_consistency': config.loss_weight_geometric_consistency,
	        'absolute_scale': config.loss_weight_absolute_scale,
	        'pose_geometric': config.loss_weight_pose_geometric,
	        'gradient_loss': config.loss_weight_gradient_loss,
	        'teacher_consistency': config.loss_weight_teacher_consistency,
	    },
	    use_wandb=True,
	    wandb_project="ylff-ablations",
	)

	# Run: wandb sweep sweep_config.yaml
	# Then start workers with the `wandb agent <sweep-id>` command that it prints
	```

	Analysis:

	- Use W&B parallel coordinates plot to find optimal weight combinations
	- Identify which components are essential vs optional
	- Find Pareto frontier (best quality for given training time)
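Before launching a grid sweep it is worth counting how many runs it implies. The sketch below enumerates the value lists from the sweep configuration above; the helper names are illustrative.

```python
from itertools import product
from math import prod

GRID = {
    "loss_weight_geometric_consistency": [0.0, 1.0, 2.0, 3.0, 4.0],
    "loss_weight_absolute_scale": [0.0, 1.0, 2.0, 2.5, 3.0],
    "loss_weight_pose_geometric": [0.0, 1.0, 2.0, 3.0],
    "loss_weight_gradient_loss": [0.0, 0.5, 1.0, 1.5],
    "loss_weight_teacher_consistency": [0.0, 0.25, 0.5, 0.75, 1.0],
}


def grid_size(grid):
    """Number of runs a full grid sweep would launch."""
    return prod(len(values) for values in grid.values())


def iter_grid(grid):
    """Yield one config dict per combination, in the order a grid would visit them."""
    names = list(grid)
    for combo in product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))


if __name__ == "__main__":
    print("runs in full grid:", grid_size(GRID))
    print("first config:", next(iter_grid(GRID)))
```

The full grid is 5 x 5 x 4 x 4 x 5 = 2,000 runs of 200 epochs each. If that is too expensive, W&B sweeps also accept `method: random` or `method: bayes`, or the grid can be thinned to fewer values per weight.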

	#### Recommended Ablation Order

	1. Start with Loss Weight Ablations (#1) - Most fundamental to our approach
	2. Teacher-Student Ablation (#2) - Validates DINOv2 adaptation
	3. Oracle Source Ablation (#3) - Validates preprocessing strategy
	4. Component Contribution (#8) - Systematic analysis
	5. DA3 Techniques (#6) - Validates DA3 integration
	6. Multi-View Consistency (#5) - Optimizes training efficiency
	7. Uncertainty Weighting (#4) - Fine-tuning
	8. Preprocessing Phase (#7) - Efficiency validation

	Each ablation should be run with:

	- Same random seed (for reproducibility)
	- Same dataset split
	- Same number of epochs
	- W&B tracking enabled for easy comparison

	## Training Datasets

	Depth Anything 3 (DA3) was trained exclusively on public academic datasets. The following table documents all datasets used in DA3 training, their sources, and availability status for YLFF:

	| Dataset | # Scenes | Data Type | Source / URL | YLFF Status | Notes |
	| --- | --- | --- | --- | --- | --- |
	| Synthetic Datasets |
	| AriaDigitalTwin | 237 | Synthetic | [Aria Digital Twin](https://github.com/facebookresearch/AriaDigitalTwin) | ❌ Not Available | Meta's AR dataset |
	| AriaSyntheticENV | 99,950 | Synthetic | [Aria Synthetic](https://github.com/facebookresearch/AriaDigitalTwin) | ❌ Not Available | Large-scale synthetic AR |
	| HyperSim | 344 | Synthetic | [HyperSim](https://github.com/apple/ml-hypersim) | ❌ Not Available | Apple's photorealistic dataset |
	| MegaSynth | 6,049 | Synthetic | Unknown | ❓ To Verify | Synthetic multi-view |
	| MvsSynth | 121 | Synthetic | Unknown | ❓ To Verify | Multi-view stereo synthetic |
	| Objaverse | 505,557 | Synthetic | [Objaverse](https://objaverse.allenai.org/) | ❓ To Verify | Large-scale 3D objects |
	| Omniobject | 5,885 | Synthetic | [OmniObject3D](https://omniobject3d.github.io/) | ❓ To Verify | Object-centric dataset |
	| OmniWorld | 1,039 | Synthetic | [OmniWorld](https://arxiv.org/abs/2509.12201) | ❓ To Verify | Multi-domain dataset |
	| PointOdyssey | 44 | Synthetic | [PointOdyssey](https://pointodyssey.com/) | ❓ To Verify | Long-term point tracking |
	| ReplicaVMAP | 17 | Synthetic | [Replica](https://github.com/facebookresearch/Replica-Dataset) | ❓ To Verify | Indoor scene dataset |
	| ScenenetRGBD | 16,866 | Synthetic | [SceneNet RGB-D](https://robotvault.bitbucket.io/scenenet-rgbd.html) | ❓ To Verify | Indoor RGB-D scenes |
	| TartanAir | 355 | Synthetic | [TartanAir](https://theairlab.org/tartanair-dataset/) | ❓ To Verify | Large-scale simulation |
	| Trellis | 557,408 | Synthetic | Unknown | ❓ To Verify | Large-scale synthetic |
	| vKitti2 | 50 | Synthetic | [vKITTI2](https://europe.naverlabs.com/research/computer-vision/proxy-virtual-worlds-vkitti-2/) | ❓ To Verify | Virtual KITTI |
	| Real-World Datasets (LiDAR) |
	| ARKitScenes | 4,388 | LiDAR | [ARKitScenes](https://github.com/apple/ARKitScenes) | ✅ Available | Primary dataset for YLFF |
	| ScanNet++ | 230 | LiDAR | [ScanNet++](https://github.com/ScanNet/ScanNetPlusPlus) | ❓ To Verify | High-fidelity indoor |
	| WildRGBD | 23,050 | LiDAR | [WildRGBD](https://wildrgbd.github.io/) | ❓ To Verify | Large-scale RGB-D |
	| Real-World Datasets (COLMAP/SfM) |
	| BlendedMVS | 503 | 3D Recon | [BlendedMVS](https://github.com/YoYo000/BlendedMVS) | ❓ To Verify | Multi-view stereo |
	| Co3dv2 | 30,616 | COLMAP | [Common Objects in 3D](https://github.com/facebookresearch/co3d) | ❓ To Verify | Object-centric |
	| DL3DV | 6,379 | COLMAP | [DL3DV-10K](https://github.com/OpenGVLab/DL3DV) | ❓ To Verify | Large-scale 3D vision |
	| MapFree | 921 | COLMAP | [Map-free Visual Relocalization](https://github.com/nianticlabs/map-free-reloc) | ❓ To Verify | Visual relocalization |
	| MegaDepth | 268 | COLMAP | [MegaDepth](https://www.cs.cornell.edu/projects/megadepth/) | ❓ To Verify | Internet photos |

	Legend:

	- ✅ Available: Dataset is accessible and can be used for YLFF training
	- ❌ Not Available: Dataset is not accessible (proprietary, requires special access, etc.)
	- ❓ To Verify: Dataset availability needs to be confirmed

	### Dataset Statistics

	Total Training Data:

	- Synthetic: 1,193,922 scenes (majority from Objaverse and Trellis)
	- Real-World LiDAR: 27,668 scenes (ARKitScenes, ScanNet++, WildRGBD)
	- Real-World COLMAP: 38,687 scenes (BlendedMVS, Co3dv2, DL3DV, MapFree, MegaDepth)
	- Total: 1,260,277 scenes

	Data Type Distribution:

	- Synthetic: 94.7% (provides high-quality dense depth)
	- LiDAR: 2.2% (provides metric accuracy)
	- COLMAP/SfM: 3.1% (provides multi-view geometry)
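The totals and shares can be recomputed directly from the scene counts in the dataset table, which is a useful check whenever a row changes. The grouping below follows the table's three sections; the function name is illustrative.

```python
SCENES = {
    # Scene counts copied from the dataset table, in row order
    "synthetic": [237, 99_950, 344, 6_049, 121, 505_557, 5_885,
                  1_039, 44, 17, 16_866, 355, 557_408, 50],
    "lidar": [4_388, 230, 23_050],
    "colmap": [503, 30_616, 6_379, 921, 268],
}


def summarize(scenes):
    """Return per-group totals, the grand total, and percentage shares."""
    totals = {group: sum(counts) for group, counts in scenes.items()}
    grand = sum(totals.values())
    shares = {group: 100.0 * total / grand for group, total in totals.items()}
    return totals, grand, shares


if __name__ == "__main__":
    totals, grand, shares = summarize(SCENES)
    for group in totals:
        print(f"{group:<10} {totals[group]:>10,} scenes  {shares[group]:5.1f}%")
    print(f"{'total':<10} {grand:>10,} scenes")
```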

	### YLFF Dataset Strategy

	YLFF currently focuses on ARKitScenes as the primary training dataset because:

	1. ✅ Available: Publicly accessible dataset
	2. ✅ High Quality: LiDAR depth provides metric accuracy
	3. ✅ Real-World: Captures real indoor scenes with natural variations
	4. ✅ Rich Metadata: Includes poses, intrinsics, and LiDAR depth
	5. ✅ Large Scale: 4,388 scenes provide substantial training data

	Future Dataset Integration:

	- Priority: ScanNet++, WildRGBD (LiDAR datasets for metric accuracy)
	- Secondary: DL3DV, Co3dv2 (COLMAP datasets for multi-view geometry)
	- Synthetic: Consider for teacher model training (if accessible)

	### Dataset Access Notes

	- ARKitScenes: Download from [official repository](https://github.com/apple/ARKitScenes)
	- ScanNet++: Requires registration and approval
	- COLMAP datasets: Most are publicly available but may require preprocessing
	- Synthetic datasets: Many require special access or are proprietary

	For detailed dataset preparation and preprocessing instructions, see `docs/DATASET_PREPARATION.md` (to be created).

	### Loss Components

	The training uses geometric losses as the primary objective:

	1. Multi-View Geometric Consistency (weight: 3.0)
	   - Enforces that the same 3D point projects correctly across views
	   - Uses back-projection + projection across multiple views
	   - This is treated as a first-order objective, not regularization
	2. Absolute Scale Loss (weight: 2.5)
	   - Direct supervision from LiDAR/BA depth
	   - Enforces correct absolute depth values in meters
	   - Critical for metric accuracy
	3. Pose Geometric Loss (weight: 2.0)
	   - Reprojection error using predicted poses
	   - Enforces geometric consistency between poses and depth
	   - Multi-view pose consistency is paramount
	4. Gradient Loss (weight: 1.0)
	   - Preserves sharp depth boundaries
	   - Ensures smoothness in planar regions
	   - DA3 technique for better depth quality
	5. Teacher-Student Consistency (weight: 0.5)
	   - L1 loss between student and teacher predictions
	   - Encourages stable training
	   - Prevents student from diverging
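The "back-projection + projection" step behind the multi-view consistency term can be illustrated for a single pixel with a pinhole camera model. This is a sketch under stated assumptions, not the code in `ylff/utils/geometric_losses.py`: both views share the intrinsics `K = (fx, fy, cx, cy)`, `R_ab` and `t_ab` map points from camera A to camera B, and `depth_b_at` is a hypothetical lookup into view B's predicted depth.

```python
def backproject(u, v, depth, K):
    """Pixel (u, v) with depth in meters -> 3D point in the camera frame."""
    fx, fy, cx, cy = K
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def transform(point, R, t):
    """Apply the rigid transform x' = R x + t (R given as three rows)."""
    return tuple(sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3))


def project(point, K):
    """3D point in the camera frame -> ((u, v), depth)."""
    fx, fy, cx, cy = K
    x, y, z = point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy), z


def consistency_error(u, v, depth_a, depth_b_at, K, R_ab, t_ab):
    """Warp a pixel from view A into view B and compare the two depths there."""
    point_b = transform(backproject(u, v, depth_a, K), R_ab, t_ab)
    (u_b, v_b), z_b = project(point_b, K)
    return abs(z_b - depth_b_at(u_b, v_b))


if __name__ == "__main__":
    K = (500.0, 500.0, 320.0, 240.0)
    identity = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    shift = (0.1, 0.0, 0.0)  # camera B frame is offset 10 cm along x
    print(consistency_error(320.0, 240.0, 2.0, lambda u, v: 2.0, K, identity, shift))
```

A full loss would average this error over many pixels and view pairs, skip points that fall outside view B, and sample `depth_b_at` with interpolation.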

	## Project Structure

	```
	ylff/
	├── ylff/ # Main package
	│ ├── services/ # Business logic
	│ │ ├── ylff_training.py # ⭐ Unified training service
	│ │ ├── preprocessing.py # Offline preprocessing (BA, uncertainty)
	│ │ ├── preprocessed_dataset.py # Dataset for pre-computed results
	│ │ ├── ba_validator.py # BA validation pipeline
	│ │ ├── arkit_processor.py # ARKit data processing
	│ │ ├── evaluate.py # Evaluation metrics
	│ │ └── ... # Other services
	│ │
	│ ├── utils/ # Utilities
	│ │ ├── geometric_losses.py # Geometric loss functions
	│ │ ├── oracle_uncertainty.py # Oracle uncertainty propagation
	│ │ ├── oracle_losses.py # Oracle-weighted losses
	│ │ └── ... # Other utilities
	│ │
	│ ├── routers/ # FastAPI route handlers
	│ ├── models/ # Pydantic API models
	│ └── cli.py # Command-line interface
	│
	├── configs/ # Configuration files
	│ ├── dinov2_train_config.yaml # Training configuration
	│ └── ba_config.yaml # BA pipeline configuration
	│
	├── docs/ # Documentation
	│ ├── UNIFIED_TRAINING.md # Unified training guide
	│ ├── TRAINING_PIPELINE_ARCHITECTURE.md
	│ └── ... # Other documentation
	│
	└── research_docs/ # Research documentation
	└── MODEL_ARCH.md # Model architecture details
	```

	## CLI Commands

	### Preprocessing

	- `ylff preprocess arkit <dir>` - Pre-process ARKit sequences (offline)

	### Training

	- `ylff train unified <cache_dir>` - Train using unified training service

	### Validation

	- `ylff validate sequence <dir>` - Validate a single sequence
	- `ylff validate arkit <dir> [--gui]` - Validate ARKit data (with optional GUI)

	### Evaluation

	- `ylff eval ba-agreement <dir>` - Evaluate model agreement with BA

	### Visualization

	- `ylff visualize <results_dir>` - Generate static visualizations

	## Complete Workflow

	### Step 1: Pre-Process All Sequences

	```bash
	# Pre-process all ARKit sequences (one-time, can run overnight)
	ylff preprocess arkit data/arkit_sequences \
	--output-cache cache/preprocessed \
	--model-name depth-anything/DA3-LARGE \
	--num-workers 8 \
	--prefer-arkit-poses \
	--use-lidar
	```

	This:

	- Extracts ARKit data (poses, LiDAR depth), which needs no extra computation
	- Runs DA3 inference (GPU, batchable)
	- Runs BA only for sequences with poor ARKit tracking
	- Computes oracle uncertainty
	- Saves everything to cache

	### Step 2: Train with Unified Service

	```bash
	# Train using pre-computed results (fast iteration)
	ylff train unified cache/preprocessed \
	--model-name depth-anything/DA3-LARGE \
	--epochs 200 \
	--lr 2e-4 \
	--batch-size 32 \
	--checkpoint-dir checkpoints \
	--use-wandb \
	--wandb-project ylff-training
	```

	This:

	- Loads pre-computed oracle results (fast, disk I/O)
	- Runs the forward pass of the DA3 model being trained (GPU)
	- Computes geometric losses (primary objective)
	- Updates model weights with teacher-student learning

	### Step 3: Evaluate

	```bash
	# Evaluate fine-tuned model
	ylff eval ba-agreement data/test \
	--checkpoint checkpoints/best_model.pt
	```

	## Configuration

	Configuration files are in `configs/`:

	- `dinov2_train_config.yaml` - Unified training configuration
	  - Optimizer settings (DINOv2 style)
	  - Loss weights (geometric consistency first)
	  - Teacher-student settings
	  - Multi-resolution and multi-view training
	- `ba_config.yaml` - BA pipeline settings

	## Documentation

	- Unified Training: `docs/UNIFIED_TRAINING.md` - Complete guide to unified training
	- Training Pipeline: `docs/TRAINING_PIPELINE_ARCHITECTURE.md` - Two-phase pipeline architecture
	- Model Architecture: `research_docs/MODEL_ARCH.md` - Detailed architecture and training approach
	- API Documentation: `docs/API.md` - API reference
	- ARKit Integration: `docs/ARKIT_INTEGRATION.md` - ARKit data processing

	## Key Design Decisions

	### Why Geometric Consistency First?

	Traditional depth estimation models prioritize perceptual quality (how realistic the depth looks) over geometric accuracy (how accurate the absolute scale and multi-view consistency are). YLFF reverses this priority:

	- Geometric consistency ensures that the same 3D point projects correctly across views
	- Absolute scale ensures metric accuracy (depth in meters, not just relative)
	- Pose consistency ensures that predicted poses align with depth predictions

	This approach is essential for applications requiring accurate 3D reconstruction, SLAM, and metric depth estimation.

	### Why Two-Phase Pipeline?

	BA computation is expensive (5-15 minutes per sequence), which makes it impractical to run inside the training loop. The two-phase pipeline:

	1. Pre-processing (offline): Compute BA once, cache results
	2. Training (online): Load cached results, train fast

	This enables 100-1000x faster training iteration while still using BA as supervision.
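The speedup figure follows from the timings quoted in this README: 5-15 minutes of BA per sequence against 1-3 seconds per sequence when training from cache. A short sketch of that arithmetic (the function name is illustrative):

```python
def speedup_range(slow_seconds, fast_seconds):
    """Return (min, max) speedup from (low, high) per-sequence times in seconds."""
    slow_lo, slow_hi = slow_seconds
    fast_lo, fast_hi = fast_seconds
    return slow_lo / fast_hi, slow_hi / fast_lo


if __name__ == "__main__":
    ba_seconds = (5 * 60, 15 * 60)  # 5-15 minutes of BA per sequence
    train_seconds = (1, 3)          # 1-3 seconds per sequence from cache
    low, high = speedup_range(ba_seconds, train_seconds)
    print(f"speedup: {low:.0f}x to {high:.0f}x")
```

With these inputs the range is roughly 100x to 900x, in line with the order of magnitude quoted above.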

	### Why Teacher-Student Learning?

	DINOv2's teacher-student paradigm provides:

	- Stability: EMA teacher prevents training instability
	- Better convergence: Teacher provides stable targets
	- Scalability: Works well with large-scale training
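The EMA teacher update is a one-line rule per parameter. The sketch below uses plain dictionaries of floats in place of model weights and assumes the standard form `teacher = decay * teacher + (1 - decay) * student`; the function names are illustrative.

```python
import math


def ema_update(teacher, student, decay=0.999):
    """In-place EMA step: teacher <- decay * teacher + (1 - decay) * student."""
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must be in [0, 1]")
    for name, value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * value
    return teacher


def ema_half_life(decay):
    """Updates needed for an old teacher value's influence to halve."""
    if not 0.0 < decay < 1.0:
        raise ValueError("decay must be strictly between 0 and 1")
    return math.log(0.5) / math.log(decay)


if __name__ == "__main__":
    teacher, student = {"w": 1.0}, {"w": 0.0}
    print(ema_update(teacher, student))  # teacher moves slightly toward the student
    for decay in (0.99, 0.999, 0.9999):
        print(decay, round(ema_half_life(decay)))
```

The half-life gives an intuition for the decay values used in the ablations: about 69 steps at 0.99, about 693 at the default 0.999, and about 6,931 at 0.9999.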

	## Development

	### Running Tests

	```bash
	# Basic smoke test
	python scripts/tests/smoke_test_basic.py

	# GUI test
	python scripts/tests/test_gui_simple.py
	```

	### Code Quality

	```bash
	# Format code
	black ylff/ scripts/

	# Sort imports
	isort ylff/ scripts/

	# Type checking
	mypy ylff/
	```

	## Dependencies

	### Core Dependencies

	- PyTorch >= 2.0
	- NumPy < 2.0
	- OpenCV
	- pycolmap >= 0.4.0
	- Typer (for CLI)

	### Optional Dependencies

	- GUI: Plotly (for interactive 3D plots)
	- BA Pipeline: hloc, LightGlue (installed from source)
	- Training: Weights & Biases (for experiment tracking)

	See `pyproject.toml` for complete dependency list.

	## License

	Apache-2.0

	## Citation

	If you use YLFF in your research, please cite:

	```bibtex
	@software{ylff2024,
	title={You Learn From Failure: Geometric Consistency First Training for Visual Geometry},
	author={YLFF Contributors},
	year={2024},
	url={https://github.com/your-org/ylff}
	}
	```

	## References

	- DINOv2: https://github.com/facebookresearch/dinov2
	- DA3 Paper: Depth Anything 3 (arXiv:2511.10647)
	- Unified Training: `ylff/services/ylff_training.py`
	- Model Architecture: `research_docs/MODEL_ARCH.md`