3d_model / docs /MODEL_SELECTION.md

Azan

Clean deployment build (Squashed)

7a87926 8 days ago

	# DA3 Model Selection Guide

	## Overview

	DA3 (Depth Anything 3) provides several model series, each optimized for different use cases. This guide helps you choose the right model for a given YLFF workflow.

	## Model Series

	### 🌟 DA3 Main Series

	Models: `DA3-GIANT`, `DA3-LARGE`, `DA3-BASE`, `DA3-SMALL`

	Capabilities:

	- ✅ Monocular depth estimation
	- ✅ Multi-view depth estimation
	- ✅ Pose-conditioned depth estimation
	- ✅ Camera pose estimation
	- ✅ 3D Gaussian estimation

	Characteristics:

	- Unified depth-ray representation
	- Not metric (relative depth, requires scale alignment)
	- Varying sizes: Giant (best quality) → Small (fastest)

	Best For:

	- General-purpose visual geometry tasks
	- When you need pose estimation but can handle scale alignment
	- Fast iteration with smaller models
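
Because the main series predicts relative depth, its output has to be scaled before it can be compared with metric data. The sketch below shows one common way to do that, a median-ratio scale fit against a reference depth source. It is an illustration only: `align_scale` is a hypothetical helper, not part of YLFF or DA3, and the depth maps are assumed to be flattened lists with non-positive values marking missing pixels.

```python
from statistics import median


def align_scale(relative_depth, reference_depth, min_depth=1e-6):
    """Return (scale, aligned) such that scale * relative_depth ~ reference_depth.

    The scale is the median of per-pixel ratios, which tolerates outliers.
    Pairs where either value is missing or non-positive are ignored.
    """
    ratios = [
        ref / rel
        for rel, ref in zip(relative_depth, reference_depth)
        if rel > min_depth and ref > min_depth
    ]
    if not ratios:
        raise ValueError("no valid depth pairs to align")
    scale = median(ratios)
    aligned = [scale * rel for rel in relative_depth]
    return scale, aligned


# Toy data: the reference is in metres, 0.0 marks a missing measurement.
relative = [0.5, 1.0, 2.0, 4.0]
reference = [1.0, 2.0, 4.0, 0.0]
scale, aligned = align_scale(relative, reference)
print(scale)
print(aligned)
```

A single global scale is the simplest choice. If the relative depth also carries an unknown offset, a scale-and-shift fit would be needed instead.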

	### 📐 DA3 Metric Series

	Models: `DA3Metric-LARGE`

	Capabilities:

	- ✅ Monocular depth estimation
	- ✅ Metric depth (real-world scale)

	Characteristics:

	- Specialized for metric depth
	- Fine-tuned for real-world scale
	- No pose estimation

	Best For:

	- Applications requiring real-world scale
	- When you have poses from another source
	- Metric depth-only workflows

	### 🔍 DA3 Monocular Series

	Models: `DA3Mono-LARGE`

	Capabilities:

	- ✅ High-quality relative monocular depth

	Characteristics:

	- Dedicated to monocular depth
	- Superior geometric accuracy vs. disparity-based models
	- No pose estimation, not metric

	Best For:

	- Single-image depth estimation
	- When geometric accuracy is critical
	- When relative depth is sufficient

	### 🔗 DA3 Nested Series

	Models: `DA3NESTED-GIANT-LARGE`

	Capabilities:

	- ✅ Monocular depth estimation
	- ✅ Multi-view depth estimation
	- ✅ Pose-conditioned depth estimation
	- ✅ Camera pose estimation
	- ✅ Metric depth (real-world scale)

	Characteristics:

	- Combines the giant model with the metric model
	- Provides both pose estimation and metric depth
	- Reconstructs scenes at real-world metric scale
	- Recommended for bundle adjustment (BA) validation and fine-tuning

	Best For:

	- ✅ BA validation (needs metric depth + poses)
	- ✅ Fine-tuning workflows (needs metric depth + poses)
	- ✅ Metric reconstruction at real-world scale
	- ✅ When you need both pose and metric depth

	## YLFF Recommendations

	### For BA Validation

	Recommended: `DA3NESTED-GIANT-LARGE`

	Why:

	- Provides both camera poses and metric depth
	- Metric depth enables proper comparison with BA (real-world scale)
	- Best accuracy for validation workflows

	Usage:

	```bash
	# Auto-selects DA3NESTED-GIANT-LARGE
	ylff validate arkit assets/examples/ARKit

	# Or explicitly specify
	ylff validate arkit assets/examples/ARKit \
	  --model-name depth-anything/DA3NESTED-GIANT-LARGE
	```

	### For Fine-Tuning

	Recommended: `DA3NESTED-GIANT-LARGE`

	Why:

	- Fine-tuning benefits from metric depth (real-world scale)
	- Pose estimation needed for training
	- Best starting point for improvement

	Usage:

	```bash
	# Auto-selects DA3NESTED-GIANT-LARGE
	ylff train start data/training

	# Or explicitly specify
	ylff train start data/training \
	  --model-name depth-anything/DA3NESTED-GIANT-LARGE
	```

	### For Fast Experimentation

	Recommended: `DA3-LARGE` or `DA3-BASE`

	Why:

	- Faster inference
	- Still provides pose estimation
	- Good for quick tests

	Usage:

	```bash
	ylff validate sequence path/to/images \
	  --model-name depth-anything/DA3-BASE
	```

	### For Metric Depth Only

	Recommended: `DA3Metric-LARGE`

	Why:

	- Specialized for metric depth
	- Best accuracy for metric-only tasks

	Note: This model does not provide pose estimation. Use with external pose sources.
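
To illustrate what pairing metric depth with an external pose source involves, the sketch below back-projects one pixel into world coordinates. It rests on several assumptions that this guide does not confirm: a pinhole camera with intrinsics `fx, fy, cx, cy`, depth measured along the optical axis in metres, and a camera-to-world pose given as a rotation `R_wc` and translation `t_wc`. `backproject` is a hypothetical helper, not a YLFF function.

```python
def backproject(u, v, depth_m, fx, fy, cx, cy, R_wc, t_wc):
    """Lift pixel (u, v) with metric depth to a 3D point in the world frame."""
    # Point in the camera frame (pinhole model, z-depth assumed).
    p_cam = (
        (u - cx) / fx * depth_m,
        (v - cy) / fy * depth_m,
        depth_m,
    )
    # Apply the externally supplied camera-to-world pose: R_wc * p_cam + t_wc.
    return tuple(
        sum(R_wc[i][j] * p_cam[j] for j in range(3)) + t_wc[i]
        for i in range(3)
    )


identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
point = backproject(
    u=420.0, v=240.0, depth_m=2.0,
    fx=500.0, fy=500.0, cx=320.0, cy=240.0,
    R_wc=identity, t_wc=(1.0, 0.0, 0.0),
)
print(point)  # approximately (1.4, 0.0, 2.0)
```

Because the depth is already metric, the resulting points share the units of the external poses, provided those poses are also expressed in metres.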

	## Model Comparison

	| Model | Pose Est. | Metric Depth | Speed | Quality | Use Case |
	| --------------------- | --------- | ------------ | ------- | ------- | -------------------------- |
	| DA3NESTED-GIANT-LARGE | ✅ | ✅ | Slow | Best | BA validation, fine-tuning |
	| DA3-GIANT | ✅ | ❌ | Slow | Best | Best quality, non-metric |
	| DA3-LARGE | ✅ | ❌ | Medium | High | General purpose |
	| DA3-BASE | ✅ | ❌ | Fast | Good | Fast iteration |
	| DA3-SMALL | ✅ | ❌ | Fastest | Good | Rapid iteration |
	| DA3Metric-LARGE | ❌ | ✅ | Medium | High | Metric depth only |
	| DA3Mono-LARGE | ❌ | ❌ | Medium | High | Monocular depth only |
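
The comparison can also be read as a lookup from requirements to candidate models. The sketch below transcribes the pose and metric columns of the table into a dictionary and filters on them. `MODELS` and `candidates` are hypothetical names used for illustration and are not part of the `ylff` package.

```python
# Capability flags transcribed from the comparison table.
MODELS = {
    "DA3NESTED-GIANT-LARGE": {"pose": True, "metric": True},
    "DA3-GIANT": {"pose": True, "metric": False},
    "DA3-LARGE": {"pose": True, "metric": False},
    "DA3-BASE": {"pose": True, "metric": False},
    "DA3-SMALL": {"pose": True, "metric": False},
    "DA3Metric-LARGE": {"pose": False, "metric": True},
    "DA3Mono-LARGE": {"pose": False, "metric": False},
}


def candidates(need_pose, need_metric):
    """Return the models that satisfy every requested capability."""
    return [
        name
        for name, caps in MODELS.items()
        if (caps["pose"] or not need_pose)
        and (caps["metric"] or not need_metric)
    ]


print(candidates(need_pose=True, need_metric=True))
print(candidates(need_pose=False, need_metric=True))
```

Only the nested model satisfies both requirements, which is why it is the recommendation for BA validation and fine-tuning.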

	## Auto-Selection

	YLFF automatically selects the best model for each use case:

	```python
	from ylff.models import get_recommended_model

	# For BA validation
	model = get_recommended_model("ba_validation")
	# Returns: "depth-anything/DA3NESTED-GIANT-LARGE"

	# For fine-tuning
	model = get_recommended_model("fine_tuning")
	# Returns: "depth-anything/DA3NESTED-GIANT-LARGE"

	# For fast inference
	model = get_recommended_model("fast")
	# Returns: "depth-anything/DA3-SMALL"
	```
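
For readers who want a mental model of this behavior, the sketch below shows one way a use-case lookup with an explicit override could work. It is not the `ylff` implementation: `RECOMMENDED` and `resolve_model_name` are hypothetical, and the mapping only repeats the three return values documented above.

```python
# Use case -> model name, copied from the documented return values.
RECOMMENDED = {
    "ba_validation": "depth-anything/DA3NESTED-GIANT-LARGE",
    "fine_tuning": "depth-anything/DA3NESTED-GIANT-LARGE",
    "fast": "depth-anything/DA3-SMALL",
}


def resolve_model_name(use_case, explicit=None):
    """Prefer an explicitly requested model, else fall back to the lookup."""
    if explicit:
        return explicit
    try:
        return RECOMMENDED[use_case]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED))
        raise ValueError(
            f"unknown use case {use_case!r}; expected one of: {known}"
        ) from None


print(resolve_model_name("ba_validation"))
print(resolve_model_name("ba_validation", explicit="depth-anything/DA3-LARGE"))
```

Failing loudly on an unknown use case, rather than silently picking a default, keeps a typo from selecting a model without the required capabilities.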

	## CLI Usage

	### Auto-Select Model

	```bash
	# YLFF auto-selects DA3NESTED-GIANT-LARGE for BA validation
	ylff validate arkit assets/examples/ARKit

	# YLFF auto-selects DA3NESTED-GIANT-LARGE for fine-tuning
	ylff train start data/training
	```

	### Explicit Model Selection

	```bash
	# Use specific model
	ylff validate arkit assets/examples/ARKit \
	  --model-name depth-anything/DA3-LARGE

	# Use smaller model for speed
	ylff validate sequence path/to/images \
	  --model-name depth-anything/DA3-BASE
	```

	### List Available Models

	```python
	from ylff.models import list_available_models, get_model_info

	# List all models
	models = list_available_models()
	for name, info in models.items():
	    print(f"{name}: {info['description']}")

	# Get specific model info
	info = get_model_info("depth-anything/DA3NESTED-GIANT-LARGE")
	print(info['capabilities'])
	print(info['recommended_for'])
	```

	## Why DA3NESTED-GIANT-LARGE for BA Validation?

	1. Metric depth: BA operates at real-world scale, so depth must be metric for the comparison to be meaningful.

	2. Pose estimation: BA validation compares predicted poses against BA-refined poses, so the model must be able to estimate poses.

	3. Accuracy: The nested model combines the quality of the giant model with the specialization of the metric model.

	4. Consistency: Metric depth is expressed in real-world units, matching the scale of BA's output.
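
The points above can be made concrete with a small pose-comparison sketch. It measures per-frame translation error between predicted and BA-refined camera centers, plus the angle between two rotations. The function names and toy trajectories are hypothetical and do not reflect how YLFF computes its validation metrics.

```python
import math


def translation_errors(pred_centers, ba_centers):
    """Per-frame Euclidean distance between camera centers."""
    return [math.dist(p, b) for p, b in zip(pred_centers, ba_centers)]


def rotation_angle_deg(R_pred, R_ba):
    """Angle of the relative rotation R_pred^T @ R_ba, in degrees."""
    # trace(R_pred^T @ R_ba) equals the element-wise product summed.
    trace = sum(R_pred[k][i] * R_ba[k][i] for k in range(3) for i in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_angle))


# BA-refined camera centers in metres.
ba = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
# A metric prediction shares BA's scale; a relative one may not.
metric_pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
relative_pred = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (1.0, 0.0, 0.0)]

print(translation_errors(metric_pred, ba))
print(translation_errors(relative_pred, ba))

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
quarter_turn_z = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(rotation_angle_deg(identity, quarter_turn_z))
```

In this toy case the relative-scale trajectory has the right shape but half the scale, so its translation error grows with distance travelled even though the motion direction is correct. Rotation error is unaffected by scale.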

	## Performance Considerations

	- DA3NESTED-GIANT-LARGE: Slower but most accurate for BA workflows
	- DA3-LARGE: Good balance for experimentation
	- DA3-BASE: Faster, good for quick tests
	- DA3-SMALL: Fastest, acceptable quality for rapid iteration

	## Migration Guide

	If you were using `DA3-LARGE` before:

	```bash
	# Old (still works)
	ylff validate arkit assets/examples/ARKit \
	  --model-name depth-anything/DA3-LARGE

	# New (recommended, auto-selected)
	ylff validate arkit assets/examples/ARKit
	# Automatically uses DA3NESTED-GIANT-LARGE
	```

	The new default provides better results for BA validation due to metric depth support.