# DA3 Model Selection Guide
## Overview
DA3 provides multiple model series, each optimized for different use cases. This guide helps you choose the right model for YLFF workflows.
## Model Series
### 🌟 DA3 Main Series
**Models**: `DA3-GIANT`, `DA3-LARGE`, `DA3-BASE`, `DA3-SMALL`
**Capabilities**:
- ✅ Monocular depth estimation
- ✅ Multi-view depth estimation
- ✅ Pose-conditioned depth estimation
- ✅ Camera pose estimation
- ✅ 3D Gaussian estimation
**Characteristics**:
- Unified depth-ray representation
- **Not metric** (relative depth, requires scale alignment)
- Varying sizes: Giant (best quality) → Small (fastest)
**Best For**:
- General-purpose visual geometry tasks
- When you need pose estimation but can handle scale alignment
- Fast iteration with smaller models
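The scale alignment mentioned above can be sketched as follows. This is a minimal illustration, not YLFF's or DA3's own procedure: it assumes you have a relative depth map plus reference metric depths for the same pixels (for example from ARKit or BA points), and it estimates a single global scale from the median ratio. The function name `align_scale` and its arguments are hypothetical.

```python
import numpy as np

def align_scale(relative_depth, reference_depth, valid_mask=None):
    """Estimate one global scale mapping relative depth to reference units.

    Hypothetical helper: uses the median of per-pixel ratios, which
    tolerates a moderate fraction of outliers.
    """
    rel = np.asarray(relative_depth, dtype=np.float64)
    ref = np.asarray(reference_depth, dtype=np.float64)
    # Keep only pixels where both depths are finite and positive.
    mask = np.isfinite(rel) & np.isfinite(ref) & (rel > 0) & (ref > 0)
    if valid_mask is not None:
        mask &= np.asarray(valid_mask, dtype=bool)
    if not mask.any():
        raise ValueError("no valid pixels to align on")
    scale = float(np.median(ref[mask] / rel[mask]))
    return scale, rel * scale

# Synthetic check: the "relative" map is the metric map divided by 2.5.
metric = np.array([[1.0, 2.0], [4.0, 8.0]])
relative = metric / 2.5
scale, aligned = align_scale(relative, metric)
```

A single scale factor assumes the prediction is correct up to scale; if an offset is also present, a least-squares fit of scale and shift would be needed instead.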
### 📐 DA3 Metric Series
**Models**: `DA3Metric-LARGE`
**Capabilities**:
- ✅ Monocular depth estimation
- ✅ **Metric depth** (real-world scale)
**Characteristics**:
- Specialized for metric depth
- Fine-tuned for real-world scale
- **No pose estimation**
**Best For**:
- Applications requiring real-world scale
- When you have poses from another source
- Metric depth-only workflows
### 🔍 DA3 Monocular Series
**Models**: `DA3Mono-LARGE`
**Capabilities**:
- ✅ High-quality relative monocular depth
**Characteristics**:
- Dedicated to monocular depth
- Superior geometric accuracy vs. disparity-based models
- **No pose estimation, not metric**
**Best For**:
- Single-image depth estimation
- When geometric accuracy is critical
- Relative depth is sufficient
### 🔗 DA3 Nested Series
**Models**: `DA3NESTED-GIANT-LARGE`
**Capabilities**:
- ✅ Monocular depth estimation
- ✅ Multi-view depth estimation
- ✅ Pose-conditioned depth estimation
- ✅ Camera pose estimation
- ✅ **Metric depth** (real-world scale)
**Characteristics**:
- Combines giant model with metric model
- **Both pose estimation AND metric depth**
- Real-world metric scale reconstruction
- **Recommended for BA validation and fine-tuning**
**Best For**:
- ✅ **BA validation** (needs metric depth + poses)
- ✅ **Fine-tuning workflows** (needs metric depth + poses)
- ✅ Metric reconstruction at real-world scale
- ✅ When you need both pose and metric depth
## YLFF Recommendations
### For BA Validation
**Recommended**: `DA3NESTED-GIANT-LARGE`
**Why**:
- Provides both camera poses and metric depth
- Metric depth enables proper comparison with BA (real-world scale)
- Best accuracy for validation workflows
**Usage**:
```bash
# Auto-selects DA3NESTED-GIANT-LARGE
ylff validate arkit assets/examples/ARKit
# Or explicitly specify
ylff validate arkit assets/examples/ARKit \
  --model-name depth-anything/DA3NESTED-GIANT-LARGE
```
### For Fine-Tuning
**Recommended**: `DA3NESTED-GIANT-LARGE`
**Why**:
- Fine-tuning benefits from metric depth (real-world scale)
- Pose estimation needed for training
- Best starting point for improvement
**Usage**:
```bash
# Auto-selects DA3NESTED-GIANT-LARGE
ylff train start data/training
# Or explicitly specify
ylff train start data/training \
  --model-name depth-anything/DA3NESTED-GIANT-LARGE
```
### For Fast Experimentation
**Recommended**: `DA3-LARGE` or `DA3-BASE`
**Why**:
- Faster inference
- Still provides pose estimation
- Good for quick tests
**Usage**:
```bash
ylff validate sequence path/to/images \
  --model-name depth-anything/DA3-BASE
```
### For Metric Depth Only
**Recommended**: `DA3Metric-LARGE`
**Why**:
- Specialized for metric depth
- Best accuracy for metric-only tasks
**Note**: This model does **not** provide pose estimation. Use with external pose sources.
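Pairing metric depth with poses from another source typically comes down to back-projecting each pixel with the camera intrinsics and then applying the external pose. The sketch below assumes a pinhole camera, a 3x3 intrinsics matrix `K`, a 4x4 camera-to-world matrix, and depth measured along the camera z-axis; the function name `backproject_to_world` is hypothetical and not part of YLFF.

```python
import numpy as np

def backproject_to_world(depth, K, cam_to_world):
    """Lift an (H, W) metric depth map to (H*W, 3) world-space points."""
    depth = np.asarray(depth, dtype=np.float64)
    h, w = depth.shape
    # Pixel grid: u indexes columns (x), v indexes rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)
    pixels = pixels.reshape(-1, 3).astype(np.float64)
    # Camera-frame rays with unit z, scaled by the metric depth.
    rays = pixels @ np.linalg.inv(K).T
    points_cam = rays * depth.reshape(-1, 1)
    # Apply the external pose: rotate, then translate.
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return points_cam @ R.T + t

# Toy example: 3x3 image, principal point at the center pixel,
# constant depth of 2 m, camera placed 1 m along the world z-axis.
K = np.array([[100.0, 0.0, 1.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
pose = np.eye(4)
pose[:3, 3] = [0.0, 0.0, 1.0]
points = backproject_to_world(np.full((3, 3), 2.0), K, pose)
```

Because the depth is metric, the resulting points share units with the pose translation, so no scale alignment step is needed as long as the external poses are metric as well.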
## Model Comparison
| Model | Pose Est. | Metric Depth | Speed | Quality | Use Case |
| --------------------- | --------- | ------------ | ------- | ------- | ------------------------------ |
| DA3NESTED-GIANT-LARGE | ✅ | ✅ | Medium | Best | **BA validation, fine-tuning** |
| DA3-GIANT | ✅ | ❌ | Slow | Best | Best quality, non-metric |
| DA3-LARGE | ✅ | ❌ | Medium | High | General purpose |
| DA3-BASE | ✅ | ❌ | Fast | Good | Fast iteration |
| DA3-SMALL | ✅ | ❌ | Fastest | Good | Fastest |
| DA3Metric-LARGE | ❌ | ✅ | Medium | High | Metric depth only |
| DA3Mono-LARGE | ❌ | ❌ | Medium | High | Monocular depth only |
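The comparison table can also be read as a simple lookup. The sketch below transcribes the pose and metric-depth columns into a dictionary and filters on them; it is an illustration only, and the names `CAPABILITIES` and `models_with` are hypothetical rather than part of the YLFF API.

```python
# Pose / metric-depth columns transcribed from the comparison table.
CAPABILITIES = {
    "DA3NESTED-GIANT-LARGE": {"pose": True, "metric": True},
    "DA3-GIANT": {"pose": True, "metric": False},
    "DA3-LARGE": {"pose": True, "metric": False},
    "DA3-BASE": {"pose": True, "metric": False},
    "DA3-SMALL": {"pose": True, "metric": False},
    "DA3Metric-LARGE": {"pose": False, "metric": True},
    "DA3Mono-LARGE": {"pose": False, "metric": False},
}

def models_with(pose=None, metric=None):
    """Return model names matching the requested capabilities.

    Passing None for an argument means "don't care" for that column.
    """
    return [
        name
        for name, caps in CAPABILITIES.items()
        if (pose is None or caps["pose"] == pose)
        and (metric is None or caps["metric"] == metric)
    ]

pose_and_metric = models_with(pose=True, metric=True)
metric_only = models_with(pose=False, metric=True)
```

Here `pose_and_metric` contains only the nested model, which is why it is the recommendation wherever both poses and metric depth are required.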
## Auto-Selection
YLFF automatically selects the best model for each use case:
```python
from ylff.models import get_recommended_model
# For BA validation
model = get_recommended_model("ba_validation")
# Returns: "depth-anything/DA3NESTED-GIANT-LARGE"
# For fine-tuning
model = get_recommended_model("fine_tuning")
# Returns: "depth-anything/DA3NESTED-GIANT-LARGE"
# For fast inference
model = get_recommended_model("fast")
# Returns: "depth-anything/DA3-SMALL"
```
## CLI Usage
### Auto-Select Model
```bash
# YLFF auto-selects DA3NESTED-GIANT-LARGE for BA validation
ylff validate arkit assets/examples/ARKit
# YLFF auto-selects DA3NESTED-GIANT-LARGE for fine-tuning
ylff train start data/training
```
### Explicit Model Selection
```bash
# Use specific model
ylff validate arkit assets/examples/ARKit \
  --model-name depth-anything/DA3-LARGE
# Use smaller model for speed
ylff validate sequence path/to/images \
  --model-name depth-anything/DA3-BASE
```
### List Available Models
```python
from ylff.models import list_available_models, get_model_info
# List all models
models = list_available_models()
for name, info in models.items():
    print(f"{name}: {info['description']}")
# Get specific model info
info = get_model_info("depth-anything/DA3NESTED-GIANT-LARGE")
print(info['capabilities'])
print(info['recommended_for'])
```
## Why DA3NESTED-GIANT-LARGE for BA Validation?
1. **Metric Depth**: BA works in real-world scale. Metric depth enables proper comparison.
2. **Pose Estimation**: BA validation compares predicted poses with BA-refined poses, so the model must be able to estimate poses.
3. **Accuracy**: The nested model combines giant-model quality with metric specialization.
4. **Consistency**: Using metric depth ensures depth values are in real-world units, matching BA's output scale.
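Comparing predicted poses with BA-refined poses usually means measuring a rotation angle and a translation distance per frame. The following sketch assumes both poses are 4x4 camera-to-world matrices already expressed in the same frame and units; `pose_error` is a hypothetical helper, not YLFF's actual validation metric.

```python
import numpy as np

def pose_error(pred, ref):
    """Return (rotation error in degrees, translation error in pose units)."""
    pred = np.asarray(pred, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)
    # Relative rotation between the two poses.
    R_rel = pred[:3, :3].T @ ref[:3, :3]
    # Angle from the trace; clip guards against rounding outside [-1, 1].
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_deg = float(np.degrees(np.arccos(cos_angle)))
    trans = float(np.linalg.norm(pred[:3, 3] - ref[:3, 3]))
    return rot_deg, trans

# Reference pose: identity. Predicted pose: 10 degree rotation about z
# plus a 5 cm offset along x.
theta = np.radians(10.0)
pred = np.eye(4)
pred[:3, :3] = [[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]]
pred[:3, 3] = [0.05, 0.0, 0.0]
rot_deg, trans = pose_error(pred, np.eye(4))
```

With a model that outputs metric depth and poses, the translation error is in real-world units and can be compared against BA output directly; with a relative-scale model it would only be meaningful after scale alignment.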
## Performance Considerations
- **DA3NESTED-GIANT-LARGE**: Slower but most accurate for BA workflows
- **DA3-LARGE**: Good balance for experimentation
- **DA3-BASE**: Faster, good for quick tests
- **DA3-SMALL**: Fastest, acceptable quality for rapid iteration
## Migration Guide
If you were using `DA3-LARGE` before:
```bash
# Old (still works)
ylff validate arkit assets/examples/ARKit \
  --model-name depth-anything/DA3-LARGE
# New (recommended, auto-selected)
ylff validate arkit assets/examples/ARKit
# Automatically uses DA3NESTED-GIANT-LARGE
```
The new default suits BA validation better because its metric depth can be compared with BA output at real-world scale.