# DA3 Model Selection Guide
## Overview
DA3 provides multiple model series, each optimized for different use cases. This guide helps you choose the right model for YLFF workflows.
## Model Series
### DA3 Main Series
**Models**: `DA3-GIANT`, `DA3-LARGE`, `DA3-BASE`, `DA3-SMALL`
**Capabilities**:
- ✅ Monocular depth estimation
- ✅ Multi-view depth estimation
- ✅ Pose-conditioned depth estimation
- ✅ Camera pose estimation
- ✅ 3D Gaussian estimation
**Characteristics**:
- Unified depth-ray representation
- **Not metric** (relative depth; requires scale alignment, sketched after this section)
- Varying sizes: Giant (best quality) → Small (fastest)
**Best For**:
- General-purpose visual geometry tasks
- When you need pose estimation but can handle scale alignment
- Fast iteration with smaller models
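Because the main series outputs relative depth, you generally align it to a metric reference before comparing it against BA output or sensor depth. Below is a minimal sketch of least-squares scale/shift alignment, assuming NumPy arrays and a sparse metric reference; the function name and signature are illustrative, not part of the YLFF API.
```python
import numpy as np

def align_scale_shift(pred: np.ndarray, ref: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Least-squares scale/shift alignment of relative depth to a metric reference.

    pred: relative depth map from a DA3 main-series model, shape (H, W)
    ref:  sparse metric reference depth (H, W), valid where mask is True
    """
    x = pred[mask].reshape(-1, 1)
    y = ref[mask]
    # Solve min ||s * x + t - y||^2 for scale s and shift t.
    A = np.hstack([x, np.ones_like(x)])
    (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
    return s * pred + t
```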
### DA3 Metric Series
**Models**: `DA3Metric-LARGE`
**Capabilities**:
- ✅ Monocular depth estimation
- ✅ **Metric depth** (real-world scale)
**Characteristics**:
- Specialized for metric depth
- Fine-tuned for real-world scale
- **No pose estimation**
**Best For**:
- Applications requiring real-world scale
- When you have poses from another source (see the back-projection sketch below)
- Metric depth-only workflows
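With an external pose source (ARKit, BA, etc.), metric depth from this series can be lifted directly into a world-space point cloud. A minimal sketch, assuming pinhole intrinsics and a 4x4 camera-to-world pose; the helper is illustrative, not a YLFF API.
```python
import numpy as np

def backproject_metric_depth(depth: np.ndarray, K: np.ndarray, T_wc: np.ndarray) -> np.ndarray:
    """Lift a metric depth map into world-space points using an external pose.

    depth: metric depth map (H, W) from DA3Metric-LARGE, in meters
    K:     3x3 camera intrinsics
    T_wc:  4x4 camera-to-world pose from an external source (e.g., ARKit)
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Unproject pixels to camera-space rays, then scale by metric depth.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    cam = (np.linalg.inv(K) @ pix.T).T * depth.reshape(-1, 1)
    # Transform camera-space points into the world frame.
    cam_h = np.hstack([cam, np.ones((cam.shape[0], 1))])
    return (T_wc @ cam_h.T).T[:, :3]
```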
### DA3 Monocular Series
**Models**: `DA3Mono-LARGE`
**Capabilities**:
- ✅ High-quality relative monocular depth
**Characteristics**:
- Dedicated for monocular depth
- Superior geometric accuracy compared with disparity-based models (illustrated below)
- **No pose estimation, not metric**
**Best For**:
- Single-image depth estimation
- When geometric accuracy is critical
- Relative depth is sufficient
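A toy illustration of the accuracy claim above: disparity-based models predict inverse depth, and converting back to depth amplifies a fixed prediction error at long range, which a direct-depth model avoids.
```python
import numpy as np

# Disparity-based models predict inverse depth; recovering depth requires a
# nonlinear conversion that amplifies prediction noise at long range.
true_depth = np.array([1.0, 5.0, 20.0])   # meters
disparity = 1.0 / true_depth
noisy_disparity = disparity + 0.01        # same small error at every range
recovered = 1.0 / noisy_disparity
print(recovered - true_depth)             # error grows sharply with distance
```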
### DA3 Nested Series
**Models**: `DA3NESTED-GIANT-LARGE`
**Capabilities**:
- ✅ Monocular depth estimation
- ✅ Multi-view depth estimation
- ✅ Pose-conditioned depth estimation
- ✅ Camera pose estimation
- ✅ **Metric depth** (real-world scale)
**Characteristics**:
- Combines giant model with metric model
- **Both pose estimation AND metric depth**
- Real-world metric scale reconstruction
- **Recommended for BA validation and fine-tuning**
**Best For**:
- ✅ **BA validation** (needs metric depth + poses)
- ✅ **Fine-tuning workflows** (needs metric depth + poses)
- ✅ Metric reconstruction at real-world scale
- ✅ When you need both pose and metric depth (a quick capability check is sketched below)
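As a sanity check, you can confirm these capabilities through the model registry introduced later in this guide; the exact strings in the returned dict depend on your YLFF version.
```python
from ylff.models import get_model_info

# Confirm the nested model covers both BA-workflow requirements.
info = get_model_info("depth-anything/DA3NESTED-GIANT-LARGE")
print(info['capabilities'])     # expect pose estimation + metric depth
print(info['recommended_for'])  # expect BA validation / fine-tuning
```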
## YLFF Recommendations
### For BA Validation
**Recommended**: `DA3NESTED-GIANT-LARGE`
**Why**:
- Provides both camera poses and metric depth
- Metric depth enables proper comparison with BA (real-world scale)
- Best accuracy for validation workflows
**Usage**:
```bash
# Auto-selects DA3NESTED-GIANT-LARGE
ylff validate arkit assets/examples/ARKit
# Or explicitly specify
ylff validate arkit assets/examples/ARKit \
    --model-name depth-anything/DA3NESTED-GIANT-LARGE
```
### For Fine-Tuning
**Recommended**: `DA3NESTED-GIANT-LARGE`
**Why**:
- Fine-tuning benefits from metric depth (real-world scale)
- Pose estimation needed for training
- Best starting point for improvement
**Usage**:
```bash
# Auto-selects DA3NESTED-GIANT-LARGE
ylff train start data/training
# Or explicitly specify
ylff train start data/training \
    --model-name depth-anything/DA3NESTED-GIANT-LARGE
```
### For Fast Experimentation
**Recommended**: `DA3-LARGE` or `DA3-BASE`
**Why**:
- Faster inference
- Still provides pose estimation
- Good for quick tests
**Usage**:
```bash
ylff validate sequence path/to/images \
    --model-name depth-anything/DA3-BASE
```
### For Metric Depth Only
**Recommended**: `DA3Metric-LARGE`
**Why**:
- Specialized for metric depth
- Best accuracy for metric-only tasks
**Note**: This model does **not** provide pose estimation. Use with external pose sources.
## Model Comparison
| Model                 | Pose Est. | Metric Depth | Speed   | Quality | Use Case                       |
| --------------------- | --------- | ------------ | ------- | ------- | ------------------------------ |
| DA3NESTED-GIANT-LARGE | ✅        | ✅           | Medium  | Best    | **BA validation, fine-tuning** |
| DA3-GIANT             | ✅        | ❌           | Slow    | Best    | Best quality, non-metric       |
| DA3-LARGE             | ✅        | ❌           | Medium  | High    | General purpose                |
| DA3-BASE              | ✅        | ❌           | Fast    | Good    | Fast iteration                 |
| DA3-SMALL             | ✅        | ❌           | Fastest | Good    | Fastest                        |
| DA3Metric-LARGE       | ❌        | ✅           | Medium  | High    | Metric depth only              |
| DA3Mono-LARGE         | ❌        | ❌           | Medium  | High    | Monocular depth only           |
## Auto-Selection
YLFF automatically selects the best model for each use case:
```python
from ylff.models import get_recommended_model
# For BA validation
model = get_recommended_model("ba_validation")
# Returns: "depth-anything/DA3NESTED-GIANT-LARGE"
# For fine-tuning
model = get_recommended_model("fine_tuning")
# Returns: "depth-anything/DA3NESTED-GIANT-LARGE"
# For fast inference
model = get_recommended_model("fast")
# Returns: "depth-anything/DA3-SMALL"
```
## CLI Usage
### Auto-Select Model
```bash
# YLFF auto-selects DA3NESTED-GIANT-LARGE for BA validation
ylff validate arkit assets/examples/ARKit
# YLFF auto-selects DA3NESTED-GIANT-LARGE for fine-tuning
ylff train start data/training
```
### Explicit Model Selection
```bash
# Use specific model
ylff validate arkit assets/examples/ARKit \
    --model-name depth-anything/DA3-LARGE
# Use smaller model for speed
ylff validate sequence path/to/images \
    --model-name depth-anything/DA3-BASE
```
### List Available Models
```python
from ylff.models import list_available_models, get_model_info
# List all models
models = list_available_models()
for name, info in models.items():
    print(f"{name}: {info['description']}")
# Get specific model info
info = get_model_info("depth-anything/DA3NESTED-GIANT-LARGE")
print(info['capabilities'])
print(info['recommended_for'])
```
## Why DA3NESTED-GIANT-LARGE for BA Validation?
1. **Metric Depth**: BA works in real-world scale, so metric depth enables a like-for-like comparison.
2. **Pose Estimation**: BA validation compares predicted poses with BA-refined poses, so the model must estimate poses itself.
3. **Accuracy**: The nested model combines the giant model's quality with the metric model's scale specialization.
4. **Consistency**: Metric depth keeps depth values in real-world units, matching BA's output scale (a minimal pose-comparison sketch follows).
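Point 4 is what makes a direct comparison possible: because both pose sets live at the same real-world scale, an error metric can skip similarity (scale) alignment. A minimal sketch, assuming (N, 4, 4) camera-to-world matrices as NumPy arrays; the helper is illustrative, not a YLFF API.
```python
import numpy as np

def pose_translation_rmse(pred_poses: np.ndarray, ba_poses: np.ndarray) -> float:
    """RMSE between predicted and BA-refined camera centers, in meters.

    With a non-metric model you would first need to estimate a similarity
    transform; metric output from DA3NESTED-GIANT-LARGE makes that unnecessary.
    """
    pred_t = pred_poses[:, :3, 3]  # translation columns = camera centers
    ba_t = ba_poses[:, :3, 3]
    return float(np.sqrt(np.mean(np.sum((pred_t - ba_t) ** 2, axis=1))))
```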
## Performance Considerations
- **DA3NESTED-GIANT-LARGE**: Slower, but the most accurate option for BA workflows (see the timing sketch below)
- **DA3-LARGE**: Good balance for experimentation
- **DA3-BASE**: Faster, good for quick tests
- **DA3-SMALL**: Fastest, acceptable quality for rapid iteration
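If you want concrete numbers on your own hardware, a rough timing harness is easy to write. The sketch below assumes a `load_model` helper and an `infer` method, which are placeholders not confirmed by this guide; substitute whatever entry points your YLFF version exposes.
```python
import time

import numpy as np

from ylff.models import load_model  # assumed helper, not confirmed by this guide

image = np.zeros((480, 640, 3), dtype=np.uint8)  # dummy frame
for name in ["depth-anything/DA3NESTED-GIANT-LARGE",
             "depth-anything/DA3-LARGE",
             "depth-anything/DA3-BASE"]:
    model = load_model(name)
    start = time.perf_counter()
    model.infer(image)  # assumed method name; run warm-up passes in practice
    print(f"{name}: {time.perf_counter() - start:.2f}s")
```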
## Migration Guide
If you were using `DA3-LARGE` before:
```bash
# Old (still works)
ylff validate arkit assets/examples/ARKit \
    --model-name depth-anything/DA3-LARGE
# New (recommended, auto-selected)
ylff validate arkit assets/examples/ARKit
# Automatically uses DA3NESTED-GIANT-LARGE
```
The new default provides better results for BA validation due to metric depth support.