
Oracle Ensemble: Multi-Source Validation and Rejection

🎯 Overview

The Oracle Ensemble system uses all available oracle sources (ARKit poses, BA poses, LiDAR depth, IMU data) to create high-confidence training masks by rejecting DA3 predictions where oracles disagree. This enables training only on pixels/points where multiple independent sources agree, resulting in higher-quality supervision.

🔍 Core Concept

Instead of choosing one oracle source, we use all of them together:

For each DA3 prediction:
  ├─ Compare with ARKit poses (VIO)
  ├─ Compare with BA poses (multi-view geometry)
  ├─ Compare with LiDAR depth (direct ToF)
  ├─ Check geometric consistency (reprojection error)
  └─ Check IMU consistency (motion matches sensors)

  → Create confidence mask: Only train on pixels where oracles agree

📊 Oracle Sources and Accuracy

1. ARKit Poses (VIO)

  • Accuracy: <1° rotation, <5 cm translation (when tracking is good)
  • Coverage: Frame-level (all pixels in frame)
  • Trust Level: High (0.8) when tracking is "normal"
  • Limitations: Drift over long sequences, poor when tracking fails

2. BA Poses (Multi-View Geometry)

  • Accuracy: <0.5° rotation, <2 cm translation (after optimization)
  • Coverage: Frame-level (all pixels in frame)
  • Trust Level: Highest among the pose oracles (0.9) - most robust
  • Limitations: Requires good feature matching, slower computation
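As a rough illustration of how a pose oracle's vote could be derived, the sketch below measures the geodesic rotation angle and the camera-center distance between two world-to-camera poses, then thresholds both. `pose_errors` and `pose_agrees` are hypothetical helpers written for this document, not part of `ylff`; the default thresholds are the 2.0 degree and 0.05 m values used in the usage examples.

```python
import numpy as np

def pose_errors(pose_a, pose_b):
    """Rotation (degrees) and translation (meters) error between two (3, 4) w2c poses."""
    R_a, t_a = pose_a[:, :3], pose_a[:, 3]
    R_b, t_b = pose_b[:, :3], pose_b[:, 3]
    # Geodesic angle of the relative rotation R_a^T R_b.
    cos_angle = (np.trace(R_a.T @ R_b) - 1.0) / 2.0
    rot_deg = np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))
    # Compare camera centers (C = -R^T t) so the error is expressed in world units.
    trans_m = np.linalg.norm(-R_a.T @ t_a + R_b.T @ t_b)
    return rot_deg, trans_m

def pose_agrees(pose_a, pose_b, rot_thresh_deg=2.0, trans_thresh_m=0.05):
    """Hypothetical vote: True when both errors fall within their thresholds."""
    rot_deg, trans_m = pose_errors(pose_a, pose_b)
    return bool(rot_deg <= rot_thresh_deg and trans_m <= trans_thresh_m)
```

Comparing camera centers rather than raw translation vectors keeps the translation error independent of the rotation error, which is one reasonable design choice among several.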

3. LiDAR Depth (Time-of-Flight)

  • Accuracy: ±1-2 cm absolute error
  • Coverage: Pixel-level (sparse, ~10-30% of pixels)
  • Trust Level: Very High (0.95) - direct measurement
  • Limitations: Sparse coverage, only available on LiDAR-enabled devices
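Because LiDAR coverage is sparse, a depth vote only makes sense where a LiDAR return exists. The following is a minimal sketch of such a vote. It assumes that missing returns are encoded as 0 or NaN and that a pixel agrees if it passes either the relative or the absolute test; both are assumptions, and `lidar_depth_vote` is a hypothetical name rather than the library's function.

```python
import numpy as np

def lidar_depth_vote(pred_depth, lidar_depth, rel_thresh=0.1, abs_thresh=0.1):
    """Return (vote, valid): vote is True where the prediction matches LiDAR,
    valid marks pixels that actually have a LiDAR measurement."""
    # Assumption: 0 or NaN means "no LiDAR return" for that pixel.
    valid = np.isfinite(lidar_depth) & (lidar_depth > 0)
    safe_lidar = np.where(valid, lidar_depth, 1.0)  # placeholder avoids divide-by-zero
    abs_err = np.abs(pred_depth - safe_lidar)
    rel_err = abs_err / safe_lidar
    # Assumption: passing either test counts as agreement.
    agree = (rel_err <= rel_thresh) | (abs_err <= abs_thresh)
    return agree & valid, valid
```

Pixels outside `valid` should be excluded from the LiDAR oracle's weight rather than counted as disagreement, otherwise sparse coverage would be penalized as if it were error.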

4. Geometric Consistency

  • Accuracy: <2 pixels reprojection error
  • Coverage: Pixel-level (all pixels)
  • Trust Level: High (0.85) - enforces epipolar geometry
  • Limitations: Requires good depth predictions
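To make the reprojection check concrete, here is a sketch that back-projects pixels from frame i using predicted depth, moves them into frame j, projects them, and measures the pixel distance to their matched locations. The function name, the argument layout, and the assumption that pixel correspondences `uv_j` are already available (for example from feature matching) are all illustrative.

```python
import numpy as np

def reprojection_error(uv_i, depth_i, K, T_ji, uv_j):
    """Pixel distance between points projected from frame i into frame j and their matches.

    uv_i, uv_j: (M, 2) pixel coordinates; depth_i: (M,) depths in frame i;
    K: (3, 3) intrinsics; T_ji: (4, 4) transform from camera i to camera j.
    """
    ones = np.ones((uv_i.shape[0], 1))
    rays = np.linalg.inv(K) @ np.hstack([uv_i, ones]).T  # (3, M) rays with z = 1
    pts_i = rays * depth_i                                # back-project with predicted depth
    pts_j = T_ji[:3, :3] @ pts_i + T_ji[:3, 3:4]          # express points in camera j
    proj = K @ pts_j
    uv_proj = (proj[:2] / proj[2]).T                      # perspective divide
    return np.linalg.norm(uv_proj - uv_j, axis=1)
```

A pixel would then count as geometrically consistent when its error is below `reprojection_error_threshold` (2.0 pixels in the configuration shown later).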

5. IMU Data (Motion Sensors)

  • Accuracy: Velocity ±0.5 m/s, angular velocity ±0.1 rad/s
  • Coverage: Frame-level (motion between frames)
  • Trust Level: Medium (0.7) - indirect but useful
  • Limitations: Requires integration, may not be in ARKit metadata
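One way to picture the IMU check: differentiate the predicted camera positions to get a velocity per frame interval and compare it with the velocity obtained from the IMU. The sketch below assumes the IMU velocity has already been integrated and averaged per interval, which the source does not specify; `imu_velocity_vote` is a hypothetical helper.

```python
import numpy as np

def imu_velocity_vote(camera_centers, timestamps, imu_velocity, velocity_thresh=0.5):
    """Vote per frame interval: does the pose-derived velocity match the IMU velocity?

    camera_centers: (N, 3) world positions; timestamps: (N,) seconds;
    imu_velocity: (N-1, 3) mean IMU-derived velocity over each interval (m/s).
    """
    dt = np.diff(timestamps)[:, None]                     # (N-1, 1) interval lengths
    pose_velocity = np.diff(camera_centers, axis=0) / dt  # finite-difference velocity
    error = np.linalg.norm(pose_velocity - imu_velocity, axis=1)
    return error <= velocity_thresh, error
```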

🎚️ Confidence Mask Generation

Agreement Scoring

For each pixel/frame, compute agreement score:

agreement_score = weighted_sum(oracle_votes) / total_weight

where:
  - oracle_votes: 1 if oracle agrees, 0 if disagrees
  - weights: Trust level of each oracle (0.7-0.95)
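The formula above can be sketched with NumPy broadcasting, so that frame-level votes of shape (N, 1, 1) combine with pixel-level votes of shape (N, H, W). This is an illustrative reimplementation under those shape assumptions, not the code inside `OracleEnsemble`.

```python
import numpy as np

def agreement_score(votes, weights):
    """Trust-weighted fraction of agreeing oracles, per pixel.

    votes: dict name -> array broadcastable to (N, H, W), 1 = agree, 0 = disagree.
    weights: dict name -> trust level for that oracle.
    """
    weighted_sum = sum(weights[name] * np.asarray(v, dtype=float)
                       for name, v in votes.items())
    total_weight = sum(weights[name] for name in votes)
    return weighted_sum / total_weight
```

For example, if ARKit (0.8) and BA (0.9) agree but LiDAR (0.95) disagrees at a pixel, the score is 1.7 / 2.65 ≈ 0.64, which falls below the default `min_agreement_ratio` of 0.7.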

Rejection Strategy

Per-Pixel Rejection:

  • Reject pixels where agreement_score < min_agreement_ratio (default: 0.7)
  • Only train on pixels where the trust-weighted agreement is ≥70%

Per-Frame Rejection:

  • Reject entire frames if pose agreement is too low
  • Useful for sequences with tracking failures
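The two rejection levels can be combined into a single mask. In this sketch, `min_pose_confidence` is a hypothetical frame-level cutoff (the source does not state one), and `build_rejection_mask` is an illustrative name.

```python
import numpy as np

def build_rejection_mask(agreement_scores, pose_confidence,
                         min_agreement_ratio=0.7, min_pose_confidence=0.5):
    """Combine per-pixel and per-frame rejection into one (N, H, W) bool mask.

    agreement_scores: (N, H, W) scores in [0, 1]; pose_confidence: (N,) scores in [0, 1].
    """
    pixel_reject = agreement_scores < min_agreement_ratio
    frame_reject = pose_confidence < min_pose_confidence  # hypothetical frame-level cutoff
    # A rejected frame rejects every pixel in it.
    return pixel_reject | frame_reject[:, None, None]
```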

Confidence Mask

confidence_mask = {
    'pose_confidence': (N,) frame-level scores [0.0-1.0]
    'depth_confidence': (N, H, W) pixel-level scores [0.0-1.0]
    'rejection_mask': (N, H, W) bool - pixels to reject
    'agreement_scores': (N, H, W) trust-weighted fraction of oracles that agree
}

🚀 Usage

Basic Usage

from ylff.utils.oracle_ensemble import OracleEnsemble

# Initialize ensemble
ensemble = OracleEnsemble(
    pose_rotation_threshold=2.0,  # degrees
    pose_translation_threshold=0.05,  # meters
    depth_relative_threshold=0.1,  # 10% relative error
    min_agreement_ratio=0.7,  # Require 70% agreement
)

# Validate DA3 predictions
results = ensemble.validate_da3_predictions(
    da3_poses=da3_poses,  # (N, 3, 4) w2c
    da3_depth=da3_depth,  # (N, H, W)
    intrinsics=intrinsics,  # (N, 3, 3)
    arkit_poses=arkit_poses_c2w,  # (N, 4, 4) c2w
    ba_poses=ba_poses_w2c,  # (N, 3, 4) w2c
    lidar_depth=lidar_depth,  # (N, H, W) optional
)

# Get confidence masks
confidence_mask = results['confidence_mask']  # (N, H, W)
rejection_mask = results['rejection_mask']  # (N, H, W) bool
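The call above passes ARKit poses as (N, 4, 4) camera-to-world matrices and the other poses as (N, 3, 4) world-to-camera matrices. If you need to compare them by hand, for instance while debugging, a conversion along these lines may help. `c2w_to_w2c` is a hypothetical helper, and whether the ensemble performs this conversion internally is not stated in this document.

```python
import numpy as np

def c2w_to_w2c(c2w):
    """Invert (N, 4, 4) camera-to-world poses and return (N, 3, 4) world-to-camera poses."""
    R = c2w[:, :3, :3]
    t = c2w[:, :3, 3:]                   # (N, 3, 1)
    R_inv = np.transpose(R, (0, 2, 1))   # the inverse of a rotation is its transpose
    t_inv = -R_inv @ t
    return np.concatenate([R_inv, t_inv], axis=2)
```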

Training with Oracle Ensemble

from ylff.utils.oracle_losses import oracle_ensemble_loss

# Compute loss with confidence weighting
loss_dict = oracle_ensemble_loss(
    da3_output={
        'poses': predicted_poses,  # (N, 3, 4)
        'depth': predicted_depth,  # (N, H, W)
    },
    oracle_targets={
        'poses': target_poses,  # (N, 3, 4)
        'depth': target_depth,  # (N, H, W)
    },
    confidence_masks={
        'pose_confidence': frame_confidence,  # (N,)
        'depth_confidence': pixel_confidence,  # (N, H, W)
    },
    min_confidence=0.7,  # Only train on high-confidence pixels
)

total_loss = loss_dict['total_loss']
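To show what confidence weighting with a `min_confidence` cutoff could mean for the depth term, here is a NumPy sketch of a confidence-weighted L1 error. It is not the implementation of `oracle_ensemble_loss`; the function name and the choice of L1 are assumptions, and a training loop would use the equivalent tensor operations in its own framework.

```python
import numpy as np

def confidence_weighted_l1(pred_depth, target_depth, depth_confidence, min_confidence=0.7):
    """Mean absolute depth error over confident pixels, weighted by confidence."""
    # Pixels below the cutoff get zero weight, so they contribute nothing.
    weights = np.where(depth_confidence >= min_confidence, depth_confidence, 0.0)
    total = weights.sum()
    if total == 0:
        return 0.0  # no trustworthy pixels in this batch
    return float((weights * np.abs(pred_depth - target_depth)).sum() / total)
```

Normalizing by the sum of weights keeps the loss scale comparable between batches with very different rejection rates.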

📈 Expected Results

Training Quality

With Oracle Ensemble:

  • ✅ Only trains on pixels where multiple oracles agree
  • ✅ Rejects noisy/incorrect DA3 predictions
  • ✅ Higher-quality supervision signal
  • ✅ Better generalization

Typical Rejection Rates:

  • 20-40% of pixels rejected (oracles disagree)
  • 5-15% of frames rejected (poor pose agreement)
  • Higher rejection in challenging scenes (low texture, motion blur)

Performance Impact

Processing Time:

  • Oracle validation: +10-20% overhead
  • Training: Faster convergence expected, since the supervision is cleaner
  • Overall: Expected net positive (the quality gain outweighs the small overhead)

⚙️ Configuration

Thresholds

ensemble = OracleEnsemble(
    # Pose agreement
    pose_rotation_threshold=2.0,  # degrees - stricter = more rejections
    pose_translation_threshold=0.05,  # meters (5cm)

    # Depth agreement
    depth_relative_threshold=0.1,  # 10% relative error
    depth_absolute_threshold=0.1,  # 10cm absolute error

    # Geometric consistency
    reprojection_error_threshold=2.0,  # pixels

    # IMU consistency
    imu_velocity_threshold=0.5,  # m/s
    imu_angular_velocity_threshold=0.1,  # rad/s

    # Minimum agreement
    min_agreement_ratio=0.7,  # Require 70% of oracles to agree
)

Oracle Weights

Customize trust levels:

ensemble = OracleEnsemble(
    oracle_weights={
        'arkit_pose': 0.8,  # High trust when tracking is good
        'ba_pose': 0.9,  # Highest trust among pose oracles
        'lidar_depth': 0.95,  # Very high trust (direct measurement)
        'imu': 0.7,  # Medium trust
        'geometric_consistency': 0.85,  # High trust
    }
)

🔬 Advanced Usage

Per-Oracle Analysis

results = ensemble.validate_da3_predictions(...)

# Individual oracle votes
oracle_votes = results['oracle_votes']
arkit_agreement = oracle_votes['arkit_pose']  # (N, 1, 1)
ba_agreement = oracle_votes['ba_pose']  # (N, 1, 1)
lidar_agreement = oracle_votes['lidar_depth']  # (N, H, W)

# Error metrics
rotation_errors = results['rotation_errors']  # (N, 2) [arkit, ba]
translation_errors = results['translation_errors']  # (N, 2)
depth_relative_errors = results['relative_errors']  # (N, H, W)

Adaptive Thresholds

Adjust thresholds based on scene difficulty:

# Easy scene (good tracking, high texture)
ensemble_easy = OracleEnsemble(
    pose_rotation_threshold=1.0,  # Stricter
    min_agreement_ratio=0.8,  # Require more agreement
)

# Hard scene (poor tracking, low texture)
ensemble_hard = OracleEnsemble(
    pose_rotation_threshold=3.0,  # More lenient
    min_agreement_ratio=0.6,  # Require less agreement
)
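Choosing between these presets could be automated from a simple difficulty signal. The sketch below uses the fraction of frames with normal tracking; the 0.9 and 0.6 cutoffs and the `pick_thresholds` helper are illustrative assumptions, while the returned values are the easy, default, and hard settings shown in this document.

```python
def pick_thresholds(tracking_ok_fraction):
    """Map the fraction of frames with normal tracking to OracleEnsemble keyword arguments."""
    if not 0.0 <= tracking_ok_fraction <= 1.0:
        raise ValueError("tracking_ok_fraction must be in [0, 1]")
    if tracking_ok_fraction >= 0.9:   # easy scene: be strict
        return {"pose_rotation_threshold": 1.0, "min_agreement_ratio": 0.8}
    if tracking_ok_fraction >= 0.6:   # typical scene: defaults
        return {"pose_rotation_threshold": 2.0, "min_agreement_ratio": 0.7}
    # hard scene: be lenient
    return {"pose_rotation_threshold": 3.0, "min_agreement_ratio": 0.6}
```

The result can be unpacked directly, as in `OracleEnsemble(**pick_thresholds(0.95))`.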

💡 Best Practices

1. Start Conservative

Begin with strict thresholds, then relax if needed:

min_agreement_ratio=0.8  # Start high
pose_rotation_threshold=1.0  # Stricter

2. Monitor Rejection Rates

Track how many pixels/frames are rejected:

import logging
logger = logging.getLogger(__name__)
rejection_rate = rejection_mask.sum() / rejection_mask.numel()  # fraction of rejected pixels
logger.info(f"Rejection rate: {rejection_rate:.1%}")

3. Use All Available Oracles

Don't skip oracles. Each additional independent source strengthens the validation:

# Always include all available sources
results = ensemble.validate_da3_predictions(
    da3_poses=...,
    da3_depth=...,
    arkit_poses=arkit_poses,  # Include if available
    ba_poses=ba_poses,  # Include if available
    lidar_depth=lidar_depth,  # Include if available
)

4. Visualize Confidence Masks

import matplotlib.pyplot as plt

# Visualize confidence for the first frame
plt.imshow(confidence_mask[0], cmap='hot')
plt.colorbar(label='Confidence')
plt.title('Oracle Agreement Confidence')
plt.show()

🎓 Why This Works

Multiple Independent Sources:

  • Each oracle has different failure modes
  • Agreement across multiple sources = high confidence
  • Disagreement = likely error in DA3 prediction

Confidence-Weighted Training:

  • Train more on high-confidence pixels
  • Reject low-confidence pixels
  • Better supervision signal = better model

Robust to Oracle Failures:

  • If one oracle fails, others can still validate
  • Weighted voting reduces impact of single failures
  • Minimum agreement ratio ensures consensus
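A small worked example of this robustness, using the default trust levels. It assumes that an unavailable oracle is simply dropped from both the numerator and the denominator, which the source does not spell out; `weighted_agreement` is a hypothetical helper.

```python
DEFAULT_WEIGHTS = {
    "arkit_pose": 0.8, "ba_pose": 0.9, "lidar_depth": 0.95,
    "imu": 0.7, "geometric_consistency": 0.85,
}

def weighted_agreement(votes, weights=DEFAULT_WEIGHTS):
    """Agreement over available oracles only.

    votes maps oracle name -> True / False, or None when the oracle is unavailable.
    """
    # Assumption: unavailable oracles are excluded rather than counted as disagreement.
    available = {name: v for name, v in votes.items() if v is not None}
    total = sum(weights[name] for name in available)
    if total == 0:
        return 0.0
    return sum(weights[name] for name, v in available.items() if v) / total
```

With no LiDAR and a failed ARKit vote, the remaining three oracles still give 2.45 / 3.25 ≈ 0.75, which clears the default 0.7 threshold.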

📊 Statistics

After processing, you'll see:

Oracle Ensemble Validation:
  - ARKit pose agreement: 85.2% of frames
  - BA pose agreement: 92.1% of frames
  - LiDAR depth agreement: 78.5% of pixels (where available)
  - Geometric consistency: 91.3% of pixels
  - Overall confidence: 0.87 (mean)
  - Rejection rate: 23.1% of pixels
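If you want to produce a similar summary from your own arrays, a sketch like the following would do. The function name and input layout are assumptions made for illustration; the actual report is generated by the library.

```python
import numpy as np

def validation_summary(pose_votes, rejection_mask, confidence):
    """Format a short report.

    pose_votes: dict label -> (N,) bool votes per frame;
    rejection_mask: (N, H, W) bool; confidence: (N, H, W) scores in [0, 1].
    """
    lines = ["Oracle Ensemble Validation:"]
    for label, votes in pose_votes.items():
        lines.append(f"  - {label} agreement: {np.mean(votes):.1%} of frames")
    lines.append(f"  - Overall confidence: {np.mean(confidence):.2f} (mean)")
    lines.append(f"  - Rejection rate: {np.mean(rejection_mask):.1%} of pixels")
    return "\n".join(lines)
```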

This system enables high-quality training by only using pixels where multiple independent sources agree! πŸš€