Oracle Ensemble: Multi-Source Validation and Rejection
Overview
The Oracle Ensemble system uses all available oracle sources (ARKit poses, BA poses, LiDAR depth, IMU data) to create high-confidence training masks by rejecting DA3 predictions where oracles disagree. This enables training only on pixels/points where multiple independent sources agree, resulting in higher-quality supervision.
Core Concept
Instead of choosing one oracle source, we use all of them together:
For each DA3 prediction:
├─ Compare with ARKit poses (VIO)
├─ Compare with BA poses (multi-view geometry)
├─ Compare with LiDAR depth (direct ToF)
├─ Check geometric consistency (reprojection error)
└─ Check IMU consistency (motion matches sensors)
→ Create confidence mask: only train on pixels where oracles agree
Oracle Sources and Accuracy
1. ARKit Poses (VIO)
- Accuracy: <1° rotation, <5cm translation (when tracking is good)
- Coverage: Frame-level (all pixels in frame)
- Trust Level: High (0.8) when tracking is "normal"
- Limitations: Drift over long sequences, poor when tracking fails
2. BA Poses (Multi-View Geometry)
- Accuracy: <0.5° rotation, <2cm translation (after optimization)
- Coverage: Frame-level (all pixels in frame)
- Trust Level: Highest (0.9) - most robust
- Limitations: Requires good feature matching, slower computation
3. LiDAR Depth (Time-of-Flight)
- Accuracy: ±1-2cm absolute error
- Coverage: Pixel-level (sparse, ~10-30% of pixels)
- Trust Level: Very High (0.95) - direct measurement
- Limitations: Sparse coverage, only available on LiDAR-enabled devices
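Because LiDAR returns are sparse, the depth vote is computed per pixel and only where a return exists. A minimal sketch of such a vote (a hypothetical helper, not part of the `OracleEnsemble` API) that accepts a pixel when the error is within either the relative or the absolute threshold used later in the configuration:

```python
import numpy as np

def lidar_depth_agrees(pred_depth, lidar_depth, rel_thresh=0.1, abs_thresh=0.1):
    """Per-pixel LiDAR depth vote.

    Returns (agrees, valid): boolean arrays of the same shape as the inputs.
    Pixels with no LiDAR return (encoded here as depth 0) are marked invalid
    and never vote in either direction.
    """
    valid = lidar_depth > 0
    err = np.abs(pred_depth - lidar_depth)
    # Agree when within 10% relative error OR 10cm absolute error (defaults)
    agrees = ((err < rel_thresh * lidar_depth) | (err < abs_thresh)) & valid
    return agrees, valid
```

Invalid pixels are returned separately so they can be excluded from the agreement denominator rather than counted as disagreements.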
4. Geometric Consistency
- Accuracy: <2 pixels reprojection error
- Coverage: Pixel-level (all pixels)
- Trust Level: High (0.85) - enforces epipolar geometry
- Limitations: Requires good depth predictions
5. IMU Data (Motion Sensors)
- Accuracy: Velocity ±0.5 m/s, angular velocity ±0.1 rad/s
- Coverage: Frame-level (motion between frames)
- Trust Level: Medium (0.7) - indirect but useful
- Limitations: Requires integration, may not be in ARKit metadata
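The geometric-consistency vote can be sketched as a pixel transfer: back-project a pixel with its predicted depth, move it through the two world-to-camera poses, and project it into the second frame; the vote accepts the pixel when the reprojected point lands within `reprojection_error_threshold` of its matched feature. This helper is illustrative, not the library's internals:

```python
import numpy as np

def reproject_pixel(u, v, depth, K, w2c_i, w2c_j):
    """Transfer pixel (u, v) in frame i into frame j.

    depth: predicted depth at (u, v); K: 3x3 pinhole intrinsics;
    w2c_i, w2c_j: 3x4 world-to-camera [R|t] poses.
    """
    # Back-project into camera-i coordinates
    p_cam_i = depth * (np.linalg.inv(K) @ np.array([u, v, 1.0]))
    # Camera i -> world (invert the rigid transform)
    R_i, t_i = w2c_i[:, :3], w2c_i[:, 3]
    p_world = R_i.T @ (p_cam_i - t_i)
    # World -> camera j, then project to pixels
    R_j, t_j = w2c_j[:, :3], w2c_j[:, 3]
    p_cam_j = R_j @ p_world + t_j
    uv = K @ p_cam_j
    return uv[:2] / uv[2]
```

As a sanity check, transferring a pixel between two identical poses returns the original pixel exactly, regardless of the predicted depth.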
Confidence Mask Generation
Agreement Scoring
For each pixel/frame, compute agreement score:
agreement_score = weighted_sum(oracle_votes) / total_weight
where:
- oracle_votes: 1 if oracle agrees, 0 if disagrees
- weights: Trust level of each oracle (0.7-0.95)
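The weighted vote above can be sketched in a few lines. The weights mirror the trust levels listed in the oracle table; the helper itself is hypothetical, and oracles that are unavailable for a frame (e.g. no LiDAR) are simply left out of both the numerator and the denominator:

```python
# Trust weights mirroring the oracle table above (assumed defaults)
WEIGHTS = {'arkit_pose': 0.8, 'ba_pose': 0.9, 'lidar_depth': 0.95,
           'imu': 0.7, 'geometric_consistency': 0.85}

def agreement_score(votes):
    """votes: dict mapping oracle name -> 1 (agrees) or 0 (disagrees).

    Oracles absent from `votes` are skipped, so missing sources do not
    drag the score down.
    """
    total = sum(WEIGHTS[k] for k in votes)
    return sum(WEIGHTS[k] * v for k, v in votes.items()) / total
```

With all five oracles voting, a single LiDAR disagreement yields 3.25 / 4.2 ≈ 0.77, which still clears the default 0.7 cutoff; two disagreeing oracles generally do not.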
Rejection Strategy
Per-Pixel Rejection:
- Reject pixels where agreement_score < min_agreement_ratio (default: 0.7), i.e. only train on pixels where oracles carrying ≥70% of the total vote weight agree
Per-Frame Rejection:
- Reject entire frames if pose agreement is too low
- Useful for sequences with tracking failures
Confidence Mask
confidence_mask = {
    'pose_confidence': ...,    # (N,) frame-level scores [0.0-1.0]
    'depth_confidence': ...,   # (N, H, W) pixel-level scores [0.0-1.0]
    'rejection_mask': ...,     # (N, H, W) bool - pixels to reject
    'agreement_scores': ...,   # (N, H, W) fraction of oracles that agree
}
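Turning per-pixel agreement scores into this dict is a straightforward threshold. A minimal sketch (pose_confidence omitted; it is produced per-frame the same way):

```python
import numpy as np

def build_confidence_mask(agreement_scores, min_agreement_ratio=0.7):
    """agreement_scores: (N, H, W) weighted agreement in [0, 1].

    Pixels strictly below the ratio are flagged for rejection; pixels
    exactly at the threshold are kept.
    """
    rejection_mask = agreement_scores < min_agreement_ratio
    return {
        'depth_confidence': agreement_scores,
        'rejection_mask': rejection_mask,
        'agreement_scores': agreement_scores,
    }
```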
Usage
Basic Usage
from ylff.utils.oracle_ensemble import OracleEnsemble
# Initialize ensemble
ensemble = OracleEnsemble(
pose_rotation_threshold=2.0, # degrees
pose_translation_threshold=0.05, # meters
depth_relative_threshold=0.1, # 10% relative error
min_agreement_ratio=0.7, # Require 70% agreement
)
# Validate DA3 predictions
results = ensemble.validate_da3_predictions(
da3_poses=da3_poses, # (N, 3, 4) w2c
da3_depth=da3_depth, # (N, H, W)
intrinsics=intrinsics, # (N, 3, 3)
arkit_poses=arkit_poses_c2w, # (N, 4, 4) c2w
ba_poses=ba_poses_w2c, # (N, 3, 4) w2c
lidar_depth=lidar_depth, # (N, H, W) optional
)
# Get confidence masks
confidence_mask = results['confidence_mask'] # (N, H, W)
rejection_mask = results['rejection_mask'] # (N, H, W) bool
Training with Oracle Ensemble
from ylff.utils.oracle_losses import oracle_ensemble_loss
# Compute loss with confidence weighting
loss_dict = oracle_ensemble_loss(
da3_output={
'poses': predicted_poses, # (N, 3, 4)
'depth': predicted_depth, # (N, H, W)
},
oracle_targets={
'poses': target_poses, # (N, 3, 4)
'depth': target_depth, # (N, H, W)
},
confidence_masks={
'pose_confidence': frame_confidence, # (N,)
'depth_confidence': pixel_confidence, # (N, H, W)
},
min_confidence=0.7, # Only train on high-confidence pixels
)
total_loss = loss_dict['total_loss']
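The depth term of such a loss reduces to a confidence-weighted L1 with a hard cutoff. This sketch is an assumption about the shape of the computation, not the actual `oracle_ensemble_loss` internals:

```python
import numpy as np

def confidence_weighted_l1(pred, target, confidence, min_confidence=0.7):
    """Confidence-weighted L1 depth loss over (N, H, W) arrays.

    Pixels below min_confidence contribute nothing; surviving pixels
    are weighted by their confidence and the result is normalized by
    the total surviving weight.
    """
    weights = np.where(confidence >= min_confidence, confidence, 0.0)
    denom = weights.sum()
    if denom == 0:
        return 0.0  # every pixel rejected: no gradient signal
    return float((weights * np.abs(pred - target)).sum() / denom)
```

Normalizing by the surviving weight keeps the loss magnitude comparable across frames with very different rejection rates.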
Expected Results
Training Quality
With Oracle Ensemble:
- ✓ Only trains on pixels where multiple oracles agree
- ✓ Rejects noisy/incorrect DA3 predictions
- ✓ Higher-quality supervision signal
- ✓ Better generalization
Typical Rejection Rates:
- 20-40% of pixels rejected (oracles disagree)
- 5-15% of frames rejected (poor pose agreement)
- Higher rejection in challenging scenes (low texture, motion blur)
Performance Impact
Processing Time:
- Oracle validation: +10-20% overhead
- Training: Faster convergence (better supervision)
- Overall: Net positive (better quality > slight overhead)
Configuration
Thresholds
ensemble = OracleEnsemble(
# Pose agreement
pose_rotation_threshold=2.0, # degrees - stricter = more rejections
pose_translation_threshold=0.05, # meters (5cm)
# Depth agreement
depth_relative_threshold=0.1, # 10% relative error
depth_absolute_threshold=0.1, # 10cm absolute error
# Geometric consistency
reprojection_error_threshold=2.0, # pixels
# IMU consistency
imu_velocity_threshold=0.5, # m/s
imu_angular_velocity_threshold=0.1, # rad/s
# Minimum agreement
min_agreement_ratio=0.7, # Require 70% of oracles to agree
)
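How the pose thresholds are applied can be sketched with a hypothetical per-frame vote: rotation error as the geodesic angle between the two rotations, translation error as the gap between camera centers:

```python
import numpy as np

def pose_agrees(pose_a, pose_b, rot_thresh_deg=2.0, trans_thresh_m=0.05):
    """Compare two 3x4 world-to-camera [R|t] poses.

    Rotation error is the geodesic angle of R_a @ R_b^T; translation
    error is the Euclidean distance between camera centers C = -R^T t.
    """
    R_a, t_a = pose_a[:, :3], pose_a[:, 3]
    R_b, t_b = pose_b[:, :3], pose_b[:, 3]
    R_rel = R_a @ R_b.T
    # trace(R) = 1 + 2*cos(theta) for a rotation by angle theta
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = np.degrees(np.arccos(cos_angle))
    trans_err = np.linalg.norm(-R_a.T @ t_a - (-R_b.T @ t_b))
    return rot_err_deg < rot_thresh_deg and trans_err < trans_thresh_m
```

With the default 2° threshold, a 5° relative rotation is rejected even when the translations match exactly.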
Oracle Weights
Customize trust levels:
ensemble = OracleEnsemble(
oracle_weights={
'arkit_pose': 0.8, # High trust when tracking is good
'ba_pose': 0.9, # Highest trust
'lidar_depth': 0.95, # Very high trust (direct measurement)
'imu': 0.7, # Medium trust
'geometric_consistency': 0.85, # High trust
}
)
Advanced Usage
Per-Oracle Analysis
results = ensemble.validate_da3_predictions(...)
# Individual oracle votes
oracle_votes = results['oracle_votes']
arkit_agreement = oracle_votes['arkit_pose'] # (N, 1, 1)
ba_agreement = oracle_votes['ba_pose'] # (N, 1, 1)
lidar_agreement = oracle_votes['lidar_depth'] # (N, H, W)
# Error metrics
rotation_errors = results['rotation_errors'] # (N, 2) [arkit, ba]
translation_errors = results['translation_errors'] # (N, 2)
depth_relative_errors = results['relative_errors'] # (N, H, W)
Adaptive Thresholds
Adjust thresholds based on scene difficulty:
# Easy scene (good tracking, high texture)
ensemble_easy = OracleEnsemble(
pose_rotation_threshold=1.0, # Stricter
min_agreement_ratio=0.8, # Require more agreement
)
# Hard scene (poor tracking, low texture)
ensemble_hard = OracleEnsemble(
pose_rotation_threshold=3.0, # More lenient
min_agreement_ratio=0.6, # Require less agreement
)
Best Practices
1. Start Conservative
Begin with strict thresholds, then relax if needed:
min_agreement_ratio=0.8 # Start high
pose_rotation_threshold=1.0 # Stricter
2. Monitor Rejection Rates
Track how many pixels/frames are rejected:
rejection_rate = rejection_mask.sum() / rejection_mask.numel()
logger.info(f"Rejection rate: {rejection_rate:.1%}")
3. Use All Available Oracles
Don't skip oracles - more sources = better validation:
# Always include all available sources
results = ensemble.validate_da3_predictions(
da3_poses=...,
da3_depth=...,
arkit_poses=arkit_poses, # Include if available
ba_poses=ba_poses, # Include if available
lidar_depth=lidar_depth, # Include if available
)
4. Visualize Confidence Masks
import matplotlib.pyplot as plt
# Visualize confidence
plt.imshow(confidence_mask[0], cmap='hot')
plt.colorbar(label='Confidence')
plt.title('Oracle Agreement Confidence')
Why This Works
Multiple Independent Sources:
- Each oracle has different failure modes
- Agreement across multiple sources = high confidence
- Disagreement = likely error in DA3 prediction
Confidence-Weighted Training:
- Train more on high-confidence pixels
- Reject low-confidence pixels
- Better supervision signal = better model
Robust to Oracle Failures:
- If one oracle fails, others can still validate
- Weighted voting reduces impact of single failures
- Minimum agreement ratio ensures consensus
Statistics
After processing, you'll see:
Oracle Ensemble Validation:
- ARKit pose agreement: 85.2% of frames
- BA pose agreement: 92.1% of frames
- LiDAR depth agreement: 78.5% of pixels (where available)
- Geometric consistency: 91.3% of pixels
- Overall confidence: 0.87 (mean)
- Rejection rate: 23.1% of pixels
This system enables high-quality training by only using pixels where multiple independent sources agree!