# Oracle Ensemble: Multi-Source Validation and Rejection

## Overview

The Oracle Ensemble system uses **all available oracle sources** (ARKit poses, BA poses, LiDAR depth, IMU data) to build high-confidence training masks by **rejecting DA3 predictions where the oracles disagree**. Training then runs only on pixels/points where multiple independent sources agree, which yields higher-quality supervision.
## Core Concept

Instead of choosing one oracle source, we use **all of them together**:

```
For each DA3 prediction:
├─ Compare with ARKit poses (VIO)
├─ Compare with BA poses (multi-view geometry)
├─ Compare with LiDAR depth (direct ToF)
├─ Check geometric consistency (reprojection error)
└─ Check IMU consistency (motion matches sensors)
→ Create confidence mask: Only train on pixels where oracles agree
```
## Oracle Sources and Accuracy

### 1. ARKit Poses (VIO)

- **Accuracy**: <1° rotation, <5cm translation (when tracking is good)
- **Coverage**: Frame-level (all pixels in frame)
- **Trust Level**: High (0.8) when tracking is "normal"
- **Limitations**: Drift over long sequences, poor when tracking fails

### 2. BA Poses (Multi-View Geometry)

- **Accuracy**: <0.5° rotation, <2cm translation (after optimization)
- **Coverage**: Frame-level (all pixels in frame)
- **Trust Level**: Highest (0.9) - most robust
- **Limitations**: Requires good feature matching, slower computation
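To make the rotation and translation figures above concrete, here is a minimal sketch of how a pose disagreement could be measured. `pose_errors` is a hypothetical helper, not part of `ylff`, and it assumes both poses are (3, 4) w2c matrices that are already expressed in the same world frame and metric scale.

```python
import numpy as np

def pose_errors(pose_a_w2c, pose_b_w2c):
    """Rotation error (degrees) and translation error (meters) between two
    (3, 4) world-to-camera poses [R | t]. Hypothetical helper for illustration."""
    R_a, t_a = pose_a_w2c[:, :3], pose_a_w2c[:, 3]
    R_b, t_b = pose_b_w2c[:, :3], pose_b_w2c[:, 3]
    # Angle of the relative rotation (geodesic distance on SO(3)).
    R_rel = R_a @ R_b.T
    cos_angle = np.clip((np.trace(R_rel) - 1.0) / 2.0, -1.0, 1.0)
    rot_err_deg = np.degrees(np.arccos(cos_angle))
    # Compare camera centers (C = -R^T t) so the error is in world units.
    c_a = -R_a.T @ t_a
    c_b = -R_b.T @ t_b
    trans_err_m = np.linalg.norm(c_a - c_b)
    return float(rot_err_deg), float(trans_err_m)
```

A frame would then count as agreeing with a pose oracle when both errors fall below the configured pose thresholds.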
### 3. LiDAR Depth (Time-of-Flight)

- **Accuracy**: ±1-2cm absolute error
- **Coverage**: Pixel-level (sparse, ~10-30% of pixels)
- **Trust Level**: Very High (0.95) - direct measurement
- **Limitations**: Sparse coverage, only available on LiDAR-enabled devices
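Because LiDAR coverage is sparse, a depth vote only makes sense where a LiDAR return exists. The sketch below shows one way this could be computed. `lidar_depth_vote` is a hypothetical helper, and encoding missing returns as 0 or NaN is an assumption about the data.

```python
import numpy as np

def lidar_depth_vote(pred_depth, lidar_depth, rel_threshold=0.1):
    """Per-pixel agreement between predicted and LiDAR depth.

    Returns (agrees, valid) as boolean (H, W) arrays. Pixels without a LiDAR
    return (assumed here to be 0 or NaN) are invalid and never agree.
    """
    valid = np.isfinite(lidar_depth) & (lidar_depth > 0)
    # Substitute a dummy depth at invalid pixels to avoid dividing by zero/NaN.
    safe_lidar = np.where(valid, lidar_depth, 1.0)
    rel_err = np.abs(pred_depth - safe_lidar) / safe_lidar
    agrees = valid & (rel_err < rel_threshold)
    return agrees, valid
```

Returning `valid` separately lets the caller leave LiDAR out of the vote at pixels it never measured, rather than counting a missing return as a disagreement.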
### 4. Geometric Consistency

- **Accuracy**: <2 pixels reprojection error
- **Coverage**: Pixel-level (all pixels)
- **Trust Level**: High (0.85) - enforces epipolar geometry
- **Limitations**: Requires good depth predictions
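As an illustration of the reprojection check, the sketch below back-projects a pixel from frame i using its predicted depth, projects it into frame j, and measures the pixel distance to its match. This is a sketch under assumptions, not the library's implementation: `reprojection_error` is a hypothetical name, the pixel correspondence is assumed to be known already, both frames share pinhole intrinsics `K`, and depth is measured along the camera z-axis.

```python
import numpy as np

def reprojection_error(uv_i, depth_i, uv_j, K, pose_i_w2c, pose_j_w2c):
    """Pixel distance between uv_j and the reprojection of uv_i into frame j.

    uv_i, uv_j: matched pixel coordinates (x, y); depth_i: z-depth at uv_i;
    K: (3, 3) intrinsics; poses: (3, 4) world-to-camera [R | t].
    """
    R_i, t_i = pose_i_w2c[:, :3], pose_i_w2c[:, 3]
    R_j, t_j = pose_j_w2c[:, :3], pose_j_w2c[:, 3]
    # Back-project the pixel into camera i, then move it to world coordinates.
    ray = np.linalg.inv(K) @ np.array([uv_i[0], uv_i[1], 1.0])
    x_cam_i = depth_i * ray
    x_world = R_i.T @ (x_cam_i - t_i)
    # Project the world point into camera j.
    x_cam_j = R_j @ x_world + t_j
    proj = K @ x_cam_j
    uv_proj = proj[:2] / proj[2]
    return float(np.linalg.norm(uv_proj - np.asarray(uv_j, dtype=float)))
```

A pixel would vote "agree" when this error is below the reprojection threshold (2 pixels in the figures above).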
### 5. IMU Data (Motion Sensors)

- **Accuracy**: Velocity ±0.5 m/s, angular velocity ±0.1 rad/s
- **Coverage**: Frame-level (motion between frames)
- **Trust Level**: Medium (0.7) - indirect but useful
- **Limitations**: Requires integration, may not be in ARKit metadata
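One way to picture the IMU check is to compare the velocity implied by consecutive camera positions with the velocity reported by the IMU. The sketch below is illustrative only: `imu_velocity_vote` is a hypothetical helper, and it assumes the IMU signal has already been integrated into a per-interval velocity in the same world frame as the camera centers.

```python
import numpy as np

def imu_velocity_vote(centers, timestamps, imu_velocity, threshold=0.5):
    """Frame-level vote: does pose-implied velocity match the IMU velocity?

    centers: (N, 3) camera centers in meters; timestamps: (N,) in seconds;
    imu_velocity: (N-1, 3) velocity per frame interval in m/s (assumed given).
    Returns a boolean (N-1,) array, one vote per consecutive frame pair.
    """
    dt = np.diff(timestamps)[:, None]
    pose_velocity = np.diff(centers, axis=0) / dt  # finite difference
    error = np.linalg.norm(pose_velocity - imu_velocity, axis=1)
    return error < threshold
```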
## Confidence Mask Generation

### Agreement Scoring

For each pixel/frame, compute a trust-weighted agreement score:

```python
# oracle_votes: 1 if an oracle agrees with the DA3 prediction, 0 if it disagrees
# weights: trust level of each oracle (0.7-0.95), in the same order as oracle_votes
weighted_sum = sum(w * v for w, v in zip(weights, oracle_votes))
agreement_score = weighted_sum / sum(weights)
```
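The scoring rule mixes frame-level votes (poses, IMU) with pixel-level votes (LiDAR, geometric consistency). A minimal NumPy sketch of how the two could be combined through broadcasting is shown below. `agreement_scores` is a hypothetical function, and the vote shapes `(N, 1, 1)` and `(N, H, W)` are assumptions for illustration.

```python
import numpy as np

def agreement_scores(votes, weights):
    """Trust-weighted agreement per pixel.

    votes: dict of oracle name -> 0/1 array, shaped (N, 1, 1) for frame-level
    oracles or (N, H, W) for pixel-level ones.
    weights: dict of oracle name -> trust level.
    """
    # Broadcasting expands frame-level votes across every pixel of the frame.
    weighted_sum = sum(weights[name] * votes[name].astype(float) for name in votes)
    total_weight = sum(weights[name] for name in votes)
    return weighted_sum / total_weight
```

A fuller version would also drop an oracle's weight from the denominator at pixels where it has no measurement (for example, pixels without a LiDAR return).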
### Rejection Strategy

**Per-Pixel Rejection:**

- Reject pixels where `agreement_score < min_agreement_ratio` (default: 0.7)
- Only train on pixels where the trust-weighted agreement is ≥70%

**Per-Frame Rejection:**

- Reject entire frames if pose agreement is too low
- Useful for sequences with tracking failures
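The two rejection levels can be combined into a single mask. The sketch below is one possible formulation, not the library's code: `build_rejection_mask` and the `min_pose_confidence` cutoff are hypothetical names introduced for illustration.

```python
import numpy as np

def build_rejection_mask(agreement, pose_confidence,
                         min_agreement_ratio=0.7, min_pose_confidence=0.5):
    """Combine per-pixel and per-frame rejection into one (N, H, W) bool mask.

    agreement: (N, H, W) agreement scores; pose_confidence: (N,) frame scores.
    min_pose_confidence is a hypothetical parameter, not taken from the library.
    """
    reject_pixels = agreement < min_agreement_ratio
    reject_frames = pose_confidence < min_pose_confidence
    # A rejected frame rejects every pixel it contains.
    return reject_pixels | reject_frames[:, None, None]
```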
### Confidence Mask

The mask is a dictionary of arrays (`...` stands in for the actual array; shapes are given in the comments):

```python
confidence_mask = {
    'pose_confidence': ...,    # (N,) frame-level scores in [0.0, 1.0]
    'depth_confidence': ...,   # (N, H, W) pixel-level scores in [0.0, 1.0]
    'rejection_mask': ...,     # (N, H, W) bool, True for pixels to reject
    'agreement_scores': ...,   # (N, H, W) weighted fraction of oracles that agree
}
```
## Usage

### Basic Usage

```python
from ylff.utils.oracle_ensemble import OracleEnsemble

# Initialize ensemble
ensemble = OracleEnsemble(
    pose_rotation_threshold=2.0,      # degrees
    pose_translation_threshold=0.05,  # meters
    depth_relative_threshold=0.1,     # 10% relative error
    min_agreement_ratio=0.7,          # Require 70% agreement
)

# Validate DA3 predictions
results = ensemble.validate_da3_predictions(
    da3_poses=da3_poses,          # (N, 3, 4) w2c
    da3_depth=da3_depth,          # (N, H, W)
    intrinsics=intrinsics,        # (N, 3, 3)
    arkit_poses=arkit_poses_c2w,  # (N, 4, 4) c2w
    ba_poses=ba_poses_w2c,        # (N, 3, 4) w2c
    lidar_depth=lidar_depth,      # (N, H, W) optional
)

# Get confidence masks
confidence_mask = results['confidence_mask']  # (N, H, W)
rejection_mask = results['rejection_mask']    # (N, H, W) bool
```
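The call above mixes two pose conventions: ARKit poses are (N, 4, 4) camera-to-world, while the DA3 and BA poses are (N, 3, 4) world-to-camera. If you need to compare them by hand, a conversion along these lines may help. `c2w_to_w2c` is a hypothetical helper. It only inverts the rigid transform and does not handle any difference in camera axis conventions between ARKit and the other sources.

```python
import numpy as np

def c2w_to_w2c(poses_c2w):
    """Convert (N, 4, 4) camera-to-world matrices to (N, 3, 4) world-to-camera.

    For a rigid transform [R | t], the inverse is [R^T | -R^T t].
    """
    R = poses_c2w[:, :3, :3]
    t = poses_c2w[:, :3, 3]
    R_inv = np.transpose(R, (0, 2, 1))
    t_inv = -np.einsum('nij,nj->ni', R_inv, t)
    return np.concatenate([R_inv, t_inv[:, :, None]], axis=2)
```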
### Training with Oracle Ensemble

```python
from ylff.utils.oracle_losses import oracle_ensemble_loss

# Compute loss with confidence weighting
loss_dict = oracle_ensemble_loss(
    da3_output={
        'poses': predicted_poses,  # (N, 3, 4)
        'depth': predicted_depth,  # (N, H, W)
    },
    oracle_targets={
        'poses': target_poses,  # (N, 3, 4)
        'depth': target_depth,  # (N, H, W)
    },
    confidence_masks={
        'pose_confidence': frame_confidence,   # (N,)
        'depth_confidence': pixel_confidence,  # (N, H, W)
    },
    min_confidence=0.7,  # Only train on high-confidence pixels
)
total_loss = loss_dict['total_loss']
```
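To show what confidence weighting with a `min_confidence` cutoff might look like for the depth term, here is a NumPy sketch. It is not the implementation of `oracle_ensemble_loss`, whose internals are not described here: `masked_depth_loss` is a hypothetical function, the L1 error is an assumed choice, and a real training loop would use a framework with automatic differentiation.

```python
import numpy as np

def masked_depth_loss(pred_depth, target_depth, depth_confidence, min_confidence=0.7):
    """Confidence-weighted L1 depth loss over pixels with confidence >= min_confidence."""
    keep = depth_confidence >= min_confidence
    # Pixels below the cutoff get zero weight; the rest are weighted by confidence.
    weights = np.where(keep, depth_confidence, 0.0)
    weight_sum = weights.sum()
    if weight_sum == 0:
        return 0.0  # no pixel passed the cutoff
    return float((weights * np.abs(pred_depth - target_depth)).sum() / weight_sum)
```

Normalizing by the sum of weights keeps the loss scale comparable between batches with very different rejection rates.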
## Expected Results

### Training Quality

**With Oracle Ensemble:**

- Trains only on pixels where multiple oracles agree
- Rejects noisy/incorrect DA3 predictions
- Provides a higher-quality supervision signal
- Is expected to improve generalization

**Typical Rejection Rates:**

- 20-40% of pixels rejected (oracles disagree)
- 5-15% of frames rejected (poor pose agreement)
- Higher rejection in challenging scenes (low texture, motion blur)

### Performance Impact

**Processing Time:**

- Oracle validation: +10-20% overhead
- Training: faster convergence expected from cleaner supervision
- Overall: net positive when the quality gain outweighs the validation overhead
## Configuration

### Thresholds

```python
ensemble = OracleEnsemble(
    # Pose agreement
    pose_rotation_threshold=2.0,      # degrees - stricter = more rejections
    pose_translation_threshold=0.05,  # meters (5cm)
    # Depth agreement
    depth_relative_threshold=0.1,     # 10% relative error
    depth_absolute_threshold=0.1,     # 10cm absolute error
    # Geometric consistency
    reprojection_error_threshold=2.0,  # pixels
    # IMU consistency
    imu_velocity_threshold=0.5,          # m/s
    imu_angular_velocity_threshold=0.1,  # rad/s
    # Minimum agreement
    min_agreement_ratio=0.7,  # Require 70% of oracles to agree
)
```
### Oracle Weights

Customize trust levels:

```python
ensemble = OracleEnsemble(
    oracle_weights={
        'arkit_pose': 0.8,              # High trust when tracking is good
        'ba_pose': 0.9,                 # Highest trust
        'lidar_depth': 0.95,            # Very high trust (direct measurement)
        'imu': 0.7,                     # Medium trust
        'geometric_consistency': 0.85,  # High trust
    }
)
```
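A quick worked example shows what these weights imply when combined with `min_agreement_ratio=0.7`. It assumes all five oracles vote on the pixel and that the score is the weighted sum of votes divided by the total weight. The `score` helper is hypothetical.

```python
weights = {
    'arkit_pose': 0.8, 'ba_pose': 0.9, 'lidar_depth': 0.95,
    'imu': 0.7, 'geometric_consistency': 0.85,
}

def score(disagreeing):
    """Weighted agreement when the named oracles vote 0 and the rest vote 1."""
    total = sum(weights.values())  # 4.2
    agreeing = sum(w for name, w in weights.items() if name not in disagreeing)
    return agreeing / total

one_dissent = score({'lidar_depth'})        # 3.25 / 4.2, about 0.774
two_dissent = score({'arkit_pose', 'imu'})  # 2.70 / 4.2, about 0.643
```

Under these assumptions a pixel survives any single dissenting oracle (even the highest-weighted one, LiDAR), but is rejected as soon as any two oracles disagree, since even the two lowest-weighted dissenters pull the score below 0.7.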
## Advanced Usage

### Per-Oracle Analysis

```python
results = ensemble.validate_da3_predictions(...)

# Individual oracle votes
oracle_votes = results['oracle_votes']
arkit_agreement = oracle_votes['arkit_pose']   # (N, 1, 1)
ba_agreement = oracle_votes['ba_pose']         # (N, 1, 1)
lidar_agreement = oracle_votes['lidar_depth']  # (N, H, W)

# Error metrics
rotation_errors = results['rotation_errors']        # (N, 2) [arkit, ba]
translation_errors = results['translation_errors']  # (N, 2)
depth_relative_errors = results['relative_errors']  # (N, H, W)
```
### Adaptive Thresholds

Adjust thresholds based on scene difficulty:

```python
# Easy scene (good tracking, high texture)
ensemble_easy = OracleEnsemble(
    pose_rotation_threshold=1.0,  # Stricter
    min_agreement_ratio=0.8,      # Require more agreement
)

# Hard scene (poor tracking, low texture)
ensemble_hard = OracleEnsemble(
    pose_rotation_threshold=3.0,  # More lenient
    min_agreement_ratio=0.6,      # Require less agreement
)
```
## Best Practices

### 1. Start Conservative

Begin with strict thresholds, then relax them if too much data is rejected:

```python
min_agreement_ratio = 0.8      # Start high
pose_rotation_threshold = 1.0  # Stricter
```

### 2. Monitor Rejection Rates

Track how many pixels/frames are rejected:

```python
import logging
logger = logging.getLogger(__name__)
rejection_rate = rejection_mask.sum() / rejection_mask.numel()
logger.info(f"Rejection rate: {rejection_rate:.1%}")
```
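A single global rate can hide frames that lose almost all of their pixels. The sketch below breaks the rate down per frame. `rejection_report` and its `frame_alert_ratio` cutoff are hypothetical, and the mask is assumed to be a NumPy boolean array of shape (N, H, W). A tensor from another framework would need converting first.

```python
import numpy as np

def rejection_report(rejection_mask, frame_alert_ratio=0.5):
    """Summarize an (N, H, W) boolean rejection mask.

    frame_alert_ratio is a hypothetical cutoff for flagging frames that lose
    more than this fraction of their pixels.
    """
    per_frame = rejection_mask.reshape(rejection_mask.shape[0], -1).mean(axis=1)
    return {
        'overall_rate': float(rejection_mask.mean()),
        'per_frame_rate': per_frame,
        'flagged_frames': np.flatnonzero(per_frame > frame_alert_ratio).tolist(),
    }
```

Frames that show up in `flagged_frames` repeatedly are good candidates for inspecting tracking quality or for per-frame rejection.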
### 3. Use All Available Oracles

Don't skip oracles. More independent sources mean better validation:

```python
# Always include all available sources
results = ensemble.validate_da3_predictions(
    da3_poses=...,
    da3_depth=...,
    arkit_poses=arkit_poses,  # Include if available
    ba_poses=ba_poses,        # Include if available
    lidar_depth=lidar_depth,  # Include if available
)
```
### 4. Visualize Confidence Masks

```python
import matplotlib.pyplot as plt

# Visualize confidence for the first frame
plt.imshow(confidence_mask[0], cmap='hot')
plt.colorbar(label='Confidence')
plt.title('Oracle Agreement Confidence')
plt.show()
```
## Why This Works

**Multiple Independent Sources:**

- Each oracle has different failure modes
- Agreement across multiple sources = high confidence
- Disagreement = likely error in DA3 prediction

**Confidence-Weighted Training:**

- Train more on high-confidence pixels
- Reject low-confidence pixels
- Better supervision signal = better model

**Robust to Oracle Failures:**

- If one oracle fails, others can still validate
- Weighted voting reduces impact of single failures
- Minimum agreement ratio ensures consensus
## Statistics

After processing, you'll see a summary like this (example values):

```
Oracle Ensemble Validation:
- ARKit pose agreement: 85.2% of frames
- BA pose agreement: 92.1% of frames
- LiDAR depth agreement: 78.5% of pixels (where available)
- Geometric consistency: 91.3% of pixels
- Overall confidence: 0.87 (mean)
- Rejection rate: 23.1% of pixels
```

This system enables **high-quality training** by using only pixels where multiple independent sources agree.