# Oracle Ensemble: Multi-Source Validation and Rejection

## 🎯 Overview

The Oracle Ensemble system uses **all available oracle sources** (ARKit poses, bundle-adjustment (BA) poses, LiDAR depth, IMU data) to create high-confidence training masks by **rejecting DA3 predictions where oracles disagree**. Training is then restricted to the pixels/points where multiple independent sources agree, which yields higher-quality supervision.

## 🔍 Core Concept

Instead of choosing one oracle source, we use **all of them together**:

```
For each DA3 prediction:
├─ Compare with ARKit poses (VIO)
├─ Compare with BA poses (multi-view geometry)
├─ Compare with LiDAR depth (direct ToF)
├─ Check geometric consistency (reprojection error)
└─ Check IMU consistency (motion matches sensors)

→ Create confidence mask: Only train on pixels where oracles agree
```

## 📊 Oracle Sources and Accuracy

### 1. ARKit Poses (VIO)

- **Accuracy**: <1° rotation, <5cm translation (when tracking is good)
- **Coverage**: Frame-level (all pixels in frame)
- **Trust Level**: High (0.8) when tracking is "normal"
- **Limitations**: Drift over long sequences, poor when tracking fails

### 2. BA Poses (Multi-View Geometry)

- **Accuracy**: <0.5° rotation, <2cm translation (after optimization)
- **Coverage**: Frame-level (all pixels in frame)
- **Trust Level**: Highest among the pose oracles (0.9) - most robust
- **Limitations**: Requires good feature matching, slower computation

### 3. LiDAR Depth (Time-of-Flight)

- **Accuracy**: ±1-2cm absolute error
- **Coverage**: Pixel-level (sparse, ~10-30% of pixels)
- **Trust Level**: Very High (0.95) - direct measurement
- **Limitations**: Sparse coverage, only available on LiDAR-enabled devices

### 4. Geometric Consistency

- **Accuracy**: <2 pixels reprojection error
- **Coverage**: Pixel-level (all pixels)
- **Trust Level**: High (0.85) - enforces epipolar geometry
- **Limitations**: Requires good depth predictions

### 5. IMU Data (Motion Sensors)

- **Accuracy**: Velocity ±0.5 m/s, angular velocity ±0.1 rad/s
- **Coverage**: Frame-level (motion between frames)
- **Trust Level**: Medium (0.7) - indirect but useful
- **Limitations**: Requires integration, may not be in ARKit metadata

## 🎚️ Confidence Mask Generation

### Agreement Scoring

For each pixel/frame, compute a trust-weighted agreement score:

```python
# oracle_votes[name]: 1 if the oracle agrees, 0 if it disagrees
# oracle_weights[name]: trust level of each oracle (0.7-0.95)
weighted_sum = sum(oracle_weights[name] * vote for name, vote in oracle_votes.items())
total_weight = sum(oracle_weights[name] for name in oracle_votes)
agreement_score = weighted_sum / total_weight
```

### Rejection Strategy

**Per-Pixel Rejection:**

- Reject pixels where `agreement_score < min_agreement_ratio` (default: 0.7)
- Only train on pixels where the weighted agreement is at least 70%

**Per-Frame Rejection:**

- Reject entire frames if pose agreement is too low
- Useful for sequences with tracking failures

### Confidence Mask

```python
confidence_mask = {
    'pose_confidence': ...,   # (N,) frame-level scores in [0.0, 1.0]
    'depth_confidence': ...,  # (N, H, W) pixel-level scores in [0.0, 1.0]
    'rejection_mask': ...,    # (N, H, W) bool - pixels to reject
    'agreement_scores': ...,  # (N, H, W) fraction of oracles that agree
}
```

## 🚀 Usage

### Basic Usage

```python
from ylff.utils.oracle_ensemble import OracleEnsemble

# Initialize ensemble
ensemble = OracleEnsemble(
    pose_rotation_threshold=2.0,      # degrees
    pose_translation_threshold=0.05,  # meters
    depth_relative_threshold=0.1,     # 10% relative error
    min_agreement_ratio=0.7,          # Require 70% agreement
)

# Validate DA3 predictions
results = ensemble.validate_da3_predictions(
    da3_poses=da3_poses,          # (N, 3, 4) w2c
    da3_depth=da3_depth,          # (N, H, W)
    intrinsics=intrinsics,        # (N, 3, 3)
    arkit_poses=arkit_poses_c2w,  # (N, 4, 4) c2w
    ba_poses=ba_poses_w2c,        # (N, 3, 4) w2c
    lidar_depth=lidar_depth,      # (N, H, W) optional
)

# Get confidence masks
confidence_mask = results['confidence_mask']  # (N, H, W)
rejection_mask = results['rejection_mask']    # (N, H, W) bool
```

### Training with Oracle Ensemble

```python
from ylff.utils.oracle_losses import oracle_ensemble_loss

# Compute loss with confidence weighting
loss_dict = oracle_ensemble_loss(
    da3_output={
        'poses': predicted_poses,  # (N, 3, 4)
        'depth': predicted_depth,  # (N, H, W)
    },
    oracle_targets={
        'poses': target_poses,  # (N, 3, 4)
        'depth': target_depth,  # (N, H, W)
    },
    confidence_masks={
        'pose_confidence': frame_confidence,   # (N,)
        'depth_confidence': pixel_confidence,  # (N, H, W)
    },
    min_confidence=0.7,  # Only train on high-confidence pixels
)

total_loss = loss_dict['total_loss']
```

## 📈 Expected Results

### Training Quality

**With Oracle Ensemble:**

- ✅ Only trains on pixels where multiple oracles agree
- ✅ Rejects noisy/incorrect DA3 predictions
- ✅ Higher-quality supervision signal
- ✅ Better generalization

**Typical Rejection Rates:**

- 20-40% of pixels rejected (oracles disagree)
- 5-15% of frames rejected (poor pose agreement)
- Higher rejection in challenging scenes (low texture, motion blur)

### Performance Impact

**Processing Time:**

- Oracle validation: +10-20% overhead
- Training: Faster convergence (better supervision)
- Overall: Net positive (the gain in supervision quality outweighs the modest overhead)

## ⚙️ Configuration

### Thresholds

```python
ensemble = OracleEnsemble(
    # Pose agreement
    pose_rotation_threshold=2.0,      # degrees - stricter (lower) = more rejections
    pose_translation_threshold=0.05,  # meters (5cm)

    # Depth agreement
    depth_relative_threshold=0.1,  # 10% relative error
    depth_absolute_threshold=0.1,  # 10cm absolute error

    # Geometric consistency
    reprojection_error_threshold=2.0,  # pixels

    # IMU consistency
    imu_velocity_threshold=0.5,          # m/s
    imu_angular_velocity_threshold=0.1,  # rad/s

    # Minimum agreement
    min_agreement_ratio=0.7,  # Require 70% of oracles to agree
)
```

### Oracle Weights

Customize trust levels:

```python
ensemble = OracleEnsemble(
    oracle_weights={
        'arkit_pose': 0.8,              # High trust when tracking is good
        'ba_pose': 0.9,                 # Highest trust among the pose oracles
        'lidar_depth': 0.95,            # Very high trust (direct measurement)
        'imu': 0.7,                     # Medium trust
        'geometric_consistency': 0.85,  # High trust
    }
)
```

## 🔬 Advanced Usage

### Per-Oracle Analysis

```python
results = ensemble.validate_da3_predictions(...)

# Individual oracle votes
oracle_votes = results['oracle_votes']
arkit_agreement = oracle_votes['arkit_pose']   # (N, 1, 1)
ba_agreement = oracle_votes['ba_pose']         # (N, 1, 1)
lidar_agreement = oracle_votes['lidar_depth']  # (N, H, W)

# Error metrics
rotation_errors = results['rotation_errors']        # (N, 2) [arkit, ba]
translation_errors = results['translation_errors']  # (N, 2)
depth_relative_errors = results['relative_errors']  # (N, H, W)
```

### Adaptive Thresholds

Adjust thresholds based on scene difficulty:

```python
# Easy scene (good tracking, high texture)
ensemble_easy = OracleEnsemble(
    pose_rotation_threshold=1.0,  # Stricter
    min_agreement_ratio=0.8,      # Require more agreement
)

# Hard scene (poor tracking, low texture)
ensemble_hard = OracleEnsemble(
    pose_rotation_threshold=3.0,  # More lenient
    min_agreement_ratio=0.6,      # Require less agreement
)
```

## 💡 Best Practices

### 1. Start Conservative

Begin with strict thresholds, then relax if needed:

```python
min_agreement_ratio = 0.8      # Start high
pose_rotation_threshold = 1.0  # Stricter
```

### 2. Monitor Rejection Rates

Track how many pixels/frames are rejected:

```python
import logging

logger = logging.getLogger(__name__)
rejection_rate = rejection_mask.sum() / rejection_mask.numel()
logger.info(f"Rejection rate: {rejection_rate:.1%}")
```

### 3. Use All Available Oracles

Don't skip oracles: more independent sources give better validation.

```python
# Always include all available sources
results = ensemble.validate_da3_predictions(
    da3_poses=...,
    da3_depth=...,
    arkit_poses=arkit_poses,  # Include if available
    ba_poses=ba_poses,        # Include if available
    lidar_depth=lidar_depth,  # Include if available
)
```

### 4. Visualize Confidence Masks

```python
import matplotlib.pyplot as plt

# Visualize confidence for the first frame
plt.imshow(confidence_mask[0], cmap='hot')
plt.colorbar(label='Confidence')
plt.title('Oracle Agreement Confidence')
plt.show()
```

## 🎓 Why This Works

**Multiple Independent Sources:**

- Each oracle has different failure modes
- Agreement across multiple sources = high confidence
- Disagreement = likely error in DA3 prediction

**Confidence-Weighted Training:**

- Train more on high-confidence pixels
- Reject low-confidence pixels
- Better supervision signal = better model

**Robust to Oracle Failures:**

- If one oracle fails, others can still validate
- Weighted voting reduces impact of single failures
- Minimum agreement ratio ensures consensus

## 📊 Statistics

After processing, you'll see:

```
Oracle Ensemble Validation:
  - ARKit pose agreement: 85.2% of frames
  - BA pose agreement: 92.1% of frames
  - LiDAR depth agreement: 78.5% of pixels (where available)
  - Geometric consistency: 91.3% of pixels
  - Overall confidence: 0.87 (mean)
  - Rejection rate: 23.1% of pixels
```

This system enables **high-quality training** by using only pixels where multiple independent sources agree! 🚀
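
## 🧪 Worked Example: Agreement Scoring

The weighted vote described under "Agreement Scoring" and the per-pixel rule under "Rejection Strategy" can be illustrated with a small NumPy sketch. This is not the `OracleEnsemble` implementation: the function name `agreement_and_rejection`, the `valid` argument, and the choice to leave an oracle out of the vote where it has no measurement (for example, sparse LiDAR) are assumptions made for illustration. The weights are the trust levels listed in this document.

```python
import numpy as np

# Trust levels as listed in this document; the dict itself is illustrative.
ORACLE_WEIGHTS = {
    "arkit_pose": 0.8,
    "ba_pose": 0.9,
    "lidar_depth": 0.95,
    "geometric_consistency": 0.85,
    "imu": 0.7,
}


def agreement_and_rejection(votes, weights=ORACLE_WEIGHTS,
                            min_agreement_ratio=0.7, valid=None):
    """Hypothetical helper: combine 0/1 oracle votes into scores and a rejection mask.

    votes: dict name -> array of 0/1 votes, broadcastable to (N, H, W);
           frame-level oracles use shape (N, 1, 1).
    valid: optional dict name -> bool array marking where that oracle has a
           measurement. Oracles without an entry count everywhere (assumption).
    """
    valid = valid or {}
    shape = np.broadcast_shapes(*(np.shape(v) for v in votes.values()))
    weighted_votes = np.zeros(shape)
    total_weight = np.zeros(shape)
    for name, vote in votes.items():
        w = weights[name]
        mask = np.broadcast_to(np.asarray(valid.get(name, True), dtype=float), shape)
        vote = np.broadcast_to(np.asarray(vote, dtype=float), shape)
        weighted_votes += w * mask * vote
        total_weight += w * mask
    # Pixels with no valid oracle keep a score of 0 and are therefore rejected.
    scores = np.divide(weighted_votes, total_weight,
                       out=np.zeros(shape), where=total_weight > 0)
    return scores, scores < min_agreement_ratio


# One 2x2 frame: both pose oracles agree; LiDAR covers only the top row
# and disagrees at pixel (0, 1).
votes = {
    "arkit_pose": np.ones((1, 1, 1)),
    "ba_pose": np.ones((1, 1, 1)),
    "lidar_depth": np.array([[[1, 0], [1, 0]]]),
}
valid = {"lidar_depth": np.array([[[True, True], [False, False]]])}
scores, rejected = agreement_and_rejection(votes, valid=valid)
print(scores)
print(rejected)
```

With these weights, a single LiDAR disagreement against two agreeing pose oracles gives 1.7 / 2.65 ≈ 0.64, which falls below the default `min_agreement_ratio` of 0.7, so that pixel is rejected. Where LiDAR has no measurement, the sketch scores the pixel on the pose oracles alone.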