# ARKit Integration Guide

## Overview

The ARKit integration allows us to:

1. Use ARKit poses as **ground truth** for evaluating DA3 and BA
2. Compare DA3 poses vs ARKit poses (VIO-based)
3. Compare BA poses vs ARKit poses
4. Use ARKit intrinsics for more accurate BA
## ARKit Data Structure

### Metadata JSON Format

```json
{
  "frames": [
    {
      "camera": {
        "viewMatrix": [[...]],                // 4x4 camera-to-world transform
        "intrinsics": [[...]],                // 3x3 camera intrinsics
        "trackingState": "limited",           // "normal", "limited", "notAvailable"
        "trackingStateReason": "initializing" // "normal", "initializing", "relocalizing"
      },
      "featurePointCount": 0,
      "worldMappingStatus": "notAvailable",
      "timestamp": 1764913298.01684,
      "frameIndex": 0
    }
  ]
}
```
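### Key Fields

- **viewMatrix**: 4x4 camera-to-world transformation, as stored in the metadata. Note that in ARKit's own API `ARCamera.transform` is the camera-to-world matrix and `viewMatrix(for:)` returns world-to-camera, so check which one your capture app exported under this name.
- **intrinsics**: 3x3 camera intrinsics matrix holding the focal lengths (fx, fy) and principal point (cx, cy)
- **trackingState**: Overall tracking quality (`normal`, `limited`, or `notAvailable`)
- **trackingStateReason**: Why tracking is in its current state
- **featurePointCount**: Number of tracked feature points (may be 0 when the exporter does not record it)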
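
The fields above can be read with a few lines of Python. The sketch below is illustrative only: `load_arkit_frames` is a hypothetical helper, not part of `ylff`, and it assumes the matrices are stored as nested lists exactly as in the format shown earlier.

```python
import numpy as np

def load_arkit_frames(metadata: dict, good_tracking_only: bool = False):
    """Collect poses, intrinsics, and frame indices from parsed ARKit metadata.

    Hypothetical helper for illustration; the real ARKitProcessor may differ.
    """
    poses, intrinsics, indices = [], [], []
    for frame in metadata["frames"]:
        cam = frame["camera"]
        # Optionally drop frames whose tracking state is not "normal".
        if good_tracking_only and cam.get("trackingState") != "normal":
            continue
        poses.append(cam["viewMatrix"])
        intrinsics.append(cam["intrinsics"])
        indices.append(frame["frameIndex"])
    poses = np.asarray(poses, dtype=np.float64).reshape(-1, 4, 4)
    intrinsics = np.asarray(intrinsics, dtype=np.float64).reshape(-1, 3, 3)
    return poses, intrinsics, indices

# Synthetic metadata with one "limited" and one "normal" frame.
# For a real capture, use: metadata = json.load(open("arkit/metadata.json"))
metadata = {
    "frames": [
        {"camera": {"viewMatrix": np.eye(4).tolist(),
                    "intrinsics": np.eye(3).tolist(),
                    "trackingState": state},
         "frameIndex": i}
        for i, state in enumerate(["limited", "normal"])
    ]
}
poses, K, idx = load_arkit_frames(metadata, good_tracking_only=True)
```

With `good_tracking_only=True`, only the second synthetic frame survives; with the default `False`, both frames are kept.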
## Usage

### Basic Processing

```python
from pathlib import Path

from ylff.arkit_processor import ARKitProcessor

# Initialize processor
processor = ARKitProcessor(
    video_path=Path("arkit/video.MOV"),
    metadata_path=Path("arkit/metadata.json"),
)

# Process for BA validation
arkit_data = processor.process_for_ba_validation(
    output_dir=Path("output"),
    max_frames=50,
    frame_interval=1,
    use_good_tracking_only=False,  # Use all frames if tracking is limited
)

# Extract data
image_paths = arkit_data['image_paths']
arkit_poses_c2w = arkit_data['arkit_poses_c2w']    # 4x4 camera-to-world
arkit_poses_w2c = arkit_data['arkit_poses_w2c']    # 3x4 world-to-camera (DA3 format)
arkit_intrinsics = arkit_data['arkit_intrinsics']  # 3x3
```
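
Before passing these arrays on, it can help to confirm they have the shapes documented above. The following is a minimal sketch: `check_arkit_data` is a hypothetical helper, and it assumes the poses and intrinsics are stacked per frame as `(N, 4, 4)`, `(N, 3, 4)`, and `(..., 3, 3)` arrays.

```python
import numpy as np

def check_arkit_data(arkit_data: dict) -> int:
    """Validate the documented shapes and return the number of frames."""
    n = len(arkit_data["image_paths"])
    c2w = np.asarray(arkit_data["arkit_poses_c2w"])
    w2c = np.asarray(arkit_data["arkit_poses_w2c"])
    K = np.asarray(arkit_data["arkit_intrinsics"])
    if c2w.shape != (n, 4, 4):
        raise ValueError(f"expected c2w shape {(n, 4, 4)}, got {c2w.shape}")
    if w2c.shape != (n, 3, 4):
        raise ValueError(f"expected w2c shape {(n, 3, 4)}, got {w2c.shape}")
    if K.shape[-2:] != (3, 3):
        raise ValueError(f"expected 3x3 intrinsics, got {K.shape}")
    # A homogeneous transform has bottom row [0, 0, 0, 1]; a mismatch often
    # means the matrix was exported transposed (column-major).
    if not np.allclose(c2w[:, 3, :], [0.0, 0.0, 0.0, 1.0]):
        raise ValueError("c2w bottom row is not [0, 0, 0, 1]")
    return n

# Synthetic stand-in for the dictionary returned by the processor.
fake_data = {
    "image_paths": ["frame_000.png", "frame_001.png"],
    "arkit_poses_c2w": np.tile(np.eye(4), (2, 1, 1)),
    "arkit_poses_w2c": np.tile(np.eye(4)[:3], (2, 1, 1)),
    "arkit_intrinsics": np.tile(np.eye(3), (2, 1, 1)),
}
num_frames = check_arkit_data(fake_data)
```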
### Running BA Validation

```bash
python scripts/run_arkit_ba_validation.py \
    --arkit-dir assets/examples/ARKit \
    --output-dir data/arkit_ba_validation \
    --max-frames 30 \
    --frame-interval 1 \
    --device cpu
```

This script will:

1. Extract frames from ARKit video
2. Parse ARKit poses and intrinsics
3. Run DA3 inference
4. Compare DA3 vs ARKit (ground truth)
5. Run BA validation
6. Compare BA vs ARKit (ground truth)
7. Compare DA3 vs BA
8. Save results to JSON
## Coordinate System Conversion

ARKit uses the **camera-to-world** (c2w) convention:

- `viewMatrix`: 4x4 c2w transform
- Right-handed coordinate system
- Y-up convention

DA3 uses the **world-to-camera** (w2c) convention:

- `extrinsics`: 3x4 w2c transform
- OpenCV convention (typically)

The `ARKitProcessor` converts between the two:

```python
w2c_poses = processor.convert_arkit_to_w2c(c2w_poses)  # (N, 3, 4)
```
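
To make the conversion concrete, here is a minimal NumPy sketch of what such a function might do. It is not the actual `convert_arkit_to_w2c` implementation: `c2w_to_w2c` and its `to_opencv` flag are hypothetical, and whether the real processor also flips the camera axes is an assumption. The ARKit camera looks along its local -Z axis with +Y up, while OpenCV cameras look along +Z with +Y down, so the sketch negates the camera's Y and Z axes before inverting.

```python
import numpy as np

# Maps OpenCV camera coordinates (x right, y down, z forward)
# to ARKit camera coordinates (x right, y up, z backward).
FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])

def c2w_to_w2c(c2w, to_opencv: bool = True) -> np.ndarray:
    """Convert (N, 4, 4) camera-to-world poses to (N, 3, 4) world-to-camera."""
    c2w = np.asarray(c2w, dtype=np.float64)
    if to_opencv:
        # Re-express the camera axes in the OpenCV convention.
        c2w = c2w @ FLIP_YZ
    R = c2w[:, :3, :3]                   # (N, 3, 3) rotation
    t = c2w[:, :3, 3:]                   # (N, 3, 1) camera center in world
    R_inv = np.transpose(R, (0, 2, 1))   # inverse of a rotation is its transpose
    t_inv = -R_inv @ t
    return np.concatenate([R_inv, t_inv], axis=2)

# One camera at world position (1, 2, 3) with identity orientation.
pose = np.eye(4)
pose[:3, 3] = [1.0, 2.0, 3.0]
w2c = c2w_to_w2c(pose[None])

# A world point one unit in front of the ARKit camera (along its -Z axis)
# should land at z = +1 in OpenCV camera coordinates.
point_world = np.array([1.0, 2.0, 2.0, 1.0])
point_cam = w2c[0] @ point_world
```

Using the closed-form inverse (transpose the rotation, rotate and negate the translation) avoids a general 4x4 matrix inversion and is exact for rigid transforms.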
## Evaluation Metrics

The validation script computes:

1. **DA3 vs ARKit**:
   - Rotation error (degrees)
   - Translation error
   - Shows how well DA3 matches ARKit VIO
2. **BA vs ARKit**:
   - Rotation error (degrees)
   - Translation error
   - Shows how well BA matches ARKit VIO
3. **DA3 vs BA**:
   - Rotation error (degrees)
   - Shows agreement between DA3 and BA
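
As a rough illustration of these metrics, the sketch below computes a per-frame geodesic rotation error and a Euclidean translation error. The function names are hypothetical, and the sketch assumes both pose sets are already expressed in a common frame and scale; the validation script may align the trajectories or define the errors differently.

```python
import numpy as np

def rotation_error_deg(R_a: np.ndarray, R_b: np.ndarray) -> np.ndarray:
    """Angle in degrees of the relative rotation R_a^T R_b, per frame."""
    R_rel = np.transpose(R_a, (0, 2, 1)) @ R_b
    # For a rotation matrix, trace = 1 + 2 * cos(angle).
    cos_angle = (np.trace(R_rel, axis1=1, axis2=2) - 1.0) / 2.0
    # Clip to guard against values slightly outside [-1, 1] from round-off.
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

def translation_error(t_a: np.ndarray, t_b: np.ndarray) -> np.ndarray:
    """Euclidean distance between translations, per frame."""
    return np.linalg.norm(t_a - t_b, axis=1)

# Example: identity vs a 90-degree rotation about Z, and a 3-4-5 offset.
R_identity = np.eye(3)[None]
R_z90 = np.array([[[0.0, -1.0, 0.0],
                   [1.0,  0.0, 0.0],
                   [0.0,  0.0, 1.0]]])
rot_err = rotation_error_deg(R_identity, R_z90)
trans_err = translation_error(np.array([[3.0, 4.0, 0.0]]), np.zeros((1, 3)))
```

Mean and max statistics like those in the example output then follow from `rot_err.mean()` and `rot_err.max()`.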
## Notes

- ARKit poses are VIO-based (Visual-Inertial Odometry)
- They may drift over long sequences
- For short sequences (< 1 minute), drift is usually small, so ARKit poses are accurate enough to serve as a reference
- Feature point counts may be 0 in metadata (not always included)
- Tracking state "limited" is acceptable for short sequences
## Example Output

```
=== Comparing DA3 vs ARKit (Ground Truth) ===
DA3 vs ARKit:
Mean rotation error: 2.45°
Max rotation error: 8.32°
Mean translation error: 0.12
=== Comparing BA vs ARKit (Ground Truth) ===
BA vs ARKit:
Mean rotation error: 1.23°
Max rotation error: 3.45°
Mean translation error: 0.08
=== Comparing DA3 vs BA ===
DA3 vs BA:
Mean rotation error: 1.89°
Max rotation error: 5.67°
```

This shows:

- DA3 differs from ARKit by ~2.5° on average, with a worst case of ~8.3° (good)
- BA differs from ARKit by ~1.2° on average, with a worst case of ~3.5° (better, as expected)
- DA3 and BA agree to ~1.9° on average (reasonable)