ARKit Integration Guide

Overview

The ARKit integration allows us to:

  1. Use ARKit poses as ground truth for evaluating DA3 and BA
  2. Compare DA3 poses vs ARKit poses (VIO-based)
  3. Compare BA poses vs ARKit poses
  4. Use ARKit intrinsics for more accurate BA

ARKit Data Structure

Metadata JSON Format

```json
{
  "frames": [
    {
      "camera": {
        "viewMatrix": [[...]],      // 4x4 camera-to-world transform
        "intrinsics": [[...]],      // 3x3 camera intrinsics
        "trackingState": "limited", // "normal", "limited", "notAvailable"
        "trackingStateReason": "initializing" // "normal", "initializing", "relocalizing"
      },
      "featurePointCount": 0,
      "worldMappingStatus": "notAvailable",
      "timestamp": 1764913298.01684,
      "frameIndex": 0
    }
  ]
}
```

Key Fields

  • viewMatrix: 4x4 camera-to-world transformation (ARKit convention)
  • intrinsics: 3x3 camera intrinsics matrix (encodes fx, fy, cx, cy)
  • trackingState: Overall tracking quality
  • trackingStateReason: Why tracking is in current state
  • featurePointCount: Number of tracked feature points (may be 0 in metadata)
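The metadata file can be loaded with the standard library alone. A minimal sketch of parsing it into arrays (the `load_arkit_metadata` helper and the filtering snippet are illustrative, not part of `ARKitProcessor`):

```python
import json
from pathlib import Path

import numpy as np

def load_arkit_metadata(metadata_path: Path):
    """Parse ARKit metadata JSON into per-frame pose/intrinsics arrays."""
    frames = json.loads(metadata_path.read_text())["frames"]
    poses_c2w = np.array([f["camera"]["viewMatrix"] for f in frames])   # (N, 4, 4)
    intrinsics = np.array([f["camera"]["intrinsics"] for f in frames])  # (N, 3, 3)
    states = [f["camera"]["trackingState"] for f in frames]
    return poses_c2w, intrinsics, states

# Keeping only well-tracked frames (what use_good_tracking_only=True implies):
# good = [i for i, s in enumerate(states) if s == "normal"]
```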

Usage

Basic Processing

```python
from ylff.arkit_processor import ARKitProcessor
from pathlib import Path

# Initialize processor
processor = ARKitProcessor(
    video_path=Path("arkit/video.MOV"),
    metadata_path=Path("arkit/metadata.json")
)

# Process for BA validation
arkit_data = processor.process_for_ba_validation(
    output_dir=Path("output"),
    max_frames=50,
    frame_interval=1,
    use_good_tracking_only=False,  # Use all frames if tracking is limited
)

# Extract data
image_paths = arkit_data['image_paths']
arkit_poses_c2w = arkit_data['arkit_poses_c2w']  # 4x4 camera-to-world
arkit_poses_w2c = arkit_data['arkit_poses_w2c']  # 3x4 world-to-camera (DA3 format)
arkit_intrinsics = arkit_data['arkit_intrinsics']  # 3x3
```

Running BA Validation

```bash
python scripts/run_arkit_ba_validation.py \
    --arkit-dir assets/examples/ARKit \
    --output-dir data/arkit_ba_validation \
    --max-frames 30 \
    --frame-interval 1 \
    --device cpu
```

This script will:

  1. Extract frames from ARKit video
  2. Parse ARKit poses and intrinsics
  3. Run DA3 inference
  4. Compare DA3 vs ARKit (ground truth)
  5. Run BA validation
  6. Compare BA vs ARKit (ground truth)
  7. Compare DA3 vs BA
  8. Save results to JSON
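Frame selection from `--max-frames` and `--frame-interval` amounts to stride-then-truncate sampling. A sketch (the helper name is illustrative, not the script's actual internals):

```python
def select_frame_indices(total_frames: int, max_frames: int, frame_interval: int) -> list[int]:
    """Take every frame_interval-th frame index, capped at max_frames."""
    return list(range(0, total_frames, frame_interval))[:max_frames]

# e.g. a 300-frame video with --frame-interval 1 --max-frames 30
# yields frames 0..29
```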

Coordinate System Conversion

ARKit uses camera-to-world (c2w) convention:

  • viewMatrix: 4x4 c2w transform
  • Right-handed coordinate system
  • Y-up convention

DA3 uses world-to-camera (w2c) convention:

  • extrinsics: 3x4 w2c transform
  • OpenCV convention (typically)

The ARKitProcessor automatically converts:

```python
w2c_poses = processor.convert_arkit_to_w2c(c2w_poses)  # (N, 3, 4)
```
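The conversion itself is just a rigid-transform inverse: R_w2c = R_c2w^T and t_w2c = -R_c2w^T t_c2w. A NumPy sketch of the operation (any ARKit-to-OpenCV axis flip is omitted here; the function name is illustrative):

```python
import numpy as np

def c2w_to_w2c(c2w_poses: np.ndarray) -> np.ndarray:
    """Invert (N, 4, 4) camera-to-world poses into (N, 3, 4) world-to-camera."""
    R = c2w_poses[:, :3, :3]                       # (N, 3, 3) rotations
    t = c2w_poses[:, :3, 3:]                       # (N, 3, 1) translations
    R_w2c = R.transpose(0, 2, 1)                   # R^T
    t_w2c = -R_w2c @ t                             # -R^T t
    return np.concatenate([R_w2c, t_w2c], axis=2)  # (N, 3, 4)
```

A quick sanity check: applying a converted pose to its own camera center must land on the origin in camera coordinates.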

Evaluation Metrics

The validation script computes:

  1. DA3 vs ARKit:

    • Rotation error (degrees)
    • Translation error
    • Shows how well DA3 matches ARKit VIO
  2. BA vs ARKit:

    • Rotation error (degrees)
    • Translation error
    • Shows how well BA matches ARKit VIO
  3. DA3 vs BA:

    • Rotation error (degrees)
    • Shows agreement between DA3 and BA
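A rotation error between two poses is conventionally the geodesic angle of the relative rotation, theta = arccos((trace(R1^T R2) - 1) / 2). A sketch of per-frame errors (function names are illustrative; translation error here is a plain Euclidean distance with no scale alignment, which may or may not match the script's metric):

```python
import numpy as np

def rotation_error_deg(R1: np.ndarray, R2: np.ndarray) -> float:
    """Geodesic angle (degrees) between two 3x3 rotation matrices."""
    cos = (np.trace(R1.T @ R2) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))

def translation_error(t1: np.ndarray, t2: np.ndarray) -> float:
    """Euclidean distance between two translation vectors."""
    return float(np.linalg.norm(t1 - t2))
```

The `np.clip` guards against `trace` values marginally outside [-1, 1] from floating-point noise, which would otherwise make `arccos` return NaN.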

Notes

  • ARKit poses are VIO-based (Visual-Inertial Odometry)
  • They may drift over long sequences
  • For short sequences (< 1 minute), ARKit poses are very accurate
  • Feature point counts may be 0 in metadata (not always included)
  • Tracking state "limited" is acceptable for short sequences

Example Output

```text
=== Comparing DA3 vs ARKit (Ground Truth) ===
DA3 vs ARKit:
  Mean rotation error: 2.45°
  Max rotation error: 8.32°
  Mean translation error: 0.12

=== Comparing BA vs ARKit (Ground Truth) ===
BA vs ARKit:
  Mean rotation error: 1.23°
  Max rotation error: 3.45°
  Mean translation error: 0.08

=== Comparing DA3 vs BA ===
DA3 vs BA:
  Mean rotation error: 1.89°
  Max rotation error: 5.67°
```

This shows:

  • DA3 is within ~2.5° of ARKit (good)
  • BA is within ~1.2° of ARKit (better, as expected)
  • DA3 and BA agree within ~1.9° (reasonable)