---
title: FoundationPose Inference
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
tags:
  - computer-vision
  - 6d-pose
  - object-detection
  - robotics
  - foundationpose
---

# FoundationPose Inference Server

This Hugging Face Space provides 6D object pose estimation using FoundationPose with GPU support via Docker.

## Features

- **6D Pose Estimation**: Detect object position and orientation in 3D space
- **Reference-based Tracking**: Register objects using multiple reference images
- **REST API**: Easy integration with robotics pipelines
- **ZeroGPU**: On-demand GPU allocation for efficient inference

## Usage

### Web Interface

1. **Initialize Tab**: Upload reference images of your object from different angles (16-20 recommended)
2. **Estimate Tab**: Upload a query image to detect the object's 6D pose

### HTTP API

#### Initialize Object

```bash
curl -X POST https://gpue-foundationpose.hf.space/api/initialize \
  -H "Content-Type: application/json" \
  -d '{
    "object_id": "target_cube",
    "reference_images_b64": ["<base64-jpeg>", ...],
    "camera_intrinsics": "{\"fx\": 500, \"fy\": 500, \"cx\": 320, \"cy\": 240}"
  }'
```
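The same request can be issued from Python. This is a minimal sketch using only the standard library; the helper names and the placeholder intrinsics values are illustrative, not part of the Space's codebase. Note that `camera_intrinsics` is sent as a JSON *string* embedded in the JSON body, matching the curl example above.

```python
import base64
import json
import urllib.request

API_URL = "https://gpue-foundationpose.hf.space"  # from the examples above


def build_initialize_payload(object_id, image_bytes_list, intrinsics):
    """Assemble the /api/initialize body: each image is base64-encoded JPEG
    bytes, and camera_intrinsics is serialized to a JSON string."""
    return {
        "object_id": object_id,
        "reference_images_b64": [
            base64.b64encode(b).decode("ascii") for b in image_bytes_list
        ],
        "camera_intrinsics": json.dumps(intrinsics),
    }


def post_json(path, payload):
    """POST a JSON payload to the Space and return the decoded response."""
    req = urllib.request.Request(
        API_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Example usage (requires reference JPEGs on disk and network access):
# images = [open(p, "rb").read() for p in ("ref_00.jpg", "ref_01.jpg")]
# intrinsics = {"fx": 500, "fy": 500, "cx": 320, "cy": 240}
# result = post_json("/api/initialize",
#                    build_initialize_payload("target_cube", images, intrinsics))
```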

#### Estimate Pose

```bash
curl -X POST https://gpue-foundationpose.hf.space/api/estimate \
  -H "Content-Type: application/json" \
  -d '{
    "object_id": "target_cube",
    "query_image_b64": "<base64-jpeg>",
    "camera_intrinsics": "{\"fx\": 500, \"fy\": 500, \"cx\": 320, \"cy\": 240}"
  }'
```
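An equivalent estimate payload can be assembled in Python (standard library only; the helper name is illustrative):

```python
import base64
import json


def build_estimate_payload(object_id, image_bytes, intrinsics):
    """Mirror the curl example: one base64-encoded JPEG plus the camera
    intrinsics serialized as a JSON string."""
    return {
        "object_id": object_id,
        "query_image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "camera_intrinsics": json.dumps(intrinsics),
    }
```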

### Response Format

```json
{
  "success": true,
  "poses": [
    {
      "object_id": "target_cube",
      "position": {"x": 0.5, "y": 0.3, "z": 0.1},
      "orientation": {"w": 1.0, "x": 0.0, "y": 0.0, "z": 0.0},
      "confidence": 0.95,
      "dimensions": [0.1, 0.1, 0.1]
    }
  ]
}
```
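Downstream robotics code typically wants each pose as a homogeneous transform. A minimal conversion sketch, assuming the quaternion is in (w, x, y, z) order as the field names suggest (the README does not state units for `position`):

```python
import numpy as np


def pose_to_matrix(pose):
    """Convert one entry of the `poses` array into a 4x4 homogeneous
    transform, using the standard unit-quaternion-to-rotation formula."""
    q = pose["orientation"]
    w, x, y, z = q["w"], q["x"], q["y"], q["z"]
    R = np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = [pose["position"][k] for k in ("x", "y", "z")]
    return T
```

With the example response above (identity orientation), this yields a transform whose rotation block is the identity and whose translation is `(0.5, 0.3, 0.1)`.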

## Integration with robot-ml

This Space is designed to work with the robot-ml training pipeline:

1. Capture reference images: `make capture-reference`
2. Configure perception in `observations.yaml`:

   ```yaml
   perception:
     enabled: true
     model: foundation_pose
     api_url: https://gpue-foundationpose.hf.space
   ```

3. Run training with perception: `make train`

## Setup

### Placeholder Mode (Default)

This Space runs in placeholder mode by default: the API works but returns empty pose results. Perfect for testing integrations!

### Enable Real Inference

To enable actual 6D pose estimation:

1. Create a Hugging Face model repository to host the weights (recommended)
2. Set environment variables in this Space's settings:

   ```
   FOUNDATIONPOSE_MODEL_REPO=YOUR_USERNAME/foundationpose-weights
   USE_HF_WEIGHTS=true
   USE_REAL_MODEL=true
   ```

3. Restart the Space: weights will download automatically!

**Why use a model repo?** Faster downloads, version control, sharing across Spaces, and no git-lfs needed.

## Performance

- **Cold Start**: 15-30 seconds (ZeroGPU allocation + model loading)
- **Warm Inference**: 0.5-2 seconds per query
- **Recommended Use**: Batch processing, validation, demos

For real-time training loops (30 Hz), use the local dummy estimator instead.
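The local dummy estimator is part of the robot-ml pipeline, not this Space; as an illustration only, a stand-in that returns results in the same shape as `/api/estimate` might look like this (class and method names are hypothetical):

```python
class DummyPoseEstimator:
    """Local stand-in matching this API's response schema, for 30 Hz loops
    where round-trips to the Space are too slow. Hypothetical sketch, not
    the robot-ml implementation."""

    def __init__(self, object_id="target_cube"):
        self.object_id = object_id

    def estimate(self, query_image=None):
        # Return a fixed identity pose in the same shape as /api/estimate,
        # so downstream code can run unchanged without network access.
        return {
            "success": True,
            "poses": [{
                "object_id": self.object_id,
                "position": {"x": 0.0, "y": 0.0, "z": 0.0},
                "orientation": {"w": 1.0, "x": 0.0, "y": 0.0, "z": 0.0},
                "confidence": 0.0,
                "dimensions": [0.0, 0.0, 0.0],
            }],
        }
```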


## Citation

```bibtex
@inproceedings{wen2024foundationpose,
  title={FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects},
  author={Wen, Bowen and Yang, Wei and Kautz, Jan and Birchfield, Stan},
  booktitle={CVPR},
  year={2024}
}
```