---
title: FoundationPose Inference
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
pinned: false
tags:
  - computer-vision
  - 6d-pose
  - object-detection
  - robotics
  - foundationpose
---

# FoundationPose Inference Server

This Hugging Face Space provides 6D object pose estimation using [FoundationPose](https://github.com/NVlabs/FoundationPose) with GPU support via Docker.

## Features

- **6D Pose Estimation**: Detect object position and orientation in 3D space
- **Reference-based Tracking**: Register objects using multiple reference images
- **REST API**: Easy integration with robotics pipelines
- **ZeroGPU**: On-demand GPU allocation for efficient inference

## Usage

### Web Interface

1. **Initialize Tab**: Upload reference images of your object from different angles (16-20 recommended)
2. **Estimate Tab**: Upload a query image to detect the object's 6D pose

### HTTP API

#### Initialize Object

```bash
curl -X POST https://gpue-foundationpose.hf.space/api/initialize \
  -H "Content-Type: application/json" \
  -d '{
    "object_id": "target_cube",
    "reference_images_b64": ["<base64-jpeg>", ...],
    "camera_intrinsics": "{\"fx\": 500, \"fy\": 500, \"cx\": 320, \"cy\": 240}"
  }'
```
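The same request can be built from Python. This is a hedged sketch, not code from this Space: the helper name is hypothetical, and only the endpoint fields (`object_id`, `reference_images_b64`, `camera_intrinsics`) come from the API example above.

```python
import base64
import json

def build_initialize_payload(object_id, reference_jpegs, intrinsics):
    """Build the JSON body for POST /api/initialize.

    reference_jpegs: list of raw JPEG byte strings, one per reference view.
    intrinsics: dict with fx, fy, cx, cy; serialized to a JSON string to
    match the curl example above.
    """
    return {
        "object_id": object_id,
        "reference_images_b64": [
            base64.b64encode(img).decode("ascii") for img in reference_jpegs
        ],
        "camera_intrinsics": json.dumps(intrinsics),
    }

# Dummy byte strings stand in for real JPEG data here.
payload = build_initialize_payload(
    "target_cube",
    [b"\xff\xd8fake-jpeg-1", b"\xff\xd8fake-jpeg-2"],
    {"fx": 500, "fy": 500, "cx": 320, "cy": 240},
)
body = json.dumps(payload)
```

The resulting payload can be sent with any HTTP client, e.g. `requests.post(url, json=payload)`.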

#### Estimate Pose

```bash
curl -X POST https://gpue-foundationpose.hf.space/api/estimate \
  -H "Content-Type: application/json" \
  -d '{
    "object_id": "target_cube",
    "query_image_b64": "<base64-jpeg>",
    "camera_intrinsics": "{\"fx\": 500, \"fy\": 500, \"cx\": 320, \"cy\": 240}"
  }'
```

#### Response Format

```json
{
  "success": true,
  "poses": [
    {
      "object_id": "target_cube",
      "position": {"x": 0.5, "y": 0.3, "z": 0.1},
      "orientation": {"w": 1.0, "x": 0.0, "y": 0.0, "z": 0.0},
      "confidence": 0.95,
      "dimensions": [0.1, 0.1, 0.1]
    }
  ]
}
```
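The `orientation` field is a unit quaternion (w, x, y, z); most robotics stacks want it as a rotation matrix. A minimal stdlib-only sketch of parsing the response above, with a hypothetical conversion helper:

```python
import json

def quat_to_rotation_matrix(q):
    """Convert a unit quaternion {w, x, y, z} to a 3x3 rotation matrix."""
    w, x, y, z = q["w"], q["x"], q["y"], q["z"]
    return [
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ]

# The example response from above, as returned by /api/estimate.
response = json.loads("""
{
  "success": true,
  "poses": [
    {
      "object_id": "target_cube",
      "position": {"x": 0.5, "y": 0.3, "z": 0.1},
      "orientation": {"w": 1.0, "x": 0.0, "y": 0.0, "z": 0.0},
      "confidence": 0.95,
      "dimensions": [0.1, 0.1, 0.1]
    }
  ]
}
""")

pose = response["poses"][0]
R = quat_to_rotation_matrix(pose["orientation"])    # identity quaternion -> identity matrix
t = [pose["position"][k] for k in ("x", "y", "z")]  # translation in meters
```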

## Integration with robot-ml

This Space is designed to work with the [robot-ml](https://github.com/gpuschel/robot-ml) training pipeline:

1. Capture reference images: `make capture-reference`
2. Configure perception in `observations.yaml`:
   ```yaml
   perception:
     enabled: true
     model: foundation_pose
     api_url: https://gpue-foundationpose.hf.space
   ```
3. Run training with perception: `make train`
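How robot-ml consumes the poses is defined in that repo's `observations.yaml`, but the general idea can be sketched: flatten each returned pose into a fixed-length vector the policy can ingest. Everything here is a hypothetical illustration; only the response fields come from the API above.

```python
def pose_to_observation(pose):
    """Flatten one pose dict into [x, y, z, qw, qx, qy, qz, confidence].

    Hypothetical helper: the actual robot-ml observation layout is
    configured in observations.yaml, not defined here.
    """
    p, q = pose["position"], pose["orientation"]
    return [
        p["x"], p["y"], p["z"],
        q["w"], q["x"], q["y"], q["z"],
        pose["confidence"],
    ]

obs = pose_to_observation({
    "object_id": "target_cube",
    "position": {"x": 0.5, "y": 0.3, "z": 0.1},
    "orientation": {"w": 1.0, "x": 0.0, "y": 0.0, "z": 0.0},
    "confidence": 0.95,
    "dimensions": [0.1, 0.1, 0.1],
})
```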

## Setup

### Placeholder Mode (Default)

This Space runs in **placeholder mode** by default: the API works end to end but returns empty pose results. Perfect for testing integrations!

### Enable Real Inference

To enable actual 6D pose estimation:

1. **Create a Hugging Face model repository** to host the weights (recommended)
   - 📖 Quick guide: [HF_MODEL_SETUP.md](HF_MODEL_SETUP.md)
   - 📖 Detailed guide: [UPLOAD_WEIGHTS.md](UPLOAD_WEIGHTS.md)

2. **Set environment variables** in this Space's settings:
   ```
   FOUNDATIONPOSE_MODEL_REPO=YOUR_USERNAME/foundationpose-weights
   USE_HF_WEIGHTS=true
   USE_REAL_MODEL=true
   ```

3. **Restart the Space** - weights will download automatically!

**Why use a model repo?** Faster downloads, version control, sharing across Spaces, and no git-lfs needed!

## Performance

- **Cold Start**: 15-30 seconds (ZeroGPU allocation + model loading)
- **Warm Inference**: 0.5-2 seconds per query
- **Recommended Use**: Batch processing, validation, demos

For real-time training loops (30 Hz), use the local dummy estimator instead.

## Documentation

- 🚀 [README_SETUP.md](README_SETUP.md) - Start here for setup
- ⚡ [QUICKSTART.md](QUICKSTART.md) - API usage & integration examples
- 📦 [HF_MODEL_SETUP.md](HF_MODEL_SETUP.md) - 5-minute model repo setup
- 📖 [UPLOAD_WEIGHTS.md](UPLOAD_WEIGHTS.md) - Detailed weight upload guide
- 🔧 [DEPLOYMENT.md](DEPLOYMENT.md) - Full deployment options
- 📊 [STATUS.md](STATUS.md) - Complete project status

## Citation

```bibtex
@inproceedings{wen2023foundationpose,
  title={FoundationPose: Unified 6D Pose Estimation and Tracking of Novel Objects},
  author={Wen, Bowen and Yang, Wei and Kautz, Jan and Birchfield, Stan},
  booktitle={CVPR},
  year={2024}
}
```