Georg committed on
Commit bbc3fdc · 1 Parent(s): 0d59dc5

Two-stage Docker build: base image + GPU compilation

Files changed (5)
  1. BUILD.md +171 -0
  2. Dockerfile +7 -62
  3. Dockerfile.base +65 -0
  4. build_base.sh +20 -0
  5. deploy.sh +107 -0
BUILD.md ADDED
@@ -0,0 +1,171 @@
# Two-Stage Docker Build

FoundationPose requires CUDA C++ extensions that must be compiled with a GPU present. To enable local development and avoid build failures, we use a two-stage build process:

## Architecture

```
┌───────────────────────────────────────────────┐
│ Stage 1: Base Image (Local, No GPU)           │
│ - Install system dependencies                 │
│ - Install PyTorch + Python packages           │
│ - Clone FoundationPose repository             │
│ - Patch setup.py for C++17                    │
│ Build locally → Push to DockerHub             │
└───────────────────────────────────────────────┘

┌───────────────────────────────────────────────┐
│ Stage 2: Final Image (HuggingFace, GPU)       │
│ FROM gpue/foundationpose-base:latest          │
│ - Compile mycuda C++ extension (needs GPU)    │
│ - Compile mycpp C++ extension                 │
│ - Download model weights                      │
│ Build on HuggingFace Spaces with GPU          │
└───────────────────────────────────────────────┘
```

## Files

- **Dockerfile.base** - Stage 1: Base image without GPU compilation
- **Dockerfile** - Stage 2: Final image that compiles CUDA extensions
- **build_base.sh** - Build base image only
- **deploy.sh** - Full two-stage deployment

## Quick Start

### Option 1: Automated Deployment

```bash
cd /Users/georgpuschel/repos/robot-ml/foundationpose

# Login to DockerHub
docker login

# Run full deployment (builds base, pushes to DockerHub, deploys to HF, follows logs)
./deploy.sh
```

The script will:

1. Build the base image for linux/amd64
2. Push to DockerHub as `gpue/foundationpose-base:latest`
3. Commit and push changes to HuggingFace
4. Automatically follow the HuggingFace build logs (press Ctrl+C to stop)

**Prerequisites**:

- Docker login credentials for DockerHub
- HuggingFace token in `../training/.env.local` (for following logs)
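For reference, deploy.sh extracts the token from that file by matching a `HUGGINGFACE_TOKEN=` line. A minimal sketch of that lookup (the value below is a placeholder, not a real token):

```shell
# Sketch of deploy.sh's token lookup; env_line stands in for a line read
# from ../training/.env.local (placeholder value, not a real token).
env_line='HUGGINGFACE_TOKEN=hf_example_token'
HF_TOKEN="$(printf '%s\n' "$env_line" | grep '^HUGGINGFACE_TOKEN=' | cut -d'=' -f2)"
echo "token found: ${HF_TOKEN:+yes}"
```

Note that `cut -d'=' -f2` keeps only the first `=`-separated field, so a token value that itself contains `=` would be truncated.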
### Option 2: Manual Steps

#### Step 1: Build and Push Base Image

```bash
# Build for linux/amd64 (HuggingFace platform)
docker build --platform linux/amd64 -f Dockerfile.base -t gpue/foundationpose-base:latest .

# Push to DockerHub
docker login
docker push gpue/foundationpose-base:latest
```

#### Step 2: Deploy to HuggingFace

```bash
# Add the git remote (if not already added)
git remote add hf https://huggingface.co/spaces/gpue/foundationpose

# Push updated Dockerfile to HuggingFace
git add Dockerfile Dockerfile.base
git commit -m "Two-stage build: base image + GPU compilation"
git push hf main
```

HuggingFace will automatically:

1. Pull `gpue/foundationpose-base:latest` from DockerHub
2. Compile C++ extensions with GPU present
3. Download model weights
4. Start the application

## Why Two Stages?

**Problem**: CUDA C++ extensions fail to compile without a GPU:

```
TypeError: expected string or bytes-like object
torch.version.cuda returns None when no GPU is present
```

**Solution**:

- Stage 1 (Base): Everything except GPU compilation - can build anywhere
- Stage 2 (Final): Only GPU-dependent compilation - builds on HuggingFace with GPU

**Benefits**:

- ✓ Build base image locally without GPU
- ✓ Faster HuggingFace builds (base layers cached)
- ✓ Easy to iterate on application code (stage 2 is small)
- ✓ Base image can be reused across multiple projects
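The failure above can be checked for explicitly before a build is attempted. A minimal preflight sketch (assuming `python3` is on `PATH`; a missing `torch` is treated the same as a missing CUDA toolkit):

```shell
# Preflight: does torch report a CUDA toolkit? torch.version.cuda is None
# on CPU-only installs, which is what breaks the extension build.
if python3 -c 'import torch, sys; sys.exit(0 if torch.version.cuda else 1)' 2>/dev/null; then
    can_compile=yes
else
    can_compile=no
fi
echo "CUDA toolchain visible to torch: ${can_compile}"
```

This only checks what `torch` reports, not whether a physical GPU is attached, so treat it as a quick sanity check rather than a guarantee.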
## Troubleshooting

### Platform Mismatch

If you're on Apple Silicon (M1/M2/M3), you must specify `--platform linux/amd64`:

```bash
docker build --platform linux/amd64 -f Dockerfile.base -t gpue/foundationpose-base:latest .
```

### DockerHub Authentication

```bash
docker login
# Enter username: gpue
# Enter password: <your-dockerhub-token>
```
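For scripted use, `docker login` can also read the token from stdin via `--password-stdin`, which keeps it out of shell history. A sketch (`DOCKERHUB_TOKEN` is a placeholder variable, not something the scripts here set):

```shell
DOCKERHUB_TOKEN="example-token"  # placeholder; use a real DockerHub access token
if command -v docker >/dev/null 2>&1; then
    # Pipe the token in instead of typing it at the prompt
    printf '%s' "$DOCKERHUB_TOKEN" | docker login -u gpue --password-stdin \
        || echo "login failed (expected with a placeholder token)"
else
    echo "docker not installed; would pipe the token to: docker login -u gpue --password-stdin"
fi
```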
### Verify Base Image

Test the base image locally (without GPU compilation):

```bash
docker run --rm -it gpue/foundationpose-base:latest bash

# Inside container:
python3 -c "import torch; print(torch.__version__)"
ls /app/FoundationPose
grep "c++17" /app/FoundationPose/bundlesdf/mycuda/setup.py
```

### HuggingFace Build Logs

Monitor the build on HuggingFace (`-N` disables buffering so the logs stream):

```bash
curl -N -H "Authorization: Bearer $HF_TOKEN" \
    "https://huggingface.co/api/spaces/gpue/foundationpose/logs/build"
```

## Updating the Base Image

When you need to change dependencies or system packages:

1. Edit `Dockerfile.base`
2. Rebuild and push:
   ```bash
   ./build_base.sh
   docker push gpue/foundationpose-base:latest
   ```
3. Rebuild on HuggingFace (will pull the updated base automatically)
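Because the Space always pulls `latest`, pushing a new base silently changes every later build. One optional convention (a suggestion, not something the scripts implement) is to also push a dated tag so a known-good base can be pinned:

```shell
# Hypothetical dated-tag convention alongside 'latest'; the docker commands
# are only printed here since this is a sketch.
IMAGE_NAME="gpue/foundationpose-base"
DATE_TAG="$(date +%Y%m%d)"
echo "docker tag ${IMAGE_NAME}:latest ${IMAGE_NAME}:${DATE_TAG}"
echo "docker push ${IMAGE_NAME}:${DATE_TAG}"
```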
## Local Testing (Without GPU)

To test the base image without GPU compilation:

```bash
# Build base
docker build -f Dockerfile.base -t foundationpose-base-test .

# Run without compiling extensions
docker run --rm -p 7860:7860 foundationpose-base-test python3 app.py
```

The app will run in placeholder mode (no real inference), but you can test the UI and API endpoints.
Dockerfile CHANGED
@@ -1,76 +1,21 @@
- FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04
-
- # Set environment variables
- ENV DEBIAN_FRONTEND=noninteractive
- ENV CUDA_HOME=/usr/local/cuda
- ENV PATH=${CUDA_HOME}/bin:${PATH}
- ENV LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}
+ # Start from base image (build locally, push to DockerHub)
+ # To build base: docker build -f Dockerfile.base -t gpue/foundationpose-base:latest .
+ # To push base: docker push gpue/foundationpose-base:latest
+ FROM gpue/foundationpose-base:latest

  # FoundationPose configuration - always use real model
  ENV FOUNDATIONPOSE_MODEL_REPO=gpue/foundationpose-weights
  ENV USE_REAL_MODEL=true

- # CUDA architecture list for building extensions without GPU present
- # Covers most modern GPUs: Turing (75), Ampere (80,86), Ada (89), Hopper (90)
- ENV TORCH_CUDA_ARCH_LIST="7.5;8.0;8.6;8.9;9.0"
-
- # Install system dependencies
- RUN apt-get update && apt-get install -y \
-     git \
-     wget \
-     cmake \
-     build-essential \
-     python3.10 \
-     python3.10-dev \
-     python3-pip \
-     libgl1-mesa-glx \
-     libglib2.0-0 \
-     libsm6 \
-     libxext6 \
-     libxrender-dev \
-     libgomp1 \
-     libeigen3-dev \
-     ninja-build \
-     && rm -rf /var/lib/apt/lists/*
-
- # Set python3.10 as default
- RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
- RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1
-
- # Upgrade pip
- RUN python3 -m pip install --upgrade pip
-
- # Set working directory
- WORKDIR /app
-
- # Install Python dependencies first (PyTorch needed for building C++ extensions)
- COPY requirements.txt .
- RUN pip install --no-cache-dir --upgrade setuptools wheel
- RUN pip install --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/cu118
- RUN pip install --no-cache-dir -r requirements.txt
-
- # Clone FoundationPose repository
- RUN git clone https://github.com/NVlabs/FoundationPose.git /app/FoundationPose
-
- # Build FoundationPose C++ extensions (skip kaolin - optional dependency)
+ # Build FoundationPose C++ extensions (requires GPU present)
  WORKDIR /app/FoundationPose
- # Patch mycuda setup.py to use C++17 instead of C++14 (PyTorch requires C++17)
- RUN cd bundlesdf/mycuda && \
-     sed -i 's/-std=c++14/-std=c++17/g' setup.py && \
-     pip install . --no-build-isolation
+ RUN cd bundlesdf/mycuda && pip install . --no-build-isolation
  RUN cd mycpp && python setup.py build_ext --inplace

- # Copy application files
+ # Download model weights from HuggingFace
  WORKDIR /app
- COPY app.py client.py estimator.py ./
-
- # Create weights directory and download model weights
- RUN mkdir -p weights
  RUN python3 -c "from huggingface_hub import snapshot_download; \
      snapshot_download(repo_id='gpue/foundationpose-weights', local_dir='weights', repo_type='model')"

- # Expose Gradio port
- EXPOSE 7860
-
  # Run the application
  CMD ["python3", "app.py"]
Dockerfile.base ADDED
@@ -0,0 +1,65 @@
FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu22.04

# Set environment variables
ENV DEBIAN_FRONTEND=noninteractive
ENV CUDA_HOME=/usr/local/cuda
ENV PATH=${CUDA_HOME}/bin:${PATH}
ENV LD_LIBRARY_PATH=${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}

# CUDA architecture list for building extensions without GPU present
# Covers most modern GPUs: Turing (75), Ampere (80,86), Ada (89), Hopper (90)
ENV TORCH_CUDA_ARCH_LIST="7.5;8.0;8.6;8.9;9.0"

# Install system dependencies
RUN apt-get update && apt-get install -y \
    git \
    wget \
    cmake \
    build-essential \
    python3.10 \
    python3.10-dev \
    python3-pip \
    libgl1-mesa-glx \
    libglib2.0-0 \
    libsm6 \
    libxext6 \
    libxrender-dev \
    libgomp1 \
    libeigen3-dev \
    ninja-build \
    && rm -rf /var/lib/apt/lists/*

# Set python3.10 as default
RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.10 1

# Upgrade pip
RUN python3 -m pip install --upgrade pip

# Set working directory
WORKDIR /app

# Copy and install Python dependencies
COPY requirements.txt .
RUN pip install --no-cache-dir --upgrade setuptools wheel
RUN pip install --no-cache-dir torch torchvision --index-url https://download.pytorch.org/whl/cu118
RUN pip install --no-cache-dir -r requirements.txt

# Clone FoundationPose repository (but don't build extensions yet)
RUN git clone https://github.com/NVlabs/FoundationPose.git /app/FoundationPose

# Patch mycuda setup.py to use C++17 (preparation for GPU build)
WORKDIR /app/FoundationPose
RUN cd bundlesdf/mycuda && sed -i 's/-std=c++14/-std=c++17/g' setup.py

# Reset workdir
WORKDIR /app

# Copy application files
COPY app.py client.py estimator.py ./

# Create weights directory (weights will be downloaded in final image)
RUN mkdir -p weights

# Expose Gradio port
EXPOSE 7860
build_base.sh ADDED
@@ -0,0 +1,20 @@
#!/bin/bash
# Build and push the base image to DockerHub

set -e

IMAGE_NAME="gpue/foundationpose-base"
TAG="latest"
PLATFORM="linux/amd64"  # HuggingFace Spaces use x86_64

echo "Building base image for ${PLATFORM}: ${IMAGE_NAME}:${TAG}"
docker build --platform ${PLATFORM} -f Dockerfile.base -t ${IMAGE_NAME}:${TAG} .

echo ""
echo "✓ Base image built successfully for ${PLATFORM}"
echo ""
echo "To push to DockerHub:"
echo "  docker login"
echo "  docker push ${IMAGE_NAME}:${TAG}"
echo ""
echo "After pushing, the HuggingFace Space will use this base image."
deploy.sh ADDED
@@ -0,0 +1,107 @@
#!/bin/bash
# Two-stage deployment script for FoundationPose

set -e

IMAGE_NAME="gpue/foundationpose-base"
TAG="latest"
PLATFORM="linux/amd64"
HF_SPACE="gpue/foundationpose"
HF_TOKEN_FILE="../training/.env.local"

echo "==================================="
echo "FoundationPose Two-Stage Deployment"
echo "==================================="
echo ""

# Stage 1: Build and push base image (local, no GPU needed)
echo "Stage 1: Building base image locally (no GPU required)"
echo "Platform: ${PLATFORM}"
echo "Image: ${IMAGE_NAME}:${TAG}"
echo ""

# Check Docker login
if [ ! -f ~/.docker/config.json ] || ! grep -q "index.docker.io" ~/.docker/config.json 2>/dev/null; then
    echo "Error: Not logged in to DockerHub"
    echo "Please run: docker login"
    exit 1
fi
echo "✓ DockerHub authentication verified"
echo ""

docker build --platform ${PLATFORM} -f Dockerfile.base -t ${IMAGE_NAME}:${TAG} .

echo ""
echo "✓ Base image built successfully"
echo ""
echo "Pushing to DockerHub..."
docker push ${IMAGE_NAME}:${TAG}

echo ""
echo "✓ Base image pushed to DockerHub: ${IMAGE_NAME}:${TAG}"
echo ""

# Stage 2: Deploy to HuggingFace
echo "Stage 2: Deploying to HuggingFace Space"
echo ""

# Initialize git repo if needed
if [ ! -d .git ]; then
    echo "Initializing git repository..."
    git init
    git remote add origin https://huggingface.co/spaces/${HF_SPACE}
    echo "✓ Git repository initialized"
    echo ""
fi

# Check if there are changes to commit
if [[ -n $(git status -s) ]]; then
    echo "Committing changes..."
    git add Dockerfile Dockerfile.base BUILD.md build_base.sh deploy.sh
    git commit -m "Two-stage Docker build: base image + GPU compilation"
    echo "✓ Changes committed"
else
    echo "No changes to commit"
fi

# Push to HuggingFace
echo ""
echo "Pushing to HuggingFace Space: ${HF_SPACE}"
git push https://huggingface.co/spaces/${HF_SPACE} main --force

echo ""
echo "✓ Pushed to HuggingFace"
echo ""
echo "HuggingFace will now:"
echo "  1. Pull the base image from DockerHub (${IMAGE_NAME}:${TAG})"
echo "  2. Build C++ extensions with GPU present"
echo "  3. Download model weights"
echo "  4. Start the Gradio app"
echo ""

# Follow build logs
echo "Following build logs..."
echo "Press Ctrl+C to stop watching"
echo ""

# Load HF token
if [ -f "${HF_TOKEN_FILE}" ]; then
    HF_TOKEN=$(grep "^HUGGINGFACE_TOKEN=" "${HF_TOKEN_FILE}" | cut -d'=' -f2)

    if [ -n "${HF_TOKEN}" ]; then
        curl -N -H "Authorization: Bearer ${HF_TOKEN}" \
            "https://huggingface.co/api/spaces/${HF_SPACE}/logs/build" 2>/dev/null | \
        while IFS= read -r line; do
            # Parse JSON and extract data field
            echo "$line" | grep -o '"data":"[^"]*"' | sed 's/"data":"//;s/"$//' | sed 's/\\n/\n/g'
        done
    else
        echo "Warning: HF_TOKEN not found in ${HF_TOKEN_FILE}"
        echo "To follow logs manually:"
        echo "  curl -N -H \"Authorization: Bearer \$HF_TOKEN\" \"https://huggingface.co/api/spaces/${HF_SPACE}/logs/build\""
    fi
else
    echo "Warning: ${HF_TOKEN_FILE} not found"
    echo "To follow logs manually:"
    echo "  curl -N -H \"Authorization: Bearer \$HF_TOKEN\" \"https://huggingface.co/api/spaces/${HF_SPACE}/logs/build\""
fi