Christopher Tan committed on
Commit 4eedb94 · 1 Parent(s): 0409360

Refactor: Move dependency installation to setup.sh and add deployment documentation

Files changed (6)
  1. .deployment-notes.md +84 -0
  2. README.md +1 -0
  3. app.py +18 -161
  4. setup.sh +50 -0
  5. start.py +28 -0
  6. upload_checkpoint.py +77 -0
.deployment-notes.md ADDED
@@ -0,0 +1,84 @@
+ # Deployment Notes for RoboEvalGradio Hugging Face Space
+
+ ## Current Status
+ ✅ All dependencies installing successfully
+ ✅ GPU detected (Tesla T4, 15.83 GB)
+ ✅ OpenPI, RoboEval, and lerobot all working
+ ✅ Checkpoint uploaded to HF Hub: tan7271/pi0_CubeHandover_ckpt
+
+ ## Architecture
+
+ ### File Structure
+ - `requirements.txt` - Core PyPI dependencies
+ - `setup.sh` - Installation script for git dependencies
+ - `app.py` - Main Gradio application (clean, no installation logic)
+ - `start.py` - Alternative startup wrapper (optional)
+ - `upload_checkpoint.py` - Script to upload checkpoints to HF Hub
+
+ ### Installation Flow
+ 1. HF Spaces installs `requirements.txt` (PyPI packages)
+ 2. `app.py` starts and calls `check_and_install_dependencies()`
+ 3. If needed, runs `setup.sh` to install git dependencies:
+    - RoboEval (with submodules)
+    - lerobot (specific commit)
+    - safetensors upgrade
+    - OpenPI + openpi-client
+
+ ### Key Dependencies
+ - **RoboEval**: Installed from git with `--recurse-submodules` to include `thirdparty/mujoco_menagerie`
+ - **lerobot**: Installed from specific commit `0cf864870cf29f4738d3ade893e6fd13fbd7cdb5`
+ - **OpenPI**: Installed with `--no-deps` (dependencies in requirements.txt)
+ - **openpi-client**: Installed from subdirectory `packages/openpi-client`
+
+ ## Environment Variables
+ - `GH_TOKEN`: GitHub personal access token for private repo access
+ - `MUJOCO_GL=egl`: Headless rendering
+ - `PYOPENGL_PLATFORM=egl`: OpenGL backend
+ - `XDG_RUNTIME_DIR=/tmp`: Runtime directory
+
+ ## Hardware Requirements
+ - **Minimum**: T4 small GPU (16GB) - ~$0.60/hour
+ - **Recommended**: L4 or A10G for better performance
+ - **Free tier**: Won't work (insufficient memory for Pi0 models)
+
+ ## Checkpoint Usage
+ Users can provide checkpoints in three formats:
+ 1. **Hugging Face Hub**: `tan7271/pi0_CubeHandover_ckpt` or `hf://tan7271/...`
+ 2. **Google Cloud Storage**: `gs://bucket/path/to/checkpoint`
+ 3. **Local path**: `/path/to/checkpoint` (must exist on Space filesystem)
+
+ The app automatically detects HF Hub paths and downloads them using `snapshot_download`.
+
+ ## Known Issues Fixed
+ 1. ✅ `beartype` missing - Added to requirements.txt
+ 2. ✅ `safetensors` version conflict - Upgraded after lerobot install
+ 3. ✅ `lerobot.common` missing - Install from specific git commit
+ 4. ✅ `openpi_client` missing - Install from subdirectory
+ 5. ✅ `thirdparty/mujoco_menagerie` missing - Copy after RoboEval install
+ 6. ✅ Dependency resolution errors - Simplified requirements.txt
+ 7. ✅ PyPI version conflicts - Use exact working versions
+
+ ## Troubleshooting
+
+ ### If dependencies fail to install:
+ - Check GH_TOKEN is set in Space secrets
+ - Check setup.sh has execution permissions
+ - Review build logs for specific errors
+
+ ### If GPU not detected:
+ - Verify Space is using T4 small or better in Settings
+ - Check logs for "GPU DIAGNOSTICS" section
+ - Ensure CUDA drivers are loading
+
+ ### If out of memory:
+ - Model might be too large for T4
+ - Try reducing max_steps parameter
+ - Consider using model quantization
+ - Upgrade to larger GPU (L4, A10G)
+
+ ## Next Steps
+ - Monitor build logs after deployment
+ - Test inference with uploaded checkpoint
+ - Add more checkpoints for other tasks
+ - Consider adding model caching to speed up subsequent loads
+
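Note on "Checkpoint Usage": the notes say the app detects HF Hub paths and downloads them with `snapshot_download`, but the detection code is not part of this diff. The sketch below is one way such detection could work, not the app's actual implementation. The function names `classify_checkpoint` and `resolve_checkpoint` and the repo-id regular expression are assumptions made for illustration; only `huggingface_hub.snapshot_download` is a real library call.

```python
import os
import re

# Assumption: a Hub repo id looks like "<namespace>/<name>" with no further slashes.
_REPO_ID = re.compile(r"^[\w.-]+/[\w.-]+$")


def classify_checkpoint(path: str) -> tuple[str, str]:
    """Return (kind, value), where kind is 'hf', 'gs', or 'local'."""
    if path.startswith("hf://"):
        return "hf", path[len("hf://"):]
    if path.startswith("gs://"):
        return "gs", path
    # An absolute path or a path that exists on disk wins over the repo-id heuristic.
    if os.path.isabs(path) or os.path.exists(path):
        return "local", path
    if _REPO_ID.match(path):
        return "hf", path
    return "local", path


def resolve_checkpoint(path: str) -> str:
    """Download Hub checkpoints; pass gs:// and local paths through unchanged."""
    kind, value = classify_checkpoint(path)
    if kind == "hf":
        # Imported lazily so classification works without huggingface_hub installed.
        from huggingface_hub import snapshot_download
        return snapshot_download(repo_id=value)
    return value
```

A bare `namespace/name` string is ambiguous with a relative local path, which is why the sketch checks the filesystem before applying the repo-id pattern.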
README.md CHANGED
@@ -6,6 +6,7 @@ colorTo: purple
  sdk: gradio
  sdk_version: "4.44.0"
  app_file: app.py
+ startup_duration_timeout: 30m
  pinned: false
  license: mit
  python_version: "3.11"
app.py CHANGED
@@ -21,173 +21,30 @@ os.environ.setdefault("MUJOCO_GL", "egl")
  os.environ.setdefault("PYOPENGL_PLATFORM", "egl")
  os.environ.setdefault("XDG_RUNTIME_DIR", "/tmp")

- # --- Install RoboEval and OpenPI at runtime ---
- def install_roboeval():
-     """Install RoboEval from GitHub using the GH_TOKEN."""
-     try:
-         import roboeval
-         # Check if thirdparty directory exists
-         import roboeval.const as const
-         if not const.THIRD_PARTY_PATH.exists():
-             print("RoboEval installed but missing thirdparty submodules, reinstalling...")
-             raise ImportError("Missing thirdparty submodules")
-         print("RoboEval already installed")
-         return True
-     except ImportError:
-         print("Installing RoboEval with submodules...")
-         gh_token = os.environ.get("GH_TOKEN")
-         if not gh_token:
-             raise RuntimeError("GH_TOKEN environment variable not set")
-
-         # Clone with submodules to /tmp
-         clone_dir = "/tmp/roboeval_install"
-         repo_url = f"https://{gh_token}@github.com/helen9975/RoboEval.git"
-
-         # Remove old clone if exists
-         subprocess.run(["rm", "-rf", clone_dir], capture_output=True)
-
-         # Clone with submodules
-         print("Cloning RoboEval repository with submodules...")
-         clone_result = subprocess.run([
-             "git", "clone", "--recurse-submodules",
-             repo_url, clone_dir
-         ], capture_output=True, text=True)
-
-         if clone_result.returncode != 0:
-             print(f"Clone failed: {clone_result.stderr}")
-             raise RuntimeError(f"Failed to clone RoboEval: {clone_result.stderr}")
-
-         # Install from local directory (not editable, so files are copied to site-packages)
-         print("Installing RoboEval from cloned repository...")
-         result = subprocess.run([
-             sys.executable, "-m", "pip", "install",
-             clone_dir, "--no-cache-dir"
-         ], capture_output=True, text=True)
-
-         if result.returncode != 0:
-             print(f"Installation failed: {result.stderr}")
-             raise RuntimeError(f"Failed to install RoboEval: {result.stderr}")
-
-         # Copy thirdparty directory to the installed package location
-         print("Copying thirdparty submodules to site-packages...")
-         import site
-         import shutil
-         site_packages = site.getsitepackages()[0]
-         thirdparty_src = os.path.join(clone_dir, "thirdparty")
-         thirdparty_dst = os.path.join(site_packages, "thirdparty")
-
-         if os.path.exists(thirdparty_src):
-             shutil.copytree(thirdparty_src, thirdparty_dst, dirs_exist_ok=True)
-             print(f"Copied thirdparty to {thirdparty_dst}")
-         else:
-             print("Warning: thirdparty directory not found in cloned repo")
-
-         print("RoboEval installed successfully with submodules")
-         return True
+ # Note: Dependencies are installed via setup.sh before the app starts
+ # This keeps the app code clean and separates installation logic

- def install_lerobot():
-     """Install lerobot from specific git commit as required by OpenPI."""
-     try:
-         import lerobot.common
-         print("lerobot already installed")
-         return True
-     except ImportError:
-         print("Installing lerobot from git (specific commit required by OpenPI)...")
-         # OpenPI requires lerobot from this specific commit
-         lerobot_url = "git+https://github.com/huggingface/lerobot@0cf864870cf29f4738d3ade893e6fd13fbd7cdb5"
-         result = subprocess.run([
-             sys.executable, "-m", "pip", "install",
-             lerobot_url, "--no-cache-dir"
-         ], capture_output=True, text=True)
-
-         if result.returncode != 0:
-             print(f"lerobot installation failed: {result.stderr}")
-             return False
-
-         print("lerobot installed successfully")
-         return True
-
- def install_openpi():
-     """Install OpenPI from your forked repository using the GH_TOKEN."""
+ # Run setup if dependencies aren't installed
+ def check_and_install_dependencies():
+     """Check if dependencies are installed, run setup if not."""
      try:
+         import roboeval
+         import lerobot
          import openpi
-         print("OpenPI already installed")
+         print("All dependencies already installed")
          return True
-     except ImportError:
-         print("Installing OpenPI from tan7271/OpenPiRoboEval...")
-         gh_token = os.environ.get("GH_TOKEN")
-         if not gh_token:
-             raise RuntimeError("GH_TOKEN environment variable not set")
-
-         repo_url = f"https://{gh_token}@github.com/tan7271/OpenPiRoboEval.git"
-
-         # First install openpi-client from the subdirectory
-         print("Installing openpi-client...")
-         client_result = subprocess.run([
-             sys.executable, "-m", "pip", "install",
-             f"git+{repo_url}#subdirectory=packages/openpi-client", "--no-cache-dir", "--no-deps"
-         ], capture_output=True, text=True)
-
-         if client_result.returncode != 0:
-             print(f"openpi-client installation failed: {client_result.stderr}")
-         else:
-             print("openpi-client installed successfully")
-
-         # Then install OpenPI with --no-deps since all dependencies are in requirements.txt
-         result = subprocess.run([
-             sys.executable, "-m", "pip", "install",
-             f"git+{repo_url}", "--no-cache-dir", "--no-deps", "--force-reinstall"
-         ], capture_output=True, text=True)
-
+     except ImportError as e:
+         print(f"Missing dependency: {e}")
+         print("Running setup script...")
+         import subprocess
+         result = subprocess.run(["bash", "setup.sh"], cwd=os.path.dirname(__file__))
          if result.returncode != 0:
-             print(f"OpenPI installation failed: {result.stderr}")
-             # Try alternative approach - clone and install manually
-             print("Trying alternative installation method...")
-             try:
-                 # Clone the repository
-                 clone_result = subprocess.run([
-                     "git", "clone", "--depth", "1",
-                     f"https://{gh_token}@github.com/tan7271/OpenPiRoboEval.git",
-                     "/tmp/openpi"
-                 ], capture_output=True, text=True)
-
-                 if clone_result.returncode != 0:
-                     raise RuntimeError(f"Failed to clone repository: {clone_result.stderr}")
-
-                 # Install in development mode with no dependencies
-                 install_result = subprocess.run([
-                     sys.executable, "-m", "pip", "install",
-                     "-e", "/tmp/openpi", "--no-deps", "--force-reinstall"
-                 ], capture_output=True, text=True)
-
-                 if install_result.returncode != 0:
-                     raise RuntimeError(f"Failed to install in dev mode: {install_result.stderr}")
-
-                 print("OpenPI installed successfully via alternative method")
-                 return True
-
-             except Exception as e:
-                 raise RuntimeError(f"All installation methods failed: {e}")
-
-         print("OpenPI installed successfully")
+             raise RuntimeError("Setup script failed")
          return True

- # Install packages
- install_roboeval()
- install_lerobot()
-
- # Upgrade safetensors to fix version conflict
- print("Upgrading safetensors to >=0.4.1...")
- result = subprocess.run([
-     sys.executable, "-m", "pip", "install",
-     "safetensors>=0.4.1", "--upgrade", "--no-cache-dir"
- ], capture_output=True, text=True)
- if result.returncode == 0:
-     print("safetensors upgraded successfully")
- else:
-     print(f"safetensors upgrade failed: {result.stderr}")
-
- install_openpi()
+ import datetime
+ print(f"===== Application Startup at {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')} =====\n")
+ check_and_install_dependencies()

  # --- OpenPI (local inference) ---
  try:
@@ -284,7 +141,7 @@ _ENV_CLASSES = {
  DEFAULT_DEVICE = "cuda:0" if os.path.exists("/dev/nvidia0") else "cpu"
  DEFAULT_DOWNSAMPLE_RATE = 25
  DEFAULT_MAX_STEPS = 200
- DEFAULT_FPS = 5
+ DEFAULT_FPS = 25

  # Check GPU availability and print diagnostics
  def check_gpu_status():
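Note on `check_and_install_dependencies()`: the version in this commit reports only the first import that fails, because the `try` block stops at the first `ImportError`. A possible alternative, sketched here and not part of the commit, is to look modules up with `importlib.util.find_spec`, which lists every missing top-level module without importing any of them. The helper name `missing_modules` is hypothetical.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["roboeval", "lerobot", "openpi"])
    print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only confirms that a module can be found, not that it imports cleanly, so a broken install (for example, a missing `thirdparty` directory) would still need a separate check.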
setup.sh ADDED
@@ -0,0 +1,50 @@
+ #!/bin/bash
+ set -e
+
+ echo "===== Installing Dependencies ====="
+
+ # Install RoboEval with submodules
+ echo "Installing RoboEval with submodules..."
+ CLONE_DIR="/tmp/roboeval_install"
+ rm -rf $CLONE_DIR
+
+ # Clone with submodules
+ echo "Cloning RoboEval repository with submodules..."
+ git clone --recurse-submodules https://${GH_TOKEN}@github.com/helen9975/RoboEval.git $CLONE_DIR
+
+ # Install
+ echo "Installing RoboEval from cloned repository..."
+ pip install $CLONE_DIR --no-cache-dir
+
+ # Copy thirdparty to site-packages
+ echo "Copying thirdparty submodules to site-packages..."
+ SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])")
+ cp -r $CLONE_DIR/thirdparty $SITE_PACKAGES/
+ echo "Copied thirdparty to $SITE_PACKAGES/thirdparty"
+
+ echo "RoboEval installed successfully with submodules"
+
+ # Install lerobot from specific commit
+ echo "Installing lerobot from git (specific commit required by OpenPI)..."
+ pip install git+https://github.com/huggingface/lerobot@0cf864870cf29f4738d3ade893e6fd13fbd7cdb5 --no-cache-dir
+ echo "lerobot installed successfully"
+
+ # Upgrade safetensors to fix version conflict
+ echo "Upgrading safetensors to >=0.4.1..."
+ pip install "safetensors>=0.4.1" --upgrade --no-cache-dir
+ echo "safetensors upgraded successfully"
+
+ # Install OpenPI with openpi-client
+ echo "Installing OpenPI from tan7271/OpenPiRoboEval..."
+
+ # Install openpi-client
+ echo "Installing openpi-client..."
+ pip install git+https://${GH_TOKEN}@github.com/tan7271/OpenPiRoboEval.git#subdirectory=packages/openpi-client --no-cache-dir --no-deps
+ echo "openpi-client installed successfully"
+
+ # Install OpenPI
+ pip install git+https://${GH_TOKEN}@github.com/tan7271/OpenPiRoboEval.git --no-cache-dir --no-deps --force-reinstall
+ echo "OpenPI installed successfully"
+
+ echo "===== All dependencies installed ====="
+
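Note on `setup.sh`: the script interpolates `${GH_TOKEN}` into clone URLs without checking that it is set, whereas the removed Python installer raised an error when the token was missing. A caller could restore that check before invoking the script. The sketch below is an assumption about how that might look, not code from this commit; the function name `preflight` is hypothetical.

```python
import os
from pathlib import Path


def preflight(env, script):
    """Return a list of human-readable problems; an empty list means ready to run."""
    problems = []
    if not env.get("GH_TOKEN"):
        problems.append("GH_TOKEN is not set")
    if not Path(script).is_file():
        problems.append(f"{script} not found")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ, "setup.sh"):
        print("Preflight:", problem)
```

Because both `app.py` and `start.py` invoke the script as `bash setup.sh`, the sketch checks only that the file exists and does not test the executable bit.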
start.py ADDED
@@ -0,0 +1,28 @@
+ #!/usr/bin/env python3
+ """
+ Startup script that runs setup.sh before launching the Gradio app.
+ """
+ import subprocess
+ import sys
+ import os
+
+ def main():
+     """Run setup script then launch app."""
+     print("Running dependency setup...")
+
+     # Run setup script
+     result = subprocess.run(["bash", "setup.sh"], cwd=os.path.dirname(__file__))
+
+     if result.returncode != 0:
+         print("Setup failed!")
+         sys.exit(1)
+
+     print("\nLaunching Gradio app...")
+
+     # Import and run the app
+     from app import demo
+     demo.launch()
+
+ if __name__ == "__main__":
+     main()
+
upload_checkpoint.py ADDED
@@ -0,0 +1,77 @@
+ #!/usr/bin/env python3
+ """
+ Upload pi0_CubeHandover checkpoint to Hugging Face Hub
+ """
+
+ from huggingface_hub import HfApi, create_repo
+ import os
+
+ # Configuration
+ CHECKPOINT_DIR = "pi0_CubeHandover_ckpt"
+ REPO_ID = "tan7271/pi0_CubeHandover_ckpt"  # Change "tan7271" to your HF username if different
+ REPO_TYPE = "model"
+
+ def upload_checkpoint():
+     """Upload checkpoint to Hugging Face Hub"""
+
+     # Check if checkpoint directory exists
+     if not os.path.exists(CHECKPOINT_DIR):
+         print(f"Error: Checkpoint directory '{CHECKPOINT_DIR}' not found!")
+         return
+
+     print(f"📦 Uploading checkpoint from: {CHECKPOINT_DIR}")
+     print(f"🎯 Target repository: {REPO_ID}")
+     print()
+
+     # Initialize Hugging Face API
+     api = HfApi()
+
+     try:
+         # Create repository (if it doesn't exist)
+         print("Creating repository on Hugging Face Hub...")
+         create_repo(
+             repo_id=REPO_ID,
+             repo_type=REPO_TYPE,
+             exist_ok=True,  # Don't fail if repo already exists
+             private=False,  # Set to True if you want a private repo
+         )
+         print(f"✅ Repository created/verified: https://huggingface.co/{REPO_ID}")
+         print()
+
+         # Upload the checkpoint folder (using upload_large_folder for large checkpoints)
+         print("📤 Uploading checkpoint files...")
+         print("(This may take a while depending on checkpoint size)")
+         print("Using upload_large_folder for better handling of large files...")
+         print()
+
+         api.upload_large_folder(
+             folder_path=CHECKPOINT_DIR,
+             repo_id=REPO_ID,
+             repo_type=REPO_TYPE,
+             num_workers=4,  # Upload files in parallel
+         )
+
+         print()
+         print("=" * 60)
+         print("🎉 Upload completed successfully!")
+         print("=" * 60)
+         print()
+         print(f"📍 Your checkpoint is now available at:")
+         print(f" https://huggingface.co/{REPO_ID}")
+         print()
+         print(f"🔗 To use in your Gradio app, use this path:")
+         # REPO_TYPE is "model", so the path carries no "datasets/" prefix
+         print(f' checkpoint_path = "hf://{REPO_ID}"')
+         print()
+         print("📝 Note: It may take a few minutes for the files to be fully processed")
+         print()
+
+     except Exception as e:
+         print(f"❌ Error during upload: {e}")
+         print()
+         print("💡 Make sure you're logged in to Hugging Face:")
+         print(" Run: huggingface-cli login")
+         print(" Or set HF_TOKEN environment variable")
+
+ if __name__ == "__main__":
+     upload_checkpoint()
+
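Note on the `hf://` path printed by `upload_checkpoint.py`: the script uploads to a repository of type `model`, and the deployment notes give the Hub form as `hf://tan7271/...`. In the path convention used by `huggingface_hub`'s `HfFileSystem`, as far as I know, dataset and Space repositories carry a `datasets/` or `spaces/` prefix and model repositories carry none. The sketch below derives the path from the repo type under that assumption; the helper `hf_uri` is hypothetical, and whether the app's own parser accepts prefixed paths is not shown in this diff.

```python
# Assumed prefix convention: models have no prefix, datasets and Spaces do.
_PREFIX = {"model": "", "dataset": "datasets/", "space": "spaces/"}


def hf_uri(repo_id, repo_type="model"):
    """Build an hf:// path for a Hub repository of the given type."""
    return f"hf://{_PREFIX[repo_type]}{repo_id}"


if __name__ == "__main__":
    print(hf_uri("tan7271/pi0_CubeHandover_ckpt"))
```

Deriving the printed path from `REPO_TYPE` keeps the message consistent if the repository type is ever changed.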