Space: tan7271/RoboEvalGradio

Christopher Tan committed on Oct 22, 2025

Commit 4eedb94 (parent: 0409360)

Refactor: Move dependency installation to setup.sh and add deployment documentation


Files changed (6)

.deployment-notes.md +84 -0
README.md +1 -0
app.py +18 -161
setup.sh +50 -0
start.py +28 -0
upload_checkpoint.py +77 -0

.deployment-notes.md ADDED

	@@ -0,0 +1,84 @@

+# Deployment Notes for RoboEvalGradio Hugging Face Space
+## Current Status
+✅ All dependencies installing successfully
+✅ GPU detected (Tesla T4, 15.83 GB)
+✅ OpenPI, RoboEval, and lerobot all working
+✅ Checkpoint uploaded to HF Hub: tan7271/pi0_CubeHandover_ckpt
+## Architecture
+### File Structure
+- `requirements.txt` - Core PyPI dependencies
+- `setup.sh` - Installation script for git dependencies
+- `app.py` - Main Gradio application (no inline installation logic; runs `setup.sh` only if dependencies are missing)
+- `start.py` - Alternative startup wrapper (optional)
+- `upload_checkpoint.py` - Script to upload checkpoints to HF Hub
+### Installation Flow
+1. HF Spaces installs `requirements.txt` (PyPI packages)
+2. `app.py` starts and calls `check_and_install_dependencies()`
+3. If needed, runs `setup.sh` to install git dependencies:
+   - RoboEval (with submodules)
+   - lerobot (specific commit)
+   - safetensors upgrade
+   - OpenPI + openpi-client
+### Key Dependencies
+- **RoboEval**: Installed from git with `--recurse-submodules` to include `thirdparty/mujoco_menagerie`
+- **lerobot**: Installed from specific commit `0cf864870cf29f4738d3ade893e6fd13fbd7cdb5`
+- **OpenPI**: Installed with `--no-deps` (dependencies in requirements.txt)
+- **openpi-client**: Installed from subdirectory `packages/openpi-client`
+## Environment Variables
+- `GH_TOKEN`: GitHub personal access token for private repo access
+- `MUJOCO_GL=egl`: Headless rendering
+- `PYOPENGL_PLATFORM=egl`: OpenGL backend
+- `XDG_RUNTIME_DIR=/tmp`: Runtime directory
+## Hardware Requirements
+- **Minimum**: T4 small GPU (16GB) - ~$0.60/hour
+- **Recommended**: L4 or A10G for better performance
+- **Free tier**: Won't work (insufficient memory for Pi0 models)
+## Checkpoint Usage
+Users can provide checkpoints in three formats:
+1. **Hugging Face Hub**: `tan7271/pi0_CubeHandover_ckpt` or `hf://tan7271/...`
+2. **Google Cloud Storage**: `gs://bucket/path/to/checkpoint`
+3. **Local path**: `/path/to/checkpoint` (must exist on Space filesystem)
+The app automatically detects HF Hub paths and downloads them using `snapshot_download`.
+## Known Issues Fixed
+1. ✅ `beartype` missing - Added to requirements.txt
+2. ✅ `safetensors` version conflict - Upgraded after lerobot install
+3. ✅ `lerobot.common` missing - Install from specific git commit
+4. ✅ `openpi_client` missing - Install from subdirectory
+5. ✅ `thirdparty/mujoco_menagerie` missing - Copy after RoboEval install
+6. ✅ Dependency resolution errors - Simplified requirements.txt
+7. ✅ PyPI version conflicts - Use exact working versions
+## Troubleshooting
+### If dependencies fail to install:
+- Check that `GH_TOKEN` is set in the Space secrets
+- Check that `setup.sh` is present in the Space root (it is invoked as `bash setup.sh`, so the executable bit is not required)
+- Review build logs for specific errors
+### If GPU not detected:
+- Verify Space is using T4 small or better in Settings
+- Check logs for "GPU DIAGNOSTICS" section
+- Ensure CUDA drivers are loading
+### If out of memory:
+- Model might be too large for T4
+- Try reducing max_steps parameter
+- Consider using model quantization
+- Upgrade to larger GPU (L4, A10G)
+## Next Steps
+- Monitor build logs after deployment
+- Test inference with uploaded checkpoint
+- Add more checkpoints for other tasks
+- Consider adding model caching to speed up subsequent loads
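The checkpoint-path handling described in the deployment notes (a bare repo ID or an `hf://` prefix for the Hub, `gs://` for Cloud Storage, otherwise a local path) can be sketched as below. This is a minimal illustration under stated assumptions, not the app's actual code: `classify_checkpoint` and `resolve_checkpoint` are hypothetical names, the rule that an existing local path wins over an `owner/name` lookalike is an assumption, and `gs://` paths are returned unchanged on the assumption that the OpenPI loader handles them itself.

```python
import os
from typing import Tuple


def classify_checkpoint(path: str) -> Tuple[str, str]:
    """Return (kind, location) where kind is "hf", "gcs", or "local".

    Hypothetical helper; the precedence rules below are assumptions.
    """
    if path.startswith("hf://"):
        return "hf", path[len("hf://"):]
    if path.startswith("gs://"):
        return "gcs", path
    # An existing local path takes precedence over an owner/name lookalike.
    if os.path.exists(path):
        return "local", path
    # Exactly one slash, both parts non-empty, not relative-looking: a Hub repo ID.
    parts = path.split("/")
    if len(parts) == 2 and all(parts) and not path.startswith((".", "~")):
        return "hf", path
    return "local", path


def resolve_checkpoint(path: str) -> str:
    """Return a local directory for Hub checkpoints, else the original path."""
    kind, location = classify_checkpoint(path)
    if kind == "hf":
        # Imported lazily so classification works without huggingface_hub installed.
        from huggingface_hub import snapshot_download
        return snapshot_download(repo_id=location)
    if kind == "local" and not os.path.exists(location):
        raise FileNotFoundError(f"Checkpoint not found on the Space filesystem: {location}")
    return location
```

Checking the local filesystem first keeps a directory such as `ckpts/pi0` from being sent to the Hub by mistake; the trade-off is that a local folder that happens to share a repo ID's name will shadow the Hub copy.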

README.md CHANGED

@@ -6,6 +6,7 @@ colorTo: purple
 sdk: gradio
 sdk_version: "4.44.0"
 app_file: app.py
+startup_duration_timeout: 30m
 pinned: false
 license: mit
 python_version: "3.11"

app.py CHANGED

@@ -21,173 +21,30 @@ os.environ.setdefault("MUJOCO_GL", "egl")
 os.environ.setdefault("PYOPENGL_PLATFORM", "egl")
 os.environ.setdefault("XDG_RUNTIME_DIR", "/tmp")
-# --- Install RoboEval and OpenPI at runtime ---
-def install_roboeval():
-    """Install RoboEval from GitHub using the GH_TOKEN."""
-    try:
-        import roboeval
-        # Check if thirdparty directory exists
-        import roboeval.const as const
-        if not const.THIRD_PARTY_PATH.exists():
-            print("RoboEval installed but missing thirdparty submodules, reinstalling...")
-            raise ImportError("Missing thirdparty submodules")
-        print("RoboEval already installed")
-        return True
-    except ImportError:
-        print("Installing RoboEval with submodules...")
-        gh_token = os.environ.get("GH_TOKEN")
-        if not gh_token:
-            raise RuntimeError("GH_TOKEN environment variable not set")
-        # Clone with submodules to /tmp
-        clone_dir = "/tmp/roboeval_install"
-        repo_url = f"https://{gh_token}@github.com/helen9975/RoboEval.git"
-        # Remove old clone if exists
-        subprocess.run(["rm", "-rf", clone_dir], capture_output=True)
-        # Clone with submodules
-        print("Cloning RoboEval repository with submodules...")
-        clone_result = subprocess.run([
-            "git", "clone", "--recurse-submodules",
-            repo_url, clone_dir
-        ], capture_output=True, text=True)
-        if clone_result.returncode != 0:
-            print(f"Clone failed: {clone_result.stderr}")
-            raise RuntimeError(f"Failed to clone RoboEval: {clone_result.stderr}")
-        # Install from local directory (not editable, so files are copied to site-packages)
-        print("Installing RoboEval from cloned repository...")
-        result = subprocess.run([
-            sys.executable, "-m", "pip", "install",
-            clone_dir, "--no-cache-dir"
-        ], capture_output=True, text=True)
-        if result.returncode != 0:
-            print(f"Installation failed: {result.stderr}")
-            raise RuntimeError(f"Failed to install RoboEval: {result.stderr}")
-        # Copy thirdparty directory to the installed package location
-        print("Copying thirdparty submodules to site-packages...")
-        import site
-        import shutil
-        site_packages = site.getsitepackages()[0]
-        thirdparty_src = os.path.join(clone_dir, "thirdparty")
-        thirdparty_dst = os.path.join(site_packages, "thirdparty")
-        if os.path.exists(thirdparty_src):
-            shutil.copytree(thirdparty_src, thirdparty_dst, dirs_exist_ok=True)
-            print(f"Copied thirdparty to {thirdparty_dst}")
-        else:
-            print("Warning: thirdparty directory not found in cloned repo")
-        print("RoboEval installed successfully with submodules")
-        return True
-def install_lerobot():
-    """Install lerobot from specific git commit as required by OpenPI."""
-    try:
-        import lerobot.common
-        print("lerobot already installed")
-        return True
-    except ImportError:
-        print("Installing lerobot from git (specific commit required by OpenPI)...")
-        # OpenPI requires lerobot from this specific commit
-        lerobot_url = "git+https://github.com/huggingface/lerobot@0cf864870cf29f4738d3ade893e6fd13fbd7cdb5"
-        result = subprocess.run([
-            sys.executable, "-m", "pip", "install",
-            lerobot_url, "--no-cache-dir"
-        ], capture_output=True, text=True)
-        if result.returncode != 0:
-            print(f"lerobot installation failed: {result.stderr}")
-            return False
-        print("lerobot installed successfully")
-        return True
-def install_openpi():
-    """Install OpenPI from your forked repository using the GH_TOKEN."""
+# Note: Dependencies are installed via setup.sh before the app starts
+# This keeps the app code clean and separates installation logic
+# Run setup if dependencies aren't installed
+def check_and_install_dependencies():
+    """Check if dependencies are installed, run setup if not."""
     try:
+        import roboeval
+        import lerobot
         import openpi
-        print("OpenPI already installed")
+        print("All dependencies already installed")
         return True
-    except ImportError:
-        print("Installing OpenPI from tan7271/OpenPiRoboEval...")
-        gh_token = os.environ.get("GH_TOKEN")
-        if not gh_token:
-            raise RuntimeError("GH_TOKEN environment variable not set")
-        repo_url = f"https://{gh_token}@github.com/tan7271/OpenPiRoboEval.git"
-        # First install openpi-client from the subdirectory
-        print("Installing openpi-client...")
-        client_result = subprocess.run([
-            sys.executable, "-m", "pip", "install",
-            f"git+{repo_url}#subdirectory=packages/openpi-client", "--no-cache-dir", "--no-deps"
-        ], capture_output=True, text=True)
-        if client_result.returncode != 0:
-            print(f"openpi-client installation failed: {client_result.stderr}")
-        else:
-            print("openpi-client installed successfully")
-        # Then install OpenPI with --no-deps since all dependencies are in requirements.txt
-        result = subprocess.run([
-            sys.executable, "-m", "pip", "install",
-            f"git+{repo_url}", "--no-cache-dir", "--no-deps", "--force-reinstall"
-        ], capture_output=True, text=True)
+    except ImportError as e:
+        print(f"Missing dependency: {e}")
+        print("Running setup script...")
+        import subprocess
+        result = subprocess.run(["bash", "setup.sh"], cwd=os.path.dirname(__file__))
         if result.returncode != 0:
-            print(f"OpenPI installation failed: {result.stderr}")
-            # Try alternative approach - clone and install manually
-            print("Trying alternative installation method...")
-            try:
-                # Clone the repository
-                clone_result = subprocess.run([
-                    "git", "clone", "--depth", "1",
-                    f"https://{gh_token}@github.com/tan7271/OpenPiRoboEval.git",
-                    "/tmp/openpi"
-                ], capture_output=True, text=True)
-                if clone_result.returncode != 0:
-                    raise RuntimeError(f"Failed to clone repository: {clone_result.stderr}")
-                # Install in development mode with no dependencies
-                install_result = subprocess.run([
-                    sys.executable, "-m", "pip", "install",
-                    "-e", "/tmp/openpi", "--no-deps", "--force-reinstall"
-                ], capture_output=True, text=True)
-                if install_result.returncode != 0:
-                    raise RuntimeError(f"Failed to install in dev mode: {install_result.stderr}")
-                print("OpenPI installed successfully via alternative method")
-                return True
-            except Exception as e:
-                raise RuntimeError(f"All installation methods failed: {e}")
-        print("OpenPI installed successfully")
+            raise RuntimeError("Setup script failed")
         return True
-# Install packages
-install_roboeval()
-install_lerobot()
-# Upgrade safetensors to fix version conflict
-print("Upgrading safetensors to >=0.4.1...")
-result = subprocess.run([
-    sys.executable, "-m", "pip", "install",
-    "safetensors>=0.4.1", "--upgrade", "--no-cache-dir"
-], capture_output=True, text=True)
-if result.returncode == 0:
-    print("safetensors upgraded successfully")
-else:
-    print(f"safetensors upgrade failed: {result.stderr}")
-install_openpi()
+import datetime
+print(f"===== Application Startup at {datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')} =====\n")
+check_and_install_dependencies()
 # --- OpenPI (local inference) ---
 try:
@@ -284,7 +141,7 @@ _ENV_CLASSES = {
 DEFAULT_DEVICE = "cuda:0" if os.path.exists("/dev/nvidia0") else "cpu"
 DEFAULT_DOWNSAMPLE_RATE = 25
 DEFAULT_MAX_STEPS = 200
-DEFAULT_FPS = 5
+DEFAULT_FPS = 25
 # Check GPU availability and print diagnostics
 def check_gpu_status():
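`check_and_install_dependencies()` above detects missing packages by importing them, which runs each package's top-level import code just to test for presence. A lighter variant is sketched below, assuming that a module being locatable on `sys.path` is good enough evidence that it is installed. It uses `importlib.util.find_spec`, resolves the app directory to an absolute path before running `setup.sh`, and calls `importlib.invalidate_caches()` so packages installed by the script become visible to the already-running interpreter. `missing_modules`, `ensure_dependencies`, and the `base_dir` parameter are hypothetical, not part of this commit.

```python
import importlib
import importlib.util
import subprocess
from pathlib import Path

# Module names taken from check_and_install_dependencies(); adjust as needed.
REQUIRED_MODULES = ("roboeval", "lerobot", "openpi")


def missing_modules(names=REQUIRED_MODULES):
    """Return the names that cannot be located, without importing them."""
    missing = []
    for name in names:
        try:
            if importlib.util.find_spec(name) is None:
                missing.append(name)
        except ImportError:
            # Raised for dotted names (e.g. "lerobot.common") whose parent is absent.
            missing.append(name)
    return missing


def ensure_dependencies(names=REQUIRED_MODULES, setup_script="setup.sh", base_dir=None):
    """Run setup_script once if any module is missing; return what was missing."""
    missing = missing_modules(names)
    if not missing:
        return []
    # Absolute directory, so the caller's working directory does not matter.
    app_dir = Path(base_dir) if base_dir else Path(__file__).resolve().parent
    print(f"Missing modules: {', '.join(missing)}; running {setup_script}...")
    # check=True raises CalledProcessError if the script exits non-zero.
    subprocess.run(["bash", setup_script], cwd=app_dir, check=True)
    # Drop cached directory listings so freshly installed packages are found.
    importlib.invalidate_caches()
    still_missing = missing_modules(names)
    if still_missing:
        raise RuntimeError(f"Still missing after setup: {', '.join(still_missing)}")
    return missing
```

Re-checking after the script runs turns a silent partial install into an explicit error at startup rather than an `ImportError` later inside the Gradio callback.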

setup.sh ADDED

	@@ -0,0 +1,50 @@

+#!/bin/bash
+set -e
+echo "===== Installing Dependencies ====="
+# Install RoboEval with submodules
+echo "Installing RoboEval with submodules..."
+CLONE_DIR="/tmp/roboeval_install"
+rm -rf $CLONE_DIR
+# Clone with submodules
+echo "Cloning RoboEval repository with submodules..."
+git clone --recurse-submodules https://${GH_TOKEN}@github.com/helen9975/RoboEval.git $CLONE_DIR
+# Install
+echo "Installing RoboEval from cloned repository..."
+pip install $CLONE_DIR --no-cache-dir
+# Copy thirdparty to site-packages
+echo "Copying thirdparty submodules to site-packages..."
+SITE_PACKAGES=$(python -c "import site; print(site.getsitepackages()[0])")
+cp -r $CLONE_DIR/thirdparty $SITE_PACKAGES/
+echo "Copied thirdparty to $SITE_PACKAGES/thirdparty"
+echo "RoboEval installed successfully with submodules"
+# Install lerobot from specific commit
+echo "Installing lerobot from git (specific commit required by OpenPI)..."
+pip install git+https://github.com/huggingface/lerobot@0cf864870cf29f4738d3ade893e6fd13fbd7cdb5 --no-cache-dir
+echo "lerobot installed successfully"
+# Upgrade safetensors to fix version conflict
+echo "Upgrading safetensors to >=0.4.1..."
+pip install "safetensors>=0.4.1" --upgrade --no-cache-dir
+echo "safetensors upgraded successfully"
+# Install OpenPI with openpi-client
+echo "Installing OpenPI from tan7271/OpenPiRoboEval..."
+# Install openpi-client
+echo "Installing openpi-client..."
+pip install git+https://${GH_TOKEN}@github.com/tan7271/OpenPiRoboEval.git#subdirectory=packages/openpi-client --no-cache-dir --no-deps
+echo "openpi-client installed successfully"
+# Install OpenPI
+pip install git+https://${GH_TOKEN}@github.com/tan7271/OpenPiRoboEval.git --no-cache-dir --no-deps --force-reinstall
+echo "OpenPI installed successfully"
+echo "===== All dependencies installed ====="

start.py ADDED

	@@ -0,0 +1,28 @@

+#!/usr/bin/env python3
+"""
+Startup script that runs setup.sh before launching the Gradio app.
+"""
+import subprocess
+import sys
+import os
+def main():
+    """Run setup script then launch app."""
+    print("Running dependency setup...")
+    # Run setup script
+    result = subprocess.run(["bash", "setup.sh"], cwd=os.path.dirname(__file__))
+    if result.returncode != 0:
+        print("Setup failed!")
+        sys.exit(1)
+    print("\nLaunching Gradio app...")
+    # Import and run the app
+    from app import demo
+    demo.launch()
+if __name__ == "__main__":
+    main()

upload_checkpoint.py ADDED

	@@ -0,0 +1,77 @@

+#!/usr/bin/env python3
+"""
+Upload pi0_CubeHandover checkpoint to Hugging Face Hub
+"""
+from huggingface_hub import HfApi, create_repo
+import os
+# Configuration
+CHECKPOINT_DIR = "pi0_CubeHandover_ckpt"
+REPO_ID = "tan7271/pi0_CubeHandover_ckpt"  # Change "tan7271" to your HF username if different
+REPO_TYPE = "model"
+def upload_checkpoint():
+    """Upload checkpoint to Hugging Face Hub"""
+    # Check if checkpoint directory exists
+    if not os.path.exists(CHECKPOINT_DIR):
+        print(f"Error: Checkpoint directory '{CHECKPOINT_DIR}' not found!")
+        return
+    print(f"📦 Uploading checkpoint from: {CHECKPOINT_DIR}")
+    print(f"🎯 Target repository: {REPO_ID}")
+    print()
+    # Initialize Hugging Face API
+    api = HfApi()
+    try:
+        # Create repository (if it doesn't exist)
+        print("Creating repository on Hugging Face Hub...")
+        create_repo(
+            repo_id=REPO_ID,
+            repo_type=REPO_TYPE,
+            exist_ok=True,  # Don't fail if repo already exists
+            private=False,  # Set to True if you want a private repo
+        )
+        print(f"✅ Repository created/verified: https://huggingface.co/{REPO_ID}")
+        print()
+        # Upload the checkpoint folder (using upload_large_folder for large checkpoints)
+        print("📤 Uploading checkpoint files...")
+        print("(This may take a while depending on checkpoint size)")
+        print("Using upload_large_folder for better handling of large files...")
+        print()
+        api.upload_large_folder(
+            folder_path=CHECKPOINT_DIR,
+            repo_id=REPO_ID,
+            repo_type=REPO_TYPE,
+            num_workers=4,  # Upload files in parallel
+        )
+        print()
+        print("=" * 60)
+        print("🎉 Upload completed successfully!")
+        print("=" * 60)
+        print()
+        print(f"📍 Your checkpoint is now available at:")
+        print(f"   https://huggingface.co/{REPO_ID}")
+        print()
+        print(f"🔗 To use in your Gradio app, use this path:")
+        print(f'   checkpoint_path = "hf://{REPO_ID}"')
+        print()
+        print("📝 Note: It may take a few minutes for the files to be fully processed")
+        print()
+    except Exception as e:
+        print(f"❌ Error during upload: {e}")
+        print()
+        print("💡 Make sure you're logged in to Hugging Face:")
+        print("   Run: huggingface-cli login")
+        print("   Or set HF_TOKEN environment variable")
+if __name__ == "__main__":
+    upload_checkpoint()
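`upload_checkpoint.py` warns that the Hub may need a few minutes to process files. One way to confirm the upload afterwards is to compare the local folder with the repository's file list, which `HfApi.list_repo_files` returns. The comparison helpers below are a hypothetical sketch: they skip a top-level `.cache/` directory on the assumption that this is where `upload_large_folder` keeps its resumable-upload metadata inside the uploaded folder, and the remote listing is passed in so the logic can be exercised without network access.

```python
from pathlib import Path
from typing import Iterable, List

# Local metadata directory assumed to be written by upload_large_folder.
SKIP_PREFIXES = (".cache/",)


def local_files(folder: str) -> List[str]:
    """Sorted relative POSIX paths of every file under folder, minus skipped prefixes."""
    root = Path(folder)
    paths = []
    for path in root.rglob("*"):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            if not rel.startswith(SKIP_PREFIXES):
                paths.append(rel)
    return sorted(paths)


def missing_remote_files(folder: str, remote_files: Iterable[str]) -> List[str]:
    """Files present locally but absent from the repository listing."""
    remote = set(remote_files)
    return [rel for rel in local_files(folder) if rel not in remote]
```

For example, once the script finishes, pass `HfApi().list_repo_files(REPO_ID, repo_type="model")` as `remote_files` together with `CHECKPOINT_DIR`; an empty result means every local file is present on the Hub.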