metadata
title: action
emoji: 🦞
colorFrom: blue
colorTo: indigo
sdk: docker
sdk_version: 29.0.4
python_version: 3.14.4
app_port: 7860
app_file: mian.py
pinned: false

OpenClaw on Hugging Face Space (Docker)

Languages: English · Simplified Chinese
Deployment Guide: DEPLOY_GUIDE.md | Chinese Deployment Guide

This setup is designed to provide the following:

  • Build the OpenClaw container on top of ubuntu:24.04
  • Serve the OpenClaw dashboard directly on port 7860 (default Space access port)
  • Use third-party OpenAI-compatible base_url + api_key by default (injected via environment variables)
  • Store OpenClaw config/workspace under /root/.openclaw
  • Restore state automatically from a Hugging Face Dataset on startup
  • Run scheduled backups of OpenClaw data to a Hugging Face Dataset via cron (as root user)
  • Incremental backup + dynamic strategy + AES-256-CBC encryption + large file splitting
  • Backup watchdog (auto-triggers backup when cron fails)
  • SSH service with auto-healing watchdog + host key generation
  • CCMR (Claude Code Model Router) with 10 platform API key support
  • Multi-dataset restore (restore from a different dataset)
  • Preinstall python3, uv, vim, neovim, chromium (via Chrome for Testing archive), gh, hf, opencode, codex, claude (Claude Code CLI), @larksuite/cli (with npx skills add larksuite/cli -y -g), and sshx in the image for interactive terminal use

Repository Layout

  • Dockerfile: Runtime image for the Space
  • scripts/openclaw-entrypoint.sh: Main startup flow (restore, config generation, cron setup, gateway start)
  • scripts/hf-entrypoint.sh: HF Spaces container entrypoint (PID 1, manages supervisord + SSH + PM2 + BT Panel)
  • scripts/supervisord.conf: Supervisord config, manages cron, backup-watchdog, openclaw-gateway, ccmr-gateway
  • openclaw_hf/backup.py: Backup/restore implementation (full/incremental, encryption, split, dynamic strategy, resume)
  • scripts/openclaw-backup-cron.sh: Cron entrypoint for backup jobs
  • scripts/openclaw-backup-watchdog.sh: Backup watchdog, auto-triggers backup when overdue
  • scripts/openclaw-backup-health.sh: Backup health check & auto-repair
  • scripts/openclaw-restore.sh: Startup restore entrypoint
  • scripts/openclaw-gateway-ctl: Gateway process management (start/stop/restart/reload)
  • scripts/openclaw-env-sync.sh: Sync environment variables from HF API
  • scripts/update-env-from-secrets.sh: Fetch latest env vars from HF API
  • scripts/bt_install_panel_custom.sh: BT Panel installation script
  • scripts/bootstrap-hf.sh: Interactive bootstrap for Space/Dataset creation, upload, and Space variables/secrets setup (macOS/Linux)
  • scripts/bootstrap-hf.ps1: Interactive bootstrap for Space/Dataset creation, upload, and Space variables/secrets setup (Windows PowerShell)
  • scripts/rebuild-space.sh: Force push latest code to Space and trigger rebuild
  • scripts/delete-backups.sh: Batch cleanup old backups from Dataset
  • scripts/delete-hf.py: HF resource deletion tool (Space/Dataset/files/storage)
  • scripts/find-largest-backup.py: Find best backup in Dataset
  • scripts/ssh_service_watchdog.sh: SSH service watchdog (process monitor + auto-recovery)
  • scripts/check_ssh_health.sh: SSH health check (used by Docker HEALTHCHECK)
  • scripts/ssh-agent-autostart.sh: SSH agent auto-start and key loading
  • scripts/optimize_ssh.sh: SSH configuration optimization
  • scripts/save-env.sh: Save environment to /etc/profile.d
  • scripts/hf-storage.sh / scripts/hf-storage.py: HuggingFace storage utilities
  • scripts/ccmr-setup.sh: CCMR configuration generation
  • scripts/ccmr-wrapper.sh: CCMR Supervisor wrapper (hot-reload + crash recovery)
  • scripts/server.js: PID 1 keep-alive HTTP server
  • pm2/ecosystem.config.js: PM2 configuration (optional extension)
  • tests/test_backup.py: Unit tests for the backup module
  • tests/test_entrypoint_config.py: Unit tests for gateway config generation behavior

Required Variables (Space Settings)

In your Hugging Face Space (Settings -> Variables and secrets), configure at least:

  • Variable: OPENCLAW_BACKUP_DATASET_REPO: Backup target Dataset in username/dataset-name format
  • Secret: HF_TOKEN: Used to write backups to the Dataset (must have write permission to that Dataset)
  • Secret: OPENCLAW_GATEWAY_TOKEN: Gateway token (recommended; if left empty during deployment, a random 32-character value is generated)
  • Secret: OPENCLAW_GATEWAY_PASSWORD: Gateway password (optional; if left empty during deployment, a random 16-character value is generated)

When using ./scripts/bootstrap-hf.sh (macOS/Linux) or ./scripts/bootstrap-hf.ps1 (Windows PowerShell), these values are configured automatically on the target Space.

Optional LLM Variables (All-Or-None)

Set all of these together only when you want OpenClaw to preconfigure a custom third-party model:

  • Variable: OPENCLAW_LLM_BASE_URL: Third-party base URL (for example OpenAI-compatible /v1)
  • Variable: OPENCLAW_LLM_MODEL: Third-party model ID
  • Secret: OPENCLAW_LLM_API_KEY: Third-party API key

If any of the three is missing, the entrypoint skips custom model configuration. In that case, you can still configure a model from inside the container (for example via sshx).
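The all-or-none rule can be sketched in Python as follows. This is a minimal illustration, not the entrypoint's actual code: the three variable names come from this README, while the function name `llm_settings_from_env` and the shape of the returned dict are hypothetical.

```python
import os
from typing import Mapping, Optional

# Variable names are taken from this README; everything else is illustrative.
LLM_KEYS = ("OPENCLAW_LLM_BASE_URL", "OPENCLAW_LLM_MODEL", "OPENCLAW_LLM_API_KEY")


def llm_settings_from_env(env: Mapping[str, str]) -> Optional[dict]:
    """Return the LLM triplet only when all three values are non-empty."""
    values = {key: env.get(key, "").strip() for key in LLM_KEYS}
    if not all(values.values()):
        return None  # all-or-none: a partial triplet is treated as absent
    return {
        "base_url": values["OPENCLAW_LLM_BASE_URL"],
        "model": values["OPENCLAW_LLM_MODEL"],
        "api_key": values["OPENCLAW_LLM_API_KEY"],
    }


if __name__ == "__main__":
    settings = llm_settings_from_env(os.environ)
    print("custom model configured" if settings else "custom model skipped")
```

Treating whitespace-only values as missing is a design choice of this sketch; the real entrypoint may be stricter or looser.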

Common Optional Variables

| Variable | Default | Description |
| --- | --- | --- |
| OPENCLAW_VERSION | latest | OpenClaw version for Docker install |
| OPENCLAW_GATEWAY_PORT | 18789 | Gateway listen port |
| OPENCLAW_GATEWAY_BIND | lan | Gateway bind mode (lan/local) |
| OPENCLAW_STATE_DIR | /root/.openclaw | OpenClaw state directory |
| OPENCLAW_USER | root | Runtime user for gateway and cron |
| OPENCLAW_GROUP | root | Runtime group |
| OPENCLAW_CONFIG_PATH | /root/.openclaw/openclaw.json | Gateway config path |
| OPENCLAW_WORKSPACE_DIR | /root/.openclaw/workspace | Workspace directory |
| OPENCLAW_BACKUP_CRON | */10 * * * * | Backup cron expression |
| OPENCLAW_BACKUP_SOURCE_DIR | /root/.openclaw | Backup/restore base directory |
| OPENCLAW_BACKUP_ROOT_*_DIR | Various | Extra backup dirs (config, codex, claude, agents, ssh, env, npm, lark-cli) |
| OPENCLAW_BACKUP_PATH_PREFIX | backups | Backup path prefix |
| OPENCLAW_BACKUP_KEEP_COUNT | 24 | Number of backups to keep |
| OPENCLAW_BACKUP_ENCRYPTION_ENABLED | false | Enable AES-256-CBC encryption |
| OPENCLAW_BACKUP_SPLIT_SIZE | 500M | Large file split volume size |
| OPENCLAW_INCREMENTAL_BACKUP | true | Enable incremental backup |
| OPENCLAW_DYNAMIC_BACKUP | true | Enable dynamic backup strategy |
| OPENCLAW_FULL_BACKUP_INTERVAL_HOURS | 1 | Force full backup interval (hours) |
| OPENCLAW_MAX_INCREMENTAL_BACKUPS | 15 | Max incremental backups before full |
| OPENCLAW_RESTORE_TIMEOUT | 5400 | Restore timeout (seconds, 90 min) |
| WATCHDOG_INTERVAL | 600 | Backup watchdog check interval (seconds) |
| MAX_BACKUP_AGE_MINUTES | 30 | Max backup age (minutes) |
| FORCE_BACKUP_INTERVAL | 14400 | Force backup interval (seconds) |
| OPENCLAW_SSHX_AUTO_START | false | Auto-start sshx on boot |
| OPENCLAW_GATEWAY_AUTH_MODE | token | Auth mode (token/password) |
| ROOT_PASSWORD | lauer3912 | SSH root password |
| CCMR_ENABLED | false | Enable Claude Code Model Router |
| CCMR_PORT | 8080 | CCMR gateway port |

Quick Deployment

Run the interactive bootstrap script for your platform from the repo root.

macOS/Linux:

./scripts/bootstrap-hf.sh

Windows PowerShell:

powershell -ExecutionPolicy ByPass -File .\scripts\bootstrap-hf.ps1

bootstrap-hf.sh / bootstrap-hf.ps1 will:

  • Check/install hf CLI:
    • macOS/Linux: curl -LsSf https://hf.co/cli/install.sh | bash
    • Windows PowerShell: powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"
  • Resolve HF auth first (before all other variables):
    • if hf auth whoami reports no logged-in user: prompt for HF_TOKEN and run hf auth login --token <HF_TOKEN>
    • if already logged in: ask whether to use the current user
      • choose yes: continue
      • choose no: back up the current token, prompt for a new HF_TOKEN, run hf auth login --token <HF_TOKEN>, and restore the previous token at the end
  • Ask for space_name, dataset_name, OPENCLAW_VERSION, gateway token/password, and optional LLM settings
  • Default OPENCLAW_VERSION to the latest version detected from the npm registry (openclaw), falling back to latest when detection fails
  • Auto-generate OPENCLAW_GATEWAY_TOKEN (32 chars) and OPENCLAW_GATEWAY_PASSWORD (16 chars) if left empty
  • Create private Space + Dataset and upload this repository
  • Configure Space Variables and secrets automatically, including:
    • OPENCLAW_BACKUP_DATASET_REPO
    • OPENCLAW_VERSION
    • HF_TOKEN
    • OPENCLAW_GATEWAY_TOKEN
    • OPENCLAW_GATEWAY_PASSWORD
    • OPENCLAW_GATEWAY_CONTROLUI_ALLOW_INSECURE_AUTH=false
    • OPENCLAW_GATEWAY_CONTROLUI_DANGEROUSLY_DISABLE_DEVICE_AUTH=false
  • Optionally configure LLM triplet and set OPENCLAW_SSHX_AUTO_START from prompt choice (true/false)
  • Print planned deployment settings and require a final confirmation before creating/updating Space/Dataset resources
  • Print the Hugging Face Space page URL, app URL, and /healthz URL

If gateway token/password were auto-generated, the script prints them at the end.
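For readers who set the secrets by hand instead of using the bootstrap script, values of the same lengths can be generated as sketched below. The lengths (32 and 16) come from this README; the alphanumeric alphabet and the helper name `random_secret` are assumptions, since the character set used by the bootstrap scripts is not stated here.

```python
import secrets
import string

# Assumption: alphanumeric characters only. This README fixes the lengths
# (32 for the token, 16 for the password) but not the character set.
ALPHABET = string.ascii_letters + string.digits


def random_secret(length: int) -> str:
    """Build a random string using the OS cryptographic random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


gateway_token = random_secret(32)     # candidate OPENCLAW_GATEWAY_TOKEN
gateway_password = random_secret(16)  # candidate OPENCLAW_GATEWAY_PASSWORD
```

The `secrets` module is used rather than `random` because these values act as credentials.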

Agent Hand-off Prompt

Copy and send to your agent:

Please deploy OpenClaw to Hugging Face by strictly following the deployment skill in https://github.com/tenfyzhong/openclaw-hf/blob/main/SKILL.md

Hugging Face Keep-Alive

How to keep a Space available depends on hardware tier:

  • Free cpu-basic: the Space sleeps after inactivity (currently around 48h). It cannot be configured to run forever on free hardware.
  • Paid hardware: the Space runs continuously by default. In Settings -> Hardware, set Sleep time to Never (or use API with sleep_time=-1) for true 24/7 availability.
  • Cost-saving mode on paid hardware: set a custom Sleep time (for example 3600 seconds) so it auto-sleeps and auto-wakes on the next visit.

Space URL composition:

  • Space repo ID format: <owner>/<space_name> (example: tenfyzhong/openclaw-hf)
  • Public runtime host format: https://<owner>-<space_name>.hf.space
  • OpenClaw health check URL: https://<owner>-<space_name>.hf.space/healthz
  • Inside the Space runtime, Hugging Face also provides SPACE_HOST, so the health URL can be built as https://${SPACE_HOST}/healthz.

Example:

OPENCLAW_HF_SPACE_ID="tenfyzhong/openclaw-hf"
SPACE_HOST="${OPENCLAW_HF_SPACE_ID/\//-}.hf.space"
HEALTH_URL="https://${SPACE_HOST}/healthz"
echo "$HEALTH_URL"

Keep-alive by periodic health checks:

*/12 * * * * HF_TOKEN=hf_xxx /path/to/repo/scripts/check-space-health.sh tenfyzhong/openclaw-hf >/dev/null || true

Notes:

  • For private Spaces, unauthenticated calls to https://<owner>-<space_name>.hf.space/healthz return a Hub 404 page. This is expected access control behavior.
  • For private Spaces, include Authorization: Bearer <HF_TOKEN> (the helper script above does this automatically via HF_TOKEN or HUGGINGFACE_HUB_TOKEN).
  • This ping strategy is a practical workaround for reducing idle sleep on free hardware, but it is not a guaranteed always-on method.
  • If you need strict 24/7 uptime, use paid hardware and set sleep time to Never.
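The authenticated health check described in the notes can be sketched with the Python standard library. This is an illustration only and is not the helper script's implementation; the function names `space_host` and `build_health_request` are hypothetical, and the host is derived exactly as in the URL composition rules above.

```python
import urllib.request
from typing import Optional


def space_host(space_id: str) -> str:
    """Turn '<owner>/<space_name>' into '<owner>-<space_name>.hf.space'."""
    owner, name = space_id.split("/", 1)
    return f"{owner}-{name}.hf.space"


def build_health_request(space_id: str, token: Optional[str] = None) -> urllib.request.Request:
    """Prepare a GET request for /healthz, adding a Bearer token for private Spaces."""
    request = urllib.request.Request(f"https://{space_host(space_id)}/healthz")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request


if __name__ == "__main__":
    req = build_health_request("tenfyzhong/openclaw-hf", token="hf_xxx")
    print(req.full_url)
    # To actually send it: urllib.request.urlopen(req, timeout=10)
```

Building the request separately from sending it keeps the URL and header logic checkable without network access.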

Programmatic options (owner token required):

from huggingface_hub import HfApi

api = HfApi(token="hf_xxx")
repo_id = "your-username/your-space"

# Keep running (paid hardware)
api.set_space_sleep_time(repo_id=repo_id, sleep_time=-1)

# Or sleep after 1 hour of inactivity
api.set_space_sleep_time(repo_id=repo_id, sleep_time=3600)

# Manual control
api.pause_space(repo_id=repo_id)
api.restart_space(repo_id=repo_id)

For this project, if you need stable dashboard access without cold starts, use paid hardware and set sleep time to Never.

SSH Service

The container protects the SSH service with several layers of monitoring and repair to keep sshd available:

  • Auto-start: Entrypoint generates host keys, cleans stale PID files, starts sshd
  • SSH Watchdog (ssh_service_watchdog.sh): Monitors sshd every 30s, auto-recovers on failure
  • Multi-level repair: Config corruption → backup config → minimal config → auto-reinstall openssh-server
  • Exponential backoff: Gradually increases wait time on consecutive failures
  • Health check (check_ssh_health.sh): Used by Docker HEALTHCHECK
  • SSH Agent auto-load: Auto-starts ssh-agent and loads keys from /root/.ssh/
  • Root password: Set via ROOT_PASSWORD environment variable
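The exponential backoff mentioned above can be sketched as a small delay function. The 30-second base matches the watchdog's monitoring interval from this README; the doubling factor, the 480-second cap, and the name `backoff_delay` are assumptions, since the actual values in ssh_service_watchdog.sh are not documented here.

```python
def backoff_delay(consecutive_failures: int, base: float = 30.0, cap: float = 480.0) -> float:
    """Seconds to wait before the next recovery attempt.

    The delay doubles with each consecutive failure and is clamped to `cap`
    so that a long outage never pushes the wait time up without bound.
    """
    if consecutive_failures <= 0:
        return base
    return min(cap, base * (2 ** consecutive_failures))


if __name__ == "__main__":
    for failures in range(6):
        print(failures, backoff_delay(failures))
```

Resetting the failure counter to zero after a successful recovery returns the watchdog to its normal check interval.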

CCMR (Claude Code Model Router)

CCMR gateway is integrated and managed by Supervisord with hot-reload support:

  • Auto-config: Set CCMR_*_API_KEY env vars to enable
  • 10 API Key slots: DeepSeek, Qwen, Kimi, GLM, MiniMax (CN/Global), MiMo (SGP/CN/AMS/PAYG)
  • File hot-reload: Edit /root/.env.d/ccmr.env and changes apply immediately without restart
  • Crash recovery: Supervisord auto-restarts CCMR process
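One way a wrapper could notice edits to an env file is to poll its modification time and re-parse on change. The sketch below is an assumption about the mechanism, not a description of ccmr-wrapper.sh; it also assumes a plain KEY=VALUE file format, and `parse_env_file`, `EnvFileWatcher`, and the key name used in the usage example are hypothetical.

```python
from pathlib import Path
from typing import Optional


def parse_env_file(path: Path) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and '#' comments (assumed format)."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


class EnvFileWatcher:
    """Report new values whenever the file's modification time changes."""

    def __init__(self, path: str) -> None:
        self.path = Path(path)
        self._last_mtime: Optional[int] = None

    def poll(self) -> Optional[dict]:
        """Return freshly parsed values if the file changed since the last poll."""
        try:
            mtime = self.path.stat().st_mtime_ns
        except FileNotFoundError:
            return None
        if mtime == self._last_mtime:
            return None
        self._last_mtime = mtime
        return parse_env_file(self.path)
```

A caller would construct `EnvFileWatcher("/root/.env.d/ccmr.env")` and call `poll()` on a timer, reloading the gateway only when it returns a dict.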

Backup/Restore Flow

Restore

Automatic restore on startup (always runs on container restart/rebuild):

  • openclaw-state -> OPENCLAW_BACKUP_SOURCE_DIR (default /root/.openclaw)
  • root-config -> OPENCLAW_BACKUP_ROOT_CONFIG_DIR (default /root/.config)
  • root-codex -> OPENCLAW_BACKUP_ROOT_CODEX_DIR (default /root/.codex)
  • root-claude -> OPENCLAW_BACKUP_ROOT_CLAUDE_DIR (default /root/.claude)
  • root-agents -> OPENCLAW_BACKUP_ROOT_AGENTS_DIR (default /root/.agents)
  • root-ssh -> OPENCLAW_BACKUP_ROOT_SSH_DIR (default /root/.ssh)
  • root-env -> OPENCLAW_BACKUP_ROOT_ENV_DIR (default /root/.env.d)
  • root-npm -> OPENCLAW_BACKUP_ROOT_NPM_DIR (default /root/.npm)
  • root-lark-cli -> OPENCLAW_BACKUP_ROOT_LARK_CLI_DIR (default /root/.lark-cli)

Multi-dataset restore: set OPENCLAW_RESTORE_DATASET_REPO to restore from a different dataset.
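The restore mapping above can be expressed as a small lookup. The labels, variable names, and default directories are copied from this README; the function name `resolve_restore_plan` is hypothetical, and the fallback to OPENCLAW_BACKUP_DATASET_REPO when OPENCLAW_RESTORE_DATASET_REPO is unset is an assumption about how the restore script behaves.

```python
from typing import Mapping, Optional, Tuple

# (backup label, environment variable, default directory), as listed in this README.
RESTORE_TARGETS = [
    ("openclaw-state", "OPENCLAW_BACKUP_SOURCE_DIR", "/root/.openclaw"),
    ("root-config", "OPENCLAW_BACKUP_ROOT_CONFIG_DIR", "/root/.config"),
    ("root-codex", "OPENCLAW_BACKUP_ROOT_CODEX_DIR", "/root/.codex"),
    ("root-claude", "OPENCLAW_BACKUP_ROOT_CLAUDE_DIR", "/root/.claude"),
    ("root-agents", "OPENCLAW_BACKUP_ROOT_AGENTS_DIR", "/root/.agents"),
    ("root-ssh", "OPENCLAW_BACKUP_ROOT_SSH_DIR", "/root/.ssh"),
    ("root-env", "OPENCLAW_BACKUP_ROOT_ENV_DIR", "/root/.env.d"),
    ("root-npm", "OPENCLAW_BACKUP_ROOT_NPM_DIR", "/root/.npm"),
    ("root-lark-cli", "OPENCLAW_BACKUP_ROOT_LARK_CLI_DIR", "/root/.lark-cli"),
]


def resolve_restore_plan(env: Mapping[str, str]) -> Tuple[Optional[str], dict]:
    """Pick the source dataset and the target directory for each backup label."""
    # Assumption: the restore-specific repo wins, otherwise the backup repo is used.
    repo = env.get("OPENCLAW_RESTORE_DATASET_REPO") or env.get("OPENCLAW_BACKUP_DATASET_REPO")
    dirs = {label: env.get(var) or default for label, var, default in RESTORE_TARGETS}
    return repo, dirs
```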

Backup

  • Scheduled backup: Runs based on OPENCLAW_BACKUP_CRON (default every 10 min)
  • Incremental backup (default on): Only backs up changed files after a full backup
  • Dynamic strategy (default on): Auto-adjusts compression and splitting based on file size and change rate
  • AES-256-CBC encryption: Optional, allows secure storage on public datasets
  • Large file splitting: Default 500MB per volume, avoids upload failures
  • Resume support: Creates checkpoint files during upload, allows resume on interruption
  • Shutdown backup: Final backup before container exit on stop signal
  • Retention: Keeps newest OPENCLAW_BACKUP_KEEP_COUNT (default 24) archives, auto-deletes older ones
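How the full/incremental choice and the retention rule might fit together is sketched below. The defaults mirror the variables in this README (OPENCLAW_FULL_BACKUP_INTERVAL_HOURS=1, OPENCLAW_MAX_INCREMENTAL_BACKUPS=15, OPENCLAW_BACKUP_KEEP_COUNT=24), but the decision order, the function names, and the assumption that archive names sort chronologically are illustrative and not taken from backup.py.

```python
from typing import Iterable, List, Optional


def choose_backup_kind(
    hours_since_full: Optional[float],
    incrementals_since_full: int,
    incremental_enabled: bool = True,
    full_interval_hours: float = 1.0,
    max_incrementals: int = 15,
) -> str:
    """Return 'full' or 'incremental' for the next backup run."""
    if not incremental_enabled or hours_since_full is None:
        return "full"  # no baseline yet, or incremental mode is switched off
    if hours_since_full >= full_interval_hours:
        return "full"  # the last full backup is too old
    if incrementals_since_full >= max_incrementals:
        return "full"  # the incremental chain has grown too long
    return "incremental"


def archives_to_delete(names: Iterable[str], keep_count: int = 24) -> List[str]:
    """Return archives beyond the newest `keep_count`.

    Assumes names embed a sortable timestamp, so reverse lexical order
    puts the newest archive first.
    """
    ordered = sorted(names, reverse=True)
    return ordered[keep_count:]
```

Capping the incremental chain bounds how many archives a restore has to replay on top of the last full backup.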

Backup Watchdog

openclaw-backup-watchdog.sh acts as the last line of defense:

  • Auto-triggers a backup when none has completed within MAX_BACKUP_AGE_MINUTES (default 30 min)
  • Force backup every FORCE_BACKUP_INTERVAL (default 4 hours)
  • File lock prevents concurrent execution
  • Automatic backoff on consecutive failures
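The watchdog rules above can be sketched as a decision function plus a non-blocking file lock. The default thresholds come from this README; the function names, the handling of a missing backup timestamp, and the use of `fcntl.flock` (Linux/macOS only) are assumptions, since openclaw-backup-watchdog.sh is a shell script whose internals are not shown here.

```python
import fcntl
from contextlib import contextmanager
from typing import Iterator, Optional


def watchdog_should_backup(
    minutes_since_last_backup: Optional[float],
    seconds_since_forced_backup: float,
    max_backup_age_minutes: float = 30,
    force_backup_interval: float = 14400,
) -> bool:
    """Decide whether the watchdog should trigger a backup now."""
    if minutes_since_last_backup is None:
        return True  # assumption: no recorded backup counts as overdue
    if minutes_since_last_backup >= max_backup_age_minutes:
        return True  # cron appears to have missed its schedule
    return seconds_since_forced_backup >= force_backup_interval


@contextmanager
def single_instance(lock_path: str) -> Iterator[None]:
    """Hold an exclusive lock; raises BlockingIOError if another run holds it."""
    handle = open(lock_path, "w")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        yield
    finally:
        handle.close()  # closing the file releases the lock
```

A caller would wrap the backup invocation in `with single_instance(...)` and treat `BlockingIOError` as "another backup is already running".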

Use sshx Inside the Container

sshx is preinstalled in the image.

  1. Auto-start sshx in the background via an environment variable:
OPENCLAW_SSHX_AUTO_START=true

When enabled, the entrypoint starts sshx in the background and sends its output directly to the container stdout/stderr logs (no file logging).

  2. Start it manually inside the container:
sshx
  3. Let OpenClaw start the process itself (run in an OpenClaw terminal/tool):
nohup sshx >/proc/1/fd/1 2>/proc/1/fd/2 &
  4. After use, stop the sshx process promptly:
pgrep -fa sshx
pkill -TERM -f '(^|/)sshx($| )'

Local Test

python3 -m unittest discover -s tests -p 'test_*.py'

Pull Requests to main run GitHub Actions CI automatically (.github/workflows/pr-ci.yml):

  • Unit tests: python3 -m unittest discover -s tests -p 'test_*.py'
  • Docker image build: docker build (via Buildx) with OPENCLAW_VERSION=latest

License

MIT. See LICENSE.