title: action
emoji: 🦞
colorFrom: blue
colorTo: indigo
sdk: docker
sdk_version: 29.0.4
python_version: 3.14.4
app_port: 7860
app_file: mian.py
pinned: false

OpenClaw on Hugging Face Space (Docker)

Languages: English · Simplified Chinese (简体中文)
Deployment Guide: DEPLOY_GUIDE.md | Chinese deployment guide (中文部署指南)

This setup is designed to provide the following:

Build the OpenClaw container on top of ubuntu:24.04
Serve the OpenClaw dashboard directly on port 7860 (default Space access port)
Use third-party OpenAI-compatible base_url + api_key by default (injected via environment variables)
Store OpenClaw config/workspace under /root/.openclaw
Restore state automatically from a Hugging Face Dataset on startup
Run scheduled backups of OpenClaw data to a Hugging Face Dataset via cron (as the root user)
Support incremental backups, a dynamic backup strategy, AES-256-CBC encryption, and large-file splitting
Run a backup watchdog that triggers a backup when cron fails
Run an SSH service with an auto-healing watchdog and host key generation
Include CCMR (Claude Code Model Router) with support for 10 platform API keys
Support multi-dataset restore (restore from a different dataset)
Preinstall python3, uv, vim, neovim, chromium (via Chrome for Testing archive), gh, hf, opencode, codex, claude (Claude Code CLI), @larksuite/cli (with npx skills add larksuite/cli -y -g), and sshx in the image for interactive terminal use

Repository Layout

Dockerfile: Runtime image for the Space
scripts/openclaw-entrypoint.sh: Main startup flow (restore, config generation, cron setup, gateway start)
scripts/hf-entrypoint.sh: HF Spaces container entrypoint (PID 1, manages supervisord + SSH + PM2 + BT Panel)
scripts/supervisord.conf: Supervisord config, manages cron, backup-watchdog, openclaw-gateway, ccmr-gateway
openclaw_hf/backup.py: Backup/restore implementation (full/incremental, encryption, split, dynamic strategy, resume)
scripts/openclaw-backup-cron.sh: Cron entrypoint for backup jobs
scripts/openclaw-backup-watchdog.sh: Backup watchdog, auto-triggers backup when overdue
scripts/openclaw-backup-health.sh: Backup health check & auto-repair
scripts/openclaw-restore.sh: Startup restore entrypoint
scripts/openclaw-gateway-ctl: Gateway process management (start/stop/restart/reload)
scripts/openclaw-env-sync.sh: Sync environment variables from HF API
scripts/update-env-from-secrets.sh: Fetch latest env vars from HF API
scripts/bt_install_panel_custom.sh: BT Panel installation script
scripts/bootstrap-hf.sh: Interactive bootstrap for Space/Dataset creation, upload, and Space variables/secrets setup (macOS/Linux)
scripts/bootstrap-hf.ps1: Interactive bootstrap for Space/Dataset creation, upload, and Space variables/secrets setup (Windows PowerShell)
scripts/rebuild-space.sh: Force push latest code to Space and trigger rebuild
scripts/delete-backups.sh: Batch cleanup old backups from Dataset
scripts/delete-hf.py: HF resource deletion tool (Space/Dataset/files/storage)
scripts/find-largest-backup.py: Find best backup in Dataset
scripts/ssh_service_watchdog.sh: SSH service watchdog (process monitor + auto-recovery)
scripts/check_ssh_health.sh: SSH health check (used by Docker HEALTHCHECK)
scripts/ssh-agent-autostart.sh: SSH agent auto-start and key loading
scripts/optimize_ssh.sh: SSH configuration optimization
scripts/save-env.sh: Save environment to /etc/profile.d
scripts/hf-storage.sh / scripts/hf-storage.py: HuggingFace storage utilities
scripts/ccmr-setup.sh: CCMR configuration generation
scripts/ccmr-wrapper.sh: CCMR Supervisor wrapper (hot-reload + crash recovery)
scripts/server.js: PID 1 keep-alive HTTP server
pm2/ecosystem.config.js: PM2 configuration (optional extension)
tests/test_backup.py: Unit tests for the backup module
tests/test_entrypoint_config.py: Unit tests for gateway config generation behavior

Required Variables (Space Settings)

In your Hugging Face Space (Settings -> Variables and secrets), configure at least:

Variable: OPENCLAW_BACKUP_DATASET_REPO: Backup target Dataset in username/dataset-name format
Secret: HF_TOKEN: Used to write backups to the Dataset (must have write permission to that Dataset)
Secret: OPENCLAW_GATEWAY_TOKEN: Gateway token (recommended; if omitted, the deployment workflow generates a random 32-character value)
Secret: OPENCLAW_GATEWAY_PASSWORD: Gateway password (optional; if omitted, the deployment workflow generates a random 16-character value)

When using ./scripts/bootstrap-hf.sh (macOS/Linux) or ./scripts/bootstrap-hf.ps1 (Windows PowerShell), these values are configured automatically on the target Space.

Optional LLM Variables (All-Or-None)

Set all of these together only when you want OpenClaw to preconfigure a custom third-party model:

Variable: OPENCLAW_LLM_BASE_URL: Third-party base URL (for example OpenAI-compatible /v1)
Variable: OPENCLAW_LLM_MODEL: Third-party model ID
Secret: OPENCLAW_LLM_API_KEY: Third-party API key

If any of the three is missing, the entrypoint skips custom model configuration. In that case, you can still configure a model from inside the container (for example via sshx).
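The all-or-none rule can be illustrated with a short Python sketch. This is not the entrypoint's actual code; the function name `resolve_llm_settings` is hypothetical, and treating whitespace-only values as missing is an assumption.

```python
from typing import Mapping, Optional

# The three settings named in this section.
LLM_KEYS = ("OPENCLAW_LLM_BASE_URL", "OPENCLAW_LLM_MODEL", "OPENCLAW_LLM_API_KEY")


def resolve_llm_settings(env: Mapping[str, str]) -> Optional[dict]:
    """Return all three LLM settings, or None if any one is missing or empty."""
    values = {key: env.get(key, "").strip() for key in LLM_KEYS}
    if not all(values.values()):
        # Assumption: a partially configured triplet is ignored entirely.
        return None
    return values
```

Calling `resolve_llm_settings(os.environ)` returns `None` unless every one of the three values is set, which mirrors the skip behavior described above.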

Common Optional Variables

Variable	Default	Description
`OPENCLAW_VERSION`	`latest`	OpenClaw version for Docker install
`OPENCLAW_GATEWAY_PORT`	`18789`	Gateway listen port
`OPENCLAW_GATEWAY_BIND`	`lan`	Gateway bind mode (`lan`/`local`)
`OPENCLAW_STATE_DIR`	`/root/.openclaw`	OpenClaw state directory
`OPENCLAW_USER`	`root`	Runtime user for gateway and cron
`OPENCLAW_GROUP`	`root`	Runtime group
`OPENCLAW_CONFIG_PATH`	`/root/.openclaw/openclaw.json`	Gateway config path
`OPENCLAW_WORKSPACE_DIR`	`/root/.openclaw/workspace`	Workspace directory
`OPENCLAW_BACKUP_CRON`	`*/10 * * * *`	Backup cron expression (every 10 minutes)
`OPENCLAW_BACKUP_SOURCE_DIR`	`/root/.openclaw`	Backup/restore base directory
`OPENCLAW_BACKUP_ROOT_*_DIR`	Various	Extra backup dirs (config, codex, claude, agents, ssh, env, npm, lark-cli)
`OPENCLAW_BACKUP_PATH_PREFIX`	`backups`	Backup path prefix
`OPENCLAW_BACKUP_KEEP_COUNT`	`24`	Number of backups to keep
`OPENCLAW_BACKUP_ENCRYPTION_ENABLED`	`false`	Enable AES-256-CBC encryption
`OPENCLAW_BACKUP_SPLIT_SIZE`	`500M`	Large file split volume size
`OPENCLAW_INCREMENTAL_BACKUP`	`true`	Enable incremental backup
`OPENCLAW_DYNAMIC_BACKUP`	`true`	Enable dynamic backup strategy
`OPENCLAW_FULL_BACKUP_INTERVAL_HOURS`	`1`	Force full backup interval
`OPENCLAW_MAX_INCREMENTAL_BACKUPS`	`15`	Max incremental backups before full
`OPENCLAW_RESTORE_TIMEOUT`	`5400`	Restore timeout (seconds, 90 min)
`WATCHDOG_INTERVAL`	`600`	Backup watchdog check interval (s)
`MAX_BACKUP_AGE_MINUTES`	`30`	Max backup age (minutes)
`FORCE_BACKUP_INTERVAL`	`14400`	Force backup interval (seconds)
`OPENCLAW_SSHX_AUTO_START`	`false`	Auto-start `sshx` on boot
`OPENCLAW_GATEWAY_AUTH_MODE`	`token`	Auth mode (`token`/`password`)
`ROOT_PASSWORD`	`lauer3912`	SSH root password
`CCMR_ENABLED`	`false`	Enable Claude Code Model Router
`CCMR_PORT`	`8080`	CCMR gateway port

Quick Deployment

Run the interactive bootstrap script from the repo root.

macOS/Linux:

./scripts/bootstrap-hf.sh

Windows PowerShell:

powershell -ExecutionPolicy ByPass -File .\scripts\bootstrap-hf.ps1

bootstrap-hf.sh / bootstrap-hf.ps1 will:

Check/install hf CLI:
- macOS/Linux: curl -LsSf https://hf.co/cli/install.sh | bash
- Windows PowerShell: powershell -ExecutionPolicy ByPass -c "irm https://hf.co/cli/install.ps1 | iex"
Resolve HF auth first (before all other variables):
- if hf auth whoami is not logged in: prompt HF_TOKEN and run hf auth login --token <HF_TOKEN>
- if already logged in: ask whether to use current user
  - choose yes: continue
  - choose no: backup current token, prompt new HF_TOKEN, run hf auth login --token <HF_TOKEN>, and restore the previous token at the end
Ask for space_name, dataset_name, OPENCLAW_VERSION, gateway token/password, and optional LLM settings
Default OPENCLAW_VERSION to the latest version detected from the npm registry (openclaw), falling back to latest when detection fails
Auto-generate OPENCLAW_GATEWAY_TOKEN (32 chars) and OPENCLAW_GATEWAY_PASSWORD (16 chars) if left empty
Create private Space + Dataset and upload this repository
Configure Space Variables and secrets automatically, including:
- OPENCLAW_BACKUP_DATASET_REPO
- OPENCLAW_VERSION
- HF_TOKEN
- OPENCLAW_GATEWAY_TOKEN
- OPENCLAW_GATEWAY_PASSWORD
- OPENCLAW_GATEWAY_CONTROLUI_ALLOW_INSECURE_AUTH=false
- OPENCLAW_GATEWAY_CONTROLUI_DANGEROUSLY_DISABLE_DEVICE_AUTH=false
Optionally configure LLM triplet and set OPENCLAW_SSHX_AUTO_START from prompt choice (true/false)
Print planned deployment settings and require a final confirmation before creating/updating Space/Dataset resources
Print Hugging Face Space page URL, app URL, and /healthz

If gateway token/password were auto-generated, the script prints them at the end.
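The auto-generation step can be approximated in Python as follows. This is a sketch only: the bootstrap scripts are shell and PowerShell, and the alphabet (ASCII letters and digits) and the helper names `random_secret` and `fill_gateway_secrets` are assumptions.

```python
import secrets
import string

# Assumption: generated values use ASCII letters and digits only.
ALPHABET = string.ascii_letters + string.digits


def random_secret(length: int) -> str:
    """Build a random string from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def fill_gateway_secrets(token: str = "", password: str = "") -> dict:
    """Keep user-supplied values; generate 32/16-character values when empty."""
    return {
        "OPENCLAW_GATEWAY_TOKEN": token or random_secret(32),
        "OPENCLAW_GATEWAY_PASSWORD": password or random_secret(16),
    }
```

Using the `secrets` module rather than `random` matters here because the values protect gateway access.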

Agent Hand-off Prompt

Copy and send to your agent:

Please deploy OpenClaw to Hugging Face by strictly following the deployment skill in https://github.com/tenfyzhong/openclaw-hf/blob/main/SKILL.md

Hugging Face Keep-Alive

How to keep a Space available depends on hardware tier:

Free cpu-basic: the Space sleeps after inactivity (currently around 48h). It cannot be configured to run forever on free hardware.
Paid hardware: the Space runs continuously by default. In Settings -> Hardware, set Sleep time to Never (or use API with sleep_time=-1) for true 24/7 availability.
Cost-saving mode on paid hardware: set a custom Sleep time (for example 3600 seconds) so it auto-sleeps and auto-wakes on the next visit.

Space URL composition:

Space repo ID format: <owner>/<space_name> (example: tenfyzhong/openclaw-hf)
Public runtime host format: https://<owner>-<space_name>.hf.space
OpenClaw health check URL: https://<owner>-<space_name>.hf.space/healthz
Inside the Space runtime, Hugging Face also provides SPACE_HOST, so the health URL can be built as https://${SPACE_HOST}/healthz.

Example:

OPENCLAW_HF_SPACE_ID="tenfyzhong/openclaw-hf"
SPACE_HOST="${OPENCLAW_HF_SPACE_ID/\//-}.hf.space"
HEALTH_URL="https://${SPACE_HOST}/healthz"
echo "$HEALTH_URL"

Keep-alive by periodic health checks:

*/12 * * * * HF_TOKEN=hf_xxx /path/to/repo/scripts/check-space-health.sh tenfyzhong/openclaw-hf >/dev/null || true

Notes:

For private Spaces, unauthenticated calls to https://<owner>-<space_name>.hf.space/healthz return a Hub 404 page. This is expected access control behavior.
For private Spaces, include Authorization: Bearer <HF_TOKEN> (the helper script above does this automatically via HF_TOKEN or HUGGINGFACE_HUB_TOKEN).
This ping strategy is a practical workaround for reducing idle sleep on free hardware, but it is not a guaranteed always-on method.
If you need strict 24/7 uptime, use paid hardware and set sleep time to Never.
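The notes above can be sketched in Python using only the standard library. This is an illustration, not the contents of the helper script; the function names are hypothetical, and the URL is built strictly from the `<owner>-<space_name>` format given in this section.

```python
import urllib.request
from typing import Optional


def space_health_url(space_id: str) -> str:
    """Map "<owner>/<space_name>" to the /healthz URL on hf.space."""
    owner, name = space_id.split("/", 1)
    return f"https://{owner}-{name}.hf.space/healthz"


def build_health_request(space_id: str, token: Optional[str] = None) -> urllib.request.Request:
    """Prepare a GET request; private Spaces need the bearer token."""
    request = urllib.request.Request(space_health_url(space_id), method="GET")
    if token:
        request.add_header("Authorization", f"Bearer {token}")
    return request
```

To send the ping, pass the result to `urllib.request.urlopen(request, timeout=10)` and handle `urllib.error.HTTPError`, which is raised for the 404 that an unauthenticated call to a private Space returns.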

Programmatic options (owner token required):

from huggingface_hub import HfApi

api = HfApi(token="hf_xxx")
repo_id = "your-username/your-space"

# Keep running (paid hardware)
api.set_space_sleep_time(repo_id=repo_id, sleep_time=-1)

# Or sleep after 1 hour of inactivity
api.set_space_sleep_time(repo_id=repo_id, sleep_time=3600)

# Manual control
api.pause_space(repo_id=repo_id)
api.restart_space(repo_id=repo_id)

For this project, if you need stable dashboard access without cold starts, use paid hardware and set sleep time to Never.

SSH Service

The container runs several layers of SSH supervision to keep sshd continuously available:

Auto-start: Entrypoint generates host keys, cleans stale PID files, starts sshd
SSH Watchdog (ssh_service_watchdog.sh): Monitors sshd every 30s, auto-recovers on failure
Multi-level repair: Config corruption → backup config → minimal config → auto-reinstall openssh-server
Exponential backoff: Gradually increases wait time on consecutive failures
Health check (check_ssh_health.sh): Used by Docker HEALTHCHECK
SSH Agent auto-load: Auto-starts ssh-agent and loads keys from /root/.ssh/
Root password: Set via ROOT_PASSWORD environment variable
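The exponential backoff mentioned in the list above can be sketched as follows. The 30-second base matches the watchdog's check interval; the doubling factor, the 600-second cap, and the function name are assumptions, since the actual values in ssh_service_watchdog.sh are not documented here.

```python
def backoff_delay(consecutive_failures: int, base_seconds: int = 30, cap_seconds: int = 600) -> int:
    """Double the wait after each consecutive failure, up to a fixed cap."""
    exponent = max(0, consecutive_failures)
    return min(cap_seconds, base_seconds * 2 ** exponent)
```

Capping the delay keeps a long outage from pushing the next recovery attempt arbitrarily far into the future.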

CCMR (Claude Code Model Router)

CCMR gateway is integrated and managed by Supervisord with hot-reload support:

Auto-config: Set CCMR_*_API_KEY env vars to enable
10 API Key slots: DeepSeek, Qwen, Kimi, GLM, MiniMax (CN/Global), MiMo (SGP/CN/AMS/PAYG)
File hot-reload: Edit /root/.env.d/ccmr.env and changes apply immediately without restart
Crash recovery: Supervisord auto-restarts CCMR process
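As an illustration of the auto-config rule, the sketch below lists which `CCMR_*_API_KEY` entries in an env file have a value. It assumes /root/.env.d/ccmr.env uses plain `KEY=VALUE` lines (optionally prefixed with `export`); the key names in the usage example are hypothetical, and the real parsing is done by scripts/ccmr-setup.sh.

```python
import re
from typing import List

# Matches names of the form CCMR_<PLATFORM>_API_KEY.
KEY_PATTERN = re.compile(r"^CCMR_[A-Z0-9_]+_API_KEY$")


def enabled_ccmr_keys(env_text: str) -> List[str]:
    """Return the CCMR API key names that have a non-empty value."""
    enabled = []
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        if line.startswith("export "):
            line = line[len("export "):]
        name, _, value = line.partition("=")
        value = value.strip().strip("'\"")
        if KEY_PATTERN.match(name.strip()) and value:
            enabled.append(name.strip())
    return enabled
```

For example, a file containing `CCMR_DEEPSEEK_API_KEY=sk-1` and an empty `CCMR_QWEN_API_KEY=` would yield only the first name.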

Backup/Restore Flow

Restore

Automatic restore on startup (always runs on container restart/rebuild):

openclaw-state -> OPENCLAW_BACKUP_SOURCE_DIR (default /root/.openclaw)
root-config -> OPENCLAW_BACKUP_ROOT_CONFIG_DIR (default /root/.config)
root-codex -> OPENCLAW_BACKUP_ROOT_CODEX_DIR (default /root/.codex)
root-claude -> OPENCLAW_BACKUP_ROOT_CLAUDE_DIR (default /root/.claude)
root-agents -> OPENCLAW_BACKUP_ROOT_AGENTS_DIR (default /root/.agents)
root-ssh -> OPENCLAW_BACKUP_ROOT_SSH_DIR (default /root/.ssh)
root-env -> OPENCLAW_BACKUP_ROOT_ENV_DIR (default /root/.env.d)
root-npm -> OPENCLAW_BACKUP_ROOT_NPM_DIR (default /root/.npm)
root-lark-cli -> OPENCLAW_BACKUP_ROOT_LARK_CLI_DIR (default /root/.lark-cli)

Multi-dataset restore: set OPENCLAW_RESTORE_DATASET_REPO to restore from a different dataset.
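The mapping above, together with the dataset override, can be expressed as a small Python sketch. The function names are hypothetical, and the fallback from `OPENCLAW_RESTORE_DATASET_REPO` to `OPENCLAW_BACKUP_DATASET_REPO` is an assumption inferred from the wording above.

```python
from typing import Mapping, Optional

# Archive component -> (environment variable, default directory), as listed above.
RESTORE_TARGETS = {
    "openclaw-state": ("OPENCLAW_BACKUP_SOURCE_DIR", "/root/.openclaw"),
    "root-config": ("OPENCLAW_BACKUP_ROOT_CONFIG_DIR", "/root/.config"),
    "root-codex": ("OPENCLAW_BACKUP_ROOT_CODEX_DIR", "/root/.codex"),
    "root-claude": ("OPENCLAW_BACKUP_ROOT_CLAUDE_DIR", "/root/.claude"),
    "root-agents": ("OPENCLAW_BACKUP_ROOT_AGENTS_DIR", "/root/.agents"),
    "root-ssh": ("OPENCLAW_BACKUP_ROOT_SSH_DIR", "/root/.ssh"),
    "root-env": ("OPENCLAW_BACKUP_ROOT_ENV_DIR", "/root/.env.d"),
    "root-npm": ("OPENCLAW_BACKUP_ROOT_NPM_DIR", "/root/.npm"),
    "root-lark-cli": ("OPENCLAW_BACKUP_ROOT_LARK_CLI_DIR", "/root/.lark-cli"),
}


def resolve_restore_targets(env: Mapping[str, str]) -> dict:
    """Resolve each component's destination, honoring env overrides."""
    return {name: env.get(var) or default for name, (var, default) in RESTORE_TARGETS.items()}


def resolve_restore_repo(env: Mapping[str, str]) -> Optional[str]:
    """Prefer the restore-specific dataset; otherwise use the backup dataset."""
    return env.get("OPENCLAW_RESTORE_DATASET_REPO") or env.get("OPENCLAW_BACKUP_DATASET_REPO")
```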

Backup

Scheduled backup: Runs based on OPENCLAW_BACKUP_CRON (default every 10 min)
Incremental backup (default on): Only backs up changed files after a full backup
Dynamic strategy (default on): Auto-adjusts compression and splitting based on file size and change rate
AES-256-CBC encryption: Optional, allows secure storage on public datasets
Large file splitting: Default 500MB per volume, avoids upload failures
Resume support: Creates checkpoint files during upload, allows resume on interruption
Shutdown backup: Final backup before container exit on stop signal
Retention: Keeps newest OPENCLAW_BACKUP_KEEP_COUNT (default 24) archives, auto-deletes older ones
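The retention rule can be sketched as a pure function. It assumes archive names embed a sortable timestamp, so lexicographic order equals chronological order; the file names in the usage example are hypothetical, and the real implementation may also need to keep the full backup that an incremental chain depends on.

```python
from typing import Iterable, List


def select_backups_to_delete(names: Iterable[str], keep_count: int = 24) -> List[str]:
    """Return archives beyond the newest keep_count, newest first."""
    ordered = sorted(names, reverse=True)  # newest first, given sortable names
    return ordered[max(keep_count, 0):]
```

For example, given five archives named `backups/openclaw-20240101T0000.tar.gz` through `backups/openclaw-20240105T0000.tar.gz` and `keep_count=3`, the function selects the two oldest for deletion.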

Backup Watchdog

openclaw-backup-watchdog.sh acts as the last line of defense:

Auto-triggers a backup when none has completed within MAX_BACKUP_AGE_MINUTES (default 30 min)
Forces a backup every FORCE_BACKUP_INTERVAL seconds (default 14400, i.e. 4 hours)
File lock prevents concurrent execution
Automatic backoff on consecutive failures
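The watchdog's trigger conditions can be summarized in a short Python sketch. The function name, the return labels, and the precedence of the forced backup over the overdue check are assumptions; the actual script is openclaw-backup-watchdog.sh.

```python
from typing import Mapping, Optional


def watchdog_action(
    now: float,
    last_backup_at: Optional[float],
    last_forced_at: float,
    env: Mapping[str, str],
) -> str:
    """Decide what one watchdog pass should do; times are epoch seconds."""
    max_age_seconds = float(env.get("MAX_BACKUP_AGE_MINUTES", "30")) * 60
    force_interval = float(env.get("FORCE_BACKUP_INTERVAL", "14400"))
    if now - last_forced_at >= force_interval:
        return "force"  # periodic forced backup, regardless of freshness
    if last_backup_at is None or now - last_backup_at > max_age_seconds:
        return "overdue"  # cron missed its window, so trigger a backup
    return "skip"
```

A real implementation would call this once per `WATCHDOG_INTERVAL` while holding the file lock, so that two passes never start overlapping backups.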

Use sshx Inside the Container

sshx is preinstalled in the image.

Auto-start sshx in background via environment variables:

OPENCLAW_SSHX_AUTO_START=true

When enabled, entrypoint starts sshx in background and sends sshx output directly to container stdout/stderr logs (no file logging).

Manual start inside container:

sshx

Let OpenClaw start the process itself (run in the OpenClaw terminal/tool):

nohup sshx >/proc/1/fd/1 2>/proc/1/fd/2 &

After use, stop the sshx process promptly:

pgrep -fa sshx
pkill -TERM -f '(^|/)sshx($| )'

Local Test

python3 -m unittest discover -s tests -p 'test_*.py'

Pull Requests to main run GitHub Actions CI automatically (.github/workflows/pr-ci.yml):

Unit tests: python3 -m unittest discover -s tests -p 'test_*.py'
Docker image build: docker build (via Buildx) with OPENCLAW_VERSION=latest

License

MIT. See LICENSE.