HuggingRun


tao-shen committed on Mar 3

Commit

5cd111e

1 Parent(s): ed9f6e8

deploy: Ubuntu desktop as root Dockerfile; add push-debug workflow doc; generic monitor retries and sync retry


Files changed (6)

Dockerfile +32 -20
README.md +2 -2
docs/GENERAL_USAGE.md +2 -0
docs/PUSH_DEBUG.md +100 -0
scripts/monitor_and_test.py +52 -29
scripts/sync_hf.py +16 -7

Dockerfile CHANGED

@@ -1,34 +1,46 @@
-# HuggingRun — Run anything on Hugging Face
-# Optional build arg: BASE_IMAGE (default python:3.11-slim)
-ARG BASE_IMAGE=python:3.11-slim
-FROM ${BASE_IMAGE}
 RUN apt-get update && apt-get install -y --no-install-recommends \
-    curl \
     && rm -rf /var/lib/apt/lists/*
-# Ensure Python and pip for sync script (base may not have them if BASE_IMAGE is e.g. node)
-RUN (command -v python3 >/dev/null 2>&1 || (apt-get update && apt-get install -y --no-install-recommends python3 python3-pip && rm -rf /var/lib/apt/lists/*)) \
-    && pip3 install --no-cache-dir --break-system-packages huggingface_hub 2>/dev/null || pip3 install --no-cache-dir huggingface_hub \
-    && pip3 install --no-cache-dir --break-system-packages fastapi uvicorn 2>/dev/null || pip3 install --no-cache-dir fastapi uvicorn
 # HF Spaces run as user 1000
 RUN useradd -m -u 1000 user
 ENV HOME=/home/user
-WORKDIR /home/user
-COPY --chown=user:user scripts /scripts
-COPY --chown=user:user app /app
-RUN chmod +x /scripts/entrypoint.sh
-# Default: run demo app. Override with RUN_CMD in Space secrets.
 ENV PERSIST_PATH=/data
-ENV RUN_CMD=""
-ENV PORT=7860
-ENV PYTHONPATH=/app
-# Persist path must be writable by user 1000
-RUN mkdir -p /data && chown user:user /data
 USER user
 EXPOSE 7860

+# Ubuntu 24.04 Desktop on HuggingRun — noVNC on 7860, persistence via /data
+FROM ubuntu:24.04
+ENV DEBIAN_FRONTEND=noninteractive
+# System + Python (for sync)
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    ca-certificates curl python3 python3-pip python3-venv \
+    && pip3 install --no-cache-dir --break-system-packages huggingface_hub \
+    && rm -rf /var/lib/apt/lists/*
+# Desktop stack: Xvfb, XFCE, dbus, x11vnc, Firefox
 RUN apt-get update && apt-get install -y --no-install-recommends \
+    xvfb \
+    xfce4 xfce4-goodies \
+    dbus-x11 \
+    x11vnc \
+    firefox \
+    procps \
     && rm -rf /var/lib/apt/lists/*
+# noVNC (web client on 7860)
+RUN apt-get update && apt-get install -y --no-install-recommends git \
+    && git clone --depth 1 https://github.com/novnc/noVNC.git /opt/noVNC \
+    && git clone --depth 1 https://github.com/novnc/websockify /opt/noVNC/utils/websockify \
+    && rm -rf /var/lib/apt/lists/* /opt/noVNC/.git
 # HF Spaces run as user 1000
 RUN useradd -m -u 1000 user
 ENV HOME=/home/user
+RUN mkdir -p /data && chown user:user /data
+# HuggingRun scripts (build context = repo root)
+COPY scripts /scripts
+COPY ubuntu-desktop/start-desktop.sh /opt/start-desktop.sh
+RUN chmod +x /scripts/entrypoint.sh /opt/start-desktop.sh
 ENV PERSIST_PATH=/data
+ENV RUN_CMD="/opt/start-desktop.sh"
+ENV DESKTOP_HOME=/data/desktop-home
+ENV DISPLAY=:99
+ENV VNC_PORT=5901
+ENV NOVNC_PORT=7860
 USER user
 EXPOSE 7860
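
The new image only sets environment variables and points `RUN_CMD` at `/opt/start-desktop.sh`, which is not part of this diff. As a rough sketch of how such a launcher could chain the installed pieces, the helper below builds the four command lines from `DISPLAY`, `VNC_PORT`, and `NOVNC_PORT`. The function name, the 1280x720x24 screen size, the chosen x11vnc flags, and the use of noVNC's bundled `utils/novnc_proxy` are all assumptions for illustration, not the contents of the real script.

```python
import os


def desktop_commands(env=None):
    """Build the command lines a launcher might start, in order.

    Hypothetical helper: the real start-desktop.sh is not shown in this commit.
    Defaults mirror the ENV lines in the Dockerfile above.
    """
    env = os.environ if env is None else env
    display = env.get("DISPLAY", ":99")
    vnc_port = int(env.get("VNC_PORT", "5901"))
    novnc_port = int(env.get("NOVNC_PORT", "7860"))
    return [
        # Virtual X server the desktop session renders into (screen size assumed).
        ["Xvfb", display, "-screen", "0", "1280x720x24"],
        # XFCE session inside its own D-Bus session; it reads DISPLAY from the environment.
        ["dbus-launch", "--exit-with-session", "startxfce4"],
        # Export the X display over VNC, reachable only from inside the container.
        ["x11vnc", "-display", display, "-rfbport", str(vnc_port),
         "-localhost", "-forever", "-shared", "-nopw"],
        # Web client plus WebSocket-to-VNC bridge on the single public port.
        ["/opt/noVNC/utils/novnc_proxy", "--vnc", f"localhost:{vnc_port}",
         "--listen", str(novnc_port)],
    ]
```

Keeping the VNC port on localhost and exposing only the noVNC port matches the single-port constraint (`EXPOSE 7860`) that the Dockerfile is built around.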

README.md CHANGED

@@ -19,7 +19,7 @@ tags:
 **Run anything on Hugging Face.**
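-HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: one set of tools handles HF's persistence, single-port, and networking limits, so any Docker app can be deployed in one click and keep its state across restarts.
 - **General usage (fewest user steps)**: [docs/GENERAL_USAGE.md](docs/GENERAL_USAGE.md). No per-container billing or complex configuration as on other clouds; every capability builds on the general-purpose tools.
 - **General tools first**: maintenance effort goes mainly into the general layer (persistence sync, single entry point, configurable port). The examples only demonstrate "minimal configuration" usage; no case-specific logic is hard-coded in the core scripts.
@@ -46,7 +46,7 @@ HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: one
 - **Unified entry point**: a single entrypoint first restores and syncs, then `exec`s your `RUN_CMD`, so any image can reuse it.
 See [docs/HF_LIMITATIONS.md](docs/HF_LIMITATIONS.md) for details.
-Remote build/run logs (for local debugging): [docs/REMOTE_LOGS.md](docs/REMOTE_LOGS.md).
 ## Examples (minimal usage)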

 **Run anything on Hugging Face.**
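+HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: one set of tools handles HF's persistence, single-port, and networking limits, so **any Docker app** can be deployed through the same workflow and keep its state across restarts. We use "deploying an entire operating system" (such as an Ubuntu desktop) as a high-difficulty validation case: if a task like this runs reliably, the general tools are sufficient for deploying all kinds of complex applications.
 - **General usage (fewest user steps)**: [docs/GENERAL_USAGE.md](docs/GENERAL_USAGE.md). No per-container billing or complex configuration as on other clouds; every capability builds on the general-purpose tools.
 - **General tools first**: maintenance effort goes mainly into the general layer (persistence sync, single entry point, configurable port). The examples only demonstrate "minimal configuration" usage; no case-specific logic is hard-coded in the core scripts.
 - **Unified entry point**: a single entrypoint first restores and syncs, then `exec`s your `RUN_CMD`, so any image can reuse it.
 See [docs/HF_LIMITATIONS.md](docs/HF_LIMITATIONS.md) for details.
+Remote build/run logs: [docs/REMOTE_LOGS.md](docs/REMOTE_LOGS.md). The **Push → Deploy → Monitor → Test** loop: [docs/PUSH_DEBUG.md](docs/PUSH_DEBUG.md).
 ## Examples (minimal usage)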

docs/GENERAL_USAGE.md CHANGED

@@ -2,6 +2,8 @@
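 This document explains how to use the **general-purpose tools**. Every capability builds on this one toolset; the examples (including the Ubuntu desktop) are just "the same general pipeline + a different RUN_CMD or a different Dockerfile", with no per-case customization.
 ---
 ## Design principles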

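 This document explains how to use the **general-purpose tools**. Every capability builds on this one toolset; the examples (including the Ubuntu desktop) are just "the same general pipeline + a different RUN_CMD or a different Dockerfile", with no per-case customization.
+**Design goal**: anyone using this tool should be able to **deploy anything normally**. We treat "deploying an entire operating system" (such as Ubuntu desktop + noVNC) as the high-difficulty case: if tasks like this run correctly, the general layer is robust enough, and other applications are easier still.
 ---
 ## Design principles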

docs/PUSH_DEBUG.md ADDED

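@@ -0,0 +1,100 @@

+# Push → Remote deploy → Monitor → Test (loop until the remote succeeds)
+**Principle**: only a **push** triggers an HF deployment, and the work counts as done only when the **remote** build succeeds, the app is RUNNING, and every stress test passes. After each local change, keep pushing and debugging against the remote logs and test results; stop only once the remote succeeds.
+---
+## 1. Loop workflow
+```
+Fix code → git add / commit → git push
+         ↓
+HF starts build/deploy
+         ↓
+Monitor the remote live: --logs build / --logs run, or --wait-running
+         ↓
+Run stress tests: --test (with --url and --expect if needed)
+         ↓
+All passed?  → stop
+Any failure? → fix the code based on logs/symptoms → go back to "Fix code" and push again
+```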
+---
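+## 2. Common commands (use in order)
+### 2.1 Push (triggers deployment)
+```bash
+git add -A
+git commit -m "fix: short description"
+git push origin main
+```
+(If the Space tracks a different branch, replace `main` with that branch.)
+### 2.2 Watch remote logs live (for debugging)
+```bash
+# Build logs (watch right after pushing)
+HF_TOKEN=your_token python3 scripts/monitor_and_test.py --space-id your-username/your-space-name --logs build
+# Run logs (after the build finishes, check that the container is healthy)
+HF_TOKEN=your_token python3 scripts/monitor_and_test.py --space-id your-username/your-space-name --logs run
+```
+`--space-id` defaults to `tao-shen/HuggingRun` and can be omitted.
+### 2.3 Wait for RUNNING, then run the full test suite (one command for "did the remote succeed?")
+```bash
+# Demo or default Space
+HF_TOKEN=your_token python3 scripts/monitor_and_test.py --wait-running --test
+# Ubuntu desktop, etc.: specify the URL and an expected page keyword
+HF_TOKEN=your_token python3 scripts/monitor_and_test.py --wait-running --test \
+  --url https://your-username-your-space-name.hf.space \
+  --expect noVNC
+```
+The script first polls until the Space status is RUNNING, then runs a basic GET, stress requests, and multi-round persistence checks. It **exits 0 only if everything passes** and exits 1 on any failure, which makes it easy to script "only remote success counts".
+### 2.4 Test the current page without waiting (when the Space is already RUNNING)
+```bash
+python3 scripts/monitor_and_test.py --test
+# or
+python3 scripts/monitor_and_test.py --url https://xxx.hf.space --test --expect noVNC
+```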
+---
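+## 3. Suggested usage (copy-paste loop)
+1. **After pushing**: open a terminal and stream the build logs to confirm there are no errors.
+   ```bash
+   HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs build
+   ```
+2. **After the build finishes**: in another terminal, wait for RUNNING and run the tests.
+   ```bash
+   HF_TOKEN=xxx python3 scripts/monitor_and_test.py --wait-running --test --url https://tao-shen-huggingrun.hf.space --expect noVNC
+   ```
+3. If a **test fails**: use `--logs run` to read the errors inside the container, fix the code, then:
+   ```bash
+   git add -A && git commit -m "fix: ..." && git push origin main
+   ```
+   Then repeat steps 1–2 until all tests pass.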
+---
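+## 4. Environment variable quick reference
+| Variable | Description |
+|------|------|
+| `HF_TOKEN` | Required for streaming logs, querying runtime status, and waiting for RUNNING |
+| `SPACE_ID` | Defaults to `tao-shen/HuggingRun`; can also be set with `--space-id` |
+| `APP_URL` | Defaults to `https://tao-shen-huggingrun.hf.space`; can also be set with `--url` |
+All "success" is judged on the **remote**: build succeeded + app RUNNING + all stress tests passed.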
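
The verification step in the loop above lends itself to scripting. The sketch below builds the argument list for one run of `scripts/monitor_and_test.py` using only the flags documented in PUSH_DEBUG.md. Both helper names are hypothetical, and the URL rule (`owner-name.hf.space`, lowercased) is an assumption inferred from the `tao-shen/HuggingRun` → `tao-shen-huggingrun.hf.space` example; Space names containing other special characters may map differently.

```python
import sys


def space_url(space_id: str) -> str:
    """Guess a Space's public URL from "owner/name".

    Assumption: the host is "<owner>-<name>.hf.space", lowercased, as in the
    documented default (tao-shen/HuggingRun -> tao-shen-huggingrun.hf.space).
    """
    owner, name = space_id.split("/", 1)
    return f"https://{owner}-{name}.hf.space".lower()


def monitor_cmd(space_id, expect=None, wait=True):
    """Build argv for one verification run of scripts/monitor_and_test.py."""
    cmd = [sys.executable, "scripts/monitor_and_test.py", "--space-id", space_id]
    if wait:
        cmd.append("--wait-running")  # poll until the Space reports RUNNING
    cmd += ["--test", "--url", space_url(space_id)]
    for substring in expect or ():
        cmd += ["--expect", substring]  # --expect may be repeated
    return cmd
```

A caller could pass the result to `subprocess.run(...)` with `HF_TOKEN` set in the environment and treat a return code of 0 as "remote succeeded", which is the exit-code contract section 2.3 describes.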

scripts/monitor_and_test.py CHANGED

@@ -1,10 +1,12 @@
 #!/usr/bin/env python3
 """
-HuggingRun: monitor the remote Space status and run basic/stress/persistence checks.
 Usage:
-  HF_TOKEN=xxx python3 scripts/monitor_and_test.py [--space-id tao-shen/HuggingRun] [--wait-running] [--test]
-  HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs run   # stream container run logs (SSE)
-  HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs build # stream build logs (SSE)
 """
 import argparse
 import os
@@ -52,21 +54,42 @@ def wait_running(max_wait_sec=600, poll_interval=15):
     return False
-def http_get(url, timeout=30):
-    try:
-        req = urllib.request.Request(url, method="GET")
-        with urllib.request.urlopen(req, timeout=timeout) as resp:
-            return resp.status, resp.read().decode("utf-8", errors="replace")
-    except urllib.error.HTTPError as e:
-        return e.code, e.read().decode("utf-8", errors="replace") if e.fp else ""
-    except Exception as e:
-        return -1, str(e)
-def test_basic(url, expect_substring="HuggingRun"):
     status, body = http_get(url)
-    ok = status == 200 and (expect_substring in body or "Run anything" in body)
-    print(f"[test] GET {url} -> {status}, body contains expected: {expect_substring in body or 'Run anything' in body}")
     return ok
@@ -86,18 +109,15 @@ def test_stress(url, n=50, concurrency=10):
 def test_persistence(url, rounds=3):
-    """Visit the page over several rounds and check that a counter or state in it changes/persists (the demo page has a Visit count)."""
-    counts = []
     for _ in range(rounds):
-        status, body = http_get(url)
-        if status != 200:
-            return False
-        # The demo page shows "Visit count (persisted): N"
-        if "Visit count" in body or "total_visits" in body or "persisted" in body:
-            counts.append(1)
         time.sleep(1)
-    print(f"[persistence] {rounds} rounds, body contained persistence keywords: {len(counts) == rounds}")
-    return len(counts) >= 1  # accept if at least one round contains persistence-related text
 def stream_logs(space_id: str, log_type: str):
@@ -132,9 +152,12 @@ def main():
     p.add_argument("--logs", choices=("build", "run"), help="Stream logs: build or run (SSE)")
     p.add_argument("--stress-n", type=int, default=50)
     p.add_argument("--max-wait", type=int, default=600)
     args = p.parse_args()
     SPACE_ID = args.space_id
     APP_URL = args.url.rstrip("/")
     if args.logs:
         stream_logs(SPACE_ID, args.logs)
@@ -147,7 +170,7 @@ def main():
     if args.test:
         print(f"[test] Target: {APP_URL}")
-        if not test_basic(APP_URL):
             print("[test] BASIC FAILED")
             sys.exit(1)
         if not test_stress(APP_URL, n=args.stress_n):

 #!/usr/bin/env python3
 """
+HuggingRun: monitor the remote Space status and run basic/stress/persistence checks (general-purpose tool, works with any Space).
 Usage:
+  python3 scripts/monitor_and_test.py --test
+  HF_TOKEN=xxx python3 scripts/monitor_and_test.py --wait-running --test   # wait for RUNNING, then test
+  python3 scripts/monitor_and_test.py --url https://xxx.hf.space --test --expect noVNC   # desktop and other non-demo apps
+  HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs run
+  HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs build
 """
 import argparse
 import os
     return False
+def http_get(url, timeout=30, retries=3, retry_delay=2):
+    """GET url; retry on 502/503/timeout/connection errors (generic HF robustness)."""
+    last_status, last_body, last_err = None, "", None
+    for attempt in range(max(1, retries)):
+        try:
+            req = urllib.request.Request(url, method="GET")
+            with urllib.request.urlopen(req, timeout=timeout) as resp:
+                body = resp.read().decode("utf-8", errors="replace")
+                return (resp.status, body)
+        except urllib.error.HTTPError as e:
+            last_status = e.code
+            last_body = e.read().decode("utf-8", errors="replace") if e.fp else ""
+            last_err = e
+            if e.code in (502, 503) and attempt < retries - 1:
+                time.sleep(retry_delay)
+                continue
+            return (e.code, last_body)
+        except (OSError, urllib.error.URLError) as e:
+            last_err = e
+            last_status = -1
+            last_body = str(e)
+            if attempt < retries - 1:
+                time.sleep(retry_delay)
+                continue
+            return (-1, last_body)
+    return (last_status or -1, last_body or str(last_err or ""))
+def test_basic(url, expect_substrings=None):
+    """GET url; pass if status 200 and body contains any of expect_substrings (default: HuggingRun / Run anything)."""
+    if expect_substrings is None:
+        expect_substrings = ("HuggingRun", "Run anything")
     status, body = http_get(url)
+    found = any(s in body for s in expect_substrings)
+    ok = status == 200 and found
+    print(f"[test] GET {url} -> {status}, body contains expected: {found}")
     return ok
 def test_persistence(url, rounds=3):
+    """Visit the page over several rounds; every round must return 200 (generic: any app passes as long as it reliably returns 200)."""
+    ok_rounds = 0
     for _ in range(rounds):
+        status, _ = http_get(url)
+        if status == 200:
+            ok_rounds += 1
         time.sleep(1)
+    print(f"[persistence] {rounds} rounds: {ok_rounds} ok")
+    return ok_rounds == rounds
 def stream_logs(space_id: str, log_type: str):
     p.add_argument("--logs", choices=("build", "run"), help="Stream logs: build or run (SSE)")
     p.add_argument("--stress-n", type=int, default=50)
     p.add_argument("--max-wait", type=int, default=600)
+    p.add_argument("--expect", action="append", dest="expect_substrings",
+                   help="Expected substring(s) in response body (basic test). Can repeat. Default: HuggingRun, Run anything")
     args = p.parse_args()
     SPACE_ID = args.space_id
     APP_URL = args.url.rstrip("/")
+    expect_substrings = tuple(args.expect_substrings) if args.expect_substrings else None
     if args.logs:
         stream_logs(SPACE_ID, args.logs)
     if args.test:
         print(f"[test] Target: {APP_URL}")
+        if not test_basic(APP_URL, expect_substrings=expect_substrings):
             print("[test] BASIC FAILED")
             sys.exit(1)
         if not test_stress(APP_URL, n=args.stress_n):
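
The retry rule added to `http_get` and the "any expected substring" rule in `test_basic` can be exercised without a network by injecting the fetch step. The sketch below is a simplified restatement under that assumption, not the script's code: `get_with_retry`, `body_matches`, and the zero-argument `fetch` callable are hypothetical names, and connection errors are modeled as status `-1`, the value `http_get` returns for them.

```python
import time


def get_with_retry(fetch, retries=3, retry_delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns a non-transient status or attempts run out.

    fetch is a hypothetical zero-argument callable returning (status, body).
    502, 503, and -1 (connection error/timeout) are treated as transient.
    """
    attempts = max(1, retries)
    status, body = -1, ""
    for attempt in range(attempts):
        status, body = fetch()
        if status not in (502, 503, -1):
            break  # success or a non-retryable error such as 404
        if attempt < attempts - 1:
            sleep(retry_delay)  # back off before the next attempt
    return status, body


def body_matches(status, body, expect_substrings=("HuggingRun", "Run anything")):
    """Mirror test_basic's rule: HTTP 200 and at least one expected substring."""
    return status == 200 and any(s in body for s in expect_substrings)
```

Passing `sleep` as a parameter keeps the delay out of unit tests; for example, a fake `fetch` that yields 503, then -1, then 200 should trigger exactly two sleeps before returning the final response.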

scripts/sync_hf.py CHANGED

@@ -110,13 +110,22 @@ class GenericSync:
             print(f"[HuggingRun] Restoring {PERSIST_PATH} from {HF_REPO_ID} ...")
             PERSIST_PATH.mkdir(parents=True, exist_ok=True)
             with tempfile.TemporaryDirectory() as tmpdir:
-                snapshot_download(
-                    repo_id=HF_REPO_ID,
-                    repo_type="dataset",
-                    allow_patterns=f"{prefix}**",
-                    local_dir=tmpdir,
-                    token=HF_TOKEN,
-                )
                 src = Path(tmpdir) / DATASET_SUBFOLDER
                 if src.exists():
                     for item in src.rglob("*"):

             print(f"[HuggingRun] Restoring {PERSIST_PATH} from {HF_REPO_ID} ...")
             PERSIST_PATH.mkdir(parents=True, exist_ok=True)
             with tempfile.TemporaryDirectory() as tmpdir:
+                for attempt in range(2):
+                    try:
+                        snapshot_download(
+                            repo_id=HF_REPO_ID,
+                            repo_type="dataset",
+                            allow_patterns=f"{prefix}**",
+                            local_dir=tmpdir,
+                            token=HF_TOKEN,
+                        )
+                        break
+                    except Exception as e:
+                        if attempt == 0:
+                            print(f"[HuggingRun] Restore attempt {attempt + 1} failed: {e}. Retrying...")
+                            time.sleep(3)
+                        else:
+                            raise
                 src = Path(tmpdir) / DATASET_SUBFOLDER
                 if src.exists():
                     for item in src.rglob("*"):
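
The restore change wraps `snapshot_download` in a two-attempt loop that logs the first failure, waits 3 seconds, and re-raises on the second. The same pattern can be factored into a small wrapper, sketched below. `call_with_retry` is a hypothetical name, and the injectable `sleep` and `log` parameters are additions for testability that the commit itself does not have.

```python
import time


def call_with_retry(fn, attempts=2, delay=3.0, sleep=time.sleep, log=print):
    """Run fn(); on an exception, retry until attempts are used up.

    The last failure is re-raised unchanged so callers still see the
    original error, matching the bare `raise` in the diff above.
    """
    attempts = max(1, attempts)
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original exception
            log(f"[HuggingRun] Attempt {attempt + 1} failed: {exc}. Retrying...")
            sleep(delay)
```

With such a helper, the download in the diff could be passed in as a `lambda` that calls `snapshot_download` with the same keyword arguments, leaving the surrounding restore logic unchanged.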