tao-shen committed on
Commit
5cd111e
·
1 Parent(s): ed9f6e8

deploy: Ubuntu desktop as root Dockerfile; add push-debug workflow doc; generic monitor retries and sync retry

Dockerfile CHANGED
@@ -1,34 +1,46 @@
- # HuggingRun — Run anything on Hugging Face
- # Optional build arg: BASE_IMAGE (default python:3.11-slim)
- ARG BASE_IMAGE=python:3.11-slim
- FROM ${BASE_IMAGE}

  RUN apt-get update && apt-get install -y --no-install-recommends \
-     curl \
      && rm -rf /var/lib/apt/lists/*

- # Ensure Python and pip for sync script (base may not have them if BASE_IMAGE is e.g. node)
- RUN (command -v python3 >/dev/null 2>&1 || (apt-get update && apt-get install -y --no-install-recommends python3 python3-pip && rm -rf /var/lib/apt/lists/*)) \
-     && pip3 install --no-cache-dir --break-system-packages huggingface_hub 2>/dev/null || pip3 install --no-cache-dir huggingface_hub \
-     && pip3 install --no-cache-dir --break-system-packages fastapi uvicorn 2>/dev/null || pip3 install --no-cache-dir fastapi uvicorn

  # HF Spaces run as user 1000
  RUN useradd -m -u 1000 user
  ENV HOME=/home/user
- WORKDIR /home/user

- COPY --chown=user:user scripts /scripts
- COPY --chown=user:user app /app
- RUN chmod +x /scripts/entrypoint.sh

- # Default: run demo app. Override with RUN_CMD in Space secrets.
  ENV PERSIST_PATH=/data
- ENV RUN_CMD=""
- ENV PORT=7860
- ENV PYTHONPATH=/app
-
- # Persist path must be writable by user 1000
- RUN mkdir -p /data && chown user:user /data

  USER user
  EXPOSE 7860
 
+ # Ubuntu 24.04 Desktop on HuggingRun — noVNC on 7860, persistence via /data
+ FROM ubuntu:24.04

+ ENV DEBIAN_FRONTEND=noninteractive
+
+ # System + Python (for sync)
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     ca-certificates curl python3 python3-pip python3-venv \
+     && pip3 install --no-cache-dir --break-system-packages huggingface_hub \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Desktop stack: Xvfb, XFCE, dbus, x11vnc, Firefox
  RUN apt-get update && apt-get install -y --no-install-recommends \
+     xvfb \
+     xfce4 xfce4-goodies \
+     dbus-x11 \
+     x11vnc \
+     firefox \
+     procps \
      && rm -rf /var/lib/apt/lists/*

+ # noVNC (web client on 7860)
+ RUN apt-get update && apt-get install -y --no-install-recommends git \
+     && git clone --depth 1 https://github.com/novnc/noVNC.git /opt/noVNC \
+     && git clone --depth 1 https://github.com/novnc/websockify /opt/noVNC/utils/websockify \
+     && rm -rf /var/lib/apt/lists/* /opt/noVNC/.git

  # HF Spaces run as user 1000
  RUN useradd -m -u 1000 user
  ENV HOME=/home/user
+ RUN mkdir -p /data && chown user:user /data

+ # HuggingRun scripts (build context = repo root)
+ COPY scripts /scripts
+ COPY ubuntu-desktop/start-desktop.sh /opt/start-desktop.sh
+ RUN chmod +x /scripts/entrypoint.sh /opt/start-desktop.sh

  ENV PERSIST_PATH=/data
+ ENV RUN_CMD="/opt/start-desktop.sh"
+ ENV DESKTOP_HOME=/data/desktop-home
+ ENV DISPLAY=:99
+ ENV VNC_PORT=5901
+ ENV NOVNC_PORT=7860

  USER user
  EXPOSE 7860
README.md CHANGED
@@ -19,7 +19,7 @@ tags:
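
  **Run anything on Hugging Face.**

- HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: one set of tools works around HF's persistence, single-port, and networking limits, so that any Docker app can be deployed once and keep its state across restarts.

  - **General usage (fewest user steps)**: [docs/GENERAL_USAGE.md](docs/GENERAL_USAGE.md). No per-container billing or complex configuration as on other cloud platforms; every capability is built around the generic tooling.
  - **Generic tooling first**: the generic layer (persistence sync, single entrypoint, configurable port) is what is primarily maintained. Examples only demonstrate "minimal configuration" usage; no case-specific logic is hard-coded in the core scripts.
@@ -46,7 +46,7 @@ HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: one
  - **Unified entrypoint**: the same entrypoint first restores and syncs, then `exec`s your `RUN_CMD`, so any image can reuse it.

  See [docs/HF_LIMITATIONS.md](docs/HF_LIMITATIONS.md) for details.
- Remote build/run logs (for local debugging): [docs/REMOTE_LOGS.md](docs/REMOTE_LOGS.md).

  ## Examples (minimal usage)
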
 
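
  **Run anything on Hugging Face.**

+ HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: one set of tools works around HF's persistence, single-port, and networking limits, so that **any Docker app** can be deployed through the same workflow and keep its state across restarts. We validate it with a deliberately hard use case, "deploying an entire operating system" (such as an Ubuntu desktop): if that runs reliably, the generic tooling is robust enough for users to deploy all kinds of complex applications.

  - **General usage (fewest user steps)**: [docs/GENERAL_USAGE.md](docs/GENERAL_USAGE.md). No per-container billing or complex configuration as on other cloud platforms; every capability is built around the generic tooling.
  - **Generic tooling first**: the generic layer (persistence sync, single entrypoint, configurable port) is what is primarily maintained. Examples only demonstrate "minimal configuration" usage; no case-specific logic is hard-coded in the core scripts.

  - **Unified entrypoint**: the same entrypoint first restores and syncs, then `exec`s your `RUN_CMD`, so any image can reuse it.

  See [docs/HF_LIMITATIONS.md](docs/HF_LIMITATIONS.md) for details.
+ Remote build/run logs: [docs/REMOTE_LOGS.md](docs/REMOTE_LOGS.md). The **push → deploy → monitor → test** loop: [docs/PUSH_DEBUG.md](docs/PUSH_DEBUG.md).

  ## Examples (minimal usage)
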
docs/GENERAL_USAGE.md CHANGED
@@ -2,6 +2,8 @@
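
  This document explains how to use the **generic tooling**. Every capability is built around this one toolset; the examples (including the Ubuntu desktop) are just "the same generic pipeline + a different RUN_CMD or a different Dockerfile", with no per-example customization.

  ---

  ## Design principles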
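
  This document explains how to use the **generic tooling**. Every capability is built around this one toolset; the examples (including the Ubuntu desktop) are just "the same generic pipeline + a different RUN_CMD or a different Dockerfile", with no per-example customization.

+ **Design goal**: anyone using this tool should be able to **deploy anything and have it work**. We treat "deploying an entire operating system" (for example Ubuntu desktop + noVNC) as the hard use case: if that runs properly, the generic layer is robust enough, and other applications should be easier still.
+
  ---

  ## Design principles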
docs/PUSH_DEBUG.md ADDED
@@ -0,0 +1,100 @@
+ # Push → remote deploy → monitor → test (loop until the remote succeeds)
+
+ **Principle**: only a **push** triggers an HF deployment, and the work counts as done only when the **remote** build succeeds, the app is RUNNING, and every stress test passes. After each local change, keep pushing and debugging against the remote logs and test results; stop only once the remote succeeds.
+
+ ---
+
+ ## 1. The loop
+
+ ```
+ edit code → git add / commit → git push
+
+ HF starts building/deploying
+
+ watch the remote live: --logs build / --logs run, or --wait-running
+
+ run the stress tests: --test (with --url and --expect if needed)
+
+ all passed? → stop
+ any failure? → fix the code based on the logs/symptoms → back to "edit code" and push again
+ ```
+
+ ---
+
+ ## 2. Common commands (use in order)
+
+ ### 2.1 Push (triggers a deployment)
+
+ ```bash
+ git add -A
+ git commit -m "fix: short description"
+ git push origin main
+ ```
+
+ (If the Space uses a different branch, replace `main` with that branch.)
+
+ ### 2.2 Watch remote logs live (for debugging)
+
+ ```bash
+ # Build logs (check right after pushing)
+ HF_TOKEN=your_token python3 scripts/monitor_and_test.py --space-id your-username/your-space-name --logs build
+
+ # Run logs (after the build finishes, check that the container is healthy)
+ HF_TOKEN=your_token python3 scripts/monitor_and_test.py --space-id your-username/your-space-name --logs run
+ ```
+
+ `--space-id` defaults to `tao-shen/HuggingRun` and can be omitted.
+
+ ### 2.3 Wait for RUNNING, then run the full test suite (one command for "did the remote succeed?")
+
+ ```bash
+ # Demo or default Space
+ HF_TOKEN=your_token python3 scripts/monitor_and_test.py --wait-running --test
+
+ # Ubuntu desktop etc.: specify the URL and the expected page keyword
+ HF_TOKEN=your_token python3 scripts/monitor_and_test.py --wait-running --test \
+   --url https://your-username-your-space-name.hf.space \
+   --expect noVNC
+ ```
+
+ The script first polls until the Space status is RUNNING, then runs a basic GET, stress requests, and a multi-round persistence check. **It exits 0 only if everything passes**; any failure exits 1, which makes "only remote success counts" easy to script.
+
+ ### 2.4 Test the current page without waiting (when the Space is already RUNNING)
+
+ ```bash
+ python3 scripts/monitor_and_test.py --test
+ # or
+ python3 scripts/monitor_and_test.py --url https://xxx.hf.space --test --expect noVNC
+ ```
+
+ ---
+
+ ## 3. Suggested workflow (copy-paste loop)
+
+ 1. **After pushing**: open a terminal, stream the build logs, and confirm there are no errors.
+    ```bash
+    HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs build
+    ```
+
+ 2. **After the build finishes**: in another terminal, wait for RUNNING and run the tests.
+    ```bash
+    HF_TOKEN=xxx python3 scripts/monitor_and_test.py --wait-running --test --url https://tao-shen-huggingrun.hf.space --expect noVNC
+    ```
+
+ 3. If **the tests fail**: use `--logs run` to see errors inside the container, fix the code, then:
+    ```bash
+    git add -A && git commit -m "fix: ..." && git push origin main
+    ```
+    Then repeat steps 1–2 until all tests pass.
+
+ ---
+
+ ## 4. Environment variable quick reference
+
+ | Variable | Description |
+ |------|------|
+ | `HF_TOKEN` | Required for streaming logs, querying runtime status, and waiting for RUNNING |
+ | `SPACE_ID` | Defaults to `tao-shen/HuggingRun`; can also be set with `--space-id` |
+ | `APP_URL` | Defaults to `https://tao-shen-huggingrun.hf.space`; can also be set with `--url` |
+
+ All "success" is judged against the **remote**: build succeeded + app RUNNING + all stress tests passed.
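
Reviewer sketch (not part of this commit): because the doc says the script exits 0 only when every remote check passes, the final gate of the loop could be wrapped in a few lines of Python. The helper names `build_command` and `remote_succeeded` are hypothetical; the flags are the ones documented above, and the script path assumes the repo root as working directory.

```python
import subprocess
import sys


def build_command(url=None, expect=(), wait_running=True):
    """Assemble the monitor_and_test.py invocation described in PUSH_DEBUG.md."""
    cmd = [sys.executable, "scripts/monitor_and_test.py"]
    if wait_running:
        cmd.append("--wait-running")
    cmd.append("--test")
    if url:
        cmd += ["--url", url]
    for keyword in expect:
        # --expect may be repeated; each keyword gets its own flag.
        cmd += ["--expect", keyword]
    return cmd


def remote_succeeded(cmd, env=None):
    """Run the command and treat exit code 0 as 'remote succeeded'."""
    return subprocess.run(cmd, env=env).returncode == 0


# Example (HF_TOKEN must be set in the environment for --wait-running):
#   ok = remote_succeeded(build_command("https://xxx.hf.space", ["noVNC"]))
```

Keeping command construction separate from execution makes the gate easy to check without touching the network.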
scripts/monitor_and_test.py CHANGED
@@ -1,10 +1,12 @@
  #!/usr/bin/env python3
  """
- HuggingRun: monitor the remote Space status and run basic/stress/persistence checks.
  Usage:
-   HF_TOKEN=xxx python3 scripts/monitor_and_test.py [--space-id tao-shen/HuggingRun] [--wait-running] [--test]
-   HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs run # stream container run logs (SSE)
-   HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs build # stream build logs (SSE)
  """
  import argparse
  import os
@@ -52,21 +54,42 @@ def wait_running(max_wait_sec=600, poll_interval=15):
      return False


- def http_get(url, timeout=30):
-     try:
-         req = urllib.request.Request(url, method="GET")
-         with urllib.request.urlopen(req, timeout=timeout) as resp:
-             return resp.status, resp.read().decode("utf-8", errors="replace")
-     except urllib.error.HTTPError as e:
-         return e.code, e.read().decode("utf-8", errors="replace") if e.fp else ""
-     except Exception as e:
-         return -1, str(e)
-
-
- def test_basic(url, expect_substring="HuggingRun"):
      status, body = http_get(url)
-     ok = status == 200 and (expect_substring in body or "Run anything" in body)
-     print(f"[test] GET {url} -> {status}, body contains expected: {expect_substring in body or 'Run anything' in body}")
      return ok


@@ -86,18 +109,15 @@ def test_stress(url, n=50, concurrency=10):


  def test_persistence(url, rounds=3):
-     """Visit over several rounds and check that a counter or state in the page changes/persists (the demo page has a Visit count)."""
-     counts = []
      for _ in range(rounds):
-         status, body = http_get(url)
-         if status != 200:
-             return False
-         # The demo page shows "Visit count (persisted): N"
-         if "Visit count" in body or "total_visits" in body or "persisted" in body:
-             counts.append(1)
          time.sleep(1)
-     print(f"[persistence] {rounds} rounds, body contained persistence keywords: {len(counts) == rounds}")
-     return len(counts) >= 1 # acceptable if at least one round contains persistence-related text


  def stream_logs(space_id: str, log_type: str):
@@ -132,9 +152,12 @@ def main():
      p.add_argument("--logs", choices=("build", "run"), help="Stream logs: build or run (SSE)")
      p.add_argument("--stress-n", type=int, default=50)
      p.add_argument("--max-wait", type=int, default=600)
      args = p.parse_args()
      SPACE_ID = args.space_id
      APP_URL = args.url.rstrip("/")

      if args.logs:
          stream_logs(SPACE_ID, args.logs)
@@ -147,7 +170,7 @@ def main():

      if args.test:
          print(f"[test] Target: {APP_URL}")
-         if not test_basic(APP_URL):
              print("[test] BASIC FAILED")
              sys.exit(1)
          if not test_stress(APP_URL, n=args.stress_n):
 
  #!/usr/bin/env python3
  """
+ HuggingRun: monitor the remote Space status and run basic/stress/persistence checks (generic tool, works with any Space)
  Usage:
+   python3 scripts/monitor_and_test.py --test
+   HF_TOKEN=xxx python3 scripts/monitor_and_test.py --wait-running --test # test once RUNNING
+   python3 scripts/monitor_and_test.py --url https://xxx.hf.space --test --expect noVNC # desktop and other non-demo apps
+   HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs run
+   HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs build
  """
  import argparse
  import os

      return False


+ def http_get(url, timeout=30, retries=3, retry_delay=2):
+     """GET url; retry on 502/503/timeout/connection errors (generic HF robustness)."""
+     last_status, last_body, last_err = None, "", None
+     for attempt in range(max(1, retries)):
+         try:
+             req = urllib.request.Request(url, method="GET")
+             with urllib.request.urlopen(req, timeout=timeout) as resp:
+                 body = resp.read().decode("utf-8", errors="replace")
+                 return (resp.status, body)
+         except urllib.error.HTTPError as e:
+             last_status = e.code
+             last_body = e.read().decode("utf-8", errors="replace") if e.fp else ""
+             last_err = e
+             if e.code in (502, 503) and attempt < retries - 1:
+                 time.sleep(retry_delay)
+                 continue
+             return (e.code, last_body)
+         except (OSError, urllib.error.URLError) as e:
+             last_err = e
+             last_status = -1
+             last_body = str(e)
+             if attempt < retries - 1:
+                 time.sleep(retry_delay)
+                 continue
+             return (-1, last_body)
+     return (last_status or -1, last_body or str(last_err or ""))
+
+
+ def test_basic(url, expect_substrings=None):
+     """GET url; pass if status 200 and body contains any of expect_substrings (default: HuggingRun / Run anything)."""
+     if expect_substrings is None:
+         expect_substrings = ("HuggingRun", "Run anything")
      status, body = http_get(url)
+     found = any(s in body for s in expect_substrings)
+     ok = status == 200 and found
+     print(f"[test] GET {url} -> {status}, body contains expected: {found}")
      return ok




  def test_persistence(url, rounds=3):
+     """Hit the URL over several rounds; every round must return 200 (generic: any app that reliably returns 200 passes)."""
+     ok_rounds = 0
      for _ in range(rounds):
+         status, _ = http_get(url)
+         if status == 200:
+             ok_rounds += 1
          time.sleep(1)
+     print(f"[persistence] {rounds} rounds: {ok_rounds} ok")
+     return ok_rounds == rounds


  def stream_logs(space_id: str, log_type: str):

      p.add_argument("--logs", choices=("build", "run"), help="Stream logs: build or run (SSE)")
      p.add_argument("--stress-n", type=int, default=50)
      p.add_argument("--max-wait", type=int, default=600)
+     p.add_argument("--expect", action="append", dest="expect_substrings",
+                    help="Expected substring(s) in response body (basic test). Can repeat. Default: HuggingRun, Run anything")
      args = p.parse_args()
      SPACE_ID = args.space_id
      APP_URL = args.url.rstrip("/")
+     expect_substrings = tuple(args.expect_substrings) if args.expect_substrings else None

      if args.logs:
          stream_logs(SPACE_ID, args.logs)


      if args.test:
          print(f"[test] Target: {APP_URL}")
+         if not test_basic(APP_URL, expect_substrings=expect_substrings):
              print("[test] BASIC FAILED")
              sys.exit(1)
          if not test_stress(APP_URL, n=args.stress_n):
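
Reviewer sketch (not part of this commit): the new `--expect` flag relies on argparse's `action="append"`, which yields `None` when the flag is absent and a list when it is repeated. The standalone snippet below isolates that behaviour and the "any substring matches" rule from `test_basic`; `parse_expect` and `body_matches` are hypothetical names used only for illustration.

```python
import argparse


def parse_expect(argv):
    """Parse repeated --expect flags the same way the diff above does."""
    p = argparse.ArgumentParser()
    p.add_argument("--expect", action="append", dest="expect_substrings")
    args = p.parse_args(argv)
    # With action="append" the attribute is None when the flag never appears.
    return tuple(args.expect_substrings) if args.expect_substrings else None


def body_matches(body, expect_substrings=None):
    """Return True if the body contains any expected substring."""
    if expect_substrings is None:
        # Defaults taken from test_basic in the diff.
        expect_substrings = ("HuggingRun", "Run anything")
    return any(s in body for s in expect_substrings)
```

Note that passing `--expect` replaces the defaults rather than extending them, so a noVNC page is only accepted when `--expect noVNC` (or another matching keyword) is given.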
scripts/sync_hf.py CHANGED
@@ -110,13 +110,22 @@ class GenericSync:
          print(f"[HuggingRun] Restoring {PERSIST_PATH} from {HF_REPO_ID} ...")
          PERSIST_PATH.mkdir(parents=True, exist_ok=True)
          with tempfile.TemporaryDirectory() as tmpdir:
-             snapshot_download(
-                 repo_id=HF_REPO_ID,
-                 repo_type="dataset",
-                 allow_patterns=f"{prefix}**",
-                 local_dir=tmpdir,
-                 token=HF_TOKEN,
-             )
              src = Path(tmpdir) / DATASET_SUBFOLDER
              if src.exists():
                  for item in src.rglob("*"):
 
          print(f"[HuggingRun] Restoring {PERSIST_PATH} from {HF_REPO_ID} ...")
          PERSIST_PATH.mkdir(parents=True, exist_ok=True)
          with tempfile.TemporaryDirectory() as tmpdir:
+             for attempt in range(2):
+                 try:
+                     snapshot_download(
+                         repo_id=HF_REPO_ID,
+                         repo_type="dataset",
+                         allow_patterns=f"{prefix}**",
+                         local_dir=tmpdir,
+                         token=HF_TOKEN,
+                     )
+                     break
+                 except Exception as e:
+                     if attempt == 0:
+                         print(f"[HuggingRun] Restore attempt {attempt + 1} failed: {e}. Retrying...")
+                         time.sleep(3)
+                     else:
+                         raise
              src = Path(tmpdir) / DATASET_SUBFOLDER
              if src.exists():
                  for item in src.rglob("*"):
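
Reviewer sketch (not part of this commit): the two-attempt loop added around `snapshot_download` is an instance of a generic "retry, then re-raise" pattern. A minimal helper, assuming a hypothetical name `retry` and an injectable `sleep` for testing, might look like this:

```python
import time


def retry(fn, attempts=2, delay=3.0, sleep=time.sleep):
    """Call fn(); on exception retry until `attempts` tries are used, then re-raise."""
    attempts = max(1, attempts)
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as e:
            if attempt == attempts - 1:
                # Out of tries: surface the last error to the caller.
                raise
            print(f"attempt {attempt + 1} failed: {e}. Retrying...")
            sleep(delay)
```

With the defaults (`attempts=2`, `delay=3.0`) this matches the behaviour in the diff: one retry after a 3 second pause, and the second failure propagates.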