tao-shen and Claude Opus 4.6 committed on
Commit 1b35906 · 1 parent: 83a0241

clean: remove all VNC/desktop files and references

Remove ubuntu-desktop/, Dockerfile.ubuntu-desktop, desktop design docs,
and all VNC/noVNC/XFCE references from README and docs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Dockerfile.ubuntu-desktop DELETED
@@ -1,59 +0,0 @@
- # Ubuntu 24.04 Desktop on HuggingRun: noVNC on 7860, SSH on 2222, persistence via /data
- FROM ubuntu:24.04
-
- ENV DEBIAN_FRONTEND=noninteractive
-
- # System + Python (for sync)
- RUN apt-get update && apt-get install -y --no-install-recommends \
-     ca-certificates curl python3 python3-pip python3-venv \
-     && pip3 install --no-cache-dir --break-system-packages huggingface_hub \
-     && rm -rf /var/lib/apt/lists/*
-
- # Desktop stack: Xvfb, XFCE, dbus, x11vnc, Firefox; OpenSSH for local/reverse SSH
- RUN apt-get update && apt-get install -y --no-install-recommends \
-     xvfb \
-     xfce4 xfce4-goodies \
-     dbus-x11 \
-     x11vnc \
-     firefox \
-     procps \
-     openssh-server openssh-client \
-     && rm -rf /var/lib/apt/lists/*
-
- # noVNC (web client on 7860)
- RUN apt-get update && apt-get install -y --no-install-recommends git \
-     && git clone --depth 1 https://github.com/novnc/noVNC.git /opt/noVNC \
-     && git clone --depth 1 https://github.com/novnc/websockify /opt/noVNC/utils/websockify \
-     && rm -rf /var/lib/apt/lists/* /opt/noVNC/.git
-
- # HF Spaces run as user 1000; UID 1000 may exist (e.g. ubuntu)
- RUN (useradd -m -u 1000 user 2>/dev/null) || \
-     (EXISTING=$(getent passwd 1000 | cut -d: -f1); \
-      usermod -l user $EXISTING; usermod -d /home/user user; \
-      mkdir -p /home/user && chown 1000:1000 /home/user)
- ENV HOME=/home/user
- RUN mkdir -p /data && chown user:user /data
-
- # Pre-generate SSH host key so sshd can start without root
- RUN mkdir -p /home/user/.ssh && \
-     ssh-keygen -t ed25519 -f /home/user/.ssh/ssh_host_ed25519_key -N "" -C "" && \
-     chown -R 1000:1000 /home/user/.ssh
-
- # HuggingRun scripts (build context = repo root)
- COPY scripts /scripts
- COPY ubuntu-desktop/start-desktop.sh /opt/start-desktop.sh
- RUN chmod +x /scripts/entrypoint.sh /opt/start-desktop.sh
-
- ENV PERSIST_PATH=/data
- ENV RUN_CMD="/opt/start-desktop.sh"
- ENV DESKTOP_HOME=/data/desktop-home
- ENV DISPLAY=:99
- ENV VNC_PORT=5901
- ENV NOVNC_PORT=7860
- # SSH_LISTEN: 0.0.0.0 for local Docker testing, 127.0.0.1 for HF (reverse SSH only)
- ENV SSH_LISTEN=0.0.0.0
- ENV SSH_PORT=2222
-
- USER user
- EXPOSE 7860 2222
- ENTRYPOINT ["/scripts/entrypoint.sh"]
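The deleted Dockerfile pre-generates an ed25519 host key in `/home/user/.ssh` so that sshd can run as the unprivileged user 1000, and it passes the bind address and port through `SSH_LISTEN` and `SSH_PORT`. The start script that consumed these variables is not part of this diff, so what follows is only a sketch: `render_sshd_config` and `config_from_env` are hypothetical helpers, and the `PidFile` location and `UsePAM no` are assumptions based on a non-root process being unable to write to `/run` and PAM generally requiring root.

```python
import os
from typing import Mapping, Optional


def render_sshd_config(listen: str = "0.0.0.0", port: int = 2222,
                       home: str = "/home/user") -> str:
    """Build a minimal sshd_config for an sshd running as the unprivileged user.

    Hypothetical helper: paths mirror the deleted Dockerfile, which generated
    the host key at $HOME/.ssh/ssh_host_ed25519_key during the build.
    """
    lines = [
        f"Port {port}",
        f"ListenAddress {listen}",
        f"HostKey {home}/.ssh/ssh_host_ed25519_key",
        # Assumption: a non-root process cannot write the default PID file in /run
        f"PidFile {home}/.ssh/sshd.pid",
        # Assumption: PAM is disabled because it generally requires root
        "UsePAM no",
        # Key-based logins only
        "PasswordAuthentication no",
        f"AuthorizedKeysFile {home}/.ssh/authorized_keys",
    ]
    return "\n".join(lines) + "\n"


def config_from_env(env: Optional[Mapping[str, str]] = None) -> str:
    """Read SSH_LISTEN, SSH_PORT and HOME, falling back to the Dockerfile's defaults."""
    env = os.environ if env is None else env
    return render_sshd_config(
        listen=env.get("SSH_LISTEN", "0.0.0.0"),
        port=int(env.get("SSH_PORT", "2222")),
        home=env.get("HOME", "/home/user"),
    )


if __name__ == "__main__":
    # 127.0.0.1 is the value the Dockerfile comment suggests for HF (reverse SSH only)
    print(config_from_env({"SSH_LISTEN": "127.0.0.1"}), end="")
```

The generated file could then be handed to sshd with `-f`, for example `/usr/sbin/sshd -D -e -f "$HOME/.ssh/sshd_config"` (`-D` keeps it in the foreground, `-e` logs to stderr).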
 
README.md CHANGED
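@@ -19,7 +19,7 @@ tags:

  **Run anything on Hugging Face.**

- HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: a single set of tools works around HF's persistence, single-port, networking, and other constraints, so that **any Docker application** can be deployed through the same workflow and keep its state across restarts. We validate it with a hard use case, "deploying an entire operating system" (e.g. an Ubuntu desktop): if tasks like this run reliably, the general tooling is enough for users to deploy all kinds of complex applications.
+ HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: a single set of tools works around HF's persistence, single-port, networking, and other constraints, so that **any Docker application** can be deployed through the same workflow and keep its state across restarts.

  - **General usage (minimal user steps)**: [docs/GENERAL_USAGE.md](docs/GENERAL_USAGE.md). No per-container billing or complex configuration as on other clouds; every capability is built around the general tooling.
  - **General tooling first**: what we mainly maintain is the general layer (persistence sync, single entry point, configurable port). Examples only demonstrate "minimal configuration" usage; no case-specific logic is hard-coded into the core scripts.
@@ -64,10 +64,9 @@ HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: a single

  (Optional) Set `HF_TOKEN` and `AUTO_CREATE_DATASET=true`, and the SQLite data survives restarts.

- ### Ubuntu desktop (noVNC)
+ ### Ubuntu Server (Web Terminal + SSH)

- **Provided only as a general-tooling example**: uses the same `scripts/` (sync + entrypoint), with a different Dockerfile that sets `RUN_CMD=/opt/start-desktop.sh`.
- Usage: after duplicating this Space, **replace** the repository-root `Dockerfile` with the contents of [ubuntu-desktop/Dockerfile](ubuntu-desktop/Dockerfile), then save and build; the general scripts need no changes. See [ubuntu-desktop/README.md](ubuntu-desktop/README.md) for details.
+ Uses the same `scripts/`: ttyd serves a Web Terminal in the browser, and an nginx reverse proxy plus a WebSocket-SSH bridge allow remote SSH login. Full-disk persistence: the entire filesystem is mirrored to an HF Dataset.

  ## Environment variable quick reference
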
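The new README section says the entire filesystem is mirrored to an HF Dataset, but the sync code itself is not part of this diff. One common way to keep such mirroring cheap is to transfer only what changed since the previous sync. The sketch below is a hypothetical illustration of that idea, not HuggingRun's actual sync script: it hashes every regular file under a root and compares two snapshots. The actual transfer could then go through `huggingface_hub` (for example `HfApi.upload_folder` with `repo_type="dataset"`), which is out of scope here.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files do not fill memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(root: str) -> Dict[str, str]:
    """Map every regular file under root (relative POSIX path) to its digest."""
    base = Path(root)
    return {
        p.relative_to(base).as_posix(): file_digest(p)
        for p in sorted(base.rglob("*"))
        if p.is_file() and not p.is_symlink()
    }


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (paths to upload, paths to delete from the remote copy)."""
    changed = sorted(p for p, digest in new.items() if old.get(p) != digest)
    deleted = sorted(p for p in old if p not in new)
    return changed, deleted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "a.txt").write_text("one")
        Path(d, "old.txt").write_text("gone soon")
        before = snapshot(d)
        Path(d, "a.txt").write_text("two")
        Path(d, "old.txt").unlink()
        Path(d, "sub").mkdir()
        Path(d, "sub", "b.txt").write_text("new")
        print(diff_snapshots(before, snapshot(d)))  # (['a.txt', 'sub/b.txt'], ['old.txt'])
```

Hashing content rather than comparing modification times costs a full read per sync, but it avoids re-uploading files whose mtime changed without their bytes changing.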
docs/GENERAL_USAGE.md CHANGED
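@@ -1,8 +1,8 @@
  # HuggingRun General Usage

- This document explains how to use the **general tooling**. All capabilities are built around this one set of tools; the examples (including the Ubuntu desktop) are just "the same general pipeline + a different RUN_CMD or a different Dockerfile", with no separate customization.
+ This document explains how to use the **general tooling**. All capabilities are built around this one set of tools; the examples are just "the same general pipeline + a different RUN_CMD or a different Dockerfile", with no separate customization.

- **Design goal**: let people who use this tool **deploy anything normally**. We treat "deploying an entire operating system" (e.g. Ubuntu desktop + noVNC) as a hard use case: if even such tasks run correctly, the general layer is robust enough, and other applications will be no problem.
+ **Design goal**: let people who use this tool **deploy anything normally**.

  ---

@@ -38,13 +38,11 @@
  3. Open the Space link.
  No code changes, no payment, and no need to buy a separate persistent disk or do complex configuration as with other cloud containers.

- ### Scenario B: run an "alternative image" example (e.g. Ubuntu desktop)
+ ### Scenario B: run an "alternative image" example

- - Still uses **the same general tooling**: only "what to run" is switched to the Ubuntu desktop.
- - Steps: after duplicating this Space, replace the contents of the root `Dockerfile` with the **ubuntu-desktop example's Dockerfile** (the repository already contains `scripts/` and `ubuntu-desktop/start-desktop.sh`, so the build context is unchanged).
- - After that, you still only need to configure Secrets (such as `HF_TOKEN`) in Settings; no Ubuntu-specific logic has to be added to the general scripts.
-
- **Ubuntu desktop example steps**: see [ubuntu-desktop/README.md](../ubuntu-desktop/README.md). Option 1: replace the root `Dockerfile` with the contents of `ubuntu-desktop/Dockerfile`, then push. Option 2: create a new Space and push this repository's **deploy-ubuntu-desktop** branch to that Space's main (the branch root already holds the desktop Dockerfile and still uses the same `scripts/`).
+ - Still uses **the same general tooling**: only the Dockerfile changes.
+ - Steps: after duplicating this Space, replace the contents of the root `Dockerfile`.
+ - After that, you still only need to configure Secrets (such as `HF_TOKEN`) in Settings; no special-purpose logic has to be added to the general scripts.

  ---

@@ -67,5 +65,4 @@
  ## Comparison with "other cloud containers"

  - **Other clouds**: you usually have to pick an instance type, buy a persistent disk, and configure networking and keys; that means many steps and ongoing costs.
- - **HuggingRun**: Duplicate the Space → set `HF_TOKEN` / `RUN_CMD` as needed (or swap in an example Dockerfile), and any Docker-compatible application runs, with persistence through an HF Dataset at no extra cost.
- All changes revolve around this **general tooling**; the examples (including the Ubuntu desktop) only demonstrate usage and do not turn the general layer into special-purpose logic.
+ - **HuggingRun**: Duplicate the Space → set `HF_TOKEN` / `RUN_CMD` as needed (or swap in an example Dockerfile), and any Docker-compatible application runs, with persistence through an HF Dataset at no extra cost.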
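The usage guide reduces every deployment to "the same general pipeline + a different `RUN_CMD` or Dockerfile", and the deleted design doc describes that pipeline as: restore persisted data, start a background sync, launch the app, and sync again on exit. The real entry point is `scripts/entrypoint.sh`, whose contents are not shown, so the Python sketch below only illustrates that ordering; `run_with_sync`, `restore`, and `sync` are hypothetical names, with the two callbacks standing in for the HF Dataset download and upload. It does not forward signals to the child process, which a production entry point would need to do.

```python
import shlex
import subprocess
import sys
import threading
from typing import Callable


def run_with_sync(run_cmd: str,
                  restore: Callable[[], None],
                  sync: Callable[[], None],
                  interval_sec: float = 300.0) -> int:
    """Restore state, run RUN_CMD while syncing periodically, then sync once more.

    restore and sync are hypothetical callbacks standing in for the HF Dataset
    download/upload; the real logic lives in scripts/ and is not shown here.
    """
    restore()  # bring persisted data back before the app starts
    stop = threading.Event()

    def periodic() -> None:
        # Event.wait returns False on timeout (time to sync) and True once stop is set
        while not stop.wait(interval_sec):
            sync()

    worker = threading.Thread(target=periodic, daemon=True)
    worker.start()
    try:
        return subprocess.call(shlex.split(run_cmd))
    finally:
        stop.set()
        worker.join()  # let an in-flight periodic sync finish first
        sync()  # final sync so writes made just before exit are kept


if __name__ == "__main__":
    log = []
    code = run_with_sync(f"{shlex.quote(sys.executable)} -c pass",
                         restore=lambda: log.append("restore"),
                         sync=lambda: log.append("sync"),
                         interval_sec=60)
    print(code, log)  # 0 ['restore', 'sync']
```

Joining the worker before the final sync ensures two uploads never run at the same time.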
 
docs/PUSH_DEBUG.md CHANGED
@@ -87,10 +87,10 @@ curl -N -H "Authorization: Bearer $HF_TOKEN" \
  # Demo or default Space
  HF_TOKEN=YOUR_TOKEN python3 scripts/monitor_and_test.py --wait-running --test

- # Ubuntu desktop etc.: the root path returns a noVNC directory listing, so expect "Directory listing"; the desktop is at /vnc.html
+ # Custom expect content
  HF_TOKEN=YOUR_TOKEN python3 scripts/monitor_and_test.py --wait-running --test \
  --url https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space \
- --expect "Directory listing"
+ --expect "ttyd"
  ```

  **Option B: without HF_TOKEN** (only poll the URL until the page shows the expected content)
@@ -98,17 +98,17 @@ HF_TOKEN=YOUR_TOKEN python3 scripts/monitor_and_test.py --wait-running --test \
  ```bash
  python3 scripts/monitor_and_test.py --wait-url --test \
  --url https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space \
- --expect "Directory listing" --max-wait 900
+ --expect "ttyd" --max-wait 900
  ```

- The script first polls until GET returns 200 and the body contains your `--expect` string (for the Ubuntu desktop, whose root path returns a directory listing, use `--expect "Directory listing"`; the desktop client is at `/vnc.html`), then runs: a basic GET, stress requests, and multi-round persistence checks. **It exits 0 only if everything passes**; any failure exits 1.
+ The script first polls until GET returns 200 and the body contains your `--expect` string, then runs: a basic GET, stress requests, and multi-round persistence checks. **It exits 0 only if everything passes**; any failure exits 1.

  ### 2.5 Test the current page directly without waiting (when the Space is already RUNNING)

  ```bash
  python3 scripts/monitor_and_test.py --test
  # or
- python3 scripts/monitor_and_test.py --url https://xxx.hf.space --test --expect "Directory listing"
+ python3 scripts/monitor_and_test.py --url https://xxx.hf.space --test --expect "ttyd"
  ```

  ---
@@ -122,7 +122,7 @@ python3 scripts/monitor_and_test.py --url https://xxx.hf.space --test --expect "

  2. **After the build finishes**: in another terminal, wait for RUNNING and run the tests.
  ```bash
- HF_TOKEN=xxx python3 scripts/monitor_and_test.py --until-ok --url https://tao-shen-huggingrun.hf.space --expect "Directory listing"
+ HF_TOKEN=xxx python3 scripts/monitor_and_test.py --until-ok --url https://tao-shen-huggingrun.hf.space --expect "ttyd"
  ```

  3. If **the tests fail or keep returning 503**: use `--logs run` (and `--logs build`) to see the errors inside the container; after fixing the code:
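The documented `--wait-url` behaviour (poll until GET returns 200 and the body contains the `--expect` string, and give up after `--max-wait` seconds) fits in a small, testable loop. This is a sketch rather than the script's own code: `wait_for_content` is a hypothetical name, and the HTTP client, sleep function, and clock are injected so the logic can be exercised without a network. The response bodies in the test are made up.

```python
import time
from typing import Callable, Iterable, Tuple

# fetch(url) -> (HTTP status, body); any non-200 status counts as "not ready"
Fetch = Callable[[str], Tuple[int, str]]


def wait_for_content(url: str,
                     fetch: Fetch,
                     expect: Iterable[str],
                     max_wait_sec: float = 900,
                     poll_interval: float = 20,
                     sleep: Callable[[float], None] = time.sleep,
                     now: Callable[[], float] = time.monotonic) -> bool:
    """Return True once fetch(url) gives 200 and the body contains any expected substring."""
    needles = tuple(expect)
    deadline = now() + max_wait_sec
    while now() < deadline:
        status, body = fetch(url)
        if status == 200 and any(s in body for s in needles):
            return True
        sleep(poll_interval)  # not ready yet: wait before the next attempt
    return False
```

In practice `fetch` could be the `http_get` helper from the deleted `monitor_and_test.py`, which already returns `(status, body)` and retries on 502/503.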
docs/plans/2025-03-03-ubuntu-desktop-design.md DELETED
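@@ -1,26 +0,0 @@
- # Ubuntu Desktop on HuggingRun: Design
-
- **Goal**: deploy the latest Ubuntu desktop on HuggingRun (a full desktop in the browser via noVNC), get the common features working, and fully preserve state across restarts.
-
- ## Approach
-
- - **Base image**: Ubuntu 24.04 LTS
- - **Desktop**: XFCE (lightweight, suits 2 vCPU / 16GB)
- - **Display**: Xvfb virtual display + TigerVNC + noVNC (noVNC listens on 7860, as HF Spaces requires)
- - **Persistence**: the desktop user's HOME lives under `PERSIST_PATH` (default `/data/desktop-home`) and is synced to an HF Dataset by the existing sync_hf.py; on startup, restore first, then mount or point HOME at that directory
- - **Entry point**: a standalone `ubuntu-desktop/` directory with its own Dockerfile; the entrypoint runs the sync restore first, then starts Xvfb → desktop → VNC → noVNC
-
- ## Completion criteria (iterative development)
-
- - [ ] `ubuntu-desktop/` builds and runs on its own; opening port 7860 in a browser shows the full XFCE desktop
- - [ ] Desktop features work: file manager, terminal, browser (Firefox), text editor
- - [ ] With HF_TOKEN + AUTO_CREATE_DATASET set, desktop state (desktop files, configuration, installed software) is preserved after the Space restarts, with no errors
- - [ ] Periodic sync and sync-on-exit work correctly, with nothing missed
-
- ## Implementation notes
-
- 1. **Dockerfile.ubuntu-desktop**: FROM ubuntu:24.04; install python3, huggingface_hub, XFCE, TigerVNC, noVNC, Firefox; copy the HuggingRun scripts; user uid 1000; HOME points to the persistent directory
- 2. **entrypoint_desktop**: restore `/data` → create `/data/desktop-home` and bind it as the desktop HOME → start background sync → start Xvfb, dbus, XFCE, x11vnc/tigervnc, noVNC (listening on 7860)
- 3. **PERSIST_PATH**: use `/data`, with `/data/desktop-home` holding the desktop home directory; sync continues to upload/download all of `/data`
-
- Date: 2025-03-03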
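The deleted design doc starts the desktop stack in a fixed order (Xvfb → desktop → VNC → noVNC). Launching each process only after the previous one accepts connections avoids races such as noVNC starting before x11vnc listens on `VNC_PORT` (5901 in the deleted Dockerfile). The helper below is a hypothetical sketch of such a readiness gate using only the standard library; it only covers TCP services, since Xvfb normally listens on a Unix socket rather than a TCP port.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_sec: float = 30.0,
                  interval_sec: float = 0.2) -> bool:
    """Return True once a TCP connection to host:port succeeds, or False on timeout."""
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        try:
            # Connecting and closing right away is enough to prove the port is listening
            with socket.create_connection((host, port), timeout=interval_sec):
                return True
        except OSError:  # refused or timed out: service not up yet
            time.sleep(interval_sec)
    return False


if __name__ == "__main__":
    # Hypothetical use in a start script: gate noVNC on x11vnc's VNC_PORT
    if wait_for_port("127.0.0.1", 5901, timeout_sec=1):
        print("VNC is up; start noVNC/websockify next")
    else:
        print("VNC not listening on 5901")
```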
 
scripts/monitor_and_test.py DELETED
@@ -1,633 +0,0 @@
- #!/usr/bin/env python3
- """
- HuggingRun: monitor a remote Space's status and run basic/stress/persistence checks (general tool, works with any Space).
- Polling uses the HF API (runtime status + build/run logs), not just the URL.
-
- Usage:
-     python3 scripts/monitor_and_test.py --test
-     python3 scripts/monitor_and_test.py --ssh-test --ssh-host localhost --ssh-port 2222 --ssh-user user
-     python3 scripts/monitor_and_test.py --ssh-test --ssh-stress-n 30 --ssh-host localhost
-     HF_TOKEN=xxx python3 scripts/monitor_and_test.py --watch
-     HF_TOKEN=xxx python3 scripts/monitor_and_test.py --until-ok --url https://xxx.hf.space --expect noVNC
-     HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs run
-     HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs build
- Equivalent curl (requires a Bearer token):
-     curl -N -H "Authorization: Bearer $HF_TOKEN" "https://huggingface.co/api/spaces/<SPACE_ID>/logs/run"
-     curl -N -H "Authorization: Bearer $HF_TOKEN" "https://huggingface.co/api/spaces/<SPACE_ID>/logs/build"
- """
- import argparse
- import os
- import sys
- import time
- import urllib.request
- import urllib.error
-
- # Load .env from repo root if present (HF_TOKEN etc.); never commit .env
- def _load_dotenv():
-     if os.environ.get("HF_TOKEN"):
-         return
-     root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
-     env_file = os.path.join(root, ".env")
-     if not os.path.isfile(env_file):
-         return
-     with open(env_file) as f:
-         for line in f:
-             line = line.strip()
-             if not line or line.startswith("#"):
-                 continue
-             if "=" in line:
-                 k, v = line.split("=", 1)
-                 k, v = k.strip(), v.strip().strip('"').strip("'")
-                 if k and v and k not in os.environ:
-                     os.environ[k] = v
-
- _load_dotenv()
-
- SPACE_ID = os.environ.get("SPACE_ID", "tao-shen/HuggingRun")
- HF_LOGS_BASE = "https://huggingface.co/api/spaces"
- # HF Space app URL (replace / with - and often lowercase)
- APP_URL = os.environ.get("APP_URL", "https://tao-shen-huggingrun.hf.space")
-
-
- def get_runtime():
-     try:
-         from huggingface_hub import HfApi
-         token = os.environ.get("HF_TOKEN")
-         if not token:
-             return None, "HF_TOKEN not set"
-         api = HfApi(token=token)
-         rt = api.get_space_runtime(SPACE_ID)
-         return rt, None
-     except Exception as e:
-         return None, str(e)
-
-
- def get_stage():
-     """Query the current stage once and return immediately. Returns (stage, err)."""
-     rt, err = get_runtime()
-     if err:
-         return None, err
-     stage = getattr(rt, "stage", None) or (getattr(rt, "raw", None) or {}).get("stage")
-     return stage, None
-
-
- def wait_running(max_wait_sec=600, poll_interval=15, app_url=None, expect_substrings=None):
-     """Poll until stage == RUNNING, or stage == APP_STARTING and the URL already returns 200 with the expected content. Checks once immediately, so an error state returns right away."""
-     start = time.time()
-     first = True
-     while (time.time() - start) < max_wait_sec:
-         if not first:
-             time.sleep(poll_interval)
-         first = False
-         stage, err = get_stage()
-         if err:
-             print(f"[monitor] get_runtime error: {err}")
-             continue
-         print(f"[monitor] Space {SPACE_ID} stage={stage}")
-         if stage == "RUNNING":
-             return True
-         if stage == "ERROR" or stage == "BUILD_ERROR":
-             print(f"[monitor] Space in error state: {stage}")
-             return False
-         # During APP_STARTING, treat the app as ready if the URL is already reachable (HF may be slow to mark it RUNNING)
-         if stage == "APP_STARTING" and app_url and expect_substrings:
-             status, body = http_get(app_url, timeout=10)
-             if status == 200 and any(s in body for s in expect_substrings):
-                 print(f"[monitor] App URL ready (stage still APP_STARTING)")
-                 return True
-     print("[monitor] Timeout waiting for RUNNING")
-     return False
-
-
- def wait_url(url, expect_substrings=None, max_wait_sec=900, poll_interval=20):
-     """Poll the URL until GET returns 200 and the body contains any of expect_substrings; used when there is no HF_TOKEN."""
-     if expect_substrings is None:
-         expect_substrings = ("HuggingRun", "Run anything", "noVNC")
-     start = time.time()
-     while (time.time() - start) < max_wait_sec:
-         status, body = http_get(url, timeout=30)
-         if status == 200 and any(s in body for s in expect_substrings):
-             print(f"[monitor] URL ready: {url}")
-             return True
-         print(f"[monitor] URL not ready: status={status}, waiting {poll_interval}s ...")
-         time.sleep(poll_interval)
-     print("[monitor] Timeout waiting for URL content")
-     return False
-
-
- def http_get(url, timeout=30, retries=3, retry_delay=2):
-     """GET url; retry on 502/503/timeout/connection errors (generic HF robustness)."""
-     last_status, last_body, last_err = None, "", None
-     for attempt in range(max(1, retries)):
-         try:
-             req = urllib.request.Request(url, method="GET")
-             with urllib.request.urlopen(req, timeout=timeout) as resp:
-                 body = resp.read().decode("utf-8", errors="replace")
-                 return (resp.status, body)
-         except urllib.error.HTTPError as e:
-             last_status = e.code
-             last_body = e.read().decode("utf-8", errors="replace") if e.fp else ""
-             last_err = e
-             if e.code in (502, 503) and attempt < retries - 1:
-                 time.sleep(retry_delay)
-                 continue
-             return (e.code, last_body)
-         except (OSError, urllib.error.URLError) as e:
-             last_err = e
-             last_status = -1
-             last_body = str(e)
-             if attempt < retries - 1:
-                 time.sleep(retry_delay)
-                 continue
-             return (-1, last_body)
-     return (last_status or -1, last_body or str(last_err or ""))
-
-
- def test_basic(url, expect_substrings=None):
-     """GET url; pass if status 200 and body contains any of expect_substrings (default: HuggingRun / Run anything)."""
-     if expect_substrings is None:
-         expect_substrings = ("HuggingRun", "Run anything")
-     status, body = http_get(url)
-     found = any(s in body for s in expect_substrings)
-     ok = status == 200 and found
-     print(f"[test] GET {url} -> {status}, body contains expected: {found}")
-     return ok
-
-
- def test_stress(url, n=50, concurrency=10):
-     """Send n requests concurrently (at most `concurrency` at a time) and check that all return 200."""
-     import concurrent.futures
-     failed = 0
-     def one(i):
-         s, _ = http_get(url, timeout=15)
-         return s == 200
-     with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as ex:
-         results = list(ex.map(one, range(n)))
-     passed = sum(results)
-     failed = n - passed
-     print(f"[stress] {n} requests: {passed} ok, {failed} failed")
-     return failed == 0
-
-
- def test_persistence(url, rounds=3):
-     """Access the URL over several rounds; every round must return 200 (generic: any app that consistently returns 200 passes)."""
-     ok_rounds = 0
-     for _ in range(rounds):
-         status, _ = http_get(url)
-         if status == 200:
-             ok_rounds += 1
-         time.sleep(1)
-     print(f"[persistence] {rounds} rounds: {ok_rounds} ok")
-     return ok_rounds == rounds
-
-
- # ── SSH Tests ────────────────────────────────────────────────────────────────
-
- def _ssh_cmd(host, port, user, command, timeout=15, identity_file=None):
-     """Run a command over SSH. Returns (returncode, stdout, stderr)."""
-     import subprocess
-     cmd = [
-         "ssh", "-o", "StrictHostKeyChecking=no",
-         "-o", "UserKnownHostsFile=/dev/null",
-         "-o", f"ConnectTimeout={timeout}",
-         "-o", "LogLevel=ERROR",
-         "-p", str(port),
-     ]
-     if identity_file:
-         cmd += ["-i", identity_file]
-     cmd += [f"{user}@{host}", command]
-     try:
-         proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 5)
-         return proc.returncode, proc.stdout, proc.stderr
-     except subprocess.TimeoutExpired:
-         return -1, "", "SSH command timed out"
-     except Exception as e:
-         return -1, "", str(e)
-
-
- def test_ssh_connect(host, port, user, identity_file=None):
-     """Test SSH connectivity: run 'echo ok' and verify output."""
-     rc, out, err = _ssh_cmd(host, port, user, "echo ok", identity_file=identity_file)
-     ok = rc == 0 and "ok" in out
-     print(f"[ssh-test] connect {user}@{host}:{port} -> rc={rc}, output={'ok' if ok else repr(out.strip())}")
-     if not ok and err:
-         print(f"[ssh-test] stderr: {err.strip()}")
-     return ok
-
-
- def test_ssh_command(host, port, user, identity_file=None):
-     """Test SSH command execution: run several diagnostic commands."""
-     checks = [
-         ("whoami", lambda out: user in out),
-         ("uname -s", lambda out: "Linux" in out),
-         ("which claude || echo no-claude", lambda out: "claude" in out.lower()),
-         ("pgrep -a ttyd || pgrep -a sshd", lambda out: len(out.strip()) > 0),
-     ]
-     all_ok = True
-     for cmd, validate in checks:
-         rc, out, err = _ssh_cmd(host, port, user, cmd, identity_file=identity_file)
-         passed = rc == 0 and validate(out)
-         status = "PASS" if passed else "FAIL"
-         print(f"[ssh-test] cmd '{cmd}' -> {status} (rc={rc}, out={out.strip()[:80]})")
-         if not passed:
-             all_ok = False
-     return all_ok
-
-
- def test_ssh_stress(host, port, user, n=30, concurrency=10, identity_file=None):
-     """SSH stress test: n concurrent SSH sessions each running a command."""
-     import concurrent.futures
-
-     def one_session(i):
-         rc, out, _ = _ssh_cmd(host, port, user, f"echo session-{i} && uptime",
-                               timeout=20, identity_file=identity_file)
-         return rc == 0 and f"session-{i}" in out
-
-     with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as ex:
-         results = list(ex.map(one_session, range(n)))
-     passed = sum(results)
-     failed = n - passed
-     print(f"[ssh-stress] {n} sessions (concurrency={concurrency}): {passed} ok, {failed} failed")
-     return failed == 0
-
-
- def test_ssh_bruteforce(host, port, user, rounds=3, ramp_up=None, identity_file=None):
-     """Multi-round SSH stress with increasing concurrency (brute-force style)."""
-     if ramp_up is None:
-         ramp_up = [(20, 5), (40, 10), (60, 20)]
-     all_ok = True
-     for r in range(rounds):
-         n, conc = ramp_up[r % len(ramp_up)]
-         print(f"[ssh-bruteforce] Round {r+1}/{rounds}: {n} sessions, concurrency={conc}")
-         ok = test_ssh_stress(host, port, user, n=n, concurrency=conc, identity_file=identity_file)
-         if not ok:
-             all_ok = False
-             print(f"[ssh-bruteforce] Round {r+1} FAILED")
-             break
-         time.sleep(1)
-     if all_ok:
-         print(f"[ssh-bruteforce] ALL {rounds} rounds PASSED")
-     return all_ok
-
-
- def test_ssh_persistence_stress(host, port, user, persist_path="/data",
-                                 n_files=100, concurrency=10, identity_file=None):
-     """Persistence stress test: write many files via SSH, verify they exist, check integrity.
-
-     Tests the operating system's persistent storage under load:
-     1. Write n_files with known content (concurrent)
-     2. Verify all files exist and content matches
-     3. Write large files to test storage capacity
-     4. Verify checksums
-     """
-     import concurrent.futures
-     import hashlib
-
-     test_dir = f"{persist_path}/stress-test-{int(time.time())}"
-     print(f"[persist-stress] Creating {n_files} files in {test_dir} ...")
-
-     # Phase 1: Create test directory
-     rc, _, err = _ssh_cmd(host, port, user, f"mkdir -p {test_dir}", identity_file=identity_file)
-     if rc != 0:
-         print(f"[persist-stress] FAIL: cannot mkdir {test_dir}: {err}")
-         return False
-
-     # Phase 2: Write files concurrently
-     def write_file(i):
-         content = f"persistence-test-file-{i}-{time.time()}"
-         cmd = f"echo '{content}' > {test_dir}/file_{i:04d}.txt"
-         rc, _, _ = _ssh_cmd(host, port, user, cmd, timeout=20, identity_file=identity_file)
-         return rc == 0, content
-
-     with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as ex:
-         results = list(ex.map(write_file, range(n_files)))
-     written = sum(1 for ok, _ in results if ok)
-     print(f"[persist-stress] Written: {written}/{n_files} files")
-     if written < n_files:
-         print(f"[persist-stress] FAIL: only {written}/{n_files} files written")
-         return False
-
-     # Phase 3: Verify all files exist
-     rc, out, _ = _ssh_cmd(host, port, user, f"ls {test_dir}/ | wc -l",
-                           timeout=30, identity_file=identity_file)
-     count = int(out.strip()) if rc == 0 and out.strip().isdigit() else 0
-     print(f"[persist-stress] Verified: {count} files exist on disk")
-     if count < n_files:
-         print(f"[persist-stress] FAIL: expected {n_files}, found {count}")
-         return False
-
-     # Phase 4: Write a large file (1MB) to test storage
-     rc, _, err = _ssh_cmd(host, port, user,
-                           f"dd if=/dev/urandom of={test_dir}/large_1mb.bin bs=1024 count=1024 2>/dev/null && "
-                           f"ls -la {test_dir}/large_1mb.bin",
-                           timeout=30, identity_file=identity_file)
-     if rc != 0:
-         print(f"[persist-stress] FAIL: cannot write large file: {err}")
-         return False
-     print(f"[persist-stress] Large file (1MB) written OK")
-
-     # Phase 5: Compute and verify checksum
-     rc, out, _ = _ssh_cmd(host, port, user,
-                           f"sha256sum {test_dir}/large_1mb.bin",
-                           timeout=30, identity_file=identity_file)
-     if rc != 0 or not out.strip():
-         print(f"[persist-stress] FAIL: cannot compute checksum")
-         return False
-     checksum1 = out.strip().split()[0]
-
-     # Re-read and verify checksum matches
-     rc, out, _ = _ssh_cmd(host, port, user,
-                           f"sha256sum {test_dir}/large_1mb.bin",
-                           timeout=30, identity_file=identity_file)
-     checksum2 = out.strip().split()[0] if rc == 0 else ""
-     if checksum1 != checksum2:
-         print(f"[persist-stress] FAIL: checksum mismatch {checksum1} != {checksum2}")
-         return False
-     print(f"[persist-stress] Checksum verified: {checksum1[:16]}...")
-
-     # Phase 6: Concurrent read-write (simulates real usage)
-     def read_write(i):
-         # Read existing file, write new one
-         rc1, out, _ = _ssh_cmd(host, port, user,
-                                f"cat {test_dir}/file_{i:04d}.txt",
-                                timeout=20, identity_file=identity_file)
-         rc2, _, _ = _ssh_cmd(host, port, user,
-                              f"echo 'updated-{i}' >> {test_dir}/file_{i:04d}.txt",
-                              timeout=20, identity_file=identity_file)
-         return rc1 == 0 and rc2 == 0
-
-     print(f"[persist-stress] Concurrent read-write test ({n_files} files, {concurrency} workers)...")
-     with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as ex:
-         results = list(ex.map(read_write, range(n_files)))
-     rw_ok = sum(results)
-     print(f"[persist-stress] Read-write: {rw_ok}/{n_files} ok")
-
-     # Cleanup
-     _ssh_cmd(host, port, user, f"rm -rf {test_dir}", timeout=30, identity_file=identity_file)
-
-     all_ok = rw_ok == n_files
-     if all_ok:
-         print(f"[persist-stress] ALL PERSISTENCE TESTS PASSED")
-     return all_ok
-
-
- def _curl_logs_url(space_id: str, log_type: str) -> str:
-     """Build the logs API URL (same as user's curl command)."""
-     return f"https://huggingface.co/api/spaces/{space_id}/logs/{log_type}"
-
-
- def stream_logs(space_id: str, log_type: str):
-     """Stream build or run logs via curl (user's command). Requires HF_TOKEN."""
-     import subprocess
-     token = os.environ.get("HF_TOKEN")
-     if not token:
-         print("HF_TOKEN required for --logs", file=sys.stderr)
-         sys.exit(1)
-     url = _curl_logs_url(space_id, log_type)
-     # curl -N -H "Authorization: Bearer $HF_TOKEN" "https://huggingface.co/api/spaces/<SPACE_ID>/logs/run|build"
-     try:
-         proc = subprocess.Popen(
-             ["curl", "-N", "-sS", "-H", f"Authorization: Bearer {token}", url],
-             stdout=sys.stdout,  # the subprocess module has no stdout/stderr attributes
-             stderr=sys.stderr,
-         )
-         proc.wait()
-         if proc.returncode != 0:
-             sys.exit(proc.returncode or 1)
-     except FileNotFoundError:
-         print("curl not found; falling back to urllib", file=sys.stderr)
-         req = urllib.request.Request(url, method="GET")
-         req.add_header("Authorization", f"Bearer {token}")
-         with urllib.request.urlopen(req, timeout=5) as resp:
-             while True:
-                 chunk = resp.read(4096)
-                 if not chunk:
-                     break
-                 sys.stdout.buffer.write(chunk)
-                 sys.stdout.flush()
-     except Exception as e:
-         print(f"Logs error: {e}", file=sys.stderr)
-         sys.exit(1)
-
-
- def fetch_log_tail(space_id: str, log_type: str, read_timeout=60, keep_tail_chars=25000):
-     """Fetch log via curl (user's command), return last keep_tail_chars. Used when build/run fails."""
-     import subprocess
-     token = os.environ.get("HF_TOKEN")
-     if not token:
-         return "(HF_TOKEN not set; set it and run again to see logs)"
-     url = _curl_logs_url(space_id, log_type)
-     try:
-         proc = subprocess.run(
-             ["curl", "-N", "-sS", "-H", f"Authorization: Bearer {token}", "--max-time", str(read_timeout), url],
-             capture_output=True,
-             text=True,
-             timeout=read_timeout + 10,
-         )
-         out = (proc.stdout or "") + (proc.stderr or "")
-         return out[-keep_tail_chars:] if len(out) > keep_tail_chars else out
-     except FileNotFoundError:
-         # fallback to urllib
-         req = urllib.request.Request(url, method="GET")
-         req.add_header("Authorization", f"Bearer {token}")
-         with urllib.request.urlopen(req, timeout=read_timeout) as resp:
-             out = resp.read().decode("utf-8", errors="replace")
-         return out[-keep_tail_chars:] if len(out) > keep_tail_chars else out
-     except Exception as e:
-         return f"(failed to fetch log: {e})"
-
-
440
- def main():
441
- global SPACE_ID, APP_URL
442
- p = argparse.ArgumentParser()
443
- p.add_argument("--space-id", default=SPACE_ID)
444
- p.add_argument("--url", default=APP_URL)
445
- p.add_argument("--wait-running", action="store_true", help="Poll until Space is RUNNING")
446
- p.add_argument("--test", action="store_true", help="Run basic + stress + persistence tests")
447
- p.add_argument("--logs", choices=("build", "run"), help="Stream logs: build or run (SSE)")
448
- p.add_argument("--stress-n", type=int, default=50)
449
- p.add_argument("--max-wait", type=int, default=600)
450
- p.add_argument("--expect", action="append", dest="expect_substrings",
451
- help="Expected substring(s) in response body (basic test). Can repeat. Default: HuggingRun, Run anything")
452
- p.add_argument("--wait-url", action="store_true",
453
- help="Poll URL until 200 and body contains one of --expect (no HF_TOKEN needed)")
454
- p.add_argument("--until-ok", action="store_true",
455
- help="Poll API until RUNNING, then test; on any fail print log tail and exit 1. Loop until this exits 0.")
456
- p.add_argument("--watch", action="store_true",
457
- help="Use curl to poll run (and optional build) logs + app URL every N sec; don't stop (Ctrl+C to exit)")
458
- p.add_argument("--watch-interval", type=int, default=20, help="Seconds between --watch polls (default 20)")
459
- # SSH test options
460
- p.add_argument("--ssh-test", action="store_true",
461
- help="Run SSH tests: connect + command + stress + bruteforce")
462
- p.add_argument("--ssh-host", default="localhost", help="SSH host (default: localhost)")
463
- p.add_argument("--ssh-port", type=int, default=2222, help="SSH port (default: 2222)")
464
- p.add_argument("--ssh-user", default="user", help="SSH user (default: user)")
465
- p.add_argument("--ssh-key", default=None, help="Path to SSH private key (optional)")
466
- p.add_argument("--ssh-stress-n", type=int, default=30, help="SSH stress: total sessions (default: 30)")
467
- p.add_argument("--ssh-concurrency", type=int, default=10, help="SSH stress: concurrent sessions (default: 10)")
468
- args = p.parse_args()
469
- SPACE_ID = args.space_id
470
- APP_URL = args.url.rstrip("/")
471
- expect_substrings = tuple(args.expect_substrings) if args.expect_substrings else None
472
-
473
- if args.logs:
474
- stream_logs(SPACE_ID, args.logs)
475
- return
476
-
477
- if args.watch:
478
- # 用 curl + Bearer token 持续查看远端状态,不退出
479
- if not os.environ.get("HF_TOKEN"):
480
- print("HF_TOKEN required for --watch (use .env or export)", file=sys.stderr)
481
- sys.exit(1)
482
- import subprocess
483
- interval = max(10, args.watch_interval)
484
- run_url = _curl_logs_url(SPACE_ID, "run")
485
- build_url = _curl_logs_url(SPACE_ID, "build")
486
- token = os.environ.get("HF_TOKEN")
487
- curl_h = ["-H", f"Authorization: Bearer {token}", "-N", "-sS", "--max-time", str(interval + 5)]
488
- n = 0
489
- while True:
490
- n += 1
491
- ts = time.strftime("%H:%M:%S", time.gmtime())
492
- print(f"\n[watch #{n} {ts}] === runtime stage ===")
493
- stage, _ = get_stage()
494
- print(f"[watch] stage={stage}")
495
- print(f"[watch] === GET {APP_URL} ===")
496
- status, body = http_get(APP_URL, timeout=15)
497
- print(f"[watch] HTTP {status}, body len={len(body)}, has noVNC={('noVNC' in body)}")
498
- print(f"[watch] === run log (tail, curl --max-time {interval}) ===")
499
- proc = subprocess.run(
500
- ["curl"] + curl_h + ["--max-time", str(interval), run_url],
501
- capture_output=True, text=True, timeout=interval + 10,
502
- )
503
- out = (proc.stdout or "") + (proc.stderr or "")
504
- tail = out[-4000:] if len(out) > 4000 else out
505
- for line in tail.strip().split("\n")[-25:]:
506
- print(line)
507
- print(f"[watch] next in {interval}s (Ctrl+C to stop)...")
508
- time.sleep(interval)
509
- return
510
-
511
- if args.until_ok:
512
- # 先立即查一次当前状态;已报错则马上用 curl 拉日志并退出,不空等
513
- if not os.environ.get("HF_TOKEN"):
514
- print("HF_TOKEN required for --until-ok (poll runtime + fetch logs)", file=sys.stderr)
515
- sys.exit(1)
516
- stage, err = get_stage()
517
- if err:
518
- print(f"[monitor] {err}")
519
- sys.exit(1)
520
- print(f"[monitor] Space {SPACE_ID} stage={stage}")
521
- if stage == "ERROR" or stage == "BUILD_ERROR":
522
- print("[monitor] Remote reported an error; fetching logs now (curl)")
523
- print("\n[monitor] === Build log (tail) ===")
524
- print(fetch_log_tail(SPACE_ID, "build", read_timeout=15))
525
- print("\n[monitor] === Run log (tail) ===")
526
- print(fetch_log_tail(SPACE_ID, "run", read_timeout=15))
527
- sys.exit(1)
528
- if stage != "RUNNING":
529
- ok = wait_running(
530
- max_wait_sec=args.max_wait,
531
- poll_interval=5,
532
- app_url=APP_URL,
533
- expect_substrings=expect_substrings or ("HuggingRun", "Run anything", "noVNC"),
534
- )
535
- if not ok:
536
- print("\n[monitor] === Build log (tail) ===")
537
- print(fetch_log_tail(SPACE_ID, "build", read_timeout=15))
538
- print("\n[monitor] === Run log (tail) ===")
539
- print(fetch_log_tail(SPACE_ID, "run", read_timeout=15))
540
- sys.exit(1)
541
- print(f"[test] Target: {APP_URL}")
542
- if not test_basic(APP_URL, expect_substrings=expect_substrings):
543
- print("[test] BASIC FAILED")
544
- print("\n[monitor] === Run log (tail) ===")
545
- print(fetch_log_tail(SPACE_ID, "run"))
546
- sys.exit(1)
547
- if not test_stress(APP_URL, n=args.stress_n):
548
- print("[test] STRESS FAILED")
549
- print("\n[monitor] === Run log (tail) ===")
550
- print(fetch_log_tail(SPACE_ID, "run"))
551
- sys.exit(1)
552
- if not test_persistence(APP_URL):
553
- print("[test] PERSISTENCE FAILED")
554
- print("\n[monitor] === Run log (tail) ===")
555
- print(fetch_log_tail(SPACE_ID, "run"))
556
- sys.exit(1)
557
- print("[test] ALL PASSED")
558
- return
559
-
560
- if args.wait_running:
561
- ok = wait_running(max_wait_sec=args.max_wait)
562
- if not ok:
563
- print("\n[monitor] === Build log (tail) ===")
564
- print(fetch_log_tail(SPACE_ID, "build"))
565
- print("\n[monitor] === Run log (tail) ===")
566
- print(fetch_log_tail(SPACE_ID, "run"))
567
- sys.exit(1)
568
-
569
- if args.wait_url:
570
- ok = wait_url(APP_URL, expect_substrings=expect_substrings or ("HuggingRun", "Run anything", "noVNC"),
571
- max_wait_sec=args.max_wait, poll_interval=20)
572
- if not ok:
573
- sys.exit(1)
574
-
575
- if args.ssh_test:
576
- print(f"[ssh-test] Target: {args.ssh_user}@{args.ssh_host}:{args.ssh_port}")
577
- print("=" * 60)
578
- print("[Phase 1] SSH Connect")
579
- if not test_ssh_connect(args.ssh_host, args.ssh_port, args.ssh_user, identity_file=args.ssh_key):
580
- print("[ssh-test] CONNECT FAILED")
581
- sys.exit(1)
582
- print()
583
- print("[Phase 2] SSH Command Execution")
584
- if not test_ssh_command(args.ssh_host, args.ssh_port, args.ssh_user, identity_file=args.ssh_key):
585
- print("[ssh-test] COMMAND EXEC FAILED")
586
- sys.exit(1)
587
- print()
588
- print("[Phase 3] SSH Stress Test")
589
- if not test_ssh_stress(args.ssh_host, args.ssh_port, args.ssh_user,
590
- n=args.ssh_stress_n, concurrency=args.ssh_concurrency,
591
- identity_file=args.ssh_key):
592
- print("[ssh-test] STRESS FAILED")
593
- sys.exit(1)
594
- print()
595
- print("[Phase 4] SSH Brute-force Ramp-up")
596
- if not test_ssh_bruteforce(args.ssh_host, args.ssh_port, args.ssh_user,
597
- identity_file=args.ssh_key):
598
- print("[ssh-test] BRUTEFORCE FAILED")
599
- sys.exit(1)
600
- print()
601
- print("[Phase 5] Persistence Stress Test")
602
- if not test_ssh_persistence_stress(args.ssh_host, args.ssh_port, args.ssh_user,
603
- n_files=args.ssh_stress_n,
604
- concurrency=args.ssh_concurrency,
605
- identity_file=args.ssh_key):
606
- print("[ssh-test] PERSISTENCE STRESS FAILED")
607
- sys.exit(1)
608
- print("=" * 60)
609
- print("[ssh-test] ALL SSH TESTS PASSED")
610
- return
611
-
612
- if args.test:
613
- print(f"[test] Target: {APP_URL}")
614
- if not test_basic(APP_URL, expect_substrings=expect_substrings):
615
- print("[test] BASIC FAILED")
616
- sys.exit(1)
617
- if not test_stress(APP_URL, n=args.stress_n):
618
- print("[test] STRESS FAILED")
619
- sys.exit(1)
620
- if not test_persistence(APP_URL):
621
- print("[test] PERSISTENCE CHECK (keyword) FAILED")
622
- sys.exit(1)
623
- print("[test] ALL PASSED")
624
- else:
625
- rt, err = get_runtime()
626
- if err:
627
- print("Runtime:", err)
628
- else:
629
- print("Runtime:", getattr(rt, "stage", rt.raw))
630
-
631
-
632
- if __name__ == "__main__":
633
- main()
 
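The `--watch` branch shells out to `curl` with a Bearer token to tail the run log. A minimal sketch of assembling such a command so that `--max-time` appears exactly once, assuming the log endpoint has the form `https://huggingface.co/api/spaces/<space_id>/logs/<build|run>` (an assumption; the script hides the URL behind `_curl_logs_url`). `curl_logs_url` and `build_curl_argv` are hypothetical stand-ins, not the script's actual helpers.

```python
from typing import List


def curl_logs_url(space_id: str, kind: str) -> str:
    """Hypothetical stand-in for _curl_logs_url; the endpoint shape is an assumption."""
    if kind not in ("build", "run"):
        raise ValueError(f"unknown log kind: {kind!r}")
    return f"https://huggingface.co/api/spaces/{space_id}/logs/{kind}"


def build_curl_argv(token: str, url: str, max_time: int) -> List[str]:
    """Assemble a curl command that streams logs for at most `max_time` seconds."""
    return [
        "curl",
        "-H", f"Authorization: Bearer {token}",  # HF API auth header
        "-N",                                    # no output buffering: streamed lines arrive promptly
        "-sS",                                   # silent progress, but still report errors
        "--max-time", str(max_time),             # one authoritative timeout
        url,
    ]


if __name__ == "__main__":
    argv = build_curl_argv("hf_xxx", curl_logs_url("tao-shen/HuggingRun", "run"), 30)
    print(" ".join(argv))
```

Keeping the timeout in one place avoids relying on curl's "last occurrence wins" handling of repeated options.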
scripts/verify_overnight.sh DELETED
@@ -1,38 +0,0 @@
1
- #!/usr/bin/env bash
2
- # Overnight verification: 3 full --until-ok runs. Exit 0 only if all pass.
3
- # Usage: from repo root, with .env containing HF_TOKEN:
4
- # bash scripts/verify_overnight.sh
5
- set -e
6
- REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
7
- cd "$REPO_ROOT"
8
- LOG="$REPO_ROOT/docs/verification_run.log"
9
- APP_URL="${APP_URL:-https://tao-shen-huggingrun.hf.space}"
10
- EXPECT="${EXPECT:-Directory listing}"
11
- ROUNDS="${ROUNDS:-3}"
12
-
13
- if [ ! -f .env ]; then
14
- echo "Missing .env (HF_TOKEN required)" >&2
15
- exit 1
16
- fi
17
- set -a; . ./.env; set +a
18
-
19
- echo "=== Overnight verification started $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" | tee -a "$LOG"
20
- echo "APP_URL=$APP_URL EXPECT=$EXPECT ROUNDS=$ROUNDS" | tee -a "$LOG"
21
-
22
- PASSED=0
23
- for r in $(seq 1 "$ROUNDS"); do
24
- echo "" | tee -a "$LOG"
25
- echo "--- Round $r/$ROUNDS at $(date -u +%H:%M:%SZ) ---" | tee -a "$LOG"
26
- if python3 scripts/monitor_and_test.py --until-ok --url "$APP_URL" --expect "$EXPECT" --stress-n 50 >> "$LOG" 2>&1; then
27
- PASSED=$((PASSED+1))
28
- echo "Round $r PASSED" | tee -a "$LOG"
29
- else
30
- echo "Round $r FAILED" | tee -a "$LOG"
31
- exit 1
32
- fi
33
- if [ "$r" -lt "$ROUNDS" ]; then sleep 30; fi
34
- done
35
-
36
- echo "" | tee -a "$LOG"
37
- echo "=== ALL $ROUNDS ROUNDS PASSED at $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" | tee -a "$LOG"
38
- exit 0
 
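`verify_overnight.sh` runs `monitor_and_test.py --until-ok` for `ROUNDS` rounds, aborts on the first failure, and pauses 30 s between rounds. The same control flow as a Python sketch, with the runner and `sleep` injected so the loop can be exercised without a live Space; `run_rounds` and `monitor_round` are hypothetical names, and only the CLI flags are taken from the script.

```python
import subprocess
import time
from typing import Callable


def run_rounds(rounds: int, run_once: Callable[[int], bool],
               pause_sec: float = 30.0,
               sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `run_once` up to `rounds` times and stop at the first failure.

    Returns the number of rounds that passed. The pause happens only
    between rounds, never after the last one.
    """
    passed = 0
    for r in range(1, rounds + 1):
        if not run_once(r):
            print(f"Round {r} FAILED")
            return passed
        passed += 1
        print(f"Round {r} PASSED")
        if r < rounds:
            sleep(pause_sec)
    return passed


def monitor_round(app_url: str, expect: str) -> Callable[[int], bool]:
    """Build a runner that invokes the monitor script once per round."""
    def _run(_round: int) -> bool:
        cmd = ["python3", "scripts/monitor_and_test.py", "--until-ok",
               "--url", app_url, "--expect", expect, "--stress-n", "50"]
        return subprocess.run(cmd).returncode == 0
    return _run
```

For example, `run_rounds(3, monitor_round("https://tao-shen-huggingrun.hf.space", "Directory listing"))` mirrors the script's defaults; a return value equal to `rounds` corresponds to the script's exit status 0.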
ubuntu-desktop/Dockerfile DELETED
@@ -1,51 +0,0 @@
1
- # Ubuntu 24.04 Desktop on HuggingRun — noVNC on 7860, persistence via /data
2
- FROM ubuntu:24.04
3
-
4
- ENV DEBIAN_FRONTEND=noninteractive
5
-
6
- # System + Python (for sync)
7
- RUN apt-get update && apt-get install -y --no-install-recommends \
8
- ca-certificates curl python3 python3-pip python3-venv \
9
- && pip3 install --no-cache-dir --break-system-packages huggingface_hub \
10
- && rm -rf /var/lib/apt/lists/*
11
-
12
- # Desktop stack: Xvfb, XFCE, dbus, x11vnc, Firefox; OpenSSH for reverse SSH (SSH from your local machine into the container)
13
- RUN apt-get update && apt-get install -y --no-install-recommends \
14
- xvfb \
15
- xfce4 xfce4-goodies \
16
- dbus-x11 \
17
- x11vnc \
18
- firefox \
19
- procps \
20
- openssh-server openssh-client \
21
- && rm -rf /var/lib/apt/lists/*
22
-
23
- # noVNC (web client on 7860)
24
- RUN apt-get update && apt-get install -y --no-install-recommends git \
25
- && git clone --depth 1 https://github.com/novnc/noVNC.git /opt/noVNC \
26
- && git clone --depth 1 https://github.com/novnc/websockify /opt/noVNC/utils/websockify \
27
- && rm -rf /var/lib/apt/lists/* /opt/noVNC/.git
28
-
29
- # HF Spaces run as user 1000; UID 1000 may exist (e.g. ubuntu)
30
- RUN (useradd -m -u 1000 user 2>/dev/null) || \
31
- (EXISTING=$(getent passwd 1000 | cut -d: -f1); \
32
- usermod -l user $EXISTING; usermod -d /home/user user; \
33
- mkdir -p /home/user && chown 1000:1000 /home/user)
34
- ENV HOME=/home/user
35
- RUN mkdir -p /data && chown user:user /data
36
-
37
- # HuggingRun scripts (build context = repo root)
38
- COPY scripts /scripts
39
- COPY ubuntu-desktop/start-desktop.sh /opt/start-desktop.sh
40
- RUN chmod +x /scripts/entrypoint.sh /opt/start-desktop.sh
41
-
42
- ENV PERSIST_PATH=/data
43
- ENV RUN_CMD="/opt/start-desktop.sh"
44
- ENV DESKTOP_HOME=/data/desktop-home
45
- ENV DISPLAY=:99
46
- ENV VNC_PORT=5901
47
- ENV NOVNC_PORT=7860
48
-
49
- USER user
50
- EXPOSE 7860
51
- ENTRYPOINT ["/scripts/entrypoint.sh"]
 
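The `useradd ... || usermod -l` fallback in this Dockerfile exists because UID 1000 may already belong to another account (the comment cites `ubuntu`). The pipeline `getent passwd 1000 | cut -d: -f1` resolves that account's name. A Python sketch of the same lookup, using the standard `pwd` module plus a plain parser over `/etc/passwd`-formatted text; both function names are hypothetical.

```python
import pwd
from typing import Optional


def name_for_uid(uid: int) -> Optional[str]:
    """Like `getent passwd <uid> | cut -d: -f1`: resolve the name via NSS, or None."""
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        return None


def name_for_uid_in(passwd_text: str, uid: int) -> Optional[str]:
    """Same lookup over /etc/passwd-formatted text (name:pw:uid:gid:gecos:home:shell)."""
    for line in passwd_text.splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3 and fields[2] == str(uid):
            return fields[0]
    return None
```

A `None` result corresponds to the case where `useradd -u 1000` succeeds and no rename is needed.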
ubuntu-desktop/README.md DELETED
@@ -1,20 +0,0 @@
1
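- # Ubuntu Desktop Example
2
-
3
- This directory is an example of the **HuggingRun general-purpose tool**: it runs Ubuntu + XFCE + noVNC on HF using **exactly the same** `scripts/` (entrypoint + sync) as the main repository, **without modifying any generic logic**; the only change is that this directory's Dockerfile sets `RUN_CMD=/opt/start-desktop.sh`.
4
-
5
- - **General usage**: see [docs/GENERAL_USAGE.md](docs/GENERAL_USAGE.md).
6
- - **This example**: the `Dockerfile` lives in this directory; at build time it COPYs `scripts/` from the repository root and sets `RUN_CMD=/opt/start-desktop.sh`. `start-desktop.sh` starts Xvfb + XFCE + x11vnc + noVNC (listening on 7860) and places the desktop HOME at `PERSIST_PATH/desktop-home`, which the generic sync script persists.
7
-
8
- ## Minimal usage (three steps)
9
-
10
- 1. After you **Duplicate the HuggingRun Space**, **replace the contents** of the repository-root `Dockerfile` with this directory's Dockerfile (do not add or remove any generic scripts).
11
- 2. In Settings → Secrets, set `HF_TOKEN` and, optionally, `AUTO_CREATE_DATASET=true`.
12
- 3. Push, wait for the build, then open the Space in a browser to see the noVNC desktop; state survives restarts through the generic persistence layer.
13
-
14
- To build from the repository root (e.g. locally): `docker build -f ubuntu-desktop/Dockerfile .`
15
-
16
- **Post-deployment monitoring and stress testing** (same tooling as the general-purpose tool): once deployed, poll and stress-test with the generic script. For example:
17
- `python3 scripts/monitor_and_test.py --url "https://<your-username>-<your-space-name>.hf.space" --test --stress-n 50`
18
- See [docs/REMOTE_LOGS.md](docs/REMOTE_LOGS.md) for pulling build/run logs to help with local debugging.
19
-
20
- Maintenance focuses on the generic layer; this example is only a minimal wrapper and adds no example-specific logic to the core.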
 
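The README's monitoring example needs the Space's direct `*.hf.space` URL. A sketch that derives it from a Space ID and builds the same `monitor_and_test.py` command, assuming the subdomain is the lowercased `owner/name` with `/` replaced by `-` (consistent with `verify_overnight.sh`'s default `https://tao-shen-huggingrun.hf.space`). Characters such as `_` or `.` may be mapped differently by Hugging Face, so treat this as a convenience rather than a rule; `space_url` and `monitor_command` are hypothetical helpers.

```python
from typing import List


def space_url(space_id: str) -> str:
    """Derive the direct *.hf.space URL for a Space ID like "owner/name".

    Assumption: lowercase both parts and join them with "-". Other
    characters are passed through unchanged, which may not match HF.
    """
    owner, sep, name = space_id.partition("/")
    if not sep or not owner or not name:
        raise ValueError(f"expected 'owner/name', got {space_id!r}")
    return f"https://{owner.lower()}-{name.lower()}.hf.space"


def monitor_command(space_id: str, stress_n: int = 50) -> List[str]:
    """Build the monitor_and_test.py invocation shown in the README."""
    return ["python3", "scripts/monitor_and_test.py",
            "--url", space_url(space_id), "--test", "--stress-n", str(stress_n)]
```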
ubuntu-desktop/start-desktop.sh DELETED
@@ -1,82 +0,0 @@
1
- #!/bin/bash
2
- # Start Ubuntu desktop: Xvfb + XFCE + x11vnc + noVNC on 7860
3
- # HOME points at a persistent directory (DESKTOP_HOME under PERSIST_PATH, persisted by the generic sync script); this script creates it and exports HOME.
4
- echo "[start-desktop] Starting ..." >&2
5
- set -e
6
-
7
- export PERSIST_PATH="${PERSIST_PATH:-/data}"
8
- export DESKTOP_HOME="${DESKTOP_HOME:-$PERSIST_PATH/desktop-home}"
9
- export DISPLAY="${DISPLAY:-:99}"
10
- export VNC_PORT="${VNC_PORT:-5901}"
11
- export NOVNC_PORT="${NOVNC_PORT:-7860}"
12
-
13
- mkdir -p "$DESKTOP_HOME"
14
- export HOME="$DESKTOP_HOME"
15
-
16
- # Ensure minimal XFCE dirs
17
- mkdir -p "$HOME/.config" "$HOME/.local/share" "$HOME/Desktop"
18
-
19
- # Start Xvfb
20
- Xvfb "$DISPLAY" -screen 0 1280x720x24 -ac +extension GLX +render -noreset &
21
- XVFB_PID=$!
22
- sleep 2
23
- echo "[start-desktop] After Xvfb sleep 2" >&2
24
-
25
- # Start dbus for session (optional; run in subshell so failure never triggers set -e)
26
- ( dbus-daemon --session 2>/dev/null ) || true
27
- echo "[start-desktop] Before XFCE background" >&2
28
-
29
- # Start XFCE (lightweight); use full path in case PATH is minimal
30
- (sleep 1; /usr/bin/startxfce4) &
31
- DESKTOP_PID=$!
32
- echo "[start-desktop] After XFCE & before sleep 3" >&2
33
- sleep 3
34
- echo "[start-desktop] XFCE started, starting x11vnc ..." >&2
35
-
36
- # x11vnc: share display :99 on port 5901 (do not exit on failure so noVNC can still start)
37
- x11vnc -display "$DISPLAY" -rfbport "$VNC_PORT" -forever -shared -noxdamage -nopw -bg || true
38
-
39
- # SSH: always start sshd; do not let failures here stop noVNC
40
- set +e
41
- SSHD_PORT="${SSH_PORT:-2222}"
42
- SSHD_LISTEN="${SSH_LISTEN:-0.0.0.0}"
43
- mkdir -p "$HOME/.ssh"
44
-
45
- # If SSH_AUTHORIZED_KEYS is set, use key-based auth only; otherwise allow password auth for local testing
46
- [ -n "${SSH_AUTHORIZED_KEYS-}" ] && echo "$SSH_AUTHORIZED_KEYS" > "$HOME/.ssh/authorized_keys" && chmod 600 "$HOME/.ssh/authorized_keys"
47
-
48
- # Use pre-generated host key from Docker build, or generate at runtime
49
- HOST_KEY="$HOME/.ssh/ssh_host_ed25519_key"
50
- [ ! -f "$HOST_KEY" ] && cp /home/user/.ssh/ssh_host_ed25519_key "$HOST_KEY" 2>/dev/null
51
- [ ! -f "$HOST_KEY" ] && ssh-keygen -t ed25519 -f "$HOST_KEY" -N "" -C "" 2>/dev/null
52
-
53
- if [ -f "$HOST_KEY" ]; then
54
- if [ -f "$HOME/.ssh/authorized_keys" ]; then
55
- # Key-based auth only (production / HF Spaces)
56
- echo "[start-desktop] Starting sshd (key auth) on $SSHD_LISTEN:$SSHD_PORT ..." >&2
57
- /usr/sbin/sshd -o "Port=$SSHD_PORT" -o "HostKey=$HOST_KEY" \
58
- -o "AuthorizedKeysFile=$HOME/.ssh/authorized_keys" \
59
- -o "PermitEmptyPasswords=no" -o "PasswordAuthentication=no" \
60
- -o "ListenAddress=$SSHD_LISTEN" -o "PidFile=$HOME/.ssh/sshd.pid" \
61
- -o "UsePAM=no" -o "PermitUserEnvironment=yes" -D -e &
62
- else
63
- # No keys configured: allow password-less login for local Docker testing
64
- echo "[start-desktop] Starting sshd (no-password, local test) on $SSHD_LISTEN:$SSHD_PORT ..." >&2
65
- /usr/sbin/sshd -o "Port=$SSHD_PORT" -o "HostKey=$HOST_KEY" \
66
- -o "PermitEmptyPasswords=yes" -o "PasswordAuthentication=yes" \
67
- -o "ListenAddress=$SSHD_LISTEN" -o "PidFile=$HOME/.ssh/sshd.pid" \
68
- -o "UsePAM=no" -o "PermitRootLogin=no" -D -e &
69
- fi
70
- SSHD_PID=$!
71
- sleep 1
72
- echo "[start-desktop] sshd PID=$SSHD_PID" >&2
73
-
74
- # Reverse SSH tunnel (HF Spaces: outbound only on 80/443/8080)
75
- [ -n "${SSH_REVERSE_TARGET-}" ] && ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=60 -R "0.0.0.0:${SSHD_PORT}:127.0.0.1:${SSHD_PORT}" $SSH_REVERSE_TARGET -N &
76
- fi
77
- set -e
78
-
79
- # noVNC: must run in foreground; listen on 0.0.0.0 so HF proxy can reach it
80
- echo "[start-desktop] Starting noVNC on 0.0.0.0:$NOVNC_PORT ..." >&2
81
- # exec hands the process to bash running novnc_proxy as the main process; the `|| exec sleep infinity` fallback only fires if that exec itself fails, not when novnc_proxy exits later
82
- exec /bin/bash -c "cd /opt/noVNC && ./utils/novnc_proxy --listen 0.0.0.0:$NOVNC_PORT --vnc localhost:$VNC_PORT --web /opt/noVNC" || exec sleep infinity
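`start-desktop.sh` waits for Xvfb, XFCE, and sshd with fixed `sleep` calls. A sketch of a readiness probe that polls a TCP port (for example sshd on `SSH_PORT`, default 2222, or x11vnc on `VNC_PORT`, default 5901) until it accepts a connection or a deadline passes. This is a swapped-in technique, not what the script does, and `wait_for_port` is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_sec: float = 10.0,
                  interval_sec: float = 0.2) -> bool:
    """Return True once host:port accepts a TCP connection, False on timeout."""
    deadline = time.monotonic() + timeout_sec
    while True:
        try:
            # A completed connect means something is listening; close it immediately.
            with socket.create_connection((host, port), timeout=interval_sec):
                return True
        except OSError:
            pass  # refused, unreachable, or timed out: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_sec)
```

For instance, `wait_for_port("127.0.0.1", 2222)` after launching sshd would replace the fixed `sleep 1` and return as soon as the daemon listens. A TCP probe only shows that the socket accepts connections; it does not prove that XFCE has finished starting.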