clean: remove all VNC/desktop files and references
Remove ubuntu-desktop/, Dockerfile.ubuntu-desktop, desktop design docs,
and all VNC/noVNC/XFCE references from README and docs.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Dockerfile.ubuntu-desktop +0 -59
- README.md +3 -4
- docs/GENERAL_USAGE.md +7 -10
- docs/PUSH_DEBUG.md +6 -6
- docs/plans/2025-03-03-ubuntu-desktop-design.md +0 -26
- scripts/monitor_and_test.py +0 -633
- scripts/verify_overnight.sh +0 -38
- ubuntu-desktop/Dockerfile +0 -51
- ubuntu-desktop/README.md +0 -20
- ubuntu-desktop/start-desktop.sh +0 -82
Dockerfile.ubuntu-desktop
DELETED
|
@@ -1,59 +0,0 @@
|
|
| 1 |
-
# Ubuntu 24.04 Desktop on HuggingRun — noVNC on 7860, SSH on 2222, persistence via /data
|
| 2 |
-
FROM ubuntu:24.04
|
| 3 |
-
|
| 4 |
-
ENV DEBIAN_FRONTEND=noninteractive
|
| 5 |
-
|
| 6 |
-
# System + Python (for sync)
|
| 7 |
-
RUN apt-get update && apt-get install -y --no-install-recommends \
|
| 8 |
-
ca-certificates curl python3 python3-pip python3-venv \
|
| 9 |
-
&& pip3 install --no-cache-dir --break-system-packages huggingface_hub \
|
| 10 |
-
&& rm -rf /var/lib/apt/lists/*
|
| 11 |
-
|
| 12 |
-
# Desktop stack: Xvfb, XFCE, dbus, x11vnc, Firefox; OpenSSH for local/reverse SSH
|
| 13 |
-
RUN apt-get update && apt-get install -y --no-install-recommends \
|
| 14 |
-
xvfb \
|
| 15 |
-
xfce4 xfce4-goodies \
|
| 16 |
-
dbus-x11 \
|
| 17 |
-
x11vnc \
|
| 18 |
-
firefox \
|
| 19 |
-
procps \
|
| 20 |
-
openssh-server openssh-client \
|
| 21 |
-
&& rm -rf /var/lib/apt/lists/*
|
| 22 |
-
|
| 23 |
-
# noVNC (web client on 7860)
|
| 24 |
-
RUN apt-get update && apt-get install -y --no-install-recommends git \
|
| 25 |
-
&& git clone --depth 1 https://github.com/novnc/noVNC.git /opt/noVNC \
|
| 26 |
-
&& git clone --depth 1 https://github.com/novnc/websockify /opt/noVNC/utils/websockify \
|
| 27 |
-
&& rm -rf /var/lib/apt/lists/* /opt/noVNC/.git
|
| 28 |
-
|
| 29 |
-
# HF Spaces run as user 1000; UID 1000 may exist (e.g. ubuntu)
|
| 30 |
-
RUN (useradd -m -u 1000 user 2>/dev/null) || \
|
| 31 |
-
(EXISTING=$(getent passwd 1000 | cut -d: -f1); \
|
| 32 |
-
usermod -l user $EXISTING; usermod -d /home/user user; \
|
| 33 |
-
mkdir -p /home/user && chown 1000:1000 /home/user)
|
| 34 |
-
ENV HOME=/home/user
|
| 35 |
-
RUN mkdir -p /data && chown user:user /data
|
| 36 |
-
|
| 37 |
-
# Pre-generate SSH host key so sshd can start without root
|
| 38 |
-
RUN mkdir -p /home/user/.ssh && \
|
| 39 |
-
ssh-keygen -t ed25519 -f /home/user/.ssh/ssh_host_ed25519_key -N "" -C "" && \
|
| 40 |
-
chown -R 1000:1000 /home/user/.ssh
|
| 41 |
-
|
| 42 |
-
# HuggingRun scripts (build context = repo root)
|
| 43 |
-
COPY scripts /scripts
|
| 44 |
-
COPY ubuntu-desktop/start-desktop.sh /opt/start-desktop.sh
|
| 45 |
-
RUN chmod +x /scripts/entrypoint.sh /opt/start-desktop.sh
|
| 46 |
-
|
| 47 |
-
ENV PERSIST_PATH=/data
|
| 48 |
-
ENV RUN_CMD="/opt/start-desktop.sh"
|
| 49 |
-
ENV DESKTOP_HOME=/data/desktop-home
|
| 50 |
-
ENV DISPLAY=:99
|
| 51 |
-
ENV VNC_PORT=5901
|
| 52 |
-
ENV NOVNC_PORT=7860
|
| 53 |
-
# SSH_LISTEN: 0.0.0.0 for local Docker testing, 127.0.0.1 for HF (reverse SSH only)
|
| 54 |
-
ENV SSH_LISTEN=0.0.0.0
|
| 55 |
-
ENV SSH_PORT=2222
|
| 56 |
-
|
| 57 |
-
USER user
|
| 58 |
-
EXPOSE 7860 2222
|
| 59 |
-
ENTRYPOINT ["/scripts/entrypoint.sh"]
README.md
CHANGED
|
@@ -19,7 +19,7 @@ tags:
|
|
| 19 |
|
| 20 |
**Run anything on Hugging Face.**
|
| 21 |
|
| 22 |
-
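HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: a single set of tools works around HF's persistence, single-port, and networking limits, so that **any Docker application** can be deployed through the same workflow and keep its state across restarts.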
|
| 23 |
|
| 24 |
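- **General usage (fewest user steps)**: [docs/GENERAL_USAGE.md](docs/GENERAL_USAGE.md). No billing or complex configuration as with other cloud containers; every capability is built around the general-purpose tools.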
|
| 25 |
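- **General-purpose tools first**: what is mainly maintained is the general layer (persistence sync, single entry point, configurable port). Examples only demonstrate "minimal configuration" usage; no case-specific logic is hard-coded into the core scripts.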
|
|
@@ -64,10 +64,9 @@ HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: a single
|
|
| 64 |
|
| 65 |
(Optional) Set `HF_TOKEN` and `AUTO_CREATE_DATASET=true`, and the SQLite data survives restarts.
|
| 66 |
|
| 67 |
-
### Ubuntu
|
| 68 |
|
| 69 |
-
|
| 70 |
-
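Usage: after duplicating this Space, **replace** the `Dockerfile` in the repository root with the contents of [ubuntu-desktop/Dockerfile](ubuntu-desktop/Dockerfile), then save and build; the general scripts need no changes. See [ubuntu-desktop/README.md](ubuntu-desktop/README.md) for details.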
|
| 71 |
|
| 72 |
## Environment variable quick reference
|
| 73 |
|
|
|
|
| 19 |
|
| 20 |
**Run anything on Hugging Face.**
|
| 21 |
|
| 22 |
+
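HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: a single set of tools works around HF's persistence, single-port, and networking limits, so that **any Docker application** can be deployed through the same workflow and keep its state across restarts.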
|
| 23 |
|
| 24 |
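- **General usage (fewest user steps)**: [docs/GENERAL_USAGE.md](docs/GENERAL_USAGE.md). No billing or complex configuration as with other cloud containers; every capability is built around the general-purpose tools.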
|
| 25 |
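- **General-purpose tools first**: what is mainly maintained is the general layer (persistence sync, single entry point, configurable port). Examples only demonstrate "minimal configuration" usage; no case-specific logic is hard-coded into the core scripts.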
|
|
|
|
| 64 |
|
| 65 |
(Optional) Set `HF_TOKEN` and `AUTO_CREATE_DATASET=true`, and the SQLite data survives restarts.
|
| 66 |
|
| 67 |
+
### Ubuntu Server (Web Terminal + SSH)
|
| 68 |
|
| 69 |
+
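Uses the same `scripts/`: ttyd provides a browser-based Web Terminal, and an nginx reverse proxy plus a WebSocket-SSH bridge supports remote SSH login. Full-disk persistence: the entire filesystem image is synced to an HF Dataset.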
|
|
|
|
| 70 |
|
| 71 |
## Environment variable quick reference
|
| 72 |
|
docs/GENERAL_USAGE.md
CHANGED
|
@@ -1,8 +1,8 @@
|
|
| 1 |
# HuggingRun General Usage
|
| 2 |
|
| 3 |
-
This document explains how to use the **general-purpose tools**. Every capability is built around this one set of tools; the examples
|
| 4 |
|
| 5 |
-
**Design goal**: people who use this tool should be able to **deploy everything normally**.
|
| 6 |
|
| 7 |
---
|
| 8 |
|
|
@@ -38,13 +38,11 @@
|
|
| 38 |
3. Open the Space link and you are done.
|
| 39 |
No code changes, no payment, and no separately purchased persistent disk or complex configuration as with other cloud containers.
|
| 40 |
|
| 41 |
-
### Scenario B: running an "alternative image" example
|
| 42 |
|
| 43 |
-
- Still the **same general-purpose tools**: only
|
| 44 |
-
- Steps: after duplicating this Space,
|
| 45 |
-
- After that, you likewise only need to configure Secrets (such as `HF_TOKEN`) in Settings, without adding to the general scripts
|
| 46 |
-
|
| 47 |
-
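**Ubuntu desktop example steps**: see [ubuntu-desktop/README.md](../ubuntu-desktop/README.md). Option 1: replace the root `Dockerfile` with the contents of `ubuntu-desktop/Dockerfile`, then push. Option 2: create a new Space and push this repository's **deploy-ubuntu-desktop** branch to that Space's main (the branch root already holds the desktop Dockerfile and still uses the same `scripts/`).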
|
| 48 |
|
| 49 |
---
|
| 50 |
|
|
@@ -67,5 +65,4 @@
|
|
| 67 |
## Comparison with "other cloud containers"
|
| 68 |
|
| 69 |
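- **Other clouds**: you usually have to pick a machine type, buy a persistent disk, and configure networking and keys; there are many steps and ongoing costs.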
|
| 70 |
-
- **HuggingRun**: Duplicate Space → set `HF_TOKEN` / `RUN_CMD` as needed (or swap in an example Dockerfile), and any Docker-compatible application runs; persistence uses an HF Dataset at no extra charge.
|
| 71 |
-
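All changes revolve around this set of **general-purpose tools**; the examples (including the Ubuntu desktop) only demonstrate usage and do not extend the general layer with "special-purpose logic".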
|
|
|
|
| 1 |
# HuggingRun General Usage
|
| 2 |
|
| 3 |
+
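This document explains how to use the **general-purpose tools**. Every capability is built around this one set of tools; the examples are just "the same general pipeline + a different RUN_CMD or a different Dockerfile", with no per-example customization.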
|
| 4 |
|
| 5 |
+
**Design goal**: people who use this tool should be able to **deploy everything normally**.
|
| 6 |
|
| 7 |
---
|
| 8 |
|
|
|
|
| 38 |
3. Open the Space link and you are done.
|
| 39 |
No code changes, no payment, and no separately purchased persistent disk or complex configuration as with other cloud containers.
|
| 40 |
|
| 41 |
+
### Scenario B: running an "alternative image" example
|
| 42 |
|
| 43 |
+
- Still the **same general-purpose tools**: only the Dockerfile changes.
|
| 44 |
+
- Steps: after duplicating this Space, replace the contents of the root `Dockerfile`.
|
| 45 |
+
- After that, you likewise only need to configure Secrets (such as `HF_TOKEN`) in Settings, without adding special-purpose logic to the general scripts.
|
|
|
|
|
|
|
| 46 |
|
| 47 |
---
|
| 48 |
|
|
|
|
| 65 |
## Comparison with "other cloud containers"
|
| 66 |
|
| 67 |
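- **Other clouds**: you usually have to pick a machine type, buy a persistent disk, and configure networking and keys; there are many steps and ongoing costs.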
|
| 68 |
+
- **HuggingRun**: Duplicate Space → set `HF_TOKEN` / `RUN_CMD` as needed (or swap in an example Dockerfile), and any Docker-compatible application runs; persistence uses an HF Dataset at no extra charge.
|
|
|
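To make the "same pipeline, different `RUN_CMD`" idea from docs/GENERAL_USAGE.md concrete, here is a minimal Python sketch of how a generic launcher could turn environment variables into the command it runs. The names `resolve_command`, `launch`, and `DEFAULT_CMD` are hypothetical, and so is the fallback command; the repository's actual `scripts/entrypoint.sh` is not shown in this diff and may work differently. Only `RUN_CMD`, `PERSIST_PATH`, and the `/data` default come from the Dockerfile above.

```python
import os
import shlex
import subprocess

# Hypothetical fallback used when RUN_CMD is unset; not taken from the repo.
DEFAULT_CMD = "python3 -m http.server 7860"


def resolve_command(env):
    """Return (argv, persist_path) derived from RUN_CMD and PERSIST_PATH."""
    raw = (env.get("RUN_CMD") or "").strip() or DEFAULT_CMD
    persist_path = env.get("PERSIST_PATH") or "/data"
    # shlex.split keeps quoted arguments together, like a POSIX shell would.
    return shlex.split(raw), persist_path


def launch(env=None):
    """Run the resolved command and return its exit code."""
    env = dict(os.environ if env is None else env)
    argv, persist_path = resolve_command(env)
    env["PERSIST_PATH"] = persist_path
    return subprocess.call(argv, env=env)
```

Under these assumptions, switching from one example to another only changes the value of `RUN_CMD` (or the Dockerfile that sets it); the launcher itself stays free of example-specific logic.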
docs/PUSH_DEBUG.md
CHANGED
|
@@ -87,10 +87,10 @@ curl -N -H "Authorization: Bearer $HF_TOKEN" \
|
|
| 87 |
# Demo or default Space
|
| 88 |
HF_TOKEN=your_token python3 scripts/monitor_and_test.py --wait-running --test
|
| 89 |
|
| 90 |
-
#
|
| 91 |
HF_TOKEN=your_token python3 scripts/monitor_and_test.py --wait-running --test \
|
| 92 |
--url https://your-username-your-space-name.hf.space \
|
| 93 |
-
--expect "
|
| 94 |
```
|
| 95 |
|
| 96 |
**Option B: without HF_TOKEN** (only polls the URL until the expected content appears on the page)
|
|
@@ -98,17 +98,17 @@ HF_TOKEN=your_token python3 scripts/monitor_and_test.py --wait-running --test \
|
|
| 98 |
```bash
|
| 99 |
python3 scripts/monitor_and_test.py --wait-url --test \
|
| 100 |
--url https://your-username-your-space-name.hf.space \
|
| 101 |
-
--expect "
|
| 102 |
```
|
| 103 |
|
| 104 |
-
The script first polls until GET returns 200 and the body contains your `--expect`
|
| 105 |
|
| 106 |
### 2.5 Test the current page directly, without waiting (when the Space is already RUNNING)
|
| 107 |
|
| 108 |
```bash
|
| 109 |
python3 scripts/monitor_and_test.py --test
|
| 110 |
# or
|
| 111 |
-
python3 scripts/monitor_and_test.py --url https://xxx.hf.space --test --expect "
|
| 112 |
```
|
| 113 |
|
| 114 |
---
|
|
@@ -122,7 +122,7 @@ python3 scripts/monitor_and_test.py --url https://xxx.hf.space --test --expect "
|
|
| 122 |
|
| 123 |
2. **After the build completes**: in another terminal, wait for RUNNING and run the tests.
|
| 124 |
```bash
|
| 125 |
-
HF_TOKEN=xxx python3 scripts/monitor_and_test.py --until-ok --url https://tao-shen-huggingrun.hf.space --expect "
|
| 126 |
```
|
| 127 |
|
| 128 |
3. If **the tests fail or you keep getting 503**: use `--logs run` (and `--logs build`) to see the error inside the container, then after fixing the code:
|
|
|
|
| 87 |
# Demo or default Space
|
| 88 |
HF_TOKEN=your_token python3 scripts/monitor_and_test.py --wait-running --test
|
| 89 |
|
| 90 |
+
# Custom expect content
|
| 91 |
HF_TOKEN=your_token python3 scripts/monitor_and_test.py --wait-running --test \
|
| 92 |
--url https://your-username-your-space-name.hf.space \
|
| 93 |
+
--expect "ttyd"
|
| 94 |
```
|
| 95 |
|
| 96 |
**Option B: without HF_TOKEN** (only polls the URL until the expected content appears on the page)
|
|
|
|
| 98 |
```bash
|
| 99 |
python3 scripts/monitor_and_test.py --wait-url --test \
|
| 100 |
--url https://your-username-your-space-name.hf.space \
|
| 101 |
+
--expect "ttyd" --max-wait 900
|
| 102 |
```
|
| 103 |
|
| 104 |
+
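The script first polls until GET returns 200 and the body contains your `--expect` value, then runs a basic GET, stress requests, and multi-round persistence checks. **It exits 0 only if everything passes**; any failure exits 1.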
|
| 105 |
|
| 106 |
### 2.5 Test the current page directly, without waiting (when the Space is already RUNNING)
|
| 107 |
|
| 108 |
```bash
|
| 109 |
python3 scripts/monitor_and_test.py --test
|
| 110 |
# or
|
| 111 |
+
python3 scripts/monitor_and_test.py --url https://xxx.hf.space --test --expect "ttyd"
|
| 112 |
```
|
| 113 |
|
| 114 |
---
|
|
|
|
| 122 |
|
| 123 |
2. **After the build completes**: in another terminal, wait for RUNNING and run the tests.
|
| 124 |
```bash
|
| 125 |
+
HF_TOKEN=xxx python3 scripts/monitor_and_test.py --until-ok --url https://tao-shen-huggingrun.hf.space --expect "ttyd"
|
| 126 |
```
|
| 127 |
|
| 128 |
3. If **the tests fail or you keep getting 503**: use `--logs run` (and `--logs build`) to see the error inside the container, then after fixing the code:
|
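The polling behaviour described in docs/PUSH_DEBUG.md (wait until GET returns 200 and the body contains an expected substring) can be sketched in a few lines of Python. This is an illustrative sketch, not the deleted `scripts/monitor_and_test.py`: the function names `space_app_url`, `fetch`, and `wait_for_content` are hypothetical, and the URL rule (replace `/` with `-` and lowercase) is only the convention suggested by the script's own comment and default values, so it may not hold for every Space name.

```python
import time
import urllib.error
import urllib.request


def space_app_url(space_id):
    """Guess the app URL for a Space ID such as 'owner/Name' (assumed convention)."""
    return "https://" + space_id.replace("/", "-").lower() + ".hf.space"


def fetch(url, timeout=30):
    """Return (status, body); status is -1 when the connection itself fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as e:  # must precede URLError (its parent class)
        return e.code, ""
    except (urllib.error.URLError, OSError) as e:
        return -1, str(e)


def wait_for_content(url, expected, max_wait=900, interval=20,
                     get=fetch, sleep=time.sleep):
    """Poll until GET returns 200 and the body contains any expected substring."""
    deadline = time.monotonic() + max_wait
    while True:
        status, body = get(url)
        if status == 200 and any(s in body for s in expected):
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

A caller could map the result to the exit codes the docs describe, for example `sys.exit(0 if wait_for_content(space_app_url("tao-shen/HuggingRun"), ["ttyd"]) else 1)`. The `get` and `sleep` parameters exist only so the loop can be exercised without network access.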
docs/plans/2025-03-03-ubuntu-desktop-design.md
DELETED
|
@@ -1,26 +0,0 @@
|
|
| 1 |
-
# Ubuntu Desktop on HuggingRun: Design
|
| 2 |
-
|
| 3 |
-
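**Goal**: deploy the latest Ubuntu desktop on HuggingRun (a full desktop in the browser via noVNC), get the common features working, and fully preserve state across restarts.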
|
| 4 |
-
|
| 5 |
-
## Approach
|
| 6 |
-
|
| 7 |
-
- **Base image**: Ubuntu 24.04 LTS
|
| 8 |
-
- **Desktop**: XFCE (lightweight, suited to 2 vCPU / 16GB)
|
| 9 |
-
- **Display**: Xvfb virtual display + TigerVNC + noVNC (noVNC listens on 7860, as HF Spaces requires)
|
| 10 |
-
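- **Persistence**: the desktop user's HOME lives under `PERSIST_PATH` (default `/data/desktop-home`) and is synced to an HF Dataset by the existing sync_hf.py; on startup, restore first, then mount or point HOME at that directory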
|
| 11 |
-
- **Entry point**: a standalone `ubuntu-desktop/` directory with its own Dockerfile; the entrypoint first runs the sync restore, then starts Xvfb → desktop → VNC → noVNC
|
| 12 |
-
|
| 13 |
-
## Completion criteria (iterative development)
|
| 14 |
-
|
| 15 |
-
- [ ] `ubuntu-desktop/` builds and runs on its own; opening port 7860 in a browser shows a full XFCE desktop
|
| 16 |
-
- [ ] Desktop features work: file manager, terminal, browser (Firefox), text editor
|
| 17 |
-
- [ ] With HF_TOKEN + AUTO_CREATE_DATASET set, desktop state (desktop files, configuration, installed-software state) is preserved after a Space restart, with no errors
|
| 18 |
-
- [ ] Periodic sync and sync-on-exit work correctly, with nothing missed
|
| 19 |
-
|
| 20 |
-
## Implementation notes
|
| 21 |
-
|
| 22 |
-
1. **Dockerfile.ubuntu-desktop**: FROM ubuntu:24.04; install python3, huggingface_hub, XFCE, TigerVNC, noVNC, Firefox; copy the HuggingRun scripts; user uid 1000; HOME points at the persistent directory
|
| 23 |
-
2. **entrypoint_desktop**: restore `/data` → create `/data/desktop-home` and bind it as the desktop HOME → start sync in the background → start Xvfb, dbus, XFCE, x11vnc/tigervnc, noVNC (listening on 7860)
|
| 24 |
-
3. **PERSIST_PATH**: use `/data`, with `/data/desktop-home` holding the desktop home directory; sync continues to upload and download all of `/data`
|
| 25 |
-
|
| 26 |
-
Date: 2025-03-03
scripts/monitor_and_test.py
DELETED
|
@@ -1,633 +0,0 @@
|
|
| 1 |
-
#!/usr/bin/env python3
|
| 2 |
-
"""
|
| 3 |
-
HuggingRun: monitor a remote Space's status and run basic/stress/persistence checks (a general tool that works with any Space).
|
| 4 |
-
Polling uses the HF API (runtime status + build/run logs), not just the URL.
|
| 5 |
-
|
| 6 |
-
Usage:
|
| 7 |
-
python3 scripts/monitor_and_test.py --test
|
| 8 |
-
python3 scripts/monitor_and_test.py --ssh-test --ssh-host localhost --ssh-port 2222 --ssh-user user
|
| 9 |
-
python3 scripts/monitor_and_test.py --ssh-test --ssh-stress-n 30 --ssh-host localhost
|
| 10 |
-
HF_TOKEN=xxx python3 scripts/monitor_and_test.py --watch
|
| 11 |
-
HF_TOKEN=xxx python3 scripts/monitor_and_test.py --until-ok --url https://xxx.hf.space --expect noVNC
|
| 12 |
-
HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs run
|
| 13 |
-
HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs build
|
| 14 |
-
Equivalent curl (requires a Bearer token):
|
| 15 |
-
curl -N -H "Authorization: Bearer $HF_TOKEN" "https://huggingface.co/api/spaces/<SPACE_ID>/logs/run"
|
| 16 |
-
curl -N -H "Authorization: Bearer $HF_TOKEN" "https://huggingface.co/api/spaces/<SPACE_ID>/logs/build"
|
| 17 |
-
"""
|
| 18 |
-
import argparse
|
| 19 |
-
import os
|
| 20 |
-
import sys
|
| 21 |
-
import time
|
| 22 |
-
import urllib.request
|
| 23 |
-
import urllib.error
|
| 24 |
-
|
| 25 |
-
# Load .env from repo root if present (HF_TOKEN etc.); never commit .env
|
| 26 |
-
def _load_dotenv():
|
| 27 |
-
if os.environ.get("HF_TOKEN"):
|
| 28 |
-
return
|
| 29 |
-
root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
|
| 30 |
-
env_file = os.path.join(root, ".env")
|
| 31 |
-
if not os.path.isfile(env_file):
|
| 32 |
-
return
|
| 33 |
-
with open(env_file) as f:
|
| 34 |
-
for line in f:
|
| 35 |
-
line = line.strip()
|
| 36 |
-
if not line or line.startswith("#"):
|
| 37 |
-
continue
|
| 38 |
-
if "=" in line:
|
| 39 |
-
k, v = line.split("=", 1)
|
| 40 |
-
k, v = k.strip(), v.strip().strip('"').strip("'")
|
| 41 |
-
if k and v and k not in os.environ:
|
| 42 |
-
os.environ[k] = v
|
| 43 |
-
|
| 44 |
-
_load_dotenv()
|
| 45 |
-
|
| 46 |
-
SPACE_ID = os.environ.get("SPACE_ID", "tao-shen/HuggingRun")
|
| 47 |
-
HF_LOGS_BASE = "https://huggingface.co/api/spaces"
|
| 48 |
-
# HF Space app URL (replace / with - and often lowercase)
|
| 49 |
-
APP_URL = os.environ.get("APP_URL", "https://tao-shen-huggingrun.hf.space")
|
| 50 |
-
|
| 51 |
-
|
| 52 |
-
def get_runtime():
|
| 53 |
-
try:
|
| 54 |
-
from huggingface_hub import HfApi
|
| 55 |
-
token = os.environ.get("HF_TOKEN")
|
| 56 |
-
if not token:
|
| 57 |
-
return None, "HF_TOKEN not set"
|
| 58 |
-
api = HfApi(token=token)
|
| 59 |
-
rt = api.get_space_runtime(SPACE_ID)
|
| 60 |
-
return rt, None
|
| 61 |
-
except Exception as e:
|
| 62 |
-
return None, str(e)
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
def get_stage():
|
| 66 |
-
"""Query the current stage once and return immediately. Returns (stage, err)."""
|
| 67 |
-
rt, err = get_runtime()
|
| 68 |
-
if err:
|
| 69 |
-
return None, err
|
| 70 |
-
stage = getattr(rt, "stage", None) or (getattr(rt, "raw", None) or {}).get("stage")
|
| 71 |
-
return stage, None
|
| 72 |
-
|
| 73 |
-
|
| 74 |
-
def wait_running(max_wait_sec=600, poll_interval=15, app_url=None, expect_substrings=None):
|
| 75 |
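-
"""Poll until stage == RUNNING, or APP_STARTING with the URL already returning 200 plus expected content; check once immediately and return at once if the Space has already failed."""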
|
| 76 |
-
start = time.time()
|
| 77 |
-
first = True
|
| 78 |
-
while (time.time() - start) < max_wait_sec:
|
| 79 |
-
if not first:
|
| 80 |
-
time.sleep(poll_interval)
|
| 81 |
-
first = False
|
| 82 |
-
stage, err = get_stage()
|
| 83 |
-
if err:
|
| 84 |
-
print(f"[monitor] get_runtime error: {err}")
|
| 85 |
-
continue
|
| 86 |
-
print(f"[monitor] Space {SPACE_ID} stage={stage}")
|
| 87 |
-
if stage == "RUNNING":
|
| 88 |
-
return True
|
| 89 |
-
if stage == "ERROR" or stage == "BUILD_ERROR":
|
| 90 |
-
print(f"[monitor] Space in error state: {stage}")
|
| 91 |
-
return False
|
| 92 |
-
# In APP_STARTING, treat the app as ready if the URL is already reachable (HF may be slow to mark it RUNNING)
|
| 93 |
-
if stage == "APP_STARTING" and app_url and expect_substrings:
|
| 94 |
-
status, body = http_get(app_url, timeout=10)
|
| 95 |
-
if status == 200 and any(s in body for s in expect_substrings):
|
| 96 |
-
print(f"[monitor] App URL ready (stage still APP_STARTING)")
|
| 97 |
-
return True
|
| 98 |
-
print("[monitor] Timeout waiting for RUNNING")
|
| 99 |
-
return False
|
| 100 |
-
|
| 101 |
-
|
| 102 |
-
def wait_url(url, expect_substrings=None, max_wait_sec=900, poll_interval=20):
|
| 103 |
-
"""Poll the URL until GET returns 200 and the body contains any of expect_substrings; used when there is no HF_TOKEN."""
|
| 104 |
-
if expect_substrings is None:
|
| 105 |
-
expect_substrings = ("HuggingRun", "Run anything", "noVNC")
|
| 106 |
-
start = time.time()
|
| 107 |
-
while (time.time() - start) < max_wait_sec:
|
| 108 |
-
status, body = http_get(url, timeout=30)
|
| 109 |
-
if status == 200 and any(s in body for s in expect_substrings):
|
| 110 |
-
print(f"[monitor] URL ready: {url}")
|
| 111 |
-
return True
|
| 112 |
-
print(f"[monitor] URL not ready: status={status}, waiting {poll_interval}s ...")
|
| 113 |
-
time.sleep(poll_interval)
|
| 114 |
-
print("[monitor] Timeout waiting for URL content")
|
| 115 |
-
return False
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
def http_get(url, timeout=30, retries=3, retry_delay=2):
|
| 119 |
-
"""GET url; retry on 502/503/timeout/connection errors (generic HF robustness)."""
|
| 120 |
-
last_status, last_body, last_err = None, "", None
|
| 121 |
-
for attempt in range(max(1, retries)):
|
| 122 |
-
try:
|
| 123 |
-
req = urllib.request.Request(url, method="GET")
|
| 124 |
-
with urllib.request.urlopen(req, timeout=timeout) as resp:
|
| 125 |
-
body = resp.read().decode("utf-8", errors="replace")
|
| 126 |
-
return (resp.status, body)
|
| 127 |
-
except urllib.error.HTTPError as e:
|
| 128 |
-
last_status = e.code
|
| 129 |
-
last_body = e.read().decode("utf-8", errors="replace") if e.fp else ""
|
| 130 |
-
last_err = e
|
| 131 |
-
if e.code in (502, 503) and attempt < retries - 1:
|
| 132 |
-
time.sleep(retry_delay)
|
| 133 |
-
continue
|
| 134 |
-
return (e.code, last_body)
|
| 135 |
-
except (OSError, urllib.error.URLError) as e:
|
| 136 |
-
last_err = e
|
| 137 |
-
last_status = -1
|
| 138 |
-
last_body = str(e)
|
| 139 |
-
if attempt < retries - 1:
|
| 140 |
-
time.sleep(retry_delay)
|
| 141 |
-
continue
|
| 142 |
-
return (-1, last_body)
|
| 143 |
-
return (last_status or -1, last_body or str(last_err or ""))
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
def test_basic(url, expect_substrings=None):
|
| 147 |
-
"""GET url; pass if status 200 and body contains any of expect_substrings (default: HuggingRun / Run anything)."""
|
| 148 |
-
if expect_substrings is None:
|
| 149 |
-
expect_substrings = ("HuggingRun", "Run anything")
|
| 150 |
-
status, body = http_get(url)
|
| 151 |
-
found = any(s in body for s in expect_substrings)
|
| 152 |
-
ok = status == 200 and found
|
| 153 |
-
print(f"[test] GET {url} -> {status}, body contains expected: {found}")
|
| 154 |
-
return ok
|
| 155 |
-
|
| 156 |
-
|
| 157 |
-
def test_stress(url, n=50, concurrency=10):
|
| 158 |
-
"""Send n requests through a thread pool (up to `concurrency` at a time) and check that all return 200."""
|
| 159 |
-
import concurrent.futures
|
| 160 |
-
failed = 0
|
| 161 |
-
def one(i):
|
| 162 |
-
s, _ = http_get(url, timeout=15)
|
| 163 |
-
return s == 200
|
| 164 |
-
with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as ex:
|
| 165 |
-
results = list(ex.map(one, range(n)))
|
| 166 |
-
passed = sum(results)
|
| 167 |
-
failed = n - passed
|
| 168 |
-
print(f"[stress] {n} requests: {passed} ok, {failed} failed")
|
| 169 |
-
return failed == 0
|
| 170 |
-
|
| 171 |
-
|
| 172 |
-
def test_persistence(url, rounds=3):
|
| 173 |
-
"""Access the URL for several rounds; every round must return 200 (generic: any application passes as long as it reliably returns 200)."""
|
| 174 |
-
ok_rounds = 0
|
| 175 |
-
for _ in range(rounds):
|
| 176 |
-
status, _ = http_get(url)
|
| 177 |
-
if status == 200:
|
| 178 |
-
ok_rounds += 1
|
| 179 |
-
time.sleep(1)
|
| 180 |
-
print(f"[persistence] {rounds} rounds: {ok_rounds} ok")
|
| 181 |
-
return ok_rounds == rounds
|
| 182 |
-
|
| 183 |
-
|
| 184 |
-
# ── SSH Tests ────────────────────────────────────────────────────────────────
|
| 185 |
-
|
| 186 |
-
def _ssh_cmd(host, port, user, command, timeout=15, identity_file=None):
|
| 187 |
-
"""Run a command over SSH. Returns (returncode, stdout, stderr)."""
|
| 188 |
-
import subprocess
|
| 189 |
-
cmd = [
|
| 190 |
-
"ssh", "-o", "StrictHostKeyChecking=no",
|
| 191 |
-
"-o", "UserKnownHostsFile=/dev/null",
|
| 192 |
-
"-o", f"ConnectTimeout={timeout}",
|
| 193 |
-
"-o", "LogLevel=ERROR",
|
| 194 |
-
"-p", str(port),
|
| 195 |
-
]
|
| 196 |
-
if identity_file:
|
| 197 |
-
cmd += ["-i", identity_file]
|
| 198 |
-
cmd += [f"{user}@{host}", command]
|
| 199 |
-
try:
|
| 200 |
-
proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout + 5)
|
| 201 |
-
return proc.returncode, proc.stdout, proc.stderr
|
| 202 |
-
except subprocess.TimeoutExpired:
|
| 203 |
-
return -1, "", "SSH command timed out"
|
| 204 |
-
except Exception as e:
|
| 205 |
-
return -1, "", str(e)
|
| 206 |
-
|
| 207 |
-
|
| 208 |
-
def test_ssh_connect(host, port, user, identity_file=None):
|
| 209 |
-
"""Test SSH connectivity: run 'echo ok' and verify output."""
|
| 210 |
-
rc, out, err = _ssh_cmd(host, port, user, "echo ok", identity_file=identity_file)
|
| 211 |
-
ok = rc == 0 and "ok" in out
|
| 212 |
-
print(f"[ssh-test] connect {user}@{host}:{port} -> rc={rc}, output={'ok' if ok else repr(out.strip())}")
|
| 213 |
-
if not ok and err:
|
| 214 |
-
print(f"[ssh-test] stderr: {err.strip()}")
|
| 215 |
-
return ok
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
def test_ssh_command(host, port, user, identity_file=None):
|
| 219 |
-
"""Test SSH command execution: run several diagnostic commands."""
|
| 220 |
-
checks = [
|
| 221 |
-
("whoami", lambda out: user in out),
|
| 222 |
-
("uname -s", lambda out: "Linux" in out),
|
| 223 |
-
("which claude || echo no-claude", lambda out: "claude" in out.lower()),
|
| 224 |
-
("pgrep -a ttyd || pgrep -a sshd", lambda out: len(out.strip()) > 0),
|
| 225 |
-
]
|
| 226 |
-
all_ok = True
|
| 227 |
-
for cmd, validate in checks:
|
| 228 |
-
rc, out, err = _ssh_cmd(host, port, user, cmd, identity_file=identity_file)
|
| 229 |
-
passed = rc == 0 and validate(out)
|
| 230 |
-
status = "PASS" if passed else "FAIL"
|
| 231 |
-
print(f"[ssh-test] cmd '{cmd}' -> {status} (rc={rc}, out={out.strip()[:80]})")
|
| 232 |
-
if not passed:
|
| 233 |
-
all_ok = False
|
| 234 |
-
return all_ok
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
def test_ssh_stress(host, port, user, n=30, concurrency=10, identity_file=None):
|
| 238 |
-
"""SSH stress test: n concurrent SSH sessions each running a command."""
|
| 239 |
-
import concurrent.futures
|
| 240 |
-
|
| 241 |
-
def one_session(i):
|
| 242 |
-
rc, out, _ = _ssh_cmd(host, port, user, f"echo session-{i} && uptime",
|
| 243 |
-
timeout=20, identity_file=identity_file)
|
| 244 |
-
return rc == 0 and f"session-{i}" in out
|
| 245 |
-
|
| 246 |
-
with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as ex:
|
| 247 |
-
results = list(ex.map(one_session, range(n)))
|
| 248 |
-
passed = sum(results)
|
| 249 |
-
failed = n - passed
|
| 250 |
-
print(f"[ssh-stress] {n} sessions (concurrency={concurrency}): {passed} ok, {failed} failed")
|
| 251 |
-
return failed == 0
|
| 252 |
-
|
| 253 |
-
|
| 254 |
-
def test_ssh_bruteforce(host, port, user, rounds=3, ramp_up=None, identity_file=None):
|
| 255 |
-
"""Multi-round SSH stress with increasing concurrency (brute-force style)."""
|
| 256 |
-
if ramp_up is None:
|
| 257 |
-
ramp_up = [(20, 5), (40, 10), (60, 20)]
|
| 258 |
-
all_ok = True
|
| 259 |
-
for r in range(rounds):
|
| 260 |
-
n, conc = ramp_up[r % len(ramp_up)]
|
| 261 |
-
print(f"[ssh-bruteforce] Round {r+1}/{rounds}: {n} sessions, concurrency={conc}")
|
| 262 |
-
ok = test_ssh_stress(host, port, user, n=n, concurrency=conc, identity_file=identity_file)
|
| 263 |
-
if not ok:
|
| 264 |
-
all_ok = False
|
| 265 |
-
print(f"[ssh-bruteforce] Round {r+1} FAILED")
|
| 266 |
-
break
|
| 267 |
-
time.sleep(1)
|
| 268 |
-
if all_ok:
|
| 269 |
-
print(f"[ssh-bruteforce] ALL {rounds} rounds PASSED")
|
| 270 |
-
return all_ok
|
| 271 |
-
|
| 272 |
-
|
| 273 |
-
def test_ssh_persistence_stress(host, port, user, persist_path="/data",
|
| 274 |
-
n_files=100, concurrency=10, identity_file=None):
|
| 275 |
-
"""Persistence stress test: write many files via SSH, verify they exist, check integrity.
|
| 276 |
-
|
| 277 |
-
Tests the operating system's persistent storage under load:
|
| 278 |
-
1. Write n_files with known content (concurrent)
|
| 279 |
-
2. Verify all files exist and content matches
|
| 280 |
-
3. Write large files to test storage capacity
|
| 281 |
-
4. Verify checksums
|
| 282 |
-
"""
|
| 283 |
-
import concurrent.futures
|
| 284 |
-
import hashlib
|
| 285 |
-
|
| 286 |
-
test_dir = f"{persist_path}/stress-test-{int(time.time())}"
|
| 287 |
-
print(f"[persist-stress] Creating {n_files} files in {test_dir} ...")
|
| 288 |
-
|
| 289 |
-
# Phase 1: Create test directory
|
| 290 |
-
rc, _, err = _ssh_cmd(host, port, user, f"mkdir -p {test_dir}", identity_file=identity_file)
|
| 291 |
-
if rc != 0:
|
| 292 |
-
print(f"[persist-stress] FAIL: cannot mkdir {test_dir}: {err}")
|
| 293 |
-
return False
|
| 294 |
-
|
| 295 |
-
# Phase 2: Write files concurrently
|
| 296 |
-
def write_file(i):
|
| 297 |
-
content = f"persistence-test-file-{i}-{time.time()}"
|
| 298 |
-
cmd = f"echo '{content}' > {test_dir}/file_{i:04d}.txt"
|
| 299 |
-
rc, _, _ = _ssh_cmd(host, port, user, cmd, timeout=20, identity_file=identity_file)
|
| 300 |
-
return rc == 0, content
|
| 301 |
-
|
| 302 |
-
with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as ex:
|
| 303 |
-
results = list(ex.map(write_file, range(n_files)))
|
| 304 |
-
written = sum(1 for ok, _ in results if ok)
|
| 305 |
-
print(f"[persist-stress] Written: {written}/{n_files} files")
|
| 306 |
-
if written < n_files:
|
| 307 |
-
print(f"[persist-stress] FAIL: only {written}/{n_files} files written")
|
| 308 |
-
return False
|
| 309 |
-
|
| 310 |
-
# Phase 3: Verify all files exist
|
| 311 |
-
rc, out, _ = _ssh_cmd(host, port, user, f"ls {test_dir}/ | wc -l",
|
| 312 |
-
timeout=30, identity_file=identity_file)
|
| 313 |
-
count = int(out.strip()) if rc == 0 and out.strip().isdigit() else 0
|
| 314 |
-
print(f"[persist-stress] Verified: {count} files exist on disk")
|
| 315 |
-
if count < n_files:
|
| 316 |
-
print(f"[persist-stress] FAIL: expected {n_files}, found {count}")
|
| 317 |
-
return False
|
| 318 |
-
|
| 319 |
-
# Phase 4: Write a large file (1MB) to test storage
|
| 320 |
-
rc, _, err = _ssh_cmd(host, port, user,
|
| 321 |
-
f"dd if=/dev/urandom of={test_dir}/large_1mb.bin bs=1024 count=1024 2>/dev/null && "
|
| 322 |
-
f"ls -la {test_dir}/large_1mb.bin",
|
| 323 |
-
timeout=30, identity_file=identity_file)
|
| 324 |
-
if rc != 0:
|
| 325 |
-
print(f"[persist-stress] FAIL: cannot write large file: {err}")
|
| 326 |
-
return False
|
| 327 |
-
print(f"[persist-stress] Large file (1MB) written OK")
|
| 328 |
-
|
| 329 |
-
# Phase 5: Compute and verify checksum
|
| 330 |
-
rc, out, _ = _ssh_cmd(host, port, user,
|
| 331 |
-
f"sha256sum {test_dir}/large_1mb.bin",
|
| 332 |
-
timeout=30, identity_file=identity_file)
|
| 333 |
-
if rc != 0 or not out.strip():
|
| 334 |
-
print(f"[persist-stress] FAIL: cannot compute checksum")
|
| 335 |
-
return False
|
| 336 |
-
checksum1 = out.strip().split()[0]
|
| 337 |
-
|
| 338 |
-
# Re-read and verify checksum matches
|
| 339 |
-
rc, out, _ = _ssh_cmd(host, port, user,
|
| 340 |
-
f"sha256sum {test_dir}/large_1mb.bin",
|
| 341 |
-
timeout=30, identity_file=identity_file)
|
| 342 |
-
checksum2 = out.strip().split()[0] if rc == 0 else ""
|
| 343 |
-
if checksum1 != checksum2:
|
| 344 |
-
print(f"[persist-stress] FAIL: checksum mismatch {checksum1} != {checksum2}")
|
| 345 |
-
return False
|
| 346 |
-
print(f"[persist-stress] Checksum verified: {checksum1[:16]}...")
|
| 347 |
-
|
| 348 |
-
# Phase 6: Concurrent read-write (simulates real usage)
|
| 349 |
-
def read_write(i):
|
| 350 |
-
# Read existing file, write new one
|
| 351 |
-
rc1, out, _ = _ssh_cmd(host, port, user,
|
| 352 |
-
f"cat {test_dir}/file_{i:04d}.txt",
|
| 353 |
-
timeout=20, identity_file=identity_file)
|
| 354 |
-
rc2, _, _ = _ssh_cmd(host, port, user,
|
| 355 |
-
f"echo 'updated-{i}' >> {test_dir}/file_{i:04d}.txt",
|
| 356 |
-
timeout=20, identity_file=identity_file)
|
| 357 |
-
return rc1 == 0 and rc2 == 0
|
| 358 |
-
|
| 359 |
-
print(f"[persist-stress] Concurrent read-write test ({n_files} files, {concurrency} workers)...")
|
| 360 |
-
with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as ex:
|
| 361 |
-
results = list(ex.map(read_write, range(n_files)))
|
| 362 |
-
rw_ok = sum(results)
|
| 363 |
-
print(f"[persist-stress] Read-write: {rw_ok}/{n_files} ok")
|
| 364 |
-
|
| 365 |
-
# Cleanup
|
| 366 |
-
_ssh_cmd(host, port, user, f"rm -rf {test_dir}", timeout=30, identity_file=identity_file)
|
| 367 |
-
|
| 368 |
-
all_ok = rw_ok == n_files
|
| 369 |
-
if all_ok:
|
| 370 |
-
print(f"[persist-stress] ALL PERSISTENCE TESTS PASSED")
|
| 371 |
-
return all_ok
|
| 372 |
-
|
| 373 |
-
|
| 374 |
-
def _curl_logs_url(space_id: str, log_type: str) -> str:
|
| 375 |
-
"""Build the logs API URL (same as user's curl command)."""
|
| 376 |
-
return f"https://huggingface.co/api/spaces/{space_id}/logs/{log_type}"
|
| 377 |
-
|
| 378 |
-
|
| 379 |
-
def stream_logs(space_id: str, log_type: str):
|
| 380 |
-
"""Stream build or run logs via curl (user's command). Requires HF_TOKEN."""
|
| 381 |
-
import subprocess
|
| 382 |
-
token = os.environ.get("HF_TOKEN")
|
| 383 |
-
if not token:
|
| 384 |
-
print("HF_TOKEN required for --logs", file=sys.stderr)
|
| 385 |
-
sys.exit(1)
|
| 386 |
-
url = _curl_logs_url(space_id, log_type)
|
| 387 |
-
# curl -N -H "Authorization: Bearer $HF_TOKEN" "https://huggingface.co/api/spaces/<SPACE_ID>/logs/run|build"
|
| 388 |
-
try:
|
| 389 |
-
proc = subprocess.Popen(
|
| 390 |
-
["curl", "-N", "-sS", "-H", f"Authorization: Bearer {token}", url],
|
| 391 |
-
stdout=sys.stdout,
|
| 392 |
-
stderr=sys.stderr,
|
| 393 |
-
)
|
| 394 |
-
proc.wait()
|
| 395 |
-
if proc.returncode != 0:
|
| 396 |
-
sys.exit(proc.returncode or 1)
|
| 397 |
-
except FileNotFoundError:
|
| 398 |
-
print("curl not found; falling back to urllib", file=sys.stderr)
|
| 399 |
-
req = urllib.request.Request(url, method="GET")
|
| 400 |
-
req.add_header("Authorization", f"Bearer {token}")
|
| 401 |
-
with urllib.request.urlopen(req, timeout=5) as resp:
|
| 402 |
-
while True:
|
| 403 |
-
chunk = resp.read(4096)
|
| 404 |
-
if not chunk:
|
| 405 |
-
break
|
| 406 |
-
sys.stdout.buffer.write(chunk)
|
| 407 |
-
sys.stdout.flush()
|
| 408 |
-
except Exception as e:
|
| 409 |
-
print(f"Logs error: {e}", file=sys.stderr)
|
| 410 |
-
sys.exit(1)
|
| 411 |
-
|
| 412 |
-
|
| 413 |
-
def fetch_log_tail(space_id: str, log_type: str, read_timeout=60, keep_tail_chars=25000):
|
| 414 |
-
"""Fetch log via curl (user's command), return last keep_tail_chars. Used when build/run fails."""
|
| 415 |
-
import subprocess
|
| 416 |
-
token = os.environ.get("HF_TOKEN")
|
| 417 |
-
if not token:
|
| 418 |
-
return "(HF_TOKEN not set — set it and run again to see logs)"
|
| 419 |
-
url = _curl_logs_url(space_id, log_type)
|
| 420 |
-
try:
|
| 421 |
-
proc = subprocess.run(
|
| 422 |
-
["curl", "-N", "-sS", "-H", f"Authorization: Bearer {token}", "--max-time", str(read_timeout), url],
|
| 423 |
-
capture_output=True,
|
| 424 |
-
text=True,
|
| 425 |
-
timeout=read_timeout + 10,
|
| 426 |
-
)
|
| 427 |
-
out = (proc.stdout or "") + (proc.stderr or "")
|
| 428 |
-
return out[-keep_tail_chars:] if len(out) > keep_tail_chars else out
|
| 429 |
-
except FileNotFoundError:
|
| 430 |
-
# fallback to urllib
|
| 431 |
-
req = urllib.request.Request(url, method="GET")
|
| 432 |
-
req.add_header("Authorization", f"Bearer {token}")
|
| 433 |
-
with urllib.request.urlopen(req, timeout=read_timeout) as resp:
|
| 434 |
-
out = resp.read().decode("utf-8", errors="replace")
|
| 435 |
-
return out[-keep_tail_chars:] if len(out) > keep_tail_chars else out
|
| 436 |
-
except Exception as e:
|
| 437 |
-
return f"(failed to fetch log: {e})"
|
| 438 |
-
|
| 439 |
-
|
| 440 |
-
def main():
|
| 441 |
-
global SPACE_ID, APP_URL
|
| 442 |
-
p = argparse.ArgumentParser()
|
| 443 |
-
p.add_argument("--space-id", default=SPACE_ID)
|
| 444 |
-
p.add_argument("--url", default=APP_URL)
|
| 445 |
-
p.add_argument("--wait-running", action="store_true", help="Poll until Space is RUNNING")
|
| 446 |
-
p.add_argument("--test", action="store_true", help="Run basic + stress + persistence tests")
|
| 447 |
-
p.add_argument("--logs", choices=("build", "run"), help="Stream logs: build or run (SSE)")
|
| 448 |
-
p.add_argument("--stress-n", type=int, default=50)
|
| 449 |
-
p.add_argument("--max-wait", type=int, default=600)
|
| 450 |
-
p.add_argument("--expect", action="append", dest="expect_substrings",
|
| 451 |
-
help="Expected substring(s) in response body (basic test). Can repeat. Default: HuggingRun, Run anything")
|
| 452 |
-
p.add_argument("--wait-url", action="store_true",
|
| 453 |
-
help="Poll URL until 200 and body contains one of --expect (no HF_TOKEN needed)")
|
| 454 |
-
p.add_argument("--until-ok", action="store_true",
|
| 455 |
-
help="Poll API until RUNNING, then test; on any fail print log tail and exit 1. Loop until this exits 0.")
|
| 456 |
-
p.add_argument("--watch", action="store_true",
|
| 457 |
-
help="Use curl to poll run (and optional build) logs + app URL every N sec; don't stop (Ctrl+C to exit)")
|
| 458 |
-
p.add_argument("--watch-interval", type=int, default=20, help="Seconds between --watch polls (default 20)")
|
| 459 |
-
# SSH test options
|
| 460 |
-
p.add_argument("--ssh-test", action="store_true",
|
| 461 |
-
help="Run SSH tests: connect + command + stress + bruteforce")
|
| 462 |
-
p.add_argument("--ssh-host", default="localhost", help="SSH host (default: localhost)")
|
| 463 |
-
p.add_argument("--ssh-port", type=int, default=2222, help="SSH port (default: 2222)")
|
| 464 |
-
p.add_argument("--ssh-user", default="user", help="SSH user (default: user)")
|
| 465 |
-
p.add_argument("--ssh-key", default=None, help="Path to SSH private key (optional)")
|
| 466 |
-
p.add_argument("--ssh-stress-n", type=int, default=30, help="SSH stress: total sessions (default: 30)")
|
| 467 |
-
p.add_argument("--ssh-concurrency", type=int, default=10, help="SSH stress: concurrent sessions (default: 10)")
|
| 468 |
-
args = p.parse_args()
|
| 469 |
-
SPACE_ID = args.space_id
|
| 470 |
-
APP_URL = args.url.rstrip("/")
|
| 471 |
-
expect_substrings = tuple(args.expect_substrings) if args.expect_substrings else None
|
| 472 |
-
|
| 473 |
-
if args.logs:
|
| 474 |
-
stream_logs(SPACE_ID, args.logs)
|
| 475 |
-
return
|
| 476 |
-
|
| 477 |
-
if args.watch:
|
| 478 |
-
# Use curl + Bearer token to keep polling the remote state; never exit
|
| 479 |
-
if not os.environ.get("HF_TOKEN"):
|
| 480 |
-
print("HF_TOKEN required for --watch (use .env or export)", file=sys.stderr)
|
| 481 |
-
sys.exit(1)
|
| 482 |
-
import subprocess
|
| 483 |
-
interval = max(10, args.watch_interval)
|
| 484 |
-
run_url = _curl_logs_url(SPACE_ID, "run")
|
| 485 |
-
build_url = _curl_logs_url(SPACE_ID, "build")
|
| 486 |
-
token = os.environ.get("HF_TOKEN")
|
| 487 |
-
curl_h = ["-H", f"Authorization: Bearer {token}", "-N", "-sS", "--max-time", str(interval + 5)]
|
| 488 |
-
n = 0
|
| 489 |
-
while True:
|
| 490 |
-
n += 1
|
| 491 |
-
ts = time.strftime("%H:%M:%S", time.gmtime())
|
| 492 |
-
print(f"\n[watch #{n} {ts}] === runtime stage ===")
|
| 493 |
-
stage, _ = get_stage()
|
| 494 |
-
print(f"[watch] stage={stage}")
|
| 495 |
-
print(f"[watch] === GET {APP_URL} ===")
|
| 496 |
-
status, body = http_get(APP_URL, timeout=15)
|
| 497 |
-
print(f"[watch] HTTP {status}, body len={len(body)}, has noVNC={('noVNC' in body)}")
|
| 498 |
-
print(f"[watch] === run log (tail, curl --max-time {interval}) ===")
|
| 499 |
-
proc = subprocess.run(
|
| 500 |
-
["curl"] + curl_h + ["--max-time", str(interval), run_url],
|
| 501 |
-
capture_output=True, text=True, timeout=interval + 10,
|
| 502 |
-
)
|
| 503 |
-
out = (proc.stdout or "") + (proc.stderr or "")
|
| 504 |
-
tail = out[-4000:] if len(out) > 4000 else out
|
| 505 |
-
for line in tail.strip().split("\n")[-25:]:
|
| 506 |
-
print(line)
|
| 507 |
-
print(f"[watch] next in {interval}s (Ctrl+C to stop)...")
|
| 508 |
-
time.sleep(interval)
|
| 509 |
-
return
|
| 510 |
-
|
| 511 |
-
if args.until_ok:
|
| 512 |
-
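# Check the current state once right away; if it has already errored, fetch the logs with curl immediately and exit instead of waiting idly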
|
| 513 |
-
if not os.environ.get("HF_TOKEN"):
|
| 514 |
-
print("HF_TOKEN required for --until-ok (poll runtime + fetch logs)", file=sys.stderr)
|
| 515 |
-
sys.exit(1)
|
| 516 |
-
stage, err = get_stage()
|
| 517 |
-
if err:
|
| 518 |
-
print(f"[monitor] {err}")
|
| 519 |
-
sys.exit(1)
|
| 520 |
-
print(f"[monitor] Space {SPACE_ID} stage={stage}")
|
| 521 |
-
if stage == "ERROR" or stage == "BUILD_ERROR":
|
| 522 |
-
print(f"[monitor] Remote is already in an error state; fetching logs now (curl)")
|
| 523 |
-
print("\n[monitor] === Build log (tail) ===")
|
| 524 |
-
print(fetch_log_tail(SPACE_ID, "build", read_timeout=15))
|
| 525 |
-
print("\n[monitor] === Run log (tail) ===")
|
| 526 |
-
print(fetch_log_tail(SPACE_ID, "run", read_timeout=15))
|
| 527 |
-
sys.exit(1)
|
| 528 |
-
if stage != "RUNNING":
|
| 529 |
-
ok = wait_running(
|
| 530 |
-
max_wait_sec=args.max_wait,
|
| 531 |
-
poll_interval=5,
|
| 532 |
-
app_url=APP_URL,
|
| 533 |
-
expect_substrings=expect_substrings or ("HuggingRun", "Run anything", "noVNC"),
|
| 534 |
-
)
|
| 535 |
-
if not ok:
|
| 536 |
-
print("\n[monitor] === Build log (tail) ===")
|
| 537 |
-
print(fetch_log_tail(SPACE_ID, "build", read_timeout=15))
|
| 538 |
-
print("\n[monitor] === Run log (tail) ===")
|
| 539 |
-
print(fetch_log_tail(SPACE_ID, "run", read_timeout=15))
|
| 540 |
-
sys.exit(1)
|
| 541 |
-
print(f"[test] Target: {APP_URL}")
|
| 542 |
-
if not test_basic(APP_URL, expect_substrings=expect_substrings):
|
| 543 |
-
print("[test] BASIC FAILED")
|
| 544 |
-
print("\n[monitor] === Run log (tail) ===")
|
| 545 |
-
print(fetch_log_tail(SPACE_ID, "run"))
|
| 546 |
-
sys.exit(1)
|
| 547 |
-
if not test_stress(APP_URL, n=args.stress_n):
|
| 548 |
-
print("[test] STRESS FAILED")
|
| 549 |
-
print("\n[monitor] === Run log (tail) ===")
|
| 550 |
-
print(fetch_log_tail(SPACE_ID, "run"))
|
| 551 |
-
sys.exit(1)
|
| 552 |
-
if not test_persistence(APP_URL):
|
| 553 |
-
print("[test] PERSISTENCE FAILED")
|
| 554 |
-
print("\n[monitor] === Run log (tail) ===")
|
| 555 |
-
print(fetch_log_tail(SPACE_ID, "run"))
|
| 556 |
-
sys.exit(1)
|
| 557 |
-
print("[test] ALL PASSED")
|
| 558 |
-
return
|
| 559 |
-
|
| 560 |
-
if args.wait_running:
|
| 561 |
-
ok = wait_running(max_wait_sec=args.max_wait)
|
| 562 |
-
if not ok:
|
| 563 |
-
print("\n[monitor] === Build log (tail) ===")
|
| 564 |
-
print(fetch_log_tail(SPACE_ID, "build"))
|
| 565 |
-
print("\n[monitor] === Run log (tail) ===")
|
| 566 |
-
print(fetch_log_tail(SPACE_ID, "run"))
|
| 567 |
-
sys.exit(1)
|
| 568 |
-
|
| 569 |
-
if args.wait_url:
|
| 570 |
-
ok = wait_url(APP_URL, expect_substrings=expect_substrings or ("HuggingRun", "Run anything", "noVNC"),
|
| 571 |
-
max_wait_sec=args.max_wait, poll_interval=20)
|
| 572 |
-
if not ok:
|
| 573 |
-
sys.exit(1)
|
| 574 |
-
|
| 575 |
-
if args.ssh_test:
|
| 576 |
-
print(f"[ssh-test] Target: {args.ssh_user}@{args.ssh_host}:{args.ssh_port}")
|
| 577 |
-
print("=" * 60)
|
| 578 |
-
print("[Phase 1] SSH Connect")
|
| 579 |
-
if not test_ssh_connect(args.ssh_host, args.ssh_port, args.ssh_user, identity_file=args.ssh_key):
|
| 580 |
-
print("[ssh-test] CONNECT FAILED")
|
| 581 |
-
sys.exit(1)
|
| 582 |
-
print()
|
| 583 |
-
print("[Phase 2] SSH Command Execution")
|
| 584 |
-
if not test_ssh_command(args.ssh_host, args.ssh_port, args.ssh_user, identity_file=args.ssh_key):
|
| 585 |
-
print("[ssh-test] COMMAND EXEC FAILED")
|
| 586 |
-
sys.exit(1)
|
| 587 |
-
print()
|
| 588 |
-
print("[Phase 3] SSH Stress Test")
|
| 589 |
-
if not test_ssh_stress(args.ssh_host, args.ssh_port, args.ssh_user,
|
| 590 |
-
n=args.ssh_stress_n, concurrency=args.ssh_concurrency,
|
| 591 |
-
identity_file=args.ssh_key):
|
| 592 |
-
print("[ssh-test] STRESS FAILED")
|
| 593 |
-
sys.exit(1)
|
| 594 |
-
print()
|
| 595 |
-
print("[Phase 4] SSH Brute-force Ramp-up")
|
| 596 |
-
if not test_ssh_bruteforce(args.ssh_host, args.ssh_port, args.ssh_user,
|
| 597 |
-
identity_file=args.ssh_key):
|
| 598 |
-
print("[ssh-test] BRUTEFORCE FAILED")
|
| 599 |
-
sys.exit(1)
|
| 600 |
-
print()
|
| 601 |
-
print("[Phase 5] Persistence Stress Test")
|
| 602 |
-
if not test_ssh_persistence_stress(args.ssh_host, args.ssh_port, args.ssh_user,
|
| 603 |
-
n_files=args.ssh_stress_n,
|
| 604 |
-
concurrency=args.ssh_concurrency,
|
| 605 |
-
identity_file=args.ssh_key):
|
| 606 |
-
print("[ssh-test] PERSISTENCE STRESS FAILED")
|
| 607 |
-
sys.exit(1)
|
| 608 |
-
print("=" * 60)
|
| 609 |
-
print("[ssh-test] ALL SSH TESTS PASSED")
|
| 610 |
-
return
|
| 611 |
-
|
| 612 |
-
if args.test:
|
| 613 |
-
print(f"[test] Target: {APP_URL}")
|
| 614 |
-
if not test_basic(APP_URL, expect_substrings=expect_substrings):
|
| 615 |
-
print("[test] BASIC FAILED")
|
| 616 |
-
sys.exit(1)
|
| 617 |
-
if not test_stress(APP_URL, n=args.stress_n):
|
| 618 |
-
print("[test] STRESS FAILED")
|
| 619 |
-
sys.exit(1)
|
| 620 |
-
if not test_persistence(APP_URL):
|
| 621 |
-
print("[test] PERSISTENCE CHECK (keyword) FAILED")
|
| 622 |
-
sys.exit(1)
|
| 623 |
-
print("[test] ALL PASSED")
|
| 624 |
-
else:
|
| 625 |
-
rt, err = get_runtime()
|
| 626 |
-
if err:
|
| 627 |
-
print("Runtime:", err)
|
| 628 |
-
else:
|
| 629 |
-
print("Runtime:", getattr(rt, "stage", rt.raw))
|
| 630 |
-
|
| 631 |
-
|
| 632 |
-
if __name__ == "__main__":
|
| 633 |
-
main()
|
scripts/verify_overnight.sh
DELETED
|
@@ -1,38 +0,0 @@
|
|
| 1 |
-
#!/usr/bin/env bash
|
| 2 |
-
# Overnight verification: 3 full --until-ok runs. Exit 0 only if all pass.
|
| 3 |
-
# Usage: from repo root, with .env containing HF_TOKEN:
|
| 4 |
-
# bash scripts/verify_overnight.sh
|
| 5 |
-
set -e
|
| 6 |
-
REPO_ROOT="$(cd "$(dirname "$0")/.." && pwd)"
|
| 7 |
-
cd "$REPO_ROOT"
|
| 8 |
-
LOG="$REPO_ROOT/docs/verification_run.log"
|
| 9 |
-
APP_URL="${APP_URL:-https://tao-shen-huggingrun.hf.space}"
|
| 10 |
-
EXPECT="${EXPECT:-Directory listing}"
|
| 11 |
-
ROUNDS="${ROUNDS:-3}"
|
| 12 |
-
|
| 13 |
-
if [ ! -f .env ]; then
|
| 14 |
-
echo "Missing .env (HF_TOKEN required)" >&2
|
| 15 |
-
exit 1
|
| 16 |
-
fi
|
| 17 |
-
export $(grep -v '^#' .env | xargs)
|
| 18 |
-
|
| 19 |
-
echo "=== Overnight verification started $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" | tee -a "$LOG"
|
| 20 |
-
echo "APP_URL=$APP_URL EXPECT=$EXPECT ROUNDS=$ROUNDS" | tee -a "$LOG"
|
| 21 |
-
|
| 22 |
-
PASSED=0
|
| 23 |
-
for r in $(seq 1 "$ROUNDS"); do
|
| 24 |
-
echo "" | tee -a "$LOG"
|
| 25 |
-
echo "--- Round $r/$ROUNDS at $(date -u +%H:%M:%SZ) ---" | tee -a "$LOG"
|
| 26 |
-
if python3 scripts/monitor_and_test.py --until-ok --url "$APP_URL" --expect "$EXPECT" --stress-n 50 >> "$LOG" 2>&1; then
|
| 27 |
-
PASSED=$((PASSED+1))
|
| 28 |
-
echo "Round $r PASSED" | tee -a "$LOG"
|
| 29 |
-
else
|
| 30 |
-
echo "Round $r FAILED" | tee -a "$LOG"
|
| 31 |
-
exit 1
|
| 32 |
-
fi
|
| 33 |
-
[ "$r" -lt "$ROUNDS" ] && sleep 30
|
| 34 |
-
done
|
| 35 |
-
|
| 36 |
-
echo "" | tee -a "$LOG"
|
| 37 |
-
echo "=== ALL $ROUNDS ROUNDS PASSED at $(date -u +%Y-%m-%dT%H:%M:%SZ) ===" | tee -a "$LOG"
|
| 38 |
-
exit 0
|
ubuntu-desktop/Dockerfile
DELETED
|
@@ -1,51 +0,0 @@
|
|
| 1 |
-
# Ubuntu 24.04 Desktop on HuggingRun — noVNC on 7860, persistence via /data
|
| 2 |
-
FROM ubuntu:24.04
|
| 3 |
-
|
| 4 |
-
ENV DEBIAN_FRONTEND=noninteractive
|
| 5 |
-
|
| 6 |
-
# System + Python (for sync)
|
| 7 |
-
RUN apt-get update && apt-get install -y --no-install-recommends \
|
| 8 |
-
ca-certificates curl python3 python3-pip python3-venv \
|
| 9 |
-
&& pip3 install --no-cache-dir --break-system-packages huggingface_hub \
|
| 10 |
-
&& rm -rf /var/lib/apt/lists/*
|
| 11 |
-
|
| 12 |
-
# Desktop stack: Xvfb, XFCE, dbus, x11vnc, Firefox; OpenSSH for reverse SSH (SSH from the local machine into the container)
|
| 13 |
-
RUN apt-get update && apt-get install -y --no-install-recommends \
|
| 14 |
-
xvfb \
|
| 15 |
-
xfce4 xfce4-goodies \
|
| 16 |
-
dbus-x11 \
|
| 17 |
-
x11vnc \
|
| 18 |
-
firefox \
|
| 19 |
-
procps \
|
| 20 |
-
openssh-server openssh-client \
|
| 21 |
-
&& rm -rf /var/lib/apt/lists/*
|
| 22 |
-
|
| 23 |
-
# noVNC (web client on 7860)
|
| 24 |
-
RUN apt-get update && apt-get install -y --no-install-recommends git \
|
| 25 |
-
&& git clone --depth 1 https://github.com/novnc/noVNC.git /opt/noVNC \
|
| 26 |
-
&& git clone --depth 1 https://github.com/novnc/websockify /opt/noVNC/utils/websockify \
|
| 27 |
-
&& rm -rf /var/lib/apt/lists/* /opt/noVNC/.git
|
| 28 |
-
|
| 29 |
-
# HF Spaces run as user 1000; UID 1000 may exist (e.g. ubuntu)
|
| 30 |
-
RUN (useradd -m -u 1000 user 2>/dev/null) || \
|
| 31 |
-
(EXISTING=$$(getent passwd 1000 | cut -d: -f1); \
|
| 32 |
-
usermod -l user $$EXISTING; usermod -d /home/user user; \
|
| 33 |
-
mkdir -p /home/user && chown 1000:1000 /home/user)
|
| 34 |
-
ENV HOME=/home/user
|
| 35 |
-
RUN mkdir -p /data && chown user:user /data
|
| 36 |
-
|
| 37 |
-
# HuggingRun scripts (build context = repo root)
|
| 38 |
-
COPY scripts /scripts
|
| 39 |
-
COPY ubuntu-desktop/start-desktop.sh /opt/start-desktop.sh
|
| 40 |
-
RUN chmod +x /scripts/entrypoint.sh /opt/start-desktop.sh
|
| 41 |
-
|
| 42 |
-
ENV PERSIST_PATH=/data
|
| 43 |
-
ENV RUN_CMD="/opt/start-desktop.sh"
|
| 44 |
-
ENV DESKTOP_HOME=/data/desktop-home
|
| 45 |
-
ENV DISPLAY=:99
|
| 46 |
-
ENV VNC_PORT=5901
|
| 47 |
-
ENV NOVNC_PORT=7860
|
| 48 |
-
|
| 49 |
-
USER user
|
| 50 |
-
EXPOSE 7860
|
| 51 |
-
ENTRYPOINT ["/scripts/entrypoint.sh"]
|
ubuntu-desktop/README.md
DELETED
|
@@ -1,20 +0,0 @@
|
|
| 1 |
-
# Ubuntu Desktop Example
|
| 2 |
-
|
| 3 |
-
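This directory is an example of the **HuggingRun general-purpose tool**: it runs Ubuntu + XFCE + noVNC on HF using **exactly the same** `scripts/` (entrypoint + sync) as the main repository, **without modifying any general logic**; it only sets `RUN_CMD=/opt/start-desktop.sh` through this directory's Dockerfile.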
|
| 4 |
-
|
| 5 |
-
- **General usage**: see [docs/GENERAL_USAGE.md](docs/GENERAL_USAGE.md).
|
| 6 |
-
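- **This example**: the `Dockerfile` lives in this directory; at build time it COPYs `scripts/` from the repository root and sets `RUN_CMD=/opt/start-desktop.sh`. `start-desktop.sh` starts Xvfb + XFCE + x11vnc + noVNC (listening on 7860); the desktop HOME is placed at `PERSIST_PATH/desktop-home` and is persisted by the general sync script.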
|
| 7 |
-
|
| 8 |
-
## Minimal usage (the user only needs to do two things)
|
| 9 |
-
|
| 10 |
-
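1. After **duplicating the HuggingRun Space**, **replace** the `Dockerfile` in the repository root with the **contents of this directory's Dockerfile** (do not add or remove any general scripts).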
|
| 11 |
-
2. In Settings → Secrets, set `HF_TOKEN`, and optionally `AUTO_CREATE_DATASET=true`.
|
| 12 |
-
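3. After pushing, wait for the build; opening the Space in a browser then shows the noVNC desktop. State is preserved across restarts by the general persistence layer.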
|
| 13 |
-
|
| 14 |
-
Build from the repository root (for example, locally): `docker build -f ubuntu-desktop/Dockerfile .`
|
| 15 |
-
|
| 16 |
-
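**Post-deployment monitoring and stress testing** (same tooling as the general tool): once deployment finishes, use the general script to poll and stress-test. For example: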
|
| 17 |
-
`python3 scripts/monitor_and_test.py --url "https://<your-username>-<your-space-name>.hf.space" --test --stress-n 50`
|
| 18 |
-
See [docs/REMOTE_LOGS.md](docs/REMOTE_LOGS.md) for pulling build/run logs to support local debugging.
|
| 19 |
-
|
| 20 |
-
Maintenance effort is focused on the general layer; this example is only a minimal wrapper and adds no case-specific logic to core.
|
ubuntu-desktop/start-desktop.sh
DELETED
|
@@ -1,82 +0,0 @@
|
|
| 1 |
-
#!/bin/bash
|
| 2 |
-
# Start Ubuntu desktop: Xvfb + XFCE + x11vnc + noVNC on 7860
|
| 3 |
-
# HOME is set to persistent dir by caller (sync/entrypoint). Here we ensure and use it.
|
| 4 |
-
echo "[start-desktop] Starting ..." >&2
|
| 5 |
-
set -e
|
| 6 |
-
|
| 7 |
-
export PERSIST_PATH="${PERSIST_PATH:-/data}"
|
| 8 |
-
export DESKTOP_HOME="${DESKTOP_HOME:-$PERSIST_PATH/desktop-home}"
|
| 9 |
-
export DISPLAY="${DISPLAY:-:99}"
|
| 10 |
-
export VNC_PORT="${VNC_PORT:-5901}"
|
| 11 |
-
export NOVNC_PORT="${NOVNC_PORT:-7860}"
|
| 12 |
-
|
| 13 |
-
mkdir -p "$DESKTOP_HOME"
|
| 14 |
-
export HOME="$DESKTOP_HOME"
|
| 15 |
-
|
| 16 |
-
# Ensure minimal XFCE dirs
|
| 17 |
-
mkdir -p "$HOME/.config" "$HOME/.local/share" "$HOME/Desktop"
|
| 18 |
-
|
| 19 |
-
# Start Xvfb
|
| 20 |
-
Xvfb "$DISPLAY" -screen 0 1280x720x24 -ac +extension GLX +render -noreset &
|
| 21 |
-
XVFB_PID=$!
|
| 22 |
-
sleep 2
|
| 23 |
-
echo "[start-desktop] After Xvfb sleep 2" >&2
|
| 24 |
-
|
| 25 |
-
# Start dbus for session (optional; run in subshell so failure never triggers set -e)
|
| 26 |
-
( dbus-daemon --session 2>/dev/null ) || true
|
| 27 |
-
echo "[start-desktop] Before XFCE background" >&2
|
| 28 |
-
|
| 29 |
-
# Start XFCE (lightweight); use full path in case PATH is minimal
|
| 30 |
-
(sleep 1; /usr/bin/startxfce4) &
|
| 31 |
-
DESKTOP_PID=$!
|
| 32 |
-
echo "[start-desktop] After XFCE & before sleep 3" >&2
|
| 33 |
-
sleep 3
|
| 34 |
-
echo "[start-desktop] XFCE started, starting x11vnc ..." >&2
|
| 35 |
-
|
| 36 |
-
# x11vnc: share display :99 on port 5901 (do not exit on failure so noVNC can still start)
|
| 37 |
-
x11vnc -display "$DISPLAY" -rfbport "$VNC_PORT" -forever -shared -noxdamage -nopw -bg || true
|
| 38 |
-
|
| 39 |
-
# SSH: always start sshd; do not let failures here stop noVNC
|
| 40 |
-
set +e
|
| 41 |
-
SSHD_PORT="${SSH_PORT:-2222}"
|
| 42 |
-
SSHD_LISTEN="${SSH_LISTEN:-0.0.0.0}"
|
| 43 |
-
mkdir -p "$HOME/.ssh"
|
| 44 |
-
|
| 45 |
-
# If SSH_AUTHORIZED_KEYS is set, use key-based auth only; otherwise allow password auth for local testing
|
| 46 |
-
[ -n "${SSH_AUTHORIZED_KEYS-}" ] && echo "$SSH_AUTHORIZED_KEYS" > "$HOME/.ssh/authorized_keys" && chmod 600 "$HOME/.ssh/authorized_keys"
|
| 47 |
-
|
| 48 |
-
# Use pre-generated host key from Docker build, or generate at runtime
|
| 49 |
-
HOST_KEY="$HOME/.ssh/ssh_host_ed25519_key"
|
| 50 |
-
[ ! -f "$HOST_KEY" ] && cp /home/user/.ssh/ssh_host_ed25519_key "$HOST_KEY" 2>/dev/null
|
| 51 |
-
[ ! -f "$HOST_KEY" ] && ssh-keygen -t ed25519 -f "$HOST_KEY" -N "" -C "" 2>/dev/null
|
| 52 |
-
|
| 53 |
-
if [ -f "$HOST_KEY" ]; then
|
| 54 |
-
if [ -f "$HOME/.ssh/authorized_keys" ]; then
|
| 55 |
-
# Key-based auth only (production / HF Spaces)
|
| 56 |
-
echo "[start-desktop] Starting sshd (key auth) on $SSHD_LISTEN:$SSHD_PORT ..." >&2
|
| 57 |
-
/usr/sbin/sshd -o "Port=$SSHD_PORT" -o "HostKey=$HOST_KEY" \
|
| 58 |
-
-o "AuthorizedKeysFile=$HOME/.ssh/authorized_keys" \
|
| 59 |
-
-o "PermitEmptyPasswords=no" -o "PasswordAuthentication=no" \
|
| 60 |
-
-o "ListenAddress=$SSHD_LISTEN" -o "PidFile=$HOME/.ssh/sshd.pid" \
|
| 61 |
-
-o "UsePAM=no" -o "PermitUserEnvironment=yes" -D -e &
|
| 62 |
-
else
|
| 63 |
-
# No keys configured: allow password-less login for local Docker testing
|
| 64 |
-
echo "[start-desktop] Starting sshd (no-password, local test) on $SSHD_LISTEN:$SSHD_PORT ..." >&2
|
| 65 |
-
/usr/sbin/sshd -o "Port=$SSHD_PORT" -o "HostKey=$HOST_KEY" \
|
| 66 |
-
-o "PermitEmptyPasswords=yes" -o "PasswordAuthentication=yes" \
|
| 67 |
-
-o "ListenAddress=$SSHD_LISTEN" -o "PidFile=$HOME/.ssh/sshd.pid" \
|
| 68 |
-
-o "UsePAM=no" -o "PermitRootLogin=no" -D -e &
|
| 69 |
-
fi
|
| 70 |
-
SSHD_PID=$!
|
| 71 |
-
sleep 1
|
| 72 |
-
echo "[start-desktop] sshd PID=$SSHD_PID" >&2
|
| 73 |
-
|
| 74 |
-
# Reverse SSH tunnel (HF Spaces: outbound only on 80/443/8080)
|
| 75 |
-
[ -n "${SSH_REVERSE_TARGET-}" ] && ssh -o StrictHostKeyChecking=no -o ServerAliveInterval=60 -R "0.0.0.0:${SSHD_PORT}:127.0.0.1:${SSHD_PORT}" $SSH_REVERSE_TARGET -N &
|
| 76 |
-
fi
|
| 77 |
-
set -e
|
| 78 |
-
|
| 79 |
-
# noVNC: must run in foreground; listen on 0.0.0.0 so HF proxy can reach it
|
| 80 |
-
echo "[start-desktop] Starting noVNC on 0.0.0.0:$NOVNC_PORT ..." >&2
|
| 81 |
-
# Use bash -c so novnc_proxy runs as main process; if it exits, keep container alive with sleep
|
| 82 |
-
exec /bin/bash -c "cd /opt/noVNC && ./utils/novnc_proxy --listen 0.0.0.0:$NOVNC_PORT --vnc localhost:$VNC_PORT --web /opt/noVNC" || exec sleep infinity
|