deploy: Ubuntu desktop as root Dockerfile; add push-debug workflow doc; generic monitor retries and sync retry
- Dockerfile +32 -20
- README.md +2 -2
- docs/GENERAL_USAGE.md +2 -0
- docs/PUSH_DEBUG.md +100 -0
- scripts/monitor_and_test.py +52 -29
- scripts/sync_hf.py +16 -7
Dockerfile
CHANGED
|
@@ -1,34 +1,46 @@
|
|
| 1 |
-
# HuggingRun —
|
| 2 |
-
|
| 3 |
-
ARG BASE_IMAGE=python:3.11-slim
|
| 4 |
-
FROM ${BASE_IMAGE}
|
| 5 |
|
| 6 |
RUN apt-get update && apt-get install -y --no-install-recommends \
|
| 7 |
-
|
| 8 |
&& rm -rf /var/lib/apt/lists/*
|
| 9 |
|
| 10 |
-
#
|
| 11 |
-
RUN
|
| 12 |
-
&&
|
| 13 |
-
&&
|
|
|
|
| 14 |
|
| 15 |
# HF Spaces run as user 1000
|
| 16 |
RUN useradd -m -u 1000 user
|
| 17 |
ENV HOME=/home/user
|
| 18 |
-
|
| 19 |
|
| 20 |
-
|
| 21 |
-
COPY
|
| 22 |
-
|
|
|
|
| 23 |
|
| 24 |
-
# Default: run demo app. Override with RUN_CMD in Space secrets.
|
| 25 |
ENV PERSIST_PATH=/data
|
| 26 |
-
ENV RUN_CMD=""
|
| 27 |
-
ENV
|
| 28 |
-
ENV
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
RUN mkdir -p /data && chown user:user /data
|
| 32 |
|
| 33 |
USER user
|
| 34 |
EXPOSE 7860
|
|
|
|
| 1 |
+
# Ubuntu 24.04 Desktop on HuggingRun — noVNC on 7860, persistence via /data
|
| 2 |
+
FROM ubuntu:24.04
|
| 3 |
|
| 4 |
+
ENV DEBIAN_FRONTEND=noninteractive
|
| 5 |
+
|
| 6 |
+
# System + Python (for sync)
|
| 7 |
+
RUN apt-get update && apt-get install -y --no-install-recommends \
|
| 8 |
+
ca-certificates curl python3 python3-pip python3-venv \
|
| 9 |
+
&& pip3 install --no-cache-dir --break-system-packages huggingface_hub \
|
| 10 |
+
&& rm -rf /var/lib/apt/lists/*
|
| 11 |
+
|
| 12 |
+
# Desktop stack: Xvfb, XFCE, dbus, x11vnc, Firefox
|
| 13 |
RUN apt-get update && apt-get install -y --no-install-recommends \
|
| 14 |
+
xvfb \
|
| 15 |
+
xfce4 xfce4-goodies \
|
| 16 |
+
dbus-x11 \
|
| 17 |
+
x11vnc \
|
| 18 |
+
firefox \
|
| 19 |
+
procps \
|
| 20 |
&& rm -rf /var/lib/apt/lists/*
|
| 21 |
|
| 22 |
+
# noVNC (web client on 7860)
|
| 23 |
+
RUN apt-get update && apt-get install -y --no-install-recommends git \
|
| 24 |
+
&& git clone --depth 1 https://github.com/novnc/noVNC.git /opt/noVNC \
|
| 25 |
+
&& git clone --depth 1 https://github.com/novnc/websockify /opt/noVNC/utils/websockify \
|
| 26 |
+
&& rm -rf /var/lib/apt/lists/* /opt/noVNC/.git
|
| 27 |
|
| 28 |
# HF Spaces run as user 1000
|
| 29 |
RUN useradd -m -u 1000 user
|
| 30 |
ENV HOME=/home/user
|
| 31 |
+
RUN mkdir -p /data && chown user:user /data
|
| 32 |
|
| 33 |
+
# HuggingRun scripts (build context = repo root)
|
| 34 |
+
COPY scripts /scripts
|
| 35 |
+
COPY ubuntu-desktop/start-desktop.sh /opt/start-desktop.sh
|
| 36 |
+
RUN chmod +x /scripts/entrypoint.sh /opt/start-desktop.sh
|
| 37 |
|
|
|
|
| 38 |
ENV PERSIST_PATH=/data
|
| 39 |
+
ENV RUN_CMD="/opt/start-desktop.sh"
|
| 40 |
+
ENV DESKTOP_HOME=/data/desktop-home
|
| 41 |
+
ENV DISPLAY=:99
|
| 42 |
+
ENV VNC_PORT=5901
|
| 43 |
+
ENV NOVNC_PORT=7860
|
|
|
|
| 44 |
|
| 45 |
USER user
|
| 46 |
EXPOSE 7860
|
README.md
CHANGED
|
@@ -19,7 +19,7 @@ tags:
|
|
| 19 |
|
| 20 |
**Run anything on Hugging Face.**
|
| 21 |
|
| 22 |
-
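HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: one toolset handles HF's limits on persistence, single-port exposure, and networking, so that any Docker application can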
|
| 23 |
|
| 24 |
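- **General usage (fewest user steps)**: [docs/GENERAL_USAGE.md](docs/GENERAL_USAGE.md). No billing or complex configuration as with other cloud container services; every capability is built around the general-purpose tools.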
|
| 25 |
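- **General tools first**: the main maintenance target is the generic layer (persistence sync, single entrypoint, configurable port). Examples only demonstrate "minimal configuration" usage; the core scripts hard-code no logic for any specific case.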
|
|
@@ -46,7 +46,7 @@ HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: one
|
|
| 46 |
- **Unified entrypoint**: the same entrypoint first restores and syncs state, then `exec`s your `RUN_CMD`, so any image can reuse it.
|
| 47 |
|
| 48 |
See [docs/HF_LIMITATIONS.md](docs/HF_LIMITATIONS.md) for details.
|
| 49 |
-
Remote build/run logs
|
| 50 |
|
| 51 |
## Examples (minimal usage)
|
| 52 |
|
|
|
|
| 19 |
|
| 20 |
**Run anything on Hugging Face.**
|
| 21 |
|
| 22 |
+
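HuggingRun is a **general-purpose deployment interface** for Hugging Face Spaces: one toolset handles HF's limits on persistence, single-port exposure, and networking, so that **any Docker application** can be deployed through the same workflow and keep its state across restarts. We validate it with a deliberately hard use case, deploying an entire operating system (such as an Ubuntu desktop): if that runs reliably, the general-purpose tools should be able to support deploying a wide range of complex applications.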
|
| 23 |
|
| 24 |
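- **General usage (fewest user steps)**: [docs/GENERAL_USAGE.md](docs/GENERAL_USAGE.md). No billing or complex configuration as with other cloud container services; every capability is built around the general-purpose tools.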
|
| 25 |
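- **General tools first**: the main maintenance target is the generic layer (persistence sync, single entrypoint, configurable port). Examples only demonstrate "minimal configuration" usage; the core scripts hard-code no logic for any specific case.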
|
|
|
|
| 46 |
- **Unified entrypoint**: the same entrypoint first restores and syncs state, then `exec`s your `RUN_CMD`, so any image can reuse it.
|
| 47 |
|
| 48 |
See [docs/HF_LIMITATIONS.md](docs/HF_LIMITATIONS.md) for details.
|
| 49 |
+
Remote build/run logs: [docs/REMOTE_LOGS.md](docs/REMOTE_LOGS.md). The **push → deploy → monitor → test** loop: [docs/PUSH_DEBUG.md](docs/PUSH_DEBUG.md).
|
| 50 |
|
| 51 |
## Examples (minimal usage)
|
| 52 |
|
docs/GENERAL_USAGE.md
CHANGED
|
@@ -2,6 +2,8 @@
|
|
| 2 |
|
| 3 |
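This document explains how to use the **general-purpose tools**. Every capability is built around this one toolset; the examples (including the Ubuntu desktop) are just "the same generic pipeline + a different RUN_CMD or a different Dockerfile", with no per-example customization.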
|
| 4 |
|
| 5 |
---
|
| 6 |
|
| 7 |
## Design principles
|
|
|
|
| 2 |
|
| 3 |
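This document explains how to use the **general-purpose tools**. Every capability is built around this one toolset; the examples (including the Ubuntu desktop) are just "the same generic pipeline + a different RUN_CMD or a different Dockerfile", with no per-example customization.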
|
| 4 |
|
| 5 |
+
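**Design goal**: anyone using this tool should be able to **deploy anything normally**. We treat deploying an entire operating system (such as an Ubuntu desktop + noVNC) as the hard use case: if that runs correctly, the generic layer is robust enough, and simpler applications should pose no problem.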
|
| 6 |
+
|
| 7 |
---
|
| 8 |
|
| 9 |
## Design principles
|
docs/PUSH_DEBUG.md
ADDED
|
@@ -0,0 +1,100 @@
|
|
| 1 |
+
# Push → remote deploy → monitor → test (loop until the remote succeeds)
|
| 2 |
+
|
| 3 |
+
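**Principle**: only a **push** triggers an HF deployment, and the work counts as done only when the **remote** build succeeds, the app is RUNNING, and every stress test passes. After local changes, keep pushing and debugging against remote logs and test results; stop only once the remote succeeds.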
|
| 4 |
+
|
| 5 |
+
---
|
| 6 |
+
|
| 7 |
+
## 1. The loop
|
| 8 |
+
|
| 9 |
+
```
|
| 10 |
+
fix code → git add / commit → git push
|
| 11 |
+
↓
|
| 12 |
+
HF starts building/deploying
|
| 13 |
+
↓
|
| 14 |
+
monitor the remote live: --logs build / --logs run, or --wait-running
|
| 15 |
+
↓
|
| 16 |
+
run the stress tests: --test (with --url and --expect if needed)
|
| 17 |
+
↓
|
| 18 |
+
all passed? → stop
|
| 19 |
+
any failure? → fix the code based on the logs/symptoms → go back to "fix code" and push again
|
| 20 |
+
```
|
| 21 |
+
|
| 22 |
+
---
|
| 23 |
+
|
| 24 |
+
## 2. Common commands (use in order)
|
| 25 |
+
|
| 26 |
+
### 2.1 Push (triggers deployment)
|
| 27 |
+
|
| 28 |
+
```bash
|
| 29 |
+
git add -A
|
| 30 |
+
git commit -m "fix: short description"
|
| 31 |
+
git push origin main
|
| 32 |
+
```
|
| 33 |
+
|
| 34 |
+
(If the Space tracks a different branch, replace `main` with that branch.)
|
| 35 |
+
|
| 36 |
+
### 2.2 Stream remote logs live (for debugging)
|
| 37 |
+
|
| 38 |
+
```bash
|
| 39 |
+
# Build logs (check right after pushing)
|
| 40 |
+
HF_TOKEN=your_token python3 scripts/monitor_and_test.py --space-id your_username/your_space_name --logs build
|
| 41 |
+
|
| 42 |
+
# Run logs (after the build finishes, check that the container is healthy)
|
| 43 |
+
HF_TOKEN=your_token python3 scripts/monitor_and_test.py --space-id your_username/your_space_name --logs run
|
| 44 |
+
```
|
| 45 |
+
|
| 46 |
+
`--space-id` defaults to `tao-shen/HuggingRun` and can be omitted.
|
| 47 |
+
|
| 48 |
+
### 2.3 Wait for RUNNING, then run the full test suite (one-shot "did the remote succeed?")
|
| 49 |
+
|
| 50 |
+
```bash
|
| 51 |
+
# Demo or default Space
|
| 52 |
+
HF_TOKEN=your_token python3 scripts/monitor_and_test.py --wait-running --test
|
| 53 |
+
|
| 54 |
+
# Ubuntu desktop and similar: specify the URL and the expected page keyword
|
| 55 |
+
HF_TOKEN=your_token python3 scripts/monitor_and_test.py --wait-running --test \
|
| 56 |
+
--url https://your_username-your_space_name.hf.space \
|
| 57 |
+
--expect noVNC
|
| 58 |
+
```
|
| 59 |
+
|
| 60 |
+
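The script first polls until the Space state is RUNNING, then runs a basic GET, stress requests, and a multi-round persistence check. **It exits 0 only if everything passes**; any failure exits 1, which makes it easy to script the rule that only remote success counts.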
|
| 61 |
+
|
| 62 |
+
### 2.4 Test the current page without waiting (when the Space is already RUNNING)
|
| 63 |
+
|
| 64 |
+
```bash
|
| 65 |
+
python3 scripts/monitor_and_test.py --test
|
| 66 |
+
# or
|
| 67 |
+
python3 scripts/monitor_and_test.py --url https://xxx.hf.space --test --expect noVNC
|
| 68 |
+
```
|
| 69 |
+
|
| 70 |
+
---
|
| 71 |
+
|
| 72 |
+
## 3. Suggested usage (copy-paste loop)
|
| 73 |
+
|
| 74 |
+
1. **After pushing**: open a terminal, stream the build logs, and confirm there are no errors.
|
| 75 |
+
```bash
|
| 76 |
+
HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs build
|
| 77 |
+
```
|
| 78 |
+
|
| 79 |
+
2. **After the build finishes**: in another terminal, wait for RUNNING and run the tests.
|
| 80 |
+
```bash
|
| 81 |
+
HF_TOKEN=xxx python3 scripts/monitor_and_test.py --wait-running --test --url https://tao-shen-huggingrun.hf.space --expect noVNC
|
| 82 |
+
```
|
| 83 |
+
|
| 84 |
+
3. If **the tests fail**: use `--logs run` to inspect errors inside the container, fix the code, then:
|
| 85 |
+
```bash
|
| 86 |
+
git add -A && git commit -m "fix: ..." && git push origin main
|
| 87 |
+
```
|
| 88 |
+
Then repeat steps 1–2 until all tests pass.
|
| 89 |
+
|
| 90 |
+
---
|
| 91 |
+
|
| 92 |
+
## 4. Environment variable quick reference
|
| 93 |
+
|
| 94 |
+
| Variable | Description |
|
| 95 |
+
|------|------|
|
| 96 |
+
| `HF_TOKEN` | Required for streaming logs, querying runtime state, and waiting for RUNNING |
|
| 97 |
+
| `SPACE_ID` | Defaults to `tao-shen/HuggingRun`; can also be set with `--space-id` |
|
| 98 |
+
| `APP_URL` | Defaults to `https://tao-shen-huggingrun.hf.space`; can also be set with `--url` |
|
| 99 |
+
|
| 100 |
+
"Success" is always judged on the **remote**: build succeeded + app RUNNING + all stress tests passed.
|
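The wait-then-test gate that docs/PUSH_DEBUG.md describes (poll until RUNNING, run the checks in order, exit 0 only if all pass) can be sketched roughly as below. This is a minimal illustration, not the code in `scripts/monitor_and_test.py`: the names `wait_until_running` and `run_checks` are hypothetical, and the stage lookup and individual checks are passed in as plain callables so the sketch makes no assumption about the Hugging Face API.

```python
import time
from typing import Callable, Iterable, Tuple


def wait_until_running(
    get_stage: Callable[[], str],
    max_wait_sec: float = 600,
    poll_interval: float = 15,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll get_stage() until it returns "RUNNING" or the time budget is spent.

    get_stage is a hypothetical callable; in practice it would query the
    Space runtime state using HF_TOKEN.
    """
    waited = 0.0
    while waited <= max_wait_sec:
        if get_stage() == "RUNNING":
            return True
        sleep(poll_interval)
        waited += poll_interval
    return False


def run_checks(checks: Iterable[Tuple[str, Callable[[], bool]]]) -> int:
    """Run named checks in order; return 0 if all pass, 1 at the first failure."""
    for name, check in checks:
        if not check():
            print(f"[test] {name} FAILED")
            return 1
    print("[test] all checks passed")
    return 0
```

A caller would typically pass the return value of `run_checks` to `sys.exit`, so a shell loop or CI job can treat a non-zero exit as "the remote is not yet successful".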
scripts/monitor_and_test.py
CHANGED
|
@@ -1,10 +1,12 @@
|
|
| 1 |
#!/usr/bin/env python3
|
| 2 |
"""
|
| 3 |
-
HuggingRun: monitor the remote Space state and run basic/stress/persistence checks.
|
| 4 |
Usage:
|
| 5 |
-
|
| 6 |
-
HF_TOKEN=xxx python3 scripts/monitor_and_test.py --
|
| 7 |
-
|
| 8 |
"""
|
| 9 |
import argparse
|
| 10 |
import os
|
|
@@ -52,21 +54,42 @@ def wait_running(max_wait_sec=600, poll_interval=15):
|
|
| 52 |
return False
|
| 53 |
|
| 54 |
|
| 55 |
-
def http_get(url, timeout=30):
|
| 56 |
-
|
| 57 |
-
|
| 58 |
-
|
| 59 |
-
|
| 60 |
-
|
| 61 |
-
|
| 62 |
-
|
| 63 |
-
|
| 64 |
-
|
| 65 |
-
|
| 66 |
-
|
| 67 |
status, body = http_get(url)
|
| 68 |
-
|
| 69 |
-
|
|
|
|
| 70 |
return ok
|
| 71 |
|
| 72 |
|
|
@@ -86,18 +109,15 @@ def test_stress(url, n=50, concurrency=10):
|
|
| 86 |
|
| 87 |
|
| 88 |
def test_persistence(url, rounds=3):
|
| 89 |
-
"""Multiple rounds of requests,
|
| 90 |
-
|
| 91 |
for _ in range(rounds):
|
| 92 |
-
status,
|
| 93 |
-
if status
|
| 94 |
-
|
| 95 |
-
# The demo page contains "Visit count (persisted): N"
|
| 96 |
-
if "Visit count" in body or "total_visits" in body or "persisted" in body:
|
| 97 |
-
counts.append(1)
|
| 98 |
time.sleep(1)
|
| 99 |
-
print(f"[persistence] {rounds} rounds
|
| 100 |
-
return
|
| 101 |
|
| 102 |
|
| 103 |
def stream_logs(space_id: str, log_type: str):
|
|
@@ -132,9 +152,12 @@ def main():
|
|
| 132 |
p.add_argument("--logs", choices=("build", "run"), help="Stream logs: build or run (SSE)")
|
| 133 |
p.add_argument("--stress-n", type=int, default=50)
|
| 134 |
p.add_argument("--max-wait", type=int, default=600)
|
| 135 |
args = p.parse_args()
|
| 136 |
SPACE_ID = args.space_id
|
| 137 |
APP_URL = args.url.rstrip("/")
|
|
|
|
| 138 |
|
| 139 |
if args.logs:
|
| 140 |
stream_logs(SPACE_ID, args.logs)
|
|
@@ -147,7 +170,7 @@ def main():
|
|
| 147 |
|
| 148 |
if args.test:
|
| 149 |
print(f"[test] Target: {APP_URL}")
|
| 150 |
-
if not test_basic(APP_URL):
|
| 151 |
print("[test] BASIC FAILED")
|
| 152 |
sys.exit(1)
|
| 153 |
if not test_stress(APP_URL, n=args.stress_n):
|
|
|
|
| 1 |
#!/usr/bin/env python3
|
| 2 |
"""
|
| 3 |
+
HuggingRun: monitor the remote Space state and run basic/stress/persistence checks (general-purpose tool; works with any Space).
|
| 4 |
Usage:
|
| 5 |
+
python3 scripts/monitor_and_test.py --test
|
| 6 |
+
HF_TOKEN=xxx python3 scripts/monitor_and_test.py --wait-running --test # wait for RUNNING, then test
|
| 7 |
+
python3 scripts/monitor_and_test.py --url https://xxx.hf.space --test --expect noVNC # desktop and other non-demo apps
|
| 8 |
+
HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs run
|
| 9 |
+
HF_TOKEN=xxx python3 scripts/monitor_and_test.py --logs build
|
| 10 |
"""
|
| 11 |
import argparse
|
| 12 |
import os
|
|
|
|
| 54 |
return False
|
| 55 |
|
| 56 |
|
| 57 |
+
def http_get(url, timeout=30, retries=3, retry_delay=2):
|
| 58 |
+
"""GET url; retry on 502/503/timeout/connection errors (generic HF robustness)."""
|
| 59 |
+
last_status, last_body, last_err = None, "", None
|
| 60 |
+
for attempt in range(max(1, retries)):
|
| 61 |
+
try:
|
| 62 |
+
req = urllib.request.Request(url, method="GET")
|
| 63 |
+
with urllib.request.urlopen(req, timeout=timeout) as resp:
|
| 64 |
+
body = resp.read().decode("utf-8", errors="replace")
|
| 65 |
+
return (resp.status, body)
|
| 66 |
+
except urllib.error.HTTPError as e:
|
| 67 |
+
last_status = e.code
|
| 68 |
+
last_body = e.read().decode("utf-8", errors="replace") if e.fp else ""
|
| 69 |
+
last_err = e
|
| 70 |
+
if e.code in (502, 503) and attempt < retries - 1:
|
| 71 |
+
time.sleep(retry_delay)
|
| 72 |
+
continue
|
| 73 |
+
return (e.code, last_body)
|
| 74 |
+
except (OSError, urllib.error.URLError) as e:
|
| 75 |
+
last_err = e
|
| 76 |
+
last_status = -1
|
| 77 |
+
last_body = str(e)
|
| 78 |
+
if attempt < retries - 1:
|
| 79 |
+
time.sleep(retry_delay)
|
| 80 |
+
continue
|
| 81 |
+
return (-1, last_body)
|
| 82 |
+
return (last_status or -1, last_body or str(last_err or ""))
|
| 83 |
+
|
| 84 |
+
|
| 85 |
+
def test_basic(url, expect_substrings=None):
|
| 86 |
+
"""GET url; pass if status 200 and body contains any of expect_substrings (default: HuggingRun / Run anything)."""
|
| 87 |
+
if expect_substrings is None:
|
| 88 |
+
expect_substrings = ("HuggingRun", "Run anything")
|
| 89 |
status, body = http_get(url)
|
| 90 |
+
found = any(s in body for s in expect_substrings)
|
| 91 |
+
ok = status == 200 and found
|
| 92 |
+
print(f"[test] GET {url} -> {status}, body contains expected: {found}")
|
| 93 |
return ok
|
| 94 |
|
| 95 |
|
|
|
|
| 109 |
|
| 110 |
|
| 111 |
def test_persistence(url, rounds=3):
|
| 112 |
+
"""Make multiple rounds of requests; every round must return 200 (generic: any app passes as long as it reliably returns 200)."""
|
| 113 |
+
ok_rounds = 0
|
| 114 |
for _ in range(rounds):
|
| 115 |
+
status, _ = http_get(url)
|
| 116 |
+
if status == 200:
|
| 117 |
+
ok_rounds += 1
|
| 118 |
time.sleep(1)
|
| 119 |
+
print(f"[persistence] {rounds} rounds: {ok_rounds} ok")
|
| 120 |
+
return ok_rounds == rounds
|
| 121 |
|
| 122 |
|
| 123 |
def stream_logs(space_id: str, log_type: str):
|
|
|
|
| 152 |
p.add_argument("--logs", choices=("build", "run"), help="Stream logs: build or run (SSE)")
|
| 153 |
p.add_argument("--stress-n", type=int, default=50)
|
| 154 |
p.add_argument("--max-wait", type=int, default=600)
|
| 155 |
+
p.add_argument("--expect", action="append", dest="expect_substrings",
|
| 156 |
+
help="Expected substring(s) in response body (basic test). Can repeat. Default: HuggingRun, Run anything")
|
| 157 |
args = p.parse_args()
|
| 158 |
SPACE_ID = args.space_id
|
| 159 |
APP_URL = args.url.rstrip("/")
|
| 160 |
+
expect_substrings = tuple(args.expect_substrings) if args.expect_substrings else None
|
| 161 |
|
| 162 |
if args.logs:
|
| 163 |
stream_logs(SPACE_ID, args.logs)
|
|
|
|
| 170 |
|
| 171 |
if args.test:
|
| 172 |
print(f"[test] Target: {APP_URL}")
|
| 173 |
+
if not test_basic(APP_URL, expect_substrings=expect_substrings):
|
| 174 |
print("[test] BASIC FAILED")
|
| 175 |
sys.exit(1)
|
| 176 |
if not test_stress(APP_URL, n=args.stress_n):
|
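The new `--expect` option relies on argparse's `action="append"`, which collects repeated flags into a list and leaves the value as `None` when the flag is absent. The sketch below isolates that behavior together with the "any substring matches" rule from `test_basic`. The flag definition and the default substrings are taken from the diff; the helper names `build_parser`, `resolve_expected`, and `body_matches` are hypothetical and exist only for this illustration.

```python
import argparse
from typing import Optional, Sequence, Tuple

# Defaults mirror the ones test_basic falls back to in the diff.
DEFAULT_EXPECTED: Tuple[str, ...] = ("HuggingRun", "Run anything")


def build_parser() -> argparse.ArgumentParser:
    """Parser with only the repeatable --expect option."""
    p = argparse.ArgumentParser()
    p.add_argument("--expect", action="append", dest="expect_substrings")
    return p


def resolve_expected(argv: Optional[Sequence[str]]) -> Tuple[str, ...]:
    """Return the substrings to look for, falling back to the defaults."""
    args = build_parser().parse_args(argv)
    if args.expect_substrings:
        return tuple(args.expect_substrings)
    return DEFAULT_EXPECTED


def body_matches(body: str, expected: Sequence[str]) -> bool:
    """True if the body contains at least one expected substring."""
    return any(s in body for s in expected)
```

Because the match is "any of", repeating `--expect` widens what counts as a pass; a stricter check that requires every substring would use `all` instead.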
scripts/sync_hf.py
CHANGED
|
@@ -110,13 +110,22 @@ class GenericSync:
|
|
| 110 |
print(f"[HuggingRun] Restoring {PERSIST_PATH} from {HF_REPO_ID} ...")
|
| 111 |
PERSIST_PATH.mkdir(parents=True, exist_ok=True)
|
| 112 |
with tempfile.TemporaryDirectory() as tmpdir:
|
| 113 |
-
|
| 114 |
-
|
| 115 |
-
|
| 116 |
-
|
| 117 |
-
|
| 118 |
-
|
| 119 |
-
|
| 120 |
src = Path(tmpdir) / DATASET_SUBFOLDER
|
| 121 |
if src.exists():
|
| 122 |
for item in src.rglob("*"):
|
|
|
|
| 110 |
print(f"[HuggingRun] Restoring {PERSIST_PATH} from {HF_REPO_ID} ...")
|
| 111 |
PERSIST_PATH.mkdir(parents=True, exist_ok=True)
|
| 112 |
with tempfile.TemporaryDirectory() as tmpdir:
|
| 113 |
+
for attempt in range(2):
|
| 114 |
+
try:
|
| 115 |
+
snapshot_download(
|
| 116 |
+
repo_id=HF_REPO_ID,
|
| 117 |
+
repo_type="dataset",
|
| 118 |
+
allow_patterns=f"{prefix}**",
|
| 119 |
+
local_dir=tmpdir,
|
| 120 |
+
token=HF_TOKEN,
|
| 121 |
+
)
|
| 122 |
+
break
|
| 123 |
+
except Exception as e:
|
| 124 |
+
if attempt == 0:
|
| 125 |
+
print(f"[HuggingRun] Restore attempt {attempt + 1} failed: {e}. Retrying...")
|
| 126 |
+
time.sleep(3)
|
| 127 |
+
else:
|
| 128 |
+
raise
|
| 129 |
src = Path(tmpdir) / DATASET_SUBFOLDER
|
| 130 |
if src.exists():
|
| 131 |
for item in src.rglob("*"):
|
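The restore path in `scripts/sync_hf.py` now retries `snapshot_download` once after a 3-second pause and re-raises if the second attempt also fails. The same pattern can be factored into a small helper, sketched below under the assumption that any exception is worth one retry. `call_with_retry` is a hypothetical name, not part of HuggingRun or `huggingface_hub`; the defaults of two attempts and a 3-second delay are taken from the diff.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    attempts: int = 2,
    delay_sec: float = 3.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure, wait and retry, re-raising after the last attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as e:
            if attempt == attempts - 1:
                raise  # last attempt: surface the original error
            print(f"[HuggingRun] Attempt {attempt + 1} failed: {e}. Retrying...")
            sleep(delay_sec)
    # Only reachable when attempts < 1, so the loop body never ran.
    raise ValueError("attempts must be >= 1")
```

In the restore code, `fn` would be a zero-argument lambda wrapping the `snapshot_download(...)` call shown in the diff; the `sleep` parameter exists only so the delay can be skipped in tests.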