Harden UI + validation fixes for Space trainer
- README.md +4 -0
- VALIDATION_LOG.md +72 -0
- app.py +11 -6
- tests/test_core_utils.py +9 -0
README.md
CHANGED
|
@@ -93,3 +93,7 @@ Recommended runtime secrets posture:
|
| 93 |
- avoid storing long-lived API tokens in repository files
|
| 94 |
|
| 95 |
Detailed deployment/rollback steps are documented in `PRODUCTION.md`.
|
| 96 |
+
|
| 97 |
+
## Validation Record
|
| 98 |
+
|
| 99 |
+
- Latest verification and hardening run details are recorded in `VALIDATION_LOG.md`.
|
VALIDATION_LOG.md
ADDED
|
@@ -0,0 +1,72 @@
|
| 1 |
+
# Space Trainer Validation Log
|
| 2 |
+
|
| 3 |
+
Date (UTC): 2026-02-28 10:24:36 UTC
|
| 4 |
+
|
| 5 |
+
## Scope Reviewed
|
| 6 |
+
|
| 7 |
+
Reviewed the full `space_trainer/` implementation surface used by the Hugging Face Space runtime:
|
| 8 |
+
|
| 9 |
+
- `space_trainer/app.py`
|
| 10 |
+
- `space_trainer/README.md`
|
| 11 |
+
- `space_trainer/PRODUCTION.md`
|
| 12 |
+
- `space_trainer/.env.example`
|
| 13 |
+
- `space_trainer/requirements.txt`
|
| 14 |
+
- `space_trainer/configs/deepseek_math_sota.yaml`
|
| 15 |
+
- `space_trainer/scripts/preflight_check.py`
|
| 16 |
+
- `space_trainer/scripts/train_sota.py`
|
| 17 |
+
- `space_trainer/scripts/eval_sota.py`
|
| 18 |
+
- `space_trainer/tests/test_core_utils.py`
|
| 19 |
+
- Existing workspace runtime/run artifacts under `space_trainer/workspace/`
|
| 20 |
+
|
| 21 |
+
## Issues Found
|
| 22 |
+
|
| 23 |
+
1. UI result badge mapping treated `preflight passed` as neutral because `_` was converted to spaces before class lookup.
|
| 24 |
+
2. Unit tests failed when run from the repository root due to import path assumptions (`ModuleNotFoundError: app`).
|
| 25 |
+
|
| 26 |
+
## Fixes Applied
|
| 27 |
+
|
| 28 |
+
1. `space_trainer/app.py`
|
| 29 |
+
   - Normalized run result strings in `_run_result_badge_class()` to handle underscore/space/hyphen variants.
|
| 30 |
+
   - Updated recent-runs badge rendering to classify by raw result key and only prettify the display label.
|
| 31 |
+
   - Kept Gradio theme/css/head in `launch()` (Gradio 6.6 recommended path), and set queue configuration once at module load with `demo.queue(default_concurrency_limit=1)`.
|
| 32 |
+
|
| 33 |
+
2. `space_trainer/tests/test_core_utils.py`
|
| 34 |
+
   - Added deterministic `sys.path` insertion for `space_trainer/` root so tests pass from both:
|
| 35 |
+
     - repo root (`python -m unittest discover -s space_trainer/tests -v`)
|
| 36 |
+
     - `space_trainer/` directory (`python -m unittest discover -s tests -v`)
|
| 37 |
+
   - Added regression test for preflight badge-class normalization.
|
| 38 |
+
|
| 39 |
+
## Validation Commands and Results
|
| 40 |
+
|
| 41 |
+
1. Preflight checks:
|
| 42 |
+
   - Command: `.venv/bin/python space_trainer/scripts/preflight_check.py --json`
|
| 43 |
+
   - Result: PASS (`"ok": true`)
|
| 44 |
+
|
| 45 |
+
2. Unit tests from repo root:
|
| 46 |
+
   - Command: `.venv/bin/python -m unittest discover -s space_trainer/tests -v`
|
| 47 |
+
   - Result: PASS (`Ran 15 tests`, `OK`)
|
| 48 |
+
|
| 49 |
+
3. Unit tests from `space_trainer/`:
|
| 50 |
+
   - Command: `../.venv/bin/python -m unittest discover -s tests -v`
|
| 51 |
+
   - Result: PASS (`Ran 15 tests`, `OK`)
|
| 52 |
+
|
| 53 |
+
4. Python syntax compile check:
|
| 54 |
+
   - Command: `../.venv/bin/python -m py_compile app.py scripts/preflight_check.py scripts/train_sota.py scripts/eval_sota.py tests/test_core_utils.py`
|
| 55 |
+
   - Result: PASS
|
| 56 |
+
|
| 57 |
+
5. Gradio app object/config smoke check:
|
| 58 |
+
   - Command: `../.venv/bin/python - <<'PY' ... app.demo.get_config_file() ... PY`
|
| 59 |
+
   - Result: PASS (`mode=blocks`, `components=44`, `dependencies=3`, `queue_set=True`)
|
| 60 |
+
|
| 61 |
+
## Environment Notes
|
| 62 |
+
|
| 63 |
+
- A CUDA warning appears in this environment (`cudaGetDeviceCount` OS unsupported). This is expected on non-GPU hosts and is handled by the app's CPU fallback logic.
|
| 64 |
+
- Fast tokenizer fallback warning (`protobuf missing`) is already handled by project fallback code and validated by tests.
|
| 65 |
+
- Direct local `app.py` server launch in this sandbox cannot bind any Gradio ports (`Cannot find empty port...`). This is an execution-environment limitation, not a code-level validation failure.
|
| 66 |
+
|
| 67 |
+
## Current Status
|
| 68 |
+
|
| 69 |
+
- UI telemetry classification bug fixed.
|
| 70 |
+
- Test reliability improved: the suite now passes from both the repo root and the `space_trainer/` directory.
|
| 71 |
+
- Preflight + tests + compile checks are passing.
|
| 72 |
+
- Space runtime code path is consistent and ready for deployment validation inside Hugging Face Spaces.
|
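The syntax compile check in step 4 of the log can also be driven from Python instead of the shell. The following is a minimal sketch, not the project's tooling: `compile_check` is a hypothetical helper name, and it relies only on the standard library's `py_compile.compile(..., doraise=True)`. Compiled output goes to a temporary directory so the source tree is left untouched.

```python
import py_compile
import tempfile
from pathlib import Path
from typing import Dict, Iterable, Optional


def compile_check(paths: Iterable[Path]) -> Dict[str, Optional[str]]:
    """Byte-compile each file; map its path to None on success or to an error message.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    results: Dict[str, Optional[str]] = {}
    with tempfile.TemporaryDirectory() as cache_dir:
        for index, path in enumerate(paths):
            # Write the .pyc into a throwaway directory rather than __pycache__.
            target = Path(cache_dir) / f"{index}.pyc"
            try:
                # doraise=True raises PyCompileError instead of printing to stderr.
                py_compile.compile(str(path), cfile=str(target), doraise=True)
                results[str(path)] = None
            except py_compile.PyCompileError as exc:
                results[str(path)] = exc.msg
    return results
```

A caller could pass the same five files listed in step 4 and treat the run as a PASS only when every value in the returned mapping is `None`. A missing file is not caught here and would surface as an ordinary `OSError`.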
app.py
CHANGED
|
@@ -1352,7 +1352,7 @@ def _refresh_recent_runs(summary: Dict[str, Any], log_lines: List[str]) -> None:
|
|
| 1352 |
|
| 1353 |
|
| 1354 |
def _run_result_badge_class(result_text: str) -> str:
|
| 1355 |
-
    normalized = (result_text or "").strip().lower()
|
| 1356 |
    if normalized in {"completed", "preflight_passed"}:
|
| 1357 |
        return "ok"
|
| 1358 |
    if normalized in {"failed", "error"}:
|
|
@@ -1373,9 +1373,9 @@ def _build_recent_runs_panel(summary: Dict[str, Any]) -> str:
|
|
| 1373 |
lines: List[str] = []
|
| 1374 |
for entry in entries[:RECENT_RUNS_VISUAL_LIMIT]:
|
| 1375 |
run_label = html.escape(str(entry.get("run_label") or "--"))
|
| 1376 |
-
|
| 1377 |
-
badge_cls = _run_result_badge_class(
|
| 1378 |
-
badge_label = html.escape(
|
| 1379 |
evaluation = _as_dict(entry.get("evaluation"))
|
| 1380 |
pass_1 = _fmt_pct(evaluation.get("pass_at_1"))
|
| 1381 |
pass_k = _fmt_pct(evaluation.get("pass_at_k"))
|
|
@@ -2584,6 +2584,9 @@ def run_pipeline(
|
|
| 2584 |
cycle_index += 1
|
| 2585 |
|
| 2586 |
|
| 2587 |
with gr.Blocks(title="Math Conjecture Trainer Space") as demo:
|
| 2588 |
gr.HTML(APP_HEADER_HTML)
|
| 2589 |
gr.Markdown(PROJECT_DESCRIPTION, elem_classes=["section-copy"])
|
|
@@ -2688,10 +2691,12 @@ with gr.Blocks(title="Math Conjecture Trainer Space") as demo:
|
|
| 2688 |
clear_button.click(fn=clear_outputs, inputs=None, outputs=[logs, status, ops_visual], queue=False)
|
| 2689 |
gr.HTML(APP_FOOTER_HTML)
|
| 2690 |
|
| 2691 |
|
| 2692 |
if __name__ == "__main__":
|
| 2693 |
-
demo.
|
| 2694 |
-
theme=
|
| 2695 |
css=UI_CSS,
|
| 2696 |
head=UI_HEAD,
|
| 2697 |
)
|
|
|
|
| 1352 |
|
| 1353 |
|
| 1354 |
def _run_result_badge_class(result_text: str) -> str:
|
| 1355 |
+
    normalized = (result_text or "").strip().lower().replace("-", "_").replace(" ", "_")
|
| 1356 |
    if normalized in {"completed", "preflight_passed"}:
|
| 1357 |
        return "ok"
|
| 1358 |
    if normalized in {"failed", "error"}:
|
|
|
|
| 1373 |
lines: List[str] = []
|
| 1374 |
for entry in entries[:RECENT_RUNS_VISUAL_LIMIT]:
|
| 1375 |
run_label = html.escape(str(entry.get("run_label") or "--"))
|
| 1376 |
+
raw_result = str(entry.get("result") or "unknown").strip().lower()
|
| 1377 |
+
badge_cls = _run_result_badge_class(raw_result)
|
| 1378 |
+
badge_label = html.escape(raw_result.replace("_", " "))
|
| 1379 |
evaluation = _as_dict(entry.get("evaluation"))
|
| 1380 |
pass_1 = _fmt_pct(evaluation.get("pass_at_1"))
|
| 1381 |
pass_k = _fmt_pct(evaluation.get("pass_at_k"))
|
|
|
|
| 2584 |
cycle_index += 1
|
| 2585 |
|
| 2586 |
|
| 2587 |
+
APP_THEME = gr.themes.Default(primary_hue="gray")
|
| 2588 |
+
|
| 2589 |
+
|
| 2590 |
with gr.Blocks(title="Math Conjecture Trainer Space") as demo:
|
| 2591 |
gr.HTML(APP_HEADER_HTML)
|
| 2592 |
gr.Markdown(PROJECT_DESCRIPTION, elem_classes=["section-copy"])
|
|
|
|
| 2691 |
clear_button.click(fn=clear_outputs, inputs=None, outputs=[logs, status, ops_visual], queue=False)
|
| 2692 |
gr.HTML(APP_FOOTER_HTML)
|
| 2693 |
|
| 2694 |
+
demo.queue(default_concurrency_limit=1)
|
| 2695 |
+
|
| 2696 |
|
| 2697 |
if __name__ == "__main__":
|
| 2698 |
+
    demo.launch(
|
| 2699 |
+
        theme=APP_THEME,
|
| 2700 |
        css=UI_CSS,
|
| 2701 |
        head=UI_HEAD,
|
| 2702 |
    )
|
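The two `app.py` hunks above separate the key used for classification from the text shown to the user. A minimal, self-contained sketch of that split follows. Only the `ok` branch is visible in the diff, so the `bad` and `neutral` return values are assumptions, and `badge_parts` is a hypothetical helper that stands in for the three changed lines inside `_build_recent_runs_panel`.

```python
import html
from typing import Optional, Tuple


def run_result_badge_class(result_text: Optional[str]) -> str:
    """Map a run result to a CSS badge class, tolerating space and hyphen variants."""
    normalized = (result_text or "").strip().lower().replace("-", "_").replace(" ", "_")
    if normalized in {"completed", "preflight_passed"}:
        return "ok"
    if normalized in {"failed", "error"}:
        return "bad"  # assumed class name; the diff does not show this return
    return "neutral"  # assumed fallback class name


def badge_parts(raw_result: Optional[str]) -> Tuple[str, str]:
    """Classify by the raw key and prettify only the label that is displayed."""
    raw = str(raw_result or "unknown").strip().lower()
    badge_cls = run_result_badge_class(raw)
    # Underscores become spaces for display only, after classification is done.
    badge_label = html.escape(raw.replace("_", " "))
    return badge_cls, badge_label
```

With this arrangement `badge_parts("preflight_passed")` yields the class `ok` and the label `preflight passed`, and a result that arrives already prettified (`"preflight passed"`) or hyphenated (`"Preflight-Passed"`) still classifies as `ok`.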
tests/test_core_utils.py
CHANGED
|
@@ -4,11 +4,16 @@
|
|
| 4 |
from __future__ import annotations
|
| 5 |
|
| 6 |
import json
|
| 7 |
import tempfile
|
| 8 |
import unittest
|
| 9 |
from unittest import mock
|
| 10 |
from pathlib import Path
|
| 11 |
|
| 12 |
import app
|
| 13 |
from scripts import eval_sota
|
| 14 |
from scripts import train_sota
|
|
@@ -57,6 +62,10 @@ class AppUtilityTests(unittest.TestCase):
|
|
| 57 |
self.assertIn("run-20260101-000000", html)
|
| 58 |
self.assertIn("completed", html)
|
| 59 |
|
| 60 |
def test_persist_run_artifacts_updates_history(self) -> None:
|
| 61 |
with tempfile.TemporaryDirectory() as tmpdir:
|
| 62 |
history_path = Path(tmpdir) / "run_history.json"
|
|
|
|
| 4 |
from __future__ import annotations
|
| 5 |
|
| 6 |
import json
|
| 7 |
+
import sys
|
| 8 |
import tempfile
|
| 9 |
import unittest
|
| 10 |
from unittest import mock
|
| 11 |
from pathlib import Path
|
| 12 |
|
| 13 |
+
ROOT = Path(__file__).resolve().parents[1]
|
| 14 |
+
if str(ROOT) not in sys.path:
|
| 15 |
+
    sys.path.insert(0, str(ROOT))
|
| 16 |
+
|
| 17 |
import app
|
| 18 |
from scripts import eval_sota
|
| 19 |
from scripts import train_sota
|
|
|
|
| 62 |
self.assertIn("run-20260101-000000", html)
|
| 63 |
self.assertIn("completed", html)
|
| 64 |
|
| 65 |
+
    def test_run_result_badge_class_handles_preflight_variants(self) -> None:
|
| 66 |
+
        self.assertEqual(app._run_result_badge_class("preflight_passed"), "ok")
|
| 67 |
+
        self.assertEqual(app._run_result_badge_class("preflight passed"), "ok")
|
| 68 |
+
|
| 69 |
def test_persist_run_artifacts_updates_history(self) -> None:
|
| 70 |
with tempfile.TemporaryDirectory() as tmpdir:
|
| 71 |
history_path = Path(tmpdir) / "run_history.json"
|
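The `sys.path` bootstrap added at the top of the test module is what lets `import app` resolve regardless of the directory the test runner starts in. The same idea can be sketched as a small function, which makes its idempotence easy to check. `ensure_on_sys_path` is a hypothetical name used for illustration; the repository inlines the logic instead.

```python
import sys
from pathlib import Path


def ensure_on_sys_path(root: Path) -> bool:
    """Put `root` at the front of sys.path unless it is already listed.

    Returns True if the path was inserted, False if it was already present.
    Hypothetical helper; the test module in the diff inlines this logic.
    """
    root_str = str(root.resolve())
    if root_str in sys.path:
        return False
    # Index 0 gives this directory priority over other importable locations.
    sys.path.insert(0, root_str)
    return True
```

In a test module the call would be `ensure_on_sys_path(Path(__file__).resolve().parents[1])`, placed before `import app`. Because the membership check runs first, repeated imports of the test module do not add duplicate entries.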