NorthernTribe-Research committed
Commit b18dd63 · verified · 1 Parent(s): f0734c2

Harden UI + validation fixes for Space trainer

Files changed (4)
  1. README.md +4 -0
  2. VALIDATION_LOG.md +72 -0
  3. app.py +11 -6
  4. tests/test_core_utils.py +9 -0
README.md CHANGED
@@ -93,3 +93,7 @@ Recommended runtime secrets posture:
 - avoid storing long-lived API tokens in repository files
 
 Detailed deployment/rollback steps are documented in `PRODUCTION.md`.
+
+## Validation Record
+
+- Latest verification and hardening run details are recorded in `VALIDATION_LOG.md`.
VALIDATION_LOG.md ADDED
@@ -0,0 +1,72 @@
+# Space Trainer Validation Log
+
+Date (UTC): 2026-02-28 10:24:36 UTC
+
+## Scope Reviewed
+
+Reviewed the full `space_trainer/` implementation surface used by the Hugging Face Space runtime:
+
+- `space_trainer/app.py`
+- `space_trainer/README.md`
+- `space_trainer/PRODUCTION.md`
+- `space_trainer/.env.example`
+- `space_trainer/requirements.txt`
+- `space_trainer/configs/deepseek_math_sota.yaml`
+- `space_trainer/scripts/preflight_check.py`
+- `space_trainer/scripts/train_sota.py`
+- `space_trainer/scripts/eval_sota.py`
+- `space_trainer/tests/test_core_utils.py`
+- Existing workspace runtime/run artifacts under `space_trainer/workspace/`
+
+## Issues Found
+
+1. UI result badge mapping treated `preflight passed` as neutral because `_` was converted to spaces before class lookup.
+2. Unit tests failed when run from repository root due to import path assumptions (`ModuleNotFoundError: app`).
+
+## Fixes Applied
+
+1. `space_trainer/app.py`
+   - Normalized run result strings in `_run_result_badge_class()` to handle underscore/space/hyphen variants.
+   - Updated recent-runs badge rendering to classify by raw result key and only prettify the display label.
+   - Kept Gradio theme/css/head in `launch()` (Gradio 6.6 recommended path), and set queue configuration once at module load with `demo.queue(default_concurrency_limit=1)`.
+
+2. `space_trainer/tests/test_core_utils.py`
+   - Added deterministic `sys.path` insertion for `space_trainer/` root so tests pass from both:
+     - repo root (`python -m unittest discover -s space_trainer/tests -v`)
+     - `space_trainer/` directory (`python -m unittest discover -s tests -v`)
+   - Added regression test for preflight badge-class normalization.
+
+## Validation Commands and Results
+
+1. Preflight checks:
+   - Command: `.venv/bin/python space_trainer/scripts/preflight_check.py --json`
+   - Result: PASS (`"ok": true`)
+
+2. Unit tests from repo root:
+   - Command: `.venv/bin/python -m unittest discover -s space_trainer/tests -v`
+   - Result: PASS (`Ran 15 tests`, `OK`)
+
+3. Unit tests from `space_trainer/`:
+   - Command: `../.venv/bin/python -m unittest discover -s tests -v`
+   - Result: PASS (`Ran 15 tests`, `OK`)
+
+4. Python syntax compile check:
+   - Command: `../.venv/bin/python -m py_compile app.py scripts/preflight_check.py scripts/train_sota.py scripts/eval_sota.py tests/test_core_utils.py`
+   - Result: PASS
+
+5. Gradio app object/config smoke check:
+   - Command: `../.venv/bin/python - <<'PY' ... app.demo.get_config_file() ... PY`
+   - Result: PASS (`mode=blocks`, `components=44`, `dependencies=3`, `queue_set=True`)
+
+## Environment Notes
+
+- CUDA warning appears in this environment (`cudaGetDeviceCount` OS unsupported). This is expected on non-GPU hosts and handled by app CPU fallback logic.
+- Fast tokenizer fallback warning (`protobuf missing`) is already handled by project fallback code and validated by tests.
+- Direct local `app.py` server launch in this sandbox cannot bind any Gradio ports (`Cannot find empty port...`). This is an execution-environment limitation, not a code-level validation failure.
+
+## Current Status
+
+- UI telemetry classification bug fixed.
+- Test reliability improved.
+- Preflight + tests + compile checks are passing.
+- Space runtime code path is consistent and ready for deployment validation inside Hugging Face Spaces.
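Validation step 4 in the log runs `py_compile` from the shell. As a rough sketch only, the same syntax check could be driven from Python so that failures are collected rather than printed. The helper name `compile_check` and the throwaway file names are hypothetical and are not part of this commit.

```python
import py_compile
import tempfile
from pathlib import Path


def compile_check(paths):
    """Return a dict mapping each failing path to its compile error message."""
    failures = {}
    for path in paths:
        try:
            # doraise=True raises PyCompileError instead of writing to stderr.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures[str(path)] = exc.msg
    return failures


def demo():
    """Compile one valid and one invalid file; return the failing file names."""
    with tempfile.TemporaryDirectory() as tmpdir:
        good = Path(tmpdir) / "good.py"  # hypothetical file names
        bad = Path(tmpdir) / "bad.py"
        good.write_text("x = 1\n")
        bad.write_text("def broken(:\n")  # deliberate syntax error
        failures = compile_check([good, bad])
        return sorted(Path(p).name for p in failures)
```

An empty result from `compile_check` corresponds to the `PASS` recorded for step 4. This only catches syntax errors, so it complements the unit tests rather than replacing them.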
app.py CHANGED
@@ -1352,7 +1352,7 @@ def _refresh_recent_runs(summary: Dict[str, Any], log_lines: List[str]) -> None:
 
 
 def _run_result_badge_class(result_text: str) -> str:
-    normalized = (result_text or "").strip().lower()
+    normalized = (result_text or "").strip().lower().replace("-", "_").replace(" ", "_")
     if normalized in {"completed", "preflight_passed"}:
         return "ok"
     if normalized in {"failed", "error"}:
@@ -1373,9 +1373,9 @@ def _build_recent_runs_panel(summary: Dict[str, Any]) -> str:
     lines: List[str] = []
     for entry in entries[:RECENT_RUNS_VISUAL_LIMIT]:
         run_label = html.escape(str(entry.get("run_label") or "--"))
-        result_text = str(entry.get("result") or "unknown").replace("_", " ").strip().lower()
-        badge_cls = _run_result_badge_class(result_text)
-        badge_label = html.escape(result_text)
+        raw_result = str(entry.get("result") or "unknown").strip().lower()
+        badge_cls = _run_result_badge_class(raw_result)
+        badge_label = html.escape(raw_result.replace("_", " "))
         evaluation = _as_dict(entry.get("evaluation"))
         pass_1 = _fmt_pct(evaluation.get("pass_at_1"))
         pass_k = _fmt_pct(evaluation.get("pass_at_k"))
@@ -2584,6 +2584,9 @@ def run_pipeline(
         cycle_index += 1
 
 
+APP_THEME = gr.themes.Default(primary_hue="gray")
+
+
 with gr.Blocks(title="Math Conjecture Trainer Space") as demo:
     gr.HTML(APP_HEADER_HTML)
     gr.Markdown(PROJECT_DESCRIPTION, elem_classes=["section-copy"])
@@ -2688,10 +2691,12 @@ with gr.Blocks(title="Math Conjecture Trainer Space") as demo:
     clear_button.click(fn=clear_outputs, inputs=None, outputs=[logs, status, ops_visual], queue=False)
     gr.HTML(APP_FOOTER_HTML)
 
+demo.queue(default_concurrency_limit=1)
+
 
 if __name__ == "__main__":
-    demo.queue(default_concurrency_limit=1).launch(
-        theme=gr.themes.Default(primary_hue="gray"),
+    demo.launch(
+        theme=APP_THEME,
         css=UI_CSS,
         head=UI_HEAD,
     )
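The two badge hunks above work together: classification now uses the raw result key, and underscores become spaces only in the visible label. The standalone sketch below illustrates that split under stated assumptions. The function names, the `"fail"` and `"neutral"` class names, and the `<span>` markup are hypothetical, since the diff shows only the `"ok"` branch and not the surrounding HTML.

```python
import html


def run_result_badge_class(result_text):
    """Map a run result string to a badge class, tolerating separator variants."""
    # Collapse hyphen and space variants onto the underscore form used for lookup.
    normalized = (result_text or "").strip().lower().replace("-", "_").replace(" ", "_")
    if normalized in {"completed", "preflight_passed"}:
        return "ok"
    if normalized in {"failed", "error"}:
        return "fail"  # assumed class name; not visible in the diff
    return "neutral"  # assumed fallback class name


def render_badge(entry):
    """Classify by the raw result key, then prettify only the display label."""
    raw_result = str(entry.get("result") or "unknown").strip().lower()
    badge_cls = run_result_badge_class(raw_result)
    badge_label = html.escape(raw_result.replace("_", " "))
    # Hypothetical markup, used only to make the class/label split visible.
    return f'<span class="badge {badge_cls}">{badge_label}</span>'
```

With this split, `render_badge({"result": "preflight_passed"})` gets the `ok` class while showing the label `preflight passed`. Before the change, that pairing fell through to the neutral class.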
tests/test_core_utils.py CHANGED
@@ -4,11 +4,16 @@
 from __future__ import annotations
 
 import json
+import sys
 import tempfile
 import unittest
 from unittest import mock
 from pathlib import Path
 
+ROOT = Path(__file__).resolve().parents[1]
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
+
 import app
 from scripts import eval_sota
 from scripts import train_sota
@@ -57,6 +62,10 @@ class AppUtilityTests(unittest.TestCase):
         self.assertIn("run-20260101-000000", html)
         self.assertIn("completed", html)
 
+    def test_run_result_badge_class_handles_preflight_variants(self) -> None:
+        self.assertEqual(app._run_result_badge_class("preflight_passed"), "ok")
+        self.assertEqual(app._run_result_badge_class("preflight passed"), "ok")
+
     def test_persist_run_artifacts_updates_history(self) -> None:
         with tempfile.TemporaryDirectory() as tmpdir:
             history_path = Path(tmpdir) / "run_history.json"
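The `sys.path` insertion above makes `import app` independent of the directory the tests are launched from. With `python -m`, the current working directory is placed on `sys.path`, which would explain why the import already resolved from inside `space_trainer/` but not from the repo root. The sketch below reproduces the pattern against a throwaway layout. The names `demo_project`, `demo_app_module`, and `ensure_project_root_on_path` are hypothetical stand-ins, not files from this repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def ensure_project_root_on_path(test_file):
    """Insert the directory one level above the tests folder into sys.path once."""
    root = Path(test_file).resolve().parents[1]
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


def demo():
    """Build a temporary project and import its top-level module via the helper."""
    with tempfile.TemporaryDirectory() as tmpdir:
        project = Path(tmpdir) / "demo_project"
        (project / "tests").mkdir(parents=True)
        (project / "demo_app_module.py").write_text("VALUE = 42\n")
        test_file = project / "tests" / "test_demo.py"
        test_file.write_text("")
        root = ensure_project_root_on_path(test_file)
        try:
            importlib.invalidate_caches()
            module = importlib.import_module("demo_app_module")
            return module.VALUE
        finally:
            # Undo the global side effects so the sketch leaves no trace.
            sys.path.remove(str(root))
            sys.modules.pop("demo_app_module", None)
```

The membership check keeps the insertion idempotent, so importing the test module repeatedly does not grow `sys.path`.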