Spaces:

NorthernTribe-Research
/

math_trainer

Sleeping

App Files Files Community

NorthernTribe-Research commited on Feb 28

Commit

b18dd63

verified ·

1 Parent(s): f0734c2

Harden UI + validation fixes for Space trainer

Browse files

Files changed (4) hide show

README.md +4 -0
VALIDATION_LOG.md +72 -0
app.py +11 -6
tests/test_core_utils.py +9 -0

README.md CHANGED Viewed

@@ -93,3 +93,7 @@ Recommended runtime secrets posture:
 - avoid storing long-lived API tokens in repository files
 Detailed deployment/rollback steps are documented in `PRODUCTION.md`.

 - avoid storing long-lived API tokens in repository files
 Detailed deployment/rollback steps are documented in `PRODUCTION.md`.
+## Validation Record
+- Latest verification and hardening run details are recorded in `VALIDATION_LOG.md`.

VALIDATION_LOG.md ADDED Viewed

	@@ -0,0 +1,72 @@

+# Space Trainer Validation Log
+Date (UTC): 2026-02-28 10:24:36 UTC
+## Scope Reviewed
+Reviewed the full `space_trainer/` implementation surface used by the Hugging Face Space runtime:
+- `space_trainer/app.py`
+- `space_trainer/README.md`
+- `space_trainer/PRODUCTION.md`
+- `space_trainer/.env.example`
+- `space_trainer/requirements.txt`
+- `space_trainer/configs/deepseek_math_sota.yaml`
+- `space_trainer/scripts/preflight_check.py`
+- `space_trainer/scripts/train_sota.py`
+- `space_trainer/scripts/eval_sota.py`
+- `space_trainer/tests/test_core_utils.py`
+- Existing workspace runtime/run artifacts under `space_trainer/workspace/`
+## Issues Found
+1. UI result badge mapping treated `preflight passed` as neutral because `_` was converted to spaces before class lookup.
+2. Unit tests failed when run from repository root due import path assumptions (`ModuleNotFoundError: app`).
+## Fixes Applied
+1. `space_trainer/app.py`
+- Normalized run result strings in `_run_result_badge_class()` to handle underscore/space/hyphen variants.
+- Updated recent-runs badge rendering to classify by raw result key and only prettify the display label.
+- Kept Gradio theme/css/head in `launch()` (Gradio 6.6 recommended path), and set queue configuration once at module load with `demo.queue(default_concurrency_limit=1)`.
+2. `space_trainer/tests/test_core_utils.py`
+- Added deterministic `sys.path` insertion for `space_trainer/` root so tests pass from both:
+  - repo root (`python -m unittest discover -s space_trainer/tests -v`)
+  - `space_trainer/` directory (`python -m unittest discover -s tests -v`)
+- Added regression test for preflight badge-class normalization.
+## Validation Commands and Results
+1. Preflight checks:
+- Command: `.venv/bin/python space_trainer/scripts/preflight_check.py --json`
+- Result: PASS (`"ok": true`)
+2. Unit tests from repo root:
+- Command: `.venv/bin/python -m unittest discover -s space_trainer/tests -v`
+- Result: PASS (`Ran 15 tests`, `OK`)
+3. Unit tests from `space_trainer/`:
+- Command: `../.venv/bin/python -m unittest discover -s tests -v`
+- Result: PASS (`Ran 15 tests`, `OK`)
+4. Python syntax compile check:
+- Command: `../.venv/bin/python -m py_compile app.py scripts/preflight_check.py scripts/train_sota.py scripts/eval_sota.py tests/test_core_utils.py`
+- Result: PASS
+5. Gradio app object/config smoke check:
+- Command: `../.venv/bin/python - <<'PY' ... app.demo.get_config_file() ... PY`
+- Result: PASS (`mode=blocks`, `components=44`, `dependencies=3`, `queue_set=True`)
+## Environment Notes
+- CUDA warning appears in this environment (`cudaGetDeviceCount` OS unsupported). This is expected on non-GPU hosts and handled by app CPU fallback logic.
+- Fast tokenizer fallback warning (`protobuf missing`) is already handled by project fallback code and validated by tests.
+- Direct local `app.py` server launch in this sandbox cannot bind any Gradio ports (`Cannot find empty port...`). This is an execution-environment limitation, not a code-level validation failure.
+## Current Status
+- UI telemetry classification bug fixed.
+- Test reliability improved.
+- Preflight + tests + compile checks are passing.
+- Space runtime code path is consistent and ready for deployment validation inside Hugging Face Spaces.

app.py CHANGED Viewed

@@ -1352,7 +1352,7 @@ def _refresh_recent_runs(summary: Dict[str, Any], log_lines: List[str]) -> None:
 def _run_result_badge_class(result_text: str) -> str:
-    normalized = (result_text or "").strip().lower()
     if normalized in {"completed", "preflight_passed"}:
         return "ok"
     if normalized in {"failed", "error"}:
@@ -1373,9 +1373,9 @@ def _build_recent_runs_panel(summary: Dict[str, Any]) -> str:
     lines: List[str] = []
     for entry in entries[:RECENT_RUNS_VISUAL_LIMIT]:
         run_label = html.escape(str(entry.get("run_label") or "--"))
-        result_text = str(entry.get("result") or "unknown").replace("_", " ").strip().lower()
-        badge_cls = _run_result_badge_class(result_text)
-        badge_label = html.escape(result_text)
         evaluation = _as_dict(entry.get("evaluation"))
         pass_1 = _fmt_pct(evaluation.get("pass_at_1"))
         pass_k = _fmt_pct(evaluation.get("pass_at_k"))
@@ -2584,6 +2584,9 @@ def run_pipeline(
         cycle_index += 1
 with gr.Blocks(title="Math Conjecture Trainer Space") as demo:
     gr.HTML(APP_HEADER_HTML)
     gr.Markdown(PROJECT_DESCRIPTION, elem_classes=["section-copy"])
@@ -2688,10 +2691,12 @@ with gr.Blocks(title="Math Conjecture Trainer Space") as demo:
     clear_button.click(fn=clear_outputs, inputs=None, outputs=[logs, status, ops_visual], queue=False)
     gr.HTML(APP_FOOTER_HTML)
 if __name__ == "__main__":
-    demo.queue(default_concurrency_limit=1).launch(
-        theme=gr.themes.Default(primary_hue="gray"),
         css=UI_CSS,
         head=UI_HEAD,
     )

 def _run_result_badge_class(result_text: str) -> str:
+    normalized = (result_text or "").strip().lower().replace("-", "_").replace(" ", "_")
     if normalized in {"completed", "preflight_passed"}:
         return "ok"
     if normalized in {"failed", "error"}:
     lines: List[str] = []
     for entry in entries[:RECENT_RUNS_VISUAL_LIMIT]:
         run_label = html.escape(str(entry.get("run_label") or "--"))
+        raw_result = str(entry.get("result") or "unknown").strip().lower()
+        badge_cls = _run_result_badge_class(raw_result)
+        badge_label = html.escape(raw_result.replace("_", " "))
         evaluation = _as_dict(entry.get("evaluation"))
         pass_1 = _fmt_pct(evaluation.get("pass_at_1"))
         pass_k = _fmt_pct(evaluation.get("pass_at_k"))
         cycle_index += 1
+APP_THEME = gr.themes.Default(primary_hue="gray")
 with gr.Blocks(title="Math Conjecture Trainer Space") as demo:
     gr.HTML(APP_HEADER_HTML)
     gr.Markdown(PROJECT_DESCRIPTION, elem_classes=["section-copy"])
     clear_button.click(fn=clear_outputs, inputs=None, outputs=[logs, status, ops_visual], queue=False)
     gr.HTML(APP_FOOTER_HTML)
+demo.queue(default_concurrency_limit=1)
 if __name__ == "__main__":
+    demo.launch(
+        theme=APP_THEME,
         css=UI_CSS,
         head=UI_HEAD,
     )

tests/test_core_utils.py CHANGED Viewed

@@ -4,11 +4,16 @@
 from __future__ import annotations
 import json
 import tempfile
 import unittest
 from unittest import mock
 from pathlib import Path
 import app
 from scripts import eval_sota
 from scripts import train_sota
@@ -57,6 +62,10 @@ class AppUtilityTests(unittest.TestCase):
         self.assertIn("run-20260101-000000", html)
         self.assertIn("completed", html)
     def test_persist_run_artifacts_updates_history(self) -> None:
         with tempfile.TemporaryDirectory() as tmpdir:
             history_path = Path(tmpdir) / "run_history.json"

 from __future__ import annotations
 import json
+import sys
 import tempfile
 import unittest
 from unittest import mock
 from pathlib import Path
+ROOT = Path(__file__).resolve().parents[1]
+if str(ROOT) not in sys.path:
+    sys.path.insert(0, str(ROOT))
 import app
 from scripts import eval_sota
 from scripts import train_sota
         self.assertIn("run-20260101-000000", html)
         self.assertIn("completed", html)
+    def test_run_result_badge_class_handles_preflight_variants(self) -> None:
+        self.assertEqual(app._run_result_badge_class("preflight_passed"), "ok")
+        self.assertEqual(app._run_result_badge_class("preflight passed"), "ok")
     def test_persist_run_artifacts_updates_history(self) -> None:
         with tempfile.TemporaryDirectory() as tmpdir:
             history_path = Path(tmpdir) / "run_history.json"