# Space Trainer Validation Log

Date (UTC): 2026-02-28 10:24:36

## Scope Reviewed

Reviewed the full `space_trainer/` implementation surface used by the Hugging Face Space runtime:

- `space_trainer/app.py`
- `space_trainer/README.md`
- `space_trainer/PRODUCTION.md`
- `space_trainer/.env.example`
- `space_trainer/requirements.txt`
- `space_trainer/configs/math_conjecture_sota.yaml`
- `space_trainer/scripts/preflight_check.py`
- `space_trainer/scripts/train_sota.py`
- `space_trainer/scripts/eval_sota.py`
- `space_trainer/tests/test_core_utils.py`
- Existing workspace runtime/run artifacts under `space_trainer/workspace/`

## Issues Found

1. The UI result-badge mapping treated `preflight passed` as neutral because `_` was converted to spaces before the class lookup.
2. Unit tests failed when run from the repository root due to import-path assumptions (`ModuleNotFoundError: app`).
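
Issue 1 is easiest to see in code. The sketch below is a minimal illustration, not the code in `app.py`: the result keys, the CSS class names, and the helper names `normalize_result_key`, `run_result_badge_class`, and `display_label` are all hypothetical. It reduces underscore, space, and hyphen variants to one canonical key before the class lookup, and prettifies only the label shown to the user.

```python
import re

# Hypothetical result-key -> CSS class table; the real keys and class
# names used by app.py are not shown in this log.
BADGE_CLASSES = {
    "preflight_passed": "badge-ok",
    "completed": "badge-ok",
    "failed": "badge-error",
    "cancelled": "badge-warn",
}


def normalize_result_key(result: str) -> str:
    """Lowercase and collapse runs of spaces, hyphens, and underscores into one underscore."""
    return re.sub(r"[\s_-]+", "_", result.strip().lower())


def run_result_badge_class(result: str) -> str:
    """Classify on the normalized key; anything unrecognized stays neutral."""
    return BADGE_CLASSES.get(normalize_result_key(result), "badge-neutral")


def display_label(result: str) -> str:
    """Prettify only the label shown to the user, never the lookup key."""
    return normalize_result_key(result).replace("_", " ")
```

With this ordering, `preflight_passed`, `preflight passed`, and `Preflight-Passed` all map to `badge-ok`, while the rendered label still reads `preflight passed`.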

## Fixes Applied

1. `space_trainer/app.py`
   - Normalized run result strings in `_run_result_badge_class()` to handle underscore, space, and hyphen variants.
   - Updated recent-runs badge rendering to classify by the raw result key and prettify only the display label.
   - Kept Gradio theme/css/head in `launch()` (the Gradio 6.6 recommended path), and set the queue configuration once at module load with `demo.queue(default_concurrency_limit=1)`.
2. `space_trainer/tests/test_core_utils.py`
   - Added deterministic `sys.path` insertion for the `space_trainer/` root so tests pass from both:
     - the repo root (`python -m unittest discover -s space_trainer/tests -v`)
     - the `space_trainer/` directory (`python -m unittest discover -s tests -v`)
   - Added a regression test for preflight badge-class normalization.
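
The `sys.path` fix can be sketched as follows, assuming the test module sits directly in `space_trainer/tests/`; `ensure_root_on_path` is a hypothetical helper name, not necessarily what `test_core_utils.py` uses. Resolving from the test file's own location, rather than from the current working directory, is what lets `import app` succeed from both the repository root and `space_trainer/`.

```python
import sys
from pathlib import Path


def ensure_root_on_path(test_file: str) -> Path:
    """Put the parent of the test file's directory at the front of sys.path.

    For space_trainer/tests/test_core_utils.py that directory is space_trainer/,
    which holds app.py. Calling this twice does not add a duplicate entry.
    """
    root = Path(test_file).resolve().parents[1]
    if str(root) not in sys.path:
        sys.path.insert(0, str(root))
    return root


# In the test module, before `import app` (hypothetical usage):
# ensure_root_on_path(__file__)
```

Inserting at index 0 makes the local `app.py` win over any other module named `app` that happens to be importable.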

## Validation Commands and Results

1. Preflight checks:
   - Command: `.venv/bin/python space_trainer/scripts/preflight_check.py --json`
   - Result: PASS (`"ok": true`)
2. Unit tests from repo root:
   - Command: `.venv/bin/python -m unittest discover -s space_trainer/tests -v`
   - Result: PASS (`Ran 15 tests`, `OK`)
3. Unit tests from `space_trainer/`:
   - Command: `../.venv/bin/python -m unittest discover -s tests -v`
   - Result: PASS (`Ran 15 tests`, `OK`)
4. Python syntax compile check:
   - Command: `../.venv/bin/python -m py_compile app.py scripts/preflight_check.py scripts/train_sota.py scripts/eval_sota.py tests/test_core_utils.py`
   - Result: PASS
5. Gradio app object/config smoke check:
   - Command: `../.venv/bin/python - <<'PY' ... app.demo.get_config_file() ... PY`
   - Result: PASS (`mode=blocks`, `components=44`, `dependencies=3`, `queue_set=True`)
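
The smoke-check command in step 5 is elided in this log. As a hedged sketch of what such a check can assert, the hypothetical helpers below summarize a Gradio Blocks config dict, assuming it exposes `mode`, `components`, and `dependencies` keys; in the real check the dict would come from `app.demo.get_config_file()`. The queue flag is left out because the log does not say how `queue_set` was derived.

```python
from typing import Any, Dict, List, Mapping


def summarize_blocks_config(config: Mapping[str, Any]) -> Dict[str, Any]:
    """Reduce a Blocks config dict to the fields reported in this log."""
    return {
        "mode": config.get("mode"),
        "components": len(config.get("components", [])),
        "dependencies": len(config.get("dependencies", [])),
    }


def config_problems(
    config: Mapping[str, Any], expected_components: int, expected_dependencies: int
) -> List[str]:
    """Return human-readable mismatches; an empty list means the smoke check passed."""
    summary = summarize_blocks_config(config)
    problems = []
    if summary["mode"] != "blocks":
        problems.append(f"mode={summary['mode']!r}, expected 'blocks'")
    if summary["components"] != expected_components:
        problems.append(f"components={summary['components']}, expected {expected_components}")
    if summary["dependencies"] != expected_dependencies:
        problems.append(f"dependencies={summary['dependencies']}, expected {expected_dependencies}")
    return problems
```

Pinning exact counts (44 components, 3 dependencies) makes the check fail loudly when a UI refactor silently drops or adds an event wiring.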

## Environment Notes

- A CUDA warning appears in this environment (`cudaGetDeviceCount`: OS unsupported). This is expected on non-GPU hosts and is handled by the app's CPU fallback logic.
- The fast-tokenizer fallback warning (`protobuf missing`) is already handled by the project's fallback code and validated by tests.
- Launching `app.py` directly in this sandbox fails because Gradio cannot bind any port (`Cannot find empty port...`). This is a limitation of the execution environment, not a code-level validation failure.
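
A hedged sketch of the kind of CPU fallback the first note refers to; the app's actual logic is not shown in this log, and `select_device` is a hypothetical name. It assumes PyTorch, where `torch.cuda.is_available()` returns `False` when no usable CUDA device is present and can emit a warning like the `cudaGetDeviceCount` one above while doing so.

```python
def select_device() -> str:
    """Return "cuda" when PyTorch sees a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to accelerate; stay on CPU.
        return "cpu"
    try:
        # On hosts without a working driver this returns False (possibly
        # with a warning) rather than raising.
        return "cuda" if torch.cuda.is_available() else "cpu"
    except RuntimeError:
        # Treat unexpected CUDA initialization errors the same as "no GPU".
        return "cpu"
```

Because the warning is emitted but not raised, a host like this sandbox simply ends up on the `"cpu"` branch.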

## Current Status

- UI telemetry classification bug fixed.
- Test reliability improved.
- Preflight checks, unit tests, and compile checks pass.
- The Space runtime code path is consistent and ready for deployment validation inside Hugging Face Spaces.

---

## Rewrite Session

Date (UTC): 2026-02-28 11:56:17

### Objective

- Rewrite `app.py` from scratch.
- Switch the UI to a fully monochrome theme.
- Preserve full end-to-end pipeline functionality in a newly structured implementation.

### Implementation Summary

- Replaced `space_trainer/app.py` entirely with a new architecture and a new UI/CSS/HTML structure.
- Kept all major operational capabilities:
  - dataset download and cache handling
  - runtime config generation
  - staged training subprocess orchestration
  - optional post-training evaluation fallback path
  - quality gate and push status surfacing
  - continuous auto-restart with cooldown and circuit breaker
  - cancellation controls
  - run history persistence and recent-runs panel
- Kept compatibility with existing tests and tooling contracts (e.g., helper function names used by tests and preflight checks).
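
The auto-restart behavior listed above can be sketched as a loop with a fixed cooldown and a consecutive-failure circuit breaker. This is an illustration under assumptions, not the structure of the new `app.py`: `run_continuously`, its parameters, and the defaults (30 s cooldown, 3 failures) are hypothetical, and the log does not state the thresholds the app actually uses.

```python
import time
from typing import Callable


def run_continuously(
    run_once: Callable[[], bool],
    cooldown_s: float = 30.0,
    max_consecutive_failures: int = 3,
    should_stop: Callable[[], bool] = lambda: False,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Re-run `run_once` until cancelled or the circuit breaker opens.

    `run_once` returns True for a successful run and False for a failed one.
    Returns "cancelled" or "circuit_open" to explain why the loop ended.
    """
    consecutive_failures = 0
    while not should_stop():
        ok = run_once()
        # A success resets the failure streak; failures accumulate.
        consecutive_failures = 0 if ok else consecutive_failures + 1
        if consecutive_failures >= max_consecutive_failures:
            return "circuit_open"  # stop restarting after repeated failures
        if should_stop():
            break
        sleep(cooldown_s)  # cooldown before the next automatic restart
    return "cancelled"
```

Passing `should_stop` and `sleep` in as callables keeps the loop testable and gives cancellation controls a single hook to flip.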

### Monochrome Redesign

- New monochrome command-center visual language with grayscale-only palette.
- New telemetry card layout, stage timeline, recent-runs view, and loss sparkline styling.
- New hero header and runtime timestamp script in `UI_HEAD`.
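
The log does not describe how the loss sparkline is drawn. One common approach, sketched here with hypothetical names and a gray stroke to match the monochrome palette, is to scale the loss series into an SVG `polyline`.

```python
from typing import Sequence


def sparkline_points(values: Sequence[float], width: float = 120, height: float = 24) -> str:
    """Scale a loss series into an SVG polyline `points` string.

    x is spread evenly across `width`; y is flipped because SVG's origin is the
    top-left corner, so the highest loss is drawn at the top.
    """
    if not values:
        return ""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # a flat series would otherwise divide by zero
    step = width / max(len(values) - 1, 1)
    points = []
    for i, v in enumerate(values):
        x = i * step
        y = height - (v - lo) / span * height
        points.append(f"{x:.1f},{y:.1f}")
    return " ".join(points)


def sparkline_svg(
    values: Sequence[float], width: float = 120, height: float = 24, stroke: str = "#bdbdbd"
) -> str:
    """Wrap the points in a minimal inline SVG with a single gray stroke."""
    pts = sparkline_points(values, width, height)
    return (
        f'<svg width="{width}" height="{height}" viewBox="0 0 {width} {height}">'
        f'<polyline fill="none" stroke="{stroke}" stroke-width="1.5" points="{pts}"/>'
        "</svg>"
    )
```

For a falling loss of `[2.0, 1.0, 0.0]` on a 100×10 canvas, the points are `0.0,0.0 50.0,5.0 100.0,10.0`, i.e. a line descending from the top-left corner.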

### Verification Executed

1. Syntax check:
   - `../.venv/bin/python -m py_compile app.py`
   - Result: PASS
2. Preflight:
   - `../.venv/bin/python scripts/preflight_check.py --json`
   - Result: PASS (`"ok": true`)
3. Unit tests:
   - `../.venv/bin/python -m unittest discover -s tests -v`
   - Result: PASS (`Ran 15 tests`, `OK`)
4. Gradio config smoke check:
   - `../.venv/bin/python - <<'PY' ... app.demo.get_config_file() ... PY`
   - Result: PASS (`mode=blocks`, `components=44`, `dependencies=3`, `stage_count=4`)

---

## Footer + Continuous Enforcement Session

Date (UTC): 2026-02-28 12:45:36

### Requested Changes

- Remove the default Gradio footer controls (`Use via API`, logo, settings) from the footer area.
- Move API/settings access to a more visible place in the UI.
- Ensure training runs in continuous mode.

### Implementation

1. Footer controls removed from the Gradio launch:
   - Added `footer_links=[]` to `demo.launch(...)`.
2. API/settings links moved into the hero section:
   - Added `.mono-link-row` with:
     - `/gradio_api/docs`
     - `https://huggingface.co/spaces/NorthernTribe-Research/math_trainer/settings`
   - Added matching CSS styles for the new header links.
3. Continuous mode enforced:
   - Runtime enforcement in `run_pipeline(...)`:
     - `continuous_mode = not bool(preflight_only)`
   - UI control locked to enforced-on:
     - `Continuous Auto-Restart (Enforced)` with `interactive=False`.
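
The enforcement rule recorded for `run_pipeline(...)` can be isolated in a small helper; `resolve_continuous_mode` is a hypothetical name used only for this sketch. Enforcing it on the server matters because `interactive=False` only locks the widget in the browser; a client calling the API listed at `/gradio_api/docs` could presumably still submit `False` for that input, and the server-side expression overrides it.

```python
def resolve_continuous_mode(requested: bool, preflight_only: bool) -> bool:
    """Server-side rule: every non-preflight run is continuous.

    `requested` is the value sent for the (locked) UI checkbox; it is accepted
    so the signature matches the UI inputs, but it is deliberately ignored.
    """
    _ = requested
    return not bool(preflight_only)
```

Preflight-only runs stay one-shot, since there is no training loop to restart.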

### Verification

- `../.venv/bin/python -m py_compile app.py` -> PASS
- `../.venv/bin/python scripts/preflight_check.py --json` -> PASS
- `../.venv/bin/python -m unittest discover -s tests -v` -> PASS (`Ran 15 tests`, `OK`)

### Deployment

- Space: `NorthernTribe-Research/math_trainer`
- Commit: `c8a24f966d710173764da0355e56632af9e66c40`
- Runtime after deploy: `RUNNING`
- `https://northerntribe-research-math-trainer.hf.space/config` returned HTTP `200` with a JSON body.
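
The final deployment check can be reproduced with a small standard-library script. This is a sketch: `fetch_space_config` and `parse_config_response` are hypothetical helpers, and the only details taken from this log are the `/config` path and the expectation of an HTTP `200` JSON response.

```python
import json
import urllib.request
from typing import Any, Dict


def parse_config_response(status: int, body: bytes) -> Dict[str, Any]:
    """Accept only HTTP 200 with a JSON object body."""
    if status != 200:
        raise RuntimeError(f"expected HTTP 200, got {status}")
    data = json.loads(body)
    if not isinstance(data, dict):
        raise ValueError("config body is not a JSON object")
    return data


def fetch_space_config(base_url: str, timeout: float = 10.0) -> Dict[str, Any]:
    """GET <base_url>/config and validate the response."""
    url = base_url.rstrip("/") + "/config"
    # urlopen raises HTTPError for 4xx/5xx, so the status check mainly guards other 2xx codes.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_config_response(resp.status, resp.read())
```

Calling `fetch_space_config("https://northerntribe-research-math-trainer.hf.space")` requires network access; keeping the parsing in a separate pure function lets the validation logic be tested offline.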