# Space Trainer Validation Log
Date (UTC): 2026-02-28 10:24:36 UTC
## Scope Reviewed
Reviewed the full `space_trainer/` implementation surface used by the Hugging Face Space runtime:
- `space_trainer/app.py`
- `space_trainer/README.md`
- `space_trainer/PRODUCTION.md`
- `space_trainer/.env.example`
- `space_trainer/requirements.txt`
- `space_trainer/configs/math_conjecture_sota.yaml`
- `space_trainer/scripts/preflight_check.py`
- `space_trainer/scripts/train_sota.py`
- `space_trainer/scripts/eval_sota.py`
- `space_trainer/tests/test_core_utils.py`
- Existing workspace runtime/run artifacts under `space_trainer/workspace/`
## Issues Found
1. The UI result-badge mapping classified `preflight passed` as neutral because underscores were converted to spaces before the class lookup, so the prettified label no longer matched the stored key.
2. Unit tests failed when run from the repository root due to import path assumptions (`ModuleNotFoundError: app`).
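
The badge issue can be illustrated with a minimal sketch. The mapping, class names, and helper names below are hypothetical and are not taken from `app.py`; the sketch only shows the general idea of classifying by a normalized key while prettifying the label separately.

```python
import re

# Hypothetical mapping from canonical result keys to CSS classes;
# the real keys and class names in app.py may differ.
BADGE_CLASSES = {
    "preflight_passed": "badge-ok",
    "completed": "badge-ok",
    "failed": "badge-fail",
    "cancelled": "badge-warn",
}
NEUTRAL_CLASS = "badge-neutral"


def normalize_result_key(result: str) -> str:
    """Collapse space, hyphen, and underscore variants into one snake_case key."""
    return re.sub(r"[\s\-_]+", "_", result.strip().lower()).strip("_")


def badge_class(result: str) -> str:
    """Look up the CSS class by normalized key, falling back to neutral."""
    return BADGE_CLASSES.get(normalize_result_key(result), NEUTRAL_CLASS)


def display_label(result: str) -> str:
    """Prettify only the label shown to the user; it is never used for lookup."""
    return normalize_result_key(result).replace("_", " ")
```

With this shape, `badge_class("preflight passed")` and `badge_class("preflight_passed")` resolve to the same class, and only `display_label` produces the spaced form.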
## Fixes Applied
1. `space_trainer/app.py`
- Normalized run result strings in `_run_result_badge_class()` to handle underscore/space/hyphen variants.
- Updated recent-runs badge rendering to classify by raw result key and only prettify the display label.
- Kept Gradio theme/css/head in `launch()` (Gradio 6.6 recommended path), and set queue configuration once at module load with `demo.queue(default_concurrency_limit=1)`.
2. `space_trainer/tests/test_core_utils.py`
- Added deterministic `sys.path` insertion for the `space_trainer/` root so tests pass from both:
  - repo root (`python -m unittest discover -s space_trainer/tests -v`)
  - `space_trainer/` directory (`python -m unittest discover -s tests -v`)
- Added regression test for preflight badge-class normalization.
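
A deterministic `sys.path` insertion of the kind described for the test module might look like the sketch below. The helper name `ensure_on_sys_path` is an assumption made for illustration; the actual code in `test_core_utils.py` is not shown in this log.

```python
import sys
from pathlib import Path


def ensure_on_sys_path(root: Path) -> str:
    """Insert `root` at the front of sys.path once and return the resolved path."""
    resolved = str(root.resolve())
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# In a test module at space_trainer/tests/test_core_utils.py, the package root
# is the parent of the tests directory, so a call such as
#     ensure_on_sys_path(Path(__file__).resolve().parent.parent)
# placed before `import app` makes the import independent of the working directory.
```

Resolving the path from `__file__` rather than from the current directory is what makes the result the same for both invocation styles.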
## Validation Commands and Results
1. Preflight checks:
- Command: `.venv/bin/python space_trainer/scripts/preflight_check.py --json`
- Result: PASS (`"ok": true`)
2. Unit tests from repo root:
- Command: `.venv/bin/python -m unittest discover -s space_trainer/tests -v`
- Result: PASS (`Ran 15 tests`, `OK`)
3. Unit tests from `space_trainer/`:
- Command: `../.venv/bin/python -m unittest discover -s tests -v`
- Result: PASS (`Ran 15 tests`, `OK`)
4. Python syntax compile check:
- Command: `../.venv/bin/python -m py_compile app.py scripts/preflight_check.py scripts/train_sota.py scripts/eval_sota.py tests/test_core_utils.py`
- Result: PASS
5. Gradio app object/config smoke check:
- Command: `../.venv/bin/python - <<'PY' ... app.demo.get_config_file() ... PY`
- Result: PASS (`mode=blocks`, `components=44`, `dependencies=3`, `queue_set=True`)
## Environment Notes
- CUDA warning appears in this environment (`cudaGetDeviceCount` OS unsupported). This is expected on non-GPU hosts and handled by app CPU fallback logic.
- Fast tokenizer fallback warning (`protobuf missing`) is already handled by project fallback code and validated by tests.
- Direct local `app.py` server launch in this sandbox cannot bind any Gradio ports (`Cannot find empty port...`). This is an execution-environment limitation, not a code-level validation failure.
## Current Status
- UI telemetry classification bug fixed.
- Test reliability improved.
- Preflight + tests + compile checks are passing.
- Space runtime code path is consistent and ready for deployment validation inside Hugging Face Spaces.
---
## Rewrite Session
Date (UTC): 2026-02-28 11:56:17 UTC
### Objective
- Rewrite `app.py` from scratch.
- Switch UI to a full monochrome theme.
- Preserve full end-to-end pipeline functionality in a newly structured implementation.
### Implementation Summary
- Replaced `space_trainer/app.py` entirely with a new architecture and new UI/CSS/HTML structure.
- Kept all major operational capabilities:
  - dataset download and cache handling
  - runtime config generation
  - staged training subprocess orchestration
  - optional post-training evaluation fallback path
  - quality gate + push status surfacing
  - continuous auto-restart with cooldown and circuit breaker
  - cancellation controls
  - run history persistence and recent-runs panel
- Kept compatibility for existing tests and tooling contracts (e.g., helper function names used by tests and preflight checks).
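
The continuous auto-restart capability listed above, with cooldown, circuit breaker, and cancellation, can be sketched generically as follows. This is not the implementation in `app.py`: the function name, parameters, default values, and return strings are all assumptions chosen for illustration.

```python
import time
from typing import Callable


def run_continuously(
    run_once: Callable[[], bool],
    should_stop: Callable[[], bool],
    cooldown_seconds: float = 30.0,
    max_consecutive_failures: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Re-run `run_once` until cancelled or until the circuit breaker trips.

    Returns "cancelled" or "circuit_open".
    """
    consecutive_failures = 0
    while not should_stop():
        succeeded = run_once()
        if succeeded:
            consecutive_failures = 0  # any success closes the breaker again
        else:
            consecutive_failures += 1
            if consecutive_failures >= max_consecutive_failures:
                return "circuit_open"  # stop restarting after repeated failures
        sleep(cooldown_seconds)  # cooldown between runs
    return "cancelled"
```

Injecting `sleep` and `should_stop` keeps the loop testable without real delays; a production version would likely also check for cancellation during the cooldown itself.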
### Monochrome Redesign
- New monochrome command-center visual language with grayscale-only palette.
- New telemetry card layout, stage timeline, recent-runs view, and loss sparkline styling.
- New hero header and runtime timestamp script in `UI_HEAD`.
### Verification Executed
1. Syntax check:
- `../.venv/bin/python -m py_compile app.py`
- Result: PASS
2. Preflight:
- `../.venv/bin/python scripts/preflight_check.py --json`
- Result: PASS (`"ok": true`)
3. Unit tests:
- `../.venv/bin/python -m unittest discover -s tests -v`
- Result: PASS (`Ran 15 tests`, `OK`)
4. Gradio config smoke check:
- `../.venv/bin/python - <<'PY' ... app.demo.get_config_file() ... PY`
- Result: PASS (`mode=blocks`, `components=44`, `dependencies=3`, `stage_count=4`)
---
## Footer + Continuous Enforcement Session
Date (UTC): 2026-02-28 12:45:36 UTC
### Requested Changes
- Remove default Gradio footer controls (`Use via API`, logo, settings) from footer area.
- Place API/settings access in a better UI location.
- Ensure training runs in continuous mode.
### Implementation
1. Footer controls removed from Gradio launch:
- Added `footer_links=[]` in `demo.launch(...)`.
2. API/settings moved into hero section:
- Added `.mono-link-row` with:
  - `/gradio_api/docs`
  - `https://huggingface.co/spaces/NorthernTribe-Research/math_trainer/settings`
- Added matching CSS styles for the new header links.
3. Continuous mode enforced:
- Runtime enforcement in `run_pipeline(...)`:
  - `continuous_mode = not bool(preflight_only)`
- UI control locked to enforced-on:
  - `Continuous Auto-Restart (Enforced)` with `interactive=False`.
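
The runtime enforcement rule can be isolated in a small helper, shown here as a sketch. The function name and the `requested_continuous` parameter are hypothetical; the sketch assumes the value coming from the locked UI control is not consulted, which matches the expression quoted above but is not confirmed beyond it.

```python
def resolve_continuous_mode(requested_continuous: object, preflight_only: object) -> bool:
    """Return the effective continuous-mode flag.

    The requested value is ignored on purpose: continuous mode is on for
    every real run and off only when the run is a preflight-only check.
    """
    del requested_continuous  # assumption: the UI value is not consulted
    return not bool(preflight_only)
```

For example, `resolve_continuous_mode(False, False)` still yields `True`, while any preflight-only run yields `False`.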
### Verification
- `../.venv/bin/python -m py_compile app.py` -> PASS
- `../.venv/bin/python scripts/preflight_check.py --json` -> PASS
- `../.venv/bin/python -m unittest discover -s tests -v` -> PASS (`Ran 15 tests`, `OK`)
### Deployment
- Space: `NorthernTribe-Research/math_trainer`
- Commit: `c8a24f966d710173764da0355e56632af9e66c40`
- Runtime after deploy: `RUNNING`
- `https://northerntribe-research-math-trainer.hf.space/config` -> `200` JSON
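
A post-deploy check like the `/config` probe above could be scripted roughly as follows. This is a sketch using only the standard library; the helper names are hypothetical, and the exact check that was run for this log is not recorded.

```python
import json
import urllib.request


def looks_like_json_config(status: int, body: bytes) -> bool:
    """True when the response is HTTP 200 and the body parses as a JSON object."""
    if status != 200:
        return False
    try:
        return isinstance(json.loads(body), dict)
    except ValueError:  # covers JSON decode errors and invalid byte sequences
        return False


def check_config_endpoint(url: str, timeout: float = 10.0) -> bool:
    """Fetch `url` and apply the check above. Performs a real network request."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return looks_like_json_config(response.status, response.read())
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so a caller would need to catch that to report a failed deployment cleanly.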