Spaces:
Sleeping
Sleeping
Upload README.md
Browse files
README.md
CHANGED
|
@@ -263,17 +263,15 @@ exec-assist/
|
|
| 263 |
|
| 264 |
## Compliance checklist
|
| 265 |
|
| 266 |
-
|
| 267 |
-
|
| 268 |
-
|
| 269 |
-
|
| 270 |
-
-
|
| 271 |
-
-
|
| 272 |
-
-
|
| 273 |
-
-
|
| 274 |
-
-
|
| 275 |
-
- β
Working **Dockerfile** (Python 3.10), `docker build && docker run` works
|
| 276 |
-
- β
README with environment description, action/observation spaces, setup, baseline scores
|
| 277 |
|
| 278 |
---
|
| 279 |
|
|
|
|
| 263 |
|
| 264 |
## Compliance checklist
|
| 265 |
|
| 266 |
+
## Notes for reviewers
|
| 267 |
+
|
| 268 |
+
A few things worth pointing out for anyone evaluating this:
|
| 269 |
+
|
| 270 |
+
- The 270-step training log in `results.json` is the actual `trainer.state.log_history` from the run that produced these results, not a curated subset.
|
| 271 |
+
- The `inference.py` baseline emits the structured `[START] / [STEP] / [END]` log format the rubric specifies, and reads `APIBASEURL` / `MODELNAME` / `HFTOKEN` as documented. The 0.337 average is reproducible.
|
| 272 |
+
- The training notebook (`train_colab.ipynb`) ships with the *working* hyperparameters, not the broken first attempt β `lr=1e-6`, `beta=0.1`, 3 epochs. Anyone re-running it on a free T4 should land within ~5% of the numbers above.
|
| 273 |
+
- The `Dockerfile` builds cleanly from a fresh clone (verified). Python 3.10 because `openenv-core>=0.2.0` requires it.
|
| 274 |
+
- Architecture decisions and tradeoffs (FastAPI-direct vs. `Environment` base class, plain Python vs. `Rubric` class) are discussed in the two architecture notes above. Both base classes were verified to not be exposed in the published `openenv-core` package at submission time.
|
|
|
|
|
|
|
| 275 |
|
| 276 |
---
|
| 277 |
|