Upload 3 files
README.md
CHANGED
@@ -122,7 +122,18 @@ All scores are deterministic and bounded to [0, 1]. Scenarios are randomized at
 
 These were added because GRPO will find shortcuts. During training the model briefly collapsed to a single short safe response — the penalties + KL regularization fixed it cleanly.
 
-**Architectural note on rubrics.** The reward is composed from independent
+**Architectural note on rubrics.** The reward is composed from independent
+scoring functions (one per dimension: email quality, scheduling correctness,
+conflict resolution) plus four named penalty checks. Each function returns
+a value in [0, 1] (or a negative penalty) and is mixed by the task-specific
+weighting shown in the Tasks table. Structurally this is a composable rubric
+— any individual grader can be swapped, reweighted, or audited in isolation.
+
+We verified at submission time that `from openenv import Rubric` raises
+`ImportError` against the published `openenv-core` package, so direct subclassing
+of an OpenEnv `Rubric` base class was not available. The plain-Python
+implementation produces the same composable, auditable behavior at the
+function level.
 
 ---
 
@@ -212,9 +223,19 @@ GRPOConfig(
 
 ## Architecture note
 
-The environment is implemented as a FastAPI application that exposes the
-
-
+The environment is implemented as a FastAPI application that exposes the
+OpenEnv-spec endpoints (`/reset`, `/step`, `/state`, `/tasks`, `/health`,
+`/metadata`, `/schema`) directly. We verified at submission time that
+`from openenv import Environment` raises `ImportError` against the published
+`openenv-core` package, so direct subclassing of an OpenEnv `Environment`
+base class was not available. FastAPI gave us complete control over the
+JSON-over-HTTP interface, which is what the spec actually requires.
+
+The client (`client.py`) does extend `openenv.EnvClient` (which IS exposed
+in the published package) and provides the standard Gym-style typed interface,
+so any code that uses an `EnvClient` to talk to this Space will work without
+modification. Client/server separation is preserved — the client only imports
+typed models, never server internals.
 
 ---
 
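The composable rubric described in the first hunk could look roughly like the minimal sketch below. Every name in it (`compose_reward`, the toy graders, the weights, the penalty value) is hypothetical and chosen for illustration. The README states only that per-dimension functions return values in [0, 1], that penalties are negative, and that the mix is task-specific. Clamping the final reward to [0, 1] is an assumption based on the statement that all scores are bounded to [0, 1].

```python
from typing import Callable, Dict, List

# Hypothetical signatures: a grader returns a score in [0, 1],
# a penalty check returns 0.0 or a negative value.
Grader = Callable[[str], float]
Penalty = Callable[[str], float]


def email_quality(response: str) -> float:
    """Toy grader (illustrative only): reward responses containing a greeting."""
    return 1.0 if "hello" in response.lower() else 0.0


def scheduling_correctness(response: str) -> float:
    """Toy grader (illustrative only): reward responses naming a time slot."""
    return 1.0 if "10:00" in response else 0.0


def too_short_penalty(response: str) -> float:
    """Toy penalty (illustrative only): discourage very short responses."""
    return -0.5 if len(response) < 20 else 0.0


def compose_reward(
    response: str,
    graders: Dict[str, Grader],
    weights: Dict[str, float],
    penalties: List[Penalty],
) -> float:
    """Mix independent graders by weight, add penalties, clamp to [0, 1]."""
    weighted = sum(weights[name] * fn(response) for name, fn in graders.items())
    total = weighted + sum(check(response) for check in penalties)
    # Assumption: the final reward is clamped so it stays within [0, 1].
    return max(0.0, min(1.0, total))


# Hypothetical task configuration; the real weights live in the Tasks table.
GRADERS: Dict[str, Grader] = {
    "email_quality": email_quality,
    "scheduling_correctness": scheduling_correctness,
}
WEIGHTS: Dict[str, float] = {"email_quality": 0.5, "scheduling_correctness": 0.5}
PENALTIES: List[Penalty] = [too_short_penalty]
```

Because each grader is a plain function, one can be swapped or reweighted by editing a single dictionary entry, and each can be unit-tested without the others.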