DevanshuDon committed on
Commit 269af96 · verified · 1 parent: 3786220

Upload 3 files

Files changed (1)
  1. README.md +25 -4
README.md CHANGED
@@ -122,7 +122,18 @@ All scores are deterministic and bounded to [0, 1]. Scenarios are randomized at
 
  These were added because GRPO will find shortcuts. During training the model briefly collapsed to a single short safe response — the penalties + KL regularization fixed it cleanly.
 
- **Architectural note on rubrics.** The reward is composed from independent scoring functions (one per dimension: email quality, scheduling correctness, conflict resolution) plus four named penalty checks. Each function returns a value in [0, 1] (or a negative penalty) and is mixed by the task-specific weighting shown in the Tasks table. This is structurally a composable rubric — any individual grader can be swapped, reweighted, or audited in isolation. We implemented it as plain Python rather than OpenEnv's `Rubric` class for hackathon speed, but the design pattern (independent, composable, auditable) is the same.
+ **Architectural note on rubrics.** The reward is composed from independent
+ scoring functions (one per dimension: email quality, scheduling correctness,
+ conflict resolution) plus four named penalty checks. Each function returns
+ a value in [0, 1] (or a negative penalty) and is mixed by the task-specific
+ weighting shown in the Tasks table. Structurally this is a composable rubric
+ — any individual grader can be swapped, reweighted, or audited in isolation.
+
+ We verified at submission time that `from openenv import Rubric` raises
+ `ImportError` against the published `openenv-core` package, so direct subclassing
+ of an OpenEnv `Rubric` base class was not available. The plain-Python
+ implementation produces the same composable, auditable behavior at the
+ function level.
 
  ---
 
@@ -212,9 +223,19 @@ GRPOConfig(
 
  ## Architecture note
 
- The environment is implemented as a FastAPI application that exposes the OpenEnv-spec endpoints (`/reset`, `/step`, `/state`, `/tasks`, `/health`, `/metadata`, `/schema`) directly, rather than extending `openenv.Environment` as a Python class. Both implementations are spec-compliant — they expose the same JSON-over-HTTP interface — but the FastAPI-direct approach gave us finer control over the multi-component reward function and Pydantic validation during the time-boxed hackathon build.
-
- The client (`client.py`) does extend `openenv.EnvClient` and provides the standard Gym-style typed interface, so any code that uses an `EnvClient` to talk to this Space will work without modification. Client/server separation is preserved the client only imports models, never server internals.
+ The environment is implemented as a FastAPI application that exposes the
+ OpenEnv-spec endpoints (`/reset`, `/step`, `/state`, `/tasks`, `/health`,
+ `/metadata`, `/schema`) directly. We verified at submission time that
+ `from openenv import Environment` raises `ImportError` against the published
+ `openenv-core` package, so direct subclassing of an OpenEnv `Environment`
+ base class was not available. FastAPI gave us complete control over the
+ JSON-over-HTTP interface, which is what the spec actually requires.
+
+ The client (`client.py`) does extend `openenv.EnvClient` (which IS exposed
+ in the published package) and provides the standard Gym-style typed interface,
+ so any code that uses an `EnvClient` to talk to this Space will work without
+ modification. Client/server separation is preserved — the client only imports
+ typed models, never server internals.
 
  ---
 
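The composable-rubric pattern in the README's rubric note can be sketched in plain Python as follows. This is a minimal illustration, not the project's reward code. The class name `ComposedReward`, the episode dictionary, the example weights, and the `too_short` penalty are all hypothetical, because the README does not show the real grader signatures, the Tasks table weights, or the names of the four penalty checks. Each grader is checked to stay in [0, 1] and each penalty to stay non-positive, so a misbehaving component fails where it is defined.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Mapping

# Hypothetical episode record: whatever the environment logs for one rollout.
Episode = Mapping[str, Any]
# A grader scores one dimension (e.g. email quality) in [0, 1].
Grader = Callable[[Episode], float]
# A penalty check returns 0.0 when it does not fire, a negative value otherwise.
Penalty = Callable[[Episode], float]


@dataclass(frozen=True)
class ComposedReward:
    """Weighted sum of independent graders plus additive penalties (sketch)."""

    graders: Mapping[str, Grader]
    weights: Mapping[str, float]  # task-specific weighting, one entry per grader
    penalties: Mapping[str, Penalty]

    def __post_init__(self) -> None:
        overlap = set(self.graders) & set(self.penalties)
        if overlap:
            raise ValueError(f"names used for both graders and penalties: {overlap}")
        missing = set(self.graders) - set(self.weights)
        if missing:
            raise ValueError(f"no weight for graders: {missing}")

    def breakdown(self, episode: Episode) -> Dict[str, float]:
        """Run every component in isolation and return its raw value for auditing."""
        parts: Dict[str, float] = {}
        for name, grader in self.graders.items():
            score = float(grader(episode))
            if not 0.0 <= score <= 1.0:
                raise ValueError(f"grader {name!r} returned {score}, expected [0, 1]")
            parts[name] = score
        for name, check in self.penalties.items():
            value = float(check(episode))
            if value > 0.0:
                raise ValueError(f"penalty {name!r} returned {value}, expected <= 0")
            parts[name] = value
        return parts

    def __call__(self, episode: Episode) -> float:
        parts = self.breakdown(episode)
        weighted = sum(self.weights[name] * parts[name] for name in self.graders)
        penalty = sum(parts[name] for name in self.penalties)
        return weighted + penalty
```

Because `ComposedReward` is a frozen dataclass, reweighting a task is a one-liner with `dataclasses.replace(reward, weights=...)`, and swapping a grader means passing a different callable under the same name. This sketch simply adds the penalties to the weighted sum. The README does not say how the real environment combines them beyond the per-task weighting.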
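Both hunks rest on a submission-time check that `from openenv import Rubric` and `from openenv import Environment` raise `ImportError`. That check is worth re-running whenever `openenv-core` is upgraded. The helper below is a small sketch that uses only `importlib`. The name `can_import` is hypothetical. It mirrors how `from package import name` behaves: it first looks for an attribute and then falls back to importing a submodule with that name.

```python
import importlib


def can_import(module_name: str, name: str) -> bool:
    """Return True if `from module_name import name` would succeed (sketch)."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package itself is not installed (ModuleNotFoundError subclasses ImportError).
        return False
    if hasattr(module, name):
        return True
    # `from pkg import name` also tries to import the submodule `pkg.name`.
    try:
        importlib.import_module(f"{module_name}.{name}")
    except ImportError:
        return False
    return True
```

For example, `{name: can_import("openenv", name) for name in ("Rubric", "Environment", "EnvClient")}` reports the current state of all three names the README mentions. The result depends on whichever `openenv-core` version is installed.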
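The architecture note argues that the OpenEnv contract is the JSON-over-HTTP interface, so any server that answers the listed endpoints with the expected JSON meets it. The sketch below illustrates that idea. To stay dependency-free and runnable without FastAPI, it uses the standard library's `http.server` and covers only `/health`, `/reset`, `/step`, and `/state`. `ToyEnv`, the `{"action": ...}` request body, and the response fields are hypothetical placeholders, not the OpenEnv schema, which this README does not reproduce.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Any, Dict


class ToyEnv:
    """Hypothetical stand-in for the real episode logic (single shared session, no locking)."""

    def __init__(self, max_steps: int = 3) -> None:
        self.max_steps = max_steps
        self.steps = 0

    def reset(self) -> Dict[str, Any]:
        self.steps = 0
        return {"observation": "start", "step": 0}

    def step(self, action: str) -> Dict[str, Any]:
        self.steps += 1
        return {"observation": f"echo:{action}", "reward": 0.0,
                "done": self.steps >= self.max_steps}

    def state(self) -> Dict[str, Any]:
        return {"step": self.steps, "done": self.steps >= self.max_steps}


def make_handler(env: ToyEnv) -> type:
    class Handler(BaseHTTPRequestHandler):
        def _send_json(self, status: int, payload: Dict[str, Any]) -> None:
            body = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def do_GET(self) -> None:
            if self.path == "/health":
                self._send_json(200, {"status": "ok"})
            elif self.path == "/state":
                self._send_json(200, env.state())
            else:
                self._send_json(404, {"error": f"unknown path {self.path}"})

        def do_POST(self) -> None:
            length = int(self.headers.get("Content-Length") or 0)
            try:
                payload = json.loads(self.rfile.read(length) or b"{}")
            except ValueError:  # covers malformed JSON and undecodable bytes
                self._send_json(400, {"error": "body must be JSON"})
                return
            if self.path == "/reset":
                self._send_json(200, env.reset())
            elif self.path == "/step":
                # Manual validation; FastAPI + Pydantic would do this declaratively.
                if not isinstance(payload, dict) or not isinstance(payload.get("action"), str):
                    self._send_json(422, {"error": "expected {\"action\": <string>}"})
                else:
                    self._send_json(200, env.step(payload["action"]))
            else:
                self._send_json(404, {"error": f"unknown path {self.path}"})

        def log_message(self, fmt: str, *args: Any) -> None:
            pass  # silence the default per-request stderr logging

    return Handler


def serve_in_background(env: ToyEnv, port: int = 0) -> ThreadingHTTPServer:
    """Serve on 127.0.0.1 from a daemon thread; port=0 lets the OS pick a free port."""
    server = ThreadingHTTPServer(("127.0.0.1", port), make_handler(env))
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

After `server = serve_in_background(ToyEnv())`, `server.server_address[1]` gives the chosen port, and `server.shutdown()` stops the loop. In a FastAPI service, Pydantic request models would replace the hand-written body checks. FastAPI also answers invalid request bodies with HTTP 422 by default.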
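The client/server separation in the architecture note can be illustrated with a typed client that depends only on its own data models and a transport function. This sketch does not use `openenv.EnvClient`, because the README does not show that class's constructor or method signatures. `Observation`, `StepResult`, `TypedEnvClient`, `http_transport`, and the JSON field names are hypothetical.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Any, Callable, Dict

# A transport sends a JSON payload to a path and returns the decoded JSON reply.
Transport = Callable[[str, Dict[str, Any]], Dict[str, Any]]


@dataclass(frozen=True)
class Observation:
    text: str


@dataclass(frozen=True)
class StepResult:
    observation: Observation
    reward: float
    done: bool


def http_transport(base_url: str) -> Transport:
    """POST JSON to base_url + path using only the standard library."""
    def send(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        request = urllib.request.Request(
            base_url.rstrip("/") + path,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return json.loads(response.read())
    return send


class TypedEnvClient:
    """Gym-style reset/step that returns typed models instead of raw dicts."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport

    def reset(self) -> Observation:
        data = self._transport("/reset", {})
        return Observation(text=str(data["observation"]))

    def step(self, action: str) -> StepResult:
        data = self._transport("/step", {"action": action})
        return StepResult(
            observation=Observation(text=str(data["observation"])),
            reward=float(data["reward"]),
            done=bool(data["done"]),
        )
```

Injecting the transport keeps the client testable without a running server. A fake transport that returns canned dictionaries exercises the same parsing path that `http_transport(base_url)` would use against a live Space.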