Space: DevanshuDon/exec-assist

DevanshuDon committed 25 days ago

Commit 269af96 (verified) · 1 parent: 3786220

Upload 3 files

Files changed (1)

README.md +25 -4

@@ -122,7 +122,18 @@ All scores are deterministic and bounded to [0, 1]. Scenarios are randomized at
 These were added because GRPO will find shortcuts. During training the model briefly collapsed to a single short safe response — the penalties + KL regularization fixed it cleanly.
-**Architectural note on rubrics.** The reward is composed from independent scoring functions (one per dimension: email quality, scheduling correctness, conflict resolution) plus four named penalty checks. Each function returns a value in [0, 1] (or a negative penalty) and is mixed by the task-specific weighting shown in the Tasks table. This is structurally a composable rubric — any individual grader can be swapped, reweighted, or audited in isolation. We implemented it as plain Python rather than OpenEnv's `Rubric` class for hackathon speed, but the design pattern (independent, composable, auditable) is the same.
 ---
@@ -212,9 +223,19 @@ GRPOConfig(
 ## Architecture note
-The environment is implemented as a FastAPI application that exposes the OpenEnv-spec endpoints (`/reset`, `/step`, `/state`, `/tasks`, `/health`, `/metadata`, `/schema`) directly, rather than extending `openenv.Environment` as a Python class. Both implementations are spec-compliant — they expose the same JSON-over-HTTP interface — but the FastAPI-direct approach gave us finer control over the multi-component reward function and Pydantic validation during the time-boxed hackathon build.
-The client (`client.py`) does extend `openenv.EnvClient` and provides the standard Gym-style typed interface, so any code that uses an `EnvClient` to talk to this Space will work without modification. Client/server separation is preserved — the client only imports models, never server internals.
 ---

 These were added because GRPO will find shortcuts. During training the model briefly collapsed to a single short safe response — the penalties + KL regularization fixed it cleanly.
+**Architectural note on rubrics.** The reward is composed from independent
+scoring functions (one per dimension: email quality, scheduling correctness,
+conflict resolution) plus four named penalty checks. Each function returns
+a value in [0, 1] (or a negative penalty) and is mixed by the task-specific
+weighting shown in the Tasks table. Structurally this is a composable rubric
+— any individual grader can be swapped, reweighted, or audited in isolation.
+We verified at submission time that `from openenv import Rubric` raises
+`ImportError` against the published `openenv-core` package, so direct subclassing
+of an OpenEnv `Rubric` base class was not available. The plain-Python
+implementation produces the same composable, auditable behavior at the
+function level.
 ---
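The composition described in the rubric note above can be sketched roughly as follows. This is a minimal illustration, not the Space's actual code: the class `ComposedReward`, the toy graders, the penalty function, and the weights are all hypothetical stand-ins. Only the overall shape (independent graders scored in [0, 1], negative penalty checks, task-specific weights, a final score bounded to [0, 1]) is taken from the README text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical type aliases: a grader maps a response to [0, 1],
# a penalty check maps a response to a value <= 0.
Grader = Callable[[str], float]
Penalty = Callable[[str], float]


def clamp01(x: float) -> float:
    """Bound a value to the interval [0, 1]."""
    return max(0.0, min(1.0, x))


@dataclass
class ComposedReward:
    """Weighted mix of independent graders plus additive penalties (illustrative)."""
    graders: Dict[str, Grader]
    weights: Dict[str, float]
    penalties: List[Penalty]

    def score(self, response: str) -> float:
        total_weight = sum(self.weights[name] for name in self.graders)
        if total_weight <= 0:
            return 0.0
        # Each grader is evaluated in isolation, so any one can be swapped or audited.
        weighted = sum(
            self.weights[name] * clamp01(grader(response))
            for name, grader in self.graders.items()
        )
        # Penalties only ever subtract; positive return values are ignored.
        penalty = sum(min(0.0, check(response)) for check in self.penalties)
        return clamp01(weighted / total_weight + penalty)


# Toy graders and a toy penalty, assumed purely for demonstration.
def email_quality(response: str) -> float:
    return 1.0 if "regards" in response.lower() else 0.5


def scheduling_correctness(response: str) -> float:
    return 1.0 if "10:00" in response else 0.0


def too_short_penalty(response: str) -> float:
    return -0.3 if len(response.split()) < 5 else 0.0


reward = ComposedReward(
    graders={"email": email_quality, "scheduling": scheduling_correctness},
    weights={"email": 0.5, "scheduling": 0.5},
    penalties=[too_short_penalty],
)

good = reward.score("Meeting moved to 10:00 tomorrow. Regards, Sam")
short = reward.score("ok")
```

Because each grader is a plain function keyed by name, reweighting a dimension means editing one entry in `weights`, and auditing a dimension means calling its function directly on a response.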
 ## Architecture note
+The environment is implemented as a FastAPI application that exposes the
+OpenEnv-spec endpoints (`/reset`, `/step`, `/state`, `/tasks`, `/health`,
+`/metadata`, `/schema`) directly. We verified at submission time that
+`from openenv import Environment` raises `ImportError` against the published
+`openenv-core` package, so direct subclassing of an OpenEnv `Environment`
+base class was not available. FastAPI gave us complete control over the
+JSON-over-HTTP interface, which is what the spec actually requires.
+The client (`client.py`) does extend `openenv.EnvClient` (which IS exposed
+in the published package) and provides the standard Gym-style typed interface,
+so any code that uses an `EnvClient` to talk to this Space will work without
+modification. Client/server separation is preserved — the client only imports
+typed models, never server internals.
 ---
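The import checks mentioned in both notes could be reproduced with a small probe like the one below. It is a sketch under stated assumptions: the helper `can_import` is hypothetical and not part of the repository, and the result for each `openenv` name depends on which version of `openenv-core` is installed locally, so no output is claimed here.

```python
import importlib


def can_import(module_name: str, attr_name: str) -> bool:
    """Return True if `from module_name import attr_name` would be expected to succeed."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package itself is missing or fails to import.
        return False
    if hasattr(module, attr_name):
        return True
    # `from pkg import name` also succeeds when `name` is an importable submodule.
    try:
        importlib.import_module(f"{module_name}.{attr_name}")
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # Names taken from the README text; results vary with the installed package.
    for name in ("Environment", "Rubric", "EnvClient"):
        print(f"from openenv import {name}: {can_import('openenv', name)}")
```

Running the probe against a pinned package version and recording that version next to the result makes the "verified at submission time" claim reproducible later.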