Update README.md

README.md


All scores are deterministic and bounded to [0, 1].

| **Scheduling correctness** | 0–1 | No double-booking, within working hours, appropriate duration (15min–2hrs), all participants included |
| **Conflict resolution** | 0–1 | Recognizes conflicts, proposes 2–3 alternatives, explains professionally, prioritizes correctly |
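
The scheduling-correctness criteria in the table above could be checked deterministically along the lines of the minimal Python sketch below. It is an illustration only, and several details are assumptions that the README does not specify:

- the `Meeting` type;
- the 9:00 to 17:00 working-hours window;
- reading "double-booking" as any required attendee sharing an overlapping meeting;
- scoring as the fraction of the four checks that pass, all equally weighted.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumption: working hours are not given in the README; 9:00-17:00 is hypothetical.
WORK_START_HOUR = 9
WORK_END_HOUR = 17

# Duration bounds from the README: 15 minutes to 2 hours.
MIN_DURATION = timedelta(minutes=15)
MAX_DURATION = timedelta(hours=2)


@dataclass(frozen=True)
class Meeting:
    """Hypothetical meeting record used only for this sketch."""
    start: datetime
    end: datetime
    participants: frozenset


def overlaps(a: Meeting, b: Meeting) -> bool:
    # Half-open intervals: back-to-back meetings do not overlap.
    return a.start < b.end and b.start < a.end


def scheduling_correctness(proposed: Meeting, calendar: list, required: set) -> float:
    """Fraction of the four README criteria satisfied, always in [0, 1]."""
    # 1. No double-booking: no shared participant in an overlapping meeting.
    no_double_booking = not any(
        overlaps(proposed, m) and (proposed.participants & m.participants)
        for m in calendar
    )
    # 2. Within working hours on the meeting's start day.
    day_start = proposed.start.replace(hour=WORK_START_HOUR, minute=0, second=0, microsecond=0)
    day_end = proposed.start.replace(hour=WORK_END_HOUR, minute=0, second=0, microsecond=0)
    within_hours = day_start <= proposed.start and proposed.end <= day_end
    # 3. Appropriate duration.
    good_duration = MIN_DURATION <= proposed.end - proposed.start <= MAX_DURATION
    # 4. All required participants included.
    all_included = required <= proposed.participants

    checks = [no_double_booking, within_hours, good_duration, all_included]
    return sum(checks) / len(checks)


day = datetime(2024, 5, 6)
existing = [Meeting(day.replace(hour=10), day.replace(hour=11), frozenset({"ana", "raj"}))]

ok = Meeting(day.replace(hour=11), day.replace(hour=11, minute=30), frozenset({"ana", "raj", "li"}))
print(scheduling_correctness(ok, existing, {"ana", "raj", "li"}))  # 1.0

# Overlaps ana's 10:00 meeting, runs 2.5 hours, and omits raj: only the hours check passes.
clash = Meeting(day.replace(hour=10, minute=30), day.replace(hour=13), frozenset({"ana"}))
print(scheduling_correctness(clash, existing, {"ana", "raj"}))  # 0.25
```

Because every check is a pure boolean over the inputs, the score is deterministic and cannot leave [0, 1]. That is consistent with the bound stated above.
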

**Architectural note on rubrics.** The reward is composed of independent scoring functions plus four named penalty checks. There is one scoring function per dimension: email quality, scheduling correctness, and conflict resolution. Each scoring function returns a value in [0, 1], and each penalty check contributes a negative penalty. The scores are mixed by the task-specific weighting shown in the Tasks table. Structurally this is a composable rubric: any individual grader can be swapped, reweighted, or audited in isolation. We implemented it in plain Python rather than with OpenEnv's `Rubric` class for hackathon speed, but the design pattern is the same.

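
The composition described in the architectural note can be sketched in plain Python. This is a minimal illustration, not the project's code. The following are assumptions:

- the `Rubric` dataclass is hypothetical and is not OpenEnv's `Rubric` class;
- episodes are modeled as plain dicts;
- the weights and stand-in graders are made up;
- penalties are added unweighted after mixing;
- the final sum is clamped to [0, 1].

Only the short-email penalty's threshold and value come from this README.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Mapping

# Hypothetical episode representation: the README does not show the
# environment's observation/action types, so a plain mapping stands in.
Episode = Mapping[str, object]
Scorer = Callable[[Episode], float]   # one rubric dimension, returns [0, 1]
Penalty = Callable[[Episode], float]  # returns 0.0 or a negative amount


def short_email_penalty(ep: Episode) -> float:
    """Short email (< 20 words) costs 0.30, per the README's penalty list."""
    return -0.30 if len(str(ep.get("email", "")).split()) < 20 else 0.0


@dataclass
class Rubric:
    """Hypothetical composable rubric; not OpenEnv's `Rubric` class."""
    weights: Dict[str, float]   # task-specific weight per dimension
    scorers: Dict[str, Scorer]  # independent graders, keyed by dimension
    penalties: List[Penalty] = field(default_factory=list)

    def breakdown(self, ep: Episode) -> Dict[str, float]:
        # Run each grader in isolation so it can be audited on its own.
        return {name: fn(ep) for name, fn in self.scorers.items()}

    def __call__(self, ep: Episode) -> float:
        parts = self.breakdown(ep)
        # Weighted mix of dimension scores; dimensions without a weight count 0.
        total = sum(self.weights.get(name, 0.0) * s for name, s in parts.items())
        # Assumption: penalties are added after mixing, without weighting.
        total += sum(p(ep) for p in self.penalties)
        # Assumption: clamp so the final reward stays in [0, 1].
        return max(0.0, min(1.0, total))


# Usage with made-up weights and stand-in graders.
rubric = Rubric(
    weights={"email_quality": 0.4, "scheduling_correctness": 0.6},
    scorers={
        "email_quality": lambda ep: 0.8,
        "scheduling_correctness": lambda ep: 1.0,
    },
    penalties=[short_email_penalty],
)
episode = {"email": "Confirmed for 11:00."}  # 3 words, so the penalty fires
print(rubric.breakdown(episode))  # {'email_quality': 0.8, 'scheduling_correctness': 1.0}
print(round(rubric(episode), 2))  # 0.62
```

Each grader is just an entry in a dict, so swapping one (`rubric.scorers["email_quality"] = new_grader`) or reweighting a task touches nothing else. `breakdown()` exposes every dimension's score for auditing.
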
### Anti-reward-hacking penalties
- Short email (`< 20` words): **−0.30**