DevanshuDon committed
Commit 7d04ee3 · verified · 1 Parent(s): cb7bf3f

Update README.md

Files changed (1)
  1. README.md +5 -0
README.md CHANGED
@@ -113,6 +113,11 @@ All scores are deterministic and bounded to [0, 1].
  | **Scheduling correctness** | 0–1 | No double-booking, within working hours, appropriate duration (15min–2hrs), all participants included |
  | **Conflict resolution** | 0–1 | Recognizes conflicts, proposes 2–3 alternatives, explains professionally, prioritizes correctly |
 
+ **Architectural note on rubrics.** The reward is composed from independent scoring functions (one per dimension: email quality, scheduling correctness,
+ conflict resolution) plus four named penalty checks. Each function returns a value in [0, 1] (or a negative penalty) and is mixed by the task-specific
+ weighting shown in the Tasks table. This is structurally a composable rubric — any individual grader can be swapped, weighted differently, or audited in
+ isolation. We implemented it as plain Python rather than OpenEnv's `Rubric` class for hackathon speed, but the design pattern is the same.
+
  ### Anti-reward-hacking penalties
 
  - Short email (`< 20` words): **−0.30**
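
The composition described in the architectural note could look roughly like the sketch below. It is not the repository's code. The names `compose_reward`, `clamp01`, and `short_email_penalty`, the stand-in graders, and the 0.5/0.5 weights are all hypothetical; only the `< 20` words / −0.30 rule and the [0, 1] bound come from the README.

```python
from typing import Callable, Dict, List

Response = Dict[str, str]
Grader = Callable[[Response], float]   # expected to return a score in [0, 1]
Penalty = Callable[[Response], float]  # returns 0.0 or a negative value


def clamp01(x: float) -> float:
    """Bound a value to [0, 1]."""
    return max(0.0, min(1.0, x))


def short_email_penalty(response: Response) -> float:
    """README rule: an email under 20 words is penalized by 0.30."""
    word_count = len(response.get("email", "").split())
    return -0.30 if word_count < 20 else 0.0


def compose_reward(
    response: Response,
    graders: Dict[str, Grader],
    weights: Dict[str, float],
    penalties: List[Penalty],
) -> float:
    """Weighted sum of independent graders plus penalties, bounded to [0, 1]."""
    base = sum(
        weights[name] * clamp01(grader(response))
        for name, grader in graders.items()
    )
    penalty_total = sum(check(response) for check in penalties)
    return clamp01(base + penalty_total)


# Hypothetical stand-in graders; real ones would inspect the response.
graders: Dict[str, Grader] = {
    "email_quality": lambda r: 0.8,
    "scheduling_correctness": lambda r: 1.0,
}
# Assumed weights for illustration, not the values from the Tasks table.
weights = {"email_quality": 0.5, "scheduling_correctness": 0.5}

reward = compose_reward(
    {"email": "Meeting moved to 3pm."},
    graders,
    weights,
    [short_email_penalty],
)
```

Because each grader is an ordinary callable keyed by name, replacing one entry in `graders` or changing one value in `weights` leaves the others untouched, which is the swap-and-audit property the note refers to.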