Standard generative reward models (GRMs) couple principle generation with response judging. IP-GRM decouples them into two stages:
- **Stage 2** `P(J, r | Q, P, R)` — judge the response under pre-defined principles
This ensures conditional independence `I(P; R | Q) = 0`, and enables **Principle Cache** — generating principles once per prompt and reusing them across all sampled responses in a GRPO group.
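Because principles depend only on the prompt, the Principle Cache can be sketched as a per-prompt memo table consulted before Stage 2 runs on each response in a GRPO group. The sketch below is illustrative only: `generate_principles` and `judge_response` are hypothetical stand-ins for the two stages, not the repository's actual API.

```python
# Minimal sketch of the Principle Cache idea, assuming a two-stage split:
# Stage 1 derives principles from the prompt Q alone (so I(P; R | Q) = 0),
# Stage 2 judges each response under those fixed principles.
calls = {"generate": 0}

def generate_principles(prompt: str) -> list[str]:
    """Stage 1 stand-in (hypothetical): principles from the prompt only."""
    calls["generate"] += 1
    return [f"principle derived from: {prompt}"]

_cache: dict[str, list[str]] = {}

def cached_principles(prompt: str) -> list[str]:
    # Generate principles once per prompt; later lookups hit the cache.
    if prompt not in _cache:
        _cache[prompt] = generate_principles(prompt)
    return _cache[prompt]

def judge_response(prompt: str, principles: list[str], response: str) -> float:
    """Stage 2 stand-in for P(J, r | Q, P, R): score one response."""
    return len(response) / 100.0  # placeholder reward, not a real judge

def grpo_group_rewards(prompt: str, responses: list[str]) -> list[float]:
    principles = cached_principles(prompt)  # shared by the whole group
    return [judge_response(prompt, principles, r) for r in responses]

rewards = grpo_group_rewards("Write a poem about rain.",
                             ["draft A", "draft B", "draft C"])
print(len(rewards), calls["generate"])  # three rewards, one principle generation
```

With a group size of G, the coupled baseline would regenerate principles G times per prompt; the cache reduces that to one generation, which is where the reported speedup comes from.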
## Results
- **WritingBench 87.6** / **CW-v3 77.8** with Qwen3-8B + IP-GRM (competitive with GPT-5.2 and Claude-Sonnet-4)
- **23.66% faster** reward computation than baseline GRM via Principle Cache