ShadeCloak committed on
Commit d229c6f (verified) · 1 Parent(s): d4c4733

Update README.md

Files changed (1)
  1. README.md +0 -6
README.md CHANGED
@@ -24,9 +24,3 @@ Standard generative reward models (GRMs) couple principle generation with respon
  - **Stage 2** `P(J, r | Q, P, R)` — judge the response under pre-defined principles
 
  This ensures conditional independence `I(P; R | Q) = 0`, and enables **Principle Cache** — generating principles once per prompt and reusing them across all sampled responses in a GRPO group.
-
- ## Results
-
- - **WritingBench 87.6** / **CW-v3 77.8** with Qwen3-8B + IP-GRM (competitive with GPT-5.2 and Claude-Sonnet-4)
- - **23.66% faster** reward computation than baseline GRM via Principle Cache
-
 
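For readers of this diff, the Principle Cache idea in the retained README lines can be illustrated with a minimal sketch. It assumes Stage 1 is a function of the prompt alone and Stage 2 scores one response given the prompt and the principles. The names `PrincipleCache`, `score_group`, `stub_generate`, and `stub_judge` are hypothetical and do not come from the repository; the stubs stand in for real model calls.

```python
from typing import Callable, Dict, List


class PrincipleCache:
    """Hypothetical helper: stores Stage 1 output (principles) keyed by prompt."""

    def __init__(self, generate_principles: Callable[[str], str]) -> None:
        self._generate = generate_principles
        self._cache: Dict[str, str] = {}
        self.misses = 0  # number of times Stage 1 was actually invoked

    def get(self, prompt: str) -> str:
        # Principles depend only on the prompt Q, never on a response R,
        # so one generation can be reused for every response in the group.
        if prompt not in self._cache:
            self._cache[prompt] = self._generate(prompt)
            self.misses += 1
        return self._cache[prompt]


def score_group(
    prompt: str,
    responses: List[str],
    cache: PrincipleCache,
    judge: Callable[[str, str, str], float],
) -> List[float]:
    """Stage 2: judge each sampled response under the same cached principles."""
    principles = cache.get(prompt)
    return [judge(prompt, principles, response) for response in responses]


# Stub stages (assumptions for illustration only).
def stub_generate(prompt: str) -> str:
    return f"principles for: {prompt}"


def stub_judge(prompt: str, principles: str, response: str) -> float:
    return float(len(response))


cache = PrincipleCache(stub_generate)
rewards = score_group("Write a haiku", ["a", "bb", "ccc"], cache, stub_judge)
rewards_again = score_group("Write a haiku", ["dddd"], cache, stub_judge)
```

In this sketch, Stage 1 runs once for the prompt even though four responses are scored across two calls, which is the source of the saving the README attributes to the cache.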