Update README.md

README.md CHANGED
@@ -242,14 +242,6 @@ We construct a calibrated dataset of **80,856** labeled *(document, summary)* pa
 - Reward modeling for PPO/GRPO-style fine-tuning and iterative data curation
 
 ---
 
-## Limitations
-
-- **Context truncation** may omit relevant evidence for long documents.
-- **Domain shift** can reduce reliability (especially relevance/content-centrality).
-- Scores are **proxy estimates** of human judgment; validate in downstream loops to mitigate reward hacking.
-
----
-
 ## Reproducibility & version pinning
 
 Pin a specific revision/commit when reproducing paper results:
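The pinning step kept by this change can be sketched as follows, assuming the artifacts live on the Hugging Face Hub. The repo ids and the commit hash below are placeholders, not the project's real identifiers:

```python
# Sketch: pin an exact Hub revision so paper results are reproducible.
# NOTE: "org/summary-reward-data", "org/summary-reward-model", and the
# hash "abc1234" are placeholders -- substitute the real repo ids/commit.
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification

# Pin the labeled (document, summary) dataset to a specific commit.
ds = load_dataset("org/summary-reward-data", revision="abc1234")

# Pin the reward model the same way; `revision` accepts a branch,
# tag, or full commit hash.
model = AutoModelForSequenceClassification.from_pretrained(
    "org/summary-reward-model", revision="abc1234"
)
```

Pinning by full commit hash (rather than a branch name) guarantees the same files are fetched even if the repo is later updated.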