Validation data NOTICE
======================

The file `val_200.jsonl` in this directory contains 200 publicly posted Reddit comments, included as a small held-out evaluation set with per-token `r_true` labels so that reviewers can reproduce paper §5 metrics without rerunning training.

Copyright and licensing
-----------------------

- The comment text remains the intellectual property of the original Reddit authors. It is included here under a research / fair-use rationale, solely to enable reproduction of published evaluation numbers.
- The `r_true` annotations, the schema, and the file packaging are released under Apache-2.0 (see ../LICENSE).
- This sample is NOT a license to redistribute the underlying Reddit content for any other purpose.

Removal requests
----------------

If you are the author of one of these comments and would like it removed from the distribution, contact the corresponding author listed in `../paper.pdf`. Removals will be honored in the next release.

Reproducing the full corpus
---------------------------

The 1M-sample training corpus is not redistributed here. See `DATA.md` for the schema and the steps required to reconstruct it from the public Reddit API.