Validation data NOTICE
======================

The file `val_200.jsonl` in this directory contains 200 publicly posted Reddit
comments with per-token `r_true` labels. It is included as a small held-out
evaluation set so that reviewers can reproduce the metrics in §5 of the paper
without rerunning training.

Copyright and licensing
-----------------------
- The comment text remains the intellectual property of the original Reddit
  authors. It is included here on a research / fair-use basis, solely to
  enable reproduction of the published evaluation numbers.
- The `r_true` annotations, the schema, and the file packaging are released
  under Apache-2.0 (see ../LICENSE).
- This sample is NOT a license to redistribute the underlying Reddit content
  for any other purpose.

Removal requests
----------------
If you are the author of one of these comments and would like it removed
from the distribution, contact the corresponding author listed in
`../paper.pdf`. Removals will be honored in the next release.

Reproducing the full corpus
---------------------------
The 1M-sample training corpus is not redistributed here. See `DATA.md` for the
schema and the steps required to reconstruct it from the public Reddit API.