tuenguyen committed on
Commit
b21a08c
·
verified ·
1 Parent(s): c52d7f6

Update README.md

  1. README.md +1 -2
README.md CHANGED
@@ -101,10 +101,9 @@ All R1 reasoning traces were processed through a domain-specific pipeline as fol
  ### Preprocessing Data
  1. Filtering for Complete Generation
  - Retained only traces with complete generation outputs
- - Removed incomplete or truncated samples
 
  2. Length-based Filtering
- - Minimum threshold: Keep only the prompt with more than three words.
+ - Minimum threshold: Keep only the prompt with more than 3 words.
  - Maximum threshold: Keep only the traces with less than 7,143 words.
  - Wait Token Filter: Removed traces with has more than 47 occurrences of "Wait" (97th percentile threshold).
 
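
The filtering rules in the README hunk above can be sketched roughly as follows. This is a minimal illustration, not the dataset authors' actual pipeline: the function names, the `is_complete` flag, whitespace-based word counting, and whole-word case-sensitive matching of "Wait" are all assumptions made for the example. Only the three thresholds come from the README text.

```python
import re

# Thresholds taken from the README text.
MIN_PROMPT_WORDS = 3     # keep prompts with more than 3 words
MAX_TRACE_WORDS = 7143   # keep traces with fewer than 7,143 words
MAX_WAIT_COUNT = 47      # drop traces with more than 47 occurrences of "Wait"


def count_words(text: str) -> int:
    # Assumption: a "word" is a whitespace-separated token.
    return len(text.split())


def count_wait(trace: str) -> int:
    # Assumption: case-sensitive, whole-word matches of "Wait".
    return len(re.findall(r"\bWait\b", trace))


def keep_sample(prompt: str, trace: str, is_complete: bool) -> bool:
    """Return True if a (prompt, trace) pair passes every filter (hypothetical helper)."""
    if not is_complete:                              # 1. complete generations only
        return False
    if count_words(prompt) <= MIN_PROMPT_WORDS:      # 2a. minimum prompt length
        return False
    if count_words(trace) >= MAX_TRACE_WORDS:        # 2b. maximum trace length
        return False
    if count_wait(trace) > MAX_WAIT_COUNT:           # 2c. "Wait" token filter
        return False
    return True
```

Applied to a list of samples, this would amount to something like `[s for s in samples if keep_sample(s["prompt"], s["trace"], s["complete"])]`, where the dictionary keys are again hypothetical.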