Update README.md
README.md
CHANGED
@@ -101,10 +101,9 @@ All R1 reasoning traces were processed through a domain-specific pipeline as fol
 ### Preprocessing Data
 1. Filtering for Complete Generation
 - Retained only traces with complete generation outputs
-- Removed incomplete or truncated samples
 
 2. Length-based Filtering
-- Minimum threshold: Keep only the prompt with more than
+- Minimum threshold: Keep only the prompt with more than 3 words.
 - Maximum threshold: Keep only the traces with less than 7,143 words.
 - Wait Token Filter: Removed traces with has more than 47 occurrences of "Wait" (97th percentile threshold).
 
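
The preprocessing rules in the updated README can be sketched in Python as below. This is only an illustration, not the authors' pipeline: the `Sample` class, its `prompt`, `trace`, and `complete` fields, and the `keep` and `preprocess` helpers are hypothetical names. It also assumes that "words" means whitespace-separated tokens and that "Wait" is counted as a case-sensitive substring, neither of which the README specifies.

```python
from dataclasses import dataclass
from typing import Iterable, List

# Thresholds quoted from the README; how words are counted is an assumption.
MIN_PROMPT_WORDS = 3     # keep prompts with more than 3 words
MAX_TRACE_WORDS = 7143   # keep traces with fewer than 7,143 words
MAX_WAIT_COUNT = 47      # drop traces with more than 47 occurrences of "Wait"


@dataclass
class Sample:
    """Hypothetical record layout; the real dataset schema may differ."""
    prompt: str
    trace: str
    complete: bool  # assumed flag: True if generation finished without truncation


def keep(sample: Sample) -> bool:
    """Return True if the sample passes every filter described in the README."""
    # 1. Filtering for complete generation
    if not sample.complete:
        return False
    # 2. Length-based filtering (whitespace word count is an assumption)
    if len(sample.prompt.split()) <= MIN_PROMPT_WORDS:
        return False
    if len(sample.trace.split()) >= MAX_TRACE_WORDS:
        return False
    # Wait token filter (case-sensitive substring count is an assumption)
    if sample.trace.count("Wait") > MAX_WAIT_COUNT:
        return False
    return True


def preprocess(samples: Iterable[Sample]) -> List[Sample]:
    """Apply all filters and return the retained samples in their original order."""
    return [s for s in samples if keep(s)]
```

Keeping the three thresholds as module-level constants makes it easy to adjust them if the percentile cut-offs are recomputed on a different set of traces.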