tim1900 committed on
Commit 3af0918 · verified · 1 Parent(s): 1f736e0

Update README.md

  1. README.md +3 -1
README.md CHANGED
@@ -184,7 +184,9 @@ bert-chunker-3 (prob_threshold=0.50543) | N/A | 0 | 90.4 ± 28.7 | 3.3 ± 3.1 |
  ## Future
  This model is undertrained due to lack of money and laziness. I observed it is still undertrained for two reasons:
  - I trained it on 110 K window data for 2~3 epoch and the eval loss kept decreasing.
- - The outputs from two non-overlapping windows show poor comparability in split point probabilities, which will undermines the performance when max_tokens_per_chunk is large. I think more data will amplify probability difference and enhance their comparability. This was corroborated by some of my experiments. So next version will probably just be more data.
+ - The outputs from two non-overlapping windows show poor comparability in split point probabilities, which will undermines the performance when max_tokens_per_chunk is large. I think more data will amplify probability difference and enhance their comparability. This was corroborated by some of my experiments.
+
+ So next version will probably just be more data.
  ## Citation
  ```bibtex
  @article{bert-chunker,