Update README.md
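The code hunk in this change swaps the window stride `STEP` from `(MAX_TOKENS - 2)//2` to `round(((MAX_TOKENS - 2)//2)*1.75)`. The diff does not show how `STEP` is used inside the loop, so the following is only a rough sketch: it compares the two stride values and, under the assumption of a plain fixed-stride sliding window, counts how many windows each stride would need. `MAX_TOKENS = 255`, the 1000-token input, and both helper names are hypothetical choices for illustration and are not taken from the README.

```python
def step_values(max_tokens):
    """Return (old_step, new_step) using the two formulas from the diff."""
    half = (max_tokens - 2) // 2
    old_step = half                 # before: (MAX_TOKENS - 2)//2
    new_step = round(half * 1.75)   # after:  round(((MAX_TOKENS - 2)//2)*1.75)
    return old_step, new_step


def fixed_stride_windows(n_tokens, window, step):
    """Start offsets of windows of `window` tokens advanced by `step`.

    Simplified model (an assumption): the real loop may move the window
    differently, for example by jumping to the last detected chunk boundary.
    """
    starts = []
    start = 0
    while True:
        starts.append(start)
        if start + window >= n_tokens:  # this window reaches the end of the input
            break
        start += step
    return starts


MAX_TOKENS = 255              # hypothetical value, chosen only for the example
window = MAX_TOKENS - 2       # assumes two positions are reserved for special tokens
old_step, new_step = step_values(MAX_TOKENS)

for name, step in (("old", old_step), ("new", new_step)):
    n_windows = len(fixed_stride_windows(1000, window, step))
    print(f"{name}: step={step}, overlap={window - step}, windows={n_windows}")
```

Under these assumptions a larger stride means less overlap between consecutive windows and fewer forward passes per document, at the cost of less shared context at window borders.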
README.md
CHANGED
@@ -248,7 +248,7 @@ def chunk_text_with_max_chunk_size(model, text, tokenizer, prob_threshold=0.5,ma
 best_logits = torch.finfo(torch.float32).min
 is_chunk_start = True
 
-STEP = (MAX_TOKENS - 2)//2
+STEP = round(((MAX_TOKENS - 2)//2 )*1.75) #(MAX_TOKENS - 2)//2
 print(f"Processing {input_ids.shape[1]} tokens...")
 while windows_end <= input_ids.shape[1]:
 
@@ -428,8 +428,8 @@ Evaluation is done by code from [brandonstarxel/chunking_evaluation](https://git
 Cluster | 400 (~182) | 0 | 91.3 ± 25.4 | 4.5 ± 3.4 | 20.7 ± 14.5 | 4.5 ± 3.4 | O(N<sup>2</sup>)| No
 Cluster | 200 (~103) | 0 | 87.3 ± 29.8 | **8.0 ± 6.0** | **34.0 ± 19.7** | **8.0 ± 6.0** | O(N<sup>2</sup>)| No
 LLM (GPT4o) | N/A (~240) | 0 | **91.9 ± 26.5** | 3.9 ± 3.2 | 19.9 ± 16.3 | 3.9 ± 3.2 | O(N<sup>2</sup>)| No
-★ bert-chunker-3 (experimental, prob_threshold=0.50543) | <= 400 | 0 | 91.
-★ bert-chunker-3 (experimental, prob_threshold=0.50543) | <= 200 | 0 |
+★ bert-chunker-3 (experimental, prob_threshold=0.50543) | <= 400 | 0 | 91.3 ± 26.6 | 5.4 ± 4.7 | 23.1 ± 17.6 | 5.4 ± 4.7 |**O(N)** | **Yes**
+★ bert-chunker-3 (experimental, prob_threshold=0.50543) | <= 200 | 0 | 87.9 ± 30.4 | 7.6 ± 6.0 | 30.9 ± 19.1 | 7.5 ± 5.8 |**O(N)**| **Yes**
 ★ bert-chunker-3 (prob_threshold=0.50543) | N/A | 0 | 90.4 ± 28.7 | 3.3 ± 3.1 | 16.0 ± 17.0 | 3.3 ± 3.1 |**O(N)**| No
 ## Citation
 ```bibtex