tim1900 committed on
Commit 472b08f · verified · 1 Parent(s): 53eb91e

Update README.md
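
The first hunk of this commit changes the `STEP` constant in `chunk_text_with_max_chunk_size` from half of `MAX_TOKENS - 2` to 1.75 times that value. The arithmetic can be sketched as below. This is only an illustration under stated assumptions: `MAX_TOKENS = 512` is an assumed value (the real one is defined elsewhere in README.md and is not visible in this diff), the helper names `old_step`, `new_step`, and `overlap` are hypothetical, and the overlap figure assumes that each window holds `MAX_TOKENS - 2` content tokens and advances by exactly `STEP` per iteration, which the diff does not confirm.

```python
def old_step(max_tokens: int) -> int:
    """Stride before this commit: half of (max_tokens - 2)."""
    return (max_tokens - 2) // 2


def new_step(max_tokens: int) -> int:
    """Stride after this commit: the old stride scaled by 1.75, then rounded."""
    return round(((max_tokens - 2) // 2) * 1.75)


def overlap(max_tokens: int, step: int) -> int:
    """Tokens shared by two consecutive windows.

    Assumption: a window holds (max_tokens - 2) content tokens and the
    window start moves forward by exactly `step` each iteration.
    """
    return max((max_tokens - 2) - step, 0)


if __name__ == "__main__":
    MAX_TOKENS = 512  # assumed value, for illustration only
    for name, step_fn in (("old", old_step), ("new", new_step)):
        step = step_fn(MAX_TOKENS)
        print(f"{name}: STEP={step}, overlap={overlap(MAX_TOKENS, step)}")
```

Under the assumed `MAX_TOKENS = 512`, the stride grows from 255 to 446 tokens, so consecutive windows would share 64 tokens instead of 255. Note that Python's `round` rounds exact halves to the nearest even integer, which matters for values of `MAX_TOKENS` where the product ends in `.5`.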

Files changed (1)
  1. README.md +3 -3
README.md CHANGED
@@ -248,7 +248,7 @@ def chunk_text_with_max_chunk_size(model, text, tokenizer, prob_threshold=0.5,ma
  best_logits = torch.finfo(torch.float32).min
  is_chunk_start = True
 
- STEP = (MAX_TOKENS - 2)//2
+ STEP = round(((MAX_TOKENS - 2)//2 )*1.75) #(MAX_TOKENS - 2)//2
  print(f"Processing {input_ids.shape[1]} tokens...")
  while windows_end <= input_ids.shape[1]:
 
@@ -428,8 +428,8 @@ Evaluation is done by code from [brandonstarxel/chunking_evaluation](https://git
  Cluster | 400 (~182) | 0 | 91.3 ± 25.4 | 4.5 ± 3.4 | 20.7 ± 14.5 | 4.5 ± 3.4 | O(N<sup>2</sup>)| No
  Cluster | 200 (~103) | 0 | 87.3 ± 29.8 | **8.0 ± 6.0** | **34.0 ± 19.7** | **8.0 ± 6.0** | O(N<sup>2</sup>)| No
  LLM (GPT4o) | N/A (~240) | 0 | **91.9 ± 26.5** | 3.9 ± 3.2 | 19.9 ± 16.3 | 3.9 ± 3.2 | O(N<sup>2</sup>)| No
- ★ bert-chunker-3 (experimental, prob_threshold=0.50543) | <= 400 | 0 | 91.2 ± 26.6 | 5.3 ± 4.5 | 23.2 ± 18.1 | 5.3 ± 4.5 |**O(N)** | **Yes**
- ★ bert-chunker-3 (experimental, prob_threshold=0.50543) | <= 200 | 0 | 90.5 ± 27.3 | 7.1 ± 5.5 | 29.3 ± 19.0 | 7.1 ± 5.4 |**O(N)**| **Yes**
+ ★ bert-chunker-3 (experimental, prob_threshold=0.50543) | <= 400 | 0 | 91.3 ± 26.6 | 5.4 ± 4.7 | 23.1 ± 17.6 | 5.4 ± 4.7 |**O(N)** | **Yes**
+ ★ bert-chunker-3 (experimental, prob_threshold=0.50543) | <= 200 | 0 | 87.9 ± 30.4 | 7.6 ± 6.0 | 30.9 ± 19.1 | 7.5 ± 5.8 |**O(N)**| **Yes**
  ★ bert-chunker-3 (prob_threshold=0.50543) | N/A | 0 | 90.4 ± 28.7 | 3.3 ± 3.1 | 16.0 ± 17.0 | 3.3 ± 3.1 |**O(N)**| No
  ## Citation
  ```bibtex