Update README.md
README.md CHANGED
@@ -203,7 +203,7 @@ for i, (c, t) in enumerate(zip(chunks, token_pos)):
     print(c)
 ```
 ## Experimental
-The following script supports specifying max tokens per chunk.
+The following script supports specifying max tokens per chunk. When a chunk is about to exceed max_tokens_per_chunk and no token satisfies prob_threshold, the chunker is forced to split at the best possible position seen so far in the current chunk's history. This script can be seen as a new, experimental version of the scripts above.
 ```python
 def chunk_text_with_max_chunk_size(model, text, tokenizer, prob_threshold=0.5,max_tokens_per_chunk = 400):
     with torch.no_grad():
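To make the forced-split behavior described in the new paragraph concrete, here is a minimal sketch under stated assumptions. It is not the repository's `chunk_text_with_max_chunk_size` (which the diff truncates after its first lines); it assumes a Hugging Face-style tokenizer and a model that yields one "split here" probability per token, and `get_split_probabilities` and `chunk_with_forced_splits` are hypothetical names.

```python
# Hypothetical sketch of the forced-split strategy, NOT the repository's
# implementation. Assumes a Hugging Face-style tokenizer and a model that
# yields one split probability per token; get_split_probabilities is a
# placeholder for however those probabilities are actually obtained.
import torch


def get_split_probabilities(model, input_ids):
    """Placeholder: return one split probability per token (length seq_len)."""
    with torch.no_grad():
        logits = model(input_ids.unsqueeze(0)).logits.squeeze()  # assumed output shape
    return torch.sigmoid(logits).tolist()


def chunk_with_forced_splits(model, text, tokenizer, prob_threshold=0.5, max_tokens_per_chunk=400):
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"].squeeze(0)
    probs = get_split_probabilities(model, input_ids)

    chunks, start, i = [], 0, 0
    best_pos, best_prob = 0, -1.0  # best split candidate inside the current chunk
    while i < len(probs):
        if probs[i] > best_prob:  # remember the most promising position so far
            best_pos, best_prob = i, probs[i]
        if probs[i] > prob_threshold:  # normal case: token clears the threshold
            chunks.append(input_ids[start:i + 1])
            start = i = i + 1
            best_prob = -1.0
        elif i - start + 1 >= max_tokens_per_chunk:
            # forced case: the token budget is exhausted and no token qualified,
            # so split at the best position recorded in this chunk's history
            chunks.append(input_ids[start:best_pos + 1])
            start = i = best_pos + 1  # resume right after the forced split point
            best_prob = -1.0
        else:
            i += 1
    if start < len(input_ids):
        chunks.append(input_ids[start:])  # trailing remainder
    return [tokenizer.decode(c) for c in chunks]
```

Resuming the scan at `best_pos + 1` after a forced split means the tokens that followed the fallback position are re-evaluated, so they can still qualify as split candidates for the next chunk.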