tim1900/bert-chunker

Token Classification

feature-extraction

tim1900 committed on Sep 17, 2024

Commit e4ff9ce (verified) · 1 Parent(s): 2753894

Update README.md

Files changed (1)

README.md +4 -4

@@ -55,17 +55,17 @@ text='''In the heart of the bustling city, where towering skyscrapers touch the
     With each passing light year, the anticipation of unraveling secrets that could alter humanity's
      understanding of life in the universe grew ever stronger.'''
-# chunk the text. The threshold can be (-inf, +inf). The lower threshold is, the more chunks will be generated.
-chunks=model.chunk_text(text, tokenizer, threshold=0)
 # print chunks
 for i, c in enumerate(chunks):
     print(f'-----chunk: {i}------------')
     print(c)
-# chunk the text faster, by using a fixed context window, batchsize is the number of windows run per batch.
 print('----->Here is the result of fast chunk method<------:')
-chunks=model.chunk_text_fast(text, tokenizer, batchsize=20, threshold=0)
 # print chunks
 for i, c in enumerate(chunks):

     With each passing light year, the anticipation of unraveling secrets that could alter humanity's
      understanding of life in the universe grew ever stronger.'''
+# chunk the text. The prob_threshold should be between (0, 1). The lower it is, the more chunks will be generated.
+chunks=model.chunk_text(text, tokenizer, prob_threshold=0.5)
 # print chunks
 for i, c in enumerate(chunks):
     print(f'-----chunk: {i}------------')
     print(c)
+# chunk the text faster but compromising performance a lot, by using a fixed context window, batchsize is the number of windows run per batch.
 print('----->Here is the result of fast chunk method<------:')
+chunks=model.chunk_text_fast(text, tokenizer, batchsize=20, prob_threshold=0.5)
 # print chunks
 for i, c in enumerate(chunks):
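
Note on the parameter change: the diff replaces an unbounded `threshold` (default `0`) with a `prob_threshold` in (0, 1) (default `0.5`), but it does not show how the model turns its scores into probabilities. The sketch below is one plausible reading, not the repository's code. It assumes a two-class token-classification head whose "split here" probability comes from a softmax over two logits, which equals a sigmoid of the logit difference. Under that assumption, a logit-difference threshold of `0` and a probability threshold of `0.5` pick the same boundaries. The names `sigmoid`, `logit`, and `is_boundary` are hypothetical helpers and are not part of bert-chunker.

```python
import math


def sigmoid(x: float) -> float:
    """Map a logit difference in (-inf, +inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p: float) -> float:
    """Inverse of sigmoid: map a probability in (0, 1) back to a logit difference."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def is_boundary(logit_no_split: float, logit_split: float,
                prob_threshold: float = 0.5) -> bool:
    """Hypothetical decision rule (an assumption, not the model's actual code).

    For two classes, softmax([a, b])[1] == sigmoid(b - a), so the
    probability of "split" depends only on the logit difference.
    """
    p_split = sigmoid(logit_split - logit_no_split)
    return p_split > prob_threshold


# A logit-difference threshold of 0 corresponds to a probability of 0.5.
print(sigmoid(0.0))   # 0.5
print(logit(0.5))     # 0.0

# Lowering prob_threshold accepts weaker candidates, so more chunks result.
print(is_boundary(1.0, 0.0, prob_threshold=0.5))  # False
print(is_boundary(1.0, 0.0, prob_threshold=0.2))  # True
```

In this reading, the candidate with a logit difference of -1 has a split probability of about 0.27, so it is rejected at `prob_threshold=0.5` and accepted at `0.2`, which matches the comment in the diff that a lower threshold generates more chunks.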