Update README.md

README.md CHANGED

@@ -13,6 +13,8 @@ bert-chunker-3 is a text chunker based on BertForTokenClassification to predict
 
 Different from [bc-2](https://huggingface.co/tim1900/bert-chunker-2) and [bc](https://huggingface.co/tim1900/bert-chunker), to overcome the data distribution shift, our training data were labeled by an LLM and the training pipeline was improved, so it is **more stable**.
 
+Update: an experimental script that **supports specifying the maximum tokens per chunk** is [here](https://huggingface.co/tim1900/bert-chunker-3.1/blob/main/README.md).
+
 ## Usage
 Run the following:
 
@@ -199,8 +201,6 @@ for i, (c, t) in enumerate(zip(chunks, token_pos)):
     print(f"-----chunk: {i}----token_idx: {t}--------")
     print(c)
 ```
-## Doing
-A max_tokens_per_chunk button will be implemented very soon.
 ## Citation
 ```bibtex
 @article{bert-chunker,
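The README's full usage code is elided from this diff (only the final printing loop over `chunks` and `token_pos` is visible). As a minimal, self-contained sketch of the post-processing step such a chunker performs, turning per-token "chunk start" predictions from a BertForTokenClassification head into chunks; the tokens, probabilities, and threshold below are illustrative, not the model's actual output:

```python
# Sketch: convert per-token "chunk start" probabilities (as a
# token-classification head would emit) into chunk boundaries.
# Threshold and inputs are illustrative assumptions.

def split_points(start_probs, threshold=0.5):
    """Indices where a new chunk begins; token 0 always starts a chunk."""
    points = [0]
    for i, p in enumerate(start_probs[1:], start=1):
        if p > threshold:
            points.append(i)
    return points

def chunk_tokens(tokens, start_probs, threshold=0.5):
    """Group tokens into chunks at the predicted start positions."""
    points = split_points(start_probs, threshold) + [len(tokens)]
    return [tokens[a:b] for a, b in zip(points, points[1:])]

tokens = ["A", "B", "C", "D", "E"]
probs = [0.9, 0.1, 0.8, 0.2, 0.1]  # token "C" predicted as a chunk start
print(chunk_tokens(tokens, probs))  # [['A', 'B'], ['C', 'D', 'E']]
```

In the real model the probabilities come from running the tokenized text through the classifier; the split indices then map back to character positions to produce the text chunks and `token_pos` printed in the loop above.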