tim1900 committed
Commit c10b5ff · verified · 1 Parent(s): 49ecc5e

Update README.md

  1. README.md +2 -2
README.md CHANGED
@@ -13,6 +13,8 @@ bert-chunker-3 is a text chunker based on BertForTokenClassification to predict
 
  Different from [bc-2](https://huggingface.co/tim1900/bert-chunker-2) and [bc](https://huggingface.co/tim1900/bert-chunker), to overcome the data distribution shift, our training data were labeled by a LLM and trainng pipeline was improved, therefore it is **more stable**.
 
+ Updatas: an experimental script that **supports specifying the maximum tokens per chunk** is [here](https://huggingface.co/tim1900/bert-chunker-3.1/blob/main/README.md)
+
  ## Usage
  Run the following:
 
@@ -199,8 +201,6 @@ for i, (c, t) in enumerate(zip(chunks, token_pos)):
  print(f"-----chunk: {i}----token_idx: {t}--------")
  print(c)
  ```
- ## Doing
- A max_tokens_per_chunk button will be implemented very soon.
  ## Citation
  ```bibtex
  @article{bert-chunker,