Update README.md
README.md CHANGED
@@ -203,7 +203,7 @@ for i, (c, t) in enumerate(zip(chunks, token_pos)):
     print(c)
 ```
 ## Experimental
-The following script supports specifying max tokens per chunk.
+The following script supports specifying max tokens per chunk. When a chunk is about to exceed max_tokens_per_chunk and no token satisfies prob_threshold, the chunker is forced to split at the best possible position seen so far in the current chunk's history. This script can be seen as a new, experimental version of the scripts above.
 ```python
 def chunk_text_with_max_chunk_size(model, text, tokenizer, prob_threshold=0.5,max_tokens_per_chunk = 400):
     with torch.no_grad():
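To make the forced-split behavior described in the new paragraph concrete, here is a minimal sketch under stated assumptions. It is not the repository's `chunk_text_with_max_chunk_size` (which the diff truncates after its first lines); it assumes a Hugging Face-style tokenizer and a model that yields one "split here" probability per token, and `get_split_probabilities` and `chunk_with_forced_splits` are hypothetical names.

```python
# Hypothetical sketch of the forced-split strategy, NOT the repository's
# implementation. Assumes a Hugging Face-style tokenizer and a model that
# yields one split probability per token; get_split_probabilities is a
# placeholder for however those probabilities are actually obtained.
import torch


def get_split_probabilities(model, input_ids):
    """Placeholder: return one split probability per token (length seq_len)."""
    with torch.no_grad():
        logits = model(input_ids.unsqueeze(0)).logits.squeeze()  # assumed output shape
    return torch.sigmoid(logits).tolist()


def chunk_with_forced_splits(model, text, tokenizer, prob_threshold=0.5, max_tokens_per_chunk=400):
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"].squeeze(0)
    probs = get_split_probabilities(model, input_ids)

    chunks, start, i = [], 0, 0
    best_pos, best_prob = 0, -1.0  # best split candidate inside the current chunk
    while i < len(probs):
        if probs[i] > best_prob:  # remember the most promising position so far
            best_pos, best_prob = i, probs[i]
        if probs[i] > prob_threshold:  # normal case: token clears the threshold
            chunks.append(input_ids[start:i + 1])
            start = i = i + 1
            best_prob = -1.0
        elif i - start + 1 >= max_tokens_per_chunk:
            # forced case: the token budget is exhausted and no token qualified,
            # so split at the best position recorded in this chunk's history
            chunks.append(input_ids[start:best_pos + 1])
            start = i = best_pos + 1  # resume right after the forced split point
            best_prob = -1.0
        else:
            i += 1
    if start < len(input_ids):
        chunks.append(input_ids[start:])  # trailing remainder
    return [tokenizer.decode(c) for c in chunks]
```

Resuming the scan at `best_pos + 1` after a forced split means the tokens that followed the fallback position are re-evaluated, so they can still qualify as split candidates for the next chunk.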