Update README.md
README.md
CHANGED
@@ -171,7 +171,9 @@ Python,但估计听说过这门语言的读者很少。
 部分,也拥有自己的全局命名空间。内置名称实际上也在模块里,即
 "builtins" 。
 '''
-#
+# Chunk the text. The prob_threshold should be between (0, 1). The lower it is, the more chunks will be generated.
+# Therefore adjust it to your need, when prob_threshold is small like 0.000001, one token is one chunk,
+# when it is set to 1, no chunk will be generated.
 chunks, token_pos = chunk_text(model, doc, tokenizer, prob_threshold=0.5)
 
 # print chunks
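The effect of prob_threshold described in the added comments can be illustrated with a small sketch. This is not the implementation of chunk_text, which is not shown in this diff. It assumes, for illustration only, that the model yields one boundary probability per token and that a new chunk starts wherever that probability exceeds the threshold. The function split_by_threshold, the token list, and the probabilities are all hypothetical.

```python
from typing import List


def split_by_threshold(
    tokens: List[str],
    boundary_probs: List[float],
    prob_threshold: float,
) -> List[List[str]]:
    """Hypothetical helper: start a new chunk before every token whose
    boundary probability is strictly greater than prob_threshold."""
    chunks: List[List[str]] = []
    current: List[str] = []
    for token, prob in zip(tokens, boundary_probs):
        # Only split if there is already something to close off.
        if prob > prob_threshold and current:
            chunks.append(current)
            current = []
        current.append(token)
    if current:
        chunks.append(current)
    return chunks


# Made-up tokens and probabilities, for illustration only.
tokens = ["a", "b", "c", "d"]
probs = [0.9, 0.2, 0.7, 0.000005]

mid = split_by_threshold(tokens, probs, prob_threshold=0.5)
tiny = split_by_threshold(tokens, probs, prob_threshold=0.000001)
one = split_by_threshold(tokens, probs, prob_threshold=1.0)
```

Under these assumptions, lowering the threshold adds split points until every token is its own chunk, and a threshold of 1 yields no split points at all. In this sketch the text then stays in one piece. How chunk_text itself reports that case is not visible in the diff.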