writinwaters committed
Commit 7ddda98 · Parent(s): 3366eac

Added chunk methods (#3110)

### What problem does this PR solve?

### Type of change

- [x] Documentation Update

Files changed:
- api/http_api_reference.md +4 -2
- api/python_api_reference.md +16 -2

api/http_api_reference.md (CHANGED)
@@ -88,6 +88,7 @@ curl --request POST \
 - `"picture"`: Picture
 - `"one"`: One
 - `"knowledge_graph"`: Knowledge Graph
+- `"email"`: Email
 
 - `"parser_config"`: (*Body parameter*), `object`
 The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`:
@@ -100,7 +101,7 @@ curl --request POST \
 - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
 - If `"chunk_method"` is `"qa"`, `"manuel"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute:
 - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
-- If `"chunk_method"` is `"table"` or `"
+- If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object.
 - If `"chunk_method"` is `"knowledge_graph"`, the `"parser_config"` object contains the following attributes:
 - `"chunk_token_count"`: Defaults to `128`.
 - `"delimiter"`: Defaults to `"\n!?。;!?"`.
@@ -517,6 +518,7 @@ curl --request PUT \
 - `"picture"`: Picture
 - `"one"`: One
 - `"knowledge_graph"`: Knowledge Graph
+- `"email"`: Email
 - `"parser_config"`: (*Body parameter*), `object`
 The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`:
 - If `"chunk_method"` is `"naive"`, the `"parser_config"` object contains the following attributes:
@@ -528,7 +530,7 @@ curl --request PUT \
 - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
 - If `"chunk_method"` is `"qa"`, `"manuel"`, `"paper"`, `"book"`, `"laws"`, or `"presentation"`, the `"parser_config"` object contains the following attribute:
 - `"raptor"`: Raptor-specific settings. Defaults to: `{"use_raptor": false}`.
-- If `"chunk_method"` is `"table"` or `"
+- If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object.
 - If `"chunk_method"` is `"knowledge_graph"`, the `"parser_config"` object contains the following attributes:
 - `"chunk_token_count"`: Defaults to `128`.
 - `"delimiter"`: Defaults to `"\n!?。;!?"`.
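After this change, the HTTP reference effectively maps each `"chunk_method"` to a default `"parser_config"`. The sketch below encodes only the defaults visible in the hunks above and serializes the two body parameters this diff covers. The helper names `default_parser_config` and `build_body` are hypothetical, and the sketch deliberately omits the endpoint and any other body fields, since this diff does not show them. `"naive"` is rejected because its defaults fall outside these hunks, and the knowledge-graph entry may be incomplete for the same reason.

```python
import json
from typing import Any

# chunk_method values that, per the updated HTTP reference, take an empty parser_config.
EMPTY_CONFIG_METHODS = {"table", "picture", "one", "email"}

# chunk_method values whose parser_config holds only the "raptor" attribute
# ("manuel" is spelled as in the reference).
RAPTOR_ONLY_METHODS = {"qa", "manuel", "paper", "book", "laws", "presentation"}


def default_parser_config(chunk_method: str) -> dict[str, Any]:
    """Return the documented default parser_config (hypothetical helper)."""
    if chunk_method in EMPTY_CONFIG_METHODS:
        return {}
    if chunk_method in RAPTOR_ONLY_METHODS:
        return {"raptor": {"use_raptor": False}}
    if chunk_method == "knowledge_graph":
        # Only the attributes visible in this hunk; the full reference may list more.
        return {"chunk_token_count": 128, "delimiter": "\n!?。;!?"}
    # "naive" is documented too, but its defaults are outside the hunks shown here.
    raise ValueError(f"no default shown in this diff for chunk_method={chunk_method!r}")


def build_body(chunk_method: str) -> str:
    """Serialize the two body parameters covered by this diff as JSON."""
    body = {
        "chunk_method": chunk_method,
        "parser_config": default_parser_config(chunk_method),
    }
    # ensure_ascii=False keeps the CJK punctuation in the delimiter readable.
    return json.dumps(body, ensure_ascii=False)


print(build_body("email"))  # {"chunk_method": "email", "parser_config": {}}
```

Keeping the empty-config methods in one set mirrors the single bullet this commit introduces, so adding a future method is a one-word change.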

api/python_api_reference.md (CHANGED)
@@ -75,12 +75,13 @@ The chunking method of the dataset to create. Available options:
 - `"picture"`: Picture
 - `"one"`: One
 - `"knowledge_graph"`: Knowledge Graph
+- `"email"`: Email
 
 #### parser_config
 
-The parser configuration of the dataset. A `ParserConfig` object's attributes vary based on the selected `
+The parser configuration of the dataset. A `ParserConfig` object's attributes vary based on the selected `chunk_method`:
 
-- `
+- `chunk_method`=`"naive"`:
 `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}`.
 - `chunk_method`=`"qa"`:
 `{"raptor": {"user_raptor": False}}`
@@ -94,12 +95,16 @@ The parser configuration of the dataset. A `ParserConfig` object's attributes va
 `{"raptor": {"user_raptor": False}}`
 - `chunk_method`=`"laws"`:
 `{"raptor": {"user_raptor": False}}`
+- `chunk_method`=`"picture"`:
+`None`
 - `chunk_method`=`"presentation"`:
 `{"raptor": {"user_raptor": False}}`
 - `chunk_method`=`"one"`:
 `None`
 - `chunk_method`=`"knowledge-graph"`:
 `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}`
+- `chunk_method`=`"email"`:
+`None`
 
 ### Returns
 
@@ -322,6 +327,7 @@ A dictionary representing the attributes to update, with the following keys:
 - `"picture"`: Picture
 - `"one"`: One
 - `"knowledge_graph"`: Knowledge Graph
+- `"email"`: Email
 - `"parser_config"`: `dict[str, Any]` The parsing configuration for the document. Its attributes vary based on the selected `"chunk_method"`:
 - `"chunk_method"`=`"naive"`:
 `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","html4excel":False,"layout_recognize":True,"raptor":{"user_raptor":False}}`.
@@ -339,10 +345,14 @@ A dictionary representing the attributes to update, with the following keys:
 `{"raptor": {"user_raptor": False}}`
 - `chunk_method`=`"presentation"`:
 `{"raptor": {"user_raptor": False}}`
+- `chunk_method`=`"picture"`:
+`None`
 - `chunk_method`=`"one"`:
 `None`
 - `chunk_method`=`"knowledge-graph"`:
 `{"chunk_token_num":128,"delimiter":"\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}`
+- `chunk_method`=`"email"`:
+`None`
 
 ### Returns
 
@@ -475,10 +485,14 @@ A `Document` object contains the following attributes:
 `{"raptor": {"user_raptor": False}}`
 - `chunk_method`=`"presentation"`:
 `{"raptor": {"user_raptor": False}}`
+- `chunk_method`=`"picure"`:
+`None`
 - `chunk_method`=`"one"`:
 `None`
 - `chunk_method`=`"knowledge-graph"`:
 `{"chunk_token_num":128,"delimiter": "\\n!?;。;!?","entity_types":["organization","person","location","event","time"]}`
+- `chunk_method`=`"email"`:
+`None`
 
 ### Examples
 
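The three Python reference hunks (dataset creation, document update, and the `Document` attributes) repeat the same per-method `ParserConfig` defaults. The sketch below gathers the ones visible in this diff into a single lookup. `DOCUMENTED_DEFAULTS` and `documented_parser_config` are hypothetical names, not part of any SDK. Two assumptions are flagged in the code as well: the docs' `"user_raptor"` key is read as `"use_raptor"`, the spelling the HTTP reference in this same commit uses, and the table is keyed by the chunk-method value `"knowledge_graph"` even though the `parser_config` list writes `"knowledge-graph"`. Methods such as `"book"` or `"paper"` are not shown in these hunks and are therefore left out.

```python
import copy
from typing import Any, Optional

# Assumption: the docs' "user_raptor" key is read as "use_raptor",
# the spelling used by the HTTP reference in the same commit.
RAPTOR_OFF: dict[str, Any] = {"raptor": {"use_raptor": False}}

# Defaults visible in the python_api_reference.md hunks of this commit.
# None means the reference lists `None` for that chunk method.
DOCUMENTED_DEFAULTS: dict[str, Optional[dict[str, Any]]] = {
    "naive": {
        "chunk_token_num": 128,
        "delimiter": "\\n!?;。;!?",
        "html4excel": False,
        "layout_recognize": True,
        **RAPTOR_OFF,
    },
    "qa": RAPTOR_OFF,
    "laws": RAPTOR_OFF,
    "presentation": RAPTOR_OFF,
    "picture": None,
    "one": None,
    # Assumption: keyed by the chunk_method value; the parser_config list spells it "knowledge-graph".
    "knowledge_graph": {
        "chunk_token_num": 128,
        "delimiter": "\\n!?;。;!?",
        "entity_types": ["organization", "person", "location", "event", "time"],
    },
    "email": None,
}


def documented_parser_config(chunk_method: str) -> Optional[dict[str, Any]]:
    """Return a fresh copy of the documented default (hypothetical helper)."""
    if chunk_method not in DOCUMENTED_DEFAULTS:
        raise KeyError(f"no default shown in this diff for {chunk_method!r}")
    # deepcopy so callers can edit the result without changing the shared table.
    return copy.deepcopy(DOCUMENTED_DEFAULTS[chunk_method])


cfg = documented_parser_config("laws")
cfg["raptor"]["use_raptor"] = True
print(documented_parser_config("laws"))  # {'raptor': {'use_raptor': False}}
```

Returning a deep copy matters here because several entries share the same `RAPTOR_OFF` object; without it, editing one returned config would silently change the defaults for every raptor-only method.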