writinwaters committed · Commit e587fd6 · 1 Parent(s): 4c39067

DRAFT: Miscellaneous proofedits on Python APIs (#2903)

### What problem does this PR solve?

### Type of change

- [x] Documentation Update

api/python_api_reference.md CHANGED (+167 -125)
@@ -2,10 +2,14 @@

**THE API REFERENCES BELOW ARE STILL UNDER DEVELOPMENT.**

:::tip NOTE
Dataset Management
:::

## Create dataset

```python
@@ -55,11 +59,24 @@ The language setting of the dataset to create. Available options:

#### permission

- Specifies who can

#### chunk_method, `str`

- The

#### parser_config

@@ -67,7 +84,7 @@ The parser configuration of the dataset. A `ParserConfig` object contains the fo

- `chunk_token_count`: Defaults to `128`.
- `layout_recognize`: Defaults to `True`.
- - `delimiter`: Defaults to `
- `task_page_size`: Defaults to `12`.

### Returns

@@ -81,7 +98,7 @@ The parser configuration of the dataset. A `ParserConfig` object contains the fo

from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
-
```

---
@@ -92,13 +109,13 @@ ds = rag_object.create_dataset(name="kb_1")

RAGFlow.delete_datasets(ids: list[str] = None)
```

- Deletes datasets

### Parameters

- #### ids

- The IDs of the datasets to delete.

### Returns

@@ -108,7 +125,7 @@ The IDs of the datasets to delete.

### Examples

```python
-
```

---
@@ -132,15 +149,18 @@ Retrieves a list of datasets.

#### page: `int`

-

#### page_size: `int`

- The number of

- ####

- The field by which

#### desc: `bool`

@@ -148,15 +168,15 @@ Indicates whether the retrieved datasets should be sorted in descending order. D

#### id: `str`

- The

#### name: `str`

- The name of the dataset to

### Returns

- - Success: A list of `DataSet` objects
- Failure: `Exception`.

### Examples

@@ -164,8 +184,8 @@ The name of the dataset to be got. Defaults to `None`.

#### List all datasets

```python
- for
- print(
```

#### Retrieve a dataset by ID
@@ -183,16 +203,18 @@ print(dataset[0])

DataSet.update(update_message: dict)
```

- Updates the current dataset.

### Parameters

#### update_message: `dict[str, str|int]`, *Required*

- `"name"`: `str` The name of the dataset to update.
- - `"embedding_model"`: `str` The embedding model
  - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`.
- - `"chunk_method"`: `str` The
  - `"naive"`: General
  - `"manual"`: Manual
  - `"qa"`: Q&A

@@ -216,8 +238,8 @@ Updates the current dataset.

```python
from ragflow import RAGFlow

-
- dataset =
dataset.update({"embedding_model":"BAAI/bge-zh-v1.5", "chunk_method":"manual"})
```
@@ -239,7 +261,7 @@ Uploads documents to the current dataset.

### Parameters

- #### document_list

A list of dictionaries representing the documents to upload, each containing the following keys:

@@ -272,6 +294,8 @@ Updates configurations for the current document.

#### update_message: `dict[str, str|dict[]]`, *Required*

- `"name"`: `str` The name of the document to update.
- `"parser_config"`: `dict[str, Any]` The parsing configuration for the document:
  - `"chunk_token_count"`: Defaults to `128`.

@@ -302,9 +326,9 @@ Updates configurations for the current document.

```python
from ragflow import RAGFlow

-
- dataset=
- dataset=dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
doc.update({"parser_config": {"chunk_token_count": 256}, "chunk_method": "manual"})
@@ -318,7 +342,7 @@ doc.update([{"parser_config": {"chunk_token_count": 256}}, {"chunk_method": "man

Document.download() -> bytes
```

- Downloads the current document

### Returns

@@ -350,30 +374,30 @@ Retrieves a list of documents from the current dataset.

### Parameters

- #### id

The ID of the document to retrieve. Defaults to `None`.

- #### keywords

The keywords to match document titles. Defaults to `None`.

- #### offset

- The

- #### limit

-

- #### orderby

- The field by which

- - `"create_time"` (
  - `"update_time"`

- #### desc

Indicates whether the retrieved documents should be sorted in descending order. Defaults to `True`.
@@ -384,22 +408,24 @@ Indicates whether the retrieved documents should be sorted in descending order.

A `Document` object contains the following attributes:

- - `id
- - `
- - `
- - `
- - `
- - `
- - `
- - `
- - `
- - `size`: `int`
- - `token_count`: `int`
- - `chunk_count`: `int`
- - `progress`: `float`
- - `progress_msg`: `str`
- - `process_begin_at`: `datetime`
- - `process_duation`: `float` Duration of the processing in seconds or minutes

### Examples

@@ -410,11 +436,10 @@ rag = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")

dataset = rag.create_dataset(name="kb_1")

filename1 = "~/ragflow.txt"
- blob=open(filename1 , "rb").read()
-
- dataset.
-
- print(d)
```

---
@@ -425,7 +450,13 @@ for d in dataset.list_documents(keywords="rag", offset=0, limit=12):

DataSet.delete_documents(ids: list[str] = None)
```

- Deletes

### Returns

@@ -437,10 +468,10 @@ Deletes specified documents or all documents from the current dataset.

```python
from ragflow import RAGFlow

-
-
-
-
```

---
@@ -453,7 +484,7 @@ DataSet.async_parse_documents(document_ids:list[str]) -> None

### Parameters

- #### document_ids: `list[str]

The IDs of the documents to parse.

@@ -465,23 +496,20 @@ The IDs of the documents to parse.

### Examples

```python
-
-
- ds = rag.create_dataset(name="dataset_name")
documents = [
    {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
    {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
    {'name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
]
-
- documents=
- ids=[]
for document in documents:
    ids.append(document.id)
-
- print("Async bulk parsing initiated")
- ds.async_cancel_parse_documents(ids)
- print("Async bulk parsing cancelled")
```

---
@@ -494,9 +522,9 @@ DataSet.async_cancel_parse_documents(document_ids:list[str])-> None

### Parameters

- #### document_ids: `list[str]

- The IDs of the documents

### Returns

@@ -506,23 +534,22 @@ The IDs of the documents to stop parsing.

### Examples

```python
-
-
- ds = rag.create_dataset(name="dataset_name")
documents = [
    {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
    {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
    {'name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
]
-
- documents=
- ids=[]
for document in documents:
    ids.append(document.id)
-
- print("Async bulk parsing initiated")
-
- print("Async bulk parsing cancelled")
```

---
@@ -533,19 +560,21 @@ print("Async bulk parsing cancelled")

Document.list_chunks(keywords: str = None, offset: int = 0, limit: int = -1, id : str = None) -> list[Chunk]
```

### Parameters

- #### keywords

List chunks whose name has the given keywords. Defaults to `None`

- #### offset

- The

#### limit

-

#### id

@@ -553,19 +582,20 @@ The ID of the chunk to retrieve. Default: `None`

### Returns

- list

### Examples

```python
from ragflow import RAGFlow

-
-
-
-
- for
- print(
```

## Add chunk
@@ -578,7 +608,7 @@ Document.add_chunk(content:str) -> Chunk

#### content: *Required*

- The

#### important_keywords :`list[str]`

@@ -609,11 +639,13 @@ chunk = doc.add_chunk(content="xxxxxxx")

Document.delete_chunks(chunk_ids: list[str])
```

### Parameters

- #### chunk_ids

-

### Returns

@@ -642,15 +674,17 @@ doc.delete_chunks(["id_1","id_2"])

Chunk.update(update_message: dict)
```

- Updates the current chunk.

### Parameters

#### update_message: `dict[str, str|list[str]|int]` *Required*

- `"content"`: `str` Content of the chunk.
- `"important_keywords"`: `list[str]` A list of key terms to attach to the chunk.
- - `"available"`: `int` The chunk's availability status in the dataset.
  - `0`: Unavailable
  - `1`: Available
@@ -697,11 +731,11 @@ The documents to search from. `None` means no limitation. Defaults to `None`.

#### offset: `int`

- The

#### limit: `int`

- The maximum number of chunks to

#### Similarity_threshold: `float`

@@ -764,6 +798,8 @@ for c in rag_object.retrieve(question="What's ragflow?",

Chat Assistant Management
:::

## Create chat assistant

```python
@@ -856,15 +892,17 @@ assi = rag.create_chat("Miss R", knowledgebases=list_kb)

Chat.update(update_message: dict)
```

- Updates the current chat assistant.

### Parameters

- #### update_message: `dict[str,

- `"name"`: `str` The name of the chat assistant to update.
- `"avatar"`: `str` Base64 encoding of the avatar. Defaults to `""`
- - `"knowledgebases"`: `list[str]` datasets to update.
- `"llm"`: `dict` The LLM settings:
  - `"model_name"`, `str` The chat model name.
  - `"temperature"`, `float` Controls the randomness of the model's predictions.

@@ -906,17 +944,17 @@ assistant.update({"name": "Stefan", "llm": {"temperature": 0.8}, "prompt": {"top

## Delete chats

- Deletes specified chat assistants.
-
```python
RAGFlow.delete_chats(ids: list[str] = None)
```

### Parameters

- #### ids

- IDs of the chat assistants to delete. If not specified, all chat assistants will be deleted.

### Returns
@@ -953,11 +991,11 @@ Retrieves a list of chat assistants.

#### page

- Specifies the page on which the

#### page_size

- The number of

#### order_by

@@ -985,8 +1023,8 @@ The name of the chat to retrieve. Defaults to `None`.

```python
from ragflow import RAGFlow

-
- for assistant in
    print(assistant)
```

@@ -996,6 +1034,8 @@ for assistant in rag.list_chats():

Chat-session APIs
:::

## Create session

```python
@@ -1036,12 +1076,14 @@ session = assistant.create_session()

Session.update(update_message: dict)
```

- Updates the current session.

### Parameters

#### update_message: `dict[str, Any]`, *Required*

- `"name"`: `str` The name of the session to update.

### Returns

@@ -1169,17 +1211,17 @@ Lists sessions associated with the current chat assistant.

#### page

- Specifies the page on which

#### page_size

- The number of

#### orderby

- The field by which

- - `"create_time"` (
  - `"update_time"`

#### desc
@@ -1204,8 +1246,8 @@ The name of the chat to retrieve. Defaults to `None`.

```python
from ragflow import RAGFlow

-
- assistant =
assistant = assistant[0]
for session in assistant.list_sessions():
    print(session)

@@ -1219,13 +1261,13 @@ for session in assistant.list_sessions():

Chat.delete_sessions(ids:list[str] = None)
```

- Deletes

### Parameters

- #### ids

- IDs of the sessions to delete. If not specified, all sessions associated with the current chat assistant will be deleted.

### Returns
**THE API REFERENCES BELOW ARE STILL UNDER DEVELOPMENT.**

+ ---
+
:::tip NOTE
Dataset Management
:::

+ ---
+
## Create dataset

```python

#### permission

+ Specifies who can access the dataset to create. You can set it only to `"me"` for now.

#### chunk_method, `str`

+ The chunking method of the dataset to create. Available options:
+
+ - `"naive"`: General (default)
+ - `"manual"`: Manual
+ - `"qa"`: Q&A
+ - `"table"`: Table
+ - `"paper"`: Paper
+ - `"book"`: Book
+ - `"laws"`: Laws
+ - `"presentation"`: Presentation
+ - `"picture"`: Picture
+ - `"one"`: One
+ - `"knowledge_graph"`: Knowledge Graph
+ - `"email"`: Email

#### parser_config

- `chunk_token_count`: Defaults to `128`.
- `layout_recognize`: Defaults to `True`.
+ - `delimiter`: Defaults to `"\n!?。;!?"`.
- `task_page_size`: Defaults to `12`.

### Returns

from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
+ dataset = rag_object.create_dataset(name="kb_1")
```

---
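The creation options above can be combined in a single call. A minimal sketch, assuming `create_dataset` accepts `chunk_method` and a plain-dict `parser_config` as keyword arguments; the dataset name and override values are illustrative:

```python
from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")

# Assumption: chunk_method and parser_config can be set at creation time;
# the values below are illustrative overrides of the defaults listed above.
dataset = rag_object.create_dataset(
    name="kb_sketch",
    chunk_method="naive",
    parser_config={"chunk_token_count": 256, "layout_recognize": True},
)
```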
RAGFlow.delete_datasets(ids: list[str] = None)
```

+ Deletes specified datasets or all datasets in the system.

### Parameters

+ #### ids: `list[str]`

+ The IDs of the datasets to delete. Defaults to `None`. If not specified, all datasets in the system will be deleted.

### Returns

### Examples

```python
+ rag_object.delete_datasets(ids=["id_1","id_2"])
```

---

#### page: `int`

+ Specifies the page on which the datasets will be displayed. Defaults to `1`.

#### page_size: `int`

+ The number of datasets on each page. Defaults to `1024`.

+ #### orderby: `str`

+ The field by which datasets should be sorted. Available options:
+
+ - `"create_time"` (default)
+ - `"update_time"`

#### desc: `bool`

#### id: `str`

+ The ID of the dataset to retrieve. Defaults to `None`.

#### name: `str`

+ The name of the dataset to retrieve. Defaults to `None`.

### Returns

+ - Success: A list of `DataSet` objects.
- Failure: `Exception`.

### Examples

#### List all datasets

```python
+ for dataset in rag_object.list_datasets():
+     print(dataset)
```
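A hedged sketch combining the pagination and sorting parameters described above; attribute access on the returned `DataSet` objects is an assumption:

```python
from ragflow import RAGFlow

rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")

# List the first page of datasets, most recently updated first.
for dataset in rag_object.list_datasets(page=1, page_size=50, orderby="update_time", desc=True):
    print(dataset.name)
```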

#### Retrieve a dataset by ID
DataSet.update(update_message: dict)
```

+ Updates configurations for the current dataset.

### Parameters

#### update_message: `dict[str, str|int]`, *Required*

+ A dictionary representing the attributes to update, with the following keys:
+
- `"name"`: `str` The name of the dataset to update.
+ - `"embedding_model"`: `str` The embedding model name to update.
  - Ensure that `"chunk_count"` is `0` before updating `"embedding_model"`.
+ - `"chunk_method"`: `str` The chunking method for the dataset. Available options:
  - `"naive"`: General
  - `"manual"`: Manual
  - `"qa"`: Q&A

```python
from ragflow import RAGFlow

+ rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
+ dataset = rag_object.list_datasets(name="kb_name")[0]
dataset.update({"embedding_model":"BAAI/bge-zh-v1.5", "chunk_method":"manual"})
```

### Parameters

+ #### document_list: `list[dict]`, *Required*

A list of dictionaries representing the documents to upload, each containing the following keys:

#### update_message: `dict[str, str|dict[]]`, *Required*

+ A dictionary representing the attributes to update, with the following keys:
+
- `"name"`: `str` The name of the document to update.
- `"parser_config"`: `dict[str, Any]` The parsing configuration for the document:
  - `"chunk_token_count"`: Defaults to `128`.
```python
from ragflow import RAGFlow

+ rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
+ dataset = rag_object.list_datasets(id='id')
+ dataset = dataset[0]
doc = dataset.list_documents(id="wdfxb5t547d")
doc = doc[0]
doc.update({"parser_config": {"chunk_token_count": 256}, "chunk_method": "manual"})

Document.download() -> bytes
```

+ Downloads the current document.

### Returns
### Parameters

+ #### id: `str`

The ID of the document to retrieve. Defaults to `None`.

+ #### keywords: `str`

The keywords to match document titles. Defaults to `None`.

+ #### offset: `int`

+ The starting index for the documents to retrieve. Typically used in conjunction with `limit`. Defaults to `0`.

+ #### limit: `int`

+ The maximum number of documents to retrieve. Defaults to `1024`. A value of `-1` indicates that all documents should be returned.

+ #### orderby: `str`

+ The field by which documents should be sorted. Available options:

+ - `"create_time"` (default)
- `"update_time"`

+ #### desc: `bool`

Indicates whether the retrieved documents should be sorted in descending order. Defaults to `True`.

A `Document` object contains the following attributes:

+ - `id`: The document ID. Defaults to `""`.
+ - `name`: The document name. Defaults to `""`.
+ - `thumbnail`: The thumbnail image of the document. Defaults to `None`.
+ - `knowledgebase_id`: The dataset ID associated with the document. Defaults to `None`.
+ - `chunk_method`: The chunk method name. Defaults to `""`. ?????naive??????
+ - `parser_config`: `ParserConfig` Configuration object for the parser. Defaults to `{"pages": [[1, 1000000]]}`.
+ - `source_type`: The source type of the document. Defaults to `"local"`.
+ - `type`: Type or category of the document???????????. Defaults to `""`.
+ - `created_by`: `str` The creator of the document. Defaults to `""`.
+ - `size`: `int` The document size in bytes. Defaults to `0`.
+ - `token_count`: `int` The number of tokens in the document. Defaults to `0`.
+ - `chunk_count`: `int` The number of chunks that the document is split into. Defaults to `0`.
+ - `progress`: `float` The current processing progress as a percentage. Defaults to `0.0`.
+ - `progress_msg`: `str` A message indicating the current progress status. Defaults to `""`.
+ - `process_begin_at`: `datetime` The start time of document processing. Defaults to `None`.
+ - `process_duation`: `float` Duration of the processing in seconds or minutes.??????? Defaults to `0.0`.
+ - `run`: `str` ?????????????????? Defaults to `"0"`.
+ - `status`: `str` ??????????????????? Defaults to `"1"`.

### Examples
dataset = rag.create_dataset(name="kb_1")

filename1 = "~/ragflow.txt"
+ blob = open(filename1 , "rb").read()
+ dataset.upload_documents([{"name":filename1,"blob":blob}])
+ for doc in dataset.list_documents(keywords="rag", offset=0, limit=12):
+     print(doc)
```

---
DataSet.delete_documents(ids: list[str] = None)
```

+ Deletes documents by ID.
+
+ ### Parameters
+
+ #### ids: `list[str]`
+
+ The IDs of the documents to delete. Defaults to `None`. If not specified, all documents in the dataset will be deleted.

### Returns

```python
from ragflow import RAGFlow

+ rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
+ dataset = rag_object.list_datasets(name="kb_1")
+ dataset = dataset[0]
+ dataset.delete_documents(ids=["id_1","id_2"])
```

---

### Parameters

+ #### document_ids: `list[str]`, *Required*

The IDs of the documents to parse.

### Examples

```python
+ rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
+ dataset = rag_object.create_dataset(name="dataset_name")
documents = [
    {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
    {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
    {'name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
]
+ dataset.upload_documents(documents)
+ documents = dataset.list_documents(keywords="test")
+ ids = []
for document in documents:
    ids.append(document.id)
+ dataset.async_parse_documents(ids)
+ print("Async bulk parsing initiated.")
```

---
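Since a `Document` exposes `progress` and `progress_msg` (see the attribute list earlier), parsing kicked off with `async_parse_documents` can be polled. A sketch under stated assumptions: `dataset` and `ids` come from the async-parse example above, and `progress` runs from `0.0` to `1.0`; the loop and sleep interval are illustrative, not part of the API:

```python
import time

# Assumes `dataset` and `ids` as in the async-parse example above.
dataset.async_parse_documents(ids)
while True:
    docs = dataset.list_documents(keywords="test")
    # Assumption: progress is a 0.0-1.0 fraction; adjust if it is 0-100.
    if all(doc.progress >= 1.0 for doc in docs):
        break
    for doc in docs:
        print(doc.name, doc.progress, doc.progress_msg)
    time.sleep(2)
print("Parsing finished.")
```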

### Parameters

+ #### document_ids: `list[str]`, *Required*

+ The IDs of the documents for which parsing should be stopped.

### Returns

### Examples

```python
+ rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
+ dataset = rag_object.create_dataset(name="dataset_name")
documents = [
    {'name': 'test1.txt', 'blob': open('./test_data/test1.txt',"rb").read()},
    {'name': 'test2.txt', 'blob': open('./test_data/test2.txt',"rb").read()},
    {'name': 'test3.txt', 'blob': open('./test_data/test3.txt',"rb").read()}
]
+ dataset.upload_documents(documents)
+ documents = dataset.list_documents(keywords="test")
+ ids = []
for document in documents:
    ids.append(document.id)
+ dataset.async_parse_documents(ids)
+ print("Async bulk parsing initiated.")
+ dataset.async_cancel_parse_documents(ids)
+ print("Async bulk parsing cancelled.")
```

---
Document.list_chunks(keywords: str = None, offset: int = 0, limit: int = -1, id : str = None) -> list[Chunk]
```

+ Retrieves a list of document chunks.
+
### Parameters

+ #### keywords: `str`

List chunks whose name has the given keywords. Defaults to `None`

+ #### offset: `int`

+ The starting index for the chunks to retrieve. Defaults to `1`

#### limit

+ The maximum number of chunks to retrieve. Default: `30`

#### id

### Returns

+ - Success: A list of `Chunk` objects.
+ - Failure: `Exception`.

### Examples

```python
from ragflow import RAGFlow

+ rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
+ dataset = rag_object.list_datasets(id="123")
+ dataset = dataset[0]
+ dataset.async_parse_documents(["wdfxb5t547d"])
+ doc = dataset.list_documents(id="wdfxb5t547d")[0]
+ for chunk in doc.list_chunks(keywords="rag", offset=0, limit=12):
+     print(chunk)
```

## Add chunk

#### content: *Required*

+ The text content of the chunk.

#### important_keywords :`list[str]`

Document.delete_chunks(chunk_ids: list[str])
```

+ Deletes chunks by ID.
+
### Parameters

+ #### chunk_ids: `list[str]`

+ The IDs of the chunks to delete. Defaults to `None`. If not specified, all chunks of the current document will be deleted.

### Returns
Chunk.update(update_message: dict)
```

+ Updates content or configurations for the current chunk.

### Parameters

#### update_message: `dict[str, str|list[str]|int]` *Required*

+ A dictionary representing the attributes to update, with the following keys:
+
- `"content"`: `str` Content of the chunk.
- `"important_keywords"`: `list[str]` A list of key terms to attach to the chunk.
+ - `"available"`: `int` The chunk's availability status in the dataset. Value options:
  - `0`: Unavailable
  - `1`: Available
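A minimal sketch of the update call, assuming `chunk` was obtained from `Document.list_chunks()` or `Document.add_chunk()`:

```python
# `chunk` is assumed to come from Document.list_chunks() or Document.add_chunk().
chunk.update({
    "content": "Updated chunk content.",
    "important_keywords": ["ragflow", "chunk"],
    "available": 1,  # 1: Available; 0: Unavailable
})
```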

#### offset: `int`

+ The starting index for the documents to retrieve. Defaults to `0`??????.

#### limit: `int`

+ The maximum number of chunks to retrieve. Defaults to `6`.

#### Similarity_threshold: `float`
Chat Assistant Management
:::

+ ---
+
## Create chat assistant

```python

Chat.update(update_message: dict)
```

+ Updates configurations for the current chat assistant.

### Parameters

+ #### update_message: `dict[str, str|list[str]|dict[]]`, *Required*
+
+ A dictionary representing the attributes to update, with the following keys:

- `"name"`: `str` The name of the chat assistant to update.
- `"avatar"`: `str` Base64 encoding of the avatar. Defaults to `""`
+ - `"knowledgebases"`: `list[str]` The datasets to update.
- `"llm"`: `dict` The LLM settings:
  - `"model_name"`, `str` The chat model name.
  - `"temperature"`, `float` Controls the randomness of the model's predictions.

## Delete chats

```python
RAGFlow.delete_chats(ids: list[str] = None)
```

+ Deletes chat assistants by ID.
+
### Parameters

+ #### ids: `list[str]`

+ The IDs of the chat assistants to delete. Defaults to `None`. If not specified, all chat assistants in the system will be deleted.

### Returns

#### page

+ Specifies the page on which the chat assistants will be displayed. Defaults to `1`.

#### page_size

+ The number of chat assistants on each page. Defaults to `1024`.

#### order_by

```python
from ragflow import RAGFlow

+ rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
+ for assistant in rag_object.list_chats():
    print(assistant)
```
Chat-session APIs
:::

+ ---
+
## Create session

```python

Session.update(update_message: dict)
```

+ Updates the current session name.

### Parameters

#### update_message: `dict[str, Any]`, *Required*

+ A dictionary representing the attributes to update, with only one key:
+
- `"name"`: `str` The name of the session to update.

### Returns

#### page

+ Specifies the page on which the sessions will be displayed. Defaults to `1`.

#### page_size

+ The number of sessions on each page. Defaults to `1024`.

#### orderby

+ The field by which sessions should be sorted. Available options:

+ - `"create_time"` (default)
- `"update_time"`

#### desc

```python
from ragflow import RAGFlow

+ rag_object = RAGFlow(api_key="<YOUR_API_KEY>", base_url="http://<YOUR_BASE_URL>:9380")
+ assistant = rag_object.list_chats(name="Miss R")
assistant = assistant[0]
for session in assistant.list_sessions():
    print(session)
Chat.delete_sessions(ids:list[str] = None)
```

+ Deletes sessions by ID.

### Parameters

+ #### ids: `list[str]`

+ The IDs of the sessions to delete. Defaults to `None`. If not specified, all sessions associated with the current chat assistant will be deleted.

### Returns