File size: 2,745 Bytes
87a665c
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
import requests
import logging, os
from typing import Iterator, List, Union
from urllib.parse import quote

from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document
from open_webui.utils.headers import include_user_info_headers

log = logging.getLogger(__name__)


class ExternalDocumentLoader(BaseLoader):
    def __init__(
        self,
        file_path,
        url: str,
        api_key: str,
        mime_type=None,
        user=None,
        **kwargs,
    ) -> None:
        self.url = url
        self.api_key = api_key

        self.file_path = file_path
        self.mime_type = mime_type

        self.user = user

    def load(self) -> List[Document]:
        with open(self.file_path, 'rb') as f:
            data = f.read()

        headers = {}
        if self.mime_type is not None:
            headers['Content-Type'] = self.mime_type

        if self.api_key is not None:
            headers['Authorization'] = f'Bearer {self.api_key}'

        try:
            headers['X-Filename'] = quote(os.path.basename(self.file_path))
        except Exception:
            pass

        if self.user is not None:
            headers = include_user_info_headers(headers, self.user)

        url = self.url
        if url.endswith('/'):
            url = url[:-1]

        try:
            response = requests.put(f'{url}/process', data=data, headers=headers)
        except Exception as e:
            log.error(f'Error connecting to endpoint: {e}')
            raise Exception(f'Error connecting to endpoint: {e}')

        if response.ok:
            response_data = response.json()
            if response_data:
                if isinstance(response_data, dict):
                    return [
                        Document(
                            page_content=response_data.get('page_content'),
                            metadata=response_data.get('metadata'),
                        )
                    ]
                elif isinstance(response_data, list):
                    documents = []
                    for document in response_data:
                        documents.append(
                            Document(
                                page_content=document.get('page_content'),
                                metadata=document.get('metadata'),
                            )
                        )
                    return documents
                else:
                    raise Exception('Error loading document: Unable to parse content')

            else:
                raise Exception('Error loading document: No content returned')
        else:
            raise Exception(f'Error loading document: {response.status_code} {response.text}')