from typing import Dict, Any, Optional, List, Union, BinaryIO
from pathlib import Path
import aiofiles
import io
import json
import logging
from ...core.document.processor import ProcessedDocument
from ..client import LatticeClient

class DocumentClient:
    """Document processing client for SDK"""
    
    def __init__(self, client: LatticeClient):
        self.client = client
        self.logger = logging.getLogger("lattice.sdk.document")
    
    async def process_document(
        self,
        file: Union[str, Path, BinaryIO],
        config: Optional[Dict[str, Any]] = None
    ) -> ProcessedDocument:
        """Process a single document given as a path or an open binary file"""
        # Defined before the try block so the finally clause can always check it
        form: Optional[io.BytesIO] = None
        try:
            # Prepare file
            if isinstance(file, (str, Path)):
                file_path = Path(file)
                async with aiofiles.open(file_path, 'rb') as f:
                    file_content = await f.read()
                filename = file_path.name
            else:
                # Caller-supplied file object: read it synchronously
                file_content = file.read()
                filename = getattr(file, 'name', 'document')
            
            # Prepare form data. aiofiles.tempfile.SpooledTemporaryFile is an
            # async context manager and cannot be written or seeked
            # synchronously, so use an in-memory buffer instead.
            form = io.BytesIO(file_content)
            
            files = {
                'file': (filename, form, 'application/octet-stream')
            }
            
            # Add configuration if provided
            data = {}
            if config:
                data['config'] = json.dumps(config)
            
            # Make request
            response = await self.client.post(
                "/api/v1/document/process",
                data=data,
                files=files
            )
            
            return ProcessedDocument(**response['document'])
            
        except Exception as e:
            self.logger.error(f"Document processing failed: {str(e)}")
            raise
        finally:
            if form is not None:
                form.close()
    
    async def batch_process(
        self,
        files: List[Union[str, Path, BinaryIO]]
    ) -> Dict[str, ProcessedDocument]:
        """Batch process documents and return results keyed by filename"""
        # Defined before the try block so the finally clause can always iterate it
        temp_files: List[io.BytesIO] = []
        try:
            upload_files = []
            
            # Prepare files
            for file in files:
                if isinstance(file, (str, Path)):
                    file_path = Path(file)
                    async with aiofiles.open(file_path, 'rb') as f:
                        file_content = await f.read()
                    filename = file_path.name
                else:
                    file_content = file.read()
                    filename = getattr(file, 'name', f'document_{len(upload_files)}')
                
                # Create an in-memory buffer (synchronous file-like object)
                temp_file = io.BytesIO(file_content)
                temp_files.append(temp_file)
                
                upload_files.append(
                    ('files', (filename, temp_file, 'application/octet-stream'))
                )
            
            # Make request
            response = await self.client.post(
                "/api/v1/document/batch",
                files=upload_files
            )
            
            # Process response
            return {
                filename: ProcessedDocument(**doc['document'])
                for filename, doc in response.items()
            }
            
        except Exception as e:
            self.logger.error(f"Batch processing failed: {str(e)}")
            raise
        finally:
            # Clean up in-memory buffers
            for temp_file in temp_files:
                temp_file.close()
    
    async def get_supported_types(self) -> Dict[str, List[str]]:
        """Get supported document types"""
        try:
            response = await self.client.get("/api/v1/document/supported-types")
            return response['supported_types']
        except Exception as e:
            self.logger.error(f"Failed to get supported types: {str(e)}")
            raise
    
    async def validate_config(self, config: Dict[str, Any]) -> bool:
        """Validate document processing configuration"""
        try:
            response = await self.client.get(
                "/api/v1/document/config/validate",
                params={"config": json.dumps(config)}
            )
            return response['valid']
        except Exception as e:
            self.logger.error(f"Config validation failed: {str(e)}")
            raise
    
    async def health_check(self) -> Dict[str, Any]:
        """Check document processor health"""
        try:
            return await self.client.get("/api/v1/document/health")
        except Exception as e:
            self.logger.error(f"Health check failed: {str(e)}")
            raise

# Usage example:
async def example_usage():
    # Initialize client
    client = LatticeClient(api_key="your-api-key")
    
    # Configure document processing
    config = {
        "extract_text": True,
        "extract_metadata": True,
        "chunk_size": 500,
        "chunk_overlap": 50
    }
    
    # Process single document
    doc_result = await client.document.process_document(
        "example.pdf",
        config=config
    )
    
    print(f"Processed document: {doc_result.doc_id}")
    print(f"Number of chunks: {len(doc_result.chunks)}")
    
    # Batch process documents
    files = ["doc1.pdf", "doc2.docx", "doc3.txt"]
    batch_results = await client.document.batch_process(files)
    
    for filename, result in batch_results.items():
        print(f"{filename}: {len(result.chunks)} chunks")
    
    # Check supported types
    supported_types = await client.document.get_supported_types()
    print(f"Supported types: {supported_types}")

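if __name__ == "__main__":
    # This module uses relative imports, so run it as a module inside its
    # package (python -m ...) rather than as a standalone script.
    import asyncio
    asyncio.run(example_usage())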