init
README.md CHANGED
@@ -51,34 +51,49 @@ The SOCAR Historical Documents AI System is a sophisticated document intelligenc
## System Architecture

[Removed in this commit: old lines 54-81, the opening fence and body of the previous fenced block under this heading. The removed text was not preserved in this capture.]

```mermaid
graph TB
    subgraph "SOCAR AI System"
        subgraph "Processing Layer"
            OCR[OCR Engine<br/>VLM-Based Text Extraction<br/>87.75% Accuracy]
            RAG[RAG Engine<br/>Semantic Search + LLM<br/>4.0s Response Time]
        end

        subgraph "API Layer"
            API[FastAPI REST API<br/>Async Architecture<br/>POST /ocr, /llm]
        end

        subgraph "Infrastructure Layer"
            Azure[Azure OpenAI<br/>Llama-4-Maverick-17B<br/>VLM + LLM Inference]
            Pinecone[Pinecone Vector DB<br/>2,100 Vectors<br/>1024 Dimensions]
            PyMuPDF[PyMuPDF<br/>PDF Processing<br/>Image Extraction]
        end
    end

    User([User]) -->|Upload PDF| API
    User -->|Ask Question| API

    API -->|PDF to Image| OCR
    API -->|Query| RAG

    OCR -->|Images| Azure
    RAG -->|Embedding| Azure
    RAG -->|Search| Pinecone

    Azure -->|Text| OCR
    Azure -->|Answer| RAG

    OCR -->|Parse PDF| PyMuPDF

    API -->|Response| User

    style OCR fill:#8b5cf6,stroke:#7c3aed,stroke-width:2px,color:#fff
    style RAG fill:#10b981,stroke:#059669,stroke-width:2px,color:#fff
    style API fill:#3b82f6,stroke:#2563eb,stroke-width:2px,color:#fff
    style Azure fill:#f59e0b,stroke:#d97706,stroke-width:2px,color:#fff
    style Pinecone fill:#ec4899,stroke:#db2777,stroke-width:2px,color:#fff
    style PyMuPDF fill:#6366f1,stroke:#4f46e5,stroke-width:2px,color:#fff
    style User fill:#64748b,stroke:#475569,stroke-width:2px,color:#fff
```
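
The question path in the diagram (`User -> API -> RAG -> Embedding / Search -> Answer`) can be illustrated with a minimal, offline sketch. Everything here is an assumption made for illustration: `Chunk`, `search`, `answer`, `toy_embed`, `toy_generate`, and the sample sentences are hypothetical names and data, not the project's code. The brute-force cosine search stands in for the Pinecone lookup, and the 3-dimensional toy vectors stand in for the 1024-dimensional embeddings.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Chunk:
    """One indexed piece of document text with its embedding (hypothetical type)."""
    text: str
    vector: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index: list[Chunk], query_vector: list[float], top_k: int = 2) -> list[Chunk]:
    """Stand-in for the 'RAG -> Search' edge: brute-force nearest neighbours."""
    ranked = sorted(index, key=lambda c: cosine(c.vector, query_vector), reverse=True)
    return ranked[:top_k]


def answer(question: str, index: list[Chunk], embed, generate, top_k: int = 2) -> str:
    """Follow the question path: embed the query, retrieve context, generate a reply."""
    hits = search(index, embed(question), top_k)
    context = "\n".join(chunk.text for chunk in hits)
    return generate(question, context)


# Toy stand-ins so the sketch runs without any network access. A real deployment
# would call the hosted embedding/LLM endpoints and the vector database instead.
KEYWORDS = ("well", "pipeline", "refinery")


def toy_embed(text: str) -> list[float]:
    """Count keyword occurrences; a placeholder for a real embedding model."""
    lowered = text.lower()
    return [float(lowered.count(word)) for word in KEYWORDS]


def toy_generate(question: str, context: str) -> str:
    """Echo the retrieved context; a placeholder for the LLM call."""
    return f"Q: {question}\nContext:\n{context}"


SAMPLE_TEXTS = [
    "Well A was drilled offshore.",
    "The pipeline crosses the region.",
    "The refinery processes crude.",
]
INDEX = [Chunk(text, toy_embed(text)) for text in SAMPLE_TEXTS]

if __name__ == "__main__":
    print(answer("Which well was drilled?", INDEX, toy_embed, toy_generate, top_k=1))
```

Passing `embed` and `generate` as parameters keeps the retrieval logic separate from any particular model or vector store, so the stubs can be swapped for real clients without touching `search` or `answer`.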

**Data Flow**: