Spaces:
Paused
Paused
| # 🎯 Data Ingestion System - Final Status | |
| **Dato:** 2025-11-24 kl. 15:26 | |
| **Status:** ✅ KLAR TIL BRUG | |
| --- | |
| ## 📊 Tilgængelige Data Kilder | |
| | Data Kilde | Status | Beskrivelse | | |
| |------------|--------|-------------| | |
| | 📁 **Lokale Filer** | ✅ | Scannerdokumenter, Downloads, Desktop | | |
| | 🌐 **Browser Historik** | ✅ | Chrome og Edge besøgshistorik | | |
| | 📧 **Outlook Email** | ✅ | Læser fra JSON eksport | | |
| | 📅 **Aula Kalender** | ⚠️ | Udkommenteret (mangler dependencies) | | |
| | ☁️ **Google Drive** | 🚧 | Klar til implementation | | |
| --- | |
| ## 🚀 Sådan bruges systemet | |
| ### **Via MCP Tool (Anbefalet)** | |
| ```typescript | |
| // Start fuld data indsamling | |
| await mcpClient.callTool({ | |
| tool: 'ingestion.start', | |
| payload: {} | |
| }); | |
| // Tjek status | |
| await mcpClient.callTool({ | |
| tool: 'ingestion.status', | |
| payload: {} | |
| }); | |
| ``` | |
| ### **Via REST API** | |
| ```bash | |
| # Start ingestion | |
| curl -X POST http://localhost:3001/api/mcp/route \ | |
| -H "Content-Type: application/json" \ | |
| -d '{"tool": "ingestion.start", "payload": {}}' | |
| ``` | |
| ### **Programmatisk** | |
| ```typescript | |
| import { dataIngestionEngine } from './services/ingestion/DataIngestionEngine.js'; | |
| await dataIngestionEngine.ingestAll(); | |
| ``` | |
| --- | |
| ## 📋 Hvad Scanner Systemet? | |
| ### **Lokale Filer** | |
| - **Mapper:** Documents, Downloads, Desktop | |
| - **Filtyper:** .txt, .md, .pdf, .docx, .xlsx, .csv, .json | |
| - **Maks dybde:** 3 niveauer | |
| - **Maks størrelse:** 10MB per fil | |
| - **Ekskluderer:** node_modules, .git, dist, build, $RECYCLE.BIN | |
| ### **Browser Historik** | |
| - **Chrome:** `AppData\Local\Google\Chrome\User Data\Default\History` | |
| - **Edge:** `AppData\Local\Microsoft\Edge\User Data\Default\History` | |
| - **Antal:** Sidste 1000 besøg | |
| ### **Outlook Email** | |
| - **Format:** JSON eksport | |
| - **Placering:** `apps/backend/data/outlook-mails.json` | |
| - **Data:** Subject, afsender, dato, preview, vigtighed | |
| --- | |
| ## 🔧 Outlook Email Eksport | |
| For at bruge Outlook integration: | |
| ### **Option 1: PowerS | |
| hell Script** | |
| ```powershell | |
| # Eksporter emails til JSON | |
| Add-Type -AssemblyName "Microsoft.Office.Interop.Outlook" | |
| $outlook = New-Object -ComObject Outlook.Application | |
| $namespace = $outlook.GetNamespace("MAPI") | |
| $inbox = $namespace.GetDefaultFolder(6) # 6 = Inbox | |
| $emails = @() | |
| foreach ($mail in $inbox.Items | Select-Object -First 100) { | |
| $emails += @{ | |
| id = $mail.EntryID | |
| subject = $mail.Subject | |
| sender = @{ | |
| name = $mail.SenderName | |
| address = $mail.SenderEmailAddress | |
| } | |
| receivedDateTime = $mail.ReceivedTime | |
| bodyPreview = $mail.Body.Substring(0, [Math]::Min(200, $mail.Body.Length)) | |
| importance = $mail.Importance | |
| isRead = $mail.UnRead -eq $false | |
| } | |
| } | |
| $emails | ConvertTo-Json | Out-File "apps/backend/data/outlook-mails.json" | |
| ``` | |
| ### **Option 2: Graph API** | |
| ```bash | |
| # Hent emails via Microsoft Graph | |
| curl -X GET "https://graph.microsoft.com/v1.0/me/messages" \ | |
| -H "Authorization: Bearer YOUR_TOKEN" \ | |
| > apps/backend/data/outlook-mails.json | |
| ``` | |
| --- | |
| ## 📈 Næste Skridt (Autonom Fortsættelse) | |
| Systemet vil nu autonomt: | |
| 1. ✅ **Scanne lokale filer** - Documents, Downloads, Desktop | |
| 2. ✅ **Indlæse browser historik** - Chrome/Edge | |
| 3. ✅ **Læse Outlook emails** - Hvis JSON fil findes | |
| 4. 🔄 **Gemme i database** - Næste fase | |
| 5. 🔄 **Enable semantisk søgning** - Gør data søgbart | |
| 6. 🔄 **Aula integration** - Når dependencies er klar | |
| --- | |
| ## ✅ Test Systemet | |
| Kør en test ingestion: | |
| ```bash | |
| npx tsx -e " | |
| import { dataIngestionEngine } from './apps/backend/src/services/ingestion/DataIngestionEngine.js'; | |
| import { LocalFileScanner } from './apps/backend/src/services/ingestion/LocalFileScanner.js'; | |
| const scanner = new LocalFileScanner({ | |
| rootPaths: ['C:\\\\Users\\\\claus\\\\Desktop'], | |
| extensions: ['.txt', '.md'], | |
| maxDepth: 1 | |
| }); | |
| dataIngestionEngine.registerAdapter(scanner); | |
| await dataIngestionEngine.ingestAll(); | |
| " | |
| ``` | |
| --- | |
| **Status:** ✅ Alle systemer klar | |
| **Total Sources:** 3 aktive (Lokale Filer, Browser, Outlook) | |
| **Backend:** ✅ Kører | |
| **Database:** ✅ Initialiseret | |