# 🎯 Data Ingestion System - Final Status

**Date:** 2025-11-24 at 15:26
**Status:** ✅ READY FOR USE

---

## 📊 Available Data Sources

| Data Source | Status | Description |
|-------------|--------|-------------|
| 📁 **Local Files** | ✅ | Scans Documents, Downloads, Desktop |
| 🌐 **Browser History** | ✅ | Chrome and Edge visit history |
| 📧 **Outlook Email** | ✅ | Reads from a JSON export |
| 📅 **Aula Calendar** | ⚠️ | Commented out (missing dependencies) |
| ☁️ **Google Drive** | 🚧 | Ready for implementation |

---

## 🚀 How to Use the System

### **Via MCP Tool (Recommended)**

```typescript
// `mcpClient` is assumed to be an already-connected MCP client instance

// Start a full data ingestion run
await mcpClient.callTool({
  tool: 'ingestion.start',
  payload: {}
});

// Check ingestion status
await mcpClient.callTool({
  tool: 'ingestion.status',
  payload: {}
});
```

### **Via REST API**

```bash
# Start ingestion
curl -X POST http://localhost:3001/api/mcp/route \
  -H "Content-Type: application/json" \
  -d '{"tool": "ingestion.start", "payload": {}}'

# Check status (same route, different tool name)
curl -X POST http://localhost:3001/api/mcp/route \
  -H "Content-Type: application/json" \
  -d '{"tool": "ingestion.status", "payload": {}}'
```

### **Programmatically**

```typescript
// Path is relative to apps/backend/src
import { dataIngestionEngine } from './services/ingestion/DataIngestionEngine.js';

// Run ingestion across the registered data sources
await dataIngestionEngine.ingestAll();
```

---

## 📋 What Does the System Scan?
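The Local Files limits listed under this heading (allowed extensions, 3-level depth, 10MB cap, excluded folders) amount to a per-file predicate. The TypeScript sketch below shows one way to express them. `ScanRules`, `defaultRules` and `shouldScan` are hypothetical names, not the `LocalFileScanner` API, and the sketch assumes that "depth" counts the folders between a root path and the file and that 10MB means 10 MiB.

```typescript
import * as path from "node:path";

// Hypothetical rule set mirroring the Local Files limits in this document
interface ScanRules {
  extensions: string[];   // lower-case, including the leading dot
  maxDepth: number;       // max folders between a root path and the file (assumed meaning of "depth")
  maxBytes: number;       // per-file size cap
  excludedDirs: string[]; // folder names skipped at any level
}

const defaultRules: ScanRules = {
  extensions: [".txt", ".md", ".pdf", ".docx", ".xlsx", ".csv", ".json"],
  maxDepth: 3,
  maxBytes: 10 * 1024 * 1024, // assumes "10MB" means 10 MiB
  excludedDirs: ["node_modules", ".git", "dist", "build", "$RECYCLE.BIN"],
};

// Returns true when a file under `root` passes every rule
function shouldScan(root: string, filePath: string, sizeBytes: number, rules: ScanRules = defaultRules): boolean {
  const rel = path.relative(root, filePath);
  // Reject the root itself and anything outside it
  if (rel === "" || rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) return false;

  // Every path segment except the last one is a folder
  const folders = rel.split(path.sep).slice(0, -1);
  if (folders.length > rules.maxDepth) return false;
  if (folders.some((name) => rules.excludedDirs.includes(name))) return false;

  // Extension match is case-insensitive (REPORT.PDF counts as .pdf)
  if (!rules.extensions.includes(path.extname(filePath).toLowerCase())) return false;
  return sizeBytes <= rules.maxBytes;
}
```

In a real scanner, excluded folders would normally be pruned during the directory walk rather than re-checked for every file; the per-file form just keeps the rules easy to read and test.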
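### **Local Files**

- **Folders:** Documents, Downloads, Desktop
- **File types:** .txt, .md, .pdf, .docx, .xlsx, .csv, .json
- **Max depth:** 3 levels
- **Max size:** 10MB per file
- **Excludes:** node_modules, .git, dist, build, $RECYCLE.BIN

### **Browser History**

Paths are relative to the Windows user profile folder (`%USERPROFILE%`).

- **Chrome:** `AppData\Local\Google\Chrome\User Data\Default\History`
- **Edge:** `AppData\Local\Microsoft\Edge\User Data\Default\History`
- **Limit:** The 1,000 most recent visits

### **Outlook Email**

- **Format:** JSON export
- **Location:** `apps/backend/data/outlook-mails.json`
- **Data:** Subject, sender, date, preview, importance

---

## 🔧 Outlook Email Export

To use the Outlook integration, export your mail to `apps/backend/data/outlook-mails.json` with one of these options:

### **Option 1: PowerShell Script**

```powershell
# Export the 100 most recent Inbox emails to JSON.
# Run from the repository root so the relative output path resolves.
# COM automation via -ComObject does not need the Interop assembly loaded with Add-Type.
$outlook   = New-Object -ComObject Outlook.Application
$namespace = $outlook.GetNamespace("MAPI")
$inbox     = $namespace.GetDefaultFolder(6)   # 6 = olFolderInbox

# Keep one Items reference: each access to $inbox.Items returns a new, unsorted collection
$items = $inbox.Items
$items.Sort("[ReceivedTime]", $true)          # newest first

$emails = @()
# Class 43 = olMail; skips meeting requests, delivery reports and other non-mail items
foreach ($mail in ($items | Where-Object { $_.Class -eq 43 } | Select-Object -First 100)) {
    $body = if ($mail.Body) { $mail.Body } else { "" }   # Body can be empty
    $emails += @{
        id               = $mail.EntryID
        subject          = $mail.Subject
        sender           = @{
            name    = $mail.SenderName
            address = $mail.SenderEmailAddress
        }
        receivedDateTime = $mail.ReceivedTime.ToString("o")   # ISO 8601 string, not a serializer-specific date format
        bodyPreview      = $body.Substring(0, [Math]::Min(200, $body.Length))
        importance       = $mail.Importance                    # OlImportance: 0 = low, 1 = normal, 2 = high
        isRead           = -not $mail.UnRead
    }
}

# -InputObject keeps a JSON array even when only one mail is exported.
# -Encoding utf8 avoids the UTF-16 default of Out-File in Windows PowerShell 5.1 (which still adds a BOM).
ConvertTo-Json -InputObject $emails -Depth 4 |
    Out-File -FilePath "apps/backend/data/outlook-mails.json" -Encoding utf8
```

### **Option 2: Graph API**

```bash
# Fetch emails via Microsoft Graph (the token needs the Mail.Read permission).
# Single quotes stop the shell from expanding $top and $select.
curl -s 'https://graph.microsoft.com/v1.0/me/messages?$top=100&$select=id,subject,sender,receivedDateTime,bodyPreview,importance,isRead' \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -o apps/backend/data/outlook-mails.json
# Note: Graph wraps results as { "value": [...] } and nests the sender as
# sender.emailAddress.{name,address}, unlike the PowerShell export above.
```

---

## 📈 Next Steps (Autonomous Continuation)

The system will now autonomously:

1. ✅ **Scan local files** - Documents, Downloads, Desktop
2. ✅ **Load browser history** - Chrome/Edge
3. ✅ **Read Outlook emails** - if the JSON file exists
4. 🔄 **Save to database** - next phase
5. 🔄 **Enable semantic search** - make the data searchable
6. 🔄 **Aula integration** - once its dependencies are in place

---

## ✅ Test the System

Run a test ingestion from the repository root:

```bash
npx tsx -e "
(async () => {
  // Dynamic imports inside an async function avoid relying on top-level await in eval mode
  const { dataIngestionEngine } = await import('./apps/backend/src/services/ingestion/DataIngestionEngine.js');
  const { LocalFileScanner } = await import('./apps/backend/src/services/ingestion/LocalFileScanner.js');

  // Only .txt and .md files on the Desktop, maxDepth 1 (forward slashes work in Node on Windows)
  const scanner = new LocalFileScanner({
    rootPaths: ['C:/Users/claus/Desktop'],
    extensions: ['.txt', '.md'],
    maxDepth: 1
  });

  dataIngestionEngine.registerAdapter(scanner);
  await dataIngestionEngine.ingestAll();
})().catch((err) => { console.error(err); process.exit(1); });
"
```

---

**Status:** ✅ All systems ready
**Total Sources:** 3 active (Local Files, Browser, Outlook)
**Backend:** ✅ Running
**Database:** ✅ Initialized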
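---

The two Outlook export options do not write the same JSON shape: the PowerShell script produces a bare array with `sender.name`/`sender.address` and numeric importance, while Microsoft Graph returns `{ "value": [...] }` with `sender.emailAddress.name`/`.address` and string importance values. This document does not say whether the backend reader already handles both. As a minimal sketch, assuming a hypothetical `parseOutlookExport` helper and `OutlookMail` record type (neither is part of the existing code), a normalizer could accept either shape:

```typescript
// Hypothetical normalized record, using the field names of the PowerShell export
interface OutlookMail {
  id: string;
  subject: string;
  sender: { name: string; address: string };
  receivedDateTime: string; // ISO 8601, or "" if the date could not be parsed
  bodyPreview: string;
  importance: "low" | "normal" | "high";
  isRead: boolean;
}

type RawRecord = Record<string, unknown>;

// Outlook's OlImportance values 0, 1, 2 map onto Graph's importance strings
const IMPORTANCE = ["low", "normal", "high"] as const;

function isRecord(v: unknown): v is RawRecord {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function toImportance(v: unknown): OutlookMail["importance"] {
  if (typeof v === "number" && Number.isInteger(v) && v >= 0 && v <= 2) return IMPORTANCE[v];
  const s = typeof v === "string" ? v.toLowerCase() : "";
  return IMPORTANCE.find((level) => level === s) ?? "normal";
}

function toIsoDate(v: unknown): string {
  if (typeof v !== "string") return "";
  // Windows PowerShell 5.1 serializes DateTime values as "/Date(<ms since epoch>)/"
  const legacy = /^\/Date\((-?\d+)\)\/$/.exec(v);
  const d = legacy ? new Date(Number(legacy[1])) : new Date(v);
  return Number.isNaN(d.getTime()) ? "" : d.toISOString();
}

function toSender(r: RawRecord): OutlookMail["sender"] {
  // PowerShell: sender = { name, address }; Graph: sender = { emailAddress: { name, address } }
  const rawSender = r.sender;
  const s: RawRecord = isRecord(rawSender) ? rawSender : {};
  const rawEmail = s.emailAddress;
  const e: RawRecord = isRecord(rawEmail) ? rawEmail : s;
  return { name: String(e.name ?? ""), address: String(e.address ?? "") };
}

function parseOutlookExport(text: string): OutlookMail[] {
  // Strip a UTF-8 byte order mark, which JSON.parse rejects
  const data: unknown = JSON.parse(text.replace(/^\uFEFF/, ""));

  // Accept a bare array (PowerShell), a Graph page { value: [...] }, or a single object
  let items: unknown[];
  if (Array.isArray(data)) items = data;
  else if (isRecord(data) && Array.isArray(data.value)) items = data.value;
  else items = isRecord(data) ? [data] : [];

  return items.filter(isRecord).map((r) => ({
    id: String(r.id ?? ""),
    subject: String(r.subject ?? ""),
    sender: toSender(r),
    receivedDateTime: toIsoDate(r.receivedDateTime),
    bodyPreview: String(r.bodyPreview ?? ""),
    importance: toImportance(r.importance),
    isRead: r.isRead === true,
  }));
}
```

In the backend this would sit directly after reading the file, for example `parseOutlookExport(fs.readFileSync(filePath, "utf8"))`, so the rest of the pipeline only ever sees one record shape.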