
🎯 Data Ingestion System - Final Status

Date: 2025-11-24, 15:26
Status: ✅ READY FOR USE


📊 Available Data Sources

| Data Source | Status | Description |
|---|---|---|
| 📁 Local Files | ✅ | Scans Documents, Downloads, Desktop |
| 🌐 Browser History | ✅ | Chrome and Edge visit history |
| 📧 Outlook Email | ✅ | Reads from a JSON export |
| 📅 Aula Calendar | ⚠️ | Commented out (missing dependencies) |
| ☁️ Google Drive | 🚧 | Ready for implementation |

🚀 How to Use the System

Via MCP Tool (Recommended)

// Start a full data ingestion run
await mcpClient.callTool({
  tool: 'ingestion.start',
  payload: {}
});

// Check status
await mcpClient.callTool({
  tool: 'ingestion.status',
  payload: {}
});

Via REST API

# Start ingestion
curl -X POST http://localhost:3001/api/mcp/route \
  -H "Content-Type: application/json" \
  -d '{"tool": "ingestion.start", "payload": {}}'
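The same route can presumably serve the `ingestion.status` tool as well, since both tools go through the MCP router. A minimal TypeScript sketch of building that request follows; the helper name `buildMcpRequest` is hypothetical, and it assumes the endpoint accepts the same `{ tool, payload }` body shown in the curl example.

```typescript
// Hypothetical helper: builds the request for the MCP route used in the curl example.
interface McpRouteBody {
  tool: string;
  payload: Record<string, unknown>;
}

function buildMcpRequest(tool: string, payload: Record<string, unknown> = {}) {
  const body: McpRouteBody = { tool, payload };
  return {
    // URL taken from the curl example; adjust host and port to your setup.
    url: 'http://localhost:3001/api/mcp/route',
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    },
  };
}

const statusRequest = buildMcpRequest('ingestion.status');
```

The result can be passed straight to `fetch(statusRequest.url, statusRequest.init)`. The shape of the response is not documented here, so inspect it before relying on specific fields.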

Programmatically

import { dataIngestionEngine } from './services/ingestion/DataIngestionEngine.js';

await dataIngestionEngine.ingestAll();

📋 What Does the System Scan?

Local Files

  • Folders: Documents, Downloads, Desktop
  • File types: .txt, .md, .pdf, .docx, .xlsx, .csv, .json
  • Max depth: 3 levels
  • Max size: 10 MB per file
  • Excludes: node_modules, .git, dist, build, $RECYCLE.BIN
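The rules above can be read as a single filter applied to each candidate file. The sketch below is an illustration only, not the actual `LocalFileScanner` logic: the function name `shouldIngest` is hypothetical, and it assumes "depth" means the number of directories between the scan root and the file.

```typescript
const ALLOWED_EXTENSIONS = ['.txt', '.md', '.pdf', '.docx', '.xlsx', '.csv', '.json'];
const EXCLUDED_DIRS = ['node_modules', '.git', 'dist', 'build', '$RECYCLE.BIN'];
const MAX_DEPTH = 3;                      // assumption: directories below the scan root
const MAX_SIZE_BYTES = 10 * 1024 * 1024;  // 10 MB

// relativePath is the file path relative to a scan root, e.g. "notes/todo.md".
function shouldIngest(relativePath: string, sizeBytes: number): boolean {
  const segments = relativePath.split(/[\\/]+/).filter((s) => s.length > 0);
  const fileName = segments.pop();
  if (fileName === undefined) return false;

  // After pop(), the remaining segments are the directories leading to the file.
  if (segments.some((dir) => EXCLUDED_DIRS.indexOf(dir) !== -1)) return false;
  if (segments.length > MAX_DEPTH) return false;
  if (sizeBytes > MAX_SIZE_BYTES) return false;

  const dot = fileName.lastIndexOf('.');
  if (dot <= 0) return false; // no extension, or a dotfile such as ".gitignore"
  return ALLOWED_EXTENSIONS.indexOf(fileName.slice(dot).toLowerCase()) !== -1;
}
```

For example, `shouldIngest('notes/todo.md', 1024)` passes every rule, while a file under `node_modules` or one larger than 10 MB is rejected.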

Browser History

  • Chrome: AppData\Local\Google\Chrome\User Data\Default\History
  • Edge: AppData\Local\Microsoft\Edge\User Data\Default\History
  • Count: last 1000 visits
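Both `History` files are SQLite databases, and Chromium stores timestamps as microseconds since 1601-01-01 UTC rather than as Unix time. The sketch below shows the conversion, plus one plausible query for recent entries. The query is an assumption about how the adapter could read the file (it returns the 1000 most recently visited URLs from the `urls` table, which is not necessarily what the project does), and the constant and function names are hypothetical.

```typescript
// Seconds between 1601-01-01 and 1970-01-01 (UTC).
const WEBKIT_EPOCH_OFFSET_SECONDS = 11644473600;

// Converts a Chromium timestamp (microseconds since 1601-01-01 UTC) to a Date.
// Values exceed Number.MAX_SAFE_INTEGER, so sub-millisecond precision may be lost.
function chromeTimeToDate(microsSince1601: number): Date {
  const unixMillis = microsSince1601 / 1000 - WEBKIT_EPOCH_OFFSET_SECONDS * 1000;
  return new Date(unixMillis);
}

// Assumed query against Chromium's `urls` table; not taken from the project source.
const RECENT_VISITS_SQL = `
  SELECT url, title, visit_count, last_visit_time
  FROM urls
  ORDER BY last_visit_time DESC
  LIMIT 1000
`;
```

A running browser typically keeps the `History` file locked, so copying it to a temporary location before opening it is a common workaround.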

Outlook Email

  • Format: JSON export
  • Location: apps/backend/data/outlook-mails.json
  • Data: subject, sender, date, preview, importance
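A reader for this file could validate each record before ingesting it. The sketch below infers the record shape from the fields listed above and from the export script in this guide; the names `OutlookMail` and `parseOutlookExport` are hypothetical and not taken from the project. It also handles two quirks of PowerShell exports: a leading byte-order mark, and a single email being serialized as an object instead of an array.

```typescript
interface OutlookMail {
  id: string;
  subject: string;
  sender: { name: string; address: string };
  bodyPreview: string;
  receivedDateTime?: unknown; // serialization differs between PowerShell versions
  importance?: unknown;
  isRead?: unknown;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Type guard: checks only the fields this sketch relies on.
function isOutlookMail(value: unknown): value is OutlookMail {
  if (!isRecord(value)) return false;
  const sender = value['sender'];
  return (
    typeof value['id'] === 'string' &&
    typeof value['subject'] === 'string' &&
    typeof value['bodyPreview'] === 'string' &&
    isRecord(sender) &&
    typeof sender['name'] === 'string' &&
    typeof sender['address'] === 'string'
  );
}

function parseOutlookExport(json: string): OutlookMail[] {
  // Strip a UTF-8 BOM, which JSON.parse would otherwise reject.
  const parsed: unknown = JSON.parse(json.replace(/^\uFEFF/, ''));
  // A one-email export may be a bare object rather than an array.
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  return items.filter(isOutlookMail);
}
```

Records that fail the guard are dropped silently here; a real adapter would probably log them instead.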

🔧 Outlook Email Export

To use the Outlook integration:

Option 1: PowerShell Script

# Export emails to JSON
Add-Type -AssemblyName "Microsoft.Office.Interop.Outlook"
$outlook = New-Object -ComObject Outlook.Application
$namespace = $outlook.GetNamespace("MAPI")
$inbox = $namespace.GetDefaultFolder(6) # 6 = Inbox

$emails = @()
foreach ($mail in $inbox.Items | Select-Object -First 100) {
    # Body can be empty; cast to string so Substring does not fail on $null
    $body = [string]$mail.Body
    $emails += @{
        id = $mail.EntryID
        subject = $mail.Subject
        sender = @{
            name = $mail.SenderName
            address = $mail.SenderEmailAddress
        }
        receivedDateTime = $mail.ReceivedTime
        bodyPreview = $body.Substring(0, [Math]::Min(200, $body.Length))
        importance = $mail.Importance
        isRead = -not $mail.UnRead
    }
}

# -Depth 3 leaves margin for the nested sender object;
# -Encoding utf8 avoids the UTF-16 default of Windows PowerShell
$emails | ConvertTo-Json -Depth 3 | Out-File "apps/backend/data/outlook-mails.json" -Encoding utf8

Option 2: Graph API

# Fetch emails via Microsoft Graph
curl -X GET "https://graph.microsoft.com/v1.0/me/messages" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  > apps/backend/data/outlook-mails.json
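Note that the Graph response is not shaped like the PowerShell export: messages are wrapped in a top-level `value` array, and the sender is nested as `sender.emailAddress.name` / `sender.emailAddress.address`. Whether the ingestion reader accepts that shape directly is not stated in this guide, so the sketch below shows one way to flatten it into the PowerShell-style layout. The names `FlatMail` and `flattenGraphResponse` are hypothetical.

```typescript
interface FlatMail {
  id: string;
  subject: string;
  sender: { name: string; address: string };
  bodyPreview: string;
}

type Json = Record<string, unknown>;

function asRecord(value: unknown): Json | undefined {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
    ? (value as Json)
    : undefined;
}

function str(value: unknown): string {
  return typeof value === 'string' ? value : '';
}

// Unwraps { value: [...] } and flattens sender.emailAddress into sender.
function flattenGraphResponse(response: unknown): FlatMail[] {
  const wrapper = asRecord(response);
  const list = wrapper ? wrapper['value'] : undefined;
  const items: unknown[] = Array.isArray(list) ? list : [];
  const mails: FlatMail[] = [];
  for (const item of items) {
    const msg = asRecord(item);
    if (!msg) continue; // skip anything that is not a message object
    const sender = asRecord(msg['sender']);
    const emailAddress = asRecord(sender ? sender['emailAddress'] : undefined);
    mails.push({
      id: str(msg['id']),
      subject: str(msg['subject']),
      sender: {
        name: str(emailAddress ? emailAddress['name'] : undefined),
        address: str(emailAddress ? emailAddress['address'] : undefined),
      },
      bodyPreview: str(msg['bodyPreview']),
    });
  }
  return mails;
}
```

Missing string fields become empty strings in this sketch, which keeps the output uniform but hides gaps in the source data.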

📈 Next Steps (Autonomous Continuation)

The system will now autonomously:

  1. Scan local files - Documents, Downloads, Desktop
  2. Load browser history - Chrome/Edge
  3. Read Outlook emails - if the JSON file exists
  4. 🔄 Store in the database - next phase
  5. 🔄 Enable semantic search - makes the data searchable
  6. 🔄 Aula integration - once the dependencies are ready

✅ Test the System

Run a test ingestion:

npx tsx -e "
import { dataIngestionEngine } from './apps/backend/src/services/ingestion/DataIngestionEngine.js';
import { LocalFileScanner } from './apps/backend/src/services/ingestion/LocalFileScanner.js';

const scanner = new LocalFileScanner({
    rootPaths: ['C:\\\\Users\\\\claus\\\\Desktop'],
    extensions: ['.txt', '.md'],
    maxDepth: 1
});

dataIngestionEngine.registerAdapter(scanner);
await dataIngestionEngine.ingestAll();
"

Status: ✅ All systems ready
Total Sources: 3 active (Local Files, Browser, Outlook)
Backend: ✅ Running
Database: ✅ Initialized