🎯 Data Ingestion System - Final Status
Date: 2025-11-24, 15:26
Status: ✅ READY FOR USE
📊 Available Data Sources
| Data Source | Status | Description |
|---|---|---|
| 📁 Local Files | ✅ | Scans Documents, Downloads, Desktop |
| 🌐 Browser History | ✅ | Chrome and Edge visit history |
| 📧 Outlook Email | ✅ | Reads from a JSON export |
| 📅 Aula Calendar | ⚠️ | Commented out (missing dependencies) |
| ☁️ Google Drive | 🚧 | Ready for implementation |
🚀 How to Use the System
Via MCP Tool (Recommended)
// Start a full data collection run
await mcpClient.callTool({
  tool: 'ingestion.start',
  payload: {}
});
// Check status
await mcpClient.callTool({
  tool: 'ingestion.status',
  payload: {}
});
Via REST API
# Start ingestion
curl -X POST http://localhost:3001/api/mcp/route \
-H "Content-Type: application/json" \
-d '{"tool": "ingestion.start", "payload": {}}'
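The same request can be sent from TypeScript. The sketch below mirrors the curl call; the helper names `buildToolRequest` and `callTool` are illustrative, and it assumes the endpoint replies with JSON, which this document does not confirm.

```typescript
// Endpoint taken from the curl example.
const MCP_ROUTE_URL = "http://localhost:3001/api/mcp/route";

interface McpRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the HTTP request for a tool call without sending it.
function buildToolRequest(
  tool: string,
  payload: Record<string, unknown> = {}
): McpRequest {
  return {
    url: MCP_ROUTE_URL,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ tool, payload }),
  };
}

// Hypothetical helper: send the request and return the parsed reply.
// Assumes a JSON response; the reply shape is not documented here.
async function callTool(
  tool: string,
  payload: Record<string, unknown> = {}
): Promise<unknown> {
  const { url, method, headers, body } = buildToolRequest(tool, payload);
  const response = await fetch(url, { method, headers, body });
  if (!response.ok) {
    throw new Error(`MCP route returned HTTP ${response.status}`);
  }
  return response.json();
}
```

With the backend running, `await callTool("ingestion.start")` followed by `await callTool("ingestion.status")` would correspond to the two MCP tool calls shown earlier.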
Programmatically
import { dataIngestionEngine } from './services/ingestion/DataIngestionEngine.js';
await dataIngestionEngine.ingestAll();
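For orientation, the `registerAdapter` and `ingestAll` calls used in this document suggest an adapter-registry design. The sketch below shows how such an engine might be structured. Every name and shape in it (`SourceAdapter`, `collect`, `SketchIngestionEngine`, the result object) is hypothetical and may differ from the real `DataIngestionEngine`.

```typescript
// Hypothetical shapes; the real engine and adapter interfaces may differ.
interface IngestedItem {
  source: string;
  title: string;
}

interface SourceAdapter {
  name: string;
  collect(): Promise<IngestedItem[]>;
}

interface IngestionResult {
  items: IngestedItem[];
  failed: string[];
}

class SketchIngestionEngine {
  private adapters: SourceAdapter[] = [];

  registerAdapter(adapter: SourceAdapter): void {
    this.adapters.push(adapter);
  }

  // Run every adapter in turn; a failing source is recorded
  // by name and does not stop the remaining sources.
  async ingestAll(): Promise<IngestionResult> {
    const items: IngestedItem[] = [];
    const failed: string[] = [];
    for (const adapter of this.adapters) {
      try {
        items.push(...(await adapter.collect()));
      } catch {
        failed.push(adapter.name);
      }
    }
    return { items, failed };
  }
}
```

Isolating failures per adapter is a design choice of this sketch: it lets a missing Outlook export or a locked browser database degrade one source without aborting the whole run.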
📋 What Does the System Scan?
Local Files
- Folders: Documents, Downloads, Desktop
- File types: .txt, .md, .pdf, .docx, .xlsx, .csv, .json
- Max depth: 3 levels
- Max size: 10 MB per file
- Excludes: node_modules, .git, dist, build, $RECYCLE.BIN
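The rules above can be read as a single predicate applied to each candidate file. The following is a minimal sketch under stated assumptions: the constant and function names are illustrative rather than the actual `LocalFileScanner` options, a file directly inside a scan root is counted as depth 1, and directory names are compared case-insensitively.

```typescript
// Scan rules from the list above; names are illustrative only.
const ALLOWED_EXTENSIONS = [".txt", ".md", ".pdf", ".docx", ".xlsx", ".csv", ".json"];
const EXCLUDED_DIRS = ["node_modules", ".git", "dist", "build", "$RECYCLE.BIN"];
const MAX_DEPTH = 3;
const MAX_SIZE_BYTES = 10 * 1024 * 1024; // 10 MB

// relativePath is the file's path relative to a scan root,
// for example "notes/2025/todo.md" or "notes\\todo.md".
function shouldIngest(relativePath: string, sizeBytes: number): boolean {
  const segments = relativePath.split(/[\\/]+/).filter((s) => s.length > 0);
  if (segments.length === 0) return false;

  const fileName = segments[segments.length - 1] ?? "";
  const dirs = segments.slice(0, -1);

  // Assumption: a file directly inside the root is at depth 1.
  if (segments.length > MAX_DEPTH) return false;

  // Assumption: excluded folder names match case-insensitively.
  const excluded = EXCLUDED_DIRS.map((d) => d.toLowerCase());
  if (dirs.some((dir) => excluded.includes(dir.toLowerCase()))) return false;

  if (sizeBytes > MAX_SIZE_BYTES) return false;

  const dot = fileName.lastIndexOf(".");
  const extension = dot >= 0 ? fileName.slice(dot).toLowerCase() : "";
  return ALLOWED_EXTENSIONS.includes(extension);
}
```

Checking depth and excluded folders before the extension keeps the cheap rejections first, which matters when a scan root contains large dependency trees.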
Browser History
- Chrome: AppData\Local\Google\Chrome\User Data\Default\History
- Edge: AppData\Local\Microsoft\Edge\User Data\Default\History
- Count: last 1000 visits
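Both History files are SQLite databases, and Chromium-based browsers store visit times as microseconds since 1601-01-01 UTC rather than as Unix time. The sketch below shows that conversion, which is needed before visits can be sorted alongside other sources. The function names are illustrative; how the ingestion adapter actually reads the file is not shown in this document.

```typescript
// Milliseconds between 1601-01-01 (Chromium epoch) and 1970-01-01 (Unix epoch).
const EPOCH_OFFSET_MS = 11644473600000;

// Convert a Chromium timestamp (microseconds since 1601-01-01 UTC) to a Date.
// Values exceed 2^53, so a plain number can be off by about a microsecond,
// which is irrelevant at millisecond resolution.
function chromeTimeToDate(chromeMicros: number): Date {
  return new Date(Math.floor(chromeMicros / 1000) - EPOCH_OFFSET_MS);
}

// Inverse conversion, useful for building a "visited after" cutoff.
function dateToChromeTime(date: Date): number {
  return (date.getTime() + EPOCH_OFFSET_MS) * 1000;
}
```

For example, `chromeTimeToDate(row.last_visit_time)` would turn a raw column value into a `Date`, where `row` is a hypothetical result row from a query against the History database.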
Outlook Email
- Format: JSON export
- Location: apps/backend/data/outlook-mails.json
- Data: subject, sender, date, preview, importance
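A typed view of one exported record makes the expected JSON shape explicit. The field names below follow the PowerShell export script in this document; the `OutlookMail` interface and the `parseOutlookExport` helper are illustrative names, not the backend's actual types. The parser also tolerates two PowerShell quirks: a leading byte order mark, and a bare object instead of an array when only one mail was exported.

```typescript
interface OutlookMail {
  id: string;
  subject: string;
  sender: { name: string; address: string };
  receivedDateTime: string;
  bodyPreview: string;
  importance: string | number;
  isRead: boolean;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Checks the fields this sketch relies on; it is not a full schema validation.
function isOutlookMail(value: unknown): value is OutlookMail {
  if (!isRecord(value)) return false;
  const sender = value["sender"];
  if (!isRecord(sender)) return false;
  return (
    typeof value["id"] === "string" &&
    typeof value["subject"] === "string" &&
    typeof sender["name"] === "string" &&
    typeof sender["address"] === "string" &&
    typeof value["isRead"] === "boolean"
  );
}

function parseOutlookExport(jsonText: string): OutlookMail[] {
  // Strip a byte order mark, which PowerShell may write at the start of the file.
  const parsed: unknown = JSON.parse(jsonText.replace(/^\uFEFF/, ""));
  // ConvertTo-Json emits a single object, not an array, for one piped item.
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  // Records that do not match the expected shape are dropped.
  return items.filter(isOutlookMail);
}
```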
🔧 Outlook Email Export
To use the Outlook integration:
**Option 1: PowerShell Script**
# Export the first 100 Inbox items to JSON
Add-Type -AssemblyName "Microsoft.Office.Interop.Outlook"
$outlook = New-Object -ComObject Outlook.Application
$namespace = $outlook.GetNamespace("MAPI")
$inbox = $namespace.GetDefaultFolder(6) # 6 = Inbox
$emails = @()
foreach ($mail in $inbox.Items | Select-Object -First 100) {
  $emails += @{
    id = $mail.EntryID
    subject = $mail.Subject
    sender = @{
      name = $mail.SenderName
      address = $mail.SenderEmailAddress
    }
    receivedDateTime = $mail.ReceivedTime
    # Keep at most the first 200 characters of the body as a preview
    bodyPreview = $mail.Body.Substring(0, [Math]::Min(200, $mail.Body.Length))
    importance = $mail.Importance
    isRead = $mail.UnRead -eq $false
  }
}
# Write UTF-8 explicitly; Out-File in Windows PowerShell defaults to UTF-16
$emails | ConvertTo-Json | Out-File "apps/backend/data/outlook-mails.json" -Encoding utf8
**Option 2: Graph API**
# Fetch emails via Microsoft Graph
curl -X GET "https://graph.microsoft.com/v1.0/me/messages" \
-H "Authorization: Bearer YOUR_TOKEN" \
> apps/backend/data/outlook-mails.json
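Note that the Graph response is not shaped like the PowerShell export: Graph wraps the messages in a `value` array and nests the sender under `sender.emailAddress`. Whether the ingestion adapter accepts that shape directly is not stated here, so the sketch below shows one way to flatten it first. The type and function names are illustrative.

```typescript
// Subset of the Microsoft Graph message resource used in this sketch.
interface GraphMessage {
  id: string;
  subject: string | null;
  sender?: { emailAddress?: { name?: string; address?: string } };
  receivedDateTime: string;
  bodyPreview: string;
  importance: string;
  isRead: boolean;
}

interface GraphMessageList {
  value: GraphMessage[];
}

// Flat record matching the fields written by the PowerShell export script.
interface FlatMail {
  id: string;
  subject: string;
  sender: { name: string; address: string };
  receivedDateTime: string;
  bodyPreview: string;
  importance: string;
  isRead: boolean;
}

function flattenGraphMessages(response: GraphMessageList): FlatMail[] {
  return response.value.map((message) => ({
    id: message.id,
    subject: message.subject ?? "",
    sender: {
      // Some messages carry no sender; fall back to empty strings.
      name: message.sender?.emailAddress?.name ?? "",
      address: message.sender?.emailAddress?.address ?? "",
    },
    receivedDateTime: message.receivedDateTime,
    bodyPreview: message.bodyPreview,
    importance: message.importance,
    isRead: message.isRead,
  }));
}
```

Serializing the result with `JSON.stringify(flattenGraphMessages(response), null, 2)` would produce a file in the same flat layout as Option 1.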
📈 Next Steps (Autonomous Continuation)
The system will now autonomously:
- ✅ Scan local files - Documents, Downloads, Desktop
- ✅ Load browser history - Chrome/Edge
- ✅ Read Outlook emails - if the JSON file exists
- 🔄 Store in the database - next phase
- 🔄 Enable semantic search - makes the data searchable
- 🔄 Integrate Aula - once its dependencies are ready
✅ Test the System
Run a test ingestion:
npx tsx -e "
import { dataIngestionEngine } from './apps/backend/src/services/ingestion/DataIngestionEngine.js';
import { LocalFileScanner } from './apps/backend/src/services/ingestion/LocalFileScanner.js';
const scanner = new LocalFileScanner({
rootPaths: ['C:\\\\Users\\\\claus\\\\Desktop'],
extensions: ['.txt', '.md'],
maxDepth: 1
});
dataIngestionEngine.registerAdapter(scanner);
await dataIngestionEngine.ingestAll();
"
Status: ✅ All systems ready
Total Sources: 3 active (Local Files, Browser, Outlook)
Backend: ✅ Running
Database: ✅ Initialized