widgettdc-api / docs /technical /DATA_INGESTION_GUIDE_DA.md
Kraft102's picture
fix: sql.js Docker/Alpine compatibility layer for PatternMemory and FailureMemory
5a81b95
# 🎯 Data Ingestion System - Final Status
**Dato:** 2025-11-24 kl. 15:26
**Status:** ✅ KLAR TIL BRUG
---
## 📊 Tilgængelige Data Kilder
| Data Kilde | Status | Beskrivelse |
|------------|--------|-------------|
| 📁 **Lokale Filer** | ✅ | Scannerdokumenter, Downloads, Desktop |
| 🌐 **Browser Historik** | ✅ | Chrome og Edge besøgshistorik |
| 📧 **Outlook Email** | ✅ | Læser fra JSON eksport |
| 📅 **Aula Kalender** | ⚠️ | Udkommenteret (mangler dependencies) |
| ☁️ **Google Drive** | 🚧 | Klar til implementation |
---
## 🚀 Sådan bruges systemet
### **Via MCP Tool (Anbefalet)**
```typescript
// Start fuld data indsamling
await mcpClient.callTool({
tool: 'ingestion.start',
payload: {}
});
// Tjek status
await mcpClient.callTool({
tool: 'ingestion.status',
payload: {}
});
```
### **Via REST API**
```bash
# Start ingestion
curl -X POST http://localhost:3001/api/mcp/route \
-H "Content-Type: application/json" \
-d '{"tool": "ingestion.start", "payload": {}}'
```
### **Programmatisk**
```typescript
import { dataIngestionEngine } from './services/ingestion/DataIngestionEngine.js';
await dataIngestionEngine.ingestAll();
```
---
## 📋 Hvad Scanner Systemet?
### **Lokale Filer**
- **Mapper:** Documents, Downloads, Desktop
- **Filtyper:** .txt, .md, .pdf, .docx, .xlsx, .csv, .json
- **Maks dybde:** 3 niveauer
- **Maks størrelse:** 10MB per fil
- **Ekskluderer:** node_modules, .git, dist, build, $RECYCLE.BIN
### **Browser Historik**
- **Chrome:** `AppData\Local\Google\Chrome\User Data\Default\History`
- **Edge:** `AppData\Local\Microsoft\Edge\User Data\Default\History`
- **Antal:** Sidste 1000 besøg
### **Outlook Email**
- **Format:** JSON eksport
- **Placering:** `apps/backend/data/outlook-mails.json`
- **Data:** Subject, afsender, dato, preview, vigtighed
---
## 🔧 Outlook Email Eksport
For at bruge Outlook integration:
### **Option 1: PowerS
hell Script**
```powershell
# Eksporter emails til JSON
Add-Type -AssemblyName "Microsoft.Office.Interop.Outlook"
$outlook = New-Object -ComObject Outlook.Application
$namespace = $outlook.GetNamespace("MAPI")
$inbox = $namespace.GetDefaultFolder(6) # 6 = Inbox
$emails = @()
foreach ($mail in $inbox.Items | Select-Object -First 100) {
$emails += @{
id = $mail.EntryID
subject = $mail.Subject
sender = @{
name = $mail.SenderName
address = $mail.SenderEmailAddress
}
receivedDateTime = $mail.ReceivedTime
bodyPreview = $mail.Body.Substring(0, [Math]::Min(200, $mail.Body.Length))
importance = $mail.Importance
isRead = $mail.UnRead -eq $false
}
}
$emails | ConvertTo-Json | Out-File "apps/backend/data/outlook-mails.json"
```
### **Option 2: Graph API**
```bash
# Hent emails via Microsoft Graph
curl -X GET "https://graph.microsoft.com/v1.0/me/messages" \
-H "Authorization: Bearer YOUR_TOKEN" \
> apps/backend/data/outlook-mails.json
```
---
## 📈 Næste Skridt (Autonom Fortsættelse)
Systemet vil nu autonomt:
1. ✅ **Scanne lokale filer** - Documents, Downloads, Desktop
2. ✅ **Indlæse browser historik** - Chrome/Edge
3. ✅ **Læse Outlook emails** - Hvis JSON fil findes
4. 🔄 **Gemme i database** - Næste fase
5. 🔄 **Enable semantisk søgning** - Gør data søgbart
6. 🔄 **Aula integration** - Når dependencies er klar
---
## ✅ Test Systemet
Kør en test ingestion:
```bash
npx tsx -e "
import { dataIngestionEngine } from './apps/backend/src/services/ingestion/DataIngestionEngine.js';
import { LocalFileScanner } from './apps/backend/src/services/ingestion/LocalFileScanner.js';
const scanner = new LocalFileScanner({
rootPaths: ['C:\\\\Users\\\\claus\\\\Desktop'],
extensions: ['.txt', '.md'],
maxDepth: 1
});
dataIngestionEngine.registerAdapter(scanner);
await dataIngestionEngine.ingestAll();
"
```
---
**Status:** ✅ Alle systemer klar
**Total Sources:** 3 aktive (Lokale Filer, Browser, Outlook)
**Backend:** ✅ Kører
**Database:** ✅ Initialiseret