# 🎯 Data Ingestion System - Final Status
**Date:** 2025-11-24, 15:26
**Status:** ✅ READY FOR USE
---
## 📊 Available Data Sources
| Data Source | Status | Description |
|-------------|--------|-------------|
| 📁 **Local Files** | ✅ | Scans Documents, Downloads, Desktop |
| 🌐 **Browser History** | ✅ | Chrome and Edge visit history |
| 📧 **Outlook Email** | ✅ | Reads from a JSON export |
| 📅 **Aula Calendar** | ⚠️ | Commented out (missing dependencies) |
| ☁️ **Google Drive** | 🚧 | Ready for implementation |
---
## 🚀 How to Use the System
### **Via MCP Tool (Recommended)**
```typescript
// Start a full data collection run
await mcpClient.callTool({
  tool: 'ingestion.start',
  payload: {}
});

// Check status
await mcpClient.callTool({
  tool: 'ingestion.status',
  payload: {}
});
```
### **Via REST API**
```bash
# Start ingestion
curl -X POST http://localhost:3001/api/mcp/route \
  -H "Content-Type: application/json" \
  -d '{"tool": "ingestion.start", "payload": {}}'
```
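The same REST call can be made from TypeScript. The sketch below is only an illustration: `buildMcpRequest` and `callMcpTool` are hypothetical helper names, the URL and body shape are copied from the curl example, and the response format is not documented here, so it is returned as `unknown`.

```typescript
// Hypothetical helper: builds the same request the curl example sends.
interface McpRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: string };
}

function buildMcpRequest(
  tool: string,
  payload: Record<string, unknown> = {},
  baseUrl = 'http://localhost:3001', // assumed default, taken from the curl example
): McpRequest {
  return {
    url: `${baseUrl}/api/mcp/route`,
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ tool, payload }),
    },
  };
}

// Sends the request with the global fetch (Node 18+) and fails loudly on HTTP errors.
async function callMcpTool(
  tool: string,
  payload: Record<string, unknown> = {},
): Promise<unknown> {
  const { url, init } = buildMcpRequest(tool, payload);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`MCP call ${tool} failed: HTTP ${res.status}`);
  return res.json();
}
```

Keeping the request construction separate from the network call makes the body easy to inspect or log before anything is sent.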
### **Programmatically**
```typescript
import { dataIngestionEngine } from './services/ingestion/DataIngestionEngine.js';
await dataIngestionEngine.ingestAll();
```
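This document only shows the names `registerAdapter` and `ingestAll`; the adapter contract behind them is not described. As a rough mental model, an engine of this kind might look like the sketch below. `IngestionAdapter`, `IngestionResult` and `SketchIngestionEngine` are hypothetical and are not the actual `DataIngestionEngine` implementation.

```typescript
// Hypothetical adapter contract; the real engine's interface may differ.
interface IngestionAdapter {
  name: string;
  fetch(): Promise<unknown[]>;
}

interface IngestionResult {
  source: string;
  count: number;
  error?: string;
}

class SketchIngestionEngine {
  private adapters: IngestionAdapter[] = [];

  registerAdapter(adapter: IngestionAdapter): void {
    this.adapters.push(adapter);
  }

  // Runs every adapter. A failing source is reported in the result
  // instead of being rethrown, so it does not block the other sources.
  async ingestAll(): Promise<IngestionResult[]> {
    const settled = await Promise.allSettled(this.adapters.map((a) => a.fetch()));
    return settled.map((outcome, i) => {
      const source = this.adapters[i].name;
      return outcome.status === 'fulfilled'
        ? { source, count: outcome.value.length }
        : { source, count: 0, error: String(outcome.reason) };
    });
  }
}
```

Using `Promise.allSettled` is a design choice of this sketch: a missing Outlook export, for example, would show up as an error entry while local files and browser history are still ingested.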
---
## 📋 What Does the System Scan?
### **Local Files**
- **Folders:** Documents, Downloads, Desktop
- **File types:** .txt, .md, .pdf, .docx, .xlsx, .csv, .json
- **Max depth:** 3 levels
- **Max size:** 10 MB per file
- **Excludes:** node_modules, .git, dist, build, $RECYCLE.BIN
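The limits listed above can be read as a single per-file predicate. The following is a minimal sketch, not the `LocalFileScanner` code: `SCAN_RULES` and `shouldIngest` are hypothetical names, "10MB" is assumed to mean 10 MiB, and a file directly inside a scan root is assumed to count as depth 1.

```typescript
// Limits copied from the list above; the option names are illustrative.
const SCAN_RULES = {
  extensions: ['.txt', '.md', '.pdf', '.docx', '.xlsx', '.csv', '.json'],
  maxDepth: 3,
  maxSizeBytes: 10 * 1024 * 1024, // assumption: "10MB" means 10 MiB
  excludedDirs: ['node_modules', '.git', 'dist', 'build', '$RECYCLE.BIN'],
};

// relativePath is relative to a scan root and may use "/" or "\" separators.
function shouldIngest(relativePath: string, sizeBytes: number): boolean {
  const parts = relativePath.split(/[\\/]/).filter(Boolean);
  const fileName = parts[parts.length - 1] ?? '';
  const dirs = parts.slice(0, -1);

  // Skip anything that lives below an excluded directory.
  if (dirs.some((d) => SCAN_RULES.excludedDirs.includes(d))) return false;
  // Assumption: a file directly in the root has depth 1.
  if (parts.length > SCAN_RULES.maxDepth) return false;
  if (sizeBytes > SCAN_RULES.maxSizeBytes) return false;

  // Compare extensions case-insensitively; dotfiles like ".git" have none.
  const dot = fileName.lastIndexOf('.');
  const ext = dot > 0 ? fileName.slice(dot).toLowerCase() : '';
  return SCAN_RULES.extensions.includes(ext);
}
```

For example, `shouldIngest('Work\\Report.DOCX', 2048)` passes under these assumptions, while anything under `node_modules` is rejected regardless of type or size.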
### **Browser History**
- **Chrome:** `AppData\Local\Google\Chrome\User Data\Default\History`
- **Edge:** `AppData\Local\Microsoft\Edge\User Data\Default\History`
- **Count:** Last 1,000 visits
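Both `History` files are SQLite databases in the Chromium format, where visit times are stored as microseconds since 1601-01-01 UTC rather than Unix time. The sketch below shows how the paths and timestamps could be handled. The helper names are hypothetical, and the SQL is one plausible query against the standard Chromium `urls`/`visits` tables, not necessarily the one this system runs.

```typescript
// Milliseconds between 1601-01-01 (Chromium's epoch) and 1970-01-01 (Unix epoch).
const CHROMIUM_EPOCH_OFFSET_MS = 11_644_473_600_000;

// Converts a Chromium visit_time (microseconds since 1601) to a JS Date.
function chromiumTimeToDate(microseconds: number): Date {
  return new Date(microseconds / 1000 - CHROMIUM_EPOCH_OFFSET_MS);
}

// Builds the History path from %LOCALAPPDATA%, following the locations listed above.
function historyDbPath(localAppData: string, browser: 'chrome' | 'edge'): string {
  const vendor = browser === 'chrome' ? ['Google', 'Chrome'] : ['Microsoft', 'Edge'];
  return [localAppData, ...vendor, 'User Data', 'Default', 'History'].join('\\');
}

// Assumed query: newest 1000 visits joined with their URL and title.
const LAST_VISITS_SQL = `
  SELECT urls.url, urls.title, visits.visit_time
  FROM visits JOIN urls ON urls.id = visits.url
  ORDER BY visits.visit_time DESC
  LIMIT 1000`;
```

The database is typically locked while the browser is running, so a reader would usually copy the file to a temporary location before opening it.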
### **Outlook Email**
- **Format:** JSON export
- **Location:** `apps/backend/data/outlook-mails.json`
- **Data:** Subject, sender, date, preview, importance
---
## 🔧 Outlook Email Export
To use the Outlook integration:
### **Option 1: PowerShell Script**
```powershell
# Export emails to JSON
Add-Type -AssemblyName "Microsoft.Office.Interop.Outlook"
$outlook = New-Object -ComObject Outlook.Application
$namespace = $outlook.GetNamespace("MAPI")
$inbox = $namespace.GetDefaultFolder(6)  # 6 = Inbox

$emails = @()
foreach ($mail in $inbox.Items | Select-Object -First 100) {
    # Body can be empty; cast to string so Substring never runs on $null
    $body = [string]$mail.Body
    $emails += @{
        id               = $mail.EntryID
        subject          = $mail.Subject
        sender           = @{
            name    = $mail.SenderName
            address = $mail.SenderEmailAddress
        }
        receivedDateTime = $mail.ReceivedTime
        bodyPreview      = $body.Substring(0, [Math]::Min(200, $body.Length))
        importance       = $mail.Importance
        isRead           = -not $mail.UnRead
    }
}

# Write UTF-8 without a BOM; Out-File in Windows PowerShell 5.1 defaults to UTF-16
$json = $emails | ConvertTo-Json -Depth 3
$path = Join-Path (Get-Location) "apps/backend/data/outlook-mails.json"
[System.IO.File]::WriteAllText($path, $json)
```
### **Option 2: Graph API**
```bash
# Fetch emails via Microsoft Graph
# (Graph returns the messages inside a top-level "value" array)
curl -X GET "https://graph.microsoft.com/v1.0/me/messages" \
  -H "Authorization: Bearer YOUR_TOKEN" \
  > apps/backend/data/outlook-mails.json
```
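The two export options do not produce identical JSON. The PowerShell script writes a bare array with `sender.name`/`sender.address` and a numeric Outlook importance (0, 1, 2), whereas Microsoft Graph wraps the messages in `value`, nests the sender under `sender.emailAddress`, and uses the strings `low`, `normal`, `high`. The sketch below shows one way a reader could accept both. `OutlookMail`, `normalizeMail` and `parseOutlookExport` are hypothetical names, and the target shape is an assumption based on the fields in the PowerShell script, not the ingester's confirmed schema.

```typescript
// Assumed common shape, based on the fields written by the PowerShell script.
interface OutlookMail {
  id: string;
  subject: string;
  sender: { name: string; address: string };
  receivedDateTime: string;
  bodyPreview: string;
  importance: 'low' | 'normal' | 'high';
  isRead: boolean;
}

// Outlook's numeric importance: 0 = low, 1 = normal, 2 = high.
const IMPORTANCE_BY_CODE = ['low', 'normal', 'high'] as const;

function toImportance(value: unknown): OutlookMail['importance'] {
  if (typeof value === 'number') return IMPORTANCE_BY_CODE[value] ?? 'normal';
  const text = String(value ?? '').toLowerCase();
  return text === 'low' || text === 'high' ? text : 'normal';
}

function normalizeMail(raw: any): OutlookMail {
  // Graph nests the sender under `emailAddress`; the PowerShell export does not.
  const sender = raw?.sender?.emailAddress ?? raw?.sender ?? {};
  return {
    id: String(raw?.id ?? ''),
    subject: String(raw?.subject ?? ''),
    sender: { name: String(sender.name ?? ''), address: String(sender.address ?? '') },
    receivedDateTime: String(raw?.receivedDateTime ?? ''),
    bodyPreview: String(raw?.bodyPreview ?? ''),
    importance: toImportance(raw?.importance),
    isRead: Boolean(raw?.isRead),
  };
}

function parseOutlookExport(json: string): OutlookMail[] {
  const data = JSON.parse(json);
  // Bare array (PowerShell) or { value: [...] } (Graph); anything else yields no mails.
  const items: unknown[] = Array.isArray(data)
    ? data
    : Array.isArray(data?.value)
      ? data.value
      : [];
  return items.map(normalizeMail);
}
```

Normalizing at read time keeps the rest of the pipeline independent of which export option produced `outlook-mails.json`.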
---
## 📈 Next Steps (Autonomous Continuation)
The system will now autonomously:
1. ✅ **Scan local files** - Documents, Downloads, Desktop
2. ✅ **Load browser history** - Chrome/Edge
3. ✅ **Read Outlook emails** - If the JSON file exists
4. 🔄 **Save to the database** - Next phase
5. 🔄 **Enable semantic search** - Makes the data searchable
6. 🔄 **Aula integration** - Once the dependencies are ready
---
## ✅ Test the System
Run a test ingestion:
```bash
npx tsx -e "
import { dataIngestionEngine } from './apps/backend/src/services/ingestion/DataIngestionEngine.js';
import { LocalFileScanner } from './apps/backend/src/services/ingestion/LocalFileScanner.js';
const scanner = new LocalFileScanner({
  rootPaths: ['C:\\\\Users\\\\claus\\\\Desktop'],
  extensions: ['.txt', '.md'],
  maxDepth: 1
});
dataIngestionEngine.registerAdapter(scanner);
await dataIngestionEngine.ingestAll();
"
```
---
**Status:** ✅ All systems ready
**Total Sources:** 3 active (Local Files, Browser, Outlook)
**Backend:** ✅ Running
**Database:** ✅ Initialized