# 🎯 Data Ingestion System - Final Status

**Date:** 2025-11-24 at 15:26  
**Status:** ✅ READY FOR USE

---

## 📊 Available Data Sources

| Data Source | Status | Description |
|-------------|--------|-------------|
| 📁 **Local Files** | ✅ | Scans Documents, Downloads, Desktop |
| 🌐 **Browser History** | ✅ | Chrome and Edge visit history |
| 📧 **Outlook Email** | ✅ | Reads from a JSON export |
| 📅 **Aula Calendar** | ⚠️ | Commented out (missing dependencies) |
| ☁️ **Google Drive** | 🚧 | Ready for implementation |

---

## 🚀 How to Use the System

### **Via MCP Tool (Recommended)**

```typescript
// Start a full data ingestion run
await mcpClient.callTool({
  tool: 'ingestion.start',
  payload: {}
});

// Check the ingestion status
await mcpClient.callTool({
  tool: 'ingestion.status',
  payload: {}
});
```

### **Via REST API**

```bash
# Start ingestion
curl -X POST http://localhost:3001/api/mcp/route \
  -H "Content-Type: application/json" \
  -d '{"tool": "ingestion.start", "payload": {}}'
```
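The same REST call can be made from TypeScript. The sketch below assumes Node 18+ (for the built-in `fetch`), a backend on `localhost:3001`, and that the route accepts `{ tool, payload }` exactly as in the curl example. `buildMcpRequest` and `callMcpRoute` are hypothetical helper names, not part of the backend, and the response is typed as `unknown` because its shape is not documented here.

```typescript
// Hypothetical helpers (not part of the backend): post { tool, payload } to the
// MCP route, mirroring the curl example. Requires Node 18+ for the global fetch.
const MCP_ROUTE_URL = "http://localhost:3001/api/mcp/route";

interface McpHttpRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build the request separately so it can be inspected without a running backend.
function buildMcpRequest(tool: string, payload: Record<string, unknown> = {}): McpHttpRequest {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ tool, payload }),
  };
}

// Send the request and return the parsed JSON body; the response shape is not
// documented here, so it is typed as unknown.
async function callMcpRoute(tool: string, payload: Record<string, unknown> = {}): Promise<unknown> {
  const res = await fetch(MCP_ROUTE_URL, buildMcpRequest(tool, payload));
  if (!res.ok) {
    throw new Error(`MCP route returned HTTP ${res.status}`);
  }
  return res.json();
}
```

Usage: `await callMcpRoute("ingestion.start")` to start a run, then `await callMcpRoute("ingestion.status")` to check on it.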

### **Programmatically**

```typescript
import { dataIngestionEngine } from './services/ingestion/DataIngestionEngine.js';

await dataIngestionEngine.ingestAll();
```

---

## 📋 What Does the System Scan?

### **Local Files**
- **Folders:** Documents, Downloads, Desktop
- **File types:** .txt, .md, .pdf, .docx, .xlsx, .csv, .json
- **Max depth:** 3 levels
- **Max size:** 10 MB per file
- **Excludes:** node_modules, .git, dist, build, $RECYCLE.BIN
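The limits above can be sketched as a recursive walk with Node's `fs/promises`. This is a hypothetical illustration, not the project's `LocalFileScanner`: the `ScanOptions` field names, `scanFiles`, and the depth convention (the root folder is depth 0, so `maxDepth: 3` descends three folder levels below it) are all assumptions.

```typescript
import { readdir, stat } from "node:fs/promises";
import { extname, join } from "node:path";

// Hypothetical options mirroring the limits listed above.
interface ScanOptions {
  rootPaths: string[];
  extensions: string[]; // with leading dot, e.g. ".md" (compared case-insensitively)
  maxDepth: number; // folder levels below each root (root = depth 0)
  maxSizeBytes: number; // files larger than this are skipped
  excludeDirs: string[]; // folder names that are never entered
}

async function scanFiles(opts: ScanOptions): Promise<string[]> {
  const allowed = new Set(opts.extensions.map((e) => e.toLowerCase()));
  const excluded = new Set(opts.excludeDirs);
  const found: string[] = [];

  async function walk(dir: string, depth: number): Promise<void> {
    // Unreadable or missing folders are skipped instead of aborting the scan.
    const entries = await readdir(dir, { withFileTypes: true }).catch(() => null);
    if (!entries) return;

    for (const entry of entries) {
      const full = join(dir, entry.name);
      if (entry.isDirectory()) {
        // Only descend while below the depth limit and outside excluded folders.
        if (!excluded.has(entry.name) && depth < opts.maxDepth) {
          await walk(full, depth + 1);
        }
      } else if (entry.isFile() && allowed.has(extname(entry.name).toLowerCase())) {
        const info = await stat(full).catch(() => null);
        if (info && info.size <= opts.maxSizeBytes) found.push(full);
      }
    }
  }

  for (const root of opts.rootPaths) await walk(root, 0);
  return found;
}
```

With the defaults listed above this would be called with `maxDepth: 3`, `maxSizeBytes: 10 * 1024 * 1024` (reading "10 MB" as MiB, an assumption) and the five excluded folder names. Because `Dirent.isDirectory()` and `isFile()` both return `false` for symbolic links, links are neither followed nor collected, which also rules out cycles.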

### **Browser History**
- **Chrome:** `AppData\Local\Google\Chrome\User Data\Default\History`
- **Edge:** `AppData\Local\Microsoft\Edge\User Data\Default\History`
- **Count:** Last 1000 visits
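Both `History` files are SQLite databases with the Chromium schema, where each row in `visits` points at a row in `urls` and times are stored as microseconds since 1601-01-01 UTC. The sketch below shows a query for the last 1000 visits and the timestamp conversion; it is not the project's adapter, the names `RECENT_VISITS_SQL`, `webkitToDate` and `toHistoryEntry` are hypothetical, and running the query needs a SQLite driver of your choice. Because the browser typically keeps `History` locked while it runs, adapters usually query a temporary copy of the file.

```typescript
// Chromium-based browsers (Chrome, Edge) store visit times as microseconds
// since 1601-01-01 UTC. This is the offset between that epoch and the Unix epoch.
const WEBKIT_EPOCH_OFFSET_MS = 11_644_473_600_000;

// Current timestamps exceed Number.MAX_SAFE_INTEGER, but the rounding error
// (about a microsecond) is far below the millisecond resolution of Date.
function webkitToDate(webkitMicros: number): Date {
  return new Date(webkitMicros / 1000 - WEBKIT_EPOCH_OFFSET_MS);
}

// The 1000 most recent visits, joined with the URL and page title they refer to.
const RECENT_VISITS_SQL = `
  SELECT urls.url AS url, urls.title AS title, visits.visit_time AS visit_time
  FROM visits
  JOIN urls ON urls.id = visits.url
  ORDER BY visits.visit_time DESC
  LIMIT 1000`;

interface VisitRow {
  url: string;
  title: string;
  visit_time: number;
}

interface HistoryEntry {
  url: string;
  title: string;
  visitedAt: Date;
}

// Convert a raw row (as returned by whichever SQLite driver is used) into an entry.
function toHistoryEntry(row: VisitRow): HistoryEntry {
  return { url: row.url, title: row.title, visitedAt: webkitToDate(row.visit_time) };
}
```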

### **Outlook Email**
- **Format:** JSON export
- **Location:** `apps/backend/data/outlook-mails.json`
- **Data:** Subject, sender, date, preview, importance
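A minimal sketch of a reader for this file, assuming the record fields written by the PowerShell export in Option 1 below. `OutlookMail`, `isOutlookMail` and `readOutlookMails` are hypothetical names, not the backend's actual adapter. It returns an empty list when the file is missing, since emails are only read if the JSON file exists, and it strips a UTF-8 byte-order mark, which PowerShell can write and `JSON.parse` rejects.

```typescript
import { readFile } from "node:fs/promises";

// Hypothetical record shape matching the PowerShell export; the backend's real type may differ.
interface OutlookMail {
  id: string;
  subject: string;
  sender: { name: string; address: string };
  receivedDateTime: string;
  bodyPreview: string;
  importance: string | number;
  isRead: boolean;
}

// Minimal runtime check: keep only records with the fields needed to identify a mail.
function isOutlookMail(value: unknown): value is OutlookMail {
  const m = value as Partial<OutlookMail> | null;
  return (
    typeof m === "object" &&
    m !== null &&
    typeof m.id === "string" &&
    typeof m.sender?.address === "string" &&
    typeof m.receivedDateTime === "string"
  );
}

async function readOutlookMails(path: string): Promise<OutlookMail[]> {
  let text: string;
  try {
    text = await readFile(path, "utf8");
  } catch (err) {
    // A missing export simply means there is no Outlook data yet.
    if ((err as { code?: string }).code === "ENOENT") return [];
    throw err;
  }
  // Strip a leading byte-order mark before parsing.
  const parsed: unknown = JSON.parse(text.replace(/^\uFEFF/, ""));
  // Older exports may contain a single object instead of an array.
  const items: unknown[] = Array.isArray(parsed) ? parsed : [parsed];
  return items.filter(isOutlookMail);
}
```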

---

## 🔧 Outlook Email Export

To use the Outlook integration, export your emails to `apps/backend/data/outlook-mails.json` with one of these options:

### **Option 1: PowerShell Script**
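```powershell
# Export the 100 most recent Inbox emails to JSON.
# Run from the repository root; Out-File does not create the apps/backend/data folder.
# The Outlook COM object is used directly, so no interop assembly has to be loaded.
$outlook = New-Object -ComObject Outlook.Application
$namespace = $outlook.GetNamespace("MAPI")
$inbox = $namespace.GetDefaultFolder(6) # 6 = olFolderInbox

# Keep a reference to the collection: the sort applies only to this Items object.
$items = $inbox.Items
$items.Sort("[ReceivedTime]", $true) # newest first

# Class 43 = olMail; skips meeting requests, delivery reports and other non-mail items.
$emails = foreach ($mail in $items | Where-Object { $_.Class -eq 43 } | Select-Object -First 100) {
    $body = [string]$mail.Body # empty string instead of $null for empty bodies
    [pscustomobject]@{
        id               = $mail.EntryID
        subject          = $mail.Subject
        sender           = [pscustomobject]@{
            name    = $mail.SenderName
            address = $mail.SenderEmailAddress # Exchange senders may give an X.500 address, not SMTP
        }
        receivedDateTime = $mail.ReceivedTime.ToUniversalTime().ToString("o") # ISO 8601, UTC
        bodyPreview      = $body.Substring(0, [Math]::Min(200, $body.Length))
        importance       = [int]$mail.Importance # OlImportance: 0 = low, 1 = normal, 2 = high
        isRead           = -not $mail.UnRead
    }
}

# -InputObject @(...) always yields a JSON array; -Encoding utf8 avoids the UTF-16
# default of Out-File in Windows PowerShell 5.1.
ConvertTo-Json -InputObject @($emails) -Depth 5 |
    Out-File -FilePath "apps/backend/data/outlook-mails.json" -Encoding utf8
```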

### **Option 2: Graph API**
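```bash
# Fetch the 100 most recent emails via Microsoft Graph (without $top, Graph returns 10 per page).
# Single quotes stop the shell from expanding $top, $select and $orderby.
# Note: Graph returns {"value": [...]} and nests the sender as sender.emailAddress,
# so the file differs from the flat array written by Option 1.
curl -s 'https://graph.microsoft.com/v1.0/me/messages?$top=100&$select=id,subject,sender,receivedDateTime,bodyPreview,importance,isRead&$orderby=receivedDateTime%20desc' \
  -H "Authorization: Bearer YOUR_TOKEN" \
  -o apps/backend/data/outlook-mails.json
```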
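Microsoft Graph wraps messages in a `value` array and nests the sender under `sender.emailAddress`, whereas the PowerShell export in Option 1 writes a flat array with `sender.name` and `sender.address`. If the backend expects the flat shape (an assumption; this document does not say), the Graph response could be reshaped before saving. `GraphMessage`, `FlatMail` and `fromGraphResponse` are hypothetical names for this sketch.

```typescript
// Minimal subset of the Microsoft Graph message resource used here.
interface GraphMessage {
  id: string;
  subject: string | null;
  sender?: { emailAddress: { name: string; address: string } }; // drafts may lack a sender
  receivedDateTime: string;
  bodyPreview: string;
  importance: "low" | "normal" | "high";
  isRead: boolean;
}

// Flat shape written by the PowerShell export in Option 1.
interface FlatMail {
  id: string;
  subject: string;
  sender: { name: string; address: string };
  receivedDateTime: string;
  bodyPreview: string;
  importance: string;
  isRead: boolean;
}

// Unwrap the `value` array and flatten each sender; missing values become "".
function fromGraphResponse(response: { value: GraphMessage[] }): FlatMail[] {
  return response.value.map((m) => ({
    id: m.id,
    subject: m.subject ?? "",
    sender: {
      name: m.sender?.emailAddress.name ?? "",
      address: m.sender?.emailAddress.address ?? "",
    },
    receivedDateTime: m.receivedDateTime,
    bodyPreview: m.bodyPreview,
    importance: m.importance,
    isRead: m.isRead,
  }));
}
```

Note that Graph reports importance as a string (`"low"`, `"normal"`, `"high"`), while the PowerShell export writes Outlook's numeric value, so a consumer reading both should accept either form.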

---

## 📈 Next Steps (Autonomous Continuation)

The system will now autonomously:

1. ✅ **Scan local files** - Documents, Downloads, Desktop
2. ✅ **Load browser history** - Chrome/Edge
3. ✅ **Read Outlook emails** - if the JSON file exists
4. 🔄 **Store in the database** - next phase
5. 🔄 **Enable semantic search** - make the data searchable
6. 🔄 **Aula integration** - once the dependencies are ready

---

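## ✅ Test the System

Run a test ingestion from the repository root:

```bash
npx tsx -e "
import { dataIngestionEngine } from './apps/backend/src/services/ingestion/DataIngestionEngine.js';
import { LocalFileScanner } from './apps/backend/src/services/ingestion/LocalFileScanner.js';

// Scan only .txt and .md files on the Desktop (adjust the path to your own profile).
// Forward slashes work on Windows and avoid backslash-escaping differences between shells.
const scanner = new LocalFileScanner({
    rootPaths: ['C:/Users/claus/Desktop'],
    extensions: ['.txt', '.md'],
    maxDepth: 1
});

dataIngestionEngine.registerAdapter(scanner);

// An async wrapper avoids relying on top-level await support in eval mode.
(async () => {
    await dataIngestionEngine.ingestAll();
})().catch((err) => {
    console.error(err);
    process.exitCode = 1;
});
"
```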

---

**Status:** ✅ All systems ready  
**Total Sources:** 3 active (Local Files, Browser, Outlook)  
**Backend:** ✅ Running  
**Database:** ✅ Initialized