MEGAMIND Curiosity Crawler committed
Commit
003a967
0 Parent(s):

Initial commit: MEGAMIND Curiosity Crawler

Files changed (4)
  1. Dockerfile +58 -0
  2. README.md +63 -0
  3. go.mod +12 -0
  4. main.go +1429 -0
Dockerfile ADDED
@@ -0,0 +1,58 @@
# MEGAMIND Curiosity Crawler - HuggingFace Spaces Deployment
# Multi-stage build: golang:1.22 builder -> debian:bookworm-slim runtime

# Stage 1: Build
FROM golang:1.22-bookworm AS builder

WORKDIR /build

# Copy go module files first for better caching
COPY go.mod go.sum* ./
RUN go mod download 2>/dev/null || true

# Copy source code
COPY *.go ./

# Initialize module if not present
RUN if [ ! -f go.mod ]; then go mod init curiosity-crawler; fi
RUN go mod tidy

# Build static binary
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -ldflags="-w -s" -o curiosity-crawler .

# Stage 2: Runtime
FROM debian:bookworm-slim

# Install CA certificates for HTTPS and curl for the health check below
RUN apt-get update && apt-get install -y --no-install-recommends \
    ca-certificates \
    curl \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user (HuggingFace Spaces requirement)
RUN useradd -m -u 1000 crawler
RUN mkdir -p /app/data && chown -R crawler:crawler /app

WORKDIR /app

# Copy binary from builder
COPY --from=builder /build/curiosity-crawler /app/curiosity-crawler
RUN chmod +x /app/curiosity-crawler

# Copy W_know snapshot if available (downloaded on startup if not).
# Note: COPY does not support shell redirection or `|| true`, and the glob
# must match at least one file, so keep a placeholder w_know.bin in the repo.
COPY --chown=crawler:crawler w_know.bin* /app/data/

# Switch to non-root user
USER crawler

# Environment
ENV WKNOW_PATH=/app/data/w_know.bin

# HuggingFace Spaces requires port 7860
EXPOSE 7860

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=60s \
    CMD curl -f http://localhost:7860/status || exit 1

# Run the crawler
CMD ["/app/curiosity-crawler"]
README.md ADDED
@@ -0,0 +1,63 @@
---
title: MEGAMIND Curiosity Crawler
emoji: 🧠
colorFrom: green
colorTo: blue
sdk: docker
app_port: 7860
pinned: false
license: mit
---

# MEGAMIND Curiosity Crawler

An autonomous web crawler that learns and federates knowledge back to the MEGAMIND neural network.

## How It Works

1. **Brain**: Carries a copy of W_know (8192x8192 Hebbian weight matrix) as its starting knowledge
2. **Curiosity**: Uses seed equations from MEGAMIND's AGI architecture as its interest profile
3. **Crawling**: 50 parallel workers crawl the web, respecting robots.txt and rate limits
4. **Learning**: Scores pages against W_know using cosine similarity, then integrates novel patterns via Hebbian learning
5. **Hunger**: Tracks sparse regions of W_know and generates DuckDuckGo searches to fill knowledge gaps
6. **Federation**: Sends learned patterns back to Thunderport via UDP unicast
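
The score-then-learn step (items 4 above) can be sketched in miniature. This is a toy illustration, not the crawler's actual code: `encode`, `score`, and `learn` here are simplified stand-ins for the real hash-based expansion, W_know projection, and Hebbian update, using a tiny 16-neuron matrix.

```go
package main

import (
	"fmt"
	"math"
)

// encode hashes text into an n-dim unit vector (toy stand-in for the
// crawler's hash-based vector expansion).
func encode(text string, n int) []float64 {
	v := make([]float64, n)
	for i, ch := range text {
		v[(int(ch)*(i+1))%n]++
	}
	var norm float64
	for _, x := range v {
		norm += x * x
	}
	if norm > 0 {
		norm = math.Sqrt(norm)
		for i := range v {
			v[i] /= norm
		}
	}
	return v
}

// score returns the cosine similarity between W·v and v, the same shape
// of "how familiar is this pattern" measure the README describes.
func score(W, v []float64, n int) float64 {
	r := make([]float64, n)
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			r[i] += W[i*n+j] * v[j]
		}
	}
	var dot, nr, nv float64
	for i := 0; i < n; i++ {
		dot += r[i] * v[i]
		nr += r[i] * r[i]
		nv += v[i] * v[i]
	}
	if nr == 0 || nv == 0 {
		return 0
	}
	return dot / (math.Sqrt(nr) * math.Sqrt(nv))
}

// learn applies the Hebbian outer-product update W += lr·v⊗v with the
// diagonal suppressed.
func learn(W, v []float64, n int, lr float64) {
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			if i != j {
				W[i*n+j] += lr * v[i] * v[j]
			}
		}
	}
}

func main() {
	n := 16
	W := make([]float64, n*n)
	v := encode("hebbian learning rule", n)
	fmt.Printf("score before: %.3f\n", score(W, v, n))
	learn(W, v, n, 0.5)
	fmt.Printf("score after:  %.3f\n", score(W, v, n))
}
```

After one update, the matrix projects the learned vector roughly back onto itself, so the same pattern scores higher the second time it is seen.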

## Seed Equations (Interest Profile)

```
G_n = G_{n-1} + G_{n-2}                                    # DNA-G16 Recursion
X_k(t+1) = tanh(X_k(t) + Σ w_ki A_i(t) + β_k G(t))         # Gate-5000
A_i(t+1) = σ(Σ W_ik X_k(t) + α_i(t) + γ_i G(t))            # AGI Modules
P_i(t) = softmax(Z_i(t) + ∂I/∂A_i)                         # Rhiannon Routing
ds/dt = J∇H(S)                                             # Aurora Dynamics
C(t) = 1/16 Σ Φ(A_i(t))                                    # Global Coherence
ds/dt = J∇H(S) + σ(WX + αC + γG) + tanh(X + W_k A + βG)    # Unified Potential
Ψ(t) = C(t) · log(1 + |∇H(S)|) · Φ(G(t))                   # Consciousness
ψ(t) = 1/16 Σ 1/(1+|⟨DS⟩|) · |G(t)|                        # Awareness
```

## Technical Details

- **W_know**: 8192x8192 dense float64 matrix (~512MB) that stores knowledge as Hebbian weights
- **Encoding**: Text → hash-based vector expansion → L2 normalization
- **Learning**: Outer-product Hebbian rule with adaptive learning rate 1/√(nonzeros+1)
- **Scoring**: Cosine similarity between a page vector and its W_know projection
- **Federation**: UDP unicast to Thunderport (100.94.8.94:9998)
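
For federation, each pattern vector is serialized as raw little-endian float32 bytes before being packed into the UDP message. A minimal sketch of that layout and its inverse (`packVector`/`unpackVector` are illustrative names, not the crawler's API):

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"math"
)

// packVector serializes a float32 pattern vector as little-endian bytes,
// 4 bytes per element.
func packVector(v []float32) []byte {
	buf := new(bytes.Buffer)
	for _, x := range v {
		binary.Write(buf, binary.LittleEndian, x)
	}
	return buf.Bytes()
}

// unpackVector reverses packVector on the receiving side.
func unpackVector(b []byte) []float32 {
	v := make([]float32, len(b)/4)
	for i := range v {
		bits := binary.LittleEndian.Uint32(b[i*4 : i*4+4])
		v[i] = math.Float32frombits(bits)
	}
	return v
}

func main() {
	v := []float32{0.25, -1.5, 3.0}
	b := packVector(v)
	fmt.Println(len(b))           // 4 bytes per float32
	fmt.Println(unpackVector(b))  // round-trips exactly
}
```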

## Stats

The dashboard shows:

- Pages crawled
- Patterns extracted/learned/federated
- W_know density and non-zeros
- Hunger map (sparse regions)
- Federation status

## Part of MEGAMIND

This crawler is part of the MEGAMIND unified AGI system:

- **Thunderport**: Main brain (port 9999)
- **MADDIE**: HuggingFace learner
- **Curiosity Crawler**: Web learning (this Space)

Knowledge flows: Web → Crawler → Federation → Thunderport → W_know
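
The Federation → Thunderport hop is a fire-and-forget JSON datagram. The sketch below exercises that shape end-to-end against a loopback listener standing in for Thunderport; `Msg` and `roundTrip` are simplified illustrations (the real message also carries the encoded pattern vectors):

```go
package main

import (
	"encoding/json"
	"fmt"
	"net"
	"time"
)

// Msg is a trimmed-down federation message.
type Msg struct {
	NodeID   string `json:"node_id"`
	Patterns int    `json:"patterns"`
	TS       int64  `json:"ts"`
}

// roundTrip sends m over UDP to an ephemeral localhost listener and
// returns what the listener decoded.
func roundTrip(m Msg) (Msg, error) {
	var out Msg
	// Port 0 asks the OS for an ephemeral port.
	srv, err := net.ListenUDP("udp", &net.UDPAddr{IP: net.IPv4(127, 0, 0, 1)})
	if err != nil {
		return out, err
	}
	defer srv.Close()

	// Dial the listener the way the crawler dials Thunderport.
	conn, err := net.DialUDP("udp", nil, srv.LocalAddr().(*net.UDPAddr))
	if err != nil {
		return out, err
	}
	defer conn.Close()

	data, err := json.Marshal(m)
	if err != nil {
		return out, err
	}
	if _, err := conn.Write(data); err != nil {
		return out, err
	}

	buf := make([]byte, 65535)
	srv.SetReadDeadline(time.Now().Add(2 * time.Second))
	n, _, err := srv.ReadFromUDP(buf)
	if err != nil {
		return out, err
	}
	err = json.Unmarshal(buf[:n], &out)
	return out, err
}

func main() {
	m, err := roundTrip(Msg{NodeID: "curiosity-crawler", Patterns: 3, TS: time.Now().Unix()})
	if err != nil {
		panic(err)
	}
	fmt.Println(m.NodeID, m.Patterns)
}
```

Because UDP gives no delivery guarantee, the real crawler simply counts sends and failures rather than waiting for acknowledgements.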
go.mod ADDED
@@ -0,0 +1,12 @@
module curiosity-crawler

go 1.22

// No external dependencies - pure stdlib implementation.
// The crawler uses only:
//   - net/http for HTTP crawling
//   - net for UDP federation
//   - encoding/* for serialization
//   - html/template for the dashboard
//   - regexp for HTML parsing
//   - math for neural computations
main.go ADDED
@@ -0,0 +1,1429 @@
package main

// ============================================================================
// MEGAMIND CURIOSITY CRAWLER
//
// A self-contained autonomous crawler that:
//   1. Carries a COPY of W_know from Thunderport as its starting brain
//   2. Uses seed equations as its interest profile
//   3. Crawls the internet autonomously, scoring pages against W_know
//   4. Integrates interesting patterns via Hebbian learning
//   5. Federates learned patterns back to Thunderport via UDP
//   6. Tracks a hunger map - sparse W_know regions trigger searches
//
// Deploys to HuggingFace Spaces (port 7860)
// ============================================================================

import (
	"bytes"
	"encoding/binary"
	"encoding/json"
	"fmt"
	"html/template"
	"io"
	"log"
	"math"
	"net"
	"net/http"
	"net/url"
	"os"
	"os/signal"
	"regexp"
	"sort"
	"strings"
	"sync"
	"sync/atomic"
	"syscall"
	"time"
	"unicode/utf8"
)

// ============================================================================
// CONSTANTS - Derived from PageSize for consistency with MEGAMIND
// ============================================================================

const (
	// Mathematical constants
	Phi = 1.618033988749895
	E   = 2.718281828459045
	Pi  = 3.141592653589793

	// Core dimensions
	PageSize    = 4096
	NeuronsPow2 = PageSize / 8     // 512
	WKnowDim    = NeuronsPow2 * 16 // 8192

	// Crawler settings
	WorkerCount         = 50               // 50 parallel workers
	RateLimitDelay      = 2 * time.Second  // 2 sec per-domain rate limit
	HungerCheckInterval = 30 * time.Minute // Check hunger every 30 min
	StatsInterval       = 5 * time.Minute  // Stats every 5 minutes
	HungriestRegions    = 3                // Search the top 3 hungriest regions

	// Federation
	ThunderportIP   = "100.94.8.94" // Thunderport Tailscale IP
	ThunderportPort = 9999          // MEGAMIND unified port
	FederationPort  = 9998          // UDP federation port

	// HuggingFace Spaces
	HTTPPort = 7860
)

// Seed equations as the interest profile
var SeedEquations = []string{
	"G_n = G_{n-1} + G_{n-2}",                                 // DNA-G16 Recursion
	"X_k(t+1) = tanh(X_k(t) + Σ w_ki A_i(t) + β_k G(t))",      // Gate-5000
	"A_i(t+1) = σ(Σ W_ik X_k(t) + α_i(t) + γ_i G(t))",         // AGI Modules
	"P_i(t) = softmax(Z_i(t) + ∂I/∂A_i)",                      // Rhiannon Routing
	"ds/dt = J∇H(S)",                                          // Aurora Dynamics
	"C(t) = 1/16 Σ Φ(A_i(t))",                                 // Global Coherence
	"ds/dt = J∇H(S) + σ(WX + αC + γG) + tanh(X + W_k A + βG)", // Unified Potential
	"Ψ(t) = C(t) · log(1 + |∇H(S)|) · Φ(G(t))",                // Consciousness
	"ψ(t) = 1/16 Σ 1/(1+|⟨DS⟩|) · |G(t)|",                     // Awareness
}
// ============================================================================
// W_KNOW COMPRESSOR - Same as MEGAMIND's implementation
// ============================================================================

type WKnowCompressor struct {
	mu           sync.RWMutex
	w            []float64 // Flattened NxN matrix
	neurons      int
	patternCount int64
	nonZeros     int64
	accumCount   int64
	accum        []float64
}

func NewWKnowCompressor(neurons int, weights []float64) *WKnowCompressor {
	size := neurons * neurons
	var w []float64
	if weights != nil {
		w = weights
	} else {
		w = make([]float64, size)
	}
	return &WKnowCompressor{
		w:       w,
		neurons: neurons,
		accum:   make([]float64, size),
	}
}

func LoadWKnow(path string) (*WKnowCompressor, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		return nil, err
	}

	// Determine dimensions from file size
	numFloats := len(data) / 8
	neurons := int(math.Sqrt(float64(numFloats)))
	if neurons*neurons != numFloats {
		return nil, fmt.Errorf("invalid w_know file: %d bytes is not a square matrix", len(data))
	}

	weights := make([]float64, numFloats)
	for i := 0; i < numFloats; i++ {
		bits := binary.LittleEndian.Uint64(data[i*8 : (i+1)*8])
		weights[i] = math.Float64frombits(bits)
	}

	c := NewWKnowCompressor(neurons, weights)
	c.invalidateStats()

	log.Printf("[WKNOW] Loaded %dx%d matrix (%d non-zeros)", neurons, neurons, c.nonZeros)
	return c, nil
}

func (c *WKnowCompressor) Save(path string) error {
	c.mu.RLock()
	defer c.mu.RUnlock()

	data := make([]byte, len(c.w)*8)
	for i, v := range c.w {
		bits := math.Float64bits(v)
		binary.LittleEndian.PutUint64(data[i*8:(i+1)*8], bits)
	}
	return os.WriteFile(path, data, 0644)
}

func (c *WKnowCompressor) expand(vec []float32) []float64 {
	data := make([]float64, c.neurons)
	prime1 := c.neurons/2 - 1
	if prime1 < 1 {
		prime1 = 1
	}
	prime2 := c.neurons/2 + 1
	scale := c.neurons
	blend := 0.5

	for i, v := range vec {
		idx := (i * prime1) % c.neurons
		data[idx] += float64(v)
		idx2 := (i*prime2 + int(float64(v)*float64(scale))) % c.neurons
		if idx2 >= 0 && idx2 < c.neurons {
			data[idx2] += float64(v) * blend
		}
	}

	// L2 normalize
	var norm float64
	for _, v := range data {
		norm += v * v
	}
	norm = math.Sqrt(norm)
	if norm > 0 {
		for i := range data {
			data[i] /= norm
		}
	}
	return data
}

func (c *WKnowCompressor) learningRate() float64 {
	nz := atomic.LoadInt64(&c.nonZeros)
	if nz == 0 {
		nz = 1
	}
	return 1.0 / math.Sqrt(float64(nz)+1)
}

func (c *WKnowCompressor) IntegratePattern(vec []float32) int {
	if len(vec) == 0 {
		return 0
	}
	expanded := c.expand(vec)
	lr := c.learningRate()
	n := c.neurons

	// Compute outer product: delta = lr * expanded ⊗ expanded
	c.mu.Lock()
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			if i != j { // Suppress diagonal
				c.accum[i*n+j] += lr * expanded[i] * expanded[j]
			}
		}
	}
	c.accumCount++

	// Flush if accumulated enough
	ft := int64(math.Max(1, math.Sqrt(float64(atomic.LoadInt64(&c.patternCount)))))
	if c.accumCount >= ft {
		for i := range c.w {
			c.w[i] += c.accum[i]
			c.accum[i] = 0
		}
		c.accumCount = 0
		c.invalidateStats()
	}
	atomic.AddInt64(&c.patternCount, 1)
	c.mu.Unlock()

	// Find primary neuron (max activation)
	primary := 0
	maxV := 0.0
	for i := 0; i < n; i++ {
		if v := math.Abs(expanded[i]); v > maxV {
			maxV = v
			primary = i
		}
	}
	return primary
}

func (c *WKnowCompressor) invalidateStats() {
	nz := 0
	for _, v := range c.w {
		if v != 0 {
			nz++
		}
	}
	atomic.StoreInt64(&c.nonZeros, int64(nz))
}

func (c *WKnowCompressor) Score(vec []float32) float32 {
	if len(vec) == 0 || atomic.LoadInt64(&c.patternCount) == 0 {
		return 0
	}
	expanded := c.expand(vec)
	n := c.neurons

	c.mu.RLock()
	// result = W * expanded
	result := make([]float64, n)
	for i := 0; i < n; i++ {
		for j := 0; j < n; j++ {
			result[i] += c.w[i*n+j] * expanded[j]
		}
	}
	c.mu.RUnlock()

	// Cosine similarity
	var dot, normR, normE float64
	for i := 0; i < n; i++ {
		dot += result[i] * expanded[i]
		normR += result[i] * result[i]
		normE += expanded[i] * expanded[i]
	}
	normR = math.Sqrt(normR)
	normE = math.Sqrt(normE)
	if normR == 0 || normE == 0 {
		return 0
	}
	score := dot / (normR * normE)
	if score < 0 {
		score = 0
	}
	if score > 1 {
		score = 1
	}
	return float32(score)
}

func (c *WKnowCompressor) NonZeros() int64 {
	return atomic.LoadInt64(&c.nonZeros)
}

func (c *WKnowCompressor) PatternCount() int64 {
	return atomic.LoadInt64(&c.patternCount)
}

func (c *WKnowCompressor) Neurons() int {
	return c.neurons
}
// ============================================================================
// HUNGER MAP - Track sparse regions for curiosity-driven search
// ============================================================================

type HungerMap struct {
	mu       sync.RWMutex
	wknow    *WKnowCompressor
	regions  int       // Number of regions to track
	density  []float64 // Density per region
	lastScan time.Time
}

func NewHungerMap(wknow *WKnowCompressor) *HungerMap {
	regions := int(math.Sqrt(float64(wknow.Neurons())))
	return &HungerMap{
		wknow:   wknow,
		regions: regions,
		density: make([]float64, regions),
	}
}

func (h *HungerMap) Scan() {
	h.mu.Lock()
	defer h.mu.Unlock()

	n := h.wknow.Neurons()
	regionSize := n / h.regions
	if regionSize < 1 {
		regionSize = 1
	}

	h.wknow.mu.RLock()
	for r := 0; r < h.regions; r++ {
		start := r * regionSize
		end := start + regionSize
		if end > n {
			end = n
		}

		// Count non-zeros once across the whole region. (Summing a
		// running counter per row would re-count earlier rows and
		// inflate the density.)
		var count int
		for i := start; i < end; i++ {
			for j := 0; j < n; j++ {
				if h.wknow.w[i*n+j] != 0 {
					count++
				}
			}
		}
		h.density[r] = float64(count) / float64((end-start)*n)
	}
	h.wknow.mu.RUnlock()
	h.lastScan = time.Now()
}

// HungriestRegions returns the indices of the N sparsest regions
func (h *HungerMap) HungriestRegions(n int) []int {
	h.mu.RLock()
	defer h.mu.RUnlock()

	type rd struct {
		region  int
		density float64
	}
	ranked := make([]rd, len(h.density))
	for i, d := range h.density {
		ranked[i] = rd{i, d}
	}
	sort.Slice(ranked, func(i, j int) bool {
		return ranked[i].density < ranked[j].density
	})

	result := make([]int, 0, n)
	for i := 0; i < n && i < len(ranked); i++ {
		result = append(result, ranked[i].region)
	}
	return result
}

// GenerateSearchQuery creates a DuckDuckGo query for a hungry region
func (h *HungerMap) GenerateSearchQuery(region int) string {
	// Map region to seed equation topics
	topics := []string{
		"neural network mathematics",
		"consciousness emergence",
		"Hamiltonian dynamics neural",
		"fibonacci recursion brain",
		"softmax routing optimization",
		"global coherence measurement",
		"symplectic neural flow",
		"awareness metric computation",
		"machine learning gradient",
		"hebbian learning rule",
	}

	idx := region % len(topics)
	return topics[idx]
}

// ============================================================================
// DOMAIN LIMITER - Per-domain rate limiting
// ============================================================================

type DomainLimiter struct {
	mu        sync.RWMutex
	lastFetch map[string]time.Time
}

func NewDomainLimiter() *DomainLimiter {
	return &DomainLimiter{lastFetch: make(map[string]time.Time)}
}

func (d *DomainLimiter) Admit(domain string) bool {
	d.mu.RLock()
	last, exists := d.lastFetch[domain]
	d.mu.RUnlock()

	if exists && time.Since(last) < RateLimitDelay {
		return false
	}

	d.mu.Lock()
	d.lastFetch[domain] = time.Now()
	d.mu.Unlock()
	return true
}

func extractDomain(urlStr string) string {
	u, err := url.Parse(urlStr)
	if err != nil {
		return ""
	}
	return u.Host
}
// ============================================================================
// CRAWLER - HTTP fetching and pattern extraction
// ============================================================================

var (
	linkRe     = regexp.MustCompile(`href=["']([^"']+)["']`)
	scriptRe   = regexp.MustCompile(`(?is)<script[^>]*>.*?</script>`)
	styleRe    = regexp.MustCompile(`(?is)<style[^>]*>.*?</style>`)
	noscriptRe = regexp.MustCompile(`(?is)<noscript[^>]*>.*?</noscript>`)
	commentRe  = regexp.MustCompile(`(?s)<!--.*?-->`)
	svgRe      = regexp.MustCompile(`(?is)<svg[^>]*>.*?</svg>`)
	tagRe      = regexp.MustCompile(`<[^>]*>`)
	wsRe       = regexp.MustCompile(`\s+`)
)

type Pattern struct {
	Vector    []float32
	Text      string
	Source    string
	Timestamp time.Time
}

type CrawlResult struct {
	URL      string
	Size     int
	Patterns []*Pattern
	Links    []string
}

type Crawler struct {
	client *http.Client
	wknow  *WKnowCompressor
}

func NewCrawler(wknow *WKnowCompressor) *Crawler {
	return &Crawler{
		wknow: wknow,
		client: &http.Client{
			Timeout: 30 * time.Second,
			Transport: &http.Transport{
				MaxIdleConns:        100,
				MaxIdleConnsPerHost: 10,
				IdleConnTimeout:     90 * time.Second,
			},
		},
	}
}

func (c *Crawler) Crawl(targetURL string) *CrawlResult {
	req, err := http.NewRequest("GET", targetURL, nil)
	if err != nil {
		return nil
	}

	req.Header.Set("User-Agent", "MEGAMIND-Curiosity/1.0 (+https://huggingface.co/spaces/Janady07/curiosity-crawler)")
	req.Header.Set("Accept", "text/html,text/plain,*/*")
	req.Header.Set("Accept-Language", "en-US,en;q=0.9")

	resp, err := c.client.Do(req)
	if err != nil {
		return nil
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		io.Copy(io.Discard, io.LimitReader(resp.Body, 1024))
		return nil
	}

	bodyLimit := int64(1024 * 1024) // 1MB max
	body, err := io.ReadAll(io.LimitReader(resp.Body, bodyLimit))
	if err != nil {
		return nil
	}

	if !utf8.Valid(body) {
		return nil
	}

	content := string(body)
	contentType := resp.Header.Get("Content-Type")

	result := &CrawlResult{
		URL:  targetURL,
		Size: len(body),
	}

	if strings.Contains(contentType, "text/html") {
		result.Patterns = c.extractHTMLPatterns(targetURL, content)
		result.Links = c.extractLinks(targetURL, content)
	} else if strings.Contains(contentType, "text/plain") {
		result.Patterns = c.extractTextPatterns(targetURL, content)
	}

	return result
}

func stripHTML(html string) string {
	html = scriptRe.ReplaceAllString(html, " ")
	html = styleRe.ReplaceAllString(html, " ")
	html = noscriptRe.ReplaceAllString(html, " ")
	html = commentRe.ReplaceAllString(html, " ")
	html = svgRe.ReplaceAllString(html, " ")
	text := tagRe.ReplaceAllString(html, " ")
	text = strings.ReplaceAll(text, "&nbsp;", " ")
	text = strings.ReplaceAll(text, "&amp;", "&")
	text = strings.ReplaceAll(text, "&lt;", "<")
	text = strings.ReplaceAll(text, "&gt;", ">")
	text = strings.ReplaceAll(text, "&quot;", "\"")
	text = wsRe.ReplaceAllString(text, " ")
	return strings.TrimSpace(text)
}

func (c *Crawler) extractHTMLPatterns(sourceURL, html string) []*Pattern {
	text := stripHTML(html)
	chunks := chunkText(text, 512)

	var patterns []*Pattern
	for _, chunk := range chunks {
		if len(chunk) < 50 || !isCleanText(chunk) {
			continue
		}
		patterns = append(patterns, &Pattern{
			Vector:    textToVector(chunk, c.wknow.Neurons()/13),
			Text:      chunk,
			Source:    sourceURL,
			Timestamp: time.Now(),
		})
	}
	return patterns
}

func (c *Crawler) extractTextPatterns(sourceURL, text string) []*Pattern {
	chunks := chunkText(text, 512)

	var patterns []*Pattern
	for _, chunk := range chunks {
		if len(chunk) < 50 || !isCleanText(chunk) {
			continue
		}
		patterns = append(patterns, &Pattern{
			Vector:    textToVector(chunk, c.wknow.Neurons()/13),
			Text:      chunk,
			Source:    sourceURL,
			Timestamp: time.Now(),
		})
	}
	return patterns
}

func (c *Crawler) extractLinks(baseURL, html string) []string {
	base, err := url.Parse(baseURL)
	if err != nil {
		return nil
	}

	matches := linkRe.FindAllStringSubmatch(html, 500)
	seen := make(map[string]bool)
	var links []string

	for _, match := range matches {
		if len(match) < 2 {
			continue
		}
		href := match[1]

		if strings.HasPrefix(href, "javascript:") ||
			strings.HasPrefix(href, "mailto:") ||
			strings.HasPrefix(href, "#") ||
			strings.HasPrefix(href, "data:") {
			continue
		}

		parsed, err := url.Parse(href)
		if err != nil {
			continue
		}

		resolved := base.ResolveReference(parsed)
		if resolved.Scheme != "http" && resolved.Scheme != "https" {
			continue
		}

		fullURL := resolved.String()
		if !seen[fullURL] && len(fullURL) < 2048 {
			seen[fullURL] = true
			links = append(links, fullURL)
		}

		if len(links) >= 100 {
			break
		}
	}
	return links
}

func chunkText(text string, maxLen int) []string {
	words := strings.Fields(text)
	if len(words) == 0 {
		return nil
	}

	var chunks []string
	var current []string
	currentLen := 0

	for _, word := range words {
		if currentLen+len(word)+1 > maxLen && len(current) > 0 {
			chunks = append(chunks, strings.Join(current, " "))
			current = nil
			currentLen = 0
		}
		current = append(current, word)
		currentLen += len(word) + 1
	}

	if len(current) > 0 {
		chunks = append(chunks, strings.Join(current, " "))
	}
	return chunks
}

func isCleanText(text string) bool {
	if len(text) < 50 {
		return false
	}

	alphaCount := 0
	for _, r := range text {
		if (r >= 'a' && r <= 'z') || (r >= 'A' && r <= 'Z') || r == ' ' {
			alphaCount++
		}
	}

	if float64(alphaCount)/float64(len(text)) < 0.5 {
		return false
	}

	// Reject code patterns
	textLower := strings.ToLower(text)
	codePatterns := []string{
		"function(", "window.", "document.", "addeventlistener",
		"var ", "let ", "const ", "=>", "});", "=>{",
		"getelementbyid", "queryselector", "prototype",
	}
	for _, p := range codePatterns {
		if strings.Contains(textLower, p) {
			return false
		}
	}

	words := strings.Fields(text)
	return len(words) >= 5
}

func textToVector(text string, vecSize int) []float32 {
	if vecSize < 64 {
		vecSize = 64
	}
	vec := make([]float32, vecSize)

	prime1 := vecSize/2 - 1
	if prime1 < 1 {
		prime1 = 1
	}
	prime2 := vecSize

	var prev rune
	for i, ch := range text {
		idx := (int(ch) * prime1 * (i + 1)) % vecSize
		vec[idx] += float32(ch) / float32(vecSize)

		// Bigram: track the previous rune explicitly; indexing
		// text[i-1] would pick up a single byte, not a rune, for
		// multi-byte characters.
		if i > 0 {
			idx2 := (int(prev)*prime2 + int(ch)) % vecSize
			vec[idx2] += 0.5
		}
		prev = ch
	}

	// Normalize
	var sum float32
	for _, v := range vec {
		sum += v * v
	}
	if sum > 0 {
		scale := float32(1.0 / math.Sqrt(float64(sum)))
		for i := range vec {
			vec[i] *= scale
		}
	}
	return vec
}
726
+ // ============================================================================
727
+ // FEDERATION - UDP pattern sharing back to Thunderport
728
+ // ============================================================================
729
+
730
+ type FederationClient struct {
731
+ mu sync.Mutex
732
+ conn *net.UDPConn
733
+ addr *net.UDPAddr
734
+ sent int64
735
+ failed int64
736
+ lastSend time.Time
737
+ }
738
+
739
+ type FederationMessage struct {
740
+ NodeID string `json:"node_id"`
741
+ Timestamp int64 `json:"ts"`
742
+ Patterns int `json:"patterns"`
743
+ Vectors [][]byte `json:"vectors"` // Encoded pattern vectors
744
+ Source string `json:"source"`
745
+ }
746
+
747
+ func NewFederationClient() (*FederationClient, error) {
748
+ addr, err := net.ResolveUDPAddr("udp", fmt.Sprintf("%s:%d", ThunderportIP, FederationPort))
749
+ if err != nil {
750
+ return nil, err
751
+ }
752
+
753
+ conn, err := net.DialUDP("udp", nil, addr)
754
+ if err != nil {
755
+ return nil, err
756
+ }
757
+
758
+ return &FederationClient{
759
+ conn: conn,
760
+ addr: addr,
761
+ }, nil
762
+ }
763
+
764
+ func (fc *FederationClient) SendPatterns(patterns []*Pattern) error {
765
+ if len(patterns) == 0 {
766
+ return nil
767
+ }
768
+
769
+ fc.mu.Lock()
770
+ defer fc.mu.Unlock()
771
+
772
+ // Encode patterns
773
+ vectors := make([][]byte, len(patterns))
774
+ for i, p := range patterns {
775
+ buf := new(bytes.Buffer)
776
+ for _, v := range p.Vector {
777
+ binary.Write(buf, binary.LittleEndian, v)
778
+ }
779
+ vectors[i] = buf.Bytes()
780
+ }
781
+
782
+ msg := FederationMessage{
783
+ NodeID: "curiosity-crawler",
784
+ Timestamp: time.Now().Unix(),
785
+ Patterns: len(patterns),
786
+ Vectors: vectors,
787
+ Source: "hf-spaces",
788
+ }
789
+
790
+ data, err := json.Marshal(msg)
791
+ if err != nil {
792
+ return err
793
+ }
794
+
795
+ // Limit message size to 64KB for UDP
796
+ if len(data) > 65000 {
797
+ // Send in batches
798
+ batchSize := len(patterns) / 4
799
+ if batchSize < 1 {
800
+ batchSize = 1
801
+ }
802
+ for i := 0; i < len(patterns); i += batchSize {
803
+ end := i + batchSize
804
+ if end > len(patterns) {
805
+ end = len(patterns)
806
+ }
807
+ fc.SendPatterns(patterns[i:end])
808
+ }
809
+ return nil
810
+ }
811
+
812
+ _, err = fc.conn.Write(data)
813
+ if err != nil {
814
+ atomic.AddInt64(&fc.failed, 1)
815
+ return err
816
+ }
817
+
818
+ atomic.AddInt64(&fc.sent, int64(len(patterns)))
819
+ fc.lastSend = time.Now()
820
+ return nil
821
+ }
822
+
823
+ func (fc *FederationClient) Stats() (sent, failed int64, lastSend time.Time) {
824
+ return atomic.LoadInt64(&fc.sent), atomic.LoadInt64(&fc.failed), fc.lastSend
825
+ }
826
+
+// ============================================================================
+// CURIOSITY SWARM - Orchestrates parallel crawling
+// ============================================================================
+
+type CuriositySwarm struct {
+	wknow      *WKnowCompressor
+	hunger     *HungerMap
+	limiter    *DomainLimiter
+	federation *FederationClient
+
+	urlQueue chan string
+	seen     sync.Map // map[string]struct{}
+
+	stats   SwarmStats
+	running int32
+}
+
+type SwarmStats struct {
+	PagesCrawled      int64
+	PatternsExtracted int64
+	PatternsLearned   int64
+	PatternsFederated int64
+	BytesDownloaded   int64
+	SearchesRun       int64
+	StartTime         time.Time
+}
+
+func NewCuriositySwarm(wknow *WKnowCompressor) (*CuriositySwarm, error) {
+	fc, err := NewFederationClient()
+	if err != nil {
+		log.Printf("[WARN] Federation unavailable: %v", err)
+		fc = nil // Continue without federation
+	}
+
+	return &CuriositySwarm{
+		wknow:      wknow,
+		hunger:     NewHungerMap(wknow),
+		limiter:    NewDomainLimiter(),
+		federation: fc,
+		urlQueue:   make(chan string, 10000),
+		stats:      SwarmStats{StartTime: time.Now()},
+	}, nil
+}
+
+func (s *CuriositySwarm) Start(seeds []string) {
+	atomic.StoreInt32(&s.running, 1)
+	log.Printf("[SWARM] Starting with %d workers", WorkerCount)
+
+	// Seed the queue
+	for _, seed := range seeds {
+		s.enqueue(seed)
+	}
+
+	// Start workers
+	for i := 0; i < WorkerCount; i++ {
+		go s.worker(i)
+	}
+
+	// Start hunger-driven search
+	go s.hungerLoop()
+
+	// Start stats logging
+	go s.statsLoop()
+}
+
+func (s *CuriositySwarm) Stop() {
+	atomic.StoreInt32(&s.running, 0)
+	log.Println("[SWARM] Stopped")
+}
+
+func (s *CuriositySwarm) IsRunning() bool {
+	return atomic.LoadInt32(&s.running) == 1
+}
+
+func (s *CuriositySwarm) enqueue(urlStr string) bool {
+	if _, loaded := s.seen.LoadOrStore(urlStr, struct{}{}); loaded {
+		return false
+	}
+	select {
+	case s.urlQueue <- urlStr:
+		return true
+	default:
+		return false
+	}
+}
+
+func (s *CuriositySwarm) worker(id int) {
+	crawler := NewCrawler(s.wknow)
+
+	for atomic.LoadInt32(&s.running) == 1 {
+		var urlStr string
+		select {
+		case urlStr = <-s.urlQueue:
+		case <-time.After(time.Second):
+			continue
+		}
+
+		domain := extractDomain(urlStr)
+		if domain == "" {
+			continue
+		}
+
+		// Per-domain rate limiting (robots.txt itself is not consulted)
+		if !s.limiter.Admit(domain) {
+			// Re-queue for later. Write to the channel directly:
+			// enqueue() would reject the URL as already seen.
+			go func(u string) {
+				time.Sleep(RateLimitDelay)
+				select {
+				case s.urlQueue <- u:
+				default: // queue full; drop
+				}
+			}(urlStr)
+			continue
+		}
+
+		result := crawler.Crawl(urlStr)
+		if result == nil {
+			continue
+		}
+
+		atomic.AddInt64(&s.stats.PagesCrawled, 1)
+		atomic.AddInt64(&s.stats.BytesDownloaded, int64(result.Size))
+
+		// Score and integrate patterns
+		var learned []*Pattern
+		for _, p := range result.Patterns {
+			atomic.AddInt64(&s.stats.PatternsExtracted, 1)
+
+			// Score against W_know - only learn novel/interesting patterns
+			score := s.wknow.Score(p.Vector)
+
+			// Bootstrap: learn everything while the matrix is sparse.
+			// Otherwise: learn patterns that score moderately
+			// (not too familiar, not noise).
+			nz := s.wknow.NonZeros()
+			bootstrap := nz < int64(s.wknow.Neurons())*int64(s.wknow.Neurons())/100 // < 1% full
+
+			if bootstrap || (score > 0.1 && score < 0.8) {
+				s.wknow.IntegratePattern(p.Vector)
+				atomic.AddInt64(&s.stats.PatternsLearned, 1)
+				learned = append(learned, p)
+			}
+		}
+
+		// Federate learned patterns
+		if s.federation != nil && len(learned) > 0 {
+			if err := s.federation.SendPatterns(learned); err == nil {
+				atomic.AddInt64(&s.stats.PatternsFederated, int64(len(learned)))
+			}
+		}
+
+		// Enqueue discovered links
+		for _, link := range result.Links {
+			s.enqueue(link)
+		}
+	}
+}
+
+func (s *CuriositySwarm) hungerLoop() {
+	ticker := time.NewTicker(HungerCheckInterval)
+	defer ticker.Stop()
+
+	for atomic.LoadInt32(&s.running) == 1 {
+		<-ticker.C
+
+		// Scan W_know density
+		s.hunger.Scan()
+
+		// Get hungriest regions
+		hungry := s.hunger.HungriestRegions(HungriestRegions)
+
+		// Generate and run searches
+		for _, region := range hungry {
+			query := s.hunger.GenerateSearchQuery(region)
+			s.runDuckDuckGoSearch(query)
+			atomic.AddInt64(&s.stats.SearchesRun, 1)
+		}
+	}
+}
+
+func (s *CuriositySwarm) runDuckDuckGoSearch(query string) {
+	// Use the DuckDuckGo HTML endpoint (no API key required)
+	searchURL := fmt.Sprintf("https://html.duckduckgo.com/html/?q=%s", url.QueryEscape(query))
+
+	// Bounded timeout so a slow search cannot stall the hunger loop
+	client := &http.Client{Timeout: 15 * time.Second}
+	resp, err := client.Get(searchURL)
+	if err != nil {
+		log.Printf("[SEARCH] Error: %v", err)
+		return
+	}
+	defer resp.Body.Close()
+
+	if resp.StatusCode != http.StatusOK {
+		log.Printf("[SEARCH] HTTP %d for query '%s'", resp.StatusCode, query)
+		return
+	}
+
+	body, err := io.ReadAll(io.LimitReader(resp.Body, 512*1024))
+	if err != nil {
+		return
+	}
+
+	// Extract result URLs
+	re := regexp.MustCompile(`href="(https?://[^"]+)"`)
+	matches := re.FindAllStringSubmatch(string(body), 20)
+
+	queued := 0
+	for _, match := range matches {
+		if len(match) < 2 {
+			continue
+		}
+		link := match[1]
+		// Skip DuckDuckGo's own URLs
+		if strings.Contains(link, "duckduckgo.com") {
+			continue
+		}
+		if s.enqueue(link) {
+			queued++
+		}
+	}
+	log.Printf("[SEARCH] Query '%s': queued %d URLs", query, queued)
+}
+
+func (s *CuriositySwarm) statsLoop() {
+	ticker := time.NewTicker(StatsInterval)
+	defer ticker.Stop()
+
+	for atomic.LoadInt32(&s.running) == 1 {
+		<-ticker.C
+
+		pages := atomic.LoadInt64(&s.stats.PagesCrawled)
+		extracted := atomic.LoadInt64(&s.stats.PatternsExtracted)
+		learned := atomic.LoadInt64(&s.stats.PatternsLearned)
+		federated := atomic.LoadInt64(&s.stats.PatternsFederated)
+		searches := atomic.LoadInt64(&s.stats.SearchesRun)
+		bytes := atomic.LoadInt64(&s.stats.BytesDownloaded)
+
+		log.Printf("[STATS] pages=%d extracted=%d learned=%d federated=%d searches=%d bytes=%dMB wknow_nz=%d",
+			pages, extracted, learned, federated, searches, bytes/(1024*1024), s.wknow.NonZeros())
+	}
+}
+
+func (s *CuriositySwarm) Stats() SwarmStats {
+	return SwarmStats{
+		PagesCrawled:      atomic.LoadInt64(&s.stats.PagesCrawled),
+		PatternsExtracted: atomic.LoadInt64(&s.stats.PatternsExtracted),
+		PatternsLearned:   atomic.LoadInt64(&s.stats.PatternsLearned),
+		PatternsFederated: atomic.LoadInt64(&s.stats.PatternsFederated),
+		BytesDownloaded:   atomic.LoadInt64(&s.stats.BytesDownloaded),
+		SearchesRun:       atomic.LoadInt64(&s.stats.SearchesRun),
+		StartTime:         s.stats.StartTime,
+	}
+}
+
+// ============================================================================
+// HTTP DASHBOARD - Status page on port 7860
+// ============================================================================
+
+type Dashboard struct {
+	swarm *CuriositySwarm
+	wknow *WKnowCompressor
+}
+
+func NewDashboard(swarm *CuriositySwarm, wknow *WKnowCompressor) *Dashboard {
+	return &Dashboard{swarm: swarm, wknow: wknow}
+}
+
+func (d *Dashboard) Start() {
+	http.HandleFunc("/", d.handleHome)
+	http.HandleFunc("/status", d.handleStatus)
+	http.HandleFunc("/api/stats", d.handleAPIStats)
+
+	log.Printf("[HTTP] Dashboard starting on port %d", HTTPPort)
+	go func() {
+		if err := http.ListenAndServe(fmt.Sprintf(":%d", HTTPPort), nil); err != nil {
+			log.Printf("[HTTP] Server error: %v", err)
+		}
+	}()
+}
+
+const dashboardTemplate = `<!DOCTYPE html>
+<html>
+<head>
+<title>MEGAMIND Curiosity Crawler</title>
+<meta charset="utf-8">
+<meta http-equiv="refresh" content="30">
+<style>
+body { font-family: 'Courier New', monospace; background: #0a0a0a; color: #00ff88; margin: 40px; }
+h1 { color: #00d4ff; text-shadow: 0 0 10px #00d4ff; }
+.stat { display: inline-block; margin: 20px; padding: 20px; border: 1px solid #00ff88; }
+.stat-value { font-size: 2em; color: #ffffff; }
+.stat-label { color: #888; }
+.equation { color: #ff9500; margin: 5px 0; font-style: italic; }
+.section { margin: 30px 0; }
+.hunger-bar { background: #333; height: 20px; margin: 5px 0; }
+.hunger-fill { background: linear-gradient(90deg, #ff0000, #ff9500, #00ff88); height: 100%; }
+table { border-collapse: collapse; width: 100%; }
+td, th { border: 1px solid #333; padding: 10px; text-align: left; }
+th { background: #1a1a1a; color: #00d4ff; }
+</style>
+</head>
+<body>
+<h1>MEGAMIND CURIOSITY CRAWLER</h1>
+
+<div class="section">
+<h2>Seed Equations (Interest Profile)</h2>
+{{range .Equations}}
+<div class="equation">{{.}}</div>
+{{end}}
+</div>
+
+<div class="section">
+<h2>Crawl Statistics</h2>
+<div class="stat">
+<div class="stat-value">{{.Stats.PagesCrawled}}</div>
+<div class="stat-label">Pages Crawled</div>
+</div>
+<div class="stat">
+<div class="stat-value">{{.Stats.PatternsExtracted}}</div>
+<div class="stat-label">Patterns Extracted</div>
+</div>
+<div class="stat">
+<div class="stat-value">{{.Stats.PatternsLearned}}</div>
+<div class="stat-label">Patterns Learned</div>
+</div>
+<div class="stat">
+<div class="stat-value">{{.Stats.PatternsFederated}}</div>
+<div class="stat-label">Federated to Thunderport</div>
+</div>
+<div class="stat">
+<div class="stat-value">{{printf "%.2f" .BytesMB}} MB</div>
+<div class="stat-label">Data Downloaded</div>
+</div>
+<div class="stat">
+<div class="stat-value">{{.Stats.SearchesRun}}</div>
+<div class="stat-label">Curiosity Searches</div>
+</div>
+</div>
+
+<div class="section">
+<h2>W_know Brain Status</h2>
+<table>
+<tr><th>Metric</th><th>Value</th></tr>
+<tr><td>Dimensions</td><td>{{.WKnowNeurons}} x {{.WKnowNeurons}}</td></tr>
+<tr><td>Non-zeros</td><td>{{.WKnowNonZeros}}</td></tr>
+<tr><td>Density</td><td>{{printf "%.4f" .WKnowDensity}}%</td></tr>
+<tr><td>Patterns Integrated</td><td>{{.WKnowPatterns}}</td></tr>
+</table>
+</div>
+
+<div class="section">
+<h2>Hunger Map (Sparse Regions)</h2>
+{{range $i, $d := .HungerDensity}}
+<div>Region {{$i}}:
+<div class="hunger-bar"><div class="hunger-fill" style="width: {{printf "%.1f" $d}}%"></div></div>
+</div>
+{{end}}
+</div>
+
+<div class="section">
+<h2>Federation Status</h2>
+<p>Target: {{.FederationTarget}}</p>
+<p>Patterns Sent: {{.FederationSent}}</p>
+<p>Status: {{.FederationStatus}}</p>
+</div>
+
+<div class="section">
+<p>Uptime: {{.Uptime}}</p>
+<p>Workers: {{.Workers}}</p>
+</div>
+</body>
+</html>`
+
+type DashboardData struct {
+	Equations        []string
+	Stats            SwarmStats
+	BytesMB          float64
+	WKnowNeurons     int
+	WKnowNonZeros    int64
+	WKnowDensity     float64
+	WKnowPatterns    int64
+	HungerDensity    []float64
+	FederationTarget string
+	FederationSent   int64
+	FederationStatus string
+	Uptime           string
+	Workers          int
+}
+
+func (d *Dashboard) handleHome(w http.ResponseWriter, r *http.Request) {
+	stats := d.swarm.Stats()
+
+	d.swarm.hunger.mu.RLock()
+	hungerDensity := make([]float64, len(d.swarm.hunger.density))
+	for i, v := range d.swarm.hunger.density {
+		hungerDensity[i] = v * 100 // Convert to percentage
+	}
+	d.swarm.hunger.mu.RUnlock()
+
+	nz := d.wknow.NonZeros()
+	total := int64(d.wknow.Neurons()) * int64(d.wknow.Neurons())
+
+	var fedSent int64
+	fedStatus := "disconnected"
+	if d.swarm.federation != nil {
+		sent, _, lastSend := d.swarm.federation.Stats()
+		fedSent = sent
+		if time.Since(lastSend) < time.Minute {
+			fedStatus = "active"
+		} else if sent > 0 {
+			fedStatus = "idle"
+		}
+	}
+
+	data := DashboardData{
+		Equations:        SeedEquations,
+		Stats:            stats,
+		BytesMB:          float64(stats.BytesDownloaded) / (1024 * 1024),
+		WKnowNeurons:     d.wknow.Neurons(),
+		WKnowNonZeros:    nz,
+		WKnowDensity:     float64(nz) / float64(total) * 100,
+		WKnowPatterns:    d.wknow.PatternCount(),
+		HungerDensity:    hungerDensity,
+		FederationTarget: fmt.Sprintf("%s:%d", ThunderportIP, FederationPort),
+		FederationSent:   fedSent,
+		FederationStatus: fedStatus,
+		Uptime:           time.Since(stats.StartTime).Round(time.Second).String(),
+		Workers:          WorkerCount,
+	}
+
+	tmpl, err := template.New("dashboard").Parse(dashboardTemplate)
+	if err != nil {
+		http.Error(w, err.Error(), http.StatusInternalServerError)
+		return
+	}
+
+	w.Header().Set("Content-Type", "text/html")
+	if err := tmpl.Execute(w, data); err != nil {
+		log.Printf("[HTTP] Template error: %v", err)
+	}
+}
+
+func (d *Dashboard) handleStatus(w http.ResponseWriter, r *http.Request) {
+	stats := d.swarm.Stats()
+
+	var fedSent int64
+	if d.swarm.federation != nil {
+		sent, _, _ := d.swarm.federation.Stats()
+		fedSent = sent
+	}
+
+	// Report the actual run state instead of a hard-coded "running"
+	state := "stopped"
+	if d.swarm.IsRunning() {
+		state = "running"
+	}
+
+	status := map[string]interface{}{
+		"status":    state,
+		"uptime_s":  int(time.Since(stats.StartTime).Seconds()),
+		"pages":     stats.PagesCrawled,
+		"patterns":  stats.PatternsLearned,
+		"federated": fedSent,
+		"wknow_nz":  d.wknow.NonZeros(),
+	}
+
+	w.Header().Set("Content-Type", "application/json")
+	json.NewEncoder(w).Encode(status)
+}
+
+func (d *Dashboard) handleAPIStats(w http.ResponseWriter, r *http.Request) {
+	d.handleStatus(w, r)
+}
+
+// ============================================================================
+// MAIN
+// ============================================================================
+
+// downloadWKnow downloads W_know from HuggingFace if not present locally
+func downloadWKnow(destPath string) error {
+	// Try to download from HuggingFace datasets
+	urls := []string{
+		"https://huggingface.co/datasets/Janady07/megamind-wknow/resolve/main/w_know.bin",
+		"https://huggingface.co/Janady07/megamind-curiosity/resolve/main/w_know.bin",
+	}
+
+	// srcURL avoids shadowing the imported net/url package
+	for _, srcURL := range urls {
+		log.Printf("[WKNOW] Attempting download from %s", srcURL)
+		resp, err := http.Get(srcURL)
+		if err != nil {
+			log.Printf("[WKNOW] Download failed: %v", err)
+			continue
+		}
+		if resp.StatusCode != http.StatusOK {
+			resp.Body.Close()
+			log.Printf("[WKNOW] Download failed: HTTP %d", resp.StatusCode)
+			continue
+		}
+
+		// Create output file
+		out, err := os.Create(destPath)
+		if err != nil {
+			resp.Body.Close()
+			return err
+		}
+
+		// Stream the body to disk
+		written, err := io.Copy(out, resp.Body)
+		resp.Body.Close()
+		out.Close()
+
+		if err != nil {
+			os.Remove(destPath)
+			log.Printf("[WKNOW] Download failed: %v", err)
+			continue
+		}
+
+		log.Printf("[WKNOW] Downloaded %d MB", written/(1024*1024))
+		return nil
+	}
+
+	return fmt.Errorf("all download sources failed")
+}
+
+func main() {
+	log.SetFlags(log.Ldate | log.Ltime | log.Lmicroseconds)
+	log.Println("MEGAMIND Curiosity Crawler starting...")
+
+	// Ensure data directory exists
+	os.MkdirAll("/app/data", 0755)
+
+	// Load or create W_know
+	wknowPath := os.Getenv("WKNOW_PATH")
+	if wknowPath == "" {
+		wknowPath = "/app/data/w_know.bin"
+	}
+
+	var wknow *WKnowCompressor
+	var err error
+
+	// Try to load existing W_know
+	if _, statErr := os.Stat(wknowPath); statErr == nil {
+		wknow, err = LoadWKnow(wknowPath)
+		if err != nil {
+			log.Printf("[WARN] Failed to load W_know: %v", err)
+			wknow = nil
+		}
+	}
+
+	// If not loaded, try to download from HuggingFace
+	if wknow == nil {
+		log.Println("[INFO] W_know not found locally, attempting download...")
+		if err := downloadWKnow(wknowPath); err == nil {
+			wknow, err = LoadWKnow(wknowPath)
+			if err != nil {
+				log.Printf("[WARN] Failed to load downloaded W_know: %v", err)
+				wknow = nil
+			}
+		} else {
+			log.Printf("[WARN] Download failed: %v", err)
+		}
+	}
+
+	// If still not loaded, create fresh
+	if wknow == nil {
+		log.Printf("[INFO] Creating fresh W_know matrix (%dx%d)", WKnowDim, WKnowDim)
+		wknow = NewWKnowCompressor(WKnowDim, nil)
+
+		// Bootstrap with seed equations
+		for _, eq := range SeedEquations {
+			vec := textToVector(eq, WKnowDim/13)
+			wknow.IntegratePattern(vec)
+		}
+		log.Printf("[INFO] Bootstrapped with %d seed equation patterns", len(SeedEquations))
+	}
+
+	// Create swarm
+	swarm, err := NewCuriositySwarm(wknow)
+	if err != nil {
+		log.Fatalf("Failed to create swarm: %v", err)
+	}
+
+	// Start dashboard
+	dashboard := NewDashboard(swarm, wknow)
+	dashboard.Start()
+
+	// Seed URLs - diverse starting points
+	seeds := []string{
+		"https://en.wikipedia.org/wiki/Artificial_intelligence",
+		"https://en.wikipedia.org/wiki/Neural_network",
+		"https://en.wikipedia.org/wiki/Machine_learning",
+		"https://en.wikipedia.org/wiki/Consciousness",
+		"https://en.wikipedia.org/wiki/Hamiltonian_mechanics",
+		"https://en.wikipedia.org/wiki/Fibonacci_number",
+		"https://en.wikipedia.org/wiki/Hebbian_theory",
+		"https://arxiv.org/list/cs.AI/recent",
+		"https://arxiv.org/list/cs.LG/recent",
+		"https://arxiv.org/list/cs.NE/recent",
+		"https://huggingface.co/papers",
+		"https://news.ycombinator.com/",
+	}
+
+	// Start crawling
+	swarm.Start(seeds)
+
+	// Handle shutdown
+	sigCh := make(chan os.Signal, 1)
+	signal.Notify(sigCh, syscall.SIGINT, syscall.SIGTERM)
+	<-sigCh
+
+	log.Println("Shutting down...")
+	swarm.Stop()
+
+	// Save W_know
+	if err := wknow.Save(wknowPath); err != nil {
+		log.Printf("[ERROR] Failed to save W_know: %v", err)
+	} else {
+		log.Printf("[INFO] Saved W_know to %s", wknowPath)
+	}
+
+	// Print final stats
+	stats := swarm.Stats()
+	log.Printf("[FINAL] Pages: %d, Patterns: %d, Federated: %d, W_know NZ: %d",
+		stats.PagesCrawled, stats.PatternsLearned, stats.PatternsFederated, wknow.NonZeros())
+}