# HearthNet — Architecture Reference

> **Local-first community AI mesh.** Each participant runs a node on their own hardware.
> Nodes discover each other automatically and share AI capabilities, files, and community
> posts — no central server required.

---

## High-Level Concept

```
┌────────────────────────────────────────────────────────────────────┐
│                   Community Mesh (LAN / overlay)                   │
│                                                                    │
│   ┌─────────────┐     mDNS/UDP     ┌─────────────┐    mDNS/UDP     │
│   │  Node A     │◄────────────────►│  Node B     │◄────────────    │
│   │  (anchor)   │                  │  (hearth)   │                 │
│   │             │    capability    │             │                 │
│   │  CapBus ◄───┼────bus.call──────┼──► CapBus   │                 │
│   │  LLM svc    │                  │  RAG svc    │                 │
│   │  RAG svc    │                  │  OCR svc    │                 │
│   │  Gradio UI  │                  │  Gradio UI  │                 │
│   └─────────────┘                  └─────────────┘                 │
└────────────────────────────────────────────────────────────────────┘
```

HearthNet is structured around three ideas:

1. **Node** — a Python process on someone's hardware (Raspberry Pi, laptop, server).
2. **CapabilityBus** — a message bus where services register *capabilities* (e.g. `llm.chat@1.0`). Any code, local or remote, calls a capability by name.
3. **Services** — pure-Python objects that handle capability calls. A node installs whichever services its hardware supports.
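To make the three roles concrete, here is a minimal in-process sketch of how a node, a bus, and a service could fit together. `ToyBus`, `EchoService`, `register`, and `handle` are hypothetical names for illustration, not HearthNet's actual API; the real bus also routes to remote peers and negotiates versions.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict

# Hypothetical handler type: takes a JSON-like body, returns a JSON-like result.
Handler = Callable[[Dict[str, Any]], Awaitable[Dict[str, Any]]]


class ToyBus:
    """Illustrative in-process capability bus (not HearthNet's real CapabilityBus)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, capability: str, handler: Handler) -> None:
        # Capabilities are keyed by their full "name@major.minor" string.
        self._handlers[capability] = handler

    async def call(self, capability: str, body: Dict[str, Any]) -> Dict[str, Any]:
        handler = self._handlers.get(capability)
        if handler is None:
            raise LookupError(f"capability not found: {capability}")
        return await handler(body)


class EchoService:
    """A trivial 'service': a plain Python object whose method handles calls."""

    async def handle(self, body: Dict[str, Any]) -> Dict[str, Any]:
        return {"echo": body.get("text", "")}


async def main() -> Dict[str, Any]:
    bus = ToyBus()
    bus.register("demo.echo@1.0", EchoService().handle)  # the node "installs" a service
    return await bus.call("demo.echo@1.0", {"text": "hello mesh"})


if __name__ == "__main__":
    print(asyncio.run(main()))  # {'echo': 'hello mesh'}
```

Callers only know the capability string, which is what lets the same call be served locally or by a peer.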

---

## Module Map

### Phase 1 — Foundation

| Module | Location | What it does |
|--------|----------|-------------|
| **M01 Identity** | `hearthnet/identity/` | Ed25519 node keys, community manifests, invite tokens |
| **M02 Discovery** | `hearthnet/discovery/` | mDNS + UDP multicast peer discovery |
| **M03 Bus** | `hearthnet/bus/` | Capability router, health ring buffer, trust levels |
| **M04 LLM** | `hearthnet/services/llm/` | Local model backends (Ollama, llama.cpp, LM Studio, HF, Anthropic) |
| **M05 RAG** | `hearthnet/services/rag/` | Chunker → embedder → Chroma vector store + retrieval |
| **M06 Marketplace** | `hearthnet/services/marketplace/` | Event-sourced community board (posts, offers, requests) |
| **M07 Blobs** | `hearthnet/blobs/` | BLAKE3 content-addressed file store with chunked transfer |
| **M08 UI** | `hearthnet/ui/` | Gradio 8-tab interface + themes + topology component |
| **M09 Emergency** | `hearthnet/emergency/` | Async probe loop → emergency state machine |
| **M10 Chat** | `hearthnet/services/chat/` | Event-backed direct messages between nodes |
| **M11 Embedding** | `hearthnet/services/embedding/` | Sentence-transformer embeddings (BAAI/bge-small) |
| **M12 CLI** | `hearthnet/cli.py` | Click CLI: run, call, log, rag, invite, version, … |
| **M13 Onboarding** | `hearthnet/ui/onboarding.py` | Invite QR flow + first-run wizard |

### Phase 2 — Resilience & Rich Services

| Module | Location | What it does |
|--------|----------|-------------|
| **M14 Federation** | `hearthnet/federation/` | Cross-community node manifests + signed bridges |
| **M15 Relay** | `hearthnet/relay/` | Public-IP relay tier for NAT traversal |
| **M16 Tokens** | `hearthnet/identity/tokens.py` | AuthToken / CapabilityToken scoped access |
| **M17 OCR** | `hearthnet/services/ocr/` | Tesseract / TrOCR text extraction |
| **M18 Translation** | `hearthnet/services/translation/` | NLLB-200 local translation |
| **M19 STT/TTS** | `hearthnet/services/stt_tts/` | Whisper STT + Coqui/pyttsx3 TTS |
| **M20 Vision** | `hearthnet/services/vision/` | Florence-2 image captioning / VQA |
| **M21 Tool Calls** | `hearthnet/services/tools/` | LLM tool-call executor (plant ID, search, …) |
| **M22 Mobile** | `hearthnet/ui/mobile/` | PWA manifest + service worker for home-screen install |
| **M23 E2E Encryption** | `hearthnet/crypto/` | X25519 ECDH + ChaCha20-Poly1305 channel encryption |
| **M24 Rerank** | `hearthnet/services/rerank/` | Cross-encoder reranking for RAG results |
| **M25 Group Chat** | `hearthnet/services/group_chat/` | Multi-party room-based chat |

### Phase 3 — Experimental (opt-in via `config.toml`)

| Module | Location | Flag | What it does |
|--------|----------|------|-------------|
| **M26 Distributed Inference** | `hearthnet/distributed_inference/` | `research.distributed_inference` | Layer-shard a 7B model across LAN nodes (Petals-style) |
| **M27 MoE Routing** | `hearthnet/moe/` | `research.moe_routing` | Route queries to best expert (model/service/human) via learned scorer |
| **M28 FedLearn** | `hearthnet/fedlearn/` | `research.fedlearn` | FedAvg LoRA fine-tuning without sharing raw data |
| **M29 LoRa Beacons** | `hearthnet/lora/` | `research.lora_beacons` | 868 MHz offline "I'm alive" heartbeats via USB LoRa stick |
| **M30 Evidence Graph** | `hearthnet/evidence/` | `research.evidence` | Claim → attest → dispute provenance graph + EBKH bridge |
| **M31 Civil Defense** | `hearthnet/civdef/` | `research.civil_defense` | THW/DRK/KatS alert pipeline with role certs + audit chain |
| **M32 Protocol Standard** | `hearthnet/services/protocol/` | on by default | Protocol version list + conformance report |

### Cross-Cutting

| ID | Location | What it does |
|----|----------|-------------|
| **X01 Transport** | `hearthnet/transport/` | HTTP/SSE client, backpressure, rate limiting, frame types |
| **X02 Events** | `hearthnet/events/` | SQLite Lamport event log + gossip sync |
| **X03 Observability** | `hearthnet/observability/` | Tracing, metrics, Doctor health checks, TrackioExporter |
| **X04 Config** | `hearthnet/config.py` | Typed TOML config + ResearchConfig feature flags |
| **X05 DHT** | `hearthnet/dht/` | Kademlia-inspired DHT for cross-LAN peer lookup |
| **X06 WebSocket** | `hearthnet/transport/` | WebSocket pubsub (StateBus → live UI push) |
| **X07 Federated Metrics** | `hearthnet/observability/` | Opt-in aggregate mesh health metrics |
| **X08 Tensor Transport** | `hearthnet/transport/tensor/` | Chunked tensor stream for M26 distributed inference |
| **X09 Conformance Suite** | `hearthnet/conformance/` | 21-check black-box conformance runner |

---

## Composition Root

`HearthNode` in [hearthnet/node.py](hearthnet/node.py) is the single composition root.

```python
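import asyncio
from hearthnet.node import HearthNode

async def main() -> None:
    node = HearthNode(
        node_id="my-node",
        display_name="Alice's Pi",
        community_id="ed25519:abc123",
    )
    node.install_services(corpus="general")  # register the services this hardware supports
    await node.start()  # top-level await is not valid in a plain script, so run inside main()

asyncio.run(main())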
```

`install_services()` registers all services the local hardware supports into the bus. Heavy optional dependencies (torch, chromadb, etc.) are imported lazily and fail gracefully — a node with no GPU still works, it just can't answer GPU-only capabilities.
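A common way to get this "install only what the hardware supports" behaviour is to probe for an optional package before importing it. The sketch below uses `importlib.util.find_spec` for that; the `OPTIONAL_DEPS` mapping and the helper names are hypothetical and are not taken from `install_services()`.

```python
import importlib
import importlib.util
from types import ModuleType
from typing import Dict, List, Optional

# Hypothetical mapping: service name -> optional package it needs.
OPTIONAL_DEPS: Dict[str, str] = {
    "rag": "chromadb",
    "vision": "torch",
    "ocr": "pytesseract",
}


def try_import(package: str) -> Optional[ModuleType]:
    """Import `package` only if it is installed; return None instead of raising."""
    if importlib.util.find_spec(package) is None:
        return None
    try:
        return importlib.import_module(package)
    except Exception:  # installed but broken (e.g. missing native libraries)
        return None


def installable_services(deps: Dict[str, str]) -> List[str]:
    """Return the services whose optional dependency is importable on this machine."""
    return [name for name, pkg in deps.items() if try_import(pkg) is not None]


if __name__ == "__main__":
    print("services available here:", installable_services(OPTIONAL_DEPS))
```

Because the import is attempted only at install time, a missing GPU stack just drops the matching service instead of crashing the node.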

---

## Capability Bus

```
Caller ──── bus.call(name, version, body) ──────────┐
                                                     ▼
                                          ┌──────────────────┐
                                          │  CapabilityBus   │
                                          │                  │
                                          │  Registry        │
                                          │  ┌─────────────┐ │
                                          │  │ local route │─┼──► Service.handle()
                                          │  ├─────────────┤ │
                                          │  │ remote route│─┼──► HTTP POST /bus/v1/call
                                          │  └─────────────┘ │
                                          │  HealthMonitor   │
                                          │  TrustFilter     │
                                          └──────────────────┘
```

- **Local route** — service is installed on this node → direct Python call.
- **Remote route** — capability is advertised by a peer → HTTP POST to that peer's transport.
- **Version negotiation** — capabilities are registered with a `(major, minor)` version; the bus picks the highest compatible version.
- **Health monitoring** — each service's response times are tracked in a ring buffer; unhealthy services are quarantined for `BUS_QUARANTINE_SECONDS`.
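The last two bullets can be sketched as follows. Version selection here assumes semver-style compatibility (same major, minor at least the requested one), and the health check uses a fixed-size `deque` as the ring buffer. The window size, latency threshold, and quarantine duration are hypothetical stand-ins for `BUS_QUARANTINE_SECONDS` and related constants.

```python
import time
from collections import deque
from typing import Deque, Iterable, Optional, Tuple

Version = Tuple[int, int]  # (major, minor)


def pick_version(available: Iterable[Version], requested: Version) -> Optional[Version]:
    """Highest registered version with the same major and minor >= requested."""
    major, minor = requested
    compatible = [v for v in available if v[0] == major and v[1] >= minor]
    return max(compatible, default=None)


class HealthRing:
    """Ring buffer of recent latencies; quarantines a service that looks unhealthy."""

    def __init__(self, size: int = 20, max_avg_ms: float = 2000.0,
                 quarantine_s: float = 60.0) -> None:
        self.samples: Deque[float] = deque(maxlen=size)  # oldest samples fall off
        self.max_avg_ms = max_avg_ms
        self.quarantine_s = quarantine_s
        self.quarantined_until = 0.0

    def record(self, latency_ms: float, now: Optional[float] = None) -> None:
        now = time.monotonic() if now is None else now
        self.samples.append(latency_ms)
        full = len(self.samples) == self.samples.maxlen
        if full and sum(self.samples) / len(self.samples) > self.max_avg_ms:
            self.quarantined_until = now + self.quarantine_s
            self.samples.clear()  # start a fresh window after quarantine

    def available(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        return now >= self.quarantined_until


if __name__ == "__main__":
    print(pick_version([(1, 0), (1, 2), (2, 0)], (1, 1)))  # (1, 2)
```

Waiting for a full window before judging avoids quarantining a service on a single slow call.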

---

## Data Flow: LLM Chat Request

```
User types in Gradio UI
       │
       ▼
  app.py (Gradio event handler)
       │  bus.call("llm.chat@1.0", body)
       ▼
  CapabilityBus.call()
       │
       ├─ local LlmService found?
       │       │ yes → LlmService.handle() → backend.chat() → yield Token
       │       │
       └─ no local service
               │ peer has llm.chat?
               ├─ yes → HTTP POST /bus/v1/call → remote node → stream tokens back
               └─ no  → CapabilityError("not_found")
```
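The routing decision in this flow (local service first, then a peer, else `not_found`) can be sketched with async generators, so tokens stream the same way on both routes. `CapabilityError`'s shape, `route_chat`, and the `local`/`peers` arguments are hypothetical stand-ins; the remote branch is a placeholder for the HTTP POST to `/bus/v1/call`.

```python
import asyncio
from typing import AsyncIterator, Callable, Dict, List, Optional

# A "streamer" takes a prompt and yields tokens asynchronously.
Streamer = Callable[[str], AsyncIterator[str]]


class CapabilityError(Exception):
    """Hypothetical error type carrying a machine-readable code."""

    def __init__(self, code: str) -> None:
        super().__init__(code)
        self.code = code


async def route_chat(prompt: str, local: Optional[Streamer],
                     peers: Dict[str, Streamer]) -> AsyncIterator[str]:
    if local is not None:  # local route: direct Python call
        async for tok in local(prompt):
            yield tok
        return
    for _peer_id, remote in peers.items():  # remote route: first peer advertising it
        async for tok in remote(prompt):    # stands in for HTTP POST /bus/v1/call + SSE
            yield tok
        return
    raise CapabilityError("not_found")


async def fake_backend(prompt: str) -> AsyncIterator[str]:
    for word in f"echo: {prompt}".split():
        await asyncio.sleep(0)  # yield control, as a real token stream would
        yield word


async def collect(gen: AsyncIterator[str]) -> List[str]:
    return [tok async for tok in gen]


if __name__ == "__main__":
    print(asyncio.run(collect(route_chat("hi there", None, {"peer-b": fake_backend}))))
    # ['echo:', 'hi', 'there']
```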

---

## Discovery Flow

```
Node boots
    │
    ├── mDNS: register _hearthnet._tcp.local.  (LAN multicast DNS)
    ├── UDP: send announce to 224.0.0.251:7079 every 15s
    │
    ▼
PeerRegistry receives announcements from other nodes
    │
    ├── new peer → RegistryEvent(kind="added", entry=...)
    ├── peer gone (TTL expired) → RegistryEvent(kind="removed", ...)
    └── ManifestPublisher re-publishes every 300s
```
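The added/removed bookkeeping in `PeerRegistry` can be sketched as a TTL map: every announcement refreshes a peer's last-seen time, and a periodic sweep expires peers that have gone silent. The 45 s TTL (three missed 15 s announcements) and the `TtlRegistry`/`sweep` names are assumptions; `RegistryEvent` here only mirrors the `kind`/`entry` fields shown in the flow.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RegistryEvent:
    kind: str   # "added" or "removed"
    entry: str  # peer node_id (a real entry would carry more fields)


class TtlRegistry:
    """Illustrative peer table with TTL expiry (not HearthNet's PeerRegistry)."""

    def __init__(self, ttl_s: float = 45.0) -> None:
        self.ttl_s = ttl_s
        self.last_seen: Dict[str, float] = {}

    def on_announce(self, node_id: str, now: float) -> List[RegistryEvent]:
        is_new = node_id not in self.last_seen
        self.last_seen[node_id] = now  # every announce refreshes the TTL
        return [RegistryEvent("added", node_id)] if is_new else []

    def sweep(self, now: float) -> List[RegistryEvent]:
        expired = [n for n, t in self.last_seen.items() if now - t > self.ttl_s]
        for n in expired:
            del self.last_seen[n]
        return [RegistryEvent("removed", n) for n in expired]


if __name__ == "__main__":
    reg = TtlRegistry()
    print(reg.on_announce("node-b", now=0.0))   # [RegistryEvent(kind='added', entry='node-b')]
    print(reg.on_announce("node-b", now=15.0))  # []
    print(reg.sweep(now=70.0))                  # [RegistryEvent(kind='removed', entry='node-b')]
```

Setting the TTL to a small multiple of the announce interval tolerates a dropped multicast packet without flapping peers in and out.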

---

## Emergency Mode

```
EmergencyDetector (async loop, 30s probe)
    │
    ├── probe connectivity endpoints
    │
    ├── ONLINE  → EmergencyState.NORMAL
    │                │ UI shows normal theme
    │
    └── OFFLINE → EmergencyState.EMERGENCY
                     │ UI switches to emergency theme (red)
                     │ emergency.llm.chat capability activated
                     │ LoRa beacons sent if hardware available (M29)
                     │ Civil defense alerts published if role cert present (M31)
```
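A minimal sketch of the detector loop, assuming the node counts as ONLINE when at least one connectivity endpoint answers and that listeners are notified only on state transitions. `evaluate`, `detector_loop`, `demo`, and the callback are hypothetical; only the 30 s interval and the NORMAL/EMERGENCY states come from the diagram above.

```python
import asyncio
import enum
from typing import Awaitable, Callable, List, Optional


class EmergencyState(enum.Enum):
    NORMAL = "normal"
    EMERGENCY = "emergency"


def evaluate(probe_results: List[bool]) -> EmergencyState:
    """ONLINE if any endpoint answered; otherwise switch to EMERGENCY."""
    return EmergencyState.NORMAL if any(probe_results) else EmergencyState.EMERGENCY


async def detector_loop(
    probe: Callable[[], Awaitable[List[bool]]],
    on_change: Callable[[EmergencyState], None],
    interval_s: float = 30.0,
    max_iterations: Optional[int] = None,  # None = run forever
) -> None:
    current: Optional[EmergencyState] = None
    n = 0
    while max_iterations is None or n < max_iterations:
        state = evaluate(await probe())
        if state is not current:  # only react to transitions
            current = state
            on_change(state)       # e.g. swap UI theme, activate emergency.llm.chat
        n += 1
        await asyncio.sleep(interval_s)


async def demo() -> List[EmergencyState]:
    results = iter([[True], [False, False], [False], [True]])
    seen: List[EmergencyState] = []

    async def probe() -> List[bool]:
        return next(results)

    await detector_loop(probe, seen.append, interval_s=0, max_iterations=4)
    return seen


if __name__ == "__main__":
    print([s.name for s in asyncio.run(demo())])  # ['NORMAL', 'EMERGENCY', 'NORMAL']
```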

---

## MoE Expert Routing (M27)

```
Query arrives at any node
       │
       ▼
  MoeRouter.route(query, top_k=3)
       │
       ├── score all registered ExpertDescriptors against query
       │   (tag overlap + cosine similarity + recency weighting)
       │
       └── return ranked RouteResult
              │
              ├── expert_type="model"   → bus.call("llm.chat@1.0", ...) on that node
              ├── expert_type="service" → bus.call(expert_capability, ...)
              ├── expert_type="human"   → notify via chat + start handoff timer (M27 §4)
              └── expert_type="external"→ HTTP call to opt-in external API
```

Enable it by setting both the master switch `enable` and `moe_routing` to `true` under `[policy.research]` in `~/.config/hearthnet/config.toml` (see Configuration below).
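The scoring step can be sketched as a weighted sum of the three signals the diagram names. The weights (0.3 / 0.6 / 0.1), Jaccard tag overlap, exponential recency decay with a one-day half-life, and the `Expert` fields are all assumptions for illustration; the actual M27 scorer is described as learned, not hand-weighted.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass
class Expert:
    name: str
    tags: Set[str]
    embedding: Sequence[float]
    last_success_s_ago: float = 0.0  # seconds since this expert last answered well


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def score(expert: Expert, q_tags: Set[str], q_emb: Sequence[float],
          half_life_s: float = 86_400.0) -> float:
    union = expert.tags | q_tags
    tag_overlap = len(expert.tags & q_tags) / len(union) if union else 0.0  # Jaccard
    recency = 0.5 ** (expert.last_success_s_ago / half_life_s)  # 1.0 = just now
    return 0.3 * tag_overlap + 0.6 * cosine(expert.embedding, q_emb) + 0.1 * recency


def route(experts: List[Expert], q_tags: Set[str], q_emb: Sequence[float],
          top_k: int = 3) -> List[Tuple[str, float]]:
    ranked = sorted(((e.name, score(e, q_tags, q_emb)) for e in experts),
                    key=lambda pair: pair[1], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    experts = [
        Expert("garden-llm", {"plants", "garden"}, [1.0, 0.0]),
        Expert("ocr-service", {"scan", "document"}, [0.0, 1.0]),
    ]
    print([name for name, _ in route(experts, {"plants"}, [0.9, 0.1], top_k=1)])  # ['garden-llm']
```

A learned scorer would replace the fixed weights, but the inputs and the top-k ranking step stay the same.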

---

## Distributed Inference (M26 — BitTorrent-style LLM sharing)

```
Node A: layers 0–15 of Llama-3.2-3B
Node B: layers 16–27 of Llama-3.2-3B
Node C: layers 28–35 (lm_head) of Llama-3.2-3B
                │
                ▼
PipelineOrchestrator.plan(model_id="llama3.2:3b")
    │  → discovers shards via experimental.distributed_llm.shard.list
    │  → checks layer coverage: 0..35 ✓
    │
PipelineOrchestrator.run(pipeline, input_tokens)
    │  → sends activations A→B via X08 TensorTransport (1 MiB chunks)
    │  → B sends activations B→C
    │  → C returns final logits
    │
    └── caller gets streamed tokens like any local model
```

Model weights are shared chunk-by-chunk using BLAKE3 CID-addressed blob transfer — same
mechanism as file blobs (M07), but optimised for `.gguf` / `.safetensors` files.
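The plan step's coverage check and the 1 MiB chunking can be sketched as below. `Shard`, `check_coverage`, and `chunk` are hypothetical names; a real orchestrator would also have to choose among redundant shards and handle node failures. Layer ranges are inclusive, matching the `0..35` notation in the diagram.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class Shard:
    node: str
    first: int  # first layer held (inclusive)
    last: int   # last layer held (inclusive)


def check_coverage(shards: List[Shard], num_layers: int) -> List[Shard]:
    """Return shards in pipeline order if they cover 0..num_layers-1 with no gaps."""
    ordered = sorted(shards, key=lambda s: s.first)
    expected = 0
    for s in ordered:
        if s.first != expected:
            raise ValueError(f"gap or overlap at layer {expected} (next shard starts at {s.first})")
        expected = s.last + 1
    if expected != num_layers:
        raise ValueError(f"layers {expected}..{num_layers - 1} are not served by any shard")
    return ordered


CHUNK = 1 << 20  # 1 MiB, as in the diagram


def chunk(payload: bytes, size: int = CHUNK) -> Iterator[bytes]:
    """Split serialized activations into fixed-size frames for transport."""
    for off in range(0, len(payload), size):
        yield payload[off:off + size]


if __name__ == "__main__":
    plan = check_coverage([Shard("C", 28, 35), Shard("A", 0, 15), Shard("B", 16, 27)], 36)
    print([s.node for s in plan])  # ['A', 'B', 'C']
```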

---

## File Tree

```
hearthnet/
├── node.py                    # HearthNode — composition root
├── types.py                   # Shared type aliases (NodeID, ShardID, AlertID, …)
├── constants.py               # All numeric defaults and limits
├── config.py                  # HearthnetConfig + ResearchConfig (TOML-backed)
├── cli.py                     # Click CLI entry point
├── facades.py                 # HearthFacade — thin high-level API for app.py
├── controller.py              # HearthController — legacy thin wrapper
│
├── bus/                       # M03 CapabilityBus
│   ├── router.py              # routing logic (local → remote)
│   ├── registry.py            # CapabilityEntry, RegistryEvent, Diff
│   ├── capability.py          # CapabilityEntry dataclass
│   └── health.py              # ring-buffer health monitor
│
├── identity/                  # M01
│   ├── keys.py                # Ed25519 key generation + signing
│   ├── manifest.py            # NodeManifest, CommunityManifest, CommunityPolicy, …
│   └── tokens.py              # AuthToken, CapabilityToken
│
├── discovery/                 # M02
│   └── peers.py               # mDNS + UDP multicast PeerRegistry
│
├── transport/                 # X01 / X06 / X08
│   ├── client.py              # HTTP + SSE client
│   ├── streams.py             # Frame, SseReader
│   ├── backpressure.py        # FlowControl, RateCheck, RateLimiter
│   └── tensor/                # X08 tensor chunked transport
│
├── events/                    # X02
│   ├── log.py                 # SQLite Lamport event log
│   └── sync.py                # Gossip SyncClient / SyncServer
│
├── observability/             # X03
│   ├── tracing.py             # attach/detach trace context
│   ├── metrics.py             # MetricsCollector, TrackioExporter
│   └── doctor.py              # DoctorResult, CheckResult, DoctorService
│
├── services/                  # M04 – M25 + M32
│   ├── llm/                   # M04 — backends: ollama, llama_cpp, lmstudio, hf_api, anthropic
│   ├── rag/                   # M05
│   ├── marketplace/           # M06
│   ├── chat/                  # M10
│   ├── embedding/             # M11
│   ├── ocr/                   # M17
│   ├── translation/           # M18
│   ├── stt_tts/               # M19
│   ├── vision/                # M20
│   ├── tools/                 # M21
│   ├── rerank/                # M24
│   ├── group_chat/            # M25
│   └── protocol/              # M32
│
├── ui/                        # M08
│   ├── app.py                 # Gradio 8-tab entry point
│   ├── tabs/                  # one file per tab
│   ├── theme.py               # hearthnet_theme, emergency_theme
│   ├── topology.py            # TopologyComponent (mesh graph)
│   ├── onboarding.py          # first-run wizard + invite QR
│   └── mobile/                # M22 PWA manifest + service worker
│
├── emergency/                 # M09
│   ├── detector.py            # async probe loop
│   └── state.py               # EmergencyState enum
│
├── crypto/                    # M23
│   └── channel.py             # X25519 + ChaCha20-Poly1305
│
├── blobs/                     # M07
│   └── store.py               # BLAKE3 CID store + chunked reader
│
├── dht/                       # X05
├── federation/                # M14
├── relay/                     # M15
│
├── distributed_inference/     # M26 (experimental)
├── moe/                       # M27 (experimental)
├── fedlearn/                  # M28 (experimental)
├── lora/                      # M29 (experimental)
├── evidence/                  # M30 (experimental)
├── civdef/                    # M31 (experimental)
└── conformance/               # X09
```

---

## Configuration

`~/.config/hearthnet/config.toml` (created on first run with defaults):

```toml
[node]
node_id      = ""          # auto-generated Ed25519 key ID
display_name = "My Node"
data_dir     = "~/.hearthnet"

[transport]
http_port    = 7080
ui_port      = 7860

[llm]
default_backend = "ollama"   # "ollama" | "llama_cpp" | "lmstudio" | "hf_api" | "smollm"

[rag]
corpus_dir      = "~/.hearthnet/corpus"
embedding_model = "BAAI/bge-small-en-v1.5"

[policy.research]
enable                  = false     # master switch for all experimental modules
moe_routing             = false     # M27
distributed_inference   = false     # M26
fedlearn                = false     # M28
lora_beacons            = false     # M29
evidence                = false     # M30
civil_defense           = false     # M31
```

---

## Connecting a Local Node to the HF Space

The HF Space at `https://huggingface.co/spaces/build-small-hackathon/HearthNet` is a
single-node anchor you can peer with from any local machine.

```bash
# 1. Clone and install
git clone https://huggingface.co/spaces/build-small-hackathon/HearthNet
cd HearthNet
pip install -e .

# 2. Run your local node (pick a free port if 7080 is taken)
python -m hearthnet.cli run --http-port 7080 --ui-port 7860

# 3. Manually add the HF Space anchor as a peer (different network = manual)
python -m hearthnet.cli call discovery.peer.add 1 0 \
  '{"endpoint":"https://build-small-hackathon-hearthnet.hf.space","node_id":"hf-space-anchor"}'

# 4. Verify peering
python -m hearthnet.cli call discovery.peers 1 0 '{}'
```

Or use the helper script:
```bash
python scripts/connect_to_hf.py
```

Once peered, your local node can:
- Route LLM queries **from** the HF Space to your local (better) model
- Push community posts that appear in the HF Space UI
- Share blob files across the connection

> **Note:** The HF Space accepts inbound traffic only through its public HTTPS URL (it has no static IP),
> and it sits on a different network from your node, so it cannot discover you via mDNS.
> Your local node must initiate the connection: use `discovery.peer.add` or the invite flow to establish the bridge manually.

---

## Security Model

- **Node identity** — Ed25519 key pair generated locally, never leaves the device.
- **Trust levels** — `unknown` → `member` → `trusted` → `anchor`. Capabilities can require a minimum trust level.
- **Capability scoping** — `AuthToken` restricts which capabilities a caller may invoke.
- **Channel encryption** — M23 X25519 ECDH + ChaCha20-Poly1305 for inter-node transport (opt-in, defaults off).
- **Experimental capabilities** — Phase 3 modules are off by default and require explicit opt-in. The bus refuses to register them unless the feature flag is on.
- **No central authority** — there is no HearthNet.com, no certificate authority, no registration server. Trust is established peer-to-peer via invite chains.

---

## Testing

```bash
# Full suite (133 unit + integration tests):
pytest tests/ -q

# Skip slow E2E browser tests:
pytest tests/ -q -k "not e2e"

# Phase 3 experimental module tests only:
pytest tests/test_phase3_experimental.py -v

# Conformance runner (X09):
python -m hearthnet.conformance.runner --output conformance-report/
```

---

*This document is generated from the spec set in `docs/`. For per-module detail see:*
- *Phase 1+2: `00-OVERVIEW.md`, `CAPABILITY_CONTRACT.md`, `modules/M01-*.md` …*
- *Phase 3: `docs/p2_p3/IMPLEMENTATION_REFERENCE_p3.md`, `docs/p2_p3/M26-*.md` …*