# N8N Mixture of Agents (MoA) Architecture
The "Google Antigravity" Neural Router
## 1. Core Philosophy
Treat n8n as a **Neural Router**, decoupling "Thinking" (Logic/Architecture) from "Inference" (Execution/Code). This bypasses latency bottlenecks and model refusals by routing each task to the most suitable model.
## 2. Infrastructure: The "OpenAI-Compatible" Bridge
**Optimization**: Run n8n **NATIVELY** on Windows (`npm install -g n8n`) instead of Docker.
- **Why**: Eliminates the `host.docker.internal` bridge bottleneck.
- **Effect**: N8N talks directly to `localhost:1234` with zero latency overhead.
Standardize all providers to the OpenAI API protocol.
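Because every provider speaks the same protocol, one request builder covers all of them. A minimal sketch; the provider table and API keys are placeholders, not real credentials:

```python
# Build an OpenAI-compatible chat request for any provider.
# The provider entries below are illustrative; swap in your own keys.
PROVIDERS = {
    "lmstudio": {"base_url": "http://localhost:1234/v1", "api_key": "lm-studio"},
    "ollama":   {"base_url": "http://localhost:11434/v1", "api_key": "ollama"},
    "deepseek": {"base_url": "https://api.deepseek.com/v1", "api_key": "YOUR_KEY"},
}

def build_chat_request(provider: str, model: str, prompt: str) -> dict:
    """Return the URL, headers, and JSON body for a chat completion call."""
    cfg = PROVIDERS[provider]
    return {
        "url": f"{cfg['base_url']}/chat/completions",
        "headers": {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {cfg['api_key']}",
        },
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.2,
        },
    }

req = build_chat_request("ollama", "dolphin-llama3", "Write the adapter.")
print(req["url"])  # http://localhost:11434/v1/chat/completions
```

Only `base_url` and `api_key` change per provider; the body shape is identical everywhere, which is what makes the router trivial to extend.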
### Local (Code & Privacy)
- **Tool**: Ollama / LM Studio (The "New Friends" Cluster)
- **Endpoint**:
- Ollama: `http://localhost:11434/v1`
- LM Studio: `http://localhost:1234/v1`
### Local Stack (The "Nano Swarm")
Instead of one giant model, use a stack of specialized lightweight models to save RAM:
- **Router/Logic**: `nvidia/nemotron-3-nano` or `Phi-3-Mini` (High logic/param ratio).
- **Coding**: `deepseek-coder-6.7b` or `dolphin-2.9-llama3-8b`.
- **Creative**: `openhermes-2.5-mistral-7b`.
**Configuration**:
- **Endpoint**: `http://localhost:1234/v1`
- **Multi-Model**: If using LM Studio, load the specific model needed for the batch, or run multiple instances on ports `1234`, `1235`, `1236`.
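If each specialist runs in its own LM Studio instance, a simple role map keeps routing explicit. A sketch; the ports mirror the multi-instance scheme above, and the model IDs are examples from the stack list:

```python
# Map each Nano Swarm role to its own local instance and model.
# Ports follow the 1234/1235/1236 scheme above; model IDs are examples.
SWARM = {
    "router":   {"port": 1234, "model": "nvidia/nemotron-3-nano"},
    "coding":   {"port": 1235, "model": "deepseek-coder-6.7b"},
    "creative": {"port": 1236, "model": "openhermes-2.5-mistral-7b"},
}

def endpoint_for(role: str) -> str:
    """Return the OpenAI-compatible base URL for a swarm role."""
    return f"http://localhost:{SWARM[role]['port']}/v1"

print(endpoint_for("coding"))  # http://localhost:1235/v1
```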
### Workflow Import
A ready-to-use workflow file has been generated at:
`hf_space/logos_n8n_workflow.json`
**Usage**:
1. Open N8N Editor.
2. Click **Workflow** > **Import from File**.
3. Select `logos_n8n_workflow.json`.
4. Execute. It will scan your codebase using the Local Nano Swarm.
### Connection Health Check
Verify the stack is active with this rhyme test:
```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/nemotron-3-nano",
    "messages": [
      {"role": "system", "content": "Always answer in rhymes. Today is Thursday"},
      {"role": "user", "content": "What day is it today?"}
    ],
    "temperature": 0.7,
    "stream": false
  }'
```
### LM Studio Server Setup
1. Open **LM Studio**.
2. Click the **Local Server** icon (`<->`) on the left sidebar.
3. Ensure settings:
- **Port**: `1234`
- **CORS**: On (Recommended)
4. Click **Start Server**.
5. *Green Light*: The log should say `Server listening on http://localhost:1234`.
### High-Speed Inference (Math & Logic)
- **Tool**: DeepSeek API (alternatives: DeepInfra / Groq)
- **Endpoint**: `https://api.deepseek.com/v1`
- **Model**: `deepseek-v3` (Math verification, topology)
### Synthesis (Architecture)
- **Tool**: Google Vertex / Gemini
- **Role**: Systems Architect (High-level synthesis)
## 3. The N8N Topology: "Router & Jury"
### Phase A: The Dispatcher (Llama-3-8B-Groq)
Classifies incoming request type:
- **Systems Architecture** -> Route to Gemini
- **Python Implementation** -> Route to Dolphin (Local)
- **Mathematical Proof** -> Route to DeepSeek (API)
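A minimal dispatcher can be sketched as first-pass keyword routing, handing only ambiguous requests to the Llama-3 classifier. The keyword lists are illustrative assumptions, not a fixed taxonomy:

```python
# Naive first-pass classifier: route on keywords, fall back to the LLM.
ROUTES = {
    "architecture": "gemini",    # Systems Architecture -> Gemini
    "python":       "dolphin",   # Python Implementation -> Dolphin (local)
    "proof":        "deepseek",  # Mathematical Proof -> DeepSeek (API)
    "theorem":      "deepseek",
}

def dispatch(request: str) -> str:
    """Return a routing target, or a sentinel meaning 'ask the LLM dispatcher'."""
    text = request.lower()
    for keyword, target in ROUTES.items():
        if keyword in text:
            return target
    return "llm-classify"  # ambiguous: defer to the Llama-3-8B dispatcher

print(dispatch("Write a Python adapter for the manifold"))  # dolphin
```

Cheap deterministic routing like this saves a dispatcher call on the obvious cases; only the `llm-classify` fallback costs an inference round-trip.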
### Phase B: Parallel Execution
Use **Merge Node (Wait Mode)** to execute paths simultaneously.
1. **Path 1 (Math)**: DeepSeek analyzes Prime Potentiality/Manifold logic.
2. **Path 2 (Code)**: Dolphin writes adapters/scripts locally.
- *Implementation Helper*:
```python
# Use a pipeline as a high-level helper for local execution.
# Note: transformers does not load GGUF repos directly; use the
# standard Hugging Face checkpoint for the Dolphin model instead.
from transformers import pipeline

pipe = pipeline("text-generation", model="cognitivecomputations/dolphin-2.9-llama3-8b")
messages = [{"role": "user", "content": "Write the adapter."}]
print(pipe(messages, max_new_tokens=256))
```
3. **Path 3 (Sys)**: Gemini drafts Strategy/README.
### Phase C: Consensus (The Annealing)
Final LLM Node synthesizes outputs:
> "Synthesize perspectives. If Dolphin's code conflicts with DeepSeek's math, prioritize DeepSeek constraints."
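The consensus step reduces to a prompt template fed to the final LLM node. A sketch of assembling it from the three branch outputs; the bracketed section labels and argument names are assumptions:

```python
# Assemble the annealing prompt from the three parallel branch outputs.
def consensus_prompt(math: str, code: str, sys: str) -> str:
    """Combine the Math, Code, and Systems outputs into one synthesis prompt."""
    return (
        "Synthesize the following perspectives into one answer.\n"
        "If the code conflicts with the math, prioritize the math constraints.\n\n"
        f"[DeepSeek / Math]\n{math}\n\n"
        f"[Dolphin / Code]\n{code}\n\n"
        f"[Gemini / Systems]\n{sys}\n"
    )

prompt = consensus_prompt("x must be prime", "def f(x): ...", "Use a router layer")
print(prompt.splitlines()[0])  # Synthesize the following perspectives into one answer.
```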
## 4. Implementation Config
### HTTP Request Node (Generic)
- **Method**: POST
- **URL**: `{{ $json.baseUrl }}/chat/completions`
- **Headers**: `Authorization: Bearer {{ $json.apiKey }}`
- **Body**:
```json
{
"model": "{{ $json.modelName }}",
"messages": [
{ "role": "system", "content": "You are a LOGOS systems engineer." },
{ "role": "user", "content": "{{ $json.prompt }}" }
],
"temperature": 0.2
}
```
### Model Selector (Code Node)
```javascript
// n8n Code node: pick a provider config based on the task type.
// Code nodes must return an ARRAY of items, not a bare object.
const task = items[0].json.taskType;

if (task === "coding") {
  return [{ json: {
    baseUrl: "http://localhost:11434/v1", // native n8n talks to Ollama directly
    modelName: "dolphin-llama3",
    apiKey: "ollama"
  }}];
} else if (task === "math") {
  return [{ json: {
    baseUrl: "https://api.deepseek.com/v1",
    modelName: "deepseek-coder",
    apiKey: "YOUR_DEEPSEEK_KEY"
  }}];
}

// Fallback: route everything else to the local router model.
return [{ json: {
  baseUrl: "http://localhost:1234/v1",
  modelName: "nvidia/nemotron-3-nano",
  apiKey: "lm-studio"
}}];
```
This architecture breaks the bottleneck by delegating grunt work to Dolphin (local, free) and reserving the specialized API models for high-IQ tasks.