N8N Mixture of Agents (MoA) Architecture
The "Google Antigravity" Neural Router
1. Core Philosophy
Treat n8n as a Neural Router, decoupling "Thinking" (Logic/Architecture) from "Inference" (Execution/Code). This sidesteps latency bottlenecks and model refusals by routing each task to the model best equipped to handle it.
2. Infrastructure: The "OpenAI-Compatible" Bridge
Optimization: Run n8n NATIVELY on Windows (`npm install -g n8n`) instead of Docker.
- Why: Eliminates the `host.docker.internal` bridge bottleneck.
- Effect: n8n talks directly to `localhost:1234` with zero latency overhead.
Standardize all providers to the OpenAI API protocol.
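Because every provider speaks the same protocol, one request builder covers all of them; only the base URL, model name, and key change. A minimal sketch (the `buildChatRequest` helper and the `lm-studio` placeholder key are illustrative, not part of n8n):

```javascript
// Build an OpenAI-compatible /chat/completions request for any provider.
// Works identically for Ollama, LM Studio, DeepSeek, etc.; only the
// baseUrl, modelName, and apiKey differ per provider.
function buildChatRequest({ baseUrl, modelName, apiKey, prompt }) {
  return {
    url: `${baseUrl}/chat/completions`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": `Bearer ${apiKey}`,
    },
    body: JSON.stringify({
      model: modelName,
      messages: [{ role: "user", content: prompt }],
      temperature: 0.2,
    }),
  };
}

// Same builder, two different providers:
const local = buildChatRequest({
  baseUrl: "http://localhost:1234/v1",
  modelName: "nvidia/nemotron-3-nano",
  apiKey: "lm-studio",          // placeholder; local servers ignore the key
  prompt: "Classify this task.",
});
const remote = buildChatRequest({
  baseUrl: "https://api.deepseek.com/v1",
  modelName: "deepseek-v3",
  apiKey: "YOUR_DEEPSEEK_KEY",
  prompt: "Verify this proof.",
});
console.log(local.url);  // http://localhost:1234/v1/chat/completions
```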
Local (Code & Privacy)
- Tool: Ollama / LM Studio (The "New Friends" Cluster)
- Endpoints:
  - Ollama: `http://localhost:11434/v1`
  - LM Studio: `http://localhost:1234/v1`
Local Stack (The "Nano Swarm")
Instead of one giant model, use a stack of specialized lightweight models to save RAM:
- Router/Logic: `nvidia/nemotron-3-nano` or `Phi-3-Mini` (high logic-to-parameter ratio).
- Coding: `deepseek-coder-6.7b` or `dolphin-2.9-llama3-8b`.
- Creative: `openhermes-2.5-mistral-7b`.
Configuration:
- Endpoint: `http://localhost:1234/v1`
- Multi-Model: If using LM Studio, load the specific model needed for the batch, or run multiple instances on ports `1234`, `1235`, and `1236`.
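When several instances run side by side, the port effectively selects the model. A hypothetical role-to-port map under that assumption (the port assignments and the `endpointFor` helper are illustrative):

```javascript
// Map each Nano Swarm role to its own LM Studio instance.
// Assumes three instances are running on ports 1234-1236.
const SWARM = {
  logic:    { port: 1234, model: "nvidia/nemotron-3-nano" },
  coding:   { port: 1235, model: "deepseek-coder-6.7b" },
  creative: { port: 1236, model: "openhermes-2.5-mistral-7b" },
};

// Resolve the endpoint for a role, falling back to the logic router.
function endpointFor(role) {
  const entry = SWARM[role] ?? SWARM.logic;
  return { url: `http://localhost:${entry.port}/v1`, model: entry.model };
}

console.log(endpointFor("coding").url);  // http://localhost:1235/v1
```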
Workflow Import
A ready-to-use workflow file has been generated at:
hf_space/logos_n8n_workflow.json
Usage:
- Open the n8n Editor.
- Click Workflow > Import from File.
- Select `logos_n8n_workflow.json`.
- Execute. It will scan your codebase using the Local Nano Swarm.
Connection Health Check
Verify the stack is active with this rhyme test:
```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/nemotron-3-nano",
    "messages": [
      {"role": "system", "content": "Always answer in rhymes. Today is Thursday"},
      {"role": "user", "content": "What day is it today?"}
    ],
    "temperature": 0.7,
    "stream": false
  }'
```
- Click the Local Server icon (`<->`) on the left sidebar.
- Ensure settings:
  - Port: `1234`
  - CORS: On (Recommended)
- Click Start Server.
- Green Light: The log should say `Server listening on http://localhost:1234`.
High-Speed Inference (Math & Logic)
- Tool: DeepInfra / Groq
- Endpoint: `https://api.deepseek.com/v1`
- Model: `deepseek-v3` (math verification, topology)
Synthesis (Architecture)
- Tool: Google Vertex / Gemini
- Role: Systems Architect (High-level synthesis)
3. The N8N Topology: "Router & Jury"
Phase A: The Dispatcher (Llama-3-8B-Groq)
Classifies incoming request type:
- Systems Architecture -> Route to Gemini
- Python Implementation -> Route to Dolphin (Local)
- Mathematical Proof -> Route to DeepSeek (API)
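The dispatcher can be sketched as a plain classifier before any LLM is involved; the keyword lists below are illustrative placeholders for what the Groq-hosted Llama-3-8B would decide on fuzzier requests:

```javascript
// Dispatcher sketch: classify a request and pick a route.
// The keyword heuristics are placeholders; a small LLM handles
// requests these patterns miss.
function classify(request) {
  const text = request.toLowerCase();
  if (/\b(proof|theorem|topology|manifold)\b/.test(text)) return "math";
  if (/\b(python|script|adapter|implement)\b/.test(text)) return "coding";
  return "architecture";  // default: high-level synthesis via Gemini
}

console.log(classify("Write a Python adapter for the API"));  // coding
console.log(classify("Verify this manifold proof"));          // math
```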
Phase B: Parallel Execution
Use Merge Node (Wait Mode) to execute paths simultaneously.
- Path 1 (Math): DeepSeek analyzes Prime Potentiality/Manifold logic.
- Path 2 (Code): Dolphin writes adapters/scripts locally.
- Implementation Helper:

```python
# Use a pipeline as a high-level helper for local execution.
from transformers import pipeline

pipe = pipeline("text-generation", model="dphn/Dolphin-X1-8B-GGUF")
messages = [{"role": "user", "content": "Write the adapter."}]
pipe(messages)
```
- Path 3 (Sys): Gemini drafts Strategy/README.
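Outside n8n, the fan-out/merge pattern of Phase B is equivalent to `Promise.all`; the three `ask*` functions below are stubs standing in for the real HTTP Request nodes:

```javascript
// Fan-out / merge sketch: run all three paths concurrently and wait
// for every result, mirroring the Merge node in Wait mode.
// The ask* functions are stubs in place of real HTTP calls.
const askDeepSeek = async (q) => `math: analysis of "${q}"`;
const askDolphin  = async (q) => `code: adapter for "${q}"`;
const askGemini   = async (q) => `sys: strategy for "${q}"`;

async function fanOut(question) {
  const [math, code, sys] = await Promise.all([
    askDeepSeek(question),
    askDolphin(question),
    askGemini(question),
  ]);
  return { math, code, sys };  // merged payload for the consensus node
}

fanOut("prime potentiality").then((r) => console.log(r.code));
```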
Phase C: Consensus (The Annealing)
Final LLM Node synthesizes outputs:
"Synthesize perspectives. If Dolphin's code conflicts with DeepSeek's math, prioritize DeepSeek constraints."
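Assembling that consensus prompt from the merged Phase B outputs is simple string work; the `consensusPrompt` helper and its field names are illustrative:

```javascript
// Assemble the consensus prompt for the final LLM node from the
// three merged path outputs. The priority rule is baked into the text.
function consensusPrompt({ math, code, sys }) {
  return [
    "Synthesize the following perspectives into one answer.",
    "If Dolphin's code conflicts with DeepSeek's math, prioritize DeepSeek constraints.",
    `DeepSeek (math): ${math}`,
    `Dolphin (code): ${code}`,
    `Gemini (systems): ${sys}`,
  ].join("\n");
}

const prompt = consensusPrompt({
  math: "n must be prime",
  code: "def f(n): ...",
  sys: "three-tier design",
});
console.log(prompt.split("\n").length);  // 5
```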
4. Implementation Config
HTTP Request Node (Generic)
- Method: POST
- URL: `{{ $json.baseUrl }}/chat/completions`
- Headers: `Authorization: Bearer {{ $json.apiKey }}`
- Body:

```json
{
  "model": "{{ $json.modelName }}",
  "messages": [
    { "role": "system", "content": "You are a LOGOS systems engineer." },
    { "role": "user", "content": "{{ $json.prompt }}" }
  ],
  "temperature": 0.2
}
```
Model Selector (Code Node)
```javascript
// Route by task type. n8n runs natively (see Infrastructure above),
// so local endpoints use plain localhost instead of host.docker.internal.
if (items[0].json.taskType === "coding") {
  return { json: {
    baseUrl: "http://localhost:11434/v1",
    modelName: "dolphin-llama3",
    apiKey: "ollama"
  }};
} else if (items[0].json.taskType === "math") {
  return { json: {
    baseUrl: "https://api.deepseek.com/v1",
    modelName: "deepseek-coder",
    apiKey: "YOUR_DEEPSEEK_KEY"
  }};
}
// Fallback: unrecognized task types go to the local router model.
return { json: {
  baseUrl: "http://localhost:1234/v1",
  modelName: "nvidia/nemotron-3-nano",
  apiKey: "lm-studio"
}};
```
This architecture breaks the bottleneck by using Dolphin for grunt work (local/free) and specialized models for high-IQ tasks.