LOGOS-SPCW-Matroska / N8N_ARCHITECTURE.md

N8N Mixture of Agents (MoA) Architecture

The "Google Antigravity" Neural Router

1. Core Philosophy

Treat n8n as a Neural Router that decouples "Thinking" (Logic/Architecture) from "Inference" (Execution/Code). This sidesteps cloud latency and model refusals by routing each task to the model best suited to handle it.

2. Infrastructure: The "OpenAI-Compatible" Bridge

Optimization: Run n8n NATIVELY on Windows (npm install -g n8n) instead of Docker.

  • Why: Eliminates the host.docker.internal bridge bottleneck.
  • Effect: N8N talks directly to localhost:1234 with zero latency overhead.

Standardize all providers to the OpenAI API protocol.

Local (Code & Privacy)

  • Tool: Ollama / LM Studio (The "New Friends" Cluster)
  • Endpoint:
    • Ollama: http://localhost:11434/v1
    • LM Studio: http://localhost:1234/v1
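
Because both providers speak the same protocol, a client only needs to swap the base URL. A minimal sketch (the model name and prompt are illustrative):

```python
# Build an OpenAI-compatible chat request; only the base URL differs
# between Ollama (port 11434) and LM Studio (port 1234).
def chat_request(base_url: str, model: str, prompt: str) -> tuple[str, dict]:
    url = f"{base_url.rstrip('/')}/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }
    return url, payload

url, body = chat_request("http://localhost:11434/v1", "dolphin-llama3", "Hello")
# url -> "http://localhost:11434/v1/chat/completions"
```

The same payload POSTs unchanged to either endpoint, which is what makes the provider swap in the router trivial.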

Local Stack (The "Nano Swarm")

Instead of one giant model, use a stack of specialized lightweight models to save RAM:

  • Router/Logic: nvidia/nemotron-3-nano or Phi-3-Mini (high logic-to-parameter ratio).
  • Coding: deepseek-coder-6.7b or dolphin-2.9-llama3-8b.
  • Creative: openhermes-2.5-mistral-7b.

Configuration:

  • Endpoint: http://localhost:1234/v1
  • Multi-Model: If using LM Studio, load the specific model needed for the batch, or run multiple instances on ports 1234, 1235, 1236.
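
The multi-instance setup can be captured in a small routing table. The port-to-role assignment below is one possible arrangement, not a fixed convention:

```python
# One (port, model) pair per swarm role when running several LM Studio
# instances side by side; ports 1234-1236 follow the note above.
SWARM = {
    "router":   {"port": 1234, "model": "nvidia/nemotron-3-nano"},
    "coding":   {"port": 1235, "model": "deepseek-coder-6.7b"},
    "creative": {"port": 1236, "model": "openhermes-2.5-mistral-7b"},
}

def endpoint(role: str) -> str:
    return f"http://localhost:{SWARM[role]['port']}/v1"

print(endpoint("coding"))  # http://localhost:1235/v1
```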

Workflow Import

A ready-to-use workflow file has been generated at: hf_space/logos_n8n_workflow.json

Usage:

  1. Open N8N Editor.
  2. Click Workflow > Import from File.
  3. Select logos_n8n_workflow.json.
  4. Execute. It will scan your codebase using the Local Nano Swarm.

Connection Health Check

Verify the stack is active with this rhyme test:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/nemotron-3-nano",
    "messages": [
        {"role": "system", "content": "Always answer in rhymes. Today is Thursday"},
        {"role": "user", "content": "What day is it today?"}
    ],
    "temperature": 0.7,
    "stream": false
}'

LM Studio Server Setup

  1. Click the Local Server icon (<->) on the left sidebar.
  2. Ensure settings:
    • Port: 1234
    • CORS: On (Recommended)
  3. Click Start Server.
  4. Green Light: The log should say Server listening on http://localhost:1234.
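
The same rhyme test can be issued from Python with only the standard library; `health_check` assumes a server is already listening on port 1234:

```python
import json
import urllib.request

# Standard-library version of the curl rhyme test above; assumes a
# server is already listening on localhost:1234.
PAYLOAD = {
    "model": "nvidia/nemotron-3-nano",
    "messages": [
        {"role": "system", "content": "Always answer in rhymes. Today is Thursday"},
        {"role": "user", "content": "What day is it today?"},
    ],
    "temperature": 0.7,
    "stream": False,
}

def health_check(base_url: str = "http://localhost:1234/v1") -> str:
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(PAYLOAD).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# health_check() returns the assistant's rhymed reply once the server is up.
```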

High-Speed Inference (Math & Logic)

  • Tool: DeepSeek (OpenAI-compatible API; Groq hosts the dispatcher model, see Phase A)
  • Endpoint: https://api.deepseek.com/v1
  • Model: deepseek-v3 (Math verification, topology)

Synthesis (Architecture)

  • Tool: Google Vertex / Gemini
  • Role: Systems Architect (High-level synthesis)

3. The N8N Topology: "Router & Jury"

Phase A: The Dispatcher (Llama-3-8B-Groq)

Classifies incoming request type:

  • Systems Architecture -> Route to Gemini
  • Python Implementation -> Route to Dolphin (Local)
  • Mathematical Proof -> Route to DeepSeek (API)
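
A dispatcher of this kind can be as simple as keyword matching. The keywords and target names below are illustrative stand-ins for the real classifier prompt:

```python
# Hypothetical keyword dispatcher mirroring Phase A's routing table.
# Real deployments would use the Groq-hosted Llama-3 classifier instead.
ROUTES = {
    "architecture": "gemini",         # Systems Architecture
    "python":       "dolphin-local",  # Python Implementation
    "proof":        "deepseek-api",   # Mathematical Proof
}

def dispatch(request: str) -> str:
    text = request.lower()
    for keyword, target in ROUTES.items():
        if keyword in text:
            return target
    return "gemini"  # unknown requests default to the architect

print(dispatch("Write a Python implementation of the adapter"))  # dolphin-local
```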

Phase B: Parallel Execution

Use Merge Node (Wait Mode) to execute paths simultaneously.

  1. Path 1 (Math): DeepSeek analyzes Prime Potentiality/Manifold logic.
  2. Path 2 (Code): Dolphin writes adapters/scripts locally.
    • Implementation Helper:
      # Use a pipeline as a high-level helper for local execution.
      # Note: a GGUF repo needs an explicit `gguf_file=` argument in
      # transformers, so the plain checkpoint is loaded by name instead.
      from transformers import pipeline
      pipe = pipeline("text-generation", model="cognitivecomputations/dolphin-2.9-llama3-8b")
      messages = [{"role": "user", "content": "Write the adapter."}]
      print(pipe(messages))
  3. Path 3 (Sys): Gemini drafts Strategy/README.
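
The Merge (Wait Mode) behavior, fanning out to all paths and continuing only once every branch has returned, can be sketched with asyncio. The three coroutines are placeholders for the DeepSeek, Dolphin, and Gemini calls:

```python
import asyncio

# Mirror of the Merge node in Wait Mode: all branches run concurrently
# and downstream work starts only after every branch has returned.
async def path_math(task: str) -> str:
    return f"math:{task}"

async def path_code(task: str) -> str:
    return f"code:{task}"

async def path_sys(task: str) -> str:
    return f"sys:{task}"

async def fan_out(task: str) -> list[str]:
    # gather() preserves argument order, like the Merge node's inputs
    return list(await asyncio.gather(path_math(task), path_code(task), path_sys(task)))

results = asyncio.run(fan_out("adapter"))
# results -> ["math:adapter", "code:adapter", "sys:adapter"]
```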

Phase C: Consensus (The Annealing)

Final LLM Node synthesizes outputs:

"Synthesize perspectives. If Dolphin's code conflicts with DeepSeek's math, prioritize DeepSeek constraints."
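
That consensus instruction can be assembled programmatically from the three branch outputs. The MATH/CODE/SYSTEM DESIGN labels in the user message are an assumed convention, not n8n output keys:

```python
# Build the Phase C synthesis request from the three branch outputs.
# The system message quotes the consensus instruction above.
def consensus_prompt(math: str, code: str, sys_design: str) -> list[dict]:
    system = ("Synthesize perspectives. If Dolphin's code conflicts with "
              "DeepSeek's math, prioritize DeepSeek constraints.")
    user = f"MATH:\n{math}\n\nCODE:\n{code}\n\nSYSTEM DESIGN:\n{sys_design}"
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```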

4. Implementation Config

HTTP Request Node (Generic)

  • Method: POST
  • URL: {{ $json.baseUrl }}/chat/completions
  • Headers: Authorization: Bearer {{ $json.apiKey }}
  • Body:
{
  "model": "{{ $json.modelName }}",
  "messages": [
    { "role": "system", "content": "You are a LOGOS systems engineer." },
    { "role": "user", "content": "{{ $json.prompt }}" }
  ],
  "temperature": 0.2
}

Model Selector (Code Node)

// n8n Code node: choose provider settings by task type.
// With n8n running natively, Ollama is reached via plain localhost
// (no host.docker.internal bridge), and a Code node must return an
// array of items.
if (items[0].json.taskType === "coding") {
  return [{ json: {
      baseUrl: "http://localhost:11434/v1",
      modelName: "dolphin-llama3",
      apiKey: "ollama"
  }}];
} else if (items[0].json.taskType === "math") {
  return [{ json: {
      baseUrl: "https://api.deepseek.com/v1",
      modelName: "deepseek-v3",
      apiKey: "YOUR_DEEPSEEK_KEY"
  }}];
}
// Anything else (e.g. systems architecture) is handled by the Gemini node.
throw new Error("Unrouted taskType: " + items[0].json.taskType);

This architecture breaks the bottleneck by using Dolphin for grunt work (local/free) and specialized models for high-IQ tasks.