# N8N Mixture of Agents (MoA) Architecture
The "Google Antigravity" Neural Router

## 1. Core Philosophy
Treat n8n as a **Neural Router** that decouples "Thinking" (logic/architecture) from "Inference" (execution/code). Routing each task to the most efficient model for it sidesteps both API latency and model refusals.

## 2. Infrastructure: The "OpenAI-Compatible" Bridge
**Optimization**: Run n8n **NATIVELY** on Windows (`npm install -g n8n`) instead of Docker.
- **Why**: Eliminates the `host.docker.internal` bridge bottleneck.
- **Effect**: n8n talks directly to `localhost:1234` with no Docker network-bridge overhead.

Standardize all providers to the OpenAI API protocol.

### Local (Code & Privacy)
- **Tool**: Ollama / LM Studio (The "New Friends" Cluster)
- **Endpoint**: 
    - Ollama: `http://localhost:11434/v1`
    - LM Studio: `http://localhost:1234/v1`
### Local Stack (The "Nano Swarm")
Instead of one giant model, use a stack of specialized lightweight models to save RAM:
-   **Router/Logic**: `nvidia/nemotron-3-nano` or `Phi-3-Mini` (High logic/param ratio).
-   **Coding**: `deepseek-coder-6.7b` or `dolphin-2.9-llama3-8b`.
-   **Creative**: `openhermes-2.5-mistral-7b`.

**Configuration**:
-   **Endpoint**: `http://localhost:1234/v1`
-   **Multi-Model**: If using LM Studio, load the specific model needed for the batch, or run multiple instances on ports `1234`, `1235`, `1236`.
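The multi-instance setup above can be captured as a small port map that the router consults per task (ports taken from the list above; the task-type keys are assumptions):

```python
# Port map for the multi-instance LM Studio setup (ports from the list above)
ENDPOINTS = {
    "router": "http://localhost:1234/v1",
    "coding": "http://localhost:1235/v1",
    "creative": "http://localhost:1236/v1",
}

def endpoint_for(task_type: str) -> str:
    """Base URL for a task type, falling back to the router instance."""
    return ENDPOINTS.get(task_type, ENDPOINTS["router"])
```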

### Workflow Import
A ready-to-use workflow file has been generated at:
`hf_space/logos_n8n_workflow.json`

**Usage**:
1.  Open N8N Editor.
2.  Click **Workflow** > **Import from File**.
3.  Select `logos_n8n_workflow.json`.
4.  Execute. It will scan your codebase using the Local Nano Swarm.

### Connection Health Check
Verify the stack is active with this rhyme test:
```bash
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "nvidia/nemotron-3-nano",
    "messages": [
        {"role": "system", "content": "Always answer in rhymes. Today is Thursday"},
        {"role": "user", "content": "What day is it today?"}
    ],
    "temperature": 0.7,
    "stream": false
}'
```
### LM Studio Server Setup
1.  Open LM Studio and load the model you want to serve.
2.  Click the **Local Server** icon (`<->`) on the left sidebar.
3.  Ensure settings:
    -   **Port**: `1234`
    -   **CORS**: On (Recommended)
4.  Click **Start Server**.
5.  *Green Light*: The log should say `Server listening on http://localhost:1234`.

### High-Speed Inference (Math & Logic)
- **Tool**: DeepSeek API (or DeepInfra/Groq-hosted alternatives)
- **Endpoint**: `https://api.deepseek.com/v1`
- **Model**: `deepseek-chat` (the API name for DeepSeek-V3; math verification, topology)

### Synthesis (Architecture)
- **Tool**: Google Vertex / Gemini
- **Role**: Systems Architect (High-level synthesis)

## 3. The N8N Topology: "Router & Jury"

### Phase A: The Dispatcher (Llama-3-8B-Groq)
Classifies incoming request type:
- **Systems Architecture** -> Route to Gemini
- **Python Implementation** -> Route to Dolphin (Local)
- **Mathematical Proof** -> Route to DeepSeek (API)
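When the Groq dispatcher is unavailable, the routing decision can be approximated locally with a keyword heuristic; the keywords below are illustrative assumptions, not the actual dispatcher prompt:

```python
# Fallback keyword dispatcher mirroring the three routes above (keywords are assumptions)
def classify(prompt: str) -> str:
    p = prompt.lower()
    if any(k in p for k in ("proof", "theorem", "verify", "manifold")):
        return "math"          # -> DeepSeek (API)
    if any(k in p for k in ("python", "script", "adapter", "implement")):
        return "coding"        # -> Dolphin (Local)
    return "architecture"      # -> Gemini
```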

### Phase B: Parallel Execution
Use **Merge Node (Wait Mode)** to execute paths simultaneously.
1.  **Path 1 (Math)**: DeepSeek analyzes Prime Potentiality/Manifold logic.
2.  **Path 2 (Code)**: Dolphin writes adapters/scripts locally.
    -   *Implementation Helper* (a sketch against the local Ollama endpoint the doc standardizes on, replacing the non-loadable GGUF `transformers` call; assumes `dolphin-llama3` is already pulled):
        ```python
        # POST to the local OpenAI-compatible chat endpoint
        import requests

        resp = requests.post(
            "http://localhost:11434/v1/chat/completions",
            json={
                "model": "dolphin-llama3",
                "messages": [{"role": "user", "content": "Write the adapter."}],
            },
            timeout=120,
        )
        print(resp.json()["choices"][0]["message"]["content"])
        ```
3.  **Path 3 (Sys)**: Gemini drafts Strategy/README.
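Outside n8n, the same fan-out/wait pattern the Merge Node performs can be sketched with `asyncio`; the backend names are placeholders for the three paths, and `call_backend` stands in for a real `/chat/completions` POST:

```python
# Sketch of the Merge Node (Wait Mode) fan-out: all paths run concurrently,
# and results are collected only once every path has finished.
import asyncio

async def call_backend(name: str, delay: float) -> str:
    # Stand-in for an HTTP call to one backend
    await asyncio.sleep(delay)
    return f"{name}: done"

async def fan_out() -> list:
    # gather() preserves argument order, like the Merge Node's input slots
    return await asyncio.gather(
        call_backend("deepseek-math", 0.01),
        call_backend("dolphin-code", 0.01),
        call_backend("gemini-sys", 0.01),
    )

results = asyncio.run(fan_out())
```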

### Phase C: Consensus (The Annealing)
Final LLM Node synthesizes outputs:
> "Synthesize perspectives. If Dolphin's code conflicts with DeepSeek's math, prioritize DeepSeek constraints."

## 4. Implementation Config

### HTTP Request Node (Generic)
- **Method**: POST
- **URL**: `{{ $json.baseUrl }}/chat/completions`
- **Headers**: `Authorization: Bearer {{ $json.apiKey }}`
- **Body**:
```json
{
  "model": "{{ $json.modelName }}",
  "messages": [
    { "role": "system", "content": "You are a LOGOS systems engineer." },
    { "role": "user", "content": "{{ $json.prompt }}" }
  ],
  "temperature": 0.2
}
```

### Model Selector (Code Node)
```javascript
// n8n Code node ("Run Once for All Items"): route each task to the
// cheapest capable backend. A Code node must return an array of items.
const task = items[0].json.taskType;

if (task === "coding") {
  // Local Ollama; native install, so plain localhost works (no host.docker.internal)
  return [{ json: {
    baseUrl: "http://localhost:11434/v1",
    modelName: "dolphin-llama3",
    apiKey: "ollama"
  }}];
}

if (task === "math") {
  return [{ json: {
    baseUrl: "https://api.deepseek.com/v1",
    modelName: "deepseek-chat", // the API name for DeepSeek-V3
    apiKey: "YOUR_DEEPSEEK_KEY"
  }}];
}

// Default: local router model, so unmatched tasks never return undefined
return [{ json: {
  baseUrl: "http://localhost:1234/v1",
  modelName: "nvidia/nemotron-3-nano",
  apiKey: "lm-studio"
}}];
```

This architecture breaks the cost/latency bottleneck by routing grunt work to Dolphin (local, free) and reserving the specialized hosted models for high-IQ tasks.