mindchain
committed on
Commit
·
4af9fdb
1
Parent(s):
d883c13
Restructure Self-Improving Agent post - 5 components with Docker MCP Server
- index.html +43 -47
index.html
CHANGED
|
@@ -145,72 +145,68 @@ Plus in the gateway: GitHub, Sentry, Z-Image, Web-Search, Browser Automation
|
|
| 145 |
|
| 146 |
<div class="post">
|
| 147 |
<span class="tag">Agent Training Loop</span>
|
| 148 |
-
<h2>🔄
|
| 149 |
<div class="date">30 Dec 2025 • Closed-Loop AI Agent Training</div>
|
| 150 |
<div class="content"><strong>The vision:</strong> An agent that improves itself through iterative loops.
|
| 151 |
|
| 152 |
-
<strong>
|
| 153 |
|
| 154 |
-
<
|
| 155 |
-
|
| 156 |
• Stop hook collects the result
|
| 157 |
|
| 158 |
-
<
|
| 159 |
-
|
| 160 |
-
|
| 161 |
• Git-backed - every loop is versioned
|
| 162 |
|
| 163 |
-
<
|
| 164 |
-
|
| 165 |
-
|
| 166 |
• Agent learns from its own successes and failures
|
| 167 |
|
| 168 |
-
<
|
| 169 |
-
|
| 170 |
-
|
| 171 |
-
|
| 172 |
|
| 173 |
<strong>Use Cases:</strong>
|
| 174 |
• Train a code-refactoring agent
|
| 175 |
• Improve bug-finding skills
|
| 176 |
• Optimize domain-specific tasks
|
| 177 |
|
| 178 |
-
<strong>The combination:</strong> Ralph provides the loop, Beads the memory, HF Skills the learning.
|
| 179 |
-
|
| 180 |
-
<strong>5. Gemma Scope 2 + Neuronpedia (Interpretability + Steering)</strong>
|
| 181 |
-
Agent training becomes transparent and steerable.
|
| 182 |
-
|
| 183 |
-
<span style="color: #667eea;">Discovery Skills</span> - WHAT is the agent learning?
|
| 184 |
-
• Find the SAE features that determine behavior
|
| 185 |
-
• Identify circuits (causal chains in the network)
|
| 186 |
-
• Neuronpedia: 4TB+ activations, explanations, metadata
|
| 187 |
-
• <a href="https://www.neuronpedia.org/gemma-scope-2" class="link">neuronpedia.org/gemma-scope-2</a>
|
| 188 |
-
|
| 189 |
-
<span style="color: #667eea;">Steering Skills</span> - influence behavior
|
| 190 |
-
• Increase or decrease feature strength (↑/↓)
|
| 191 |
-
• API: POST /api/steer with strength_multiplier
|
| 192 |
-
• "Golden Gate Claude", but for any feature
|
| 193 |
-
• <a href="https://docs.neuronpedia.org/steering" class="link">Neuronpedia Steering Docs</a>
|
| 194 |
-
|
| 195 |
-
<span style="color: #667eea;">Freezing Skills</span> - lock in what was learned
|
| 196 |
-
• Identify and save important circuits
|
| 197 |
-
• Export and reuse feature vectors
|
| 198 |
-
• Keep agent behavior consistent
|
| 199 |
-
• <a href="https://github.com/hijohnnylin/neuronpedia-python" class="link">neuronpedia-python GitHub</a>
|
| 200 |
-
|
| 201 |
-
<strong>The extended loop:</strong>
|
| 202 |
-
1. Ralph starts → agent executes the task
|
| 203 |
-
2. Beads tracks → graph stores progress
|
| 204 |
-
3. Gemma Scope 2 → activations are analyzed
|
| 205 |
-
4. Neuronpedia → discovery: find important features
|
| 206 |
-
5. Steering → actively correct agent behavior
|
| 207 |
-
6. HF Skills → train what was learned into the model
|
| 208 |
-
7. Freezing → lock in successful patterns
|
| 209 |
-
8. Loop repeats → improved agent
|
| 210 |
-
|
| 211 |
<strong>Links:</strong>
|
| 212 |
<a href="https://github.com/anthropics/claude-code/tree/main/plugins/ralph-wiggum" class="link">Ralph Wiggum GitHub</a>
|
| 213 |
<a href="https://github.com/steveyegge/beads" class="link">Beads GitHub</a>
|
| 214 |
<a href="https://github.com/huggingface/skills" class="link">HF Skills GitHub</a>
|
| 215 |
<a href="https://huggingface.co/blog/hf-skills-training" class="link">HF Skills Blog</a>
|
| 216 |
<a href="https://www.neuronpedia.org/api-doc" class="link">Neuronpedia API</a>
|
| 145 |
|
| 146 |
<div class="post">
|
| 147 |
<span class="tag">Agent Training Loop</span>
|
| 148 |
+
<h2>🔄 Self-Improving Agent Loop</h2>
|
| 149 |
<div class="date">30 Dec 2025 • Closed-Loop AI Agent Training</div>
|
| 150 |
<div class="content"><strong>The vision:</strong> An agent that improves itself through iterative loops.
|
| 151 |
|
| 152 |
+
<strong>The components:</strong>
|
| 153 |
|
| 154 |
+
<strong>1. Ralph Wiggum</strong> (Loop Engine)
|
| 155 |
+
Iterative AI agent loops with self-referential feedback.
|
| 156 |
+
<a href="https://github.com/anthropics/claude-code/tree/main/plugins/ralph-wiggum" class="link">Ralph Wiggum GitHub</a>
|
| 157 |
+
• /ralph-loop starts the loop
|
| 158 |
• Stop hook collects the result
|
| 159 |
+
• /cancel-ralph cancels the loop
|
| 160 |
|
| 161 |
+
<strong>2. Beads</strong> (Task Memory)
|
| 162 |
+
Git-backed graph issue tracker for tasks.
|
| 163 |
+
<a href="https://github.com/steveyegge/beads" class="link">Beads GitHub</a>
|
| 164 |
+
• Tasks stored as graph nodes
|
| 165 |
+
• Dependencies and blockers are visible
|
| 166 |
• Git-backed - every loop is versioned
|
| 167 |
|
| 168 |
+
<strong>3. Docker MCP Server</strong> (Container Runtime)
|
| 169 |
+
Everything runs in containers - reproducible and isolated.
|
| 170 |
+
<a href="https://docs.docker.com/ai/mcp-catalog-and-toolkit/server-docker/" class="link">Docker MCP Server Docs</a>
|
| 171 |
+
• Create agent environments on demand
|
| 172 |
+
• GPU containers for training
|
| 173 |
+
• Every loop in a fresh container
|
| 174 |
+
|
| 175 |
+
<strong>4. HF Skills</strong> (Model Training)
|
| 176 |
+
Hugging Face Skills for training on loop results.
|
| 177 |
+
<a href="https://github.com/huggingface/skills" class="link">HF Skills GitHub</a>
|
| 178 |
+
• model-trainer: SFT/DPO/GRPO
|
| 179 |
+
• Results become the dataset
|
| 180 |
• Agent learns from its own successes and failures
|
| 181 |
|
| 182 |
+
<strong>5. Gemma Scope 2 + Neuronpedia</strong> (Interpretability)
|
| 183 |
+
Training becomes transparent and steerable.
|
| 184 |
+
<a href="https://www.neuronpedia.org/gemma-scope-2" class="link">neuronpedia.org/gemma-scope-2</a>
|
| 185 |
+
|
| 186 |
+
<span style="color: #667eea;">Discovery:</span> find the SAE features that determine behavior
|
| 187 |
+
<span style="color: #667eea;">Steering:</span> change feature strength (↑/↓)
|
| 188 |
+
<span style="color: #667eea;">Freezing:</span> lock in learned patterns
|
| 189 |
+
|
| 190 |
+
<strong>The complete loop:</strong>
|
| 191 |
+
1. Ralph starts → agent executes the task
|
| 192 |
+
2. Beads tracks → graph stores progress
|
| 193 |
+
3. Docker MCP → fresh containers for every step
|
| 194 |
+
4. Gemma Scope 2 → activations are analyzed
|
| 195 |
+
5. Neuronpedia → discovery: find important features
|
| 196 |
+
6. Steering → actively correct agent behavior
|
| 197 |
+
7. HF Skills → train what was learned into the model
|
| 198 |
+
8. Freezing → lock in successful patterns
|
| 199 |
+
9. Loop repeats → improved agent
|
| 200 |
|
| 201 |
<strong>Use Cases:</strong>
|
| 202 |
• Train a code-refactoring agent
|
| 203 |
• Improve bug-finding skills
|
| 204 |
• Optimize domain-specific tasks
|
| 205 |
|
| 206 |
<strong>Links:</strong>
|
| 207 |
<a href="https://github.com/anthropics/claude-code/tree/main/plugins/ralph-wiggum" class="link">Ralph Wiggum GitHub</a>
|
| 208 |
<a href="https://github.com/steveyegge/beads" class="link">Beads GitHub</a>
|
| 209 |
+
<a href="https://docs.docker.com/ai/mcp-catalog-and-toolkit/server-docker/" class="link">Docker MCP Server</a>
|
| 210 |
<a href="https://github.com/huggingface/skills" class="link">HF Skills GitHub</a>
|
| 211 |
<a href="https://huggingface.co/blog/hf-skills-training" class="link">HF Skills Blog</a>
|
| 212 |
<a href="https://www.neuronpedia.org/api-doc" class="link">Neuronpedia API</a>
|
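The nine-step loop listed in the added lines of this commit can be read as a pipeline of stages repeated until a stop condition holds. The Python sketch below illustrates only that control flow. Every name in it (`LoopState`, `make_step`, `PIPELINE`, `run_loop`, the integer `gain` values, and the `target_score` stop condition) is a hypothetical placeholder, not an API of Ralph Wiggum, Beads, Docker MCP, Gemma Scope 2, Neuronpedia, or HF Skills, and the post itself does not say how the stages are wired together.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    """Shared state passed through every stage (hypothetical structure)."""
    iteration: int = 0
    score: int = 0
    history: List[str] = field(default_factory=list)


Step = Callable[[LoopState], None]


def make_step(name: str, gain: int = 0) -> Step:
    """Build a stand-in stage that logs its name and adds a fixed gain."""
    def step(state: LoopState) -> None:
        state.history.append(f"{state.iteration}:{name}")
        state.score += gain
    return step


# Order mirrors steps 1-8 of the post; step 9 is the repetition itself.
# The gains are made-up numbers so the loop has something to converge on.
PIPELINE: List[Step] = [
    make_step("run_task"),             # 1. Ralph starts, agent executes the task
    make_step("track_progress"),       # 2. Beads tracks progress in its graph
    make_step("fresh_container"),      # 3. Docker MCP provides a fresh container
    make_step("analyze_activations"),  # 4. Gemma Scope 2 analyzes activations
    make_step("discover_features"),    # 5. Neuronpedia discovery
    make_step("steer", gain=1),        # 6. Steering corrects behavior
    make_step("train", gain=2),        # 7. HF Skills trains on the results
    make_step("freeze"),               # 8. Freezing locks in patterns
]


def run_loop(max_iterations: int, target_score: int) -> LoopState:
    """Repeat the pipeline until the target score or iteration cap is hit."""
    state = LoopState()
    while state.iteration < max_iterations and state.score < target_score:
        state.iteration += 1
        for step in PIPELINE:
            step(state)
    return state


if __name__ == "__main__":
    final = run_loop(max_iterations=5, target_score=7)
    print(final.iteration, final.score, len(final.history))
```

The iteration cap matters in a sketch like this: without it, a loop whose stages never raise the score would run forever, which is the same reason the post lists an explicit cancel command next to the start command.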