VibecoderMcSwaggins committed on
Commit 9cfbd6a · 1 Parent(s): a31cea6

docs: Update SPEC_16 with namespace neutrality and Gemini strategy

docs/specs/SPEC_16_UNIFIED_CHAT_CLIENT_ARCHITECTURE.md CHANGED
@@ -2,12 +2,18 @@
 
  **Status**: Proposed
  **Priority**: P1 (Architectural Simplification)
- **Issue**: Updates [#105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105)
  **Created**: 2025-12-01
 
  ## Summary
 
- Eliminate the Simple Mode / Advanced Mode parallel universe by implementing a pluggable `ChatClient` architecture. This allows the multi-agent framework to work with ANY LLM provider (OpenAI, HuggingFace, Anthropic, etc.) through a single, unified codebase.
 
  ## Problem Statement
 
@@ -18,7 +24,7 @@ User Query
 
  ├── Has API Key? ──Yes──→ Advanced Mode (400 lines)
  │     └── Microsoft Agent Framework
- │           └── OpenAIChatClient (hardcoded)
 
  └── No API Key? ──────────→ Simple Mode (761 lines)
        └── While-loop orchestration
@@ -26,27 +32,10 @@ User Query
  ```
 
  **Problems:**
- 1. **Double Maintenance**: 1,161 lines across two systems
- 2. **Feature Drift**: New features must be implemented twice
- 3. **Bug Duplication**: Same bugs appear in both systems
- 4. **Testing Burden**: Two test suites, two CI paths
- 5. **Cognitive Load**: Developers must understand both patterns
-
- ### Root Cause Analysis
-
- The issue #105 stated: "Microsoft Agent Framework's OpenAIChatClient only speaks OpenAI API format."
-
- **This is FALSE.** Upon investigation:
-
- ```python
- # Microsoft Agent Framework provides:
- from agent_framework import BaseChatClient, ChatClientProtocol
-
- # Abstract methods to implement:
- frozenset({'_inner_get_response', '_inner_get_streaming_response'})
- ```
-
- The framework IS designed for pluggable clients. We just never implemented alternatives.
 
  ## Proposed Solution: ChatClientFactory
 
@@ -57,10 +46,10 @@ User Query
 
  └──→ Advanced Mode (unified)
        └── Microsoft Agent Framework
-             └── ChatClientFactory:
-                   ├── OpenAIChatClient (API key present)
-                   ├── AnthropicChatClient (Anthropic key)
-                   └── HuggingFaceChatClient (free fallback)
  ```
 
  ### New Files
@@ -69,10 +58,10 @@ User Query
  src/
  ├── clients/
  │   ├── __init__.py
- │   ├── base.py           # Re-export BaseChatClient
  │   ├── factory.py        # ChatClientFactory
- │   ├── huggingface.py    # HuggingFaceChatClient (~200 lines)
- │   └── anthropic.py      # AnthropicChatClient (~200 lines) [future]
  ```
 
  ### ChatClientFactory Implementation
@@ -81,7 +70,6 @@ src/
  # src/clients/factory.py
  from agent_framework import BaseChatClient
  from agent_framework.openai import OpenAIChatClient
-
  from src.utils.config import settings
 
  def get_chat_client(
@@ -93,335 +81,89 @@ def get_chat_client(
 
      Auto-detection priority:
      1. Explicit provider parameter
-     2. OpenAI key (highest quality)
-     3. Anthropic key
-     4. HuggingFace (free fallback)
 
      Args:
-         provider: Force specific provider ("openai", "anthropic", "huggingface")
          api_key: Override API key for the provider
 
      Returns:
-         Configured BaseChatClient instance
      """
      if provider == "openai" or (provider is None and settings.has_openai_key):
          return OpenAIChatClient(
              model_id=settings.openai_model,
              api_key=api_key or settings.openai_api_key,
          )
 
-     if provider == "anthropic" or (provider is None and settings.has_anthropic_key):
-         from src.clients.anthropic import AnthropicChatClient
-         return AnthropicChatClient(
-             model_id=settings.anthropic_model,
-             api_key=api_key or settings.anthropic_api_key,
          )
 
-     # Free fallback
      from src.clients.huggingface import HuggingFaceChatClient
      return HuggingFaceChatClient(
          model_id="meta-llama/Llama-3.1-70B-Instruct",
      )
  ```
 
- ### HuggingFaceChatClient Implementation
-
- ```python
- # src/clients/huggingface.py
- from collections.abc import AsyncIterable
- from typing import Any
-
- from agent_framework import (
-     BaseChatClient,
-     ChatMessage,
-     ChatResponse,
-     ChatResponseUpdate,
-     TextContent,
-     FunctionCallContent,
- )
- from huggingface_hub import InferenceClient
-
- class HuggingFaceChatClient(BaseChatClient):
-     """
-     HuggingFace Inference adapter for Microsoft Agent Framework.
-
-     Enables multi-agent orchestration using free HuggingFace models
-     like Llama 3.1 70B Instruct (supports function calling).
-     """
-
-     def __init__(
-         self,
-         model_id: str = "meta-llama/Llama-3.1-70B-Instruct",
-         api_key: str | None = None,
-     ):
-         self._model_id = model_id
-         self._client = InferenceClient(model=model_id, token=api_key)
-
-     def service_url(self) -> str:
-         return "https://api-inference.huggingface.co"
-
-     async def _inner_get_response(
-         self,
-         messages: list[ChatMessage],
-         **kwargs: Any,
-     ) -> ChatResponse:
-         """Convert and call HuggingFace, return ChatResponse."""
-         # Convert ChatMessage[] to HuggingFace format
-         hf_messages = self._convert_messages_to_hf(messages)
-
-         # Handle tools/function calling if present
-         tools = kwargs.get("tools")
-         hf_tools = self._convert_tools_to_hf(tools) if tools else None
-
-         # Call HuggingFace API
-         response = await self._client.chat_completion(
-             messages=hf_messages,
-             tools=hf_tools,
-             max_tokens=kwargs.get("max_tokens", 4096),
-             temperature=kwargs.get("temperature", 0.7),
-         )
-
-         # Convert response back to ChatResponse
-         return self._convert_response_from_hf(response)
-
-     async def _inner_get_streaming_response(
-         self,
-         messages: list[ChatMessage],
-         **kwargs: Any,
-     ) -> AsyncIterable[ChatResponseUpdate]:
-         """Streaming version of response generation."""
-         hf_messages = self._convert_messages_to_hf(messages)
-
-         async for chunk in self._client.chat_completion(
-             messages=hf_messages,
-             stream=True,
-             **kwargs,
-         ):
-             yield self._convert_chunk_from_hf(chunk)
-
-     def _convert_messages_to_hf(self, messages: list[ChatMessage]) -> list[dict]:
-         """Convert Agent Framework messages to HuggingFace format."""
-         result = []
-         for msg in messages:
-             hf_msg = {"role": msg.role.value}
-
-             # Extract text content
-             if msg.text:
-                 hf_msg["content"] = str(msg.text)
-             elif msg.contents:
-                 # Handle multi-part content
-                 hf_msg["content"] = " ".join(
-                     str(c.text) for c in msg.contents
-                     if hasattr(c, "text")
-                 )
-
-             # Handle function calls
-             if any(isinstance(c, FunctionCallContent) for c in (msg.contents or [])):
-                 hf_msg["tool_calls"] = [
-                     self._convert_function_call(c)
-                     for c in msg.contents
-                     if isinstance(c, FunctionCallContent)
-                 ]
-
-             result.append(hf_msg)
-         return result
-
-     def _convert_tools_to_hf(self, tools) -> list[dict] | None:
-         """Convert Agent Framework tools to HuggingFace format."""
-         if not tools:
-             return None
-
-         hf_tools = []
-         for tool in tools:
-             if hasattr(tool, "to_dict"):
-                 # ToolProtocol objects
-                 hf_tools.append({
-                     "type": "function",
-                     "function": tool.to_dict(),
-                 })
-             elif callable(tool):
-                 # ai_function decorated functions
-                 hf_tools.append({
-                     "type": "function",
-                     "function": {
-                         "name": tool.__name__,
-                         "description": tool.__doc__ or "",
-                         "parameters": getattr(tool, "__schema__", {}),
-                     }
-                 })
-         return hf_tools or None
-
-     def _convert_response_from_hf(self, response) -> ChatResponse:
-         """Convert HuggingFace response to ChatResponse."""
-         choice = response.choices[0]
-         message = choice.message
-
-         contents = []
-
-         # Text content
-         if message.content:
-             contents.append(TextContent(text=message.content))
-
-         # Function/tool calls
-         if message.tool_calls:
-             for tc in message.tool_calls:
-                 contents.append(FunctionCallContent(
-                     call_id=tc.id,
-                     name=tc.function.name,
-                     arguments=tc.function.arguments,
-                 ))
-
-         return ChatResponse(
-             text=message.content,
-             model_id=self._model_id,
-             finish_reason={"type": choice.finish_reason},
-         )
- ```
-
  ### Changes to Advanced Orchestrator
 
  ```python
  # src/orchestrators/advanced.py
 
- # BEFORE (hardcoded):
  from agent_framework.openai import OpenAIChatClient
 
  class AdvancedOrchestrator:
      def __init__(self, ...):
          self._chat_client = OpenAIChatClient(...)
 
- # AFTER (factory):
  from src.clients.factory import get_chat_client
 
  class AdvancedOrchestrator:
      def __init__(self, chat_client=None, provider=None, api_key=None, ...):
          self._chat_client = chat_client or get_chat_client(
              provider=provider,
              api_key=api_key,
          )
  ```
 
- ## Files to Delete After Implementation
-
- | File | Lines | Reason |
- |------|-------|--------|
- | `src/orchestrators/simple.py` | 761 | Replaced by unified Advanced Mode |
- | `src/tools/search_handler.py` | ~150 | Manager agent handles orchestration |
- | `src/agent_factory/judges.py` (JudgeHandler) | ~200 | JudgeAgent replaces this |
-
- **Total deletion: ~1,100 lines**
- **Total addition: ~400 lines (new clients)**
- **Net: -700 lines, single architecture**
-
  ## Migration Plan
 
- ### Phase 1: Implement HuggingFaceChatClient
- - [ ] Create `src/clients/` package
- - [ ] Implement `HuggingFaceChatClient` with function calling
- - [ ] Write unit tests for message/tool conversion
- - [ ] Test with simple queries (no multi-agent)
 
- ### Phase 2: Integrate into Advanced Mode
- - [ ] Create `ChatClientFactory`
- - [ ] Update `AdvancedOrchestrator` to use factory
- - [ ] Update `magentic_agents.py` to accept any `BaseChatClient`
- - [ ] Test full multi-agent flow with HuggingFace
 
  ### Phase 3: Deprecate Simple Mode
- - [ ] Add deprecation warning to Simple Mode
- - [ ] Update factory.py to only return AdvancedOrchestrator
- - [ ] Update UI to remove mode selection (auto-detect only)
- - [ ] Run full regression tests
-
- ### Phase 4: Remove Simple Mode
- - [ ] Delete `simple.py`
- - [ ] Delete `search_handler.py`
- - [ ] Remove JudgeHandler classes
- - [ ] Archive to `docs/archive/` for reference
- - [ ] Update all tests
-
- ## Risks and Mitigations
-
- ### Risk 1: HuggingFace Rate Limits
- **Problem**: Free tier may throttle multi-agent flows (5-10 LLM calls per query)
- **Mitigation**:
- - Add exponential backoff with retries
- - Cache manager decisions where possible
- - Consider paid HF Pro ($9/month) for demo
-
- ### Risk 2: Function Calling Quality
- **Problem**: Llama 3.1 70B function calling may be less reliable than GPT-5
- **Mitigation**:
- - Add validation/retry on malformed tool calls
- - Fall back to text parsing if JSON fails
- - Test extensively before removing Simple Mode
-
- ### Risk 3: Response Format Differences
- **Problem**: HuggingFace responses may have subtle format differences
- **Mitigation**:
- - Comprehensive conversion functions
- - Unit tests covering edge cases
- - Integration tests with real API
-
- ## Success Criteria
-
- 1. **Single Codebase**: No more Simple/Advanced split
- 2. **Zero API Key Demo**: HuggingFace Spaces works without user API key
- 3. **Quality Parity**: Free tier produces comparable research reports
- 4. **Maintainability**: One test suite, one bug tracker, one feature path
-
- ## Full Stack Analysis
-
- ### Files Requiring Changes (Category 1: Core)
-
- | File | Refs | Change |
- |------|------|--------|
- | `src/orchestrators/advanced.py` | 8 | `OpenAIChatClient` → `get_chat_client()` |
- | `src/agents/magentic_agents.py` | 12 | Type: `OpenAIChatClient` → `BaseChatClient` |
- | `src/agents/retrieval_agent.py` | 4 | Same pattern |
- | `src/agents/code_executor_agent.py` | 4 | Same pattern |
- | `src/utils/llm_factory.py` | 8 | Merge into `clients/factory.py` |
-
- ### Files to Delete (Category 2: Simple Mode)
-
- | File | Lines | Reason |
- |------|-------|--------|
- | `src/orchestrators/simple.py` | 761 | Replaced by unified system |
- | `src/agent_factory/judges.py` (handlers) | ~200 | JudgeAgent replaces |
- | `src/tools/search_handler.py` | ~150 | Manager agent replaces |
-
- ### Files Unchanged (Category 3: Embeddings)
-
- Embedding services are a **separate concern**:
- - `src/services/llamaindex_rag.py` - Premium tier (OpenAI embeddings)
- - `src/services/embeddings.py` - Free tier (local sentence-transformers)
-
- Both work today. No changes needed.
-
- ### Config Toggle (Future Enhancement)
-
- After implementation, providers can be toggled via config:
-
- ```bash
- # .env
- CHAT_PROVIDER=huggingface  # "openai", "anthropic", "huggingface", "auto"
- ```
 
- Or at runtime:
- ```python
- orchestrator = AdvancedOrchestrator(provider="huggingface")
- orchestrator = AdvancedOrchestrator(provider="openai", api_key="sk-...")
- ```
 
- This enables:
- 1. **A/B testing** different providers
- 2. **Cost optimization** (switch to cheaper provider)
- 3. **Graceful degradation** (fallback chain)
- 4. **Kill switch** (disable specific provider)
 
  ## References
 
  - Microsoft Agent Framework: `agent_framework.BaseChatClient`
- - HuggingFace Inference: `huggingface_hub.InferenceClient`
- - Llama 3.1 Function Calling: [HuggingFace Docs](https://huggingface.co/docs/transformers/main/chat_templating#tool-use--function-calling)
- - Issue #105: Deprecate Simple Mode
 
 
  **Status**: Proposed
  **Priority**: P1 (Architectural Simplification)
+ **Issue**: Updates [#105](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/105), [#109](https://github.com/The-Obstacle-Is-The-Way/DeepBoner/issues/109)
  **Created**: 2025-12-01
 
  ## Summary
 
+ Eliminate the Simple Mode / Advanced Mode parallel universe by implementing a pluggable `ChatClient` architecture. This moves the system away from a hardcoded `OpenAIChatClient` namespace to a neutral `BaseChatClient` protocol, allowing the multi-agent framework to work with ANY LLM provider through a unified codebase.
+
+ ## Strategic Goals
+
+ 1. **Namespace Neutrality**: Decouple the core orchestrator from the `OpenAI` namespace. The system should speak `ChatClient`, not `OpenAIChatClient` (illustrated below).
+ 2. **Full-Stack Provider Chain**: Prioritize providers that offer both LLM and Embeddings (OpenAI, Gemini, HuggingFace + Local) to ensure a unified environment.
+ 3. **Fragmentation Reduction**: Remove "LLM-only" providers (Anthropic) that force complex hybrid dependency chains (e.g., Anthropic LLM + OpenAI Embeddings).
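
Concretely, goal 1 shows up as a typing change: agents and orchestrator internals accept the neutral `BaseChatClient` protocol instead of a vendor class. A minimal illustration (the function name below is hypothetical; the actual work is the strict-typing item in Phase 1 of the migration plan):

```python
from agent_framework import BaseChatClient

# Before: signatures were pinned to the vendor class, e.g.
#   def build_manager_agent(chat_client: OpenAIChatClient): ...
# After: any adapter that implements the neutral protocol can be injected.
def build_manager_agent(chat_client: BaseChatClient):
    ...
```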
 
  ## Problem Statement
 
 
 
  ├── Has API Key? ──Yes──→ Advanced Mode (400 lines)
  │     └── Microsoft Agent Framework
+ │           └── OpenAIChatClient (hardcoded dependency)
 
  └── No API Key? ──────────→ Simple Mode (761 lines)
        └── While-loop orchestration
  ```
 
  **Problems:**
+ 1. **Double Maintenance**: 1,161 lines across two systems.
+ 2. **Namespace Lock-in**: The Advanced Orchestrator is tightly coupled to `OpenAIChatClient`.
+ 3. **Fragmented Chains**: Using Anthropic requires a "Frankenstein" chain (Anthropic LLM + OpenAI Embeddings).
+ 4. **Testing Burden**: Two test suites, two CI paths.
 
  ## Proposed Solution: ChatClientFactory
 
  └──→ Advanced Mode (unified)
        └── Microsoft Agent Framework
+             └── ChatClientFactory (Namespace Neutral):
+                   ├── OpenAIChatClient (Paid Tier: Best Performance)
+                   ├── GeminiChatClient (Alternative Tier: LLM + Embeddings)
+                   └── HuggingFaceChatClient (Free Tier: LLM + Local Embeddings)
  ```
 
  ### New Files
 
  src/
  ├── clients/
  │   ├── __init__.py
+ │   ├── base.py           # Re-export BaseChatClient (The neutral protocol)
  │   ├── factory.py        # ChatClientFactory
+ │   ├── huggingface.py    # HuggingFaceChatClient
+ │   └── gemini.py         # GeminiChatClient [Future]
  ```
 
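The `base.py` re-export exists so the rest of the codebase never imports from a vendor-specific namespace. A minimal sketch (both names come from the `agent_framework` package as referenced elsewhere in this spec):

```python
# src/clients/base.py (sketch): the neutral protocol, re-exported in one place.
from agent_framework import BaseChatClient, ChatClientProtocol

__all__ = ["BaseChatClient", "ChatClientProtocol"]
```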
  ### ChatClientFactory Implementation
 
  # src/clients/factory.py
  from agent_framework import BaseChatClient
  from agent_framework.openai import OpenAIChatClient
  from src.utils.config import settings
 
  def get_chat_client(
 
      Auto-detection priority:
      1. Explicit provider parameter
+     2. OpenAI key (Best Function Calling)
+     3. Gemini key (Best Context/Cost)
+     4. HuggingFace (Free Fallback)
 
      Args:
+         provider: Force specific provider ("openai", "gemini", "huggingface")
          api_key: Override API key for the provider
 
      Returns:
+         Configured BaseChatClient instance (Neutral Namespace)
      """
+     # OpenAI (Standard)
      if provider == "openai" or (provider is None and settings.has_openai_key):
          return OpenAIChatClient(
              model_id=settings.openai_model,
              api_key=api_key or settings.openai_api_key,
          )
 
+     # Gemini (High Performance Alternative)
+     if provider == "gemini" or (provider is None and settings.has_gemini_key):
+         from src.clients.gemini import GeminiChatClient
+         return GeminiChatClient(
+             model_id="gemini-2.0-flash",
+             api_key=api_key or settings.gemini_api_key,
          )
 
+     # Free Fallback (HuggingFace)
      from src.clients.huggingface import HuggingFaceChatClient
      return HuggingFaceChatClient(
          model_id="meta-llama/Llama-3.1-70B-Instruct",
      )
  ```
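
The factory's auto-detection relies on a handful of `settings` fields. The real `src/utils/config.py` is not shown in this spec, so the sketch below is only illustrative: the attribute names are taken from the factory code above, while the env-var names and the default model are assumptions.

```python
# Illustrative sketch of the settings fields get_chat_client() expects.
import os
from dataclasses import dataclass, field


@dataclass
class Settings:
    openai_api_key: str | None = field(default_factory=lambda: os.getenv("OPENAI_API_KEY"))
    gemini_api_key: str | None = field(default_factory=lambda: os.getenv("GEMINI_API_KEY"))
    openai_model: str = field(default_factory=lambda: os.getenv("OPENAI_MODEL", "gpt-4o"))  # default is assumed

    @property
    def has_openai_key(self) -> bool:
        return bool(self.openai_api_key)

    @property
    def has_gemini_key(self) -> bool:
        return bool(self.gemini_api_key)


settings = Settings()
```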
 
  ### Changes to Advanced Orchestrator
 
  ```python
  # src/orchestrators/advanced.py
 
+ # BEFORE (hardcoded namespace):
  from agent_framework.openai import OpenAIChatClient
 
  class AdvancedOrchestrator:
      def __init__(self, ...):
          self._chat_client = OpenAIChatClient(...)
 
+ # AFTER (neutral namespace):
  from src.clients.factory import get_chat_client
 
  class AdvancedOrchestrator:
      def __init__(self, chat_client=None, provider=None, api_key=None, ...):
+         # The orchestrator no longer knows about OpenAI
          self._chat_client = chat_client or get_chat_client(
              provider=provider,
              api_key=api_key,
          )
  ```
 
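With the factory in place, the provider becomes a constructor argument rather than an import. A brief usage sketch (the keyword arguments follow the `__init__` signature above; exact call sites may differ):

```python
# Auto-detect: OpenAI key -> Gemini key -> free HuggingFace fallback
orchestrator = AdvancedOrchestrator()

# Force a provider at runtime
orchestrator = AdvancedOrchestrator(provider="huggingface")
orchestrator = AdvancedOrchestrator(provider="openai", api_key="sk-...")

# Or inject an already-built client (anything implementing BaseChatClient)
orchestrator = AdvancedOrchestrator(chat_client=get_chat_client(provider="gemini"))
```
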
  ## Migration Plan
 
+ ### Phase 1: Neutralize Namespace & Add HuggingFace
+ - [ ] Create `src/clients/` package.
+ - [ ] Implement `HuggingFaceChatClient` adapter.
+ - [ ] Implement `ChatClientFactory`.
+ - [ ] Refactor `AdvancedOrchestrator` to use `get_chat_client()`.
+ - [ ] Update strict typing to use `BaseChatClient` instead of `OpenAIChatClient`.
 
+ ### Phase 2: Simplify Provider Chain
+ - [ ] Remove `Anthropic` references (Issue #110).
+ - [ ] (Future) Implement `GeminiChatClient` to support Google's full stack (see the sketch below).
 
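A possible starting point for the future `src/clients/gemini.py`, sketched under explicit assumptions: it uses the `google-genai` SDK (`pip install google-genai`), reuses the `ChatResponse(text=..., model_id=...)` pattern from the HuggingFace adapter draft, and flattens the conversation naively. Role mapping, function calling, and streaming are left to the real implementation.

```python
# src/clients/gemini.py (sketch only; see assumptions above)
from typing import Any

from agent_framework import BaseChatClient, ChatMessage, ChatResponse
from google import genai


class GeminiChatClient(BaseChatClient):
    """Gemini adapter for the Microsoft Agent Framework (illustrative sketch)."""

    def __init__(self, model_id: str = "gemini-2.0-flash", api_key: str | None = None):
        self._model_id = model_id
        self._client = genai.Client(api_key=api_key)

    async def _inner_get_response(
        self, messages: list[ChatMessage], **kwargs: Any
    ) -> ChatResponse:
        # Naive conversion: flatten the chat history into plain text turns.
        prompt = "\n".join(f"{m.role.value}: {m.text}" for m in messages if m.text)
        result = await self._client.aio.models.generate_content(
            model=self._model_id,
            contents=prompt,
        )
        return ChatResponse(text=result.text, model_id=self._model_id)

    async def _inner_get_streaming_response(
        self, messages: list[ChatMessage], **kwargs: Any
    ):
        # Streaming is deliberately omitted from this sketch.
        raise NotImplementedError("Streaming is not implemented in this sketch")
```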
 
  ### Phase 3: Deprecate Simple Mode
+ - [ ] Update `src/orchestrators/factory.py` to return only the unified `AdvancedOrchestrator` (sketched below).
+ - [ ] Delete `src/orchestrators/simple.py`.
+ - [ ] Delete `src/tools/search_handler.py`.
 
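A minimal sketch of what the unified `src/orchestrators/factory.py` could look like once the mode split is gone (the function name and signature are illustrative, not settled API):

```python
# src/orchestrators/factory.py (sketch): no Simple/Advanced branch remains.
from src.orchestrators.advanced import AdvancedOrchestrator


def create_orchestrator(
    provider: str | None = None,
    api_key: str | None = None,
) -> AdvancedOrchestrator:
    # Every query goes through the unified orchestrator; provider selection
    # happens inside it via get_chat_client().
    return AdvancedOrchestrator(provider=provider, api_key=api_key)
```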
 
+ ## Why This is "Elegant"
 
+ 1. **One System**: We stop maintaining two parallel universes.
+ 2. **Dependency Injection**: The specific LLM provider is injected, not hardcoded.
+ 3. **Full Stack Alignment**: We prioritize providers (OpenAI, Gemini) that own the whole vertical (LLM + Embeddings), reducing environment complexity.
 
  ## References
 
  - Microsoft Agent Framework: `agent_framework.BaseChatClient`
+ - Gemini API: [Embeddings + LLM](https://ai.google.dev/gemini-api/docs/embeddings)
+ - HuggingFace Inference: `huggingface_hub.InferenceClient`