# πŸ—οΈ Architecture Overview ## System Architecture This Hugging Face Space implements a comparative agent system with three reasoning modes. Here's how everything works together: ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Gradio UI Layer β”‚ β”‚ - Question Input β”‚ β”‚ - Mode Selection (Think/Act/ReAct/All) β”‚ β”‚ - Three Output Panels (side-by-side comparison) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Agent Controller β”‚ β”‚ run_comparison() - Routes to appropriate mode handler β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β–Ό β–Ό β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Think-Only β”‚ β”‚ Act-Only β”‚ β”‚ ReAct β”‚ β”‚ Mode β”‚ β”‚ Mode β”‚ β”‚ Mode β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β–Ό β–Ό β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ LLM Interface β”‚ β”‚ call_llm() - Communicates with openai/gpt-oss-20b 
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚
                                β–Ό  (Act-Only & ReAct modes only)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        Tool Executor                         β”‚
β”‚  - parse_action()                                            β”‚
β”‚  - call_tool()                                               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚
       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β–Ό                β–Ό               β–Ό           β–Ό           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ DuckDuckGo β”‚   β”‚ Wikipedia  β”‚   β”‚ Weather β”‚   β”‚ Calc β”‚   β”‚ Python β”‚
β”‚   Search   β”‚   β”‚   Search   β”‚   β”‚   API   β”‚   β”‚      β”‚   β”‚  REPL  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## Component Details

### 1. **Tool Layer**

Each tool is wrapped in a `Tool` class with:

- **name**: Identifier for the LLM to reference
- **description**: Instructions for when/how to use the tool
- **func**: The actual implementation

**Tool Implementations:**

- `duckduckgo_search()`: Uses DuckDuckGo's JSON API
- `wikipedia_search()`: Uses the Wikipedia Python library
- `get_weather()`: Queries the wttr.in API for weather data
- `calculate()`: Safe AST-based math expression evaluator
- `python_repl()`: Sandboxed Python execution with whitelisted builtins
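As an illustration, here is a minimal sketch of the `Tool` wrapper combined with the AST-based calculator. The field names, the `_OPS` whitelist, and the helper names are assumptions based on the descriptions above; the real implementations in the Space may differ:

```python
import ast
import operator
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str         # identifier the LLM uses in "Action:" lines
    description: str  # when/how to use the tool (injected into prompts)
    func: Callable[[str], str]

# Whitelist of allowed operators: the expression is parsed with ast,
# never passed to eval(), so arbitrary code cannot run.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def _eval_node(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"Unsupported expression: {ast.dump(node)}")

def calculate(expression: str) -> str:
    """Safely evaluate a math expression, e.g. calculate('2 * (3 + 4)')."""
    return str(_eval_node(ast.parse(expression, mode="eval").body))

TOOLS = [Tool("calculate", "Evaluate a math expression, e.g. '2 * (3 + 4)'", calculate)]
```

Any node type outside the whitelist (names, calls, attribute access) raises, which is what makes this safe compared to `eval()`.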
### 2. **Agent Modes**

#### Think-Only Mode (`think_only_mode`)

```
User Question β†’ System Prompt β†’ LLM β†’ Thoughts β†’ Answer
```

- Single LLM call with CoT prompt
- No tool access
- Shows reasoning steps
- Best for knowledge-based questions

#### Act-Only Mode (`act_only_mode`)

```
User Question β†’ System Prompt β†’ LLM β†’ Action
                                        ↓
                         Execute Tool β†’ Observation
                                        ↓
                                LLM β†’ Action/Answer
                                        ↓
                                       ...
```

- Iterative loop: Action β†’ Observation
- No explicit "Thought" step
- Maximum 5 iterations
- Best for information gathering

#### ReAct Mode (`react_mode`)

```
User Question β†’ System Prompt β†’ LLM β†’ Thought β†’ Action
                                                  ↓
                                   Execute Tool β†’ Observation
                                                  ↓
                                  LLM β†’ Thought β†’ Action/Answer
                                                  ↓
                                                 ...
```

- Full Thought-Action-Observation cycle
- Most comprehensive reasoning
- Maximum 5 iterations
- Best for complex multi-step problems

### 3. **LLM Interface**

**`call_llm()` Function:**

- Uses the Hugging Face Inference API
- Model: openai/gpt-oss-20b
- Supports chat format (messages list)
- Configurable temperature and max_tokens

**Authentication:**

- Requires the `HF_TOKEN` environment variable
- Set in Space secrets (secure)

### 4. **Parsing & Control Flow**

**`parse_action()` Function:**

- Extracts `Action:` and `Action Input:` from the LLM response
- Uses regex to handle various formats
- Returns an (action_name, action_input) tuple

**Iteration Control:**

- Max 5 iterations per mode to prevent infinite loops
- Early termination when "Answer:" is detected
- Error handling for malformed responses

### 5. **UI Layer (Gradio)**

**Components:**

- **Input Section**: Question textbox + mode dropdown
- **Example Buttons**: Pre-filled question templates
- **Output Panels**: Three side-by-side Markdown displays
- **Streaming**: Generator functions for real-time updates

**User Flow:**

1. User enters a question or clicks an example
2. Selects a mode (or "All" for comparison)
3. Clicks "Run"
4. Sees real-time updates in the output panel(s)
5. Views the final answer and complete reasoning trace
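The `parse_action()` behavior described under Parsing & Control Flow can be sketched with a single regex. This is a minimal sketch: the exact pattern in the app may differ, and the `Action:` / `Action Input:` format is taken from the prompt examples in this document:

```python
import re
from typing import Optional, Tuple

def parse_action(response: str) -> Optional[Tuple[str, str]]:
    """Extract (action_name, action_input) from an LLM response, or None."""
    # Tolerate extra whitespace around the markers; take the first match only.
    match = re.search(
        r"Action:\s*(?P<name>[\w-]+)\s*[\r\n]+\s*Action Input:\s*(?P<input>.+)",
        response,
    )
    if not match:
        return None
    return match.group("name").strip(), match.group("input").strip().strip('"')
```

A response containing `Answer:` but no `Action:` block returns `None`, which is what lets the loop detect early termination.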
## Data Flow Example

### Example: "What's the weather in Paris?"

**Mode: ReAct**

1. User submits the question
2. `react_mode()` is called with the question
3. The prompt is formatted with the question + tool descriptions
4. First LLM call:
   ```
   Thought: I need to check the current weather in Paris
   Action: get_weather
   Action Input: Paris
   ```
5. `parse_action()` extracts the tool call
6. `call_tool("get_weather", "Paris")` executes
7. Observation: "Weather in Paris: Cloudy, 15Β°C..."
8. Second LLM call with the observation
9. The LLM responds:
   ```
   Thought: I have the weather information
   Answer: The current weather in Paris is...
   ```
10. The generator yields formatted output to the UI
11. User sees the complete trace in the ReAct panel

## Key Design Patterns

### 1. **Generator Pattern for Streaming**

```python
from typing import Generator

def mode(question: str) -> Generator[str, None, None]:
    yield "Step 1..."
    # process
    yield "Step 2..."
    # etc.
```

Enables real-time UI updates without blocking.

### 2. **Tool Registry Pattern**

```python
TOOLS = [Tool(name, description, func), ...]
```

Easy to add new tools - just append to the list.

### 3. **Prompt Templates**

```python
PROMPT = """...""".format(question=q, tools=t)
```

Modular prompts for each mode.

### 4. **Safe Execution**

- AST parsing for the calculator (no `eval()`)
- Whitelisted builtins for the Python REPL
- Timeout limits on API calls
- Error handling with fallback messages

## Extensibility

### Adding a New Tool

```python
def my_tool(query: str) -> str:
    # Implementation
    return result

TOOLS.append(Tool(
    name="my_tool",
    description="When to use this tool...",
    func=my_tool
))
```

### Adding a New Mode

```python
def hybrid_mode(question: str) -> Generator[str, None, None]:
    # Custom logic mixing elements
    yield "Starting hybrid mode..."
    # ...
    # Add to run_comparison() and UI dropdown
```

### Customizing Prompts

Edit the `*_PROMPT` constants to change agent behavior:

- Add constraints
- Change the format
- Provide examples
- Adjust tone

## Performance Considerations

1. **API Latency**: Model calls take 2-5 seconds
2. **Tool Latency**: External APIs add 1-2 seconds per call
3. **Iteration Count**: 5 iterations max = ~30 seconds worst case
4. **Parallel Modes**: "All" mode runs sequentially (not in parallel)

## Security Notes

1. **API Keys**: Never commit `HF_TOKEN` to the repo
2. **Python REPL**: Sandboxed with limited builtins
3. **User Input**: Sanitized before tool execution
4. **Rate Limits**: Consider adding rate limiting for production

## Testing Strategy

1. **Unit Tests**: Test individual tool functions
2. **Integration Tests**: Test mode handlers end-to-end
3. **Prompt Tests**: Verify LLM responses parse correctly
4. **UI Tests**: Test Gradio interface components

## Future Enhancements

- [ ] Add memory/conversation history
- [ ] Implement parallel tool calling
- [ ] Add a caching layer for repeated queries
- [ ] Support custom user tools
- [ ] Add performance metrics/timing
- [ ] Implement token counting/cost tracking
- [ ] Add export functionality for reasoning traces
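As a concrete starting point for the unit tests in the testing strategy above, the early-termination and iteration-cap rules from the Parsing & Control Flow section can be checked in isolation. `run_loop()` here is a hypothetical stand-in for the real mode handlers, fed canned LLM responses instead of live `call_llm()` output:

```python
MAX_ITERATIONS = 5

def run_loop(responses):
    """Skeleton of the Action/Observation loop: consume canned LLM
    responses, stop early on 'Answer:', give up after MAX_ITERATIONS."""
    trace = []
    for _step, response in zip(range(MAX_ITERATIONS), responses):
        trace.append(response)
        if "Answer:" in response:
            return trace
    trace.append("Max iterations reached without an answer.")
    return trace

# Early termination: stops at the second response.
assert run_loop(["Action: calc", "Answer: 42", "unused"]) == ["Action: calc", "Answer: 42"]
# Iteration cap: never consumes more than MAX_ITERATIONS responses.
assert len(run_loop(["Action: calc"] * 10)) == MAX_ITERATIONS + 1
```

Because the loop logic is separated from the LLM call, these assertions run instantly and without an `HF_TOKEN`.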