# πŸ—οΈ Architecture Overview ## System Architecture This Hugging Face Space implements a comparative agent system with three reasoning modes. Here's how everything works together: ``` β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Gradio UI Layer β”‚ β”‚ - Question Input β”‚ β”‚ - Mode Selection (Think/Act/ReAct/All) β”‚ β”‚ - Three Output Panels (side-by-side comparison) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Agent Controller β”‚ β”‚ run_comparison() - Routes to appropriate mode handler β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β–Ό β–Ό β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Think-Only β”‚ β”‚ Act-Only β”‚ β”‚ ReAct β”‚ β”‚ Mode β”‚ β”‚ Mode β”‚ β”‚ Mode β”‚ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚ β”‚ β–Ό β–Ό β–Ό β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ LLM Interface β”‚ β”‚ call_llm() - Communicates with openai/gpt-oss-20b 
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚
                                β–Ό  (Act-Only & ReAct modes only)
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                        Tool Executor                         β”‚
β”‚  - parse_action()                                            β”‚
β”‚  - call_tool()                                               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                β”‚
       β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
       β–Ό                β–Ό               β–Ό           β–Ό           β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ DuckDuckGo β”‚   β”‚ Wikipedia  β”‚   β”‚ Weather β”‚   β”‚ Calc β”‚   β”‚ Python β”‚
β”‚   Search   β”‚   β”‚   Search   β”‚   β”‚   API   β”‚   β”‚      β”‚   β”‚  REPL  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```

## Component Details

### 1. **Tool Layer**

Each tool is wrapped in a `Tool` class with:

- **name**: Identifier for the LLM to reference
- **description**: Instructions for when/how to use the tool
- **func**: The actual implementation

**Tool Implementations:**

- `duckduckgo_search()`: Uses DuckDuckGo's JSON API
- `wikipedia_search()`: Uses the Wikipedia Python library
- `get_weather()`: Queries the wttr.in API for weather data
- `calculate()`: Safe AST-based math expression evaluator
- `python_repl()`: Sandboxed Python execution with whitelisted builtins
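As an illustration, here is a minimal sketch of the `Tool` wrapper combined with the AST-based calculator. The field names, the `_OPS` whitelist, and the helper names are assumptions based on the descriptions above; the real implementations in the Space may differ:

```python
import ast
import operator
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str         # identifier the LLM uses in "Action:" lines
    description: str  # when/how to use the tool (injected into prompts)
    func: Callable[[str], str]

# Whitelist of allowed operators: the expression is parsed with ast,
# never passed to eval(), so arbitrary code cannot run.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def _eval_node(node):
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"Unsupported expression: {ast.dump(node)}")

def calculate(expression: str) -> str:
    """Safely evaluate a math expression, e.g. calculate('2 * (3 + 4)')."""
    return str(_eval_node(ast.parse(expression, mode="eval").body))

TOOLS = [Tool("calculate", "Evaluate a math expression, e.g. '2 * (3 + 4)'", calculate)]
```

Any node type outside the whitelist (names, calls, attribute access) raises, which is what makes this safe compared to `eval()`.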
### 2. **Agent Modes**

#### Think-Only Mode (`think_only_mode`)

```
User Question β†’ System Prompt β†’ LLM β†’ Thoughts β†’ Answer
```

- Single LLM call with CoT prompt
- No tool access
- Shows reasoning steps
- Best for knowledge-based questions

#### Act-Only Mode (`act_only_mode`)

```
User Question β†’ System Prompt β†’ LLM β†’ Action
                                        ↓
                         Execute Tool β†’ Observation
                                        ↓
                                LLM β†’ Action/Answer
                                        ↓
                                       ...
```

- Iterative loop: Action β†’ Observation
- No explicit "Thought" step
- Maximum 5 iterations
- Best for information gathering

#### ReAct Mode (`react_mode`)

```
User Question β†’ System Prompt β†’ LLM β†’ Thought β†’ Action
                                                  ↓
                                   Execute Tool β†’ Observation
                                                  ↓
                                  LLM β†’ Thought β†’ Action/Answer
                                                  ↓
                                                 ...
```

- Full Thought-Action-Observation cycle
- Most comprehensive reasoning
- Maximum 5 iterations
- Best for complex multi-step problems

### 3. **LLM Interface**

**`call_llm()` Function:**

- Uses the Hugging Face Inference API
- Model: openai/gpt-oss-20b
- Supports chat format (messages list)
- Configurable temperature and max_tokens

**Authentication:**

- Requires the `HF_TOKEN` environment variable
- Set in Space secrets (secure)

### 4. **Parsing & Control Flow**

**`parse_action()` Function:**

- Extracts `Action:` and `Action Input:` from the LLM response
- Uses regex to handle various formats
- Returns an (action_name, action_input) tuple

**Iteration Control:**

- Max 5 iterations per mode to prevent infinite loops
- Early termination when "Answer:" is detected
- Error handling for malformed responses

### 5. **UI Layer (Gradio)**

**Components:**

- **Input Section**: Question textbox + mode dropdown
- **Example Buttons**: Pre-filled question templates
- **Output Panels**: Three side-by-side Markdown displays
- **Streaming**: Generator functions for real-time updates

**User Flow:**

1. User enters a question or clicks an example
2. Selects a mode (or "All" for comparison)
3. Clicks "Run"
4. Sees real-time updates in the output panel(s)
5. Views the final answer and complete reasoning trace
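The `parse_action()` behavior described under Parsing & Control Flow can be sketched with a single regex. This is a minimal sketch: the exact pattern in the app may differ, and the `Action:` / `Action Input:` format is taken from the prompt examples in this document:

```python
import re
from typing import Optional, Tuple

def parse_action(response: str) -> Optional[Tuple[str, str]]:
    """Extract (action_name, action_input) from an LLM response, or None."""
    # Tolerate extra whitespace around the markers; take the first match only.
    match = re.search(
        r"Action:\s*(?P<name>[\w-]+)\s*[\r\n]+\s*Action Input:\s*(?P<input>.+)",
        response,
    )
    if not match:
        return None
    return match.group("name").strip(), match.group("input").strip().strip('"')
```

A response containing `Answer:` but no `Action:` block returns `None`, which is what lets the loop detect early termination.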
## Data Flow Example

### Example: "What's the weather in Paris?"

**Mode: ReAct**

1. User submits the question
2. `react_mode()` is called with the question
3. The prompt is formatted with the question + tool descriptions
4. First LLM call:
   ```
   Thought: I need to check the current weather in Paris
   Action: get_weather
   Action Input: Paris
   ```
5. `parse_action()` extracts the tool call
6. `call_tool("get_weather", "Paris")` executes
7. Observation: "Weather in Paris: Cloudy, 15Β°C..."
8. Second LLM call with the observation
9. The LLM responds:
   ```
   Thought: I have the weather information
   Answer: The current weather in Paris is...
   ```
10. The generator yields formatted output to the UI
11. User sees the complete trace in the ReAct panel

## Key Design Patterns

### 1. **Generator Pattern for Streaming**

```python
from typing import Generator

def mode(question: str) -> Generator[str, None, None]:
    yield "Step 1..."
    # process
    yield "Step 2..."
    # etc.
```

Enables real-time UI updates without blocking.

### 2. **Tool Registry Pattern**

```python
TOOLS = [Tool(name, description, func), ...]
```

Easy to add new tools - just append to the list.

### 3. **Prompt Templates**

```python
PROMPT = """...""".format(question=q, tools=t)
```

Modular prompts for each mode.

### 4. **Safe Execution**

- AST parsing for the calculator (no `eval()`)
- Whitelisted builtins for the Python REPL
- Timeout limits on API calls
- Error handling with fallback messages

## Extensibility

### Adding a New Tool

```python
def my_tool(query: str) -> str:
    # Implementation
    return result

TOOLS.append(Tool(
    name="my_tool",
    description="When to use this tool...",
    func=my_tool
))
```

### Adding a New Mode

```python
def hybrid_mode(question: str) -> Generator[str, None, None]:
    # Custom logic mixing elements
    yield "Starting hybrid mode..."
    # ...
    # Add to run_comparison() and UI dropdown
```

### Customizing Prompts

Edit the `*_PROMPT` constants to change agent behavior:

- Add constraints
- Change the format
- Provide examples
- Adjust tone

## Performance Considerations

1. **API Latency**: Model calls take 2-5 seconds
2. **Tool Latency**: External APIs add 1-2 seconds per call
3. **Iteration Count**: 5 iterations max = ~30 seconds worst case
4. **Parallel Modes**: "All" mode runs sequentially (not in parallel)

## Security Notes

1. **API Keys**: Never commit `HF_TOKEN` to the repo
2. **Python REPL**: Sandboxed with limited builtins
3. **User Input**: Sanitized before tool execution
4. **Rate Limits**: Consider adding rate limiting for production

## Testing Strategy

1. **Unit Tests**: Test individual tool functions
2. **Integration Tests**: Test mode handlers end-to-end
3. **Prompt Tests**: Verify LLM responses parse correctly
4. **UI Tests**: Test Gradio interface components

## Future Enhancements

- [ ] Add memory/conversation history
- [ ] Implement parallel tool calling
- [ ] Add a caching layer for repeated queries
- [ ] Support custom user tools
- [ ] Add performance metrics/timing
- [ ] Implement token counting/cost tracking
- [ ] Add export functionality for reasoning traces
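As a concrete starting point for the unit tests in the testing strategy above, the early-termination and iteration-cap rules from the Parsing & Control Flow section can be checked in isolation. `run_loop()` here is a hypothetical stand-in for the real mode handlers, fed canned LLM responses instead of live `call_llm()` output:

```python
MAX_ITERATIONS = 5

def run_loop(responses):
    """Skeleton of the Action/Observation loop: consume canned LLM
    responses, stop early on 'Answer:', give up after MAX_ITERATIONS."""
    trace = []
    for _step, response in zip(range(MAX_ITERATIONS), responses):
        trace.append(response)
        if "Answer:" in response:
            return trace
    trace.append("Max iterations reached without an answer.")
    return trace

# Early termination: stops at the second response.
assert run_loop(["Action: calc", "Answer: 42", "unused"]) == ["Action: calc", "Answer: 42"]
# Iteration cap: never consumes more than MAX_ITERATIONS responses.
assert len(run_loop(["Action: calc"] * 10)) == MAX_ITERATIONS + 1
```

Because the loop logic is separated from the LLM call, these assertions run instantly and without an `HF_TOKEN`.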