# MCP Architecture Documentation

## Overview

This document explains the Model Context Protocol (MCP) architecture used in the Competitive Analysis Agent system.

## What is the Model Context Protocol?

MCP is a standardized protocol for connecting three kinds of components:
- **AI Models** (Claude, GPT-4, etc.)
- **External Tools & Services** (web search, databases, APIs, etc.)
- **Custom Business Logic** (analysis, validation, report generation)

### Why MCP?

1. **Modularity**: Tools are isolated and reusable
2. **Scalability**: Add tools without modifying core agent code
3. **Standardization**: Common protocol across different AI systems
4. **Separation of Concerns**: Clear boundaries between reasoning and action
5. **Production Ready**: Built for enterprise-grade AI applications

---

## System Architecture

### Three-Tier Architecture

```
┌─────────────────────────────────────────────────────┐
│          PRESENTATION LAYER - Gradio UI             │
│  • User input (company name, API key)               │
│  • Report display (formatted Markdown)              │
│  • Error handling and validation                    │
└──────────────────┬──────────────────────────────────┘
                   │ HTTP/REST
                   ▼
┌─────────────────────────────────────────────────────┐
│         APPLICATION LAYER - MCP Client              │
│  • OpenAI Agent (GPT-4)                             │
│  • Strategic reasoning and planning                 │
│  • Tool orchestration and sequencing                │
│  • Result synthesis                                 │
└──────────────────┬──────────────────────────────────┘
                   │ MCP Protocol
                   ▼
┌─────────────────────────────────────────────────────┐
│         SERVICE LAYER - MCP Server (FastMCP)        │
│  Tools:                                             │
│  • validate_company()                               │
│  • identify_sector()                                │
│  • identify_competitors()                           │
│  • browse_page()                                    │
│  • generate_report()                                │
│                                                     │
│  External Services:                                 │
│  • DuckDuckGo API                                   │
│  • HTTP/BeautifulSoup scraping                      │
│  • OpenAI API (GPT-4)                               │
└─────────────────────────────────────────────────────┘
```

---

## Component Details

### 1. Presentation Layer (`app.py`)

**Gradio Interface**
- User-friendly web UI
- Input validation
- Output formatting
- Error messaging

```
Example flow:

User Input: "Tesla"
    ↓
Validate inputs
    ↓
Call MCP Client.analyze_company()
    ↓
Display Markdown report
```
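The "Validate inputs" step in the flow above could look roughly like the following. This is a minimal sketch, not the code in `app.py`: the function name `validate_inputs`, the 100-character limit, and the `sk-` prefix check are all assumptions made for illustration.

```python
from typing import Optional

MAX_NAME_LENGTH = 100  # assumed limit; the real limit is not documented


def validate_inputs(company_name: str, api_key: str) -> Optional[str]:
    """Return a user-friendly error message, or None when the inputs look fine."""
    name = company_name.strip()
    if not name:
        return "Please enter a company name."
    if len(name) > MAX_NAME_LENGTH:
        return f"Company name must be at most {MAX_NAME_LENGTH} characters."
    # Assumed format check: OpenAI keys conventionally start with "sk-"
    if not api_key.strip().startswith("sk-"):
        return "The API key should start with 'sk-'."
    return None
```

Returning a message instead of raising keeps the UI handler simple: it can show the string directly and skip the MCP client call.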

### 2. Application Layer (`mcp_client.py`)

**MCP Client with OpenAI Agent**

The client implements:
- **System Prompt**: Defines agent role and goals
- **Message History**: Maintains conversation context
- **Tool Calling**: Translates agent decisions to MCP calls
- **Response Synthesis**: Compiles results into reports

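```python
from openai import OpenAI

# api_key comes from the UI; tool_schemas is assumed to be a list of
# JSON function schemas describing the MCP tools (built elsewhere).
client = OpenAI(api_key=api_key)

system_prompt = """
You are a competitive analysis expert.
Use available tools to:
1. Validate the company
2. Identify sector
3. Find competitors
4. Gather strategic data
5. Generate insights
"""

# Agent workflow:
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Analyze Sony"}
]

response = client.chat.completions.create(
    model="gpt-4",
    messages=messages,
    tools=tool_schemas,  # without this, the model cannot request tool calls
)
# Requested tool calls arrive in response.choices[0].message.tool_calls;
# the client executes each one against the MCP server.
```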

**Key Features**:
- Graceful fallback to simple analysis when MCP unavailable
- Handles API errors and timeouts
- Synthesizes multiple tool results

### 3. Service Layer (`mcp_server.py`)

**FastMCP Server with Tools**

#### Tools Overview

| Tool | Purpose | Returns |
|------|---------|---------|
| `validate_company(name)` | Check if company exists | Verdict + evidence (string) |
| `identify_sector(name)` | Find industry classification | Sector name |
| `identify_competitors(sector, company)` | Discover top 3 rivals | "Comp1, Comp2, Comp3" |
| `browse_page(url, instructions)` | Extract webpage content | Relevant text |
| `generate_report(company, context)` | Create analysis report | Markdown report |

#### Tool Implementation Pattern

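```python
@mcp.tool()
def validate_company(company_name: str) -> str:
    """
    Docstring: Describes tool purpose and parameters
    """
    try:
        results = web_search_tool(f"{company_name} company")
        evidence_count = analyze_search_results(results)
        # Two or more evidence signals count as a valid company.
        # The exact wording of the result strings is illustrative.
        if evidence_count >= 2:
            return f"VALID: {company_name} ({evidence_count} evidence signals)"
        return f"NOT VALID: {company_name} ({evidence_count} evidence signals)"
    except Exception as e:
        # Tools report failures as strings so the agent can react to them
        return f"Error: {e}"
```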

#### Web Search Integration

```python
from duckduckgo_search import DDGS

def web_search_tool(query: str) -> str:
    """Unified search interface for all tools"""
    with DDGS() as ddgs:
        results = list(ddgs.text(query, max_results=5))
    return format_results(results)
```
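`format_results` is not shown in this document. A plausible sketch follows, assuming each search result is a dict with `title`, `href`, and `body` keys (the shape `DDGS.text` is understood to return); missing keys fall back to empty strings, so the helper does not break if that assumption is wrong.

```python
from typing import Dict, List


def format_results(results: List[Dict[str, str]]) -> str:
    """Turn a list of search hits into a numbered plain-text block."""
    lines = []
    for index, item in enumerate(results, start=1):
        title = item.get("title", "")
        url = item.get("href", "")
        snippet = item.get("body", "")
        lines.append(f"{index}. {title}\n   {url}\n   {snippet}")
    return "\n".join(lines) if lines else "No results found."
```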

---

## Message Flow

### Complete Analysis Request

```
1. USER INTERFACE (Gradio)
   │
   ├─ Company: "Apple"
   └─ OpenAI Key: "sk-..."
   
2. GRADIO → MCP CLIENT
   │
   ├─ analyze_competitor_landscape("Apple", api_key)
   │
   └─ Creates CompetitiveAnalysisAgent instance
   
3. MCP CLIENT → OPENAI
   │
   ├─ System: "You are a competitive analyst..."
   ├─ User: "Analyze Apple's competitors"
   │
   ├─ OpenAI responds with:
   │  └─ "Call validate_company('Apple')"
   
4. MCP CLIENT → MCP SERVER
   │
   ├─ Calls: validate_company("Apple")
   ├─ Calls: identify_sector("Apple")
   ├─ Calls: identify_competitors("Technology", "Apple")
   │
   └─ Receives results for each tool
   
5. MCP SERVER
   │
   ├─ validate_company()
   │  └─ Web search → DuckDuckGo API → Parse results
   │
   ├─ identify_sector()
   │  └─ Multi-stage search → Keyword analysis → Return sector
   │
   ├─ identify_competitors()
   │  └─ Industry search → Competitor extraction → Ranking
   │
   └─ generate_report()
      └─ Format results → Markdown template → Return report
   
6. MCP CLIENT SYNTHESIS
   │
   ├─ Compile all tool results
   ├─ Add OpenAI insights
   └─ Return complete report
   
7. GRADIO DISPLAY
   │
   └─ Render Markdown report to user
```

---

## Data Flow Diagram

```
USER INPUT
    │
    ├─ company_name: "Company X"
    └─ api_key: "sk-xxx"
    │
    ▼
┌──────────────────────┐
│   Input Validation   │
│  (Length, Format)    │
└──────────┬───────────┘
           │
           ▼
┌──────────────────────────────┐
│   OpenAI Agent Planning      │
│  (System + User Messages)    │
└──────────┬───────────────────┘
           │
           ├─────────────────────────────┬─────────────────────────┬──────────────┐
           │                             │                         │              │
           ▼                             ▼                         ▼              ▼
    ┌────────────────┐         ┌──────────────────┐      ┌──────────────┐   ┌─────────┐
    │ validate_      │         │ identify_        │      │ identify_    │   │ browse_ │
    │ company()      │         │ sector()         │      │ competitors()│   │ page()  │
    └────────┬───────┘         └────────┬─────────┘      └──────┬───────┘   └────┬────┘
             │                         │                       │                  │
             ▼                         ▼                       ▼                  ▼
       ┌──────────────┐         ┌─────────────┐         ┌────────────┐     ┌──────────┐
       │ Web Search   │         │ Web Search  │         │ Web Search │     │ HTTP Get │
       │ DuckDuckGo   │         │ Multi-stage │         │ Industry   │     │ Parse    │
       │ + Analysis   │         │             │         │ Leaders    │     │ HTML     │
       └──────┬───────┘         └──────┬──────┘         └─────┬──────┘     └────┬─────┘
              │                       │                      │                  │
              ▼                       ▼                      ▼                  ▼
         VALIDATION  →  SECTOR ID  →  COMPETITORS  →  ADDITIONAL DATA
              │              │              │                  │
              └──────────────┴──────────────┴──────────────────┘
                             │
                             ▼
                    ┌─────────────────────┐
                    │  generate_report()  │
                    │  (Compile results)  │
                    └────────┬────────────┘
                             │
                             ▼
                    ┌─────────────────────┐
                    │  OpenAI Final       │
                    │  Synthesis          │
                    └────────┬────────────┘
                             │
                             ▼
                     FINAL REPORT
                    (Markdown format)
```

---

## Tool Implementation Details

### Tool 1: `validate_company()`

```python
# Multi-stage validation
search_results = web_search_tool("Tesla company business official site")

# Evidence signals:
#   ✓ Official website found (.com/.io)
#   ✓ "Official site" or "official website" mention
#   ✓ Company + sector description
#   ✓ Business terminology present
#   ✓ Wikipedia/news mentions

# Result: Evidence count >= 2 → Valid company
```
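The evidence-counting rule can be sketched as below. This is an assumption-laden illustration, not the server's implementation: `count_evidence` and `is_valid_company` are hypothetical names, the keyword lists are made up, and only a subset of the signals listed above is checked.

```python
import re


def count_evidence(company_name: str, search_text: str) -> int:
    """Count how many evidence signals appear in the search text."""
    text = search_text.lower()
    name = company_name.lower()
    signals = [
        # Official-looking domain that contains the company name
        re.search(rf"{re.escape(name)}\.(com|io)\b", text) is not None,
        # Explicit "official site" wording
        "official site" in text or "official website" in text,
        # Business terminology (illustrative keyword list)
        any(word in text for word in ("company", "corporation", "inc.", "founded")),
        # Encyclopedia mention
        "wikipedia" in text,
    ]
    return sum(signals)


def is_valid_company(company_name: str, search_text: str) -> bool:
    """Apply the threshold: two or more signals means a valid company."""
    return count_evidence(company_name, search_text) >= 2
```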

### Tool 2: `identify_sector()`

```python
# Three search strategies:
#   1. "What does Tesla do?" → Extract sector keywords
#   2. "Tesla industry type" → Direct classification
#   3. "Tesla sector news" → Financial/news sources

# Sector patterns:
{
  "Technology": ["software", "hardware", "cloud", "ai", ...],
  "Finance": ["banking", "fintech", "insurance", ...],
  "Manufacturing": ["automotive", "industrial", ...],
  # ...
}

# Weighted voting to determine primary sector
```
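One way to read "weighted voting" is that each search strategy contributes keyword hits multiplied by a weight. The sketch below makes that reading concrete. The function name `vote_sector`, the weights, and the abridged keyword lists are assumptions; the document does not say how the strategies are actually weighted.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Abridged keyword lists taken from the patterns above
SECTOR_PATTERNS: Dict[str, List[str]] = {
    "Technology": ["software", "hardware", "cloud", "ai"],
    "Finance": ["banking", "fintech", "insurance"],
    "Manufacturing": ["automotive", "industrial"],
}


def vote_sector(snippets_with_weights: Iterable[Tuple[str, float]]) -> str:
    """Pick the sector with the highest weighted keyword count."""
    scores: Counter = Counter()
    for text, weight in snippets_with_weights:
        # Tokenize so short keywords like "ai" only match whole words
        words = re.findall(r"[a-z]+", text.lower())
        for sector, keywords in SECTOR_PATTERNS.items():
            hits = sum(words.count(keyword) for keyword in keywords)
            scores[sector] += weight * hits
    if not scores or max(scores.values()) == 0:
        return "Unknown"
    return scores.most_common(1)[0][0]
```

Matching on whole tokens matters here: a plain substring test would count "ai" inside unrelated words and skew the vote toward Technology.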

### Tool 3: `identify_competitors()`

```python
# Search strategy:
#   1. "Top technology companies" → Market leaders
#   2. "Tesla competitors" → Direct rivals
#   3. "EV industry leaders" → Sector players

# Extraction methods:
#   - Pattern matching for company names
#   - List parsing (comma-separated, bulleted)
#   - Frequency analysis and ranking

# Returns: Top 3 ranked competitors
```
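List parsing followed by frequency ranking might be sketched as follows. `rank_competitors` is a hypothetical helper, and its name heuristic (a capitalized candidate of at most three words) is a deliberate simplification chosen for this example rather than the server's extraction logic.

```python
import re
from collections import Counter
from typing import Iterable, List


def rank_competitors(snippets: Iterable[str], company: str, top_n: int = 3) -> List[str]:
    """Count candidate names across snippets and return the most frequent."""
    counts: Counter = Counter()
    for snippet in snippets:
        # Split comma-separated, semicolon-separated, or bulleted lists
        for part in re.split(r"[,\n•;]| and ", snippet):
            name = part.strip(" -*.\t")
            looks_like_name = bool(name) and name[0].isupper() and len(name.split()) <= 3
            # Skip the company under analysis itself
            if looks_like_name and name.lower() != company.lower():
                counts[name] += 1
    return [name for name, _ in counts.most_common(top_n)]
```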

### Tool 4: `browse_page()`

```python
# Content extraction workflow:
#   requests.get(url)
#     → BeautifulSoup parsing
#     → Remove scripts/styles/headers/footers
#     → Extract main content divs/articles/paragraphs
#     → Keyword matching against instructions
#     → Return top N relevant sentences

# Safety: Timeout=10s, max_content=5000 chars
```
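The last two steps of the workflow, keyword matching and returning the top N sentences, can be illustrated without any network access. The sketch assumes the page text has already been fetched and stripped of markup; `top_relevant_sentences` and its scoring rule (number of shared words of three or more letters) are assumptions made for this example.

```python
import re
from typing import List

MAX_CONTENT_CHARS = 5000  # mirrors the max_content limit noted above


def top_relevant_sentences(text: str, instructions: str, top_n: int = 3) -> List[str]:
    """Return the sentences that share the most keywords with the instructions."""
    text = text[:MAX_CONTENT_CHARS]
    keywords = set(re.findall(r"[a-z]{3,}", instructions.lower()))
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for position, sentence in enumerate(sentences):
        words = set(re.findall(r"[a-z]{3,}", sentence.lower()))
        score = len(words & keywords)
        if score > 0:
            # Negative score sorts best matches first; position breaks ties
            scored.append((-score, position, sentence))
    scored.sort()
    return [sentence for _, _, sentence in scored[:top_n]]
```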

### Tool 5: `generate_report()`

```python
# Template-based report generation
report = f"""
# Competitive Analysis Report: {company_name}

## Executive Summary
[Synthesized findings]

## Competitor Comparison
| Competitor | Strategy | Pricing | Products | Market |
|------------|----------|---------|----------|--------|
| [extracted competitors] | - | - | - | - |

## Strategic Insights
[Recommendations]
"""
```
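The competitor table in the template can be filled programmatically. The helper below is a hypothetical sketch (`render_competitor_table` is not a name from the codebase); it uses the column names from the template and writes `-` for any field that was not extracted.

```python
from typing import Dict, List

COLUMNS = ["Competitor", "Strategy", "Pricing", "Products", "Market"]


def render_competitor_table(rows: List[Dict[str, str]]) -> str:
    """Render competitor rows as a Markdown table with '-' for missing cells."""
    lines = [
        "| " + " | ".join(COLUMNS) + " |",
        "|" + "|".join("---" for _ in COLUMNS) + "|",
    ]
    for row in rows:
        # Escape pipes so scraped text cannot break the table layout
        cells = [row.get(column, "-").replace("|", "\\|") for column in COLUMNS]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)
```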

---

## Error Handling Strategy

### Layered Error Handling

```
Layer 1: Input Validation (Gradio)
  └─ Check company name length
  └─ Validate API key format
  └─ Return user-friendly error

Layer 2: Tool Execution (MCP Server)
  └─ Try/except on each tool
  └─ Timeout protection (10s requests)
  └─ Graceful degradation
  └─ Log detailed errors

Layer 3: Agent Logic (MCP Client)
  └─ API timeout handling
  └─ Rate limit handling
  └─ Fallback to simple analysis
  └─ Return partial results

Layer 4: User Feedback (Gradio)
  └─ Display error with context
  └─ Suggest remediation
  └─ Allow retry
```
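Layers 2 and 3 can be sketched together: a decorator that converts tool exceptions into error strings, and a client-side fallback that triggers when it sees one. The names `safe_tool` and `analyze_with_fallback` are hypothetical, and the `"Error:"` prefix convention is borrowed from the tool pattern shown earlier in this document.

```python
import functools
import logging
from typing import Callable

logger = logging.getLogger("mcp_tools")


def safe_tool(func: Callable[..., str]) -> Callable[..., str]:
    """Layer 2: log the detailed error and return a string instead of raising."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> str:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            logger.exception("Tool %s failed", func.__name__)
            return f"Error: {exc}"
    return wrapper


def analyze_with_fallback(
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    company: str,
) -> str:
    """Layer 3: fall back to a simpler analysis when the primary path fails."""
    result = primary(company)
    if result.startswith("Error:"):
        return fallback(company)
    return result
```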

---

## Performance Optimization

### Caching Strategy
```python
# Web search results: cached for 5 minutes
# Sector identification: reused across tools
# Competitor list: reused in reports
```
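A five-minute cache for search results could be as small as the class below. It is a sketch under stated assumptions: `TTLCache` is a hypothetical name, the cache lives in process memory only, and the clock is injectable purely so that expiry can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get_or_compute(self, key: str, compute: Callable[[], str]) -> str:
        """Return a fresh cached value, or compute and store a new one."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = compute()
        self._entries[key] = (now, value)
        return value
```

Usage would wrap the search call, for example `cache.get_or_compute(query, lambda: web_search_tool(query))`.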

### Parallel Tool Execution
```python
# Future enhancement: Run independent tools in parallel
#   validate_company()      (parallel)
#   identify_sector()       (parallel)
#   identify_competitors()  (sequential, depends on sector)
```
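Since this is described as a future enhancement, the following is only one possible shape for it, using the standard library's thread pool. The three tool functions are stubs standing in for the real MCP tools, and `run_analysis` is a hypothetical name.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict


# Stubs standing in for the real, network-bound tools
def validate_company(company: str) -> str:
    return f"VALID: {company}"


def identify_sector(company: str) -> str:
    return "Technology"


def identify_competitors(sector: str, company: str) -> str:
    return "Comp1, Comp2, Comp3"


def run_analysis(company: str) -> Dict[str, str]:
    """Run the two independent tools concurrently, then the dependent one."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        validation_future = pool.submit(validate_company, company)
        sector_future = pool.submit(identify_sector, company)
        validation = validation_future.result()
        sector = sector_future.result()
    # identify_competitors needs the sector, so it runs after both finish
    competitors = identify_competitors(sector, company)
    return {"validation": validation, "sector": sector, "competitors": competitors}
```

Threads fit here because the tools spend most of their time waiting on HTTP responses, though parallel searches would also need to respect the search rate limit.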

### Rate Limiting
```python
# DuckDuckGo: 2.0 second delays between searches
# OpenAI: Batched requests, monitoring quota
# HTTP: 10-second timeout, connection pooling
```
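The 2.0-second spacing between searches can be enforced with a minimum-interval limiter. This is a sketch, not the project's code: `MinIntervalLimiter` is a hypothetical name, and the clock and sleep functions are injectable only so the behavior can be checked without real delays.

```python
import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Ensure at least min_interval seconds pass between consecutive calls."""

    def __init__(self, min_interval: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep):
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None

    def wait(self) -> float:
        """Block until the next call is allowed; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._last_call is not None:
            delay = max(0.0, self._min_interval - (now - self._last_call))
        if delay > 0:
            self._sleep(delay)
        # Record when the call actually proceeds
        self._last_call = now + delay
        return delay
```

Calling `limiter.wait()` immediately before each search keeps the spacing rule in one place instead of scattering `time.sleep` calls across tools.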

---

## Security Considerations

### API Key Handling
```python
# Keys accepted via:
#   ✓ UI input field (held temporarily in memory)
#   ✗ NOT stored in files
#   ✗ NOT logged in output
#   ✗ NOT persisted in a database
#
# Optional: load the key from a .env file via python-dotenv
```
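These rules could translate into two small helpers: one that prefers the UI value and falls back to the environment, and one that masks a key before it appears in any message. Both function names are hypothetical. `OPENAI_API_KEY` is the environment variable the OpenAI SDK conventionally reads, and it is assumed here that a `.env` loader would populate it.

```python
import os
from typing import Optional


def resolve_api_key(ui_value: Optional[str]) -> str:
    """Prefer the key typed into the UI; fall back to the environment."""
    key = (ui_value or "").strip() or os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise ValueError("No OpenAI API key provided")
    return key


def mask_key(key: str) -> str:
    """Return a redacted form that is safe to show in logs or error text."""
    if len(key) <= 8:
        return "***"
    return f"{key[:3]}...{key[-4:]}"
```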

### Data Privacy
```python
# Web search results: Temporary, discarded after analysis
# Company data: Not stored beyond the short-lived in-memory cache
# User queries: Not logged or tracked
# Report generation: All local processing
```

### Web Scraping Safety
```python
# User-Agent header sent with every request
# robots.txt should be checked explicitly before fetching a page
#   (BeautifulSoup only parses HTML and does not check it)
# Timeout protection (10 seconds)
# Error handling for blocked requests
```
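A robots.txt check can be done with the standard library before `browse_page()` fetches a URL. The sketch below parses rules that have already been downloaded, so it runs offline; `is_allowed` and the `analysis-bot` user agent in the usage line are hypothetical, and whether the project performs such a check is not confirmed by this document.

```python
from typing import Iterable
from urllib.robotparser import RobotFileParser


def is_allowed(robots_lines: Iterable[str], user_agent: str, url: str) -> bool:
    """Return True when the given robots.txt rules permit fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)
```

For example, `is_allowed(robots_text.splitlines(), "analysis-bot", url)` would gate the HTTP request, where `robots_text` is the body of the site's `/robots.txt`.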

---

## Extension Points

### Adding New Tools

```python
@mcp.tool()
def custom_tool(param1: str, param2: int) -> str:
    """
    Your custom tool description.
    Args:
        param1: Parameter 1 description
        param2: Parameter 2 description
    Returns:
        str: Result description
    """
    try:
        # Implementation
        result = some_operation(param1, param2)
        return result
    except Exception as e:
        return f"Error: {str(e)}"
```

### Modifying Agent Behavior

```python
# In mcp_client.py, edit system_prompt:
system_prompt = """
Updated instructions for agent behavior
"""

# Or add initial human message:
messages.append({
    "role": "user",
    "content": "Additional analysis request..."
})
```

### Customizing Report Generation

```python
# In mcp_server.py, edit generate_report() template:
report = f"""
# Custom Report Format

Your custom structure here...
"""
```

---

## Testing

### Manual Testing

```bash
# Test MCP Server
python mcp_server.py

# Test MCP Client functions
python -c "from mcp_client import analyze_competitor_landscape; print(analyze_competitor_landscape('Microsoft', 'sk-...'))"

# Test Gradio UI
python app.py
# Navigate to http://localhost:7860
```

### Validation Tests

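```python
# These checks call live web search, so results can vary between runs
from mcp_server import validate_company, identify_sector, identify_competitors

# Test validate_company()
assert "VALID" in validate_company("Google")
assert "NOT" in validate_company("FakeCompanyXYZ123")

# Test identify_sector()
assert "Technology" in identify_sector("Microsoft")
assert "Finance" in identify_sector("JPMorgan")

# Test competitor discovery (returns a comma-separated string of names)
competitors = identify_competitors("Technology", "Google")
assert len(competitors.split(",")) <= 3
```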

---

## Future Enhancements

1. **Real-time Market Data**: Integrate financial APIs (Alpha Vantage, etc.)
2. **Sentiment Analysis**: Analyze news sentiment about companies
3. **Patent Analysis**: Include R&D insights from patents
4. **Social Media**: Monitor competitor social media activity
5. **Pricing Intelligence**: Track price changes over time
6. **SWOT Matrix**: Generate structured SWOT analysis
7. **Visualization**: Create charts and graphs
8. **PDF Export**: Generate PDF reports
9. **Multi-company Batch**: Analyze multiple companies
10. **Integration APIs**: Connect to Slack, Salesforce, etc.

---

## Conclusion

The MCP architecture provides:
- ✅ Modularity and extensibility
- ✅ Clear separation of concerns
- ✅ Robust error handling
- ✅ Scalability for future enhancements
- ✅ Production-ready design
- ✅ Easy tool management

This design enables rapid development, maintenance, and deployment of AI-powered competitive analysis systems.

---

**Document Version**: 1.0  
**Last Updated**: March 2026