# System Architecture

MissionControlMCP system design and architecture documentation.

---

## High-Level Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                        Client Layer                         │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐       │
│  │   Claude     │  │   Custom     │  │  Other MCP   │       │
│  │   Desktop    │  │   Client     │  │   Clients    │       │
│  └──────────────┘  └──────────────┘  └──────────────┘       │
└──────────────────────┬──────────────────────────────────────┘
                       │ MCP Protocol (stdio)
┌──────────────────────┴──────────────────────────────────────┐
│                      MCP Server Layer                       │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  mcp_server.py                                         │ │
│  │  • Tool Registration                                   │ │
│  │  • Request Routing                                     │ │
│  │  • Response Formatting                                 │ │
│  └────────────────────────────────────────────────────────┘ │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────┴──────────────────────────────────────┐
│                    Business Logic Layer                     │
│  ┌──────────┬──────────┬──────────┬──────────┐              │
│  │   PDF    │   Text   │   Web    │   RAG    │              │
│  │  Reader  │ Extract  │ Fetcher  │  Search  │              │
│  ├──────────┼──────────┼──────────┼──────────┤              │
│  │   Data   │   File   │  Email   │   KPI    │              │
│  │  Visual  │ Convert  │ Classify │ Generate │              │
│  └──────────┴──────────┴──────────┴──────────┘              │
└──────────────────────┬──────────────────────────────────────┘
                       │
┌──────────────────────┴──────────────────────────────────────┐
│                        Utility Layer                        │
│  ┌────────────────────────────────────────────────────────┐ │
│  │  • helpers.py   - Text processing utilities            │ │
│  │  • rag_utils.py - Vector search & FAISS                │ │
│  │  • schemas.py   - Pydantic models                      │ │
│  └────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────┘
```


---

## Component Architecture

### 1. MCP Server (`mcp_server.py`)

**Responsibilities:**

- Register all 8 tools with the MCP SDK
- Handle incoming tool requests
- Route requests to the appropriate tool functions
- Format and return responses
- Handle errors and log them

**Flow:**

```
Client Request → MCP Protocol → Server → Tool → Response → Client
```

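**Code Structure:**

```python
import json
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import TextContent, Tool
from tools import pdf_reader, text_extractor

server = Server("mission-control")

# Tool Registration (name, description, input_schema are placeholders)
@server.list_tools()
async def list_tools() -> list[Tool]:
    return [Tool(name=name, description=description, inputSchema=input_schema)]

# Request Handler
@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name == "pdf_reader":
        result = pdf_reader.read_pdf(**arguments)
    elif name == "text_extractor":
        result = text_extractor.extract_text(**arguments)
    # ... other tools
    return [TextContent(type="text", text=json.dumps(result))]

# Server Startup
async def main():
    async with stdio_server() as (read_stream, write_stream):
        await server.run(read_stream, write_stream, server.create_initialization_options())
```
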
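
As the number of tools grows, the `if`/`elif` routing chain can be replaced by a lookup table. The following is a minimal sketch of that idea, not the project's actual routing code: `TOOL_REGISTRY`, `dispatch`, and the two stub tool functions are hypothetical names introduced only for illustration.

```python
from typing import Any, Callable, Dict


# Hypothetical stand-ins for the real tool functions.
def read_pdf(file_path: str) -> Dict[str, Any]:
    return {"success": True, "data": f"text of {file_path}", "metadata": {}}


def extract_text(text: str) -> Dict[str, Any]:
    return {"success": True, "data": text.strip(), "metadata": {}}


# Registry mapping tool names to the callables that implement them.
TOOL_REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {
    "pdf_reader": read_pdf,
    "text_extractor": extract_text,
}


def dispatch(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Look up a tool by name and call it with the given arguments."""
    try:
        tool = TOOL_REGISTRY[name]
    except KeyError:
        raise ValueError(f"Unknown tool: {name}") from None
    return tool(**arguments)
```

With this layout, adding a tool means adding one registry entry; the routing function itself does not change.
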

---

### 2. Tool Layer (`tools/`)

Each tool is independent and follows this pattern:

**Tool Structure:**

```python
"""
Tool Name - Description
"""
import logging
from typing import Dict, Any

logger = logging.getLogger(__name__)


def tool_function(param: str) -> Dict[str, Any]:
    """
    Tool description.

    Args:
        param: Parameter description

    Returns:
        Standardized result dictionary
    """
    try:
        # Validation
        if not param:
            raise ValueError("Invalid input")

        # Processing (process_data is a placeholder for the tool's own logic)
        result = process_data(param)

        # Return standardized format
        return {
            "success": True,
            "data": result,
            "metadata": {}
        }
    except Exception as e:
        logger.error(f"Error: {e}")
        raise
```

**Tool Independence:**

- Each tool is self-contained
- No dependencies between tools
- Each tool can be tested individually
- Tools are easy to add or remove

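
To make the pattern concrete, here is a small sketch of a tool that follows the validate, process, return-standardized-dict shape described above. `word_count` is a hypothetical tool, not one of the eight shipped tools; it only illustrates the structure.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger(__name__)


def word_count(text: str) -> Dict[str, Any]:
    """Count the words in a text (hypothetical tool, for illustration only)."""
    try:
        # Validation
        if not isinstance(text, str) or not text.strip():
            raise ValueError("Invalid input: text must be a non-empty string")

        # Processing
        words = text.split()

        # Standardized result format
        return {
            "success": True,
            "data": {"word_count": len(words)},
            "metadata": {"characters": len(text)},
        }
    except Exception as e:
        logger.error("word_count failed: %s", e)
        raise
```

Because the function depends only on the standard library, it can be imported and called without starting the MCP server.
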

---

### 3. Utility Layer (`utils/`)

**helpers.py - Text Processing:**

- `clean_text()`: removes extra whitespace
- `extract_keywords()`: NLP keyword extraction
- `chunk_text()`: text splitting with overlap
- `validate_url()`: URL validation

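
Splitting text with overlap means consecutive chunks share some characters, so content near a boundary appears in both. The sketch below shows one way this could work; the signature and the `chunk_size` and `overlap` defaults are assumptions, and the real `chunk_text()` in `helpers.py` may differ.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into character chunks where neighbours share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=2)` yields `["abcd", "cdef", "efgh", "ghij"]`.
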
**rag_utils.py - Vector Search:**

- `SimpleRAGStore`: FAISS-based vector database
- `semantic_search()`: search over sentence-transformer embeddings
- `create_rag_store()`: initializes the vector store

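
The idea behind a vector store can be sketched without FAISS or a neural model. In the toy version below, a bag-of-words count stands in for the sentence-transformer embedding and a linear scan stands in for the FAISS index; `ToyVectorStore`, `embed`, and `cosine` are hypothetical names, and this is not how `SimpleRAGStore` is implemented.

```python
import math
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding vector)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory store that ranks documents by similarity to a query."""

    def __init__(self) -> None:
        self.documents: List[str] = []
        self.vectors: List[Counter] = []

    def add(self, docs: List[str]) -> None:
        for doc in docs:
            self.documents.append(doc)
            self.vectors.append(embed(doc))

    def search(self, query: str, top_k: int = 3) -> List[Tuple[str, float]]:
        q = embed(query)
        scored = [(doc, cosine(q, vec)) for doc, vec in zip(self.documents, self.vectors)]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

A real store would swap `embed` for model-produced vectors and the linear scan for an index lookup, while keeping the same add-then-search interface.
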
**Models (`models/schemas.py`):**

- Pydantic models for type validation
- Input/output schemas
- Data validation

---

## Data Flow

### Request Flow

```
1. Client sends MCP request
   ↓
2. mcp_server.py receives request
   ↓
3. Server validates input schema
   ↓
4. Server routes to tool function
   ↓
5. Tool processes data
   ↓
6. Tool returns result dict
   ↓
7. Server formats MCP response
   ↓
8. Client receives response
```

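
On the wire, steps 1 and 8 are JSON-RPC 2.0 messages: MCP tool invocations use the `tools/call` method, and results come back as a list of content items. The sketch below builds one such request and response by hand to show their shape; the helper names and the `report.pdf` path are hypothetical, and in practice the MCP SDK produces these messages.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize a tools/call request for the named tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def build_tool_result(request_id: int, result: dict) -> str:
    """Wrap a tool's result dict as a text content item in a JSON-RPC response."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "result": {
            "content": [{"type": "text", "text": json.dumps(result)}],
            "isError": False,
        },
    })


# Hypothetical example values.
request = build_tool_call(1, "pdf_reader", {"file_path": "report.pdf"})
response = build_tool_result(1, {"success": True, "data": "...", "metadata": {}})
```

The `id` field ties the response back to the request, which is what lets a client match results to calls.
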
### Example: PDF Reading Flow

```
Client: "Read this PDF"
   ↓
MCP Server: Receives pdf_reader request
   ↓
pdf_reader.py: read_pdf(file_path)
   ↓
PyPDF2: Extract text from pages
   ↓
Return: {text, pages, metadata}
   ↓
MCP Server: Format response
   ↓
Client: Receives extracted text
```


---

## Project Structure

```
mission_control_mcp/
│
├── mcp_server.py                    # MCP server entry point
│
├── tools/                           # 8 independent tools
│   ├── pdf_reader.py                # PDF text extraction
│   ├── text_extractor.py            # Text processing (4 ops)
│   ├── web_fetcher.py               # Web scraping
│   ├── rag_search.py                # Semantic search
│   ├── data_visualizer.py           # Chart generation
│   ├── file_converter.py            # File format conversion
│   ├── email_intent_classifier.py   # Email classification
│   └── kpi_generator.py             # Business metrics
│
├── utils/                           # Shared utilities
│   ├── helpers.py                   # Text processing helpers
│   └── rag_utils.py                 # Vector search utilities
│
├── models/                          # Data models
│   └── schemas.py                   # Pydantic schemas
│
├── examples/                        # Sample test data
│   ├── sample_report.txt            # Business report
│   ├── business_data.csv            # Financial data
│   ├── sample_email_*.txt           # Email samples
│   └── sample_documents.txt         # RAG search docs
│
├── app.py                           # Gradio web interface
├── demo.py                          # Demo & test suite
│
├── docs/                            # Documentation
│   ├── README.md                    # Main documentation
│   ├── API.md                       # API reference
│   ├── EXAMPLES.md                  # Use cases
│   ├── TESTING.md                   # Testing guide
│   ├── ARCHITECTURE.md              # This file
│   └── CONTRIBUTING.md              # Contribution guide
│
├── requirements.txt                 # Python dependencies
├── .gitignore                       # Git ignore rules
└── LICENSE                          # MIT License
```


---

## Integration Points

### MCP Protocol Integration

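```python
import json

from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import TextContent

from tools.pdf_reader import read_pdf

# Create server
server = Server("mission-control")

# Register tool handler
@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    if name != "pdf_reader":
        raise ValueError(f"Unknown tool: {name}")
    result = read_pdf(arguments["file_path"])
    return [TextContent(type="text", text=json.dumps(result))]

# Run server over stdio
async def main():
    async with stdio_server() as (read_stream, write_stream):
        await server.run(read_stream, write_stream, server.create_initialization_options())
```
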
### Claude Desktop Integration

**Configuration:**

```json
{
  "mcpServers": {
    "mission-control": {
      "command": "python",
      "args": ["path/to/mcp_server.py"]
    }
  }
}
```

**Communication:**

```
Claude Desktop ←→ MCP Protocol ←→ mcp_server.py ←→ Tools
```


---

## Scalability Design

### Horizontal Scaling

**Current:** Single-process server

**Future:** Multi-process with load balancing

```
          Load Balancer
                │
     ┌──────────┼──────────┐
     │          │          │
 Server 1   Server 2   Server 3
     │          │          │
     └──────────┴──────────┘
              Tools
```

### Caching Strategy

**Implemented:**

- RAG model caching (sentence transformers)
- NLTK data caching

**Future Improvements:**

- Redis for result caching
- Database for document storage
- CDN for static assets

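
Model caching amounts to loading an expensive object once and reusing it for every later request. The sketch below shows the idea with `functools.lru_cache`; `FakeEmbeddingModel`, `get_model`, and the `"demo-model"` name are hypothetical stand-ins, and the project's actual caching mechanism may be different.

```python
from functools import lru_cache

LOAD_COUNT = 0  # counts how often the expensive load actually runs


class FakeEmbeddingModel:
    """Hypothetical stand-in for a sentence-transformer model."""

    def __init__(self, name: str) -> None:
        self.name = name

    def encode(self, text: str) -> list:
        # Placeholder "embedding": a single number derived from the text length.
        return [float(len(text))]


@lru_cache(maxsize=1)
def get_model(name: str = "demo-model") -> FakeEmbeddingModel:
    """Load the model on first use; later calls return the cached instance."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    return FakeEmbeddingModel(name)


first = get_model()
second = get_model()  # served from the cache, no second load
```

The cache lives in process memory, so it is rebuilt whenever the server restarts.
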

---

## Security Architecture

### Input Validation

```python
# Pydantic schemas
from pathlib import Path

# Note: Pydantic v2 deprecates `validator` in favor of `field_validator`
from pydantic import BaseModel, validator


class PDFReaderInput(BaseModel):
    file_path: str

    @validator('file_path')
    def validate_path(cls, v):
        if not Path(v).exists():
            raise ValueError("File not found")
        return v
```

### Error Handling

```python
def run_tool(tool_function, tool_input):
    # Map known failure types to structured error responses
    try:
        return tool_function(tool_input)
    except FileNotFoundError:
        return {"error": "File not found", "code": 404}
    except ValueError:
        return {"error": "Invalid input", "code": 400}
    except Exception:
        return {"error": "Internal error", "code": 500}
```

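
The same mapping from exception type to error code can be kept in a table so that new error types need only one extra row. This is a sketch under that assumption; `ERROR_MAP`, `safe_call`, and the three demo functions are hypothetical names, not part of the project.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)

# (exception type, client-facing message, code); checked in order.
ERROR_MAP = [
    (FileNotFoundError, "File not found", 404),
    (ValueError, "Invalid input", 400),
]


def safe_call(tool: Callable[..., Dict[str, Any]], **arguments: Any) -> Dict[str, Any]:
    """Run a tool and convert any exception into a structured error dict."""
    try:
        return tool(**arguments)
    except Exception as exc:
        for exc_type, message, code in ERROR_MAP:
            if isinstance(exc, exc_type):
                return {"error": message, "code": code}
        logger.exception("Unexpected tool failure")
        return {"error": "Internal error", "code": 500}


# Hypothetical tools that fail in different ways.
def missing_file() -> Dict[str, Any]:
    raise FileNotFoundError("no such file")


def bad_input() -> Dict[str, Any]:
    raise ValueError("bad value")


def crash() -> Dict[str, Any]:
    raise RuntimeError("unexpected")
```

Only the generic message reaches the client; the full traceback goes to the log, which avoids leaking internal paths or stack details.
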
### Authentication

**Current:** None (local tool execution)

**Production Considerations:**

- API key authentication
- Rate limiting
- Request logging
- User permissions

---

## Performance Characteristics

### Tool Performance

| Tool | Avg Time | Memory | Notes |
|------|----------|--------|-------|
| PDF Reader | 1 s | 50 MB | Depends on PDF size |
| Text Extractor | 0.5 s | 10 MB | Fast text processing |
| Web Fetcher | 2-3 s | 20 MB | Network dependent |
| RAG Search | 2.5 s | 200 MB | First run (includes model load) |
| RAG Search | 0.5 s | 200 MB | Subsequent runs |
| Data Visualizer | 1.2 s | 30 MB | Chart generation |
| File Converter | 1-2 s | 50 MB | File size dependent |
| Email Classifier | 0.1 s | 5 MB | Very fast |
| KPI Generator | 0.3 s | 10 MB | Quick calculations |

### Bottlenecks

1. **RAG Search** - Initial model loading (~2s)
   - Solution: Keep model in memory
2. **Web Fetcher** - Network latency
   - Solution: Async requests, caching
3. **PDF Reader** - Large files
   - Solution: Stream processing

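
Average timings like those in the table above can be collected with a small helper that runs a tool several times and averages the wall-clock duration. This is a sketch of one way to measure, not the method used to produce the published figures; `average_runtime` and `fake_tool` are hypothetical names.

```python
import time
from statistics import mean
from typing import Any, Callable, List


def average_runtime(func: Callable[..., Any], *args: Any, runs: int = 5, **kwargs: Any) -> float:
    """Call func `runs` times and return the mean wall-clock duration in seconds."""
    durations: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def fake_tool(text: str) -> dict:
    """Hypothetical tool used only as a measurement target."""
    return {"success": True, "data": text.upper(), "metadata": {}}


avg_seconds = average_runtime(fake_tool, "hello", runs=10)
```

For tools with a warm-up cost, such as RAG search, the first run should be timed separately from the rest so the model load does not skew the average.
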

---

## State Management

### Stateless Design

Each tool request is independent:

- No session state
- No user context
- Pure function design

**Benefits:**

- Easy scaling
- No state synchronization
- Simple debugging
- High availability

### RAG Store State

Exception: RAG search maintains an in-memory vector store:

```python
class SimpleRAGStore:
    def __init__(self):
        self.documents = []
        self.index = None  # FAISS index
```

**Lifecycle:**

- Created on first search
- Persists for the server's lifetime
- Cleared on server restart


---

## Testing Architecture

### Test Pyramid

```
    ┌─────────────┐
    │  E2E Tests  │   (MCP integration)
    ├─────────────┤
    │ Integration │   (Tool combinations)
    ├─────────────┤
    │ Unit Tests  │   (Individual functions)
    └─────────────┘
```

### Test Coverage

- **Unit Tests:** Test each function independently
- **Integration Tests:** Test tool interactions
- **MCP Tests:** Test server communication
- **Sample Tests:** Test with real data

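
A unit test at the base of the pyramid calls a tool function directly, with no MCP server involved. The sketch below uses the standard `unittest` module; `classify_length` is a hypothetical tool defined inline so the example is self-contained, and the project's own test layout may differ.

```python
import unittest
from typing import Any, Dict


def classify_length(text: str) -> Dict[str, Any]:
    """Hypothetical tool used only to demonstrate the test layout."""
    if not text:
        raise ValueError("Invalid input")
    label = "short" if len(text) < 20 else "long"
    return {"success": True, "data": label, "metadata": {"length": len(text)}}


class ClassifyLengthTests(unittest.TestCase):
    def test_returns_standard_result(self):
        result = classify_length("hello")
        self.assertTrue(result["success"])
        self.assertEqual(result["data"], "short")
        self.assertEqual(result["metadata"], {"length": 5})

    def test_rejects_empty_input(self):
        with self.assertRaises(ValueError):
            classify_length("")


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ClassifyLengthTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Each test checks both the happy path and the validation failure, mirroring the two branches of the tool pattern.
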

---

## Dependency Management

### Core Dependencies

```
MCP SDK (>=1.0.0)
├── stdio communication
└── Tool registration

Processing Libraries
├── PyPDF2 (PDF reading)
├── BeautifulSoup4 (HTML parsing)
├── Pandas (Data processing)
└── Matplotlib (Visualization)

ML/NLP Libraries
├── scikit-learn (Text processing)
├── NLTK (Keyword extraction)
├── sentence-transformers (Embeddings)
└── FAISS (Vector search)
```

### Optional Dependencies

- faiss-cpu: can be replaced with faiss-gpu on GPU systems
- reportlab: optional, used for PDF generation


---

## Future Architecture Improvements

### Planned Enhancements

1. **Database Integration**

   ```
   PostgreSQL for persistent storage
   Redis for caching
   ```

2. **Async Processing**

   ```python
   import asyncio

   async def process_pdf(file_path: str):
       # Run the blocking read_pdf call in a worker thread
       return await asyncio.to_thread(read_pdf, file_path)
   ```

3. **Microservices**

   ```
   Each tool as separate service
   API gateway for routing
   Service mesh for communication
   ```

4. **Monitoring**

   ```
   Prometheus metrics
   Grafana dashboards
   Error tracking (Sentry)
   ```

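
The async processing item pays off when several blocking tools run at once. The sketch below combines `asyncio.to_thread` (Python 3.9+) with `asyncio.gather` to run them concurrently; `slow_tool` and `run_concurrently` are hypothetical names, and `time.sleep` stands in for real work.

```python
import asyncio
import time
from typing import Any, Dict, List


def slow_tool(name: str, delay: float = 0.1) -> Dict[str, Any]:
    """Hypothetical blocking tool; time.sleep stands in for real work."""
    time.sleep(delay)
    return {"success": True, "data": name, "metadata": {}}


async def run_concurrently(names: List[str]) -> List[Dict[str, Any]]:
    """Run each blocking call in a worker thread and wait for all of them."""
    tasks = [asyncio.to_thread(slow_tool, n) for n in names]
    # gather returns results in the same order as the input tasks
    return await asyncio.gather(*tasks)


results = asyncio.run(run_concurrently(["pdf", "web", "kpi"]))
```

Because the calls overlap, total wall-clock time is close to the slowest single call instead of the sum of all three.
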

---

## Design Principles

### SOLID Principles

- **Single Responsibility:** Each tool does one thing
- **Open/Closed:** Easy to add new tools
- **Liskov Substitution:** Tools are interchangeable
- **Interface Segregation:** Minimal tool interfaces
- **Dependency Inversion:** Tools depend on abstractions

### Clean Architecture

- **Independent of Frameworks:** Core logic separate from MCP
- **Testable:** Can test without MCP server
- **Independent of UI:** Works with any MCP client
- **Independent of Database:** No database coupling

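
One way to express "minimal tool interfaces" and "tools are interchangeable" in code is a structural interface that any tool can satisfy without inheriting from a base class. The sketch below uses `typing.Protocol` for that; the `Tool` protocol, the two example tools, and `run_all` are hypothetical and are not taken from the project's source.

```python
from typing import Any, Dict, List, Protocol


class Tool(Protocol):
    """Minimal interface: a name plus a run method returning the standard dict."""

    name: str

    def run(self, **arguments: Any) -> Dict[str, Any]: ...


class UppercaseTool:
    name = "uppercase"

    def run(self, **arguments: Any) -> Dict[str, Any]:
        return {"success": True, "data": str(arguments["text"]).upper(), "metadata": {}}


class ReverseTool:
    name = "reverse"

    def run(self, **arguments: Any) -> Dict[str, Any]:
        return {"success": True, "data": str(arguments["text"])[::-1], "metadata": {}}


def run_all(tools: List[Tool], **arguments: Any) -> Dict[str, Dict[str, Any]]:
    """Call every tool with the same arguments; callers depend only on the protocol."""
    return {tool.name: tool.run(**arguments) for tool in tools}


outputs = run_all([UppercaseTool(), ReverseTool()], text="abc")
```

Code that consumes tools this way never imports a concrete tool class, which is what keeps the core logic independent of any one tool or of the MCP layer.
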

---

## Architectural Goals

**Achieved:**

- Modular design
- Easy to extend
- Well-documented
- Testable
- Production-ready

**In Progress:**

- Performance optimization
- Enhanced caching
- Better error handling

**Future:**

- Multi-tenancy
- Distributed processing
- Advanced monitoring
- Auto-scaling

---

**MissionControlMCP Architecture Documentation v1.0**