# MCP Server Integration

## Table of Contents

1. [Overview](#overview)
2. [Available MCP Servers](#available-mcp-servers)
3. [Tool Registry & Discovery](#tool-registry--discovery)
4. [HTML Processing MCPs](#html-processing-mcps)
5. [Lazy Loading System](#lazy-loading-system)
6. [MCP Composition](#mcp-composition)
7. [Testing Panel](#testing-panel)
8. [Configuration](#configuration)

---

## Overview

The **Model Context Protocol (MCP)** enables the WebScraper agent to interact with external tools, databases, and services through a standardized interface. MCP servers expose **tools** that the agent can discover and use dynamically.

### Why MCP?

**Without MCP:**
- The agent is limited to built-in capabilities
- It cannot access external databases, APIs, or specialized libraries
- It is difficult to extend without code changes

**With MCP:**
- Dynamically discover and use 100+ community tools
- Access databases (PostgreSQL, MongoDB, etc.)
- Use specialized libraries (BeautifulSoup, Selenium, Playwright)
- Integrate with external APIs (Google, GitHub, etc.)
- Extend agent capabilities without code changes

### Architecture
```
┌───────────────────────────────────────────────────────────────┐
│                       WebScraper Agent                        │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│   ┌─────────────────────────────────────────────────────┐     │
│   │                  MCP Tool Registry                  │     │
│   │  - Discovers available tools from all MCP servers   │     │
│   │  - Provides tool metadata to agent                  │     │
│   │  - Routes tool calls to appropriate server          │     │
│   └───────────────────┬─────────────────────────────────┘     │
│                       │                                       │
└───────────────────────┼───────────────────────────────────────┘
                        │
      ┌─────────────────┼──────────────┬─────────────┬─────────────┐
      ▼                 ▼              ▼             ▼             ▼
┌───────────────┐ ┌────────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│  HTML Parser  │ │  Browser   │ │ Database │ │   File   │ │  Custom  │
│      MCP      │ │    MCP     │ │   MCP    │ │  System  │ │   MCP    │
│               │ │            │ │          │ │   MCP    │ │          │
│• BeautifulSoup│ │• Puppeteer │ │• Postgres│ │• Read    │ │• Your    │
│• lxml         │ │• Playwright│ │• MongoDB │ │• Write   │ │  tools   │
│• html5lib     │ │• Selenium  │ │• Redis   │ │• Search  │ │          │
└───────────────┘ └────────────┘ └──────────┘ └──────────┘ └──────────┘
```
---

## Available MCP Servers

### 1. HTML Processing & Parsing

#### BeautifulSoup MCP

Advanced HTML parsing and extraction.

**Tools:**
- `parse_html(html: str, parser: str = "html.parser")` → Parse HTML into a DOM tree
- `find_all(html: str, selector: str)` → CSS selector search
- `extract_text(html: str, selector: str)` → Extract text content
- `extract_attributes(html: str, selector: str, attrs: List[str])` → Get element attributes
- `clean_html(html: str)` → Remove scripts, styles, comments
- `extract_tables(html: str)` → Parse all tables into structured data
**Configuration:**

```json
{
  "mcpServers": {
    "beautifulsoup": {
      "command": "python",
      "args": ["-m", "mcp_beautifulsoup"],
      "enabled": true,
      "autoDownload": true,
      "config": {
        "default_parser": "lxml",
        "encodings": ["utf-8", "latin-1"]
      }
    }
  }
}
```
**Example Usage:**

```python
# Agent action
action = Action(
    action_type="MCP_TOOL_CALL",
    tool_name="beautifulsoup.find_all",
    tool_params={
        "html": observation.page_html,
        "selector": "div.product-card"
    }
)

# Response
{
    "products": [
        {"name": "Widget", "price": "$49.99"},
        {"name": "Gadget", "price": "$39.99"}
    ]
}
```
#### lxml MCP

Fast XML/HTML parsing with XPath support.

**Tools:**
- `xpath_query(html: str, xpath: str)` → XPath extraction
- `css_select(html: str, css: str)` → CSS selector (fast)
- `validate_html(html: str)` → Check well-formedness

#### html5lib MCP

Standards-compliant HTML5 parsing.

**Tools:**
- `parse_html5(html: str)` → Parse like a browser would
- `sanitize_html(html: str, allowed_tags: List[str])` → Safe HTML cleaning

### 2. Browser Automation

#### Playwright MCP

Full browser automation with JavaScript rendering.

**Tools:**
- `navigate(url: str, wait_for: str = "networkidle")` → Load page with JS
- `click(selector: str)` → Click element
- `fill_form(selector: str, value: str)` → Fill input
- `screenshot(selector: str = None)` → Capture screenshot
- `wait_for_selector(selector: str, timeout: int = 5000)` → Wait for element
- `execute_script(script: str)` → Run custom JavaScript

**Use Cases:**
- Pages with client-side rendering (React, Vue, Angular)
- Infinite scroll / lazy loading
- Forms and interactions
- Captcha handling
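A typical sequence for a client-rendered page is navigate, wait for the target selector, then read the rendered DOM back for downstream parsing. The sketch below runs against a stubbed client so the call order is visible; `StubMCP`, the `scrape_js_page` helper, and the return values are illustrative assumptions, not part of the Playwright MCP itself:

```python
class StubMCP:
    """Records tool calls; stands in for a real MCP client."""
    def __init__(self):
        self.calls = []

    def call(self, tool: str, params: dict):
        self.calls.append((tool, params))
        return {"ok": True}


def scrape_js_page(mcp, url: str, selector: str):
    # Load the page and let client-side JS settle
    mcp.call("playwright.navigate", {"url": url, "wait_for": "networkidle"})
    # Block until the element we care about has actually rendered
    mcp.call("playwright.wait_for_selector", {"selector": selector, "timeout": 5000})
    # Read the rendered DOM back for downstream parsing
    return mcp.call("playwright.execute_script",
                    {"script": "document.documentElement.outerHTML"})


mcp = StubMCP()
scrape_js_page(mcp, "https://example.com/products", "div.product-card")
```

The navigate → wait → read ordering matters: reading the DOM before the wait step would return pre-render markup on JS-heavy pages.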
**Configuration:**

```json
{
  "mcpServers": {
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp-server"],
      "enabled": false,
      "autoDownload": true,
      "config": {
        "browser": "chromium",
        "headless": true,
        "viewport": {"width": 1920, "height": 1080}
      }
    }
  }
}
```

Playwright ships a full browser and is resource-heavy, so it is disabled by default; enable it only when a task actually needs JavaScript rendering.
#### Puppeteer MCP

Browser automation over the Chrome DevTools Protocol. Similar to Playwright but lighter weight.

#### Selenium MCP

Legacy browser automation (more compatible, slower).

### 3. Database Access

#### PostgreSQL MCP

Access PostgreSQL databases.

**Tools:**
- `query(sql: str, params: List = [])` → Execute SELECT
- `execute(sql: str, params: List = [])` → Execute INSERT/UPDATE/DELETE
- `list_tables()` → Get schema

**Use Case:** Store scraped data directly to a production database.

#### MongoDB MCP

Access MongoDB collections.

**Tools:**
- `find(collection: str, query: dict)` → Query documents
- `insert(collection: str, document: dict)` → Insert document
- `aggregate(collection: str, pipeline: List)` → Aggregation pipeline

#### Redis MCP

Fast cache and pub/sub.

**Tools:**
- `get(key: str)` → Retrieve cached value
- `set(key: str, value: str, ttl: int)` → Cache value
- `publish(channel: str, message: str)` → Pub/sub

**Use Case:** Cache parsed HTML, share state between agents.
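The caching use case follows a cache-aside pattern: check Redis first, and only clean (and cache) on a miss. A dict-backed stub stands in for the real server here; the `StubRedisMCP` class and the trivial "cleaning" step are assumptions for illustration:

```python
class StubRedisMCP:
    """Dict-backed stand-in for the redis-mcp server (illustrative only)."""
    def __init__(self):
        self.store = {}

    def call(self, tool: str, params: dict):
        if tool == "redis.get":
            return self.store.get(params["key"])
        if tool == "redis.set":
            self.store[params["key"]] = params["value"]
            return "OK"
        raise ValueError(f"unknown tool: {tool}")


def cached_clean_html(mcp, url: str, html: str) -> str:
    """Cache-aside: return cached cleaned HTML, or compute and cache it."""
    key = f"clean:{url}"
    cached = mcp.call("redis.get", {"key": key})
    if cached is not None:
        return cached
    # Stand-in for a real cleaning call such as beautifulsoup.clean_html
    cleaned = html.replace("<script>x</script>", "")
    mcp.call("redis.set", {"key": key, "value": cleaned, "ttl": 3600})
    return cleaned
```

On the second call for the same URL the cleaned HTML comes straight from the cache, so the (expensive) cleaning step is skipped.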
### 4. File System

#### Filesystem MCP

Read/write local files.

**Tools:**
- `read_file(path: str)` → Read text/binary file
- `write_file(path: str, content: str)` → Write file
- `list_directory(path: str)` → List files
- `search_files(pattern: str)` → Glob search

**Use Case:** Save scraped data to CSV/JSON, read configuration files.
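For the CSV use case, one approach is to serialize the records to CSV text in memory and hand the result to `write_file`. The helper below is a sketch (the `rows_to_csv` name and the commented-out `mcp.call` are illustrative):

```python
import csv
import io


def rows_to_csv(rows):
    """Serialize a list of scraped record dicts to CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


content = rows_to_csv([
    {"name": "Widget", "price": "$49.99"},
    {"name": "Gadget", "price": "$39.99"},
])
# Then hand the text to the MCP tool:
# mcp.call("filesystem.write_file", {"path": "scraped_data/products.csv", "content": content})
```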
### 5. Search Engines

#### Google Search MCP

Google Search API integration.

**Tools:**
- `search(query: str, num: int = 10)` → Google Search results
- `search_images(query: str)` → Image search

**Configuration:**

```json
{
  "mcpServers": {
    "google-search": {
      "command": "python",
      "args": ["-m", "mcp_google_search"],
      "enabled": true,
      "autoDownload": true,
      "config": {
        "api_key": "YOUR_GOOGLE_API_KEY",
        "search_engine_id": "YOUR_SEARCH_ENGINE_ID"
      }
    }
  }
}
```

#### Bing Search MCP

Bing Search API.

#### Brave Search MCP

Privacy-focused search (Brave Search API).

#### DuckDuckGo MCP

Free search with no API key required.

**Tools:**
- `search(query: str, max_results: int = 10)` → DDG results

### 6. Data Extraction

#### Readability MCP

Extract main article content (removes ads, navigation, etc.).

**Tools:**
- `extract_article(html: str)` → Returns clean article text + metadata

**Use Case:** Extract blog posts, news articles, documentation.

#### Trafilatura MCP

Advanced web scraping and text extraction.

**Tools:**
- `extract(url: str)` → Extract main content
- `extract_metadata(html: str)` → Get title, author, date, etc.

#### Newspaper MCP

News article extraction and NLP.

**Tools:**
- `parse_article(url: str)` → Full article data
- `extract_keywords(text: str)` → Keyword extraction
- `summarize(text: str)` → Auto-summarization

### 7. Data Validation

#### Cerberus MCP

Schema validation for extracted data.

**Tools:**
- `validate(data: dict, schema: dict)` → Validate against schema

**Example:**

```python
# Define schema
schema = {
    "product_name": {"type": "string", "required": True, "minlength": 1},
    "price": {"type": "float", "required": True, "min": 0},
    "rating": {"type": "float", "min": 0, "max": 5}
}

# Validate extracted data
result = mcp.call("cerberus.validate", data=extracted_data, schema=schema)
if not result["valid"]:
    print("Validation errors:", result["errors"])
```
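To make the `{"valid": ..., "errors": ...}` contract concrete, here is a tiny pure-Python checker that mimics the subset of rules used in the schema above. It is a sketch of the response shape only, written for this document; the real server would delegate to the Cerberus library:

```python
def validate(data: dict, schema: dict) -> dict:
    """Minimal stand-in for cerberus.validate: type/required/min/max/minlength."""
    type_map = {"string": str, "float": (int, float), "integer": int}
    errors = {}
    for field, rules in schema.items():
        if field not in data:
            if rules.get("required"):
                errors[field] = "required field"
            continue
        value = data[field]
        if "type" in rules and not isinstance(value, type_map[rules["type"]]):
            errors[field] = f"must be {rules['type']}"
        elif "min" in rules and value < rules["min"]:
            errors[field] = f"min value is {rules['min']}"
        elif "max" in rules and value > rules["max"]:
            errors[field] = f"max value is {rules['max']}"
        elif "minlength" in rules and len(value) < rules["minlength"]:
            errors[field] = f"min length is {rules['minlength']}"
    return {"valid": not errors, "errors": errors}
```

A failed check returns `valid: False` with a per-field error map, which is what the agent inspects before deciding whether to retry extraction.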
#### Pydantic MCP

Pydantic model validation.

### 8. Computer Vision

#### OCR MCP

Extract text from images (Tesseract OCR).

**Tools:**
- `extract_text(image_path: str, lang: str = "eng")` → OCR text

**Use Case:** Extract prices from product images, read captchas (where legally permitted).

#### Image Analysis MCP

Vision AI (GPT-4 Vision, Claude Vision).

**Tools:**
- `describe_image(image_path: str)` → Natural language description
- `extract_structured(image_path: str, schema: dict)` → Extract structured data from images

### 9. HTTP & Networking

#### Requests MCP

HTTP client with retries and session management.

**Tools:**
- `get(url: str, headers: dict = {})` → HTTP GET
- `post(url: str, data: dict = {})` → HTTP POST

#### Proxy Manager MCP

Manage proxy rotation and IP reputation.

**Tools:**
- `get_proxy()` → Get next proxy from pool
- `report_dead_proxy(proxy: str)` → Mark proxy as failed

### 10. Utility

#### Regex MCP

Advanced regex operations.

**Tools:**
- `find_all(pattern: str, text: str)` → Find all matches
- `replace(pattern: str, replacement: str, text: str)` → Regex replace
- `validate(pattern: str)` → Check if regex is valid

#### Datetime MCP

Parse and normalize dates.

**Tools:**
- `parse_date(text: str)` → Parse natural language dates
- `normalize_timezone(date: str, tz: str)` → Convert timezone

#### Currency MCP

Currency parsing and conversion.

**Tools:**
- `parse_price(text: str)` → Extract price and currency
- `convert(amount: float, from_currency: str, to_currency: str)` → Convert
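What `parse_price` might return can be sketched with a small regex extractor. The symbol-to-ISO mapping and the exact return shape are assumptions for illustration, not the tool's actual contract:

```python
import re

# Symbol → ISO code mapping used by this sketch (assumed, not from the MCP)
CURRENCY_SYMBOLS = {"$": "USD", "€": "EUR", "£": "GBP"}


def parse_price(text):
    """Illustrative parse_price: extract the first amount and its currency."""
    m = re.search(r"([$€£])\s*(\d+(?:[.,]\d{2})?)", text)
    if not m:
        return None
    symbol, amount = m.groups()
    return {
        "currency": CURRENCY_SYMBOLS[symbol],
        # Normalize decimal commas ("12,50") to a float
        "amount": float(amount.replace(",", ".")),
    }
```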
---

## Tool Registry & Discovery

The **Tool Registry** automatically discovers all available tools from enabled MCP servers.

### Architecture

```python
class MCPToolRegistry:
    def __init__(self):
        self.servers: Dict[str, MCPServer] = {}
        self.tools: Dict[str, Tool] = {}  # tool_name → Tool

    def discover_servers(self, config: MCPConfig):
        """Load and connect to all enabled MCP servers."""
        for server_name, server_config in config.mcpServers.items():
            if not server_config.enabled:
                continue

            # Auto-download if needed
            if server_config.autoDownload and not self.is_installed(server_config):
                self.download_and_install(server_name, server_config)

            # Connect to server
            server = self.connect_server(server_name, server_config)
            self.servers[server_name] = server

            # Discover tools
            for tool in server.list_tools():
                full_name = f"{server_name}.{tool.name}"
                self.tools[full_name] = tool

    def get_tool(self, tool_name: str) -> Tool:
        """Get tool by fully qualified name (server.tool)."""
        return self.tools.get(tool_name)

    def search_tools(self, query: str, category: str = None) -> List[Tool]:
        """Search tools by natural language query."""
        # Semantic search using tool descriptions
        candidates = list(self.tools.values())
        if category:
            candidates = [t for t in candidates if t.category == category]

        # Embed query and tools, rank by similarity
        scored = []
        for tool in candidates:
            score = self.semantic_similarity(query, tool.description)
            scored.append((tool, score))
        scored.sort(key=lambda x: x[1], reverse=True)
        return [tool for tool, score in scored[:10]]
```
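`semantic_similarity` is left abstract in the registry; a production version would embed the query and each tool description. A cheap lexical stand-in (token-set Jaccard overlap, assumed here purely for illustration) ranks candidates the same way:

```python
def semantic_similarity(query, description):
    """Token-overlap Jaccard score; a cheap stand-in for embedding similarity."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)
```

A description sharing words with the query scores above an unrelated one, which is all the top-10 ranking in `search_tools` needs.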
### Tool Metadata

Each tool exposes rich metadata:

```python
class Tool(BaseModel):
    name: str                          # e.g., "find_all"
    full_name: str                     # e.g., "beautifulsoup.find_all"
    server: str                        # Server name
    description: str                   # Human-readable description
    category: str                      # "parsing" | "browser" | "database" | ...
    input_schema: Dict[str, Any]       # JSON Schema for parameters
    output_schema: Dict[str, Any]      # JSON Schema for return value
    examples: List[ToolExample]        # Usage examples
    cost: ToolCost                     # Time/resource cost estimate
    requires_auth: bool                # Needs API keys?
    rate_limit: Optional[RateLimit]    # Rate limiting info
```
**Example:**

```python
Tool(
    name="find_all",
    full_name="beautifulsoup.find_all",
    server="beautifulsoup",
    description="Find all HTML elements matching a CSS selector",
    category="parsing",
    input_schema={
        "type": "object",
        "properties": {
            "html": {"type": "string", "description": "HTML content to search"},
            "selector": {"type": "string", "description": "CSS selector"}
        },
        "required": ["html", "selector"]
    },
    output_schema={
        "type": "array",
        "items": {"type": "object"}
    },
    examples=[
        ToolExample(
            input={"html": "<div class='item'>A</div>", "selector": ".item"},
            output=[{"tag": "div", "text": "A", "class": "item"}]
        )
    ],
    cost=ToolCost(time_ms=10, cpu_intensive=False),
    requires_auth=False
)
```
### Auto Tool Discovery by Agent

The agent can query the registry to find relevant tools:

```python
# Agent needs to parse HTML
available_tools = tool_registry.search_tools(
    query="parse HTML and extract elements by CSS selector",
    category="parsing"
)

# Top result: beautifulsoup.find_all
tool = available_tools[0]

# Agent calls the tool
action = Action(
    action_type="MCP_TOOL_CALL",
    tool_name=tool.full_name,
    tool_params={
        "html": observation.page_html,
        "selector": "div.product-price"
    }
)
```
---

## HTML Processing MCPs

### BeautifulSoup MCP (Detailed)

**Installation:**

```bash
pip install mcp-beautifulsoup
```

**Tools:**

#### 1. `find_all(html, selector, limit=None)`

Find all elements matching a CSS selector.

```python
result = mcp.call("beautifulsoup.find_all", {
    "html": "<div class='price'>$10</div><div class='price'>$20</div>",
    "selector": "div.price"
})
# Returns: [{"text": "$10"}, {"text": "$20"}]
```
#### 2. `find_one(html, selector)`

Find the first matching element.

```python
result = mcp.call("beautifulsoup.find_one", {
    "html": obs.page_html,
    "selector": "h1.product-title"
})
# Returns: {"text": "Widget Pro", "tag": "h1"}
```

#### 3. `extract_tables(html)`

Parse all `<table>` elements into structured data.

```python
result = mcp.call("beautifulsoup.extract_tables", {"html": obs.page_html})
# Returns:
[
    {
        "headers": ["Product", "Price", "Stock"],
        "rows": [
            ["Widget", "$49.99", "In Stock"],
            ["Gadget", "$39.99", "Out of Stock"]
        ]
    }
]
```

#### 4. `extract_links(html, base_url=None)`

Extract all links from the page.

```python
result = mcp.call("beautifulsoup.extract_links", {
    "html": obs.page_html,
    "base_url": "https://example.com"
})
# Returns:
[
    {"url": "https://example.com/product/123", "text": "View Product"},
    {"url": "https://example.com/category/widgets", "text": "Widgets"}
]
```

#### 5. `clean_html(html, remove=["script", "style", "noscript"])`

Remove unwanted elements.

```python
result = mcp.call("beautifulsoup.clean_html", {
    "html": obs.page_html,
    "remove": ["script", "style", "footer", "nav"]
})
# Returns: Clean HTML without ads, scripts, navigation
```

#### 6. `smart_extract(html, field_name)`

Intelligent extraction based on a field name.

```python
# Agent wants to extract "price"
result = mcp.call("beautifulsoup.smart_extract", {
    "html": obs.page_html,
    "field_name": "price"
})
# MCP searches for:
# - Elements with class/id containing "price"
# - Text matching price patterns ($X.XX, €X,XX)
# - Schema.org markup (itemprop="price")
# Returns: {"value": "$49.99", "confidence": 0.92, "selector": "span.product-price"}
```
### Batch Processing for Long Content

When the HTML is too large (over ~100 KB), process it in batches:

```python
from typing import Dict, List

from bs4 import BeautifulSoup


class HTMLBatchProcessor:
    def __init__(self, mcp_client, chunk_size: int = 50000):
        self.mcp = mcp_client
        self.chunk_size = chunk_size

    def process_large_html(self, html: str, selector: str) -> List[Dict]:
        """Process large HTML in chunks."""
        # Split HTML into meaningful chunks (by sections, not mid-tag)
        chunks = self.split_html_intelligently(html)

        results = []
        for chunk in chunks:
            # Process each chunk
            chunk_results = self.mcp.call("beautifulsoup.find_all", {
                "html": chunk,
                "selector": selector
            })
            # Deduplicate across chunk boundaries
            results.extend(self.deduplicate(chunk_results, results))
        return results

    def split_html_intelligently(self, html: str) -> List[str]:
        """Split HTML at section boundaries, not mid-tag."""
        soup = BeautifulSoup(html, 'lxml')

        # Split by major sections (article, section, main)
        sections = soup.find_all(['article', 'section', 'main'])

        chunks = []
        current_chunk = ""
        for section in sections:
            section_html = str(section)
            if current_chunk and len(current_chunk) + len(section_html) > self.chunk_size:
                chunks.append(current_chunk)
                current_chunk = section_html
            else:
                current_chunk += section_html
        if current_chunk:
            chunks.append(current_chunk)
        return chunks
```
---

## Lazy Loading System

MCP servers are **not** downloaded by default. They are installed on demand when first used.

### Download-on-Demand Flow

```
Agent wants to use a tool
           │
           ▼
Is MCP server installed?
           │
      ┌────┴────┐
      No       Yes
      │         │
      ▼         ▼
 Show dialog  Execute tool
 "Download
  server X?"
      │
  ┌───┴───┐
  No     Yes
  │       │
 Skip   Download & Install
          │
          ▼
  Cache for future use
          │
          ▼
     Execute tool
```
### Implementation

```python
import logging
import subprocess
from queue import Queue
from typing import Set

logger = logging.getLogger(__name__)


class LazyMCPLoader:
    def __init__(self):
        self.installed_servers: Set[str] = set()
        self.download_queue: Queue = Queue()

    def ensure_server(self, server_name: str, config: MCPServerConfig) -> bool:
        """Ensure the MCP server is installed, downloading it if needed."""
        if server_name in self.installed_servers:
            return True

        if not config.autoDownload:
            # Prompt user
            if not self.prompt_user_download(server_name):
                return False

        # Download and install
        return self.download_server(server_name, config)

    def download_server(self, server_name: str, config: MCPServerConfig) -> bool:
        """Download and install the MCP server package."""
        try:
            logger.info(f"Downloading MCP server: {server_name}")

            if config.command == "npx":
                # NPM package: the package name is the first non-flag argument
                package_name = next(a for a in config.args if not a.startswith("-"))
                subprocess.run(["npm", "install", "-g", package_name], check=True)
            elif config.command == "python":
                # Python package: args look like ["-m", "mcp_beautifulsoup"],
                # so the module name follows the "-m" flag
                package_name = config.args[config.args.index("-m") + 1]
                subprocess.run(["pip", "install", package_name], check=True)

            self.installed_servers.add(server_name)
            logger.info(f"Installed {server_name}")
            return True
        except Exception as e:
            logger.error(f"Failed to install {server_name}: {e}")
            return False

    def prompt_user_download(self, server_name: str) -> bool:
        """Ask the user whether to download the server."""
        # In the UI, show a dialog:
        # "Tool X requires MCP server Y. Download and install? (50MB) [Yes] [No]"
        return self.show_download_dialog(server_name)
```
### UI Dialog

```
┌────────────────────────────────────────────────────────────┐
│  MCP Server Required                                       │
├────────────────────────────────────────────────────────────┤
│                                                            │
│  The tool "beautifulsoup.find_all" requires the MCP        │
│  server "beautifulsoup" which is not installed.            │
│                                                            │
│  Package: mcp-beautifulsoup                                │
│  Size: ~5 MB                                               │
│                                                            │
│  Would you like to download and install it now?            │
│                                                            │
│  [Download & Install]   [Skip]                             │
│                                                            │
│  ☐ Remember my choice for this server                      │
└────────────────────────────────────────────────────────────┘
```
---

## MCP Composition

Combine multiple MCP tools to create powerful workflows.

### Example 1: Parse HTML → Extract Tables → Save to Database

```python
# Step 1: Clean HTML
cleaned = mcp.call("beautifulsoup.clean_html", {
    "html": observation.page_html
})

# Step 2: Extract tables
tables = mcp.call("beautifulsoup.extract_tables", {
    "html": cleaned["html"]
})

# Step 3: Save to PostgreSQL
for table in tables:
    mcp.call("postgresql.execute", {
        "sql": "INSERT INTO scraped_data (data) VALUES (%s)",
        "params": [json.dumps(table)]
    })
```
### Example 2: Search Google → Navigate → Parse Article → Summarize

```python
# Step 1: Search
results = mcp.call("google-search.search", {
    "query": "best widgets 2026",
    "num": 5
})

# Step 2: Navigate to top result
mcp.call("playwright.navigate", {
    "url": results[0]["url"]
})

# Step 3: Extract article
article = mcp.call("readability.extract_article", {
    "html": mcp.call("playwright.get_html", {})
})

# Step 4: Summarize
summary = mcp.call("llm.summarize", {
    "text": article["text"],
    "max_length": 200
})
```
### Composition DSL

Define reusable workflows:

```python
class MCPWorkflow:
    def __init__(self, name: str, steps: List[WorkflowStep]):
        self.name = name
        self.steps = steps

    async def execute(self, initial_input: Dict) -> Dict:
        """Execute workflow steps sequentially."""
        context = initial_input
        for step in self.steps:
            result = await mcp.call(step.tool, step.params(context))
            context[step.output_var] = result
        return context


# Define workflow
extract_and_save = MCPWorkflow(
    name="extract_and_save",
    steps=[
        WorkflowStep(
            tool="beautifulsoup.find_all",
            params=lambda ctx: {"html": ctx["html"], "selector": ctx["selector"]},
            output_var="extracted"
        ),
        WorkflowStep(
            tool="cerberus.validate",
            params=lambda ctx: {"data": ctx["extracted"], "schema": ctx["schema"]},
            output_var="validated"
        ),
        WorkflowStep(
            tool="postgresql.execute",
            params=lambda ctx: {"sql": "INSERT INTO items ...", "params": ctx["validated"]},
            output_var="saved"
        )
    ]
)

# Execute
result = await extract_and_save.execute({
    "html": obs.page_html,
    "selector": "div.product",
    "schema": PRODUCT_SCHEMA
})
```
---

## Testing Panel

Test MCP tools manually before using them in agent workflows.

### UI

```
┌───────────────────────────────────────────────────────────────┐
│  MCP Testing Panel                                            │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│  Server: [beautifulsoup ▼]                                    │
│  Tool:   [find_all ▼]                                         │
│                                                               │
│  ┌────────────────────────────────────────────────────────┐   │
│  │ Input Parameters:                                      │   │
│  │                                                        │   │
│  │ html:                                                  │   │
│  │ ┌─────────────────────────────────────────────────┐    │   │
│  │ │ <div class="item">Item 1</div>                  │    │   │
│  │ │ <div class="item">Item 2</div>                  │    │   │
│  │ └─────────────────────────────────────────────────┘    │   │
│  │                                                        │   │
│  │ selector: [div.item                    ]               │   │
│  │                                                        │   │
│  └────────────────────────────────────────────────────────┘   │
│                                                               │
│  [Execute Tool]  [Clear]                                      │
│                                                               │
│  ┌────────────────────────────────────────────────────────┐   │
│  │ Output:                                                │   │
│  │                                                        │   │
│  │ [                                                      │   │
│  │   {"tag": "div", "class": "item", "text": "Item 1"},   │   │
│  │   {"tag": "div", "class": "item", "text": "Item 2"}    │   │
│  │ ]                                                      │   │
│  │                                                        │   │
│  │ Execution time: 12ms                                   │   │
│  │ Status: Success                                        │   │
│  └────────────────────────────────────────────────────────┘   │
│                                                               │
│  [Save as Example]                                            │
└───────────────────────────────────────────────────────────────┘
```
---

## Configuration

### Full MCP Configuration Example

```json
{
  "mcpServers": {
    "beautifulsoup": {
      "command": "python",
      "args": ["-m", "mcp_beautifulsoup"],
      "enabled": true,
      "autoDownload": true,
      "config": {
        "default_parser": "lxml"
      }
    },
    "playwright": {
      "command": "npx",
      "args": ["@playwright/mcp-server"],
      "enabled": false,
      "autoDownload": false,
      "config": {
        "browser": "chromium",
        "headless": true
      }
    },
    "postgresql": {
      "command": "python",
      "args": ["-m", "mcp_postgresql"],
      "enabled": false,
      "autoDownload": false,
      "config": {
        "host": "localhost",
        "port": 5432,
        "database": "scraper_db",
        "user": "postgres",
        "password": "${PG_PASSWORD}"
      }
    },
    "google-search": {
      "command": "python",
      "args": ["-m", "mcp_google_search"],
      "enabled": true,
      "autoDownload": true,
      "config": {
        "api_key": "${GOOGLE_API_KEY}",
        "search_engine_id": "${GOOGLE_SE_ID}"
      }
    },
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "./scraped_data"],
      "enabled": true,
      "autoDownload": true
    }
  },
  "mcpSettings": {
    "autoDiscoverTools": true,
    "toolTimeout": 30,
    "maxConcurrentCalls": 5,
    "retryFailedCalls": true,
    "cacheToolResults": true,
    "cacheTTL": 3600
  }
}
```
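The `${PG_PASSWORD}`-style placeholders are expected to be expanded from environment variables when the configuration is loaded, so secrets never live in the file itself. A minimal sketch of that expansion (the `expand_env_vars` helper is illustrative, not part of the MCP spec):

```python
import os
import re


def expand_env_vars(value):
    """Recursively replace ${VAR} placeholders with environment values.

    Unset variables are left as-is so the error surfaces at connect time.
    """
    if isinstance(value, dict):
        return {k: expand_env_vars(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env_vars(v) for v in value]
    if isinstance(value, str):
        return re.sub(r"\$\{(\w+)\}",
                      lambda m: os.environ.get(m.group(1), m.group(0)),
                      value)
    return value
```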
---

**Next:** See [settings.md](./settings.md) for complete dashboard settings.

## Related API Reference

| item | value |
| --- | --- |
| api-reference | `api-reference.md` |

## Document Metadata

| key | value |
| --- | --- |
| document | `mcp.md` |
| status | active |