# mcp-server-integration
## table-of-contents
1. [Overview](#overview)
2. [Available MCP Servers](#available-mcp-servers)
3. [Tool Registry & Discovery](#tool-registry--discovery)
4. [HTML Processing MCPs](#html-processing-mcps)
5. [Lazy Loading System](#lazy-loading-system)
6. [MCP Composition](#mcp-composition)
7. [Testing Panel](#testing-panel)
8. [Configuration](#configuration)
---
## overview
The **Model Context Protocol (MCP)** enables the WebScraper agent to interact with external tools, databases, and services through a standardized interface. MCP servers expose **tools** that the agent can discover and use dynamically.
### why-mcp
**Without MCP:**
- Agent limited to built-in capabilities
- Cannot access external databases, APIs, or specialized libraries
- Difficult to extend without code changes
**With MCP:**
- Dynamically discover and use 100+ community tools
- Access databases (PostgreSQL, MongoDB, etc.)
- Use specialized libraries (BeautifulSoup, Selenium, Playwright)
- Integrate with external APIs (Google, GitHub, etc.)
- Extend agent capabilities without code changes
### architecture
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                         WebScraper Agent                         β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                                  β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”‚
β”‚  β”‚                     MCP Tool Registry                      β”‚  β”‚
β”‚  β”‚  - Discovers available tools from all MCP servers          β”‚  β”‚
β”‚  β”‚  - Provides tool metadata to agent                         β”‚  β”‚
β”‚  β”‚  - Routes tool calls to appropriate server                 β”‚  β”‚
β”‚  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β”‚
β”‚                                          β”‚                       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                           β”‚
         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
         β”‚                  β”‚               β”‚             β”‚            β”‚
         β–Ό                  β–Ό               β–Ό             β–Ό            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ HTML Parser     β”‚ β”‚ Browser      β”‚ β”‚ Database    β”‚ β”‚ File     β”‚ β”‚ Custom  β”‚
β”‚ MCP             β”‚ β”‚ MCP          β”‚ β”‚ MCP         β”‚ β”‚ System   β”‚ β”‚ MCP     β”‚
β”‚                 β”‚ β”‚              β”‚ β”‚             β”‚ β”‚ MCP      β”‚ β”‚         β”‚
β”‚ β€’ BeautifulSoup β”‚ β”‚ β€’ Puppeteer  β”‚ β”‚ β€’ Postgres β”‚ β”‚ β€’ Read   β”‚ β”‚ β€’ Your  β”‚
β”‚ β€’ lxml          β”‚ β”‚ β€’ Playwright β”‚ β”‚ β€’ MongoDB  β”‚ β”‚ β€’ Write  β”‚ β”‚   tools β”‚
β”‚ β€’ html5lib      β”‚ β”‚ β€’ Selenium   β”‚ β”‚ β€’ Redis    β”‚ β”‚ β€’ Search β”‚ β”‚         β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## available-mcp-servers
### 1-html-processing-and-parsing
#### beautifulsoup-mcp
Advanced HTML parsing and extraction.
**Tools:**
- `parse_html(html: str, parser: str = "html.parser")` β†’ Parse HTML into DOM tree
- `find_all(html: str, selector: str)` β†’ CSS selector search
- `extract_text(html: str, selector: str)` β†’ Extract text content
- `extract_attributes(html: str, selector: str, attrs: List[str])` β†’ Get element attributes
- `clean_html(html: str)` β†’ Remove scripts, styles, comments
- `extract_tables(html: str)` β†’ Parse all tables into structured data
**Configuration:**
```json
{
"mcpServers": {
"beautifulsoup": {
"command": "python",
"args": ["-m", "mcp_beautifulsoup"],
"enabled": true,
"autoDownload": true,
"config": {
"default_parser": "lxml",
"encodings": ["utf-8", "latin-1"]
}
}
}
}
```
**Example Usage:**
```python
# Agent action
action = Action(
action_type="MCP_TOOL_CALL",
tool_name="beautifulsoup.find_all",
tool_params={
"html": observation.page_html,
"selector": "div.product-card"
}
)
# Response
{
"products": [
{"name": "Widget", "price": "$49.99"},
{"name": "Gadget", "price": "$39.99"}
]
}
```
#### lxml-mcp
Fast XML/HTML parsing with XPath support.
**Tools:**
- `xpath_query(html: str, xpath: str)` β†’ XPath extraction
- `css_select(html: str, css: str)` β†’ CSS selector (fast)
- `validate_html(html: str)` β†’ Check well-formedness
#### html5lib-mcp
Standards-compliant HTML5 parsing.
**Tools:**
- `parse_html5(html: str)` β†’ Parse like a browser would
- `sanitize_html(html: str, allowed_tags: List[str])` β†’ Safe HTML cleaning
### 2-browser-automation
#### playwright-mcp
Full browser automation with JavaScript rendering.
**Tools:**
- `navigate(url: str, wait_for: str = "networkidle")` β†’ Load page with JS
- `click(selector: str)` β†’ Click element
- `fill_form(selector: str, value: str)` β†’ Fill input
- `screenshot(selector: str = None)` β†’ Capture screenshot
- `wait_for_selector(selector: str, timeout: int = 5000)` β†’ Wait for element
- `execute_script(script: str)` β†’ Run custom JavaScript
**Use Cases:**
- Pages with client-side rendering (React, Vue, Angular)
- Infinite scroll / lazy loading
- Forms and interactions
- Captcha handling
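The tools above compose into a typical render-then-extract flow for a client-rendered page. The sketch below uses a stub client that merely records calls; `RecordingMCP` and the canned page HTML are invented for illustration, while the tool names follow the list above:

```python
# Stub MCP client: records each tool call instead of driving a real
# Playwright server, so the call sequence can be inspected.
class RecordingMCP:
    def __init__(self):
        self.calls = []

    def call(self, tool: str, params: dict):
        self.calls.append((tool, params))
        if tool == "playwright.execute_script":
            return "<div class='price'>$49.99</div>"  # canned rendered HTML
        return None

mcp = RecordingMCP()

# 1. Load the page and let client-side JS settle
mcp.call("playwright.navigate", {"url": "https://example.com", "wait_for": "networkidle"})
# 2. Wait until the dynamically rendered element appears
mcp.call("playwright.wait_for_selector", {"selector": "div.price", "timeout": 5000})
# 3. Pull the now-rendered HTML for downstream parsing
html = mcp.call("playwright.execute_script", {"script": "document.body.innerHTML"})
```

With a real server the returned `html` would feed straight into an HTML-processing MCP such as `beautifulsoup.find_all`.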
**Configuration** (keep disabled until needed; full browser automation is resource-heavy):
```json
{
"mcpServers": {
"playwright": {
"command": "npx",
"args": ["@playwright/mcp-server"],
"enabled": false,
"autoDownload": true,
"config": {
"browser": "chromium",
"headless": true,
"viewport": {"width": 1920, "height": 1080}
}
}
}
}
```
#### puppeteer-mcp
Browser automation over the Chrome DevTools Protocol; similar to Playwright with a smaller footprint.
#### selenium-mcp
Legacy browser automation (more compatible, slower).
### 3-database-access
#### postgresql-mcp
Access PostgreSQL databases.
**Tools:**
- `query(sql: str, params: List = [])` β†’ Execute SELECT
- `execute(sql: str, params: List = [])` β†’ Execute INSERT/UPDATE/DELETE
- `list_tables()` β†’ Get schema
**Use Case:** Store scraped data directly to production database.
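The store step can be run locally without a Postgres instance. The sketch below is a hypothetical stand-in: `FakePostgresMCP` routes `postgresql.execute`/`postgresql.query` calls to an in-memory SQLite database, translating the `%s` placeholder style shown above into SQLite's `?` style:

```python
import json
import sqlite3

class FakePostgresMCP:
    """Stand-in for the postgresql MCP server, backed by SQLite."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE scraped_data (data TEXT)")

    def call(self, tool: str, params: dict):
        # psycopg-style %s placeholders β†’ sqlite-style ?
        sql = params["sql"].replace("%s", "?")
        cur = self.conn.execute(sql, params.get("params", []))
        if tool == "postgresql.query":
            return cur.fetchall()
        self.conn.commit()
        return {"rowcount": cur.rowcount}

mcp = FakePostgresMCP()
mcp.call("postgresql.execute", {
    "sql": "INSERT INTO scraped_data (data) VALUES (%s)",
    "params": [json.dumps({"name": "Widget", "price": 49.99})],
})
rows = mcp.call("postgresql.query", {"sql": "SELECT data FROM scraped_data"})
```

Against a real server the same `call` payloads apply unchanged; only the backing database differs.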
#### mongodb-mcp
Access MongoDB collections.
**Tools:**
- `find(collection: str, query: dict)` β†’ Query documents
- `insert(collection: str, document: dict)` β†’ Insert document
- `aggregate(collection: str, pipeline: List)` β†’ Aggregation pipeline
#### redis-mcp
Fast cache and pub/sub.
**Tools:**
- `get(key: str)` β†’ Retrieve cached value
- `set(key: str, value: str, ttl: int)` β†’ Cache value
- `publish(channel: str, message: str)` β†’ Pub/sub
**Use Case:** Cache parsed HTML, share state between agents.
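A minimal cache-aside sketch over the `get`/`set` tools above, with a dict-backed stub (`FakeRedisMCP`) standing in for a real Redis server; the `parsed:` key prefix and the fake parse result are illustrative assumptions:

```python
import time

class FakeRedisMCP:
    """Dict-backed stub for the redis MCP tools, with TTL expiry."""

    def __init__(self):
        self.store = {}  # key β†’ (value, expires_at)

    def call(self, tool: str, params: dict):
        if tool == "redis.set":
            self.store[params["key"]] = (params["value"], time.time() + params["ttl"])
            return "OK"
        if tool == "redis.get":
            entry = self.store.get(params["key"])
            if entry is None or entry[1] < time.time():
                return None  # miss or expired
            return entry[0]

def fetch_parsed(mcp, url: str) -> str:
    """Return the cached parse result for a URL, recomputing on a miss."""
    cached = mcp.call("redis.get", {"key": f"parsed:{url}"})
    if cached is not None:
        return cached
    result = f"<parsed {url}>"  # stand-in for an actual scrape + parse
    mcp.call("redis.set", {"key": f"parsed:{url}", "value": result, "ttl": 3600})
    return result

mcp = FakeRedisMCP()
first = fetch_parsed(mcp, "https://example.com")
second = fetch_parsed(mcp, "https://example.com")  # served from cache
```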
### 4-file-system
#### filesystem-mcp
Read/write local files.
**Tools:**
- `read_file(path: str)` β†’ Read text/binary file
- `write_file(path: str, content: str)` β†’ Write file
- `list_directory(path: str)` β†’ List files
- `search_files(pattern: str)` β†’ Glob search
**Use Case:** Save scraped data to CSV/JSON, read configuration files.
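What `write_file` would be used for, shown directly with the standard library: persisting scraped rows as both JSON and CSV. The output paths and field names are illustrative:

```python
import csv
import json
import tempfile
from pathlib import Path

rows = [
    {"name": "Widget", "price": "$49.99"},
    {"name": "Gadget", "price": "$39.99"},
]

out_dir = Path(tempfile.mkdtemp())

# JSON: one file, structure preserved
(out_dir / "products.json").write_text(json.dumps(rows, indent=2))

# CSV: flat rows with an explicit header
with open(out_dir / "products.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "price"])
    writer.writeheader()
    writer.writerows(rows)

loaded = json.loads((out_dir / "products.json").read_text())
```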
### 5-search-engines
#### google-search-mcp
Google Search API integration.
**Tools:**
- `search(query: str, num: int = 10)` β†’ Google Search results
- `search_images(query: str)` β†’ Image search
**Configuration:**
```json
{
"mcpServers": {
"google-search": {
"command": "python",
"args": ["-m", "mcp_google_search"],
"enabled": true,
"autoDownload": true,
"config": {
"api_key": "YOUR_GOOGLE_API_KEY",
"search_engine_id": "YOUR_SEARCH_ENGINE_ID"
}
}
}
}
```
#### bing-search-mcp
Bing Search API.
#### brave-search-mcp
Privacy-focused search (Brave Search API).
#### duckduckgo-mcp
Free search; no API key required.
**Tools:**
- `search(query: str, max_results: int = 10)` β†’ DDG results
### 6-data-extraction
#### readability-mcp
Extract main article content (removes ads, navigation, etc.).
**Tools:**
- `extract_article(html: str)` β†’ Returns clean article text + metadata
**Use Case:** Extract blog posts, news articles, documentation.
#### trafilatura-mcp
Advanced web scraping and text extraction.
**Tools:**
- `extract(url: str)` β†’ Extract main content
- `extract_metadata(html: str)` β†’ Get title, author, date, etc.
#### newspaper-mcp
News article extraction and NLP.
**Tools:**
- `parse_article(url: str)` β†’ Full article data
- `extract_keywords(text: str)` β†’ Keyword extraction
- `summarize(text: str)` β†’ Auto-summarization
### 7-data-validation
#### cerberus-mcp
Schema validation for extracted data.
**Tools:**
- `validate(data: dict, schema: dict)` β†’ Validate against schema
**Example:**
```python
# Define schema
schema = {
"product_name": {"type": "string", "required": True, "minlength": 1},
"price": {"type": "float", "required": True, "min": 0},
"rating": {"type": "float", "min": 0, "max": 5}
}
# Validate extracted data
result = mcp.call("cerberus.validate", {"data": extracted_data, "schema": schema})
if not result["valid"]:
print("Validation errors:", result["errors"])
```
#### pydantic-mcp
Pydantic model validation.
### 8-computer-vision
#### ocr-mcp
Extract text from images (Tesseract OCR).
**Tools:**
- `extract_text(image_path: str, lang: str = "eng")` β†’ OCR text
**Use Case:** Extract prices from product images, read captchas (if legal).
#### image-analysis-mcp
Vision AI (GPT-4 Vision, Claude Vision).
**Tools:**
- `describe_image(image_path: str)` β†’ Natural language description
- `extract_structured(image_path: str, schema: dict)` β†’ Extract structured data from images
### 9-http-and-networking
#### requests-mcp
HTTP client with retry, session management.
**Tools:**
- `get(url: str, headers: dict = {})` β†’ HTTP GET
- `post(url: str, data: dict = {})` β†’ HTTP POST
#### proxy-manager-mcp
Manage proxy rotation, IP reputation.
**Tools:**
- `get_proxy()` β†’ Get next proxy from pool
- `report_dead_proxy(proxy: str)` β†’ Mark proxy as failed
### 10-utility
#### regex-mcp
Advanced regex operations.
**Tools:**
- `find_all(pattern: str, text: str)` β†’ Find all matches
- `replace(pattern: str, replacement: str, text: str)` β†’ Regex replace
- `validate(pattern: str)` β†’ Check if regex is valid
#### datetime-mcp
Parse and normalize dates.
**Tools:**
- `parse_date(text: str)` β†’ Parse natural language dates
- `normalize_timezone(date: str, tz: str)` β†’ Convert timezone
#### currency-mcp
Currency parsing and conversion.
**Tools:**
- `parse_price(text: str)` β†’ Extract price and currency
- `convert(amount: float, from_currency: str, to_currency: str)` β†’ Convert
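A minimal sketch of what a `parse_price`-style tool might do: pull the first currency symbol and amount out of free text. The symbol map and regex are illustrative assumptions, not the currency-mcp's actual logic:

```python
import re

# Hypothetical symbol β†’ ISO-4217 code map; extend as needed.
SYMBOLS = {"$": "USD", "€": "EUR", "Β£": "GBP"}

def parse_price(text: str):
    """Extract the first price-like token, or None if nothing matches."""
    match = re.search(r"([$€£])\s*(\d+(?:[.,]\d{2})?)", text)
    if not match:
        return None
    symbol, amount = match.groups()
    return {
        "currency": SYMBOLS[symbol],
        # normalize decimal comma ("12,50") to a dot before parsing
        "amount": float(amount.replace(",", ".")),
    }
```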
---
## tool-registry-and-discovery
The **Tool Registry** automatically discovers all available tools from enabled MCP servers.
### architecture
```python
class MCPToolRegistry:
def __init__(self):
self.servers: Dict[str, MCPServer] = {}
self.tools: Dict[str, Tool] = {} # tool_name β†’ Tool
def discover_servers(self, config: MCPConfig):
"""Load and connect to all enabled MCP servers."""
for server_name, server_config in config.mcpServers.items():
if not server_config.enabled:
continue
# Auto-download if needed
if server_config.autoDownload and not self.is_installed(server_config):
self.download_and_install(server_name, server_config)
# Connect to server
server = self.connect_server(server_name, server_config)
self.servers[server_name] = server
# Discover tools
for tool in server.list_tools():
full_name = f"{server_name}.{tool.name}"
self.tools[full_name] = tool
def get_tool(self, tool_name: str) -> Tool:
"""Get tool by fully qualified name (server.tool)."""
return self.tools.get(tool_name)
def search_tools(self, query: str, category: str = None) -> List[Tool]:
"""Search tools by natural language query."""
# Semantic search using tool descriptions
candidates = list(self.tools.values())
if category:
candidates = [t for t in candidates if t.category == category]
# Embed query and tools, rank by similarity
scored = []
for tool in candidates:
score = self.semantic_similarity(query, tool.description)
scored.append((tool, score))
scored.sort(key=lambda x: x[1], reverse=True)
return [tool for tool, score in scored[:10]]
```
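The `semantic_similarity` call in `search_tools` is left abstract above. A crude keyword-overlap stand-in (Jaccard similarity over lowercased tokens) is enough to see the ranking behave; a real implementation would use embeddings:

```python
def semantic_similarity(query: str, description: str) -> float:
    """Jaccard similarity over whitespace-split, lowercased tokens."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    if not q or not d:
        return 0.0
    return len(q & d) / len(q | d)

# A parsing-tool description should outrank an unrelated one:
score_a = semantic_similarity(
    "parse HTML and extract elements by CSS selector",
    "Find all HTML elements matching a CSS selector",
)
score_b = semantic_similarity(
    "parse HTML and extract elements by CSS selector",
    "Convert currency amounts between currencies",
)
```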
### tool-metadata
Each tool exposes rich metadata:
```python
class Tool(BaseModel):
name: str # e.g., "find_all"
full_name: str # e.g., "beautifulsoup.find_all"
server: str # Server name
description: str # Human-readable description
category: str # "parsing" | "browser" | "database" | ...
input_schema: Dict[str, Any] # JSON Schema for parameters
output_schema: Dict[str, Any] # JSON Schema for return value
examples: List[ToolExample] # Usage examples
cost: ToolCost # Time/resource cost estimate
requires_auth: bool # Needs API keys?
rate_limit: Optional[RateLimit] # Rate limiting info
```
**Example:**
```python
Tool(
name="find_all",
full_name="beautifulsoup.find_all",
server="beautifulsoup",
description="Find all HTML elements matching a CSS selector",
category="parsing",
input_schema={
"type": "object",
"properties": {
"html": {"type": "string", "description": "HTML content to search"},
"selector": {"type": "string", "description": "CSS selector"}
},
"required": ["html", "selector"]
},
output_schema={
"type": "array",
"items": {"type": "object"}
},
examples=[
ToolExample(
input={"html": "<div class='item'>A</div>", "selector": ".item"},
output=[{"tag": "div", "text": "A", "class": "item"}]
)
],
cost=ToolCost(time_ms=10, cpu_intensive=False),
requires_auth=False
)
```
### auto-tool-discovery-by-agent
The agent can query the registry to find relevant tools:
```python
# Agent needs to parse HTML
available_tools = tool_registry.search_tools(
query="parse HTML and extract elements by CSS selector",
category="parsing"
)
# Top result: beautifulsoup.find_all
tool = available_tools[0]
# Agent calls the tool
action = Action(
action_type="MCP_TOOL_CALL",
tool_name=tool.full_name,
tool_params={
"html": observation.page_html,
"selector": "div.product-price"
}
)
```
---
## html-processing-mcps
### beautifulsoup-mcp-detailed
**Installation:**
```bash
pip install mcp-beautifulsoup
```
**Tools:**
#### 1-find-all-html-selector-limit-none
Find all elements matching CSS selector.
```python
result = mcp.call("beautifulsoup.find_all", {
"html": "<div class='price'>$10</div><div class='price'>$20</div>",
"selector": "div.price"
})
# Returns: [{"text": "$10"}, {"text": "$20"}]
```
#### 2-find-one-html-selector
Find first matching element.
```python
result = mcp.call("beautifulsoup.find_one", {
"html": obs.page_html,
"selector": "h1.product-title"
})
# Returns: {"text": "Widget Pro", "tag": "h1"}
```
#### 3-extract-tables-html
Parse all `<table>` elements into structured data.
```python
result = mcp.call("beautifulsoup.extract_tables", {"html": obs.page_html})
# Returns:
[
{
"headers": ["Product", "Price", "Stock"],
"rows": [
["Widget", "$49.99", "In Stock"],
["Gadget", "$39.99", "Out of Stock"]
]
}
]
```
#### 4-extract-links-html-base-url-none
Extract all links from page.
```python
result = mcp.call("beautifulsoup.extract_links", {
"html": obs.page_html,
"base_url": "https://example.com"
})
# Returns:
[
{"url": "https://example.com/product/123", "text": "View Product"},
{"url": "https://example.com/category/widgets", "text": "Widgets"}
]
```
#### 5-clean-html-html-remove-script-style-noscript
Remove unwanted elements.
```python
result = mcp.call("beautifulsoup.clean_html", {
"html": obs.page_html,
"remove": ["script", "style", "footer", "nav"]
})
# Returns: Clean HTML without ads, scripts, navigation
```
#### 6-smart-extract-html-field-name
Intelligent extraction based on field name.
```python
# Agent wants to extract "price"
result = mcp.call("beautifulsoup.smart_extract", {
"html": obs.page_html,
"field_name": "price"
})
# MCP searches for:
# - Elements with class/id containing "price"
# - Text matching price patterns ($X.XX, €X,XX)
# - Schema.org markup (itemprop="price")
# Returns: {"value": "$49.99", "confidence": 0.92, "selector": "span.product-price"}
```
### batch-processing-for-long-content
When HTML is too large (> 100KB), process in batches:
```python
from bs4 import BeautifulSoup

class HTMLBatchProcessor:
def __init__(self, mcp_client, chunk_size: int = 50000):
self.mcp = mcp_client
self.chunk_size = chunk_size
def process_large_html(self, html: str, selector: str) -> List[Dict]:
"""Process large HTML in chunks."""
# Split HTML into meaningful chunks (by sections, not mid-tag)
chunks = self.split_html_intelligently(html)
results = []
for i, chunk in enumerate(chunks):
# Process each chunk
chunk_results = self.mcp.call("beautifulsoup.find_all", {
"html": chunk,
"selector": selector
})
# Deduplicate across chunk boundaries
results.extend(self.deduplicate(chunk_results, results))
return results
def split_html_intelligently(self, html: str) -> List[str]:
"""Split HTML at section boundaries, not mid-tag."""
soup = BeautifulSoup(html, 'lxml')
# Split by major sections (article, section, div.container, etc.)
sections = soup.find_all(['article', 'section', 'main'])
chunks = []
current_chunk = ""
for section in sections:
section_html = str(section)
if len(current_chunk) + len(section_html) > self.chunk_size:
chunks.append(current_chunk)
current_chunk = section_html
else:
current_chunk += section_html
if current_chunk:
chunks.append(current_chunk)
return chunks
```
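The `deduplicate` step above is also left abstract. A minimal sketch that drops results already seen in earlier chunks, keyed by a stable hash of each result dict (an assumption about what counts as a duplicate here):

```python
import hashlib

def deduplicate(new_results: list, seen_results: list) -> list:
    """Return the items of new_results not already present in seen_results."""
    def key(r: dict) -> str:
        # Stable fingerprint: sorted items so key order doesn't matter
        return hashlib.sha256(repr(sorted(r.items())).encode()).hexdigest()

    seen = {key(r) for r in seen_results}
    unique = []
    for r in new_results:
        k = key(r)
        if k not in seen:
            seen.add(k)
            unique.append(r)
    return unique

chunk1 = [{"text": "$10"}, {"text": "$20"}]
chunk2 = [{"text": "$20"}, {"text": "$30"}]  # "$20" straddles the chunk boundary
merged = chunk1 + deduplicate(chunk2, chunk1)
```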
---
## lazy-loading-system
MCP servers are **NOT downloaded by default**. They are installed on-demand when first used.
### download-on-demand-flow
```
Agent wants to use a tool
β”‚
β–Ό
Is MCP server installed?
β”‚
β”Œβ”€β”€β”€β”€β”΄β”€β”€β”€β”€β”
No Yes
β”‚ β”‚
β–Ό β–Ό
Show dialog Execute tool
"Download
server X?"
β”‚
β”Œβ”€β”€β”€β”΄β”€β”€β”€β”
No Yes
β”‚ β”‚
Skip Download & Install
β”‚
β–Ό
Cache for future use
β”‚
β–Ό
Execute tool
```
### implementation
```python
class LazyMCPLoader:
def __init__(self):
self.installed_servers: Set[str] = set()
self.download_queue: Queue[str] = Queue()
def ensure_server(self, server_name: str, config: MCPServerConfig) -> bool:
"""Ensure MCP server is installed, download if needed."""
if server_name in self.installed_servers:
return True
if not config.autoDownload:
# Prompt user
if not self.prompt_user_download(server_name):
return False
# Download and install
return self.download_server(server_name, config)
def download_server(self, server_name: str, config: MCPServerConfig) -> bool:
"""Download and install MCP server."""
try:
logger.info(f"Downloading MCP server: {server_name}")
            if config.command == "npx":
                # NPM package: the first non-flag argument names the package
                package = next(a for a in config.args if not a.startswith("-"))
                subprocess.run(["npm", "install", "-g", package], check=True)
            elif config.command == "python":
                # Python module launched via "-m <module>"; assume the pip
                # package shares the module's name
                package_name = config.args[config.args.index("-m") + 1]
                subprocess.run(["pip", "install", package_name], check=True)
self.installed_servers.add(server_name)
            logger.info(f"Installed {server_name}")
return True
except Exception as e:
logger.error(f"Failed to install {server_name}: {e}")
return False
def prompt_user_download(self, server_name: str) -> bool:
"""Ask user if they want to download the server."""
# In UI, show dialog:
# "Tool X requires MCP server Y. Download and install? (50MB) [Yes] [No]"
return self.show_download_dialog(server_name)
```
### ui-dialog
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ MCP Server Required β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ β”‚
β”‚ The tool "beautifulsoup.find_all" requires the MCP β”‚
β”‚ server "beautifulsoup" which is not installed. β”‚
β”‚ β”‚
β”‚ Package: mcp-beautifulsoup β”‚
β”‚ Size: ~5 MB β”‚
β”‚ β”‚
β”‚ Would you like to download and install it now? β”‚
β”‚ β”‚
β”‚ [Download & Install] [Skip] β”‚
β”‚ β”‚
β”‚ Remember my choice for this server β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## mcp-composition
Combine multiple MCP tools to create powerful workflows.
### example-1-parse-html-extract-tables-save-to-database
```python
# Step 1: Clean HTML
cleaned = mcp.call("beautifulsoup.clean_html", {
"html": observation.page_html
})
# Step 2: Extract tables
tables = mcp.call("beautifulsoup.extract_tables", {
"html": cleaned["html"]
})
# Step 3: Save to PostgreSQL
for table in tables:
mcp.call("postgresql.execute", {
"sql": "INSERT INTO scraped_data (data) VALUES (%s)",
"params": [json.dumps(table)]
})
```
### example-2-search-google-navigate-parse-article-summarize
```python
# Step 1: Search
results = mcp.call("google-search.search", {
"query": "best widgets 2026",
"num": 5
})
# Step 2: Navigate to top result
mcp.call("playwright.navigate", {
"url": results[0]["url"]
})
# Step 3: Extract article
article = mcp.call("readability.extract_article", {
"html": mcp.call("playwright.get_html", {})
})
# Step 4: Summarize
summary = mcp.call("llm.summarize", {
"text": article["text"],
"max_length": 200
})
```
### composition-dsl
Define reusable workflows:
```python
class MCPWorkflow:
def __init__(self, name: str, steps: List[WorkflowStep]):
self.name = name
self.steps = steps
async def execute(self, initial_input: Dict) -> Dict:
"""Execute workflow steps sequentially."""
context = initial_input
for step in self.steps:
result = await mcp.call(step.tool, step.params(context))
context[step.output_var] = result
return context
# Define workflow
extract_and_save = MCPWorkflow(
name="extract_and_save",
steps=[
WorkflowStep(
tool="beautifulsoup.find_all",
params=lambda ctx: {"html": ctx["html"], "selector": ctx["selector"]},
output_var="extracted"
),
WorkflowStep(
tool="cerberus.validate",
params=lambda ctx: {"data": ctx["extracted"], "schema": ctx["schema"]},
output_var="validated"
),
WorkflowStep(
tool="postgresql.execute",
params=lambda ctx: {"sql": "INSERT INTO items ...", "params": ctx["validated"]},
output_var="saved"
)
]
)
# Execute
result = await extract_and_save.execute({
"html": obs.page_html,
"selector": "div.product",
"schema": PRODUCT_SCHEMA
})
```
---
## testing-panel
Test MCP tools manually before using them in agent workflows.
### ui
```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ MCP Testing Panel β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚ β”‚
β”‚ Server: [beautifulsoup β–Ό] β”‚
β”‚ Tool: [find_all β–Ό] β”‚
β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Input Parameters: β”‚ β”‚
β”‚ β”‚ β”‚ β”‚
β”‚ β”‚ html: β”‚ β”‚
β”‚ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ β”‚
β”‚ β”‚ β”‚ <div class="item">Item 1</div> β”‚ β”‚ β”‚
β”‚ β”‚ β”‚ <div class="item">Item 2</div> β”‚ β”‚ β”‚
β”‚ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”‚
β”‚ β”‚ β”‚ β”‚
β”‚ β”‚ selector: [div.item ] β”‚ β”‚
β”‚ β”‚ β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚
β”‚ [Execute Tool] [Clear] β”‚
β”‚ β”‚
β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚ β”‚ Output: β”‚ β”‚
β”‚ β”‚ β”‚ β”‚
β”‚ β”‚ [ β”‚ β”‚
β”‚ β”‚ {"tag": "div", "class": "item", "text": "Item 1"}, β”‚ β”‚
β”‚ β”‚ {"tag": "div", "class": "item", "text": "Item 2"} β”‚ β”‚
β”‚ β”‚ ] β”‚ β”‚
β”‚ β”‚ β”‚ β”‚
β”‚ β”‚ Execution time: 12ms β”‚ β”‚
β”‚ β”‚ Status: Success β”‚ β”‚
β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
β”‚ β”‚
β”‚ [Save as Example] β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
---
## configuration
### full-mcp-configuration-example
```json
{
"mcpServers": {
"beautifulsoup": {
"command": "python",
"args": ["-m", "mcp_beautifulsoup"],
"enabled": true,
"autoDownload": true,
"config": {
"default_parser": "lxml"
}
},
"playwright": {
"command": "npx",
"args": ["@playwright/mcp-server"],
"enabled": false,
"autoDownload": false,
"config": {
"browser": "chromium",
"headless": true
}
},
"postgresql": {
"command": "python",
"args": ["-m", "mcp_postgresql"],
"enabled": false,
"autoDownload": false,
"config": {
"host": "localhost",
"port": 5432,
"database": "scraper_db",
"user": "postgres",
"password": "${PG_PASSWORD}"
}
},
"google-search": {
"command": "python",
"args": ["-m", "mcp_google_search"],
"enabled": true,
"autoDownload": true,
"config": {
"api_key": "${GOOGLE_API_KEY}",
"search_engine_id": "${GOOGLE_SE_ID}"
}
},
"filesystem": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-filesystem", "./scraped_data"],
"enabled": true,
"autoDownload": true
}
},
"mcpSettings": {
"autoDiscoverTools": true,
"toolTimeout": 30,
"maxConcurrentCalls": 5,
"retryFailedCalls": true,
"cacheToolResults": true,
"cacheTTL": 3600
}
}
```
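The `${PG_PASSWORD}`-style placeholders above imply environment-variable substitution at config-load time. A minimal sketch of such a loader (the real loader's behavior is an assumption; unset variables expand to an empty string here):

```python
import os
import re

def expand_env(value):
    """Recursively replace ${VAR} placeholders in strings with env values."""
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    if isinstance(value, str):
        return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), value)
    return value  # numbers, booleans, None pass through untouched

os.environ["PG_PASSWORD"] = "s3cret"
config = {"postgresql": {"config": {"password": "${PG_PASSWORD}", "port": 5432}}}
expanded = expand_env(config)
```

Keeping secrets in the environment rather than in the JSON file means the config can be committed without leaking credentials.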
---
**Next:** See [settings.md](./settings.md) for complete dashboard settings.
## related-api-reference
| item | value |
| --- | --- |
| api-reference | `api-reference.md` |
## document-metadata
| key | value |
| --- | --- |
| document | `mcp.md` |
| status | active |