---
title: Web Search MCP
emoji: 🔎
colorFrom: red
colorTo: green
sdk: gradio
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search and extract web content for LLM ingestion
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/5f17f0a0925b9863e28ad517/tfYtTMw9FgiWdyyIYz6A6.png
---

# Web Search MCP Server

A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from web pages and news articles.

## Features

- **Dual search modes**:
  - **General Search**: Get diverse results from blogs, documentation, articles, and more
  - **News Search**: Find fresh news articles and breaking stories from news sources
- **Real-time web search**: Search for any topic with up-to-date results
- **Content extraction**: Automatically extracts main article content, removing ads and boilerplate
- **Rate limiting**: Built-in rate limiting (200 requests/hour) to prevent API abuse
- **Structured output**: Returns formatted content with metadata (title, source, date, URL)
- **Flexible results**: Control the number of results (1-20)

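The 200 requests/hour limit can be pictured as a sliding window over request timestamps. The sketch below is a stdlib-only illustration, not the server's actual implementation (the dependency list suggests the `limits` package handles this in practice). The class name `SlidingWindowLimiter` and its `allow` method are hypothetical.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls within any `window_seconds` span."""

    def __init__(self, max_requests=200, window_seconds=3600.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._timestamps = deque()

    def allow(self):
        """Record a request and return True, or return False if the limit is hit."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.max_requests:
            return False
        self._timestamps.append(now)
        return True
```

A caller would check `limiter.allow()` before each search and return a rate-limit message when it is `False`.
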
## Prerequisites

1. **Serper API Key**: Sign up at [serper.dev](https://serper.dev) to get your API key
2. **Python 3.10+**: Gradio 5.x, which this project uses, requires Python 3.10 or newer
3. **MCP-compatible LLM client**: Such as Claude Desktop, Cursor, or any MCP-enabled application

## Installation

1. Clone or download this repository
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
   Or install manually:
   ```bash
   pip install "gradio[mcp]" httpx trafilatura python-dateutil limits
   ```

3. Set your Serper API key:
   ```bash
   export SERPER_API_KEY="your-api-key-here"
   ```

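To confirm the key is visible to Python before starting the server, a small check like the following can help. This is a minimal sketch: the helper name `get_serper_api_key` and the error wording are hypothetical, and only the `SERPER_API_KEY` variable name comes from this README.

```python
import os


def get_serper_api_key(env=None):
    """Return the Serper API key, or raise with setup instructions if missing."""
    env = os.environ if env is None else env
    key = env.get("SERPER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "SERPER_API_KEY is not set. Get a key at https://serper.dev and run: "
            'export SERPER_API_KEY="your-api-key-here"'
        )
    return key
```

Calling `get_serper_api_key()` with no arguments reads the real environment; passing a dict makes the check easy to test.
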
## Usage

### Starting the MCP Server

```bash
python app_mcp.py
```

The server will start on `http://localhost:7860` with the MCP endpoint at:
```
http://localhost:7860/gradio_api/mcp/sse
```

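If you script against the server, it can be convenient to build the endpoint URL and check that the app is answering before pointing a client at it. The sketch below uses only the standard library. The helper names `mcp_sse_url` and `server_is_up` are hypothetical, and the check assumes the Gradio root page responds with HTTP 200 once the app is running.

```python
import urllib.error
import urllib.request


def mcp_sse_url(host="localhost", port=7860):
    """Build the MCP SSE endpoint URL for a given host and port."""
    return f"http://{host}:{port}/gradio_api/mcp/sse"


def server_is_up(host="localhost", port=7860, timeout=3.0):
    """Return True if the Gradio root page answers, False on any network error."""
    try:
        with urllib.request.urlopen(f"http://{host}:{port}/", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False
```
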
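### Connecting to LLM Clients

#### Claude Desktop
Add the entry below to your `claude_desktop_config.json`. It launches the Gradio app as a subprocess, which serves MCP over SSE rather than stdio, so if Claude Desktop cannot connect this way, start the server yourself and bridge to the SSE URL with a tool such as `mcp-remote`.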
```json
{
  "mcpServers": {
    "web-search": {
      "command": "python",
      "args": ["/path/to/app_mcp.py"],
      "env": {
        "SERPER_API_KEY": "your-api-key-here"
      }
    }
  }
}
```

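If your config file already lists other MCP servers, merging the entry programmatically avoids overwriting them. The following is a sketch under that assumption. The function name `add_web_search_server` is hypothetical, and the entry it writes mirrors the JSON in this section.

```python
import json


def add_web_search_server(config, script_path, api_key):
    """Return a copy of `config` with a "web-search" MCP server entry added."""
    updated = dict(config)
    servers = dict(updated.get("mcpServers", {}))
    servers["web-search"] = {
        "command": "python",
        "args": [script_path],
        "env": {"SERPER_API_KEY": api_key},
    }
    updated["mcpServers"] = servers
    return updated


# Example: merge into an empty config and print the resulting JSON.
print(json.dumps(add_web_search_server({}, "/path/to/app_mcp.py", "your-api-key-here"), indent=2))
```

Load the existing file with `json.load`, pass the resulting dict in, and write the return value back with `json.dump`.
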
#### Direct URL Connection
For clients that support URL-based MCP servers:
1. Start the server: `python app_mcp.py`
2. Connect to: `http://localhost:7860/gradio_api/mcp/sse`

## Tool Documentation

### `search_web` Function

**Purpose**: Search the web for information or fresh news and extract content.

**Parameters**:
- `query` (str, **REQUIRED**): The search query
  - Examples: "OpenAI news", "climate change 2024", "python tutorial"

- `num_results` (int, **OPTIONAL**): Number of results to fetch
  - Default: 4
  - Range: 1-20
  - More results provide more context but take longer

- `search_type` (str, **OPTIONAL**): Type of search to perform
  - Default: "search" (general web search)
  - Options: "search" or "news"
  - Use "news" for fresh, time-sensitive news articles
  - Use "search" for general information, documentation, tutorials

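The parameter rules above can be expressed as a small validation step. This is an illustrative sketch, not the server's code. The helper `normalize_search_args` is hypothetical, and clamping out-of-range counts (rather than rejecting them) is a design choice of this example.

```python
def normalize_search_args(query, num_results=4, search_type="search"):
    """Validate arguments against the documented defaults and ranges."""
    if not isinstance(query, str) or not query.strip():
        raise ValueError("query is required and must be a non-empty string")
    if search_type not in ("search", "news"):
        raise ValueError('search_type must be "search" or "news"')
    # Clamp rather than reject out-of-range counts (a choice made for this sketch).
    num_results = max(1, min(20, int(num_results)))
    return query.strip(), num_results, search_type
```
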
**Returns**: Formatted text containing:
- Summary of extraction results
- For each article:
  - Title
  - Source and date
  - URL
  - Extracted main content

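To make the shape of the return value concrete, here is one way such text could be assembled. The exact layout the server produces is not specified in this README, so the headings, separators, and dict keys below are assumptions, and `format_results` is a hypothetical helper.

```python
def format_results(articles):
    """Render extracted articles as text; `articles` is a list of dicts."""
    lines = [f"Successfully extracted content from {len(articles)} article(s).", ""]
    for article in articles:
        lines.append(f"## {article['title']}")
        lines.append(f"Source: {article['source']} | Date: {article['date']}")
        lines.append(f"URL: {article['url']}")
        lines.append("")
        lines.append(article["content"])
        lines.append("")
    # Trim trailing blank lines and end with a single newline.
    return "\n".join(lines).rstrip() + "\n"
```
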
**When to use each search type**:
- **Use "news" mode for**:
  - Breaking news or very recent events
  - Time-sensitive information ("today", "this week")
  - Current affairs and latest developments
  - Press releases and announcements

- **Use "search" mode for**:
  - General information and research
  - Technical documentation or tutorials
  - Historical information
  - Diverse perspectives from various sources
  - How-to guides and explanations

**Example Usage in LLM**:
```
# News mode examples
"Search for breaking news about OpenAI" -> uses news mode
"Find today's stock market updates" -> uses news mode
"Get latest climate change developments" -> uses news mode

# Search mode examples (default)
"Search for Python programming tutorials" -> uses search mode
"Find information about machine learning algorithms" -> uses search mode
"Research historical data about climate change" -> uses search mode
```

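In practice the LLM decides which mode to use, but the guidance above can be approximated with a keyword heuristic. The sketch below is only an illustration of that guidance. The cue list `NEWS_CUES` and the function `choose_search_type` are hypothetical and not part of the server.

```python
import re

# Hypothetical cue words drawn from the guidance above; tune for your own use.
NEWS_CUES = ("breaking", "latest", "today", "this week", "news", "announcement", "press release")


def choose_search_type(request):
    """Return "news" if the request contains a time-sensitive cue, else "search"."""
    text = request.lower()
    for cue in NEWS_CUES:
        # Match whole words or phrases so "newsletter" does not count as "news".
        if re.search(rf"\b{re.escape(cue)}\b", text):
            return "news"
    return "search"
```
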
## Error Handling

The tool handles the following error scenarios:
- Missing API key: Returns a clear error message with setup instructions
- Rate limiting: Reports when the 200 requests/hour limit is exceeded
- Failed extractions: Reports which articles could not be extracted
- Network errors: Fails gracefully with an error message

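As an illustration of the "failed extractions" case, the sketch below separates successful and failed extractions and builds a one-line report. It assumes extraction yields `None` or an empty string on failure. The function `summarize_extractions` and the report wording are hypothetical.

```python
def summarize_extractions(results):
    """Split (url, content) pairs into successes and failures and describe both.

    `content` is None (or empty) when extraction failed for that URL.
    """
    extracted = [(url, text) for url, text in results if text]
    failed = [url for url, text in results if not text]
    summary = f"Extracted {len(extracted)} of {len(results)} article(s)."
    if failed:
        summary += " Could not extract: " + ", ".join(failed)
    return summary, extracted, failed
```
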
## Testing

You can test the server manually:
1. Open `http://localhost:7860` in your browser
2. Enter a search query
3. Adjust the number of results
4. Click "Search" to see the extracted content

## Tips for LLM Usage

1. **Choose the right search type**: Use "news" for fresh, breaking news; use "search" for general information
2. **Be specific with queries**: More specific queries yield better results
3. **Adjust result count**: Use fewer results for quick searches, more for comprehensive research
4. **Check dates**: The tool shows article dates for temporal context
5. **Follow up**: Use the extracted content to ask follow-up questions

## Limitations

- Rate limited to 200 requests per hour
- Extraction quality depends on website structure
- Some websites may block automated access
- News mode focuses on recent articles from news sources
- Search mode provides diverse results but may include older content

## Troubleshooting

1. **"SERPER_API_KEY is not set"**: Ensure the environment variable is exported in the same shell (or set in the client config's `env` block) that launches the server
2. **Rate limit errors**: The server allows 200 requests per hour; wait before making more requests
3. **No content extracted**: Some websites block scrapers; try a different query
4. **Connection errors**: Check your internet connection and firewall settings