victor HF Staff committed on
Commit 58d88bf · 1 Parent(s): d82748f

Refactor README and analytics for improved clarity and functionality; update app.py to enhance search and fetch tools with better error handling and analytics tracking.

Files changed (3)
  1. README.md +78 -150
  2. analytics.py +66 -65
  3. app.py +262 -248
README.md CHANGED
@@ -7,176 +7,104 @@ sdk: gradio
  sdk_version: 5.36.2
  app_file: app.py
  pinned: false
- short_description: Search and extract web content for LLM ingestion
  thumbnail: >-
    https://cdn-uploads.huggingface.co/production/uploads/5f17f0a0925b9863e28ad517/tfYtTMw9FgiWdyyIYz6A6.png
  ---
 
- # Web Search MCP Server
- 
- A Model Context Protocol (MCP) server that provides web search capabilities to LLMs, allowing them to fetch and extract content from web pages and news articles.
- 
- ## Features
- 
- - **Dual search modes**:
-   - **General Search**: Get diverse results from blogs, documentation, articles, and more
-   - **News Search**: Find fresh news articles and breaking stories from news sources
- - **Real-time web search**: Search for any topic with up-to-date results
- - **Content extraction**: Automatically extracts main article content, removing ads and boilerplate
- - **Rate limiting**: Built-in rate limiting (200 requests/hour) to prevent API abuse
- - **Structured output**: Returns formatted content with metadata (title, source, date, URL)
- - **Flexible results**: Control the number of results (1-20)
 
- ## Prerequisites
- 
- 1. **Serper API Key**: Sign up at [serper.dev](https://serper.dev) to get your API key
- 2. **Python 3.8+**: Ensure you have Python installed
- 3. **MCP-compatible LLM client**: Such as Claude Desktop, Cursor, or any MCP-enabled application
- 
- ## Installation
 
- 1. Clone or download this repository
- 2. Install dependencies:
- ```bash
- pip install -r requirements.txt
- ```
- Or install manually:
  ```bash
- pip install "gradio[mcp]" httpx trafilatura python-dateutil limits
  ```
- 
- 3. Set your Serper API key:
  ```bash
- export SERPER_API_KEY="your-api-key-here"
  ```
 
- ## Usage
- 
- ### Starting the MCP Server
 
  ```bash
- python app_mcp.py
- ```
- 
- The server will start on `http://localhost:7860` with the MCP endpoint at:
- ```
- http://localhost:7860/gradio_api/mcp/sse
  ```
- 
- ### Connecting to LLM Clients
- 
- #### Claude Desktop
- Add to your `claude_desktop_config.json`:
- ```json
- {
-   "mcpServers": {
-     "web-search": {
-       "command": "python",
-       "args": ["/path/to/app_mcp.py"],
-       "env": {
-         "SERPER_API_KEY": "your-api-key-here"
        }
      }
    }
- }
- ```
- 
- #### Direct URL Connection
- For clients that support URL-based MCP servers:
- 1. Start the server: `python app_mcp.py`
- 2. Connect to: `http://localhost:7860/gradio_api/mcp/sse`
- 
- ## Tool Documentation
- 
- ### `search_web` Function
- 
- **Purpose**: Search the web for information or fresh news and extract content.
- 
- **Parameters**:
- - `query` (str, **REQUIRED**): The search query
-   - Examples: "OpenAI news", "climate change 2024", "python tutorial"
- 
- - `num_results` (int, **OPTIONAL**): Number of results to fetch
-   - Default: 4
-   - Range: 1-20
-   - More results provide more context but take longer
- 
- - `search_type` (str, **OPTIONAL**): Type of search to perform
-   - Default: "search" (general web search)
-   - Options: "search" or "news"
-   - Use "news" for fresh, time-sensitive news articles
-   - Use "search" for general information, documentation, tutorials
- 
- **Returns**: Formatted text containing:
- - Summary of extraction results
- - For each article:
-   - Title
-   - Source and date
-   - URL
-   - Extracted main content
- 
- **When to use each search type**:
- - **Use "news" mode for**:
-   - Breaking news or very recent events
-   - Time-sensitive information ("today", "this week")
-   - Current affairs and latest developments
-   - Press releases and announcements
- 
- - **Use "search" mode for**:
-   - General information and research
-   - Technical documentation or tutorials
-   - Historical information
-   - Diverse perspectives from various sources
-   - How-to guides and explanations
- 
- **Example Usage in LLM**:
- ```
- # News mode examples
- "Search for breaking news about OpenAI" -> uses news mode
- "Find today's stock market updates" -> uses news mode
- "Get latest climate change developments" -> uses news mode
- 
- # Search mode examples (default)
- "Search for Python programming tutorials" -> uses search mode
- "Find information about machine learning algorithms" -> uses search mode
- "Research historical data about climate change" -> uses search mode
- ```
- 
- ## Error Handling
- 
- The tool handles various error scenarios:
- - Missing API key: Clear error message with setup instructions
- - Rate limiting: Informs when limit is exceeded
- - Failed extractions: Reports which articles couldn't be extracted
- - Network errors: Graceful error messages
- 
- ## Testing
- 
- You can test the server manually:
- 1. Open `http://localhost:7860` in your browser
- 2. Enter a search query
- 3. Adjust the number of results
- 4. Click "Search" to see the extracted content
- 
- ## Tips for LLM Usage
- 
- 1. **Choose the right search type**: Use "news" for fresh, breaking news; use "search" for general information
- 2. **Be specific with queries**: More specific queries yield better results
- 3. **Adjust result count**: Use fewer results for quick searches, more for comprehensive research
- 4. **Check dates**: The tool shows article dates for temporal context
- 5. **Follow up**: Use the extracted content to ask follow-up questions
- 
- ## Limitations
- 
- - Rate limited to 200 requests per hour
- - Extraction quality depends on website structure
- - Some websites may block automated access
- - News mode focuses on recent articles from news sources
- - Search mode provides diverse results but may include older content
 
  ## Troubleshooting
 
- 1. **"SERPER_API_KEY is not set"**: Ensure the environment variable is exported
- 2. **Rate limit errors**: Wait before making more requests
- 3. **No content extracted**: Some websites block scrapers; try different queries
- 4. **Connection errors**: Check your internet connection and firewall settings
  sdk_version: 5.36.2
  app_file: app.py
  pinned: false
+ short_description: Search & fetch the web with per-tool analytics
  thumbnail: >-
    https://cdn-uploads.huggingface.co/production/uploads/5f17f0a0925b9863e28ad517/tfYtTMw9FgiWdyyIYz6A6.png
  ---
 
+ # Web MCP Server
+ 
+ A Model Context Protocol (MCP) server that exposes two composable tools, `search` (Serper metadata) and `fetch` (single-page extraction), alongside a live analytics dashboard that tracks daily usage for each tool. The UI runs on Gradio and can be reached directly or via MCP-compatible clients like Claude Desktop and Cursor.
+ 
+ ## Highlights
+ - Dual MCP tools with shared rate limiting (`360 requests/hour`) and structured JSON responses.
+ - Daily analytics split by tool: the **Analytics** tab renders "Daily Search" (left) and "Daily Fetch" (right) bar charts covering the last 14 days.
+ - Persistent request counters keyed by UTC date and tool: `{"YYYY-MM-DD": {"search": n, "fetch": m}}`, with automatic migration from legacy totals.
+ - Pluggable storage: respects `ANALYTICS_DATA_DIR`, otherwise falls back to `/data` (if writable) or `./data` for local development.
+ - Ready-to-serve Gradio app with MCP endpoints exposed via `gr.api` for direct client consumption.
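The per-tool counter schema and the legacy migration mentioned above can be seen in miniature below. The function mirrors the `_normalize_counts_schema` helper added to `analytics.py` in this commit; the sample dates and counts are made up.

```python
def normalize_counts_schema(data: dict) -> dict:
    """Coerce legacy {date: int} totals into {date: {"search": n, "fetch": m}}."""
    normalized = {}
    for day, value in data.items():
        if isinstance(value, dict):
            normalized[day] = {
                "search": int(value.get("search", 0)),
                "fetch": int(value.get("fetch", 0)),
            }
        else:
            # Legacy schema stored one total per day; attribute it to "search".
            normalized[day] = {"search": int(value or 0), "fetch": 0}
    return normalized

mixed = {"2024-01-01": 7, "2024-01-02": {"search": 3, "fetch": 5}}
print(normalize_counts_schema(mixed))
# {'2024-01-01': {'search': 7, 'fetch': 0}, '2024-01-02': {'search': 3, 'fetch': 5}}
```

Because normalization runs before every write, a pre-existing `request_counts.json` in the old format keeps working without manual migration.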
 
+ ## Requirements
+ - Python 3.8 or newer.
+ - Serper API key (`SERPER_API_KEY`) with access to the Search and News endpoints.
+ - Dependencies listed in `requirements.txt`, including `filelock` and `pandas` for analytics storage.
+ 
+ Install everything with:
+ ```bash
+ pip install -r requirements.txt
+ ```
+ 
+ ## Configuration
+ 1. Export your Serper API key:
  ```bash
+ export SERPER_API_KEY="your-api-key"
  ```
+ 2. (Optional) Override the analytics storage path:
  ```bash
+ export ANALYTICS_DATA_DIR="/path/to/persistent/storage"
  ```
+ If unset, the app automatically prefers `/data` when available, otherwise `./data`.
+ 
+ The request counters live in `<DATA_DIR>/request_counts.json`, guarded by a file lock to support concurrent MCP calls.
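The directory fallback order described above can be sketched as follows. This is a simplified stand-in for the selection logic in `analytics.py` (the exact writability checks there may differ):

```python
import os

def resolve_data_dir() -> str:
    """Pick the analytics directory: env override, then /data if writable, else ./data."""
    override = os.getenv("ANALYTICS_DATA_DIR")
    if override:
        return override
    if os.path.isdir("/data") and os.access("/data", os.W_OK):
        return "/data"
    return "./data"

data_dir = resolve_data_dir()
counts_file = os.path.join(data_dir, "request_counts.json")
```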
 
 
 
+ ## Running Locally
+ Launch the Gradio server (with MCP support enabled) via:
  ```bash
+ python app.py
  ```
+ This starts a local UI at `http://localhost:7860` and exposes the MCP SSE endpoint at `http://localhost:7860/gradio_api/mcp/sse`.
+ 
+ ### Connecting From MCP Clients
+ - **Claude Desktop** – update `claude_desktop_config.json`:
+ ```json
+ {
+   "mcpServers": {
+     "web-search": {
+       "command": "python",
+       "args": ["/absolute/path/to/app.py"],
+       "env": {
+         "SERPER_API_KEY": "your-api-key"
+       }
      }
    }
  }
+ ```
+ - **URL-based MCP clients** – run `python app.py`, then point the client to `http://localhost:7860/gradio_api/mcp/sse`.
+ 
+ ## Tool Reference
+ ### `search`
+ - **Purpose**: Retrieve metadata-only results from Serper (general web or news).
+ - **Inputs**:
+   - `query` *(str, required)* – search terms.
+   - `search_type` *("search" | "news", default "search")* – switch to `news` for recency-focused results.
+   - `num_results` *(int, default 4, range 1–20)* – number of hits to return.
+ - **Output**: JSON containing the query echo, result count, timing, and an array of entries with `position`, `title`, `link`, `domain`, and optional `source`/`date` for news.
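To make the shape concrete, here is a hypothetical `search` response. The per-entry fields follow the list above; the top-level key names (`query`, `result_count`, `duration_s`) and every value are illustrative only, since the exact keys are defined in `app.py`:

```python
# Hypothetical `search` response; values are invented for illustration.
search_response = {
    "query": "python packaging",      # query echo
    "result_count": 2,
    "duration_s": 0.42,               # timing
    "results": [
        {
            "position": 1,
            "title": "An example general result",
            "link": "https://example.com/guide",
            "domain": "example.com",
        },
        {
            "position": 2,
            "title": "An example news result",
            "link": "https://news.example.com/story",
            "domain": "news.example.com",
            "source": "Example News",  # only present for news results
            "date": "2024-05-01",      # only present for news results
        },
    ],
}
```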
+ 
+ ### `fetch`
+ - **Purpose**: Download a single URL and extract the readable article text via Trafilatura.
+ - **Inputs**:
+   - `url` *(str, required)* – must start with `http://` or `https://`.
+   - `timeout` *(int, default 20 seconds)* – client timeout for the HTTP request.
+ - **Output**: JSON with the original and final URL, domain, HTTP status, title, ISO timestamp of the fetch, word count, cleaned `content`, and duration.
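The `domain` field comes from a small helper along these lines, matching the `_domain_from_url` helper this commit adds to `app.py`:

```python
from urllib.parse import urlsplit

def domain_from_url(url: str) -> str:
    """Extract the hostname from a URL and drop a leading "www." prefix."""
    try:
        netloc = urlsplit(url).netloc
        return netloc.replace("www.", "")
    except Exception:
        return ""

print(domain_from_url("https://www.example.com/article"))  # example.com
```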
+ 
+ Both tools increment their respective analytics buckets on every invocation, including validation failures and rate-limit denials, ensuring the dashboard mirrors real traffic.
+ 
+ ## Analytics Dashboard
+ Open the **Analytics** tab in the Gradio UI to inspect daily activity:
+ - **Daily Search Count** (left column) – bar chart for the past 14 days of `search` tool requests.
+ - **Daily Fetch Count** (right column) – bar chart for the past 14 days of `fetch` tool requests.
+ - Tooltips reveal the display label (e.g., `Sep 17`), raw count, and ISO date key.
+ 
+ Data is stored in JSON and can be safely externalized for long-term tracking. Existing totals in the legacy integer-only format are automatically migrated during the first write.
+ 
+ ## Rate Limiting & Error Handling
+ - Global moving-window limit of `360` requests per hour shared across both tools (powered by `limits`).
+ - Standardized error payloads for missing parameters, invalid URLs, Serper issues, HTTP failures, and rate-limit hits, each preserving analytics increments.
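The app enforces this limit with the `limits` package's async `MovingWindowRateLimiter`. As a rough illustration of the moving-window idea only (a toy stand-in, not the actual implementation), a window keeps timestamps of recent hits and rejects a new hit once the window is full:

```python
import time
from collections import deque
from typing import Optional

class MovingWindowLimiter:
    """Toy moving-window limiter: allow at most `limit` hits per `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits: deque = deque()

    def hit(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop hits that have aged out of the window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            return False
        self.hits.append(now)
        return True

limiter = MovingWindowLimiter(limit=3, window=3600.0)
print([limiter.hit(now=t) for t in (0, 1, 2, 3)])  # [True, True, True, False]
```

Unlike a fixed-window counter, the moving window never admits more than `limit` requests in any sliding interval of `window` seconds.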
 
  ## Troubleshooting
+ - **`SERPER_API_KEY is not set`** – export the key in the environment where the server runs.
+ - **`Rate limit exceeded`** – pause requests or reduce client concurrency.
+ - **Empty extraction** – some sites block bots; try another URL.
+ - **Storage permissions** – ensure the chosen data directory is writable; adjust `ANALYTICS_DATA_DIR` if necessary.
+ 
+ ## Licensing & Contributions
+ Feel free to fork and adapt for your MCP workflows. Contributions are welcome: open a PR or issue with proposed analytics enhancements, additional tooling, or documentation tweaks.
 
analytics.py CHANGED
@@ -2,8 +2,8 @@
  import os
  import json
  from datetime import datetime, timedelta, timezone
- from filelock import FileLock  # pip install filelock
- import pandas as pd  # already available in HF images
 
  # Determine data directory based on environment
  # 1. Check for environment variable override
@@ -21,84 +21,85 @@ if not DATA_DIR:
  os.makedirs(DATA_DIR, exist_ok=True)
 
  COUNTS_FILE = os.path.join(DATA_DIR, "request_counts.json")
- TIMES_FILE = os.path.join(DATA_DIR, "request_times.json")
- LOCK_FILE = os.path.join(DATA_DIR, "analytics.lock")
 
- def _load() -> dict:
      if not os.path.exists(COUNTS_FILE):
          return {}
      with open(COUNTS_FILE) as f:
-         return json.load(f)
 
- def _save(data: dict):
      with open(COUNTS_FILE, "w") as f:
          json.dump(data, f)
 
- def _load_times() -> dict:
-     if not os.path.exists(TIMES_FILE):
-         return {}
-     with open(TIMES_FILE) as f:
-         return json.load(f)
 
- def _save_times(data: dict):
-     with open(TIMES_FILE, "w") as f:
-         json.dump(data, f)
 
- async def record_request(duration: float = None, num_results: int = None) -> None:
-     """Increment today's counter (UTC) atomically and optionally record request duration."""
      today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
      with FileLock(LOCK_FILE):
-         # Update counts
-         data = _load()
-         data[today] = data.get(today, 0) + 1
-         _save(data)
- 
-         # Only record times for default requests (num_results=4)
-         if duration is not None and (num_results is None or num_results == 4):
-             times = _load_times()
-             if today not in times:
-                 times[today] = []
-             times[today].append(round(duration, 2))
-             _save_times(times)
- 
- def last_n_days_df(n: int = 30) -> pd.DataFrame:
-     """Return a DataFrame with a row for each of the past *n* days."""
-     now = datetime.now(timezone.utc)
-     with FileLock(LOCK_FILE):
-         data = _load()
-     records = []
-     for i in range(n):
-         day = (now - timedelta(days=n - 1 - i))
-         day_str = day.strftime("%Y-%m-%d")
-         # Format date for display (MMM DD)
-         display_date = day.strftime("%b %d")
-         records.append({
-             "date": display_date,
-             "count": data.get(day_str, 0),
-             "full_date": day_str  # Keep full date for tooltip
-         })
-     return pd.DataFrame(records)
 
- def last_n_days_avg_time_df(n: int = 30) -> pd.DataFrame:
-     """Return a DataFrame with average request time for each of the past *n* days."""
      now = datetime.now(timezone.utc)
      with FileLock(LOCK_FILE):
-         times = _load_times()
      records = []
      for i in range(n):
-         day = (now - timedelta(days=n - 1 - i))
-         day_str = day.strftime("%Y-%m-%d")
-         # Format date for display (MMM DD)
          display_date = day.strftime("%b %d")
- 
-         # Calculate average time for the day
-         day_times = times.get(day_str, [])
-         avg_time = round(sum(day_times) / len(day_times), 2) if day_times else 0
- 
-         records.append({
-             "date": display_date,
-             "avg_time": avg_time,
-             "request_count": len(day_times),
-             "full_date": day_str  # Keep full date for tooltip
-         })
-     return pd.DataFrame(records)
  import os
  import json
  from datetime import datetime, timedelta, timezone
+ from filelock import FileLock  # pip install filelock
+ import pandas as pd  # already available in HF images
 
  # Determine data directory based on environment
  # 1. Check for environment variable override
 
  os.makedirs(DATA_DIR, exist_ok=True)
 
  COUNTS_FILE = os.path.join(DATA_DIR, "request_counts.json")
+ LOCK_FILE = os.path.join(DATA_DIR, "analytics.lock")
 
+ # ──────────────────────────────────────────────────────────────────────────────
+ # Storage helpers
+ # ──────────────────────────────────────────────────────────────────────────────
+ def _load_counts() -> dict:
      if not os.path.exists(COUNTS_FILE):
          return {}
      with open(COUNTS_FILE) as f:
+         try:
+             return json.load(f)
+         except json.JSONDecodeError:
+             return {}
+ 
 
+ def _save_counts(data: dict):
      with open(COUNTS_FILE, "w") as f:
          json.dump(data, f)
 
+ def _normalize_counts_schema(data: dict) -> dict:
+     """
+     Ensure data is {date: {"search": int, "fetch": int}}.
+     Backward compatible with old schema {date: int}.
+     """
+     normalized = {}
+     for day, value in data.items():
+         if isinstance(value, dict):
+             normalized[day] = {
+                 "search": int(value.get("search", 0)),
+                 "fetch": int(value.get("fetch", 0)),
+             }
+         else:
+             # Old schema: total count as int → attribute to "search", keep fetch=0
+             normalized[day] = {"search": int(value or 0), "fetch": 0}
+     return normalized
+ 
+ 
+ # ──────────────────────────────────────────────────────────────────────────────
+ # Public API
+ # ──────────────────────────────────────────────────────────────────────────────
+ async def record_request(tool: str) -> None:
+     """Increment today's counter (UTC) for the given tool: 'search' or 'fetch'."""
+     tool = (tool or "").strip().lower()
+     if tool not in {"search", "fetch"}:
+         # Ignore unknown tool buckets to keep charts clean
+         tool = "search"
      today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
      with FileLock(LOCK_FILE):
+         data = _normalize_counts_schema(_load_counts())
+         if today not in data:
+             data[today] = {"search": 0, "fetch": 0}
+         data[today][tool] = int(data[today].get(tool, 0)) + 1
+         _save_counts(data)
+ 
+ 
+ def last_n_days_count_df(tool: str, n: int = 30) -> pd.DataFrame:
+     """Return DataFrame with a row for each of the past n days for the given tool."""
+     tool = (tool or "").strip().lower()
+     if tool not in {"search", "fetch"}:
+         tool = "search"
      now = datetime.now(timezone.utc)
      with FileLock(LOCK_FILE):
+         data = _normalize_counts_schema(_load_counts())
+ 
      records = []
      for i in range(n):
+         day = now - timedelta(days=n - 1 - i)
+         day_key = day.strftime("%Y-%m-%d")
          display_date = day.strftime("%b %d")
+         counts = data.get(day_key, {"search": 0, "fetch": 0})
+         records.append(
+             {
+                 "date": display_date,
+                 "count": int(counts.get(tool, 0)),
+                 "full_date": day_key,
+             }
+         )
+     return pd.DataFrame(records)
app.py CHANGED
@@ -1,8 +1,11 @@
  import os
- import asyncio
  import time
- from typing import Optional
- from datetime import datetime
  import httpx
  import trafilatura
  import gradio as gr
@@ -10,92 +13,91 @@ from dateutil import parser as dateparser
  from limits import parse
  from limits.aio.storage import MemoryStorage
  from limits.aio.strategies import MovingWindowRateLimiter
- from analytics import record_request, last_n_days_df, last_n_days_avg_time_df
 
  # Configuration
  SERPER_API_KEY = os.getenv("SERPER_API_KEY")
  SERPER_SEARCH_ENDPOINT = "https://google.serper.dev/search"
  SERPER_NEWS_ENDPOINT = "https://google.serper.dev/news"
- HEADERS = {"X-API-KEY": SERPER_API_KEY, "Content-Type": "application/json"}
 
- # Rate limiting
  storage = MemoryStorage()
  limiter = MovingWindowRateLimiter(storage)
- rate_limit = parse("360/hour")
 
 
- async def search_web(
      query: str, search_type: str = "search", num_results: Optional[int] = 4
- ) -> str:
      """
-     Search the web for information or fresh news, returning extracted content.
- 
-     This tool can perform two types of searches:
-     - "search" (default): General web search for diverse, relevant content from various sources
-     - "news": Specifically searches for fresh news articles and breaking stories
- 
-     Use "news" mode when looking for:
-     - Breaking news or very recent events
-     - Time-sensitive information
-     - Current affairs and latest developments
-     - Today's/this week's happenings
- 
-     Use "search" mode (default) for:
-     - General information and research
-     - Technical documentation or guides
-     - Historical information
-     - Diverse perspectives from various sources
- 
-     Args:
-         query (str): The search query. This is REQUIRED. Examples: "apple inc earnings",
-             "climate change 2024", "AI developments"
-         search_type (str): Type of search. This is OPTIONAL. Default is "search".
-             Options: "search" (general web search) or "news" (fresh news articles).
-             Use "news" for time-sensitive, breaking news content.
-         num_results (int): Number of results to fetch. This is OPTIONAL. Default is 4.
-             Range: 1-20. More results = more context but longer response time.
- 
-     Returns:
-         str: Formatted text containing extracted content with metadata (title,
-             source, date, URL, and main text) for each result, separated by dividers.
-             Returns error message if API key is missing or search fails.
- 
-     Examples:
-     - search_web("OpenAI GPT-5", "news") - Get 5 fresh news articles about OpenAI
-     - search_web("python tutorial", "search") - Get 4 general results about Python (default count)
-     - search_web("stock market today", "news", 10) - Get 10 news articles about today's market
-     - search_web("machine learning basics") - Get 4 general search results (all defaults)
      """
      start_time = time.time()
 
-     if not SERPER_API_KEY:
-         await record_request(None, num_results)  # Record even failed requests
-         return "Error: SERPER_API_KEY environment variable is not set. Please set it to use this tool."
 
-     # Validate and constrain num_results
      if num_results is None:
          num_results = 4
-     num_results = max(1, min(30, num_results))
- 
-     # Validate search_type
      if search_type not in ["search", "news"]:
          search_type = "search"
 
      try:
-         # Check rate limit
          if not await limiter.hit(rate_limit, "global"):
-             print(f"[{datetime.now().isoformat()}] Rate limit exceeded")
-             duration = time.time() - start_time
-             await record_request(duration, num_results)
-             return "Error: Rate limit exceeded. Please try again later (limit: 500 requests per hour)."
 
-         # Select endpoint based on search type
         endpoint = (
             SERPER_NEWS_ENDPOINT if search_type == "news" else SERPER_SEARCH_ENDPOINT
         )
- 
-         # Prepare payload
-         payload = {"q": query, "num": num_results}
         if search_type == "news":
             payload["type"] = "news"
             payload["page"] = 1
@@ -104,107 +106,119 @@ async def search_web(
          resp = await client.post(endpoint, headers=HEADERS, json=payload)
 
          if resp.status_code != 200:
-             duration = time.time() - start_time
-             await record_request(duration, num_results)
-             return f"Error: Search API returned status {resp.status_code}. Please check your API key and try again."
- 
-         # Extract results based on search type
-         if search_type == "news":
-             results = resp.json().get("news", [])
-         else:
-             results = resp.json().get("organic", [])
- 
-         if not results:
-             duration = time.time() - start_time
-             await record_request(duration, num_results)
-             return f"No {search_type} results found for query: '{query}'. Try a different search term or search type."
- 
-         # Fetch HTML content concurrently
-         urls = [r["link"] for r in results]
-         async with httpx.AsyncClient(timeout=20, follow_redirects=True) as client:
-             tasks = [client.get(u) for u in urls]
-             responses = await asyncio.gather(*tasks, return_exceptions=True)
- 
-         # Extract and format content
-         chunks = []
-         successful_extractions = 0
- 
-         for meta, response in zip(results, responses):
-             if isinstance(response, Exception):
-                 continue
- 
-             # Extract main text content
-             body = trafilatura.extract(
-                 response.text, include_formatting=False, include_comments=False
-             )
 
-             if not body:
-                 continue
 
-             successful_extractions += 1
-             print(
-                 f"[{datetime.now().isoformat()}] Successfully extracted content from {meta['link']}"
-             )
 
-             # Format the chunk based on search type
-             if search_type == "news":
-                 # News results have date and source
-                 try:
-                     date_str = meta.get("date", "")
-                     if date_str:
-                         date_iso = dateparser.parse(date_str, fuzzy=True).strftime(
-                             "%Y-%m-%d"
-                         )
-                     else:
-                         date_iso = "Unknown"
-                 except Exception:
-                     date_iso = "Unknown"
- 
-                 chunk = (
-                     f"## {meta['title']}\n"
-                     f"**Source:** {meta.get('source', 'Unknown')} "
-                     f"**Date:** {date_iso}\n"
-                     f"**URL:** {meta['link']}\n\n"
-                     f"{body.strip()}\n"
-                 )
-             else:
-                 # Search results don't have date/source but have domain
-                 domain = meta["link"].split("/")[2].replace("www.", "")
- 
-                 chunk = (
-                     f"## {meta['title']}\n"
-                     f"**Domain:** {domain}\n"
-                     f"**URL:** {meta['link']}\n\n"
-                     f"{body.strip()}\n"
-                 )
- 
-             chunks.append(chunk)
- 
-         if not chunks:
-             duration = time.time() - start_time
-             await record_request(duration, num_results)
-             return f"Found {len(results)} {search_type} results for '{query}', but couldn't extract readable content from any of them. The websites might be blocking automated access."
- 
-         result = "\n---\n".join(chunks)
-         summary = f"Successfully extracted content from {successful_extractions} out of {len(results)} {search_type} results for query: '{query}'\n\n---\n\n"
- 
-         print(
-             f"[{datetime.now().isoformat()}] Extraction complete: {successful_extractions}/{len(results)} successful for query '{query}'"
-         )
 
-         # Record successful request with duration
-         duration = time.time() - start_time
-         await record_request(duration, num_results)
 
-         return summary + result
 
      except Exception as e:
-         # Record failed request with duration
-         duration = time.time() - start_time
-         return f"Error occurred while searching: {str(e)}. Please try again or check your query."
 
 
- # Create Gradio interface
  with gr.Blocks(title="Web Search MCP Server") as demo:
      gr.HTML(
          """
@@ -217,141 +231,141 @@ with gr.Blocks(title="Web Search MCP Server") as demo:
      )
 
      gr.Markdown("# 🔍 Web Search MCP Server")
 
      with gr.Tabs():
          with gr.Tab("App"):
-             gr.Markdown(
-                 """
-                 This MCP server provides web search capabilities to LLMs. It can perform general web searches
-                 or specifically search for fresh news articles, extracting the main content from results.
- 
-                 **⚡ Speed-Focused:** Optimized to complete the entire search process - from query to
-                 fully extracted web content - in under 2 seconds. Check out the Analytics tab
-                 to see real-time performance metrics.
- 
-                 **Search Types:**
-                 - **General Search**: Diverse results from various sources (blogs, docs, articles, etc.)
-                 - **News Search**: Fresh news articles and breaking stories from news sources
- 
-                 **Note:** This interface is primarily designed for MCP tool usage by LLMs, but you can
-                 also test it manually below.
-                 """
-             )
- 
-             gr.HTML(
-                 """
-                 <div style="margin-bottom: 24px;">
-                     <a href="https://huggingface.co/spaces/victor/websearch?view=api">
-                         <img src="https://huggingface.co/datasets/huggingface/badges/resolve/main/use-with-mcp-lg-dark.svg"
-                              alt="Use with MCP"
-                              style="height: 36px;">
-                     </a>
-                 </div>
-                 """,
-                 padding=0,
-             )
- 
              with gr.Row():
                  with gr.Column(scale=3):
                      query_input = gr.Textbox(
                          label="Search Query",
-                         placeholder='e.g. "OpenAI news", "climate change 2024", "AI developments"',
-                         info="Required: Enter your search query",
                      )
-                 with gr.Column(scale=1):
                      search_type_input = gr.Radio(
                          choices=["search", "news"],
                          value="search",
                          label="Search Type",
-                         info="Choose search type",
                      )
 
-             with gr.Row():
-                 num_results_input = gr.Slider(
-                     minimum=1,
-                     maximum=20,
-                     value=4,
-                     step=1,
-                     label="Number of Results",
-                     info="Optional: How many results to fetch (default: 4)",
-                 )
- 
-             search_button = gr.Button("Search", variant="primary")
- 
-             output = gr.Textbox(
-                 label="Extracted Content",
-                 lines=25,
-                 max_lines=50,
-                 info="The extracted article content will appear here",
-             )
 
-             # Add examples
-             gr.Examples(
-                 examples=[
-                     ["OpenAI GPT-5 latest developments", "news", 5],
-                     ["React hooks useState", "search", 4],
-                     ["Tesla stock price today", "news", 6],
-                     ["Apple Vision Pro reviews", "search", 4],
-                     ["best Italian restaurants NYC", "search", 4],
-                 ],
                  inputs=[query_input, search_type_input, num_results_input],
-                 outputs=output,
-                 fn=search_web,
-                 cache_examples=False,
              )
 
      with gr.Tab("Analytics"):
          gr.Markdown("## Community Usage Analytics")
-         gr.Markdown(
-             "Track daily request counts and average response times from all community users."
-         )
 
          with gr.Row():
              with gr.Column():
-                 requests_plot = gr.BarPlot(
-                     value=last_n_days_df(
-                         14
-                     ),  # Show only last 14 days for better visibility
                      x="date",
                      y="count",
-                     title="Daily Request Count",
-                     tooltip=["date", "count"],
                      height=350,
-                     x_label_angle=-45,  # Rotate labels to prevent overlap
                      container=False,
                  )
- 
              with gr.Column():
-                 avg_time_plot = gr.BarPlot(
-                     value=last_n_days_avg_time_df(14),  # Show only last 14 days
                      x="date",
-                     y="avg_time",
-                     title="Average Request Time (seconds)",
-                     tooltip=["date", "avg_time", "request_count"],
                      height=350,
                      x_label_angle=-45,
                      container=False,
                  )
 
-     search_button.click(
-         fn=search_web,  # Use search_web directly instead of search_and_log
-         inputs=[query_input, search_type_input, num_results_input],
-         outputs=output,
-         api_name=False,  # Hide this endpoint from API & MCP
-     )
- 
-     # Load fresh analytics data when the page loads or Analytics tab is clicked
      demo.load(
-         fn=lambda: (last_n_days_df(14), last_n_days_avg_time_df(14)),
-         outputs=[requests_plot, avg_time_plot],
          api_name=False,
      )
 
-     # Expose search_web as the only MCP tool
-     gr.api(search_web, api_name="search_web")
 
 
  if __name__ == "__main__":
      # Launch with MCP server enabled
-     # The MCP endpoint will be available at: http://localhost:7860/gradio_api/mcp/sse
      demo.launch(mcp_server=True, show_api=True)
  import os
  import time
+ import re
+ import html
+ from typing import Optional, Dict, Any, List
+ from urllib.parse import urlsplit
+ from datetime import datetime, timezone
+ from dateutil import parser as dateparser  # needed by _iso_date_or_unknown below
+
  import httpx
  import trafilatura
  import gradio as gr

  from limits import parse
  from limits.aio.storage import MemoryStorage
  from limits.aio.strategies import MovingWindowRateLimiter

+ from analytics import record_request, last_n_days_count_df
+
+ # ──────────────────────────────────────────────────────────────────────────────
  # Configuration
+ # ──────────────────────────────────────────────────────────────────────────────
  SERPER_API_KEY = os.getenv("SERPER_API_KEY")
  SERPER_SEARCH_ENDPOINT = "https://google.serper.dev/search"
  SERPER_NEWS_ENDPOINT = "https://google.serper.dev/news"
+ HEADERS = {"X-API-KEY": SERPER_API_KEY or "", "Content-Type": "application/json"}

+ # Rate limiting (shared by both tools)
  storage = MemoryStorage()
  limiter = MovingWindowRateLimiter(storage)
+ rate_limit = parse("360/hour")  # shared global limit across search + fetch
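The moving-window semantics used above can be illustrated without the `limits` library; this is a minimal pure-Python sketch of the same idea (the limit and window below are illustrative, not the server's 360/hour):

```python
import time
from collections import deque
from typing import Optional

class MovingWindowLimiter:
    """Allow at most `limit` hits per `window_s` seconds (moving window)."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self.hits = deque()  # timestamps of accepted hits, oldest first

    def hit(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Evict timestamps that have slid out of the window.
        while self.hits and now - self.hits[0] >= self.window_s:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            return False
        self.hits.append(now)
        return True

limiter = MovingWindowLimiter(limit=2, window_s=3600.0)
results = [limiter.hit(now=t) for t in (0.0, 1.0, 2.0)]
print(results)  # [True, True, False]
```

Unlike a fixed-window counter, the window slides with each request, so a burst at a window boundary cannot double the effective rate.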
+
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # Helpers
+ # ──────────────────────────────────────────────────────────────────────────────
+ def _domain_from_url(url: str) -> str:
+     try:
+         netloc = urlsplit(url).netloc
+         return netloc.replace("www.", "")
+     except Exception:
+         return ""
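The helper's behavior can be checked in isolation (restated here so the snippet runs standalone):

```python
from urllib.parse import urlsplit

def domain_from_url(url: str) -> str:
    # Mirrors the server's _domain_from_url: netloc with "www." removed.
    try:
        return urlsplit(url).netloc.replace("www.", "")
    except Exception:
        return ""

print(domain_from_url("https://www.example.com/a/b?q=1"))  # example.com
print(domain_from_url("not a url"))                        # (empty string)
```

Note that `str.replace` removes every `"www."` substring, not just a leading one, so an unusual host like `www.example.www.com` collapses to `example.com`.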
+
+
+ def _iso_date_or_unknown(date_str: Optional[str]) -> Optional[str]:
+     if not date_str:
+         return None
+     try:
+         return dateparser.parse(date_str, fuzzy=True).strftime("%Y-%m-%d")
+     except Exception:
+         return None
+
+
+ def _extract_title_from_html(html_text: str) -> Optional[str]:
+     m = re.search(r"<title[^>]*>(.*?)</title>", html_text, re.IGNORECASE | re.DOTALL)
+     if not m:
+         return None
+     title = re.sub(r"\s+", " ", m.group(1)).strip()
+     return html.unescape(title) if title else None
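The title helper can be exercised on a small HTML snippet (restated here so it runs standalone):

```python
import html
import re

def extract_title(html_text: str) -> "str | None":
    # Same approach as the server's _extract_title_from_html helper:
    # grab the <title> element, collapse whitespace, unescape entities.
    m = re.search(r"<title[^>]*>(.*?)</title>", html_text, re.IGNORECASE | re.DOTALL)
    if not m:
        return None
    title = re.sub(r"\s+", " ", m.group(1)).strip()
    return html.unescape(title) if title else None

page = "<html><head><title>\n  Hello &amp;\n  World\n</title></head></html>"
print(extract_title(page))               # Hello & World
print(extract_title("<p>no title</p>"))  # None
```

The `re.DOTALL` flag matters: without it, a `<title>` split across lines (as above) would not match.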
+
+
+ # ──────────────────────────────────────────────────────────────────────────────
+ # Tool: search (metadata only)
+ # ──────────────────────────────────────────────────────────────────────────────
+ async def search(
      query: str, search_type: str = "search", num_results: Optional[int] = 4
+ ) -> Dict[str, Any]:
      """
+     Perform a web or news search via Serper and return metadata ONLY.
+     Does NOT fetch or extract content from result URLs.
      """
      start_time = time.time()

+     # Validate inputs
+     if not query or not query.strip():
+         await record_request("search")
+         return {"error": "Missing 'query'. Please provide a search query string."}

      if num_results is None:
          num_results = 4
+     num_results = max(1, min(20, int(num_results)))

      if search_type not in ["search", "news"]:
          search_type = "search"

+     # Check API key
+     if not SERPER_API_KEY:
+         await record_request("search")
+         return {
+             "error": "SERPER_API_KEY is not set. Export SERPER_API_KEY and try again."
+         }
+
      try:
+         # Rate limit
          if not await limiter.hit(rate_limit, "global"):
+             await record_request("search")
+             return {"error": "Rate limit exceeded. Limit: 360 requests/hour."}

          endpoint = (
              SERPER_NEWS_ENDPOINT if search_type == "news" else SERPER_SEARCH_ENDPOINT
          )
+         payload: Dict[str, Any] = {"q": query, "num": num_results}

          if search_type == "news":
              payload["type"] = "news"
              payload["page"] = 1

              resp = await client.post(endpoint, headers=HEADERS, json=payload)

          if resp.status_code != 200:
+             await record_request("search")
+             return {
+                 "error": f"Search API returned status {resp.status_code}. Check your API key and query."
+             }
+
+         data = resp.json()
+         raw_results: List[Dict[str, Any]] = (
+             data.get("news", []) if search_type == "news" else data.get("organic", [])
+         )
+         if not raw_results:
+             await record_request("search")
+             return {
+                 "query": query,
+                 "search_type": search_type,
+                 "count": 0,
+                 "results": [],
+                 "message": f"No {search_type} results found.",
+             }
+
+         formatted: List[Dict[str, Any]] = []
+         for idx, r in enumerate(raw_results[:num_results], start=1):
+             item = {
+                 "position": idx,
+                 "title": r.get("title"),
+                 "link": r.get("link"),
+                 "domain": _domain_from_url(r.get("link", "")),
+                 "snippet": r.get("snippet") or r.get("description"),
+             }
+             if search_type == "news":
+                 item["source"] = r.get("source")
+                 item["date"] = _iso_date_or_unknown(r.get("date"))
+             formatted.append(item)
+
+         await record_request("search")
+         return {
+             "query": query,
+             "search_type": search_type,
+             "count": len(formatted),
+             "results": formatted,
+             "duration_s": round(time.time() - start_time, 2),
+         }

+     except Exception as e:
+         await record_request("search")
+         return {"error": f"Search failed: {str(e)}"}
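The metadata-shaping loop at the heart of the tool can be isolated and exercised against a canned Serper-style payload (the sample records below are invented for illustration; news-only fields are omitted):

```python
from typing import Any, Dict, List

def shape_results(raw: List[Dict[str, Any]], num_results: int) -> List[Dict[str, Any]]:
    # Same shaping as the search tool: 1-based position, title/link passthrough,
    # and snippet falling back to "description" when absent.
    formatted: List[Dict[str, Any]] = []
    for idx, r in enumerate(raw[:num_results], start=1):
        formatted.append(
            {
                "position": idx,
                "title": r.get("title"),
                "link": r.get("link"),
                "snippet": r.get("snippet") or r.get("description"),
            }
        )
    return formatted

sample = [
    {"title": "A", "link": "https://a.test", "snippet": "first"},
    {"title": "B", "link": "https://b.test", "description": "second"},
    {"title": "C", "link": "https://c.test"},
]
shaped = shape_results(sample, num_results=2)
print(shaped[1]["snippet"])  # second  (falls back to "description")
```

Slicing with `raw[:num_results]` before enumerating keeps the positions contiguous even when the API returns more rows than requested.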


+ # ──────────────────────────────────────────────────────────────────────────────
+ # Tool: fetch (single URL fetch + extraction)
+ # ──────────────────────────────────────────────────────────────────────────────
+ async def fetch(url: str, timeout: int = 20) -> Dict[str, Any]:
+     """
+     Fetch a single URL and extract the main readable content.
+     """
+     start_time = time.time()

+     if not url or not isinstance(url, str):
+         await record_request("fetch")
+         return {"error": "Missing 'url'. Please provide a valid URL string."}
+     if not url.lower().startswith(("http://", "https://")):
+         await record_request("fetch")
+         return {"error": "URL must start with http:// or https://."}

+     try:
+         # Rate limit
+         if not await limiter.hit(rate_limit, "global"):
+             await record_request("fetch")
+             return {"error": "Rate limit exceeded. Limit: 360 requests/hour."}
+
+         async with httpx.AsyncClient(timeout=timeout, follow_redirects=True) as client:
+             resp = await client.get(url)
+
+         text = resp.text or ""
+         content = (
+             trafilatura.extract(
+                 text,
+                 include_formatting=False,
+                 include_comments=False,
+             )
+             or ""
+         )

+         title = _extract_title_from_html(text) or ""
+         final_url_str = str(resp.url) if hasattr(resp, "url") else url
+         domain = _domain_from_url(final_url_str)
+         word_count = len(content.split()) if content else 0
+
+         result = {
+             "url": url,
+             "final_url": final_url_str,
+             "domain": domain,
+             "status_code": resp.status_code,
+             "title": title,
+             "fetched_at": datetime.now(timezone.utc).isoformat(),
+             "word_count": word_count,
+             "content": content.strip(),
+             "duration_s": round(time.time() - start_time, 2),
+         }
+
+         await record_request("fetch")
+         return result
+
+     except httpx.HTTPError as e:
+         await record_request("fetch")
+         return {"error": f"Network error while fetching: {str(e)}"}
      except Exception as e:
+         await record_request("fetch")
+         return {"error": f"Unexpected error while fetching: {str(e)}"}
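The post-extraction bookkeeping (title, domain, word count) can be sketched with the stdlib alone; here a naive tag-stripper stands in for `trafilatura.extract`, so this is an illustration of the result shape, not the real extractor:

```python
import html
import re
from urllib.parse import urlsplit

def summarize_page(url: str, page_html: str) -> dict:
    # Naive stand-in for trafilatura.extract: drop tags, collapse whitespace.
    body = re.sub(r"<[^>]+>", " ", page_html)
    content = re.sub(r"\s+", " ", html.unescape(body)).strip()
    m = re.search(r"<title[^>]*>(.*?)</title>", page_html, re.IGNORECASE | re.DOTALL)
    title = re.sub(r"\s+", " ", m.group(1)).strip() if m else ""
    return {
        "url": url,
        "domain": urlsplit(url).netloc.replace("www.", ""),
        "title": title,
        "word_count": len(content.split()),
        "content": content,
    }

page = "<html><head><title>Demo</title></head><body><p>two words</p></body></html>"
info = summarize_page("https://www.example.com/x", page)
print(info["domain"], info["title"], info["word_count"])  # example.com Demo 3
```

Note the naive stripper keeps the `<title>` text in the body (hence the count of 3); trafilatura's boilerplate removal avoids that, which is why the server uses it.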
 


+ # ──────────────────────────────────────────────────────────────────────────────
+ # Gradio UI
+ # ──────────────────────────────────────────────────────────────────────────────
  with gr.Blocks(title="Web Search MCP Server") as demo:
      gr.HTML(
          """

      )

      gr.Markdown("# πŸ” Web Search MCP Server")
+     gr.Markdown(
+         "This server provides two composable MCP tools: **search** (metadata only) and **fetch** (single-URL extraction)."
+     )

      with gr.Tabs():
          with gr.Tab("App"):
              with gr.Row():
+                 # ── Search panel ───────────────────────────────────────────────
                  with gr.Column(scale=3):
+                     gr.Markdown("## Search (metadata only)")
                      query_input = gr.Textbox(
                          label="Search Query",
+                         placeholder='e.g. "OpenAI news", "climate change 2024", "React hooks useState"',
+                         info="Required",
                      )
                      search_type_input = gr.Radio(
                          choices=["search", "news"],
                          value="search",
                          label="Search Type",
+                         info="Choose general web search or news",
+                     )
+                     num_results_input = gr.Slider(
+                         minimum=1,
+                         maximum=20,
+                         value=4,
+                         step=1,
+                         label="Number of Results",
+                         info="Optional (default 4)",
+                     )
+                     search_button = gr.Button("Run Search", variant="primary")
+                     search_output = gr.JSON(
+                         label="Search Results (metadata only)",
                      )

+                     gr.Examples(
+                         examples=[
+                             ["OpenAI GPT-5 latest developments", "news", 5],
+                             ["React hooks useState", "search", 4],
+                             ["Apple Vision Pro reviews", "search", 4],
+                             ["Tesla stock price today", "news", 6],
+                         ],
+                         inputs=[query_input, search_type_input, num_results_input],
+                         outputs=search_output,
+                         fn=search,
+                         cache_examples=False,
+                     )

+                 # ── Fetch panel ────────────────────────────────────────────────
+                 with gr.Column(scale=2):
+                     gr.Markdown("## Fetch (single URL β†’ extracted content)")
+                     url_input = gr.Textbox(
+                         label="URL",
+                         placeholder="https://example.com/article",
+                         info="Required: the URL to fetch and extract",
+                     )
+                     timeout_input = gr.Slider(
+                         minimum=5,
+                         maximum=60,
+                         value=20,
+                         step=1,
+                         label="Timeout (seconds)",
+                         info="Optional (default 20)",
+                     )
+                     fetch_button = gr.Button("Fetch & Extract", variant="primary")
+                     fetch_output = gr.JSON(label="Fetched Content (structured)")
+
+                     gr.Examples(
+                         examples=[
+                             ["https://news.ycombinator.com/"],
+                             ["https://www.python.org/dev/peps/pep-0008/"],
+                             ["https://en.wikipedia.org/wiki/Model_Context_Protocol"],
+                         ],
+                         inputs=[url_input],
+                         outputs=fetch_output,
+                         fn=fetch,
+                         cache_examples=False,
+                     )
+
+             # Wire up buttons
+             search_button.click(
+                 fn=search,
                  inputs=[query_input, search_type_input, num_results_input],
+                 outputs=search_output,
+                 api_name=False,
+             )
+             fetch_button.click(
+                 fn=fetch,
+                 inputs=[url_input, timeout_input],
+                 outputs=fetch_output,
+                 api_name=False,
              )

          with gr.Tab("Analytics"):
              gr.Markdown("## Community Usage Analytics")
+             gr.Markdown("Daily request counts (UTC), split by tool.")

              with gr.Row():
                  with gr.Column():
+                     search_plot = gr.BarPlot(
+                         value=last_n_days_count_df("search", 14),
                          x="date",
                          y="count",
+                         title="Daily Search Count",
+                         tooltip=["date", "count", "full_date"],
                          height=350,
+                         x_label_angle=-45,
                          container=False,
                      )
                  with gr.Column():
+                     fetch_plot = gr.BarPlot(
+                         value=last_n_days_count_df("fetch", 14),
                          x="date",
+                         y="count",
+                         title="Daily Fetch Count",
+                         tooltip=["date", "count", "full_date"],
                          height=350,
                          x_label_angle=-45,
                          container=False,
                      )

+     # Refresh analytics on load
      demo.load(
+         fn=lambda: (
+             last_n_days_count_df("search", 14),
+             last_n_days_count_df("fetch", 14),
+         ),
+         outputs=[search_plot, fetch_plot],
          api_name=False,
      )

+ # Expose MCP tools
+ gr.api(search, api_name="search")
+ gr.api(fetch, api_name="fetch")


  if __name__ == "__main__":
      # Launch with MCP server enabled
      demo.launch(mcp_server=True, show_api=True)
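With `mcp_server=True`, the MCP endpoint is served at `http://localhost:7860/gradio_api/mcp/sse`. A stdio-only MCP client can bridge to it; this is an illustrative Claude-Desktop-style config (the `npx mcp-remote` bridge and the server name `web-search` are assumptions, not part of this repo):

```json
{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": ["mcp-remote", "http://localhost:7860/gradio_api/mcp/sse"]
    }
  }
}
```

Once connected, the client should list the two tools exposed via `gr.api`: `search` and `fetch`.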