Refactor README and analytics for improved clarity and functionality; update app.py to enhance the search and fetch tools with better error handling and analytics tracking.

Files changed:
- README.md (+78, -150)
- analytics.py (+66, -65)
- app.py (+262, -248)
README.md
CHANGED

@@ -7,176 +7,104 @@ sdk: gradio

Unchanged front matter context:

sdk_version: 5.36.2
app_file: app.py
pinned: false
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/5f17f0a0925b9863e28ad517/tfYtTMw9FgiWdyyIYz6A6.png
---

Removed (old README; lines the diff renders as blank are elided with …):

short_description: Search

# Web …

A Model Context Protocol (MCP) server that …

## …

- **Content extraction**: Automatically extracts main article content, removing ads and boilerplate
- **Rate limiting**: Built-in rate limiting (200 requests/hour) to prevent API abuse
- **Structured output**: Returns formatted content with metadata (title, source, date, URL)
- **Flexible results**: Control the number of results (1-20)

3. **MCP-compatible LLM client**: Such as Claude Desktop, Cursor, or any MCP-enabled application

## Installation

```bash
pip install -r requirements.txt
```

Or install manually:

```bash
…
```

3. Set your Serper API key:

```bash
export …
```

### Starting the MCP Server

```bash
python …
```

The server will start on `http://localhost:7860` with the MCP endpoint at:

```
http://localhost:7860/gradio_api/mcp/sse
```

```
{
  …
      }
    }
  }
```

## …

**…**
- `…`

…

- Title
- Source and date
- URL
- Extracted main content

**When to use each search type**:

- **Use "news" mode for**:
  - Breaking news or very recent events
  - Time-sensitive information ("today", "this week")
  - Current affairs and latest developments
  - Press releases and announcements

- **Use "search" mode for**:
  - General information and research
  - Technical documentation or tutorials
  - Historical information
  - Diverse perspectives from various sources
  - How-to guides and explanations

**Example Usage in LLM**:

```
# News mode examples
"Search for breaking news about OpenAI" -> uses news mode
"Find today's stock market updates" -> uses news mode
"Get latest climate change developments" -> uses news mode

# Search mode examples (default)
"Search for Python programming tutorials" -> uses search mode
"Find information about machine learning algorithms" -> uses search mode
"Research historical data about climate change" -> uses search mode
```

## Error Handling

The tool handles various error scenarios:

- Missing API key: Clear error message with setup instructions
- Rate limiting: Informs when limit is exceeded
- Failed extractions: Reports which articles couldn't be extracted
- Network errors: Graceful error messages

## Testing

You can test the server manually:

1. Open `http://localhost:7860` in your browser
2. Enter a search query
3. Adjust the number of results
4. Click "Search" to see the extracted content

## Tips for LLM Usage

1. **Choose the right search type**: Use "news" for fresh, breaking news; use "search" for general information
2. **Be specific with queries**: More specific queries yield better results
3. **Adjust result count**: Use fewer results for quick searches, more for comprehensive research
4. **Check dates**: The tool shows article dates for temporal context
5. **Follow up**: Use the extracted content to ask follow-up questions

## Limitations

- Rate limited to 200 requests per hour
- Extraction quality depends on website structure
- Some websites may block automated access
- News mode focuses on recent articles from news sources
- Search mode provides diverse results but may include older content

## Troubleshooting

…

3. **No content extracted**: Some websites block scrapers; try different queries
4. **Connection errors**: Check your internet connection and firewall settings
Added (new README):

---
sdk_version: 5.36.2
app_file: app.py
pinned: false
short_description: Search & fetch the web with per-tool analytics
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/5f17f0a0925b9863e28ad517/tfYtTMw9FgiWdyyIYz6A6.png
---

# Web MCP Server

A Model Context Protocol (MCP) server that exposes two composable tools, `search` (Serper metadata) and `fetch` (single-page extraction), alongside a live analytics dashboard that tracks daily usage for each tool. The UI runs on Gradio and can be reached directly or via MCP-compatible clients like Claude Desktop and Cursor.

## Highlights

- Dual MCP tools with shared rate limiting (360 requests/hour) and structured JSON responses.
- Daily analytics split by tool: the **Analytics** tab renders "Daily Search" (left) and "Daily Fetch" (right) bar charts covering the last 14 days.
- Persistent request counters keyed by UTC date and tool: `{"YYYY-MM-DD": {"search": n, "fetch": m}}`, with automatic migration from legacy totals.
- Pluggable storage: respects `ANALYTICS_DATA_DIR`, otherwise falls back to `/data` (if writable) or `./data` for local development.
- Ready-to-serve Gradio app with MCP endpoints exposed via `gr.api` for direct client consumption.

## Requirements

- Python 3.8 or newer.
- Serper API key (`SERPER_API_KEY`) with access to the Search and News endpoints.
- Dependencies listed in `requirements.txt`, including `filelock` and `pandas` for analytics storage.

Install everything with:

```bash
pip install -r requirements.txt
```

## Configuration

1. Export your Serper API key:

```bash
export SERPER_API_KEY="your-api-key"
```

2. (Optional) Override the analytics storage path:

```bash
export ANALYTICS_DATA_DIR="/path/to/persistent/storage"
```

If unset, the app automatically prefers `/data` when available, otherwise `./data`.

The request counters live in `<DATA_DIR>/request_counts.json`, guarded by a file lock to support concurrent MCP calls.

## Running Locally

Launch the Gradio server (with MCP support enabled) via:

```bash
python app.py
```

This starts a local UI at `http://localhost:7860` and exposes the MCP SSE endpoint at `http://localhost:7860/gradio_api/mcp/sse`.

### Connecting From MCP Clients

- **Claude Desktop**: update `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "web-search": {
      "command": "python",
      "args": ["/absolute/path/to/app.py"],
      "env": {
        "SERPER_API_KEY": "your-api-key"
      }
    }
  }
}
```

- **URL-based MCP clients**: run `python app.py`, then point the client to `http://localhost:7860/gradio_api/mcp/sse`.

## Tool Reference

### `search`

- **Purpose**: Retrieve metadata-only results from Serper (general web or news).
- **Inputs**:
  - `query` *(str, required)*: search terms.
  - `search_type` *("search" | "news", default "search")*: switch to `news` for recency-focused results.
  - `num_results` *(int, default 4, range 1-20)*: number of hits to return.
- **Output**: JSON containing the query echo, result count, timing, and an array of entries with `position`, `title`, `link`, `domain`, and optional `source`/`date` for news.

### `fetch`

- **Purpose**: Download a single URL and extract the readable article text via Trafilatura.
- **Inputs**:
  - `url` *(str, required)*: must start with `http://` or `https://`.
  - `timeout` *(int, default 20 seconds)*: client timeout for the HTTP request.
- **Output**: JSON with the original and final URL, domain, HTTP status, title, ISO timestamp of the fetch, word count, cleaned `content`, and duration.

Both tools increment their respective analytics buckets on every invocation, including validation failures and rate-limit denials, ensuring the dashboard mirrors real traffic.

## Analytics Dashboard

Open the **Analytics** tab in the Gradio UI to inspect daily activity:

- **Daily Search Count** (left column): bar chart for the past 14 days of `search` tool requests.
- **Daily Fetch Count** (right column): bar chart for the past 14 days of `fetch` tool requests.
- Tooltips reveal the display label (e.g., `Sep 17`), raw count, and ISO date key.

Data is stored in JSON and can be safely externalized for long-term tracking. Existing totals in the legacy integer-only format are automatically migrated during the first write.
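That migration can be sketched stand-alone; the `raw` dict below is made-up sample data, and `normalize` mirrors the behavior described above rather than being the app's actual function:

```python
# Made-up sample of the on-disk schema described above; one legacy day included.
raw = {
    "2024-05-01": {"search": 3, "fetch": 1},
    "2024-05-02": 7,  # legacy integer-only format
}

def normalize(data):
    """Coerce legacy bare-int day totals into the per-tool dict shape."""
    out = {}
    for day, value in data.items():
        if isinstance(value, dict):
            out[day] = {
                "search": int(value.get("search", 0)),
                "fetch": int(value.get("fetch", 0)),
            }
        else:
            # Legacy schema stored one total per day; it is counted as "search".
            out[day] = {"search": int(value or 0), "fetch": 0}
    return out

counts = normalize(raw)
total_search = sum(day["search"] for day in counts.values())
```
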
## Rate Limiting & Error Handling

- Global moving-window limit of 360 requests per hour shared across both tools (powered by `limits`).
- Standardized error payloads for missing parameters, invalid URLs, Serper issues, HTTP failures, and rate-limit hits, each preserving analytics increments.
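The server delegates this to the `limits` library; as a mental model, a moving window behaves like the following stdlib-only sketch (illustrative only, not the code the app runs):

```python
from collections import deque

class MovingWindow:
    """Allow at most `limit` hits within any trailing `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit
        self.window = window
        self.hits = deque()  # timestamps of accepted hits

    def hit(self, now: float) -> bool:
        # Evict hits that have aged out of the trailing window.
        while self.hits and now - self.hits[0] >= self.window:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            return False  # denied: the window is full
        self.hits.append(now)
        return True

# Matches the server's shared limit of 360 requests per hour.
limiter = MovingWindow(limit=360, window=3600.0)
```

Because `search` and `fetch` share one bucket, a burst on one tool consumes the budget available to the other.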
## Troubleshooting

- **`SERPER_API_KEY is not set`**: export the key in the environment where the server runs.
- **`Rate limit exceeded`**: pause requests or reduce client concurrency.
- **Empty extraction**: some sites block bots; try another URL.
- **Storage permissions**: ensure the chosen data directory is writable; adjust `ANALYTICS_DATA_DIR` if necessary.

## Licensing & Contributions

Feel free to fork and adapt for your MCP workflows. Contributions are welcome: open a PR or issue with proposed analytics enhancements, additional tooling, or documentation tweaks.
analytics.py
CHANGED

@@ -2,8 +2,8 @@

@@ -21,84 +21,85 @@ if not DATA_DIR:

Removed (old analytics.py; unrecoverable deleted lines elided with ...):

    from filelock import FileLock
    import pandas as pd

    ...

    def _load_times() -> dict:
        if not os.path.exists(TIMES_FILE):
            return {}
        with open(TIMES_FILE) as f:
            return json.load(f)

    ...

    async def record_request(duration: float = None, num_results: int = None) -> None:
        """Increment today's counter (UTC) atomically and optionally record request duration."""
        today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        with FileLock(LOCK_FILE):
            ...

    def last_n_days_df(n: int = 30) -> pd.DataFrame:
        """Return a DataFrame with a row for each of the past *n* days."""
        now = datetime.now(timezone.utc)
        with FileLock(LOCK_FILE):
            data = _load()
        records = []
        for i in range(n):
            day = (now - timedelta(days=n - 1 - i))
            day_str = day.strftime("%Y-%m-%d")
            # Format date for display (MMM DD)
            display_date = day.strftime("%b %d")
            records.append({
                "date": display_date,
                "count": data.get(day_str, 0),
                "full_date": day_str  # Keep full date for tooltip
            })
        return pd.DataFrame(records)

    def last_n_days_avg_time_df(n: int = 30) -> pd.DataFrame:
        """Return a DataFrame with average request time for each of the past *n* days."""
        now = datetime.now(timezone.utc)
        with FileLock(LOCK_FILE):
            ...
        records = []
        for i in range(n):
            ...
            # Format date for display (MMM DD)
            display_date = day.strftime("%b %d")
            ...
                "full_date": day_str  # Keep full date for tooltip
            })
        return pd.DataFrame(records)
Added (new analytics.py; unchanged context included, gaps in the diff shown as ...):

    import os
    import json
    from datetime import datetime, timedelta, timezone
    from filelock import FileLock  # pip install filelock
    import pandas as pd  # already available in HF images

    # Determine data directory based on environment
    # 1. Check for environment variable override
    ...
    os.makedirs(DATA_DIR, exist_ok=True)

    COUNTS_FILE = os.path.join(DATA_DIR, "request_counts.json")
    LOCK_FILE = os.path.join(DATA_DIR, "analytics.lock")

    # ──────────────────────────────────────────────────────────────────────────
    # Storage helpers
    # ──────────────────────────────────────────────────────────────────────────
    def _load_counts() -> dict:
        if not os.path.exists(COUNTS_FILE):
            return {}
        with open(COUNTS_FILE) as f:
            try:
                return json.load(f)
            except json.JSONDecodeError:
                return {}


    def _save_counts(data: dict):
        with open(COUNTS_FILE, "w") as f:
            json.dump(data, f)


    def _normalize_counts_schema(data: dict) -> dict:
        """
        Ensure data is {date: {"search": int, "fetch": int}}.
        Backward compatible with old schema {date: int}.
        """
        normalized = {}
        for day, value in data.items():
            if isinstance(value, dict):
                normalized[day] = {
                    "search": int(value.get("search", 0)),
                    "fetch": int(value.get("fetch", 0)),
                }
            else:
                # Old schema: total count as int → attribute to "search", keep fetch=0
                normalized[day] = {"search": int(value or 0), "fetch": 0}
        return normalized


    # ──────────────────────────────────────────────────────────────────────────
    # Public API
    # ──────────────────────────────────────────────────────────────────────────
    async def record_request(tool: str) -> None:
        """Increment today's counter (UTC) for the given tool: 'search' or 'fetch'."""
        tool = (tool or "").strip().lower()
        if tool not in {"search", "fetch"}:
            # Ignore unknown tool buckets to keep charts clean
            tool = "search"

        today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
        with FileLock(LOCK_FILE):
            data = _normalize_counts_schema(_load_counts())
            if today not in data:
                data[today] = {"search": 0, "fetch": 0}
            data[today][tool] = int(data[today].get(tool, 0)) + 1
            _save_counts(data)


    def last_n_days_count_df(tool: str, n: int = 30) -> pd.DataFrame:
        """Return DataFrame with a row for each of the past n days for the given tool."""
        tool = (tool or "").strip().lower()
        if tool not in {"search", "fetch"}:
            tool = "search"

        now = datetime.now(timezone.utc)
        with FileLock(LOCK_FILE):
            data = _normalize_counts_schema(_load_counts())

        records = []
        for i in range(n):
            day = now - timedelta(days=n - 1 - i)
            day_key = day.strftime("%Y-%m-%d")
            display_date = day.strftime("%b %d")
            counts = data.get(day_key, {"search": 0, "fetch": 0})
            records.append(
                {
                    "date": display_date,
                    "count": int(counts.get(tool, 0)),
                    "full_date": day_key,
                }
            )
        return pd.DataFrame(records)
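The n-day window in `last_n_days_count_df` enumerates date keys oldest-first; that piece can be checked in isolation with a stand-alone sketch of the same loop:

```python
from datetime import datetime, timedelta, timezone

def day_keys(n, now):
    """Oldest-first UTC date keys for the past n days, matching the chart order."""
    return [(now - timedelta(days=n - 1 - i)).strftime("%Y-%m-%d") for i in range(n)]

keys = day_keys(3, datetime(2024, 5, 10, tzinfo=timezone.utc))
```
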
app.py
CHANGED

@@ -1,8 +1,11 @@

@@ -10,92 +13,91 @@ from dateutil import parser as dateparser

@@ -104,107 +106,119 @@ async def search_web(

@@ -217,141 +231,141 @@ with gr.Blocks(title="Web Search MCP Server") as demo:

Removed (old app.py; unrecoverable deleted lines elided with ...):

    import asyncio
    ...
    from analytics import record_request, last_n_days_df, last_n_days_avg_time_df
    ...
        query: str, search_type: str = "search", num_results: Optional[int] = 4
    ) -> str:
        """
        This tool can perform two types of searches:
        - "search" (default): General web search for diverse, relevant content from various sources
        - "news": Specifically searches for fresh news articles and breaking stories

        Use "news" mode when looking for:
        - Breaking news or very recent events
        - Time-sensitive information
        - Current affairs and latest developments
        - Today's/this week's happenings

        Use "search" mode (default) for:
        - General information and research
        - Technical documentation or guides
        - Historical information
        - Diverse perspectives from various sources

        Args:
            query (str): The search query. This is REQUIRED. Examples: "apple inc earnings",
                "climate change 2024", "AI developments"
            search_type (str): Type of search. This is OPTIONAL. Default is "search".
                Options: "search" (general web search) or "news" (fresh news articles).
                Use "news" for time-sensitive, breaking news content.
            num_results (int): Number of results to fetch. This is OPTIONAL. Default is 4.
                Range: 1-20. More results = more context but longer response time.

        Returns:
            str: Formatted text containing extracted content with metadata (title,
                source, date, URL, and main text) for each result, separated by dividers.
                Returns error message if API key is missing or search fails.

        Examples:
        - search_web("OpenAI GPT-5", "news") - Get 5 fresh news articles about OpenAI
        - search_web("python tutorial", "search") - Get 4 general results about Python (default count)
        - search_web("stock market today", "news", 10) - Get 10 news articles about today's market
        - search_web("machine learning basics") - Get 4 general search results (all defaults)
        """
        start_time = time.time()

        # Validate and constrain num_results
        if num_results is None:
            num_results = 4
        num_results = max(1, min(...))

        # Validate search_type
        if search_type not in ["search", "news"]:
            search_type = "search"

        try:
            if not await limiter.hit(rate_limit, "global"):
                ...
                await record_request(duration, num_results)
                return "Error: Rate limit exceeded. Please try again later (limit: 500 requests per hour)."

            # Select endpoint based on search type

            # Prepare payload
            payload = {"q": query, "num": num_results}
            ...
                    successful_extractions += 1
                    print(
                        f"[{datetime.now().isoformat()}] Successfully extracted content from {meta['link']}"
                    )
                    ...
                    else:
                        date_iso = "Unknown"
                except Exception:
                    date_iso = "Unknown"

                chunk = (
                    f"## {meta['title']}\n"
                    f"**Source:** {meta.get('source', 'Unknown')} "
                    f"**Date:** {date_iso}\n"
                    f"**URL:** {meta['link']}\n\n"
                    f"{body.strip()}\n"
                )
            else:
                # Search results don't have date/source but have domain
                domain = meta["link"].split("/")[2].replace("www.", "")

                chunk = (
                    f"## {meta['title']}\n"
                    f"**Domain:** {domain}\n"
                    f"**URL:** {meta['link']}\n\n"
                    f"{body.strip()}\n"
                )

            chunks.append(chunk)

        if not chunks:
            duration = time.time() - start_time
            await record_request(duration, num_results)
            return f"Found {len(results)} {search_type} results for '{query}', but couldn't extract readable content from any of them. The websites might be blocking automated access."

        result = "\n---\n".join(chunks)
        summary = f"Successfully extracted content from {successful_extractions} out of {len(results)} {search_type} results for query: '{query}'\n\n---\n\n"

        print(
            f"[{datetime.now().isoformat()}] Extraction complete: {successful_extractions}/{len(results)} successful for query '{query}'"
        )
        ...
        except Exception as e:
            ...
            return f"Error occurred while searching: {str(e)}. Please try again or check your query."

    ...
            gr.Markdown(
                """
                This MCP server provides web search capabilities to LLMs. It can perform general web searches
                or specifically search for fresh news articles, extracting the main content from results.

                **⚡ Speed-Focused:** Optimized to complete the entire search process - from query to
                fully extracted web content - in under 2 seconds. Check out the Analytics tab
                to see real-time performance metrics.

                **Search Types:**
                - **General Search**: Diverse results from various sources (blogs, docs, articles, etc.)
                - **News Search**: Fresh news articles and breaking stories from news sources

                **Note:** This interface is primarily designed for MCP tool usage by LLMs, but you can
                also test it manually below.
                """
            )

            gr.HTML(
                """
                <div style="margin-bottom: 24px;">
                    <a href="https://huggingface.co/spaces/victor/websearch?view=api">
                        <img src="https://huggingface.co/datasets/huggingface/badges/resolve/main/use-with-mcp-lg-dark.svg"
                             alt="Use with MCP"
                             style="height: 36px;">
                    </a>
                </div>
                """,
                padding=0,
            )
            ...
                    placeholder='e.g. "OpenAI news", "climate change 2024", "...
                    info="Required...
                )
            with gr.Column(scale=1):
                ...
                    info="Choose search...
                ...
                output = gr.Textbox(
                    label="Extracted Content",
                    lines=25,
                    max_lines=50,
                    info="The extracted article content will appear here",
                )
            ...
                    value=...(
                        14
                    ),  # Show only last 14 days for better visibility
                    title="Daily ...
                    tooltip=["date", "count"],
                    ...
                    value=...
                    y="...
                    title="...
                    tooltip=["date", "...
            ...
            fn=search_web,  # Use search_web directly instead of search_and_log
            inputs=[query_input, search_type_input, num_results_input],
            outputs=output,
            api_name=False,  # Hide this endpoint from API & MCP
        )

        # Load fresh analytics data when the page loads or Analytics tab is clicked
        demo.load(
            fn=lambda: (
            ...
        )

        # Expose ...
        gr.api(...
    ...
        # The MCP endpoint will be available at: http://localhost:7860/gradio_api/mcp/sse
```diff
 import os
 import time
+import re
+import html
+from typing import Optional, Dict, Any, List
+from urllib.parse import urlsplit
+from datetime import datetime, timezone
+
 import httpx
 import trafilatura
 import gradio as gr
...
 from limits import parse
 from limits.aio.storage import MemoryStorage
 from limits.aio.strategies import MovingWindowRateLimiter
 
+from analytics import record_request, last_n_days_count_df
+
+# ──────────────────────────────────────────────────────────────────────────────
 # Configuration
+# ──────────────────────────────────────────────────────────────────────────────
 SERPER_API_KEY = os.getenv("SERPER_API_KEY")
 SERPER_SEARCH_ENDPOINT = "https://google.serper.dev/search"
 SERPER_NEWS_ENDPOINT = "https://google.serper.dev/news"
+HEADERS = {"X-API-KEY": SERPER_API_KEY or "", "Content-Type": "application/json"}
 
+# Rate limiting (shared by both tools)
 storage = MemoryStorage()
 limiter = MovingWindowRateLimiter(storage)
+rate_limit = parse("360/hour")  # shared global limit across search + fetch
 
```
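The `limits` setup above gives both tools one shared 360-requests/hour budget. For intuition, here is a stdlib-only toy of the same moving-window idea (a hypothetical `MovingWindow` class, not the `limits` implementation): keep one timestamp per admitted hit, evict timestamps that have aged out of the window, and reject once `limit` timestamps remain inside it.

```python
import time
from collections import deque
from typing import Optional


class MovingWindow:
    """Toy moving-window limiter: at most `limit` hits per `window_s` seconds."""

    def __init__(self, limit: int, window_s: float) -> None:
        self.limit = limit
        self.window_s = window_s
        self.hits: deque = deque()  # monotonic timestamps of admitted hits

    def hit(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the window.
        while self.hits and now - self.hits[0] >= self.window_s:
            self.hits.popleft()
        if len(self.hits) >= self.limit:
            return False  # over budget: reject, like limiter.hit() returning False
        self.hits.append(now)
        return True


limiter_demo = MovingWindow(limit=3, window_s=60.0)
results = [limiter_demo.hit(now=t) for t in (0.0, 1.0, 2.0, 3.0, 61.0)]
print(results)  # [True, True, True, False, True] — 4th rejected, 5th admitted after old hits expire
```

`limiter.hit(rate_limit, "global")` in the server plays the same role, except asynchronously and keyed by an identifier.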
```diff
+# ──────────────────────────────────────────────────────────────────────────────
+# Helpers
+# ──────────────────────────────────────────────────────────────────────────────
+def _domain_from_url(url: str) -> str:
+    try:
+        netloc = urlsplit(url).netloc
+        return netloc.replace("www.", "")
+    except Exception:
+        return ""
+
+
+def _iso_date_or_unknown(date_str: Optional[str]) -> Optional[str]:
+    if not date_str:
+        return None
+    try:
+        return dateparser.parse(date_str, fuzzy=True).strftime("%Y-%m-%d")
+    except Exception:
+        return None
+
+
+def _extract_title_from_html(html_text: str) -> Optional[str]:
+    m = re.search(r"<title[^>]*>(.*?)</title>", html_text, re.IGNORECASE | re.DOTALL)
+    if not m:
+        return None
+    title = re.sub(r"\s+", " ", m.group(1)).strip()
+    return html.unescape(title) if title else None
+
+
```
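These helpers are small stdlib transforms, so their behavior is easy to pin down in isolation. The sketch below restates the first and last of them under illustrative names (assumed equivalent to `_domain_from_url` and `_extract_title_from_html` above):

```python
import html
import re
from typing import Optional
from urllib.parse import urlsplit


def domain_from_url(url: str) -> str:
    # Mirrors _domain_from_url: the netloc with "www." stripped.
    return urlsplit(url).netloc.replace("www.", "")


def extract_title(html_text: str) -> Optional[str]:
    # Mirrors _extract_title_from_html: first <title>…</title>,
    # whitespace collapsed, HTML entities decoded.
    m = re.search(r"<title[^>]*>(.*?)</title>", html_text, re.IGNORECASE | re.DOTALL)
    if not m:
        return None
    title = re.sub(r"\s+", " ", m.group(1)).strip()
    return html.unescape(title) or None


print(domain_from_url("https://www.example.com/a/b?q=1"))  # example.com
print(extract_title("<html><title>\n  Tom &amp; Jerry </title></html>"))  # Tom & Jerry
```

One quirk worth knowing: `str.replace("www.", "")` strips that substring anywhere in the host, not only a leading `www.` prefix.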
```diff
+# ──────────────────────────────────────────────────────────────────────────────
+# Tool: search (metadata only)
+# ──────────────────────────────────────────────────────────────────────────────
+async def search(
     query: str, search_type: str = "search", num_results: Optional[int] = 4
+) -> Dict[str, Any]:
     """
+    Perform a web or news search via Serper and return metadata ONLY.
+    Does NOT fetch or extract content from result URLs.
     """
     start_time = time.time()
 
+    # Validate inputs
+    if not query or not query.strip():
+        await record_request("search")
+        return {"error": "Missing 'query'. Please provide a search query string."}
 
     if num_results is None:
         num_results = 4
+    num_results = max(1, min(20, int(num_results)))
 
     if search_type not in ["search", "news"]:
         search_type = "search"
 
+    # Check API key
+    if not SERPER_API_KEY:
+        await record_request("search")
+        return {
+            "error": "SERPER_API_KEY is not set. Export SERPER_API_KEY and try again."
+        }
+
     try:
+        # Rate limit
         if not await limiter.hit(rate_limit, "global"):
+            await record_request("search")
+            return {"error": "Rate limit exceeded. Limit: 360 requests/hour."}
 
         endpoint = (
             SERPER_NEWS_ENDPOINT if search_type == "news" else SERPER_SEARCH_ENDPOINT
         )
+        payload: Dict[str, Any] = {"q": query, "num": num_results}
         if search_type == "news":
             payload["type"] = "news"
             payload["page"] = 1
...
             resp = await client.post(endpoint, headers=HEADERS, json=payload)
 
         if resp.status_code != 200:
+            await record_request("search")
+            return {
+                "error": f"Search API returned status {resp.status_code}. Check your API key and query."
+            }
+
+        data = resp.json()
+        raw_results: List[Dict[str, Any]] = (
+            data.get("news", []) if search_type == "news" else data.get("organic", [])
+        )
+        if not raw_results:
+            await record_request("search")
+            return {
+                "query": query,
+                "search_type": search_type,
+                "count": 0,
+                "results": [],
+                "message": f"No {search_type} results found.",
+            }
+
+        formatted: List[Dict[str, Any]] = []
+        for idx, r in enumerate(raw_results[:num_results], start=1):
+            item = {
+                "position": idx,
+                "title": r.get("title"),
+                "link": r.get("link"),
+                "domain": _domain_from_url(r.get("link", "")),
+                "snippet": r.get("snippet") or r.get("description"),
+            }
+            if search_type == "news":
+                item["source"] = r.get("source")
+                item["date"] = _iso_date_or_unknown(r.get("date"))
+            formatted.append(item)
+
+        await record_request("search")
+        return {
+            "query": query,
+            "search_type": search_type,
+            "count": len(formatted),
+            "results": formatted,
+            "duration_s": round(time.time() - start_time, 2),
+        }
 
+    except Exception as e:
+        await record_request("search")
+        return {"error": f"Search failed: {str(e)}"}
 
```
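The validation at the top of `search` normalizes its arguments before any network call: default `num_results` to 4, clamp it into 1–20, and fall back to `"search"` for unrecognized types. Extracted into a standalone function (hypothetical name, for illustration), it behaves like this:

```python
from typing import Optional, Tuple


def normalize_search_args(
    search_type: str = "search", num_results: Optional[int] = 4
) -> Tuple[str, int]:
    # Mirrors the input validation in search(): apply the default,
    # then clamp to [1, 20], then sanitize the search type.
    if num_results is None:
        num_results = 4
    num_results = max(1, min(20, int(num_results)))
    if search_type not in ["search", "news"]:
        search_type = "search"
    return search_type, num_results


print(normalize_search_args("news", 50))    # ('news', 20)
print(normalize_search_args("imagess", 0))  # ('search', 1)
print(normalize_search_args())              # ('search', 4)
```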
```diff
+# ──────────────────────────────────────────────────────────────────────────────
+# Tool: fetch (single URL fetch + extraction)
+# ──────────────────────────────────────────────────────────────────────────────
+async def fetch(url: str, timeout: int = 20) -> Dict[str, Any]:
+    """
+    Fetch a single URL and extract the main readable content.
+    """
+    start_time = time.time()
 
+    if not url or not isinstance(url, str):
+        await record_request("fetch")
+        return {"error": "Missing 'url'. Please provide a valid URL string."}
+    if not url.lower().startswith(("http://", "https://")):
+        await record_request("fetch")
+        return {"error": "URL must start with http:// or https://."}
 
+    try:
+        # Rate limit
+        if not await limiter.hit(rate_limit, "global"):
+            await record_request("fetch")
+            return {"error": "Rate limit exceeded. Limit: 360 requests/hour."}
+
+        async with httpx.AsyncClient(timeout=timeout, follow_redirects=True) as client:
+            resp = await client.get(url)
+
+        text = resp.text or ""
+        content = (
+            trafilatura.extract(
+                text,
+                include_formatting=False,
+                include_comments=False,
+            )
+            or ""
+        )
 
+        title = _extract_title_from_html(text) or ""
+        final_url_str = str(resp.url) if hasattr(resp, "url") else url
+        domain = _domain_from_url(final_url_str)
+        word_count = len(content.split()) if content else 0
+
+        result = {
+            "url": url,
+            "final_url": final_url_str,
+            "domain": domain,
+            "status_code": resp.status_code,
+            "title": title,
+            "fetched_at": datetime.now(timezone.utc).isoformat(),
+            "word_count": word_count,
+            "content": content.strip(),
+            "duration_s": round(time.time() - start_time, 2),
+        }
+
+        await record_request("fetch")
+        return result
+
+    except httpx.HTTPError as e:
+        await record_request("fetch")
+        return {"error": f"Network error while fetching: {str(e)}"}
     except Exception as e:
+        await record_request("fetch")
+        return {"error": f"Unexpected error while fetching: {str(e)}"}
 
 
```
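On success, `fetch` returns one flat dict rather than nested objects, which keeps the MCP tool output easy for a model to consume. The sketch below assembles the same shape with invented values (no network involved; field names come from the code above):

```python
from datetime import datetime, timezone

# Illustrative values only; a real call fills these from the HTTP response.
content = "Extracted main text of the article."
result = {
    "url": "https://example.com/article",
    "final_url": "https://example.com/article",
    "domain": "example.com",
    "status_code": 200,
    "title": "Example Article",
    "fetched_at": datetime.now(timezone.utc).isoformat(),
    "word_count": len(content.split()),
    "content": content.strip(),
    "duration_s": 0.42,
}
print(sorted(result))
```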
```diff
+# ──────────────────────────────────────────────────────────────────────────────
+# Gradio UI
+# ──────────────────────────────────────────────────────────────────────────────
 with gr.Blocks(title="Web Search MCP Server") as demo:
     gr.HTML(
         """
...
     )
 
     gr.Markdown("# 🔍 Web Search MCP Server")
+    gr.Markdown(
+        "This server provides two composable MCP tools: **search** (metadata only) and **fetch** (single-URL extraction)."
+    )
 
     with gr.Tabs():
         with gr.Tab("App"):
...
             with gr.Row():
+                # ── Search panel ───────────────────────────────────────────────
                 with gr.Column(scale=3):
+                    gr.Markdown("## Search (metadata only)")
                     query_input = gr.Textbox(
                         label="Search Query",
+                        placeholder='e.g. "OpenAI news", "climate change 2024", "React hooks useState"',
+                        info="Required",
                     )
                     search_type_input = gr.Radio(
                         choices=["search", "news"],
                         value="search",
                         label="Search Type",
+                        info="Choose general web search or news",
+                    )
+                    num_results_input = gr.Slider(
+                        minimum=1,
+                        maximum=20,
+                        value=4,
+                        step=1,
+                        label="Number of Results",
+                        info="Optional (default 4)",
+                    )
+                    search_button = gr.Button("Run Search", variant="primary")
+                    search_output = gr.JSON(
+                        label="Search Results (metadata only)",
                     )
 
+                    gr.Examples(
+                        examples=[
+                            ["OpenAI GPT-5 latest developments", "news", 5],
+                            ["React hooks useState", "search", 4],
+                            ["Apple Vision Pro reviews", "search", 4],
+                            ["Tesla stock price today", "news", 6],
+                        ],
+                        inputs=[query_input, search_type_input, num_results_input],
+                        outputs=search_output,
+                        fn=search,
+                        cache_examples=False,
+                    )
 
+                # ── Fetch panel ────────────────────────────────────────────────
+                with gr.Column(scale=2):
+                    gr.Markdown("## Fetch (single URL → extracted content)")
+                    url_input = gr.Textbox(
+                        label="URL",
+                        placeholder="https://example.com/article",
+                        info="Required: the URL to fetch and extract",
+                    )
+                    timeout_input = gr.Slider(
+                        minimum=5,
+                        maximum=60,
+                        value=20,
+                        step=1,
+                        label="Timeout (seconds)",
+                        info="Optional (default 20)",
+                    )
+                    fetch_button = gr.Button("Fetch & Extract", variant="primary")
+                    fetch_output = gr.JSON(label="Fetched Content (structured)")
+
+                    gr.Examples(
+                        examples=[
+                            ["https://news.ycombinator.com/"],
+                            ["https://www.python.org/dev/peps/pep-0008/"],
+                            ["https://en.wikipedia.org/wiki/Model_Context_Protocol"],
+                        ],
+                        inputs=[url_input],
+                        outputs=fetch_output,
+                        fn=fetch,
+                        cache_examples=False,
+                    )
+
+            # Wire up buttons
+            search_button.click(
+                fn=search,
                 inputs=[query_input, search_type_input, num_results_input],
+                outputs=search_output,
+                api_name=False,
+            )
+            fetch_button.click(
+                fn=fetch,
+                inputs=[url_input, timeout_input],
+                outputs=fetch_output,
+                api_name=False,
             )
 
         with gr.Tab("Analytics"):
             gr.Markdown("## Community Usage Analytics")
+            gr.Markdown("Daily request counts (UTC), split by tool.")
 
             with gr.Row():
                 with gr.Column():
+                    search_plot = gr.BarPlot(
+                        value=last_n_days_count_df("search", 14),
                         x="date",
                         y="count",
+                        title="Daily Search Count",
+                        tooltip=["date", "count", "full_date"],
                         height=350,
+                        x_label_angle=-45,
                         container=False,
                     )
                 with gr.Column():
+                    fetch_plot = gr.BarPlot(
+                        value=last_n_days_count_df("fetch", 14),
                         x="date",
+                        y="count",
+                        title="Daily Fetch Count",
+                        tooltip=["date", "count", "full_date"],
                         height=350,
                         x_label_angle=-45,
                         container=False,
                     )
 
+    # Refresh analytics on load
...
     demo.load(
+        fn=lambda: (
+            last_n_days_count_df("search", 14),
+            last_n_days_count_df("fetch", 14),
+        ),
+        outputs=[search_plot, fetch_plot],
         api_name=False,
     )
 
+    # Expose MCP tools
+    gr.api(search, api_name="search")
+    gr.api(fetch, api_name="fetch")
 
 
 if __name__ == "__main__":
     # Launch with MCP server enabled
     demo.launch(mcp_server=True, show_api=True)
```
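With `mcp_server=True`, Gradio serves the MCP endpoint at `http://localhost:7860/gradio_api/mcp/sse`. For MCP clients that only speak stdio, one common bridge is the `mcp-remote` npm package; a client config sketch (the server name and `mcp-remote` usage are assumptions, adjust for your client):

```json
{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": ["mcp-remote", "http://localhost:7860/gradio_api/mcp/sse"]
    }
  }
}
```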