---
title: Open-WebSearch MCP
emoji: 🔎
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 3000
pinned: false
---

<div align="center">

# Open-WebSearch MCP Server

[ModelScope](https://www.modelscope.cn/mcp/servers/Aasee1/open-webSearch)
[Archestra](https://archestra.ai/mcp-catalog/aas-ee__open-websearch)
[Smithery](https://smithery.ai/server/@Aas-ee/open-websearch)
**[🇨🇳 中文](./README-zh.md) | 🇺🇸 English**

</div>

A Model Context Protocol (MCP) server that aggregates results from multiple search engines, providing free web search with no API keys required.

## Features

- Multi-engine web search:
  - bing
  - baidu
  - ~~linux.do~~ (temporarily unsupported)
  - csdn
  - duckduckgo
  - exa
  - brave
  - juejin
- HTTP proxy support for accessing restricted resources
- No API keys or authentication required
- Structured results with titles, URLs, and descriptions
- Configurable number of results per search
- Customizable default search engine
- Fetching of individual article content from:
  - csdn
  - github (README files)
  - generic HTTP(S) pages / Markdown content

## TODO

- Support for ~~Bing~~ (already supported), ~~DuckDuckGo~~ (already supported), ~~Exa~~ (already supported), ~~Brave~~ (already supported), Google, and other search engines
- Support for more blogs, forums, and social platforms
- Optimize article content extraction and add support for more sites
- ~~Support for GitHub README fetching~~ (already supported)

## Deploy to Hugging Face Spaces

This repository can be deployed directly as a **Docker Space**.

### 1. Create a Docker Space

Create a new Space on Hugging Face and choose **Docker** as the SDK, or push this repository to an existing Docker Space.

### 2. Required runtime variables

In **Space Settings → Variables and secrets**, configure:

| Variable | Recommended Value | Description |
|----------|-------------------|-------------|
| `MODE` | `http` | Run only the HTTP server in Spaces |
| `PORT` | `3000` | Must match `app_port` in the YAML header |
| `ENABLE_CORS` | `true` | Allows browser-based access if needed |
| `CORS_ORIGIN` | `*` | Relaxed CORS for public/demo use |
| `DEFAULT_SEARCH_ENGINE` | `duckduckgo` | A reliable default for public Spaces |

Optional:

| Variable | Example Value | Description |
|----------|---------------|-------------|
| `ALLOWED_SEARCH_ENGINES` | `duckduckgo,bing,baidu,csdn,juejin` | Restrict which engines may be used |
| `USE_PROXY` | `false` | Enable HTTP proxy support |
| `PROXY_URL` | `http://127.0.0.1:7890` | Proxy URL when proxying is enabled |

### 3. Public endpoints after deployment

- Home page: `/`
- Health check: `/healthz`
- MCP endpoint: `/mcp`
- Legacy SSE endpoint: `/sse`

After the Space is live, your MCP URL will look like:

```text
https://<your-space-subdomain>.hf.space/mcp
```

### 4. Notes

- The Space UI loads `/`, so this project now serves a lightweight landing page there.
- If you change the runtime port, update both `PORT` and `app_port` to the same value.

## Installation Guide

### NPX Quick Start (Recommended)

The fastest way to get started:

```bash
# Basic usage
npx open-websearch@latest

# With environment variables (Linux/macOS)
DEFAULT_SEARCH_ENGINE=duckduckgo ENABLE_CORS=true npx open-websearch@latest

# Windows PowerShell
$env:DEFAULT_SEARCH_ENGINE="duckduckgo"; $env:ENABLE_CORS="true"; npx open-websearch@latest

# Windows CMD
set MODE=stdio && set DEFAULT_SEARCH_ENGINE=duckduckgo && npx open-websearch@latest

# Cross-platform (requires cross-env; useful for local development)
npm install -g open-websearch
npx cross-env DEFAULT_SEARCH_ENGINE=duckduckgo ENABLE_CORS=true open-websearch
```

**Environment Variables:**

| Variable | Default | Options | Description |
|----------|---------|---------|-------------|
| `ENABLE_CORS` | `false` | `true`, `false` | Enable CORS |
| `CORS_ORIGIN` | `*` | Any valid origin | CORS origin configuration |
| `DEFAULT_SEARCH_ENGINE` | `bing` | `bing`, `duckduckgo`, `exa`, `brave`, `baidu`, `csdn`, `juejin` | Default search engine |
| `USE_PROXY` | `false` | `true`, `false` | Enable HTTP proxy |
| `PROXY_URL` | `http://127.0.0.1:7890` | Any valid URL | Proxy server URL |
| `MODE` | `both` | `both`, `http`, `stdio` | Server mode: HTTP+STDIO, HTTP only, or STDIO only |
| `PORT` | `3000` | 1-65535 | Server port |
| `ALLOWED_SEARCH_ENGINES` | empty (all available) | Comma-separated engine names | Limit which engines can be used; if the default engine is not in this list, the first allowed engine becomes the default |
| `MCP_TOOL_SEARCH_NAME` | `search` | Valid MCP tool name | Custom name for the search tool |
| `MCP_TOOL_FETCH_LINUXDO_NAME` | `fetchLinuxDoArticle` | Valid MCP tool name | Custom name for the Linux.do article fetch tool |
| `MCP_TOOL_FETCH_CSDN_NAME` | `fetchCsdnArticle` | Valid MCP tool name | Custom name for the CSDN article fetch tool |
| `MCP_TOOL_FETCH_GITHUB_NAME` | `fetchGithubReadme` | Valid MCP tool name | Custom name for the GitHub README fetch tool |
| `MCP_TOOL_FETCH_JUEJIN_NAME` | `fetchJuejinArticle` | Valid MCP tool name | Custom name for the Juejin article fetch tool |
| `MCP_TOOL_FETCH_WEB_NAME` | `fetchWebContent` | Valid MCP tool name | Custom name for the generic web/Markdown fetch tool |
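
The `ALLOWED_SEARCH_ENGINES` fallback rule described in the table can be sketched as a small pure function (an illustrative sketch only; `resolveDefaultEngine` is a hypothetical name, not part of this project's API):

```typescript
// Sketch of the documented rule: if the configured default engine is not
// in the allow-list, the first allowed engine becomes the default.
// An empty allow-list means no restriction.
function resolveDefaultEngine(defaultEngine: string, allowedEngines: string[]): string {
  if (allowedEngines.length === 0) return defaultEngine;
  if (allowedEngines.includes(defaultEngine)) return defaultEngine;
  return allowedEngines[0];
}

// ALLOWED_SEARCH_ENGINES is parsed as a comma-separated list:
const allowed = "duckduckgo,bing,exa".split(",").map((s) => s.trim());
console.log(resolveDefaultEngine("baidu", allowed)); // prints duckduckgo
```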
**Common configurations:**

```bash
# Enable proxy for restricted regions
USE_PROXY=true PROXY_URL=http://127.0.0.1:7890 npx open-websearch@latest

# Full configuration
DEFAULT_SEARCH_ENGINE=duckduckgo ENABLE_CORS=true USE_PROXY=true PROXY_URL=http://127.0.0.1:7890 PORT=8080 npx open-websearch@latest
```

### Local Installation

1. Clone or download this repository
2. Install dependencies:
   ```bash
   npm install
   ```
3. Build the server:
   ```bash
   npm run build
   ```
4. Add the server to your MCP configuration:

**Cherry Studio:**

```json
{
  "mcpServers": {
    "web-search": {
      "name": "Web Search MCP",
      "type": "streamableHttp",
      "description": "Multi-engine web search with article fetching",
      "isActive": true,
      "baseUrl": "http://localhost:3000/mcp"
    }
  }
}
```

**VSCode (Claude Dev Extension):**

```json
{
  "mcpServers": {
    "web-search": {
      "transport": {
        "type": "streamableHttp",
        "url": "http://localhost:3000/mcp"
      }
    },
    "web-search-sse": {
      "transport": {
        "type": "sse",
        "url": "http://localhost:3000/sse"
      }
    }
  }
}
```

**Claude Desktop:**

```json
{
  "mcpServers": {
    "web-search": {
      "type": "http",
      "url": "http://localhost:3000/mcp"
    },
    "web-search-sse": {
      "type": "sse",
      "url": "http://localhost:3000/sse"
    }
  }
}
```

**NPX Command Line Configuration:**

```json
{
  "mcpServers": {
    "web-search": {
      "command": "npx",
      "args": [
        "open-websearch@latest"
      ],
      "env": {
        "MODE": "stdio",
        "DEFAULT_SEARCH_ENGINE": "duckduckgo",
        "ALLOWED_SEARCH_ENGINES": "duckduckgo,bing,exa"
      }
    }
  }
}
```

**Local STDIO Configuration for Cherry Studio (Windows):**

```json
{
  "mcpServers": {
    "open-websearch-local": {
      "command": "node",
      "args": ["C:/path/to/your/project/build/index.js"],
      "env": {
        "MODE": "stdio",
        "DEFAULT_SEARCH_ENGINE": "duckduckgo",
        "ALLOWED_SEARCH_ENGINES": "duckduckgo,bing,exa"
      }
    }
  }
}
```

### Docker Deployment

Quick deployment using Docker Compose:

```bash
docker-compose up -d
```

Or use Docker directly:

```bash
docker run -d --name web-search -p 3000:3000 -e ENABLE_CORS=true -e CORS_ORIGIN=* ghcr.io/aas-ee/open-web-search:latest
```

Environment variable configuration:

| Variable | Default | Options | Description |
|----------|---------|---------|-------------|
| `ENABLE_CORS` | `false` | `true`, `false` | Enable CORS |
| `CORS_ORIGIN` | `*` | Any valid origin | CORS origin configuration |
| `DEFAULT_SEARCH_ENGINE` | `bing` | `bing`, `duckduckgo`, `exa`, `brave`, `baidu`, `csdn`, `juejin` | Default search engine |
| `USE_PROXY` | `false` | `true`, `false` | Enable HTTP proxy |
| `PROXY_URL` | `http://127.0.0.1:7890` | Any valid URL | Proxy server URL |
| `PORT` | `3000` | 1-65535 | Server port |

Then configure in your MCP client:

```json
{
  "mcpServers": {
    "web-search": {
      "name": "Web Search MCP",
      "type": "streamableHttp",
      "description": "Multi-engine web search with article fetching",
      "isActive": true,
      "baseUrl": "http://localhost:3000/mcp"
    },
    "web-search-sse": {
      "transport": {
        "name": "Web Search MCP",
        "type": "sse",
        "description": "Multi-engine web search with article fetching",
        "isActive": true,
        "url": "http://localhost:3000/sse"
      }
    }
  }
}
```

## Usage Guide

The server provides six tools: `search`, `fetchLinuxDoArticle`, `fetchCsdnArticle`, `fetchGithubReadme`, `fetchJuejinArticle`, and `fetchWebContent`.

### search Tool Usage

```typescript
{
  "query": string,    // Search query
  "limit": number,    // Optional: number of results to return (default: 10)
  "engines": string[] // Optional: engines to use (bing, baidu, linuxdo, csdn, duckduckgo, exa, brave, juejin); default: bing
}
```

Usage example:

```typescript
use_mcp_tool({
  server_name: "web-search",
  tool_name: "search",
  arguments: {
    query: "search content",
    limit: 3, // Optional parameter
    engines: ["bing", "csdn", "duckduckgo", "exa", "brave", "juejin"] // Optional parameter, supports multi-engine combined search
  }
})
```

Response example:

```json
[
  {
    "title": "Example Search Result",
    "url": "https://example.com",
    "description": "Description text of the search result...",
    "source": "Source",
    "engine": "Engine used"
  }
]
```

### fetchCsdnArticle Tool Usage

Fetches the complete content of CSDN blog articles.

```typescript
{
  "url": string // URL from CSDN results returned by the search tool
}
```

Usage example:

```typescript
use_mcp_tool({
  server_name: "web-search",
  tool_name: "fetchCsdnArticle",
  arguments: {
    url: "https://blog.csdn.net/xxx/article/details/xxx"
  }
})
```

Response example:

```json
[
  {
    "content": "Example article content"
  }
]
```

### fetchLinuxDoArticle Tool Usage

Fetches the complete content of Linux.do forum posts.

```typescript
{
  "url": string // URL from linuxdo results returned by the search tool
}
```

Usage example:

```typescript
use_mcp_tool({
  server_name: "web-search",
  tool_name: "fetchLinuxDoArticle",
  arguments: {
    url: "https://xxxx.json"
  }
})
```

Response example:

```json
[
  {
    "content": "Example article content"
  }
]
```

### fetchGithubReadme Tool Usage

Fetches README content from GitHub repositories.

```typescript
{
  "url": string // GitHub repository URL (HTTPS and SSH formats supported)
}
```

Usage example:

```typescript
use_mcp_tool({
  server_name: "web-search",
  tool_name: "fetchGithubReadme",
  arguments: {
    url: "https://github.com/Aas-ee/open-webSearch"
  }
})
```

Supported URL formats:

- HTTPS: `https://github.com/owner/repo`
- HTTPS with .git: `https://github.com/owner/repo.git`
- SSH: `git@github.com:owner/repo.git`
- URLs with parameters: `https://github.com/owner/repo?tab=readme`
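
As an illustration of these formats, a hypothetical normalizer (not the server's actual code) could reduce each of them to `owner/repo`:

```typescript
// Extract "owner/repo" from the supported GitHub URL formats:
// HTTPS, HTTPS with .git, SSH, and URLs with query parameters.
function parseGithubRepo(url: string): string | null {
  const ssh = url.match(/^git@github\.com:([^/]+)\/(.+?)(?:\.git)?$/);
  if (ssh) return `${ssh[1]}/${ssh[2]}`;
  const https = url.match(
    /^https:\/\/github\.com\/([^/]+)\/([^/?#]+?)(?:\.git)?(?:[?#].*)?$/,
  );
  if (https) return `${https[1]}/${https[2]}`;
  return null;
}

console.log(parseGithubRepo("https://github.com/Aas-ee/open-webSearch"));
// prints Aas-ee/open-webSearch
```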
Response example:

```json
[
  {
    "content": "<div align=\"center\">\n\n# Open-WebSearch MCP Server..."
  }
]
```

### fetchWebContent Tool Usage

Fetches content directly from public HTTP(S) links, including Markdown files (`.md`) and ordinary web pages.

```typescript
{
  "url": string,     // Public HTTP(S) URL
  "maxChars": number // Optional: max returned content length (1000-200000, default 30000)
}
```
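
If you want to validate `maxChars` on the client side before calling the tool, the documented range suggests logic along these lines (this assumes clamping; the server may instead reject out-of-range values, so treat this as a sketch, not the server's behavior):

```typescript
// Hypothetical client-side normalization for maxChars
// (range 1000-200000, default 30000, per the schema above).
function normalizeMaxChars(maxChars?: number): number {
  if (maxChars === undefined) return 30000; // documented default
  return Math.min(200000, Math.max(1000, Math.floor(maxChars)));
}

console.log(normalizeMaxChars());    // 30000
console.log(normalizeMaxChars(500)); // 1000
```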
Usage example:

```typescript
use_mcp_tool({
  server_name: "web-search",
  tool_name: "fetchWebContent",
  arguments: {
    url: "https://raw.githubusercontent.com/Aas-ee/open-webSearch/main/README.md",
    maxChars: 12000
  }
})
```

Response example:

```json
{
  "url": "https://raw.githubusercontent.com/Aas-ee/open-webSearch/main/README.md",
  "finalUrl": "https://raw.githubusercontent.com/Aas-ee/open-webSearch/main/README.md",
  "contentType": "text/plain; charset=utf-8",
  "title": "",
  "truncated": false,
  "content": "# Open-WebSearch MCP Server ..."
}
```

### fetchJuejinArticle Tool Usage

Fetches the complete content of Juejin articles.

```typescript
{
  "url": string // Juejin article URL from search results
}
```

Usage example:

```typescript
use_mcp_tool({
  server_name: "web-search",
  tool_name: "fetchJuejinArticle",
  arguments: {
    url: "https://juejin.cn/post/7520959840199360563"
  }
})
```

Supported URL format:

- `https://juejin.cn/post/{article_id}`
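
The format above implies the article id can be pulled out with a simple match (a hypothetical helper for illustration, not part of the server):

```typescript
// Extract article_id from https://juejin.cn/post/{article_id}
function parseJuejinArticleId(url: string): string | null {
  const m = url.match(/^https:\/\/juejin\.cn\/post\/(\d+)/);
  return m ? m[1] : null;
}

console.log(parseJuejinArticleId("https://juejin.cn/post/7520959840199360563"));
// prints 7520959840199360563
```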
Response example:

```json
[
  {
    "content": "🚀 开源 AI 联网搜索工具:Open-WebSearch MCP 全新升级,支持多引擎 + 流式响应..."
  }
]
```

## Usage Limitations

Since this tool works by scraping search results from multiple engines, please note the following important limitations:

1. **Rate Limiting**:
   - Too many searches in a short time may cause the engines in use to temporarily block requests
   - Recommendations:
     - Maintain a reasonable search frequency
     - Use the `limit` parameter judiciously
     - Add delays between searches when necessary
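
The "add delays" recommendation can be implemented on the client side with a minimal throttle; the sketch below is illustrative and not part of this project:

```typescript
// Compute how long to wait before the next search so that calls are at
// least `minIntervalMs` apart. Pure function for easy testing; pass
// Date.now() as `now` in real use.
function waitBeforeNextSearch(
  lastSearchAt: number | null,
  now: number,
  minIntervalMs: number,
): number {
  if (lastSearchAt === null) return 0; // first call: no wait
  const elapsed = now - lastSearchAt;
  return elapsed >= minIntervalMs ? 0 : minIntervalMs - elapsed;
}

// e.g. last search 400 ms ago with a 1 s minimum interval:
console.log(waitBeforeNextSearch(1000, 1400, 1000)); // prints 600
```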
2. **Result Accuracy**:
   - Results depend on each engine's HTML structure and may break when an engine updates its pages
   - Some results may lack metadata such as descriptions
   - Complex search operators may not work as expected

3. **Legal Terms**:
   - This tool is for personal use only
   - Please comply with the terms of service of the engines you use
   - Implement appropriate rate limiting based on your actual use case

4. **Search Engine Configuration**:
   - The default search engine can be set via the `DEFAULT_SEARCH_ENGINE` environment variable
   - Supported engines: bing, duckduckgo, exa, brave, baidu, csdn, juejin
   - The default engine is used when a search does not explicitly specify engines

5. **Proxy Configuration**:
   - An HTTP proxy can be configured when certain search engines are unavailable in your region
   - Enable the proxy with `USE_PROXY=true`
   - Set the proxy server address with `PROXY_URL`

## Contributing

Issue reports and feature suggestions are welcome!

### Contributor Guide

If you want to fork this repository and publish your own Docker image, make the following configurations:

#### GitHub Secrets Configuration

To enable automatic Docker image building and publishing, add the following secrets in your GitHub repository settings (Settings → Secrets and variables → Actions):

**Required Secrets:**
- `GITHUB_TOKEN`: Automatically provided by GitHub (no setup needed)

**Optional Secrets (for Alibaba Cloud ACR):**
- `ACR_REGISTRY`: Your Alibaba Cloud Container Registry URL (e.g., `registry.cn-hangzhou.aliyuncs.com`)
- `ACR_USERNAME`: Your Alibaba Cloud ACR username
- `ACR_PASSWORD`: Your Alibaba Cloud ACR password
- `ACR_IMAGE_NAME`: Your image name in ACR (e.g., `your-namespace/open-web-search`)

#### CI/CD Workflow

The repository includes a GitHub Actions workflow (`.github/workflows/docker.yml`) that automatically handles:

1. **Trigger Conditions**:
   - Push to the `main` branch
   - Push of version tags (`v*`)
   - Manual workflow trigger

2. **Build and Push to**:
   - GitHub Container Registry (ghcr.io) - always enabled
   - Alibaba Cloud Container Registry - enabled only when ACR secrets are configured

3. **Image Tags**:
   - `ghcr.io/your-username/open-web-search:latest`
   - `your-acr-address/your-image-name:latest` (if ACR is configured)

#### Fork and Publish Steps

1. **Fork the repository** to your GitHub account
2. **Configure secrets** (if you need ACR publishing):
   - Go to Settings → Secrets and variables → Actions in your forked repository
   - Add the ACR-related secrets listed above
3. **Push changes** to the `main` branch or create version tags
4. **GitHub Actions will automatically build and push** your Docker image
5. **Use your image** by updating the Docker command:
   ```bash
   docker run -d --name web-search -p 3000:3000 -e ENABLE_CORS=true -e CORS_ORIGIN=* ghcr.io/your-username/open-web-search:latest
   ```

#### Notes

- If you don't configure ACR secrets, the workflow will only publish to GitHub Container Registry
- Make sure your GitHub repository has Actions enabled
- The workflow uses your GitHub username (converted to lowercase) as the GHCR image namespace

<div align="center">

## Star History

If you find this project helpful, please consider giving it a ⭐ Star!

[Star History Chart](https://www.star-history.com/#Aas-ee/open-webSearch&Date)

</div>