---
title: web search MCP-server
sdk: gradio
colorFrom: green
colorTo: green
short_description: MCP server for general and custom search on web
sdk_version: 5.34.0
tags:
  - mcp-server-track
app_file: app.py
pinned: true
---
# Search Tool

## Overview

**Search Tool** is a modular Python framework for performing advanced web searches, scraping content from search results, and analyzing the retrieved information using AI-powered models. The project is designed for extensibility, allowing easy integration of new search engines, scrapers, and analyzers.

## Demo video

Link: https://drive.google.com/file/d/11bHRCr0tdAkCEtwKOiuzzfAp7RgZk-si/view?usp=sharing

## Features

- **Custom Site Search:** Search within a specified list of websites.
- **Custom Domain Search:** Restrict searches to specific domains (e.g., `.edu`, `.gov`).
- **General Web Search:** Perform open web searches.
- **Content Scraping:** Extract the main textual content from URLs using [trafilatura](https://trafilatura.readthedocs.io/).
- **AI Analysis:** Summarize and analyze scraped content using OpenAI models.
- **Validation:** Ensure URLs are valid before processing.
- **Extensible Architecture:** Easily add new searchers, scrapers, or analyzers.
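
The validation and domain-restriction features can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: `is_valid_url`, `filter_valid_urls`, and `build_site_query` are hypothetical names, and using the `site:` search operator to restrict results is an assumption about how the restriction could be expressed.

```python
from urllib.parse import urlparse


def is_valid_url(url: str) -> bool:
    """Return True if `url` is an absolute http(s) URL with a host."""
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse raises ValueError on some malformed inputs (e.g. an unclosed IPv6 literal).
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def filter_valid_urls(urls: list[str]) -> list[str]:
    """Keep only the URLs that pass the check, preserving order."""
    return [u for u in urls if is_valid_url(u)]


def build_site_query(query: str, sites: list[str]) -> str:
    """Append a `site:` filter (assumed operator syntax) for each site or domain."""
    if not sites:
        return query
    site_filter = " OR ".join(f"site:{s}" for s in sites)
    return f"{query} ({site_filter})"
```

For example, `build_site_query("climate data", [".edu", ".gov"])` produces `climate data (site:.edu OR site:.gov)`, and `filter_valid_urls` drops entries such as `example.com` that lack a scheme.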
## Project Structure

```
search_tool/
├── src/
│   ├── analyzer/          # AI-powered analyzers (e.g., OpenAI)
│   ├── core/
│   │   ├── factory/       # Factories for searcher, scraper, analyzer
│   │   ├── interface/     # Abstract interfaces for extensibility
│   │   └── types.py       # Enums and constants
│   ├── mcp_servers/       # MCP server integration
│   ├── models/            # Pydantic models for data validation
│   ├── scraper/           # Web scrapers (e.g., Trafilatura)
│   ├── searcher/          # Search engine integrations
│   ├── tools/             # User-facing tool functions
│   └── utils/             # Utility functions (e.g., URL validation)
├── test.py                # Example/test script
├── requirements.txt       # Python dependencies
├── pyproject.toml         # Project metadata and dependencies
├── .env                   # Environment variables (e.g., API keys)
└── README.md              # Project documentation
```

## Installation

1. **Clone the repository:**
   ```sh
   git clone https://github.com/ola172/web-search-mcp-server.git
   cd web-search-mcp-server
   ```
2. **Set up a virtual environment (recommended):**
   ```sh
   python3 -m venv .venv
   source .venv/bin/activate
   ```
3. **Install dependencies:**
   ```sh
   pip install -r requirements.txt
   ```
4. **Configure environment variables:**
   - Copy `.env.example` to `.env`.
   - Add your secrets (e.g., your OpenAI API key) to `.env`.

## Usage

### Core Tools

Each tool validates input, performs the search, scrapes the results, and analyzes the content.

- **General Web Search:** `search_on_web`
- **Custom Sites Search:** `search_custom_sites`
- **Custom Domains Search:** `search_custom_domain`
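
The validate, search, scrape, analyze flow shared by these tools can be sketched as a single function. This is an illustration under assumptions, not the project's implementation: `run_search_pipeline`, `PageAnalysis`, and the callable signatures are hypothetical, and the real tools may take different parameters and return different types. The search, scrape, and analyze steps are passed in as callables so the sketch runs without network access or API keys.

```python
from dataclasses import dataclass
from typing import Callable, Optional
from urllib.parse import urlparse


@dataclass
class PageAnalysis:
    """One analyzed search result (hypothetical result type)."""
    url: str
    summary: str


def run_search_pipeline(
    query: str,
    search: Callable[[str], list[str]],
    scrape: Callable[[str], Optional[str]],
    analyze: Callable[[str, str], str],
    max_results: int = 5,
) -> list[PageAnalysis]:
    """Validate the query, search, scrape each hit, and analyze the scraped text."""
    query = query.strip()
    if not query:
        raise ValueError("query must not be empty")

    results = []
    for url in search(query)[:max_results]:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue  # drop malformed URLs before scraping
        text = scrape(url)
        if not text:
            continue  # nothing extractable on this page
        results.append(PageAnalysis(url=url, summary=analyze(query, text)))
    return results
```

Pages that fail validation or yield no text are skipped rather than aborting the whole run, which keeps one bad result from discarding the others.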

### MCP Server Integration

The project includes an MCP server (`web_search_server.py`) that exposes the search tools as MCP tools.

## Extending the Framework

- **Add a new searcher:** Implement the `SearchInterface` and register it in `SearcherFactory`.
- **Add a new scraper:** Implement the `ScraperInterface` and register it in `ScraperFactory`.
- **Add a new analyzer:** Implement the `AnalyzerInterface` and register it in `AnalyzerFactory`.
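
The implement-and-register pattern might look like the sketch below for a searcher. Only the names `SearchInterface` and `SearcherFactory` come from this README; the `search` method signature, the `register` and `create` methods, and the `StaticSearcher` class are assumptions made for illustration, so check the code in `src/core/` for the actual shapes.

```python
from abc import ABC, abstractmethod


class SearchInterface(ABC):
    """Assumed shape of the searcher interface: a single `search` method."""

    @abstractmethod
    def search(self, query: str, max_results: int = 5) -> list[str]:
        """Return result URLs for `query`."""


class SearcherFactory:
    """Registry-style factory mapping a searcher name to its class (assumed design)."""

    _registry: dict[str, type[SearchInterface]] = {}

    @classmethod
    def register(cls, name: str, searcher_cls: type) -> None:
        # Reject classes that do not implement the interface.
        if not issubclass(searcher_cls, SearchInterface):
            raise TypeError(f"{searcher_cls.__name__} must implement SearchInterface")
        cls._registry[name] = searcher_cls

    @classmethod
    def create(cls, name: str, **kwargs) -> SearchInterface:
        if name not in cls._registry:
            raise ValueError(f"Unknown searcher: {name!r}")
        return cls._registry[name](**kwargs)


class StaticSearcher(SearchInterface):
    """Toy searcher returning a fixed list; stands in for a real engine."""

    def __init__(self, urls: list[str]):
        self.urls = urls

    def search(self, query: str, max_results: int = 5) -> list[str]:
        return self.urls[:max_results]


SearcherFactory.register("static", StaticSearcher)
```

Callers then obtain a searcher by name, for example `SearcherFactory.create("static", urls=[...])`, without importing the concrete class. A scraper or analyzer would follow the same pattern with its own interface and factory.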

## Configuration

- **API Keys:** Store sensitive keys (e.g., OpenAI) in the `.env` file.
- **Search Engine IDs:** For Google Custom Search, configure `API_KEY` and `SEARCH_ENGINE_ID` in the relevant modules.
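
One way to fail fast on missing configuration is to check the required values once at startup. The sketch below is an assumption-laden example: `load_settings` and `REQUIRED_KEYS` are hypothetical names, `API_KEY` and `SEARCH_ENGINE_ID` are taken from the text above, and `OPENAI_API_KEY` is the variable the `openai` client reads by default, though the project's `.env` may use different names.

```python
import os
from typing import Mapping

# Assumed variable names; adjust to match the project's .env file.
REQUIRED_KEYS = ("OPENAI_API_KEY", "API_KEY", "SEARCH_ENGINE_ID")


def load_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect the required settings, raising one error that lists every missing key."""
    missing = [k for k in REQUIRED_KEYS if not env.get(k, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {k: env[k].strip() for k in REQUIRED_KEYS}
```

With `python-dotenv` installed, calling `load_dotenv()` before `load_settings()` copies the values from `.env` into the process environment first.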

## Dependencies

- `openai`
- `trafilatura`
- `pydantic`
- `googlesearch-python`
- `python-dotenv`
- `google-api-python-client`

See `requirements.txt` for the full list.

## License

This project is for educational and research purposes. Please ensure compliance with the terms of service of any third-party APIs used.

## Acknowledgements

- OpenAI
- Trafilatura
- Google Custom Search

For questions or contributions, please open an issue or pull request.