<!-- Use this file to provide workspace-specific custom instructions to Copilot. For more details, visit https://code.visualstudio.com/docs/copilot/copilot-customization#_use-a-githubcopilotinstructionsmd-file -->

# Web Scraper Project Instructions

This is a Python Gradio application for web scraping that:

- Scrapes text content from websites
- Formats content as markdown
- Generates sitemaps from page links
- Provides MCP (Model Context Protocol) server functionality

## Key Libraries

- gradio[mcp]: For the web interface and MCP server capabilities
- requests: For HTTP requests
- beautifulsoup4: For HTML parsing
- markdownify: For converting HTML to markdown
- urllib.parse: For URL handling

## Project Structure

- `app.py`: Main web interface application
- `mcp_server.py`: MCP server that exposes tools for AI integration

## MCP Tools

The MCP server exposes three main tools:

- `scrape_content`: Extract website content as markdown
- `generate_sitemap`: Create sitemap from page links
- `analyze_website`: Complete analysis with content and sitemap
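
To illustrate what "create sitemap from page links" can look like, here is a minimal sketch that turns already-fetched HTML into a markdown sitemap. It is not the project's implementation: `LinkCollector` and `sitemap_from_html` are hypothetical names, the internal/external grouping is an assumption, and the standard library's `html.parser` stands in for beautifulsoup4 so the example runs without extra dependencies.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (hypothetical helper)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def sitemap_from_html(html: str, base_url: str) -> str:
    """Build a markdown sitemap from the links found in a page.

    Args:
        html: Raw HTML of the page.
        base_url: URL the page was fetched from, used to resolve relative links.

    Returns:
        Markdown text listing internal links, then external links.
    """
    collector = LinkCollector()
    collector.feed(html)

    base_host = urlparse(base_url).netloc
    internal: list[str] = []
    external: list[str] = []
    seen: set[str] = set()

    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, and similar links
        if absolute in seen:
            continue
        seen.add(absolute)
        if urlparse(absolute).netloc == base_host:
            internal.append(absolute)
        else:
            external.append(absolute)

    lines = ["# Sitemap", "", "## Internal links"]
    lines += [f"- {url}" for url in internal]
    lines += ["", "## External links"]
    lines += [f"- {url}" for url in external]
    return "\n".join(lines)
```

Keeping the parsing step separate from the HTTP request makes the function easy to test with a literal HTML string, and a tool such as `analyze_website` could reuse the same fetched page for both content and sitemap.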

## Code Style

- Use type hints where appropriate
- Include proper error handling for web requests
- Follow PEP 8 style guidelines
- Add docstrings for functions with clear parameter descriptions
- Give MCP functions descriptive docstrings, because the docstrings become the tool descriptions
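
As a sketch of how these guidelines can combine in one function, the example below shows type hints, a docstring with parameter descriptions, and error handling around a web request. The name `fetch_page`, the `User-Agent` value, and the convention of returning an `"Error: ..."` string are assumptions made for illustration; the sketch also uses the standard library's `urllib.request` in place of `requests` so that it runs without extra dependencies.

```python
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Fetch a web page and return its HTML as text.

    Args:
        url: Absolute http(s) URL of the page to fetch.
        timeout: Seconds to wait for the server before giving up.

    Returns:
        The decoded response body, or a message starting with "Error:"
        if the URL is invalid or the request fails.
    """
    # Validate before touching the network so bad input fails fast.
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return f"Error: invalid URL {url!r}; expected an absolute http(s) URL"

    request = Request(url, headers={"User-Agent": "web-scraper-sketch/0.1"})
    try:
        with urlopen(request, timeout=timeout) as response:
            charset = response.headers.get_content_charset() or "utf-8"
            return response.read().decode(charset, errors="replace")
    except OSError as exc:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        return f"Error: could not fetch {url}: {exc}"
```

Returning an error message instead of raising is one possible choice for tool functions, since the caller (often an AI client) receives readable text either way; raising a specific exception would be equally consistent with the guidelines above.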