| # Customizable Web Scraper | |
| ## Overview | |
| The **Customizable Web Scraper** is a lightweight Python tool that allows users to extract specific elements from any webpage using a simple graphical interface. Built with **Streamlit**, **BeautifulSoup**, and **Pandas**, this tool enables users to analyze HTML structure, select relevant tags, and download the extracted data in CSV format. | |
| ## Features | |
| β **User-friendly Streamlit interface** | |
| π **Automatic detection of available HTML tags** | |
| π **Custom tag selection** (`h1`, `h2`, `p`, `a`, `img`, `ul`, etc.) | |
| π **Displays scraped data in a structured table** | |
| π₯ **Download extracted data as a CSV file** | |
| ## Installation | |
| ### Prerequisites | |
| Ensure you have **Python 3.x** installed on your system. | |
| ### Steps | |
| 1. Clone this repository or download the script: | |
| ```sh | |
| git clone https://github.com/your-repository/Customizable-Scraper.git | |
| cd Customizable-Scraper | |
| ``` | |
| 2. Install the required dependencies: | |
| ```sh | |
| pip install streamlit requests beautifulsoup4 pandas | |
| ``` | |
| 3. Run the Streamlit app: | |
| ```sh | |
| streamlit run app.py | |
| ``` | |
| ## Usage | |
| 1. **Enter a URL**: Provide the webpage link you want to scrape. | |
| 2. **Analyze the page**: The scraper will identify available HTML tags. | |
| 3. **Select tags**: Choose which elements (headings, paragraphs, links, images, lists, etc.) to extract. | |
| 4. **Scrape Data**: Click the **"Scrape Data"** button to fetch and display the extracted content. | |
| 5. **Download CSV**: Export the scraped data as a CSV file for offline use. | |
| ## Technologies Used | |
| - **Streamlit** β Interactive UI for user-friendly operation | |
| - **Requests** β Fetching webpage content | |
| - **BeautifulSoup4** β Parsing and extracting HTML elements | |
| - **Pandas** β Structuring and exporting scraped data | |
| ## Limitations | |
| β οΈ This scraper **cannot**: | |
| - Extract data from **JavaScript-rendered content** | |
| - Access **login-restricted** or **protected** pages | |
| - Scrape sites that block requests in **robots.txt** | |
| ## License | |
| This project is **open-source** and available for personal and educational use. | |
| ## Contributions | |
| πΉ Contributions are welcome! | |
| If youβd like to improve this project, feel free to fork the repository, make enhancements, and submit a **Pull Request**. | |