
Customizable Web Scraper

Overview

The Customizable Web Scraper is a lightweight Python tool that lets users extract specific elements from static webpages through a simple graphical interface. Built with Streamlit, Requests, BeautifulSoup, and Pandas, it analyzes a page's HTML structure, lets users select the relevant tags, and exports the extracted data as a CSV file.

Features

✅ User-friendly Streamlit interface
🔍 Automatic detection of available HTML tags
📌 Custom tag selection (h1, h2, p, a, img, ul, etc.)
📊 Displays scraped data in a structured table
📥 Download extracted data as a CSV file
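A minimal sketch of how automatic tag detection could work, assuming the page HTML is already available as a string and is parsed with BeautifulSoup's built-in "html.parser". The helper name available_tags is hypothetical; app.py may detect tags differently.

```python
from collections import Counter

from bs4 import BeautifulSoup


def available_tags(html):
    """Count how often each HTML tag name appears in a document.

    Hypothetical helper for illustration; not taken from app.py.
    """
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches every tag in the parse tree.
    return Counter(tag.name for tag in soup.find_all(True))


counts = available_tags("<h1>Title</h1><p>One</p><p>Two</p><a href='/x'>Link</a>")
print(counts.most_common())
```

Counting occurrences, rather than only collecting names, also shows the user which tags are worth selecting on a given page.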

Installation

Prerequisites

Ensure you have Python 3.x installed on your system.

Steps

  1. Clone this repository or download the script:
    git clone https://github.com/your-repository/Customizable-Scraper.git
    cd Customizable-Scraper
    
  2. Install the required dependencies:
    pip install streamlit requests beautifulsoup4 pandas
    
  3. Run the Streamlit app:
    streamlit run app.py
    
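If "streamlit run app.py" fails with an import error, Python's importlib.util.find_spec offers a quick way to see which dependencies from step 2 are missing. This is a sketch; the REQUIRED mapping and the missing_packages helper are illustrative names. Note that beautifulsoup4 is installed under that name but imported as bs4.

```python
import importlib.util

# Map pip package names (as used in step 2) to their import names.
REQUIRED = {
    "streamlit": "streamlit",
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "pandas": "pandas",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose modules cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, install with: pip install " + " ".join(missing))
    else:
        print("All dependencies are installed.")
```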

Usage

  1. Enter a URL: Provide the webpage link you want to scrape.
  2. Analyze the page: The scraper will identify available HTML tags.
  3. Select tags: Choose which elements (headings, paragraphs, links, images, lists, etc.) to extract.
  4. Scrape Data: Click the "Scrape Data" button to fetch and display the extracted content.
  5. Download CSV: Export the scraped data as a CSV file for offline use.
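The steps above roughly map onto a fetch, parse, extract, and export pipeline. The following is a sketch under stated assumptions, not the code in app.py: the helper names, the User-Agent string, the 10-second timeout, and the choice to record href/src values for links and images are all illustrative.

```python
from typing import Sequence

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Illustrative User-Agent; some sites reject requests that lack one.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; CustomizableScraper-sketch)"}


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the raw HTML for the entered URL (static content only)."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def extract_tags(html: str, tags: Sequence[str]) -> pd.DataFrame:
    """Collect the selected tags, in document order, into a table."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for element in soup.find_all(list(tags)):
        if element.name == "img":
            # Images carry no text; keep the alt text and the image URL.
            text, link = element.get("alt", ""), element.get("src", "")
        elif element.name == "a":
            text, link = element.get_text(" ", strip=True), element.get("href", "")
        else:
            text, link = element.get_text(" ", strip=True), ""
        rows.append({"tag": element.name, "text": text, "link": link})
    return pd.DataFrame(rows, columns=["tag", "text", "link"])


sample = '<h1>News</h1><p>First <b>story</b></p><a href="/more">More</a>'
df = extract_tags(sample, ["h1", "p", "a"])
csv_text = df.to_csv(index=False)  # the text a "Download CSV" step would export
```

For a live page, pass fetch_html(url) instead of sample. Keeping fetching and extraction in separate functions makes the extraction logic testable without network access.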

Technologies Used

  • Streamlit – Interactive UI for user-friendly operation
  • Requests – Fetching webpage content
  • BeautifulSoup4 – Parsing and extracting HTML elements
  • Pandas – Structuring and exporting scraped data
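Pandas and Streamlit meet at the export step. In this sketch, which assumes the scraped rows are already in a DataFrame, DataFrame.to_csv called without a path returns the CSV as a string. That string can be encoded to bytes and passed to Streamlit's st.download_button. The helper name to_csv_bytes and the file name are hypothetical.

```python
import pandas as pd


def to_csv_bytes(df: pd.DataFrame) -> bytes:
    """Serialize a DataFrame to UTF-8 CSV bytes without the index column."""
    return df.to_csv(index=False).encode("utf-8")


df = pd.DataFrame({"tag": ["h1", "p"], "text": ["Título", "Body text"]})
data = to_csv_bytes(df)

# Inside a Streamlit app this would typically be wired to a button, e.g.:
# st.download_button("Download CSV", data=data,
#                    file_name="scraped_data.csv", mime="text/csv")
```

Passing index=False keeps pandas' row numbers out of the exported file.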

Limitations

โš ๏ธ This scraper cannot:

  • Extract data from JavaScript-rendered content
  • Access login-restricted or protected pages
  • Scrape sites that disallow scraping in their robots.txt
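Requests does not read robots.txt on its own, so honoring it is up to the caller. One way to check a URL before fetching it is Python's standard-library urllib.robotparser. The sketch below parses rules supplied as a list of lines so that it runs offline; the helper names robots_url and is_allowed are hypothetical.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Build the robots.txt URL for the site hosting page_url."""
    parts = urlparse(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def is_allowed(page_url, robots_lines, user_agent="*"):
    """Check page_url against robots.txt rules given as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, page_url)


rules = ["User-agent: *", "Disallow: /private/"]
print(is_allowed("https://example.com/public/page", rules))
print(is_allowed("https://example.com/private/data", rules))
```

Against a live site, RobotFileParser(robots_url(page_url)) followed by .read() downloads and parses the real file, after which can_fetch works the same way.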

License

This project is open-source and available for personal and educational use.

Contributions

🔹 Contributions are welcome!
If you’d like to improve this project, feel free to fork the repository, make enhancements, and submit a Pull Request.