Customizable Web Scraper
Overview
The Customizable Web Scraper is a lightweight Python tool for extracting user-selected HTML elements from a webpage through a simple graphical interface. Built with Streamlit, BeautifulSoup, and Pandas, it lets users analyze a page's HTML structure, select the relevant tags, and download the extracted data in CSV format.
Features
- User-friendly Streamlit interface
- Automatic detection of available HTML tags
- Custom tag selection (h1, h2, p, a, img, ul, etc.)
- Displays scraped data in a structured table
- Download extracted data as a CSV file
Installation
Prerequisites
Ensure you have Python 3.x installed on your system.
Steps
- Clone this repository or download the script:
    git clone https://github.com/your-repository/Customizable-Scraper.git
    cd Customizable-Scraper
- Install the required dependencies:
    pip install streamlit requests beautifulsoup4 pandas
- Run the Streamlit app:
    streamlit run app.py
Usage
- Enter a URL: Provide the webpage link you want to scrape.
- Analyze the page: The scraper will identify available HTML tags.
- Select tags: Choose which elements (headings, paragraphs, links, images, lists, etc.) to extract.
- Scrape Data: Click the "Scrape Data" button to fetch and display the extracted content.
- Download CSV: Export the scraped data as a CSV file for offline use.
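The analyze, scrape, and export steps above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: the function names `detect_tags` and `scrape_tags`, the two-column layout, and the inline sample HTML are all assumptions made for the example. In the real tool the HTML would presumably come from something like `requests.get(url, timeout=10).text`.

```python
from collections import Counter

import pandas as pd
from bs4 import BeautifulSoup

# Inline sample page so the sketch runs without network access.
SAMPLE_HTML = """
<html><body>
  <h1>Example Title</h1>
  <p>First paragraph.</p>
  <p>Second paragraph.</p>
  <a href="https://example.com/page">A link</a>
  <img src="/logo.png" alt="Logo">
</body></html>
"""


def detect_tags(html: str) -> Counter:
    """Count every tag name in the page (hypothetical 'analyze' step)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches every tag in the document.
    return Counter(tag.name for tag in soup.find_all(True))


def scrape_tags(html: str, selected: list[str]) -> pd.DataFrame:
    """Build one row per matching element (hypothetical 'scrape' step)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Passing a list to find_all matches any of the listed tag names.
    for element in soup.find_all(selected):
        # Links and images keep their useful data in attributes, not text.
        if element.name == "a":
            content = element.get("href", "")
        elif element.name == "img":
            content = element.get("src", "")
        else:
            content = element.get_text(strip=True)
        rows.append({"tag": element.name, "content": content})
    return pd.DataFrame(rows, columns=["tag", "content"])


tag_counts = detect_tags(SAMPLE_HTML)
df = scrape_tags(SAMPLE_HTML, ["h1", "p", "a"])

# With no path argument, to_csv returns the CSV as a string,
# which is convenient for feeding a download button.
csv_text = df.to_csv(index=False)
print(tag_counts)
print(csv_text)
```

Keeping detection and extraction as separate functions mirrors the two-stage workflow in the interface: the tag counts can populate the selection widget, and only the chosen tags are passed to the extraction step.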
Technologies Used
- Streamlit: Interactive UI for user-friendly operation
- Requests: Fetching webpage content
- BeautifulSoup4: Parsing and extracting HTML elements
- Pandas: Structuring and exporting scraped data
Limitations
This scraper cannot:
- Extract data from JavaScript-rendered content
- Access login-restricted or protected pages
- Scrape sites that disallow automated access in their robots.txt
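The README does not say whether the tool itself checks robots.txt before fetching a page. As a hedged sketch of how such a check could be done, the example below uses the standard library's `urllib.robotparser`. The helper `is_allowed` and the sample rules are hypothetical; a real check would first download the site's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Sample rules for illustration only; a real site's file would be fetched.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


allowed_public = is_allowed(SAMPLE_ROBOTS, "https://example.com/index.html")
allowed_private = is_allowed(SAMPLE_ROBOTS, "https://example.com/private/data.html")
print(allowed_public, allowed_private)
```

Running a check like this before the request would let the interface show a clear message instead of attempting a fetch the site has asked crawlers to avoid.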
License
This project is open-source and available for personal and educational use.
Contributions
Contributions are welcome!
If you'd like to improve this project, feel free to fork the repository, make enhancements, and submit a Pull Request.