ZainabEman's picture
Upload 2 files
e146eba verified
# Customizable Web Scraper
## Overview
The **Customizable Web Scraper** is a lightweight Python tool that allows users to extract specific elements from any webpage using a simple graphical interface. Built with **Streamlit**, **BeautifulSoup**, and **Pandas**, this tool enables users to analyze HTML structure, select relevant tags, and download the extracted data in CSV format.
## Features
βœ… **User-friendly Streamlit interface**
πŸ” **Automatic detection of available HTML tags**
πŸ“Œ **Custom tag selection** (`h1`, `h2`, `p`, `a`, `img`, `ul`, etc.)
πŸ“Š **Displays scraped data in a structured table**
πŸ“₯ **Download extracted data as a CSV file**
## Installation
### Prerequisites
Ensure you have **Python 3.x** installed on your system.
### Steps
1. Clone this repository or download the script:
```sh
git clone https://github.com/your-repository/Customizable-Scraper.git
cd Customizable-Scraper
```
2. Install the required dependencies:
```sh
pip install streamlit requests beautifulsoup4 pandas
```
3. Run the Streamlit app:
```sh
streamlit run app.py
```
## Usage
1. **Enter a URL**: Provide the webpage link you want to scrape.
2. **Analyze the page**: The scraper will identify available HTML tags.
3. **Select tags**: Choose which elements (headings, paragraphs, links, images, lists, etc.) to extract.
4. **Scrape Data**: Click the **"Scrape Data"** button to fetch and display the extracted content.
5. **Download CSV**: Export the scraped data as a CSV file for offline use.
## Technologies Used
- **Streamlit** – Interactive UI for user-friendly operation
- **Requests** – Fetching webpage content
- **BeautifulSoup4** – Parsing and extracting HTML elements
- **Pandas** – Structuring and exporting scraped data
## Limitations
⚠️ This scraper **cannot**:
- Extract data from **JavaScript-rendered content**
- Access **login-restricted** or **protected** pages
- Scrape sites that block requests in **robots.txt**
## License
This project is **open-source** and available for personal and educational use.
## Contributions
πŸ”Ή Contributions are welcome!
If you’d like to improve this project, feel free to fork the repository, make enhancements, and submit a **Pull Request**.