	# Customizable Web Scraper

	## Overview
	The Customizable Web Scraper is a lightweight Python tool for extracting specific elements from static webpages through a simple graphical interface. Built with Streamlit, Requests, BeautifulSoup, and Pandas, it lets users analyze a page's HTML structure, select the tags they need, and download the extracted data as a CSV file.

	## Features
	- ✅ User-friendly Streamlit interface
	- 🔍 Automatic detection of available HTML tags
	- 📌 Custom tag selection (`h1`, `h2`, `p`, `a`, `img`, `ul`, etc.)
	- 📊 Displays scraped data in a structured table
	- 📥 Download extracted data as a CSV file
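The README does not say how tag detection is implemented. Below is a minimal sketch, assuming "detection" means counting which tag names appear in the fetched HTML. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra installs; the names `TagCounter` and `detect_tags` are hypothetical. With BeautifulSoup, a similar count could be built by iterating over `soup.find_all(True)`, which returns every tag in the document.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every start tag seen while parsing an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.counts: Counter = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lowercase; self-closing tags
        # such as <img/> also reach this hook via handle_startendtag.
        self.counts[tag] += 1


def detect_tags(html: str) -> dict:
    """Return a mapping of tag name -> number of occurrences, in first-seen order."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return dict(parser.counts)


sample = "<html><body><h1>Title</h1><p>One</p><p>Two <a href='/x'>link</a></p><img src='a.png'></body></html>"
print(detect_tags(sample))
# {'html': 1, 'body': 1, 'h1': 1, 'p': 2, 'a': 1, 'img': 1}
```

The resulting tag names are the kind of list an interface could offer as selectable options.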

	## Installation

	### Prerequisites
	Ensure you have Python 3.x installed on your system.

	### Steps
	1. Clone this repository or download the script:
	```sh
	git clone https://github.com/your-repository/Customizable-Scraper.git
	cd Customizable-Scraper
	```
	2. Install the required dependencies:
	```sh
	pip install streamlit requests beautifulsoup4 pandas
	```
	3. Run the Streamlit app:
	```sh
	streamlit run app.py
	```
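If `streamlit run app.py` fails with an import error, a quick check like the sketch below can show which dependency is missing. It is a hypothetical helper (for example saved as `check_deps.py`), not part of the repository. Note that the `beautifulsoup4` package is imported as `bs4`, so pip names and import names are mapped separately. It relies only on the standard library's `importlib.util.find_spec`, which returns `None` when a module cannot be found.

```python
import importlib.util

# pip package name -> module name used in `import` (they differ for beautifulsoup4)
REQUIRED = {
    "streamlit": "streamlit",
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "pandas": "pandas",
}


def missing_packages(required: dict) -> list:
    """Return the pip names of packages whose import name cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
        print("Install them with: pip install " + " ".join(missing))
    else:
        print("All dependencies found; start the app with: streamlit run app.py")
```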

	## Usage

	1. Enter a URL: Provide the webpage link you want to scrape.
	2. Analyze the page: The scraper will identify available HTML tags.
	3. Select tags: Choose which elements (headings, paragraphs, links, images, lists, etc.) to extract.
	4. Scrape Data: Click the "Scrape Data" button to fetch and display the extracted content.
	5. Download CSV: Export the scraped data as a CSV file for offline use.
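Steps 3 to 5 can be sketched roughly as follows, assuming each occurrence of a selected tag becomes one table row with hypothetical columns `tag`, `text`, and `url` (the `href` of a link or the `src` of an image). This is not the app's actual code: the standard library's `html.parser` and `csv` modules stand in for BeautifulSoup and Pandas so the example is self-contained, and fetching the page (step 1, done with Requests in the app) is left out by passing an HTML string directly.

```python
import csv
import io
from html.parser import HTMLParser

# Elements that have no text content and no end tag.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class TagExtractor(HTMLParser):
    """Collects one row per selected element: tag name, text, and link/image URL."""

    def __init__(self, selected: set) -> None:
        super().__init__()
        self.selected = selected
        self.rows = []        # all extracted rows, in document order
        self.open_rows = []   # selected elements whose end tag has not been seen yet

    def handle_starttag(self, tag, attrs):
        if tag not in self.selected:
            return
        attr_map = dict(attrs)
        # Assumed column choice: href for links, src for images, else empty.
        url = attr_map.get("href") or attr_map.get("src") or ""
        row = {"tag": tag, "text": "", "url": url}
        self.rows.append(row)
        if tag not in VOID_TAGS:
            self.open_rows.append(row)

    def handle_endtag(self, tag):
        # Close the innermost open element with this name.
        for i in range(len(self.open_rows) - 1, -1, -1):
            if self.open_rows[i]["tag"] == tag:
                del self.open_rows[i]
                break

    def handle_data(self, data):
        # Text belongs to every selected element still open (e.g. a <p> and its <a>).
        for row in self.open_rows:
            row["text"] += data


def scrape(html: str, selected: list) -> list:
    """Return one row per occurrence of the selected tags."""
    parser = TagExtractor(set(selected))
    parser.feed(html)
    parser.close()
    for row in parser.rows:
        row["text"] = " ".join(row["text"].split())  # collapse whitespace
    return parser.rows


def to_csv(rows: list) -> str:
    """Serialize rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["tag", "text", "url"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


page = "<h1>News</h1><p>First <a href='/a'>story</a></p><img src='logo.png'>"
print(to_csv(scrape(page, ["h1", "a", "img"])))
```

In a Pandas-based version, the same rows could be exported with `pandas.DataFrame(rows).to_csv(index=False)`, and Streamlit's `st.download_button` can offer the resulting string as a file download.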

	## Technologies Used
	- Streamlit – Interactive UI for user-friendly operation
	- Requests – Fetching webpage content
	- BeautifulSoup4 – Parsing and extracting HTML elements
	- Pandas – Structuring and exporting scraped data

	## Limitations
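	⚠️ This scraper cannot:
	- Extract data from JavaScript-rendered content
	- Access login-restricted or protected pages
	- Scrape sites that actively block automated requests

	Also check a site's robots.txt before scraping it. The file does not technically block requests, but it states which paths the site owner asks crawlers to avoid.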
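A site's robots.txt rules can be checked before scraping with the standard library's `urllib.robotparser`. The sketch below is not part of this project; it assumes the robots.txt text has already been downloaded (the file lives at the site root, e.g. `https://example.com/robots.txt`), and the helper name `allowed_by_robots` is hypothetical. `RobotFileParser.set_url()` followed by `read()` can fetch the file directly instead.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Check a URL against the rules in an already-downloaded robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


rules = """
User-agent: *
Disallow: /private/
"""
print(allowed_by_robots(rules, "https://example.com/blog/post"))     # True
print(allowed_by_robots(rules, "https://example.com/private/data"))  # False
```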

	## License
	This project is open-source and available for personal and educational use.

	## Contributions
	🔹 Contributions are welcome!
	If you’d like to improve this project, feel free to fork the repository, make enhancements, and submit a Pull Request.