# Customizable Web Scraper  

## Overview  
The **Customizable Web Scraper** is a lightweight Python tool for extracting specific elements from static webpages through a simple graphical interface. Built with **Streamlit**, **Requests**, **BeautifulSoup**, and **Pandas**, it lets users analyze a page's HTML structure, select the relevant tags, and download the extracted data in CSV format.  

## Features  
✅ **User-friendly Streamlit interface**  
🔍 **Automatic detection of available HTML tags**  
📌 **Custom tag selection** (`h1`, `h2`, `p`, `a`, `img`, `ul`, etc.)  
📊 **Displays scraped data in a structured table**  
📥 **Download extracted data as a CSV file**  
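
Tag detection can be approximated by parsing the page and counting each tag name it contains. The sketch below is an illustration, not the project's `app.py`: the function name `detect_tags` is hypothetical, and it assumes BeautifulSoup's built-in `html.parser` backend.

```python
from collections import Counter

from bs4 import BeautifulSoup


def detect_tags(html: str) -> Counter:
    """Count how often each HTML tag name appears in a document (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all(True) matches every tag in the parse tree
    return Counter(tag.name for tag in soup.find_all(True))


if __name__ == "__main__":
    sample = "<html><body><h1>Title</h1><p>One</p><p>Two</p><a href='/x'>Link</a></body></html>"
    for name, count in detect_tags(sample).most_common():
        print(name, count)
```

The resulting counts are a natural source for the list of tags a user can pick from.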

## Installation  

### Prerequisites  
Ensure you have **Python 3.x** installed on your system.  

### Steps  
1. Clone this repository or download the script:  
   ```sh
   git clone https://github.com/your-repository/Customizable-Scraper.git
   cd Customizable-Scraper
   ```  
2. Install the required dependencies:  
   ```sh
   pip install streamlit requests beautifulsoup4 pandas
   ```  
3. Run the Streamlit app:  
   ```sh
   streamlit run app.py
   ```  

## Usage  

1. **Enter a URL**: Provide the webpage link you want to scrape.  
2. **Analyze the page**: The scraper will identify available HTML tags.  
3. **Select tags**: Choose which elements (headings, paragraphs, links, images, lists, etc.) to extract.  
4. **Scrape Data**: Click the **"Scrape Data"** button to fetch and display the extracted content.  
5. **Download CSV**: Export the scraped data as a CSV file for offline use.  
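
The select, extract, tabulate, and export steps above can be sketched with BeautifulSoup and Pandas. This is a hypothetical illustration rather than the code in `app.py`: the name `scrape_tags`, the two-column `tag`/`content` layout, and the choice to keep `href` for links and `src` for images (instead of their text) are all assumptions.

```python
from typing import Sequence

import pandas as pd
from bs4 import BeautifulSoup

# Assumption: for these tags, an attribute is more useful than the visible text.
ATTRIBUTE_FOR_TAG = {"a": "href", "img": "src"}


def scrape_tags(html: str, selected_tags: Sequence[str]) -> pd.DataFrame:
    """Return one row per matched element, in document order (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Passing a list to find_all matches any of the listed tag names
    for tag in soup.find_all(list(selected_tags)):
        attr = ATTRIBUTE_FOR_TAG.get(tag.name)
        content = tag.get(attr, "") if attr else tag.get_text(" ", strip=True)
        rows.append({"tag": tag.name, "content": content})
    # Fixed columns keep the CSV header stable even when nothing matched
    return pd.DataFrame(rows, columns=["tag", "content"])


if __name__ == "__main__":
    sample = (
        "<h1>News</h1><p>First paragraph.</p>"
        '<a href="https://example.com/story">Read more</a>'
        '<img src="/logo.png" alt="Logo">'
    )
    df = scrape_tags(sample, ["h1", "a", "img"])
    print(df)
    # With no path argument, to_csv returns the CSV text as a string
    print(df.to_csv(index=False))
```

In a Streamlit app, a CSV string like this can be passed as the `data` argument of `st.download_button`.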

## Technologies Used  
- **Streamlit** – Interactive UI for user-friendly operation  
- **Requests** – Fetching webpage content  
- **BeautifulSoup4** – Parsing and extracting HTML elements  
- **Pandas** – Structuring and exporting scraped data  
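
Requests returns only the HTML the server sends, before any JavaScript runs. A minimal fetch helper might look like the sketch below; the name `fetch_html`, the 10-second timeout, and the User-Agent string are assumptions, not values taken from `app.py`.

```python
import requests

# Hypothetical header: the README does not say which User-Agent app.py sends.
DEFAULT_HEADERS = {"User-Agent": "Customizable-Scraper/0.1 (educational use)"}


def fetch_html(url: str, session=None, timeout: float = 10.0) -> str:
    """Download a page and return its raw, non-JavaScript-rendered HTML (hypothetical helper)."""
    # Use a caller-supplied requests.Session if given, else the module-level requests.get
    http = session if session is not None else requests
    response = http.get(url, headers=DEFAULT_HEADERS, timeout=timeout)
    # Raise on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.text
```

Accepting an optional session makes the helper easy to reuse across several requests and to test without network access.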

## Limitations  
โš ๏ธ This scraper **cannot**:  
- Extract data from **JavaScript-rendered content**  
- Access **login-restricted** or **protected** pages  
- Scrape pages that a site's **robots.txt** disallows  
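
The README does not say how `app.py` handles robots.txt. One way to honor it is Python's standard-library `urllib.robotparser`, sketched below; the helper names `robots_url` and `is_allowed` are hypothetical, and the sketch parses already-downloaded rules so it runs offline.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def robots_url(page_url: str) -> str:
    """Build the robots.txt URL for the site hosting page_url (hypothetical helper)."""
    parts = urlparse(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def is_allowed(robots_txt: str, page_url: str, user_agent: str = "*") -> bool:
    """Check page_url against robots.txt content (hypothetical helper).

    To fetch the live file instead, call parser.set_url(robots_url(page_url))
    followed by parser.read().
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, page_url)


if __name__ == "__main__":
    rules = "User-agent: *\nDisallow: /private/\n"
    print(robots_url("https://example.com/news/today?id=1"))  # https://example.com/robots.txt
    print(is_allowed(rules, "https://example.com/private/page"))  # False
    print(is_allowed(rules, "https://example.com/news/today"))  # True
```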

## License  
This project is **open-source** and available for personal and educational use.  

## Contributions  
🔹 Contributions are welcome!  
If you’d like to improve this project, feel free to fork the repository, make enhancements, and submit a **Pull Request**.