Customizable Web Scraper

Overview

The Customizable Web Scraper is a lightweight Python tool for extracting specific elements from a webpage through a simple graphical interface. Built with Streamlit, Requests, BeautifulSoup, and Pandas, it lets users analyze a page's HTML structure, select the relevant tags, and download the extracted data as a CSV file.

Features

✅ User-friendly Streamlit interface
🔍 Automatic detection of available HTML tags
📌 Custom tag selection (h1, h2, p, a, img, ul, etc.)
📊 Displays scraped data in a structured table
📥 Download extracted data as a CSV file

Installation

Prerequisites

Ensure you have Python 3.x installed on your system.

Steps

  1. Clone this repository or download the script:
    git clone https://github.com/your-repository/Customizable-Scraper.git
    cd Customizable-Scraper
    
  2. Install the required dependencies:
    pip install streamlit requests beautifulsoup4 pandas
    
  3. Run the Streamlit app:
    streamlit run app.py
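
If the app fails to start, a quick check like the one below can confirm that the dependencies from step 2 are importable. This is a minimal sketch and not part of the project: `REQUIRED_MODULES` and `missing_modules` are hypothetical names, and the only project-specific detail assumed is the package list from the install command (note that `beautifulsoup4` is imported as `bs4`).

```python
import importlib.util
import sys

# Import names for the packages listed in the install step.
# The beautifulsoup4 package is imported under the name "bs4".
REQUIRED_MODULES = ["streamlit", "requests", "bs4", "pandas"]


def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python version:", sys.version.split()[0])
missing = missing_modules(REQUIRED_MODULES)
print("Missing modules:", ", ".join(missing) if missing else "none")
```

Any name reported as missing can be installed again with the `pip install` command from step 2.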
    

Usage

  1. Enter a URL: Provide the webpage link you want to scrape.
  2. Analyze the page: The scraper will identify available HTML tags.
  3. Select tags: Choose which elements (headings, paragraphs, links, images, lists, etc.) to extract.
  4. Scrape Data: Click the "Scrape Data" button to fetch and display the extracted content.
  5. Download CSV: Export the scraped data as a CSV file for offline use.
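
The analyze, select, scrape, and export steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the app's actual code: it swaps in the standard library's `html.parser` and `csv` modules in place of BeautifulSoup and Pandas so that it runs without extra installs, it works on an inline HTML string instead of fetching a URL, and it assumes well-formed markup. All names here (`TagCounter`, `TextExtractor`, `detect_tags`, `extract`, `to_csv`, `SAMPLE_HTML`) are hypothetical.

```python
import csv
import io
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag or text content.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class TagCounter(HTMLParser):
    """Counts every start tag, similar to the 'analyze the page' step."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        self.counts[tag] += 1


class TextExtractor(HTMLParser):
    """Collects one (tag, content) row per selected element."""

    def __init__(self, selected_tags):
        super().__init__()
        self.selected = set(selected_tags)
        self.rows = []    # finished (tag, content) pairs
        self._open = []   # stack of [tag, text fragments] for open selected elements

    def handle_starttag(self, tag, attrs):
        if tag not in self.selected:
            return
        if tag in VOID_TAGS:
            # Void elements carry no text; record the src attribute if present.
            self.rows.append((tag, dict(attrs).get("src") or ""))
        else:
            self._open.append([tag, []])

    def handle_data(self, data):
        # Text belongs to every selected element that is currently open.
        for _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            name, parts = self._open.pop()
            # Collapse runs of whitespace into single spaces.
            self.rows.append((name, " ".join("".join(parts).split())))


def detect_tags(html):
    """Return a mapping of tag name to number of occurrences."""
    parser = TagCounter()
    parser.feed(html)
    return dict(parser.counts)


def extract(html, selected_tags):
    """Return (tag, content) rows for the selected tags."""
    parser = TextExtractor(selected_tags)
    parser.feed(html)
    return parser.rows


def to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["tag", "content"])
    writer.writerows(rows)
    return buffer.getvalue()


SAMPLE_HTML = """
<html><body>
  <h1>Example Title</h1>
  <p>First paragraph.</p>
  <p>Second <a href="/more">paragraph</a>.</p>
</body></html>
"""

print(detect_tags(SAMPLE_HTML))
print(to_csv(extract(SAMPLE_HTML, ["h1", "p"])))
```

In the real app, the tag counts would populate the tag selector in the Streamlit interface, and the rows would be shown as a table before being offered as a CSV download.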

Technologies Used

  • Streamlit – Interactive UI for user-friendly operation
  • Requests – Fetching webpage content
  • BeautifulSoup4 – Parsing and extracting HTML elements
  • Pandas – Structuring and exporting scraped data

Limitations

⚠️ This scraper cannot:

  • Extract data from JavaScript-rendered content
  • Access login-restricted or protected pages
  • Scrape pages that a site's robots.txt disallows
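
To see ahead of time whether a page falls under the robots.txt limitation, the rules can be evaluated with Python's standard `urllib.robotparser` module. The sketch below is an assumption about how such a check could look, not a description of how this scraper implements it; `is_allowed` and `SAMPLE_ROBOTS` are hypothetical names, and the rules are parsed from an inline string so the example runs without network access.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt content for illustration only.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(is_allowed(SAMPLE_ROBOTS, "https://example.com/private/data.html"))
print(is_allowed(SAMPLE_ROBOTS, "https://example.com/blog/post.html"))
```

For a live site, `RobotFileParser` can also load the rules itself through `set_url()` followed by `read()`.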

License

This project is open-source and available for personal and educational use.

Contributions

🔹 Contributions are welcome!
If you'd like to improve this project, feel free to fork the repository, make enhancements, and submit a Pull Request.
