---
title: Webscrapingexample
emoji: 🐨
colorFrom: pink
colorTo: purple
sdk: docker
pinned: false
---
# Flask Web Scraper
- Live Demo (Hugging Face Spaces): https://lovnishverma-webscrapingexample.hf.space/
- Live Demo (Render): https://webscaraping-simplified.onrender.com
- GitHub Repo: https://github.com/lovnishverma/webscaraping-simplified
## Overview
This is a simple **Flask-based web scraping application**: enter a URL and an HTML tag, and it extracts and displays the text of every matching element on that page.
![image](https://github.com/user-attachments/assets/13838a50-71e5-411d-ac7c-034e5caac405)
## Features
- ✅ **User-friendly Web Interface:** Enter URL and tag to scrape data.
- ✅ **Web Scraping with BeautifulSoup:** Extracts text from specified HTML tags.
- ✅ **Error Handling:** Displays an error if URL or tag is missing.
- ✅ **Minimal and Lightweight:** Uses Flask for the backend.
## Requirements
Ensure you have the following installed before running the project:
- 🐍 Python
- 🌐 Flask
- πŸ”— Requests
- πŸ—οΈ BeautifulSoup4 (bs4)
- πŸ¦„ Gunicorn (for production/deployment)
You can install dependencies locally using:
```sh
pip install -r requirements.txt
```
## Project Structure
📂 **Project Directory:**
```
/your_project_directory
│── app.py              # 🏗️ Main Flask application
│── Dockerfile          # 🐳 Docker configuration for Hugging Face
│── requirements.txt    # 📦 Project dependencies
│── templates/
│   │── index.html      # 📄 Home page with form
│   │── result.html     # 📄 Page to display scraped data
│── README.md           # 📖 Project documentation
```
## Usage & Deployment (Hugging Face Spaces)
This project is configured to easily deploy on **Hugging Face Spaces** using the Docker SDK.
🚀 **Deploying to Hugging Face:**
1. Create a new Space on [Hugging Face](https://huggingface.co/spaces).
2. Set the **Space SDK** to **Docker** and choose the **Blank** template.
3. Upload your project files (`app.py`, `Dockerfile`, `requirements.txt`, and the `templates` folder) to the Space.
4. The Space will automatically build the container and start the app on port `7860` using `gunicorn`.
🌍 **Access the Web App:**
Once the build is complete, your app will be live at your Hugging Face Space URL:
```
https://yourusername-yourspacename.hf.space/
```
📝 **Enter URL and Tag:**
1. Provide a valid URL.
2. Specify an HTML tag (e.g., `p`, `h1`, `div`).
3. Click submit to fetch and display the data.
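
The same submission can also be made from a script instead of the browser form. The sketch below uses only the standard library and assumes the form posts fields named `url` and `tag` to `/scrape`, as the route in the Code section reads them. `build_scrape_request` is a hypothetical helper name, and the base address is a placeholder for your own Space URL or local server.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_scrape_request(base_url: str, page_url: str, tag: str) -> Request:
    """Build the POST request that submitting the home page form would send."""
    # Field names "url" and "tag" match what the /scrape route reads from the form.
    body = urlencode({"url": page_url, "tag": tag}).encode("utf-8")
    return Request(
        base_url.rstrip("/") + "/scrape",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder address: replace with your own Space URL or local server.
req = build_scrape_request("http://127.0.0.1:5000", "https://example.com", "h1")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen` sends the request. The response body is the rendered result page as HTML.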
## Code
```python
from flask import Flask, render_template, request  # Flask is used to create the web app
import requests  # To send HTTP requests
from bs4 import BeautifulSoup  # BeautifulSoup is used to parse the HTML

app = Flask(__name__)  # Create the Flask app instance


# Home route - displays the form
@app.route("/")
def index():
    return render_template("index.html")  # Renders the index.html template


# Scraping route - scrapes data based on user input
@app.route("/scrape", methods=["POST"])
def scrape():
    url, tag = request.form.get("url"), request.form.get("tag")  # Get URL and tag from the form

    if not url or not tag:  # If either value is missing, return an error
        return render_template("result.html", error="Both URL and Tag are required.")

    # Send an HTTP request to fetch the webpage content (give up after 10 seconds)
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()  # Raise an exception on 4xx/5xx responses

    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract text from all occurrences of the given tag
    elements = [e.get_text() for e in soup.find_all(tag)]

    # soup.title is None when the page has no <title>, so guard before reading .string
    title = soup.title.string if soup.title and soup.title.string else "No Title"

    # Render the result page with extracted data
    return render_template("result.html", tag=tag, url=url, title=title, elements=elements)


# Run the Flask development server
if __name__ == "__main__":
    app.run(debug=True)  # Debug mode shows detailed errors; use it for local development only
```
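
To see what the parsing step produces without starting the server or touching the network, the extraction logic can be run on an inline HTML string. This is a minimal sketch: `SAMPLE_HTML` is made-up markup and `extract` is a hypothetical helper, not a function from the app.

```python
from bs4 import BeautifulSoup

# Made-up HTML used only for illustration; the app fetches real pages instead.
SAMPLE_HTML = """
<html>
  <head><title>Demo Page</title></head>
  <body>
    <h1>Heading</h1>
    <p>First paragraph.</p>
    <p>Second <b>bold</b> paragraph.</p>
  </body>
</html>
"""


def extract(html: str, tag: str):
    """Return (title, texts) for every occurrence of `tag` in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.title is None when the document has no <title> element.
    title = soup.title.string if soup.title and soup.title.string else "No Title"
    # get_text() flattens nested markup, so <b>bold</b> becomes plain text.
    texts = [element.get_text() for element in soup.find_all(tag)]
    return title, texts


title, texts = extract(SAMPLE_HTML, "p")
print(title)  # Demo Page
print(texts)  # ['First paragraph.', 'Second bold paragraph.']
```

A tag that does not appear in the page yields an empty list rather than an error, so the result page would have nothing to display for it.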
## Notes
⚠️ **Important Considerations:**
* Works only with publicly accessible websites.
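* Some websites block automated requests. The app sends a browser-like `User-Agent` header, which avoids some `403` errors but not all.
* Missing input is handled, but other failures (invalid URLs, connection failures, HTTP error responses) are not caught and surface as server errors.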
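
One possible way to close the exception-handling gap is to wrap the fetch in a `try`/`except` and turn failures into a message for the result page. The sketch below is an assumption about how that could look, not part of the current app. `fetch_html` is a hypothetical helper and the 10-second timeout is an arbitrary choice.

```python
import requests


def fetch_html(url: str, timeout: float = 10.0):
    """Return (html, error); exactly one of the two is None."""
    try:
        response = requests.get(
            url, headers={"User-Agent": "Mozilla/5.0"}, timeout=timeout
        )
        response.raise_for_status()  # turns 4xx/5xx responses into HTTPError
    except requests.exceptions.RequestException as exc:
        # RequestException is the base class for errors raised by requests,
        # including invalid URLs, connection failures, timeouts and HTTPError.
        return None, f"Could not fetch {url}: {exc}"
    return response.text, None


# An invalid URL is rejected before any network traffic happens.
html, error = fetch_html("not-a-url")
print(error)
```

In the route, a non-`None` error could be passed to `render_template("result.html", error=error)`, the same way the missing-input message is shown.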
## Future Improvements
✨ **Potential Enhancements:**
* 🔄 Add support for multiple tags.
* 📜 Implement pagination for large data sets.
* 💾 Store scraped data in a database.
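
As a rough starting point for these enhancements, the sketch below shows one way each could be approached using only the standard library. All three helpers (`parse_tags`, `paginate`, `save_results`) and the `scraped` table are hypothetical names chosen for illustration, and SQLite is an assumed choice of database.

```python
import sqlite3
from typing import List


def parse_tags(raw: str) -> List[str]:
    """Split a comma-separated tag field such as "p, h1" into ["p", "h1"]."""
    # BeautifulSoup's find_all() accepts a list of names, so the result
    # could be passed to soup.find_all(...) to match any of the tags.
    return [part.strip().lower() for part in raw.split(",") if part.strip()]


def paginate(items: List[str], page: int, per_page: int = 20) -> List[str]:
    """Return the slice of `items` for a 1-based page number."""
    start = (max(page, 1) - 1) * per_page
    return items[start:start + per_page]


def save_results(conn: sqlite3.Connection, url: str, tag: str, texts: List[str]) -> None:
    """Store one row per extracted text in a hypothetical `scraped` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped (url TEXT, tag TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO scraped (url, tag, content) VALUES (?, ?, ?)",
        [(url, tag, text) for text in texts],
    )
    conn.commit()


# An in-memory database keeps the example free of side effects.
conn = sqlite3.connect(":memory:")
save_results(conn, "https://example.com", "p", ["first", "second", "third"])
print(parse_tags("p, H1"))
print(paginate(["first", "second", "third"], page=2, per_page=2))
```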
---
🎉 **Happy Coding! 🚀**