title: Webscrapingexample
emoji: π¨
colorFrom: pink
colorTo: purple
sdk: docker
pinned: false
Flask Web Scraper
Live Demo (Hugging Face Spaces): https://lovnishverma-webscrapingexample.hf.space/
Live Demo (Render): https://webscaraping-simplified.onrender.com
GitHub Repo Link: https://github.com/lovnishverma/webscaraping-simplified
Overview
This is a small Flask web scraping application. The user enters a URL and an HTML tag name, and the app fetches that page and displays the text of every element matching the tag.
Features
- User-friendly Web Interface: Enter a URL and a tag to scrape data.
- Web Scraping with BeautifulSoup: Extracts text from the specified HTML tag.
- Error Handling: Displays an error if the URL or tag is missing.
- Minimal and Lightweight: Uses Flask for the backend.
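The extraction step behind these features can be tried in isolation. The snippet below is a minimal sketch, not the app's exact code: the `extract_text` helper and the `SAMPLE_HTML` string are made up for illustration, and it assumes `beautifulsoup4` is installed.

```python
from bs4 import BeautifulSoup


def extract_text(html, tag):
    """Return the text of every <tag> element found in the given HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() joins the text of nested tags; strip() trims outer whitespace
    return [element.get_text().strip() for element in soup.find_all(tag)]


# Hypothetical sample page, used only for this illustration
SAMPLE_HTML = """
<html>
  <head><title>Demo</title></head>
  <body>
    <h1>Heading</h1>
    <p>First paragraph.</p>
    <p>Second <b>paragraph</b>.</p>
  </body>
</html>
"""

if __name__ == "__main__":
    print(extract_text(SAMPLE_HTML, "p"))
```

A tag that does not occur in the page yields an empty list rather than an error.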
Requirements
Ensure you have the following installed before running the project:
- Python
- Flask
- Requests
- BeautifulSoup4 (bs4)
- Gunicorn (for production/deployment)
You can install dependencies locally using:
pip install -r requirements.txt
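To confirm that the install worked, a short check like the one below can help. It is only a sketch: the `missing_modules` helper is hypothetical, and the module names are the usual import names for the packages listed above (BeautifulSoup4 is imported as `bs4`).

```python
import importlib.util

# Import names of the packages listed in the requirements above
REQUIRED_MODULES = ["flask", "requests", "bs4", "gunicorn"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```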
Project Structure
Project Directory:
/your_project_directory
├── app.py              # Main Flask application
├── Dockerfile          # Docker configuration for Hugging Face
├── requirements.txt    # Project dependencies
├── templates/
│   ├── index.html      # Home page with form
│   └── result.html     # Page to display scraped data
└── README.md           # Project documentation
Usage & Deployment (Hugging Face Spaces)
This project is configured for deployment on Hugging Face Spaces using the Docker SDK.
Deploying to Hugging Face:
- Create a new Space on Hugging Face.
- Set the Space SDK to Docker and choose the Blank template.
- Upload your project files (app.py, Dockerfile, requirements.txt, and the templates folder) to the Space.
- The Space will automatically build the container and start the app on port 7860 using gunicorn.
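The Dockerfile itself is not shown here, so how gunicorn is started is not specified. One possible way to bind to port 7860 is a gunicorn configuration file, which is plain Python. The file below is a hypothetical sketch: the file name, the `PORT` override, and the worker and timeout values are assumptions, not part of this project.

```python
# gunicorn.conf.py (hypothetical file, not listed in the project structure)
import os

# Port 7860 is the one stated above; the PORT environment override is an assumption
bind = "0.0.0.0:" + os.environ.get("PORT", "7860")  # listen on all interfaces in the container
workers = 2   # assumed value; adjust to the CPU available
timeout = 30  # seconds before an unresponsive worker is restarted
```

With such a file in place, the container's start command could be `gunicorn -c gunicorn.conf.py app:app`, where `app:app` refers to the `app` object in app.py.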
Access the Web App: Once the build is complete, your app will be live at your Hugging Face Space URL:
[https://yourusername-yourspacename.hf.space/](https://yourusername-yourspacename.hf.space/)
Enter URL and Tag:
- Provide a valid URL.
- Specify an HTML tag (e.g., p, h1, div).
- Click submit to fetch and display the data.
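What counts as a "valid URL" is not defined above. One reasonable reading is an absolute http or https URL with a host name. The check below is a sketch under that assumption; the `is_valid_http_url` helper is hypothetical and uses only the standard library.

```python
from urllib.parse import urlparse


def is_valid_http_url(url):
    """Return True for absolute http(s) URLs that include a host name."""
    try:
        parsed = urlparse(url.strip())
    except ValueError:
        # urlparse rejects some malformed inputs, such as a broken IPv6 host
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in ("https://example.com/page", "example.com", "ftp://example.com"):
        print(candidate, is_valid_http_url(candidate))
```

A check like this could run in the scrape route before the page is requested, so that input such as example.com (no scheme) produces a clear message instead of a failed request.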
Code
from flask import Flask, render_template, request  # Flask is used to create the web app
import requests  # To send HTTP requests
from bs4 import BeautifulSoup  # BeautifulSoup is used to parse the HTML

app = Flask(__name__)  # Create the Flask app instance


# Home route - displays the form
@app.route("/")
def index():
    return render_template("index.html")  # Renders the index.html template


# Scraping route - scrapes data based on user input
@app.route("/scrape", methods=["POST"])
def scrape():
    url, tag = request.form.get("url"), request.form.get("tag")  # Get URL and tag from the form
    if not url or not tag:  # If either value is missing, return an error
        return render_template("result.html", error="Both URL and Tag are required.")

    # Fetch the page; the timeout keeps a slow site from hanging the request indefinitely
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()  # Raise an exception on 4xx/5xx responses

    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract text from all occurrences of the given tag
    elements = [e.get_text() for e in soup.find_all(tag)]

    # soup.title is None when the page has no <title>, so check before reading .string
    title = soup.title.string if soup.title and soup.title.string else "No Title"

    # Render the result page with the extracted data
    return render_template("result.html", tag=tag, url=url, title=title, elements=elements)


# Run the Flask development server (local use only)
if __name__ == "__main__":
    app.run(debug=True)  # Debug mode shows detailed errors; do not enable it in production
Notes
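Important Considerations:
- Works only with publicly accessible websites.
- Some websites block automated requests. The app sends a browser-like User-Agent header to reduce 403 errors, but this does not work on every site.
- Handles missing input, but not every exception: a failed request (for example, a network error or an HTTP error status) is not caught.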
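The last point could be addressed by catching the exceptions that Requests raises. The helper below is a sketch of one way to do that, not the project's code: `fetch_html`, its `get` parameter (there only so the function can be tried without network access), and the error messages are all hypothetical.

```python
import requests


def fetch_html(url, get=requests.get):
    """Fetch a page and return (html, error); exactly one of the two is None."""
    try:
        response = get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
        response.raise_for_status()
    except requests.exceptions.Timeout:
        return None, "The site took too long to respond."
    except requests.exceptions.HTTPError as exc:
        # Raised by raise_for_status() for 4xx/5xx responses, including 403
        return None, f"The site returned an error: {exc}"
    except requests.exceptions.RequestException as exc:
        # Base class for the remaining Requests errors (bad URL, DNS failure, ...)
        return None, f"Could not fetch the page: {exc}"
    return response.text, None
```

In the scrape route, a non-empty error could be passed to result.html through the same error variable that the missing-input check already uses.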
Future Improvements
Potential Enhancements:
- Add support for multiple tags.
- Implement pagination for large data sets.
- Store scraped data in a database.
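As a rough idea of how the storage and pagination items might fit together, the sketch below saves scraped text in SQLite and reads it back one page at a time with LIMIT and OFFSET. Nothing here exists in the project: the `scraped` table, the `save_elements` and `get_page` helpers, and the default page size are assumptions chosen for the example.

```python
import sqlite3


def save_elements(conn, url, tag, elements):
    """Store one row per scraped element."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS scraped ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, url TEXT, tag TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO scraped (url, tag, content) VALUES (?, ?, ?)",
        [(url, tag, text) for text in elements],
    )
    conn.commit()


def get_page(conn, url, tag, page=1, per_page=20):
    """Return one page of stored elements in insertion order (pages start at 1)."""
    offset = (max(page, 1) - 1) * per_page
    rows = conn.execute(
        "SELECT content FROM scraped WHERE url = ? AND tag = ? "
        "ORDER BY id LIMIT ? OFFSET ?",
        (url, tag, per_page, offset),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database, for the demo only
    save_elements(conn, "https://example.com", "p", [f"paragraph {i}" for i in range(1, 6)])
    print(get_page(conn, "https://example.com", "p", page=2, per_page=2))
```

A page number past the end of the data returns an empty list, which a template can treat as "no more results".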
Happy Coding!