--- title: Webscrapingexample emoji: 🐨 colorFrom: pink colorTo: purple sdk: docker pinned: false --- # Flask Web Scraper Live Demo: https://lovnishverma-webscrapingexample.hf.space/ Live Demo: https://webscaraping-simplified.onrender.com GitHub Repo Link: https://github.com/lovnishverma/webscaraping-simplified ## Overview This is a simple **Flask-based web scraping application** that allows users to enter a URL and an HTML tag to extract and display content from that webpage. ![image](https://github.com/user-attachments/assets/13838a50-71e5-411d-ac7c-034e5caac405) ## Features ✅ **User-friendly Web Interface:** Enter URL and tag to scrape data. ✅ **Web Scraping with BeautifulSoup:** Extracts text from specified HTML tags. ✅ **Error Handling:** Displays an error if URL or tag is missing. ✅ **Minimal and Lightweight:** Uses Flask for the backend. ## Requirements Ensure you have the following installed before running the project: - 🐍 Python - 🌐 Flask - 🔗 Requests - 🏗️ BeautifulSoup4 (bs4) - 🦄 Gunicorn (for production/deployment) You can install dependencies locally using: ```sh pip install -r requirements.txt ``` ## Project Structure 📂 **Project Directory:** ``` /your_project_directory │── app.py # 🏗️ Main Flask application │── Dockerfile # 🐳 Docker configuration for Hugging Face │── requirements.txt # 📦 Project dependencies │── templates/ │ │── index.html # 📄 Home page with form │ │── result.html # 📄 Page to display scraped data │── README.md # 📖 Project Documentation ``` ## Usage & Deployment (Hugging Face Spaces) This project is configured to easily deploy on **Hugging Face Spaces** using the Docker SDK. 🚀 **Deploying to Hugging Face:** 1. Create a new Space on [Hugging Face](https://huggingface.co/spaces). 2. Set the **Space SDK** to **Docker** and choose the **Blank** template. 3. Upload your project files (`app.py`, `Dockerfile`, `requirements.txt`, and the `templates` folder) to the Space. 4. The Space will automatically build the container and start the app on port `7860` using `gunicorn`. 🌍 **Access the Web App:** Once the build is complete, your app will be live at your Hugging Face Space URL: ``` [https://yourusername-yourspacename.hf.space/](https://yourusername-yourspacename.hf.space/) ``` 📝 **Enter URL and Tag:** 1. Provide a valid URL. 2. Specify an HTML tag (e.g., `p`, `h1`, `div`). 3. Click submit to fetch and display the data. ## Code ```python from flask import Flask, render_template, request # Flask is used to create a web app import requests # To send HTTP requests from bs4 import BeautifulSoup # BeautifulSoup is used for web scraping app = Flask(__name__) # Creating a Flask app instance # Home route - Displays the form @app.route("/") def index(): return render_template("index.html") # Renders the index.html template # Scraping route - Scrapes data based on user input @app.route("/scrape", methods=["POST"]) def scrape(): url, tag = request.form.get("url"), request.form.get("tag") # Get URL and tag from the form if not url or not tag: # If any value is missing, return an error return render_template("result.html", error="Both URL and Tag are required.") # Send an HTTP request to fetch the webpage content response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}) response.raise_for_status() # Raise an error if the request fails # Parse the HTML content of the page soup = BeautifulSoup(response.text, "html.parser") # Extract text from all occurrences of the given tag elements = [e.get_text() for e in soup.find_all(tag)] # Render the result page with extracted data return render_template("result.html", tag=tag, url=url, title=soup.title.string or "No Title", elements=elements) # Run the Flask server if __name__ == "__main__": app.run(debug=True) # Debug mode is enabled to show errors in the console ``` ## Notes ⚠️ **Important Considerations:** * Works only with publicly accessible websites. * Some websites may block requests (**Use user-agent headers to avoid 403 errors**). * Handles missing input errors but does not handle all exceptions. ## Future Improvements ✨ **Potential Enhancements:** * 🔄 Add support for multiple tags. * 📜 Implement pagination for large data sets. * 💾 Store scraped data in a database. --- 🎉 **Happy Coding! 🚀**