---
title: Webscrapingexample
emoji: π¨
colorFrom: pink
colorTo: purple
sdk: docker
pinned: false
---
# Flask Web Scraper

- Live Demo (Hugging Face Spaces): https://lovnishverma-webscrapingexample.hf.space/
- Live Demo (Render): https://webscaraping-simplified.onrender.com
- GitHub Repo: https://github.com/lovnishverma/webscaraping-simplified

## Overview
This is a simple **Flask-based web scraping application**. Users enter a URL and an HTML tag, and the app fetches the page and displays the text of every element matching that tag.

## Features
- **User-friendly Web Interface:** Enter a URL and a tag to scrape data.
- **Web Scraping with BeautifulSoup:** Extracts text from the specified HTML tags.
- **Error Handling:** Displays an error if the URL or tag is missing.
- **Minimal and Lightweight:** Uses Flask for the backend.

## Requirements
Ensure you have the following installed before running the project:
- Python
- Flask
- Requests
- BeautifulSoup4 (bs4)
- Gunicorn (for production/deployment)

You can install dependencies locally using:
```sh
pip install -r requirements.txt
```
## Project Structure
**Project Directory:**
```
/your_project_directory
├── app.py              # Main Flask application
├── Dockerfile          # Docker configuration for Hugging Face
├── requirements.txt    # Project dependencies
├── templates/
│   ├── index.html      # Home page with form
│   └── result.html     # Page to display scraped data
└── README.md           # Project documentation
```
## Usage & Deployment (Hugging Face Spaces)
This project is configured for deployment on **Hugging Face Spaces** using the Docker SDK.

**Deploying to Hugging Face:**
1. Create a new Space on [Hugging Face](https://huggingface.co/spaces).
2. Set the **Space SDK** to **Docker** and choose the **Blank** template.
3. Upload your project files (`app.py`, `Dockerfile`, `requirements.txt`, and the `templates` folder) to the Space.
4. The Space will automatically build the container and start the app on port `7860` using `gunicorn`.
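
Step 4 relies on gunicorn listening on port `7860`. As a minimal sketch of how that could be expressed, gunicorn accepts a Python configuration file. The file name `gunicorn.conf.py` and the `workers` and `timeout` values below are assumptions for illustration; the project's actual Dockerfile may pass these settings differently.

```python
# gunicorn.conf.py (hypothetical file, not part of the original project listing)

# Listen on all interfaces so the port is reachable from outside the container.
# 7860 is the port named in the deployment steps above.
bind = "0.0.0.0:7860"

# Assumed values: a small worker count and a generous timeout,
# because scraping a slow site can take several seconds.
workers = 2
timeout = 60
```

With such a file in place, the container's start command could be `gunicorn -c gunicorn.conf.py app:app`, where `app:app` refers to the `app` object inside `app.py`.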

**Access the Web App:**

Once the build is complete, your app will be live at your Hugging Face Space URL:
```
https://yourusername-yourspacename.hf.space/
```
**Enter URL and Tag:**
1. Provide a valid URL.
2. Specify an HTML tag (e.g., `p`, `h1`, `div`).
3. Click submit to fetch and display the data.
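
Step 1 asks for a valid URL, but the app itself only checks that the field is not empty. A rough sketch of a stricter check, using only the standard library, might look like this. The helper name `is_http_url` is hypothetical and does not appear in the original `app.py`.

```python
from urllib.parse import urlparse


def is_http_url(value):
    """Return True if `value` looks like an absolute http(s) URL.

    Hypothetical helper: checks only the URL's shape, not whether
    the site exists or is reachable.
    """
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_http_url("https://example.com/page"))  # True
print(is_http_url("example.com"))               # False: no scheme
print(is_http_url("ftp://example.com"))         # False: not http(s)
```

A check like this could run in the `/scrape` route before the request is sent, so that a malformed URL produces a friendly message instead of an exception.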
## Code
```python
from flask import Flask, render_template, request  # Flask is used to create a web app
import requests  # To send HTTP requests
from bs4 import BeautifulSoup  # BeautifulSoup is used for web scraping

app = Flask(__name__)  # Creating a Flask app instance


# Home route - Displays the form
@app.route("/")
def index():
    return render_template("index.html")  # Renders the index.html template


# Scraping route - Scrapes data based on user input
@app.route("/scrape", methods=["POST"])
def scrape():
    url, tag = request.form.get("url"), request.form.get("tag")  # Get URL and tag from the form
    if not url or not tag:  # If any value is missing, return an error
        return render_template("result.html", error="Both URL and Tag are required.")

    # Send an HTTP request to fetch the webpage content (give up after 10 seconds)
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()  # Raise an error if the request fails

    # Parse the HTML content of the page
    soup = BeautifulSoup(response.text, "html.parser")

    # Extract text from all occurrences of the given tag
    elements = [e.get_text() for e in soup.find_all(tag)]

    # Fall back to "No Title" when the page has no usable <title> element
    title = soup.title.string if soup.title and soup.title.string else "No Title"

    # Render the result page with extracted data
    return render_template("result.html", tag=tag, url=url, title=title, elements=elements)


# Run the Flask server
if __name__ == "__main__":
    app.run(debug=True)  # Debug mode: auto-reload and in-browser tracebacks (development only)
```
## Notes
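⚠️ **Important Considerations:**
* Works only with publicly accessible pages. Only the initial HTML is fetched, so content rendered later by JavaScript is not captured.
* Some websites block automated requests. Sending a `User-Agent` header, as `app.py` does, avoids some 403 errors.
* Handles missing form input, but network failures, HTTP error responses, and malformed URLs are not caught and surface as a server error.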
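
To illustrate the last point, here is one possible way to turn request failures into readable messages. This is a sketch, not the project's code: the helper name `fetch_html`, the 10-second timeout, and the message wording are all assumptions.

```python
import requests


def fetch_html(url, timeout=10):
    """Return (html, error); exactly one of the two is None.

    Hypothetical helper, not part of the original app.py.
    """
    try:
        response = requests.get(
            url, headers={"User-Agent": "Mozilla/5.0"}, timeout=timeout
        )
        response.raise_for_status()
    except requests.exceptions.MissingSchema:
        # Raised before any network access when the URL has no scheme.
        return None, "The URL must start with http:// or https://."
    except requests.exceptions.Timeout:
        return None, "The site took too long to respond."
    except requests.exceptions.HTTPError as exc:
        # raise_for_status() attaches the response to the exception.
        return None, f"The site returned HTTP {exc.response.status_code}."
    except requests.exceptions.RequestException as exc:
        # Catch-all for connection errors and other request failures.
        return None, f"Could not fetch the page: {exc}"
    return response.text, None
```

The route could then call `html, error = fetch_html(url)` and, when `error` is set, render `result.html` with that message in the same way it already does for missing input.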
## Future Improvements
✨ **Potential Enhancements:**
* Add support for multiple tags.
* Implement pagination for large data sets.
* Store scraped data in a database.
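
As a starting point for the first enhancement, multiple-tag support could be sketched roughly as follows. The helper names `parse_tags` and `extract_by_tags` are hypothetical, and the comma-separated input format is an assumption about how the form might be extended.

```python
from bs4 import BeautifulSoup


def parse_tags(raw):
    """Split a comma-separated form value such as "h1, p" into tag names."""
    return [t.strip().lower() for t in raw.split(",") if t.strip()]


def extract_by_tags(html, tags):
    """Map each requested tag name to the text of every matching element."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        tag: [el.get_text(strip=True) for el in soup.find_all(tag)]
        for tag in tags
    }


sample = "<h1>Title</h1><p>First</p><p>Second</p>"
print(extract_by_tags(sample, parse_tags("h1, p")))
# {'h1': ['Title'], 'p': ['First', 'Second']}
```

Grouping results per tag keeps the template simple: `result.html` could loop over the dictionary and render one section for each tag.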
---
**Happy Coding!**