LovnishVerma committed on
Commit
83019af
·
verified ·
1 Parent(s): 3bd04e3

Update README.md

  1. README.md +125 -1
README.md CHANGED
@@ -7,4 +7,128 @@ sdk: docker
  pinned: false
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # Flask Web Scraper
+
+ Live Demo (Hugging Face Spaces): https://lovnishverma-webscrapingexample.hf.space/
+
+ Live Demo (Render): https://webscaraping-simplified.onrender.com
+
+ GitHub Repo Link: https://github.com/lovnishverma/webscaraping-simplified
+
+ ## Overview
+ This is a simple **Flask-based web scraping application** that lets users enter a URL and an HTML tag, then extracts and displays the text of every matching element on that page.
+
+ ![image](https://github.com/user-attachments/assets/13838a50-71e5-411d-ac7c-034e5caac405)
+
+
+ ## Features
+ ✅ **User-friendly Web Interface:** Enter URL and tag to scrape data.
+
+ ✅ **Web Scraping with BeautifulSoup:** Extracts text from specified HTML tags.
+
+ ✅ **Error Handling:** Displays an error if URL or tag is missing.
+
+ ✅ **Minimal and Lightweight:** Uses Flask for the backend.
+
+ ## Requirements
+ Ensure you have the following installed before running the project:
+
+ - 🐍 Python
+ - 🌐 Flask
+ - 🔗 Requests
+ - 🏗️ BeautifulSoup4 (bs4)
+
+ You can install dependencies using:
+ ```sh
+ pip install flask requests beautifulsoup4
+ ```
+
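To confirm the dependencies listed above are importable before starting the app, a small check like the following may help. This is a minimal sketch, not part of the original project: `REQUIRED` and `missing_packages` are hypothetical names, and the mapping simply pairs each import name with the pip package name from the install command.

```python
from importlib.util import find_spec

# Import name -> pip package name (taken from the install command above).
REQUIRED = {
    "flask": "flask",
    "requests": "requests",
    "bs4": "beautifulsoup4",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose import name cannot be found."""
    return [pip_name for module, pip_name in required.items()
            if find_spec(module) is None]


missing = missing_packages()
if missing:
    print("Missing packages, try: pip install " + " ".join(missing))
else:
    print("All dependencies are installed.")
```

Note that `bs4` is the import name while `beautifulsoup4` is the package name, which is why the two are tracked separately.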
+ ## Project Structure
+ 📂 **Project Directory:**
+ ```
+ /your_project_directory
+ │── app.py              # 🏗️ Main Flask application
+ │── templates/
+ │   │── index.html      # 📄 Home page with form
+ │   │── result.html     # 📄 Page to display scraped data
+ │── static/             # 🎨 (Optional) CSS/JS files
+ │── README.md           # 📖 Project Documentation
+ ```
+
+ ## Usage
+ 🚀 **Run the Flask Application:**
+
+ Create a `start.sh` file containing:
+
+ ```sh
+ python app.py
+ ```
+
+ 🌍 **Access the Web App:**
+ Open your browser and visit:
+ ```
+ http://yourprojectname.glitch.me/
+ ```
+
+ 📝 **Enter URL and Tag:**
+ 1. Provide a valid URL.
+ 2. Specify an HTML tag (e.g., `p`, `h1`, `div`).
+ 3. Click submit to fetch and display the data.
+
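The form submission described in the steps above can also be reproduced from a script. The sketch below only builds the request; it uses the `/scrape` route and the `url` and `tag` form fields from the code in this README. The base address `http://127.0.0.1:5000` is an assumption (Flask's default when run locally), and `build_scrape_request` is a hypothetical helper name.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the app is running locally on Flask's default host and port.
BASE_URL = "http://127.0.0.1:5000"


def build_scrape_request(target_url, tag, base_url=BASE_URL):
    """Build the same form-encoded POST that the HTML form would send."""
    form = urlencode({"url": target_url, "tag": tag}).encode("ascii")
    return Request(f"{base_url}/scrape", data=form, method="POST")


req = build_scrape_request("https://example.com", "h1")
print(req.get_method(), req.full_url)
print(req.data)

# To actually submit it (requires the app to be running), pass the request to
# urllib.request.urlopen(req, timeout=10) and read the returned HTML.
```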
+ ## Code
+ ```python
+ from flask import Flask, render_template, request  # Flask powers the web app
+ import requests  # Sends HTTP requests
+ from bs4 import BeautifulSoup  # Parses HTML for scraping
+
+ app = Flask(__name__)  # Create the Flask app instance
+
+ # Home route - displays the form
+ @app.route("/")
+ def index():
+     return render_template("index.html")  # Renders the index.html template
+
+ # Scraping route - scrapes data based on user input
+ @app.route("/scrape", methods=["POST"])
+ def scrape():
+     url, tag = request.form.get("url"), request.form.get("tag")  # Get URL and tag from the form
+     if not url or not tag:  # If either value is missing, return an error
+         return render_template("result.html", error="Both URL and Tag are required.")
+
+     # Send an HTTP request to fetch the webpage content
+     response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
+     response.raise_for_status()  # Raise an exception for 4xx/5xx responses
+
+     # Parse the HTML content of the page
+     soup = BeautifulSoup(response.text, "html.parser")
+
+     # Extract text from all occurrences of the given tag
+     elements = [e.get_text() for e in soup.find_all(tag)]
+
+     # soup.title is None when the page has no <title>, so guard before reading .string
+     title = soup.title.string if soup.title and soup.title.string else "No Title"
+
+     # Render the result page with extracted data
+     return render_template("result.html", tag=tag, url=url, title=title, elements=elements)
+
+ # Run the Flask server
+ if __name__ == "__main__":
+     app.run(debug=True)  # Debug mode shows detailed errors and reloads on code changes
+ ```
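To experiment with the extraction step without installing anything, the idea can be approximated with Python's standard-library `html.parser`. This is a dependency-free stand-in, not the BeautifulSoup call the app uses, and it behaves differently in some cases: nested elements with the same tag are merged into the outer one, and unclosed tags are not recovered. `TagTextExtractor` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text inside every top-level occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()
        self.depth = 0      # how many matching tags are currently open
        self.buffer = []    # text pieces of the element being read
        self.results = []   # finished text, one entry per element

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.buffer))
                self.buffer = []

    def handle_data(self, data):
        if self.depth > 0:
            self.buffer.append(data)


def extract_text(html, tag):
    """Return the text of each matching element in document order."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return parser.results


sample = "<p>Hello <b>world</b></p><div>skip</div><p>Bye</p>"
print(extract_text(sample, "p"))
```

For real pages, BeautifulSoup's `find_all` as used in the app is the more forgiving choice, since it tolerates malformed markup.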
+
+ ![image](https://github.com/user-attachments/assets/1a81b045-6962-4d62-92b9-ea3260705df2)
+
+
+ ## Notes
+ ⚠️ **Important Considerations:**
+ - Works only with publicly accessible websites.
+ - Some websites may block requests (**sending a User-Agent header, as the code does, helps avoid some 403 errors**).
+ - Handles missing input but does not catch all exceptions, such as network failures or the HTTP errors raised by `raise_for_status()`.
+
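One inexpensive way to narrow the unhandled cases noted above is to validate the submitted URL before fetching it. The following is a minimal sketch under that assumption; `is_http_url` is a hypothetical helper that is not part of the original app.

```python
from urllib.parse import urlparse


def is_http_url(url):
    """Return True if the URL has an http or https scheme and a host part."""
    try:
        parts = urlparse(url.strip())
    except ValueError:
        # urlparse can reject some malformed inputs, such as bad IPv6 hosts.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


for candidate in ("https://example.com/page", "example.com", "ftp://example.com"):
    print(candidate, is_http_url(candidate))
```

Inside the `scrape` route, a check like this could run right after the missing-input test, and the `requests.get` call could additionally be wrapped in `try`/`except requests.exceptions.RequestException` so that failures render an error message instead of a server error.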
+ ## Future Improvements
+ ✨ **Potential Enhancements:**
+ - 🔄 Add support for multiple tags.
+ - 📜 Implement pagination for large data sets.
+ - 💾 Store scraped data in a database.
+
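The first two enhancements above could start from helpers like these. This is only a sketch of one possible approach: `parse_tags` and `paginate` are hypothetical names, the comma-separated input format is an assumption, and the default page size of 20 is arbitrary.

```python
def parse_tags(raw):
    """Split input like 'p, h1,div' into unique lowercase tag names, in order."""
    tags = []
    for part in raw.split(","):
        name = part.strip().lower()
        if name and name not in tags:
            tags.append(name)
    return tags


def paginate(items, page, per_page=20):
    """Return (items on the 1-based page, total number of pages)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be at least 1")
    total_pages = max(1, -(-len(items) // per_page))  # ceiling division
    start = (page - 1) * per_page
    return items[start:start + per_page], total_pages


print(parse_tags("p, H1,,div, p"))
print(paginate(list(range(45)), page=3, per_page=20))
```

BeautifulSoup's `find_all` accepts a list of tag names, so the result of `parse_tags` could be passed to it directly in place of the single `tag` value.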
+ ---
+ 🎉 **Happy Coding! 🚀**
+