| ```python | |
| # First, we will need to install the required packages if not already installed. | |
| # You can do this by running: pip install Flask beautifulsoup4 requests | |
| from flask import Flask, request, jsonify | |
| from bs4 import BeautifulSoup | |
| import requests | |
| # Create Flask app | |
| app = Flask(__name__) | |
| # Define a route for the microservice to scrape URLs | |
| @app.route('/scrape', methods=['POST']) | |
| def scrape(): | |
|     # Get JSON data from the request (None if the body is missing or not valid JSON) | |
|     data = request.get_json(silent=True) | |
|     url = data.get('url') if isinstance(data, dict) else None | |
|     if not url: | |
|         return jsonify({"error": "URL not provided"}), 400 | |
|     try: | |
|         # Send a GET request to the URL; the timeout keeps a slow site from hanging the service | |
|         response = requests.get(url, timeout=10) | |
|         response.raise_for_status()  # Raise an error for bad responses | |
|         # Parse the content using BeautifulSoup | |
|         soup = BeautifulSoup(response.content, 'html.parser') | |
|         # Extract title of the page | |
|         title = soup.title.string if soup.title else 'No title found' | |
|         # Extract all paragraphs from the page | |
|         paragraphs = [p.get_text() for p in soup.find_all('p')] | |
|         # Prepare the output | |
|         result = { | |
|             "url": url, | |
|             "title": title, | |
|             "paragraphs": paragraphs | |
|         } | |
|         return jsonify(result), 200 | |
|     except requests.exceptions.RequestException as e: | |
|         # Handle request exceptions (connection errors, timeouts, bad status codes, malformed URLs) | |
|         return jsonify({"error": str(e)}), 500 | |
| # Run the application (debug=True is for local development only) | |
| if __name__ == '__main__': | |
|     app.run(debug=True) | |
| ``` | |
| ### Instructions to Run and Test the Microservice | |
| 1. **Save the code** in a file named `scrape_service.py`. | |
| 2. **Install required packages** (if not already installed) by running: | |
| ``` | |
| pip install Flask beautifulsoup4 requests | |
| ``` | |
| 3. **Run the Flask application**: | |
| ``` | |
| python scrape_service.py | |
| ``` | |
| The application will start at `http://127.0.0.1:5000/`. | |
| 4. **Test the service** with a POST request. You can use `curl` from the terminal or a tool like Postman. Here’s an example using `curl`: | |
| ```bash | |
| curl -X POST http://127.0.0.1:5000/scrape -H "Content-Type: application/json" -d '{"url": "https://www.example.com"}' | |
| ``` | |
| 5. **Expected Output**: The service will respond with a JSON object containing the URL, the title of the page, and the paragraphs. | |
| Example response: | |
| ```json | |
| { | |
| "url": "https://www.example.com", | |
| "title": "Example Domain", | |
| "paragraphs": [ | |
| "This domain is for use in illustrative examples...", | |
| "More information..." | |
| ] | |
| } | |
| ``` | |
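The `curl` call in step 4 can also be made from Python. The following is a minimal sketch using only the standard library. The helper names `build_scrape_request`, `summarize`, and `scrape_via_service` are hypothetical, and the sketch assumes the service is listening on Flask's default address from step 3.

```python
import json
import urllib.error
import urllib.request

# Assumption: the service runs locally on Flask's default port (see step 3).
SERVICE_URL = "http://127.0.0.1:5000/scrape"


def build_scrape_request(target_url, service_url=SERVICE_URL):
    """Build the same POST request that the curl example sends."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        service_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(payload):
    """Turn a decoded JSON response into a one-line summary."""
    if "error" in payload:
        return f"error: {payload['error']}"
    return f"{payload['title']} ({len(payload['paragraphs'])} paragraphs)"


def scrape_via_service(target_url):
    """Send the request and return the decoded JSON body (needs a running service)."""
    try:
        with urllib.request.urlopen(build_scrape_request(target_url), timeout=15) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        # 400 and 500 responses from the service still carry a JSON body with an "error" key.
        return json.load(err)
```

With the service running, `print(summarize(scrape_via_service("https://www.example.com")))` prints a one-line summary of the result. If the service is not running, `urlopen` raises `urllib.error.URLError`, which this sketch leaves unhandled.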
| This microservice uses Flask to accept a POST request with a URL, scrapes that URL using BeautifulSoup, and returns the page title and paragraphs as a JSON response. | |
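| The service returns a 400 response with an error message when the request body does not include a URL, and a 500 response carrying the exception text when the fetch fails, for example because the URL is malformed, the host is unreachable, or the target site responds with an HTTP error status. | |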
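The service as written leaves it to `requests` to reject a malformed URL. One possible refinement, shown here only as a sketch, is to check the URL before fetching so that bad input can be reported as a client error. The helper `is_http_url` is hypothetical and not part of the code above.

```python
from urllib.parse import urlparse


def is_http_url(value):
    """Return True if value looks like an absolute http(s) URL."""
    if not isinstance(value, str):
        return False
    try:
        parsed = urlparse(value.strip())
    except ValueError:
        # urlparse rejects a few malformed inputs, such as unbalanced IPv6 brackets.
        return False
    # Require both a web scheme and a host; "example.com" alone has neither.
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

Inside `scrape()`, a check such as `if not is_http_url(url): return jsonify({"error": "Invalid URL"}), 400` could sit right after the missing-URL check. This only tests the URL's shape. It does not confirm that the host exists or is safe to fetch.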