---
title: Web Scraper API
emoji: 🕸️
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
pinned: true
---
Web Scraping Service
This is a simple web scraping service built with FastAPI and BeautifulSoup, designed to be deployed on Hugging Face Spaces using Docker.
Features
- URL Scraping: Extracts main content from a given URL.
- Content Cleaning: Removes ads, scripts, styles, and other clutter using heuristic rules.
- JSON Output: Returns clean text, title, and metadata.
- Dockerized: Ships with a Dockerfile, so it runs on any host with Docker.
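The content-cleaning step listed above can be illustrated with a small sketch. This is not the code in `main.py`. It swaps BeautifulSoup for the standard library's `html.parser` so it runs without extra dependencies, and the `CLUTTER_TAGS` set and the `extract` function are hypothetical stand-ins for the service's actual heuristics.

```python
from html.parser import HTMLParser

# Assumed heuristic: text inside these tags is treated as clutter.
# The real service may use a different (or longer) list.
CLUTTER_TAGS = {"script", "style", "noscript", "nav", "header", "footer", "aside", "iframe"}


class ContentExtractor(HTMLParser):
    """Collects the <title> text and the visible text outside clutter tags."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside one or more clutter tags
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in CLUTTER_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in CLUTTER_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # ignore whitespace-only text nodes
        if self._in_title:
            self.title += text
        elif self._skip_depth == 0:
            self.chunks.append(text)


def extract(html: str) -> dict:
    """Return a dict with "title" and newline-joined "content" (hypothetical helper)."""
    parser = ContentExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return {"title": parser.title, "content": "\n".join(parser.chunks)}
```

Tracking a nesting depth rather than a boolean keeps text suppressed when clutter tags are nested, for example a `<script>` inside a `<footer>`.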
Local Development
Install dependencies:
pip install -r requirements.txt
Run the application:
uvicorn main:app --reload
Test: Open your browser to http://127.0.0.1:8000/docs to see the interactive API documentation.
Deployment on Hugging Face Spaces
- Create a new Space on Hugging Face.
- Select Docker as the SDK.
- Upload the files in this repository to the Space: Dockerfile, requirements.txt, main.py, and README.md.
- The application will build and start automatically on port 7860.
API Usage
Endpoint: POST /scrape
Request Body:
{
  "url": "https://example.com/article"
}
Response:
{
  "url": "https://example.com/article",
  "title": "Example Article Title",
  "content": "Extracted text content...",
  "status": "success"
}
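A Python client for the POST endpoint might look like the sketch below, using only the standard library. `BASE_URL` is a placeholder for your Space URL (or `http://127.0.0.1:8000` locally), and the helper names are hypothetical. The README documents only the `"success"` response, so the sketch simply treats any other `status` as a failure; HTTP error codes would surface separately as `urllib.error.HTTPError` from `urlopen`.

```python
import json
import urllib.request

# Placeholder: replace with your Space URL, or http://127.0.0.1:8000 locally.
BASE_URL = "https://your-space-url.hf.space"


def build_scrape_request(target_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /scrape request with a JSON body."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_scrape_response(raw: bytes) -> dict:
    """Decode the JSON response and check the documented "status" field."""
    payload = json.loads(raw)
    if payload.get("status") != "success":
        raise RuntimeError(f"scrape failed: {payload}")
    return payload


def scrape(target_url: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed response (needs network access)."""
    request = build_scrape_request(target_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return parse_scrape_response(resp.read())
```

Usage, assuming the server is running locally: `scrape("https://example.com/article", "http://127.0.0.1:8000")["title"]`.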
Endpoint: GET /scrape
Query Parameter: url
Example: https://your-space-url.hf.space/scrape?url=https://example.com
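When calling the GET endpoint programmatically, percent-encoding the target URL is a safer choice than concatenating it as in the example above. Otherwise, a target that carries its own query string (for example `?id=1&page=2`) would have `page=2` parsed as a separate parameter of `/scrape`. A minimal sketch, where `build_get_url` and the base URL are hypothetical placeholders:

```python
from urllib.parse import urlencode


def build_get_url(target_url: str, base_url: str = "https://your-space-url.hf.space") -> str:
    """Return a GET /scrape URL with the target URL percent-encoded in ?url=."""
    # urlencode quotes ':', '/', '?', '=' and '&' so the whole target
    # arrives as a single value of the "url" query parameter.
    return f"{base_url.rstrip('/')}/scrape?{urlencode({'url': target_url})}"
```

For example, `build_get_url("https://example.com")` returns `https://your-space-url.hf.space/scrape?url=https%3A%2F%2Fexample.com`.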