
title: Web Scraper API
emoji: 🕸️
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
pinned: true

Web Scraping Service

This is a simple web scraping service built with FastAPI and BeautifulSoup, designed to be deployed on Hugging Face Spaces using Docker.

- URL Scraping: Extracts main content from a given URL.
- Content Cleaning: Removes ads, scripts, styles, and other clutter using heuristic rules.
- JSON Output: Returns clean text, title, and metadata.
- Dockerized: Easy to deploy and run anywhere.
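
To make the content-cleaning idea concrete, here is a minimal sketch of heuristic clutter removal. It is not the code in `main.py`: the service uses BeautifulSoup, while this stand-in uses only the standard library's `html.parser` so it runs without extra dependencies. The `SKIP_TAGS` set, the `TextExtractor` class, and the `extract` function are hypothetical names chosen for illustration, and the real rules may differ.

```python
from html.parser import HTMLParser

# Illustrative list of tags treated as clutter (an assumption, not the service's actual rules).
SKIP_TAGS = {"script", "style", "nav", "footer", "aside", "noscript"}


class TextExtractor(HTMLParser):
    """Collects the page title and visible text, ignoring text inside SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside at least one clutter tag
        self.in_title = False
        self.title = ""
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag == "title":
            self.in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_title:
            self.title += text
        elif self.skip_depth == 0:
            self.chunks.append(text)


def extract(html):
    """Return a dict with the title and the cleaned text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "content": " ".join(parser.chunks)}
```

Calling `extract(page_html)` on a fetched page yields a dictionary with `title` and `content` keys, loosely mirroring two of the fields in the JSON output described above.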

Install dependencies:
```
pip install -r requirements.txt
```
Run the application:
```
uvicorn main:app --reload
```
Test: Open http://127.0.0.1:8000/docs in your browser to view the interactive API documentation.

Create a new Space on Hugging Face.
Select Docker as the SDK.
Upload the following files from this repository to the Space:
- Dockerfile
- requirements.txt
- main.py
- README.md
The application will build and start automatically on port 7860.

Request Body:

{
  "url": "https://example.com/article"
}

Response:

{
  "url": "https://example.com/article",
  "title": "Example Article Title",
  "content": "Extracted text content...",
  "status": "success"
}
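
A client for the request and response bodies above might look like the following sketch, which uses only the standard library. The `/scrape` path for the POST endpoint is an assumption (this README only shows that path in the query-parameter example), and `build_scrape_request` and `scrape` are hypothetical helper names.

```python
import json
import urllib.request


def build_scrape_request(base_url, target_url, path="/scrape"):
    """Build a POST request carrying {"url": target_url} as a JSON body.

    The default path is an assumption; adjust it to match the service's routes.
    """
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(base_url, target_url, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_scrape_request(base_url, target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally, `scrape("http://127.0.0.1:8000", "https://example.com/article")` should return a dictionary with the `url`, `title`, `content`, and `status` keys shown in the response above, provided the assumed path is correct.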

Query Parameter: url

Example: https://your-space-url.hf.space/scrape?url=https://example.com
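
When the target URL contains its own query string, it is safer to percent-encode it before placing it in the `url` parameter, so that characters such as `&` and `?` are not read as part of the outer request. The sketch below shows one way to build such a link; `build_get_url` is a hypothetical helper, and `your-space-url.hf.space` is the placeholder host from the example above.

```python
from urllib.parse import urlencode


def build_get_url(base_url, target_url):
    """Return the /scrape URL with target_url percent-encoded in the url parameter."""
    return base_url.rstrip("/") + "/scrape?" + urlencode({"url": target_url})


# Example with a target URL that has its own query parameters.
link = build_get_url("https://your-space-url.hf.space", "https://example.com/a?b=1&c=2")
print(link)
```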