metadata
title: Web Scraper API
emoji: 🕸️
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
pinned: true

Web Scraping Service

This is a simple web scraping service built with FastAPI and BeautifulSoup, designed to be deployed on Hugging Face Spaces using Docker.

Features

  • URL Scraping: Extracts main content from a given URL.
  • Content Cleaning: Removes ads, scripts, styles, and other clutter using heuristic rules.
  • JSON Output: Returns clean text, title, and metadata.
  • Dockerized: Easy to deploy and run anywhere.
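The cleaning step could look roughly like the sketch below. It is an illustration only, not the code in main.py: the function name `extract_content`, the `CLUTTER_TAGS` list, and the "prefer article, then main, then body" rule are all assumptions, and ad detection by class or id is left out.

```python
from bs4 import BeautifulSoup

# Tags treated as clutter in this sketch (an assumption; the service's
# actual heuristic rules are not documented in this README).
CLUTTER_TAGS = ["script", "style", "noscript", "nav", "header",
                "footer", "aside", "form", "iframe"]


def extract_content(html: str) -> dict:
    """Return the page title and cleaned text from raw HTML."""
    soup = BeautifulSoup(html, "html.parser")

    # Read the title first, before anything is removed from the tree.
    title = soup.title.get_text(strip=True) if soup.title else ""

    # Remove clutter one element at a time, searching again after each
    # removal so that nested clutter is never touched twice.
    tag = soup.find(CLUTTER_TAGS)
    while tag is not None:
        tag.decompose()
        tag = soup.find(CLUTTER_TAGS)

    # Prefer a semantic container if the page has one.
    root = soup.find("article") or soup.find("main") or soup.body or soup
    content = root.get_text(separator="\n", strip=True)
    return {"title": title, "content": content}
```

Pages that do not use article or main fall back to the whole body, so the output for those pages may still contain some boilerplate text.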

Local Development

  1. Install dependencies:

    pip install -r requirements.txt
    
  2. Run the application:

    uvicorn main:app --reload
    
  3. Test: Open http://127.0.0.1:8000/docs in your browser to see the interactive API documentation (8000 is uvicorn's default port).

Deployment on Hugging Face Spaces

  1. Create a new Space on Hugging Face.
  2. Select Docker as the SDK.
  3. Upload the following files from this repository to the Space:
    • Dockerfile
    • requirements.txt
    • main.py
    • README.md
  4. The Space builds the Docker image and starts the application automatically on port 7860 (the app_port set in the metadata).

API Usage

Endpoint: POST /scrape

Request Body:

{
  "url": "https://example.com/article"
}

Response:

{
  "url": "https://example.com/article",
  "title": "Example Article Title",
  "content": "Extracted text content...",
  "status": "success"
}
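A minimal client sketch for the POST endpoint, using only the Python standard library. The helper names `build_scrape_request` and `scrape` are hypothetical, and the base URL is a placeholder for your own Space or local server.

```python
import json
import urllib.request


def build_scrape_request(base_url: str, target_url: str) -> urllib.request.Request:
    """Build the POST /scrape request with a JSON body."""
    payload = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/scrape",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(base_url: str, target_url: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response."""
    request = build_scrape_request(base_url, target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For a local run, `scrape("http://127.0.0.1:8000", "https://example.com/article")` should return a dictionary with the url, title, content, and status fields shown in the response above.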

Endpoint: GET /scrape

Query Parameter: url (the page to scrape; URL-encode the value if it contains its own query string)

Example: https://your-space-url.hf.space/scrape?url=https://example.com
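When the target address has a query string of its own, its `&` and `=` characters would otherwise be read as part of the /scrape request. The sketch below builds a safely encoded GET URL with the standard library; `build_get_url` is a hypothetical helper name and the base URL is a placeholder.

```python
from urllib.parse import urlencode


def build_get_url(base_url: str, target_url: str) -> str:
    """Return the GET /scrape URL with the target percent-encoded."""
    query = urlencode({"url": target_url})
    return f"{base_url.rstrip('/')}/scrape?{query}"


print(build_get_url("https://your-space-url.hf.space",
                    "https://example.com/a?x=1&y=2"))
# prints https://your-space-url.hf.space/scrape?url=https%3A%2F%2Fexample.com%2Fa%3Fx%3D1%26y%3D2
```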