---
title: Web Scraper API
emoji: 🕸️
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
pinned: true
---
# Web Scraping Service
This is a simple web scraping service built with FastAPI and BeautifulSoup, designed to be deployed on Hugging Face Spaces using Docker.
## Features
- **URL Scraping**: Extracts main content from a given URL.
- **Content Cleaning**: Removes ads, scripts, styles, and other clutter using heuristic rules.
- **JSON Output**: Returns clean text, title, and metadata.
- **Dockerized**: Easy to deploy and run anywhere.
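
The content-cleaning idea can be illustrated with a small sketch. This is not the service's actual code: it uses Python's standard-library `HTMLParser` instead of BeautifulSoup so that it runs without dependencies, and the `SKIPPED_TAGS` set, the `TextExtractor` class, and the `clean_html` function are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as clutter.
# This list is an assumption, not the service's real rule set.
SKIPPED_TAGS = {"script", "style", "noscript", "nav", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects the <title> and visible text, skipping clutter tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth > 0:
            return
        if self._in_title:
            self.title = text
        else:
            self.chunks.append(text)


def clean_html(html):
    """Return the page title and the cleaned text as a dict."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "content": " ".join(parser.chunks)}
```

Calling `clean_html(page_source)` returns a dict with `title` and `content` keys, mirroring two of the fields in the JSON output described above.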
## Local Development
1. **Install dependencies**:
```bash
pip install -r requirements.txt
```
2. **Run the application**:
```bash
uvicorn main:app --reload
```
3. **Test**:
Open `http://127.0.0.1:8000/docs` in your browser to see the interactive API documentation. (Port 8000 is uvicorn's default; the deployed Space listens on port 7860 instead.)
## Deployment on Hugging Face Spaces
1. Create a new Space on Hugging Face.
2. Select **Docker** as the SDK.
3. Upload the following files from this repository to the Space:
   - `Dockerfile`
   - `requirements.txt`
   - `main.py`
   - `README.md`
4. The application will build and start automatically on port 7860.
## API Usage
### Endpoint: `POST /scrape`
**Request Body:**
```json
{
"url": "https://example.com/article"
}
```
**Response:**
```json
{
"url": "https://example.com/article",
"title": "Example Article Title",
"content": "Extracted text content...",
"status": "success"
}
```
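
A minimal client sketch for the `POST /scrape` endpoint, using only the Python standard library. The helper names `build_scrape_request` and `scrape`, the 30-second timeout, and the base URL passed in are assumptions for illustration; the request body and response fields follow the examples above.

```python
import json
import urllib.request


def build_scrape_request(base_url, target_url):
    """Build (but do not send) a POST /scrape request with a JSON body."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(base_url, target_url, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_scrape_request(base_url, target_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running locally, `scrape("http://127.0.0.1:8000", "https://example.com/article")` should return a dict shaped like the response shown above.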
### Endpoint: `GET /scrape`
**Query Parameter:** `url` (the page to scrape; URL-encode the value if it contains its own query string)
Example: `https://your-space-url.hf.space/scrape?url=https://example.com`
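
When the target URL has its own query string, it needs to be percent-encoded before being passed as the `url` parameter. A small sketch, where `build_scrape_url` is a hypothetical helper and the Space URL is the placeholder from the example above:

```python
from urllib.parse import urlencode


def build_scrape_url(base_url, target_url):
    """Return a GET /scrape URL with the target URL safely percent-encoded."""
    query = urlencode({"url": target_url})
    return f"{base_url.rstrip('/')}/scrape?{query}"


print(build_scrape_url(
    "https://your-space-url.hf.space",
    "https://example.com/a?page=2&lang=en",
))
```

Without the encoding, the `&lang=en` part would be read as a second parameter of the `/scrape` request rather than as part of the target URL.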