---
title: Web Scraper API
emoji: 🕸️
colorFrom: indigo
colorTo: blue
sdk: docker
app_port: 7860
pinned: true
---

# Web Scraping Service

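A simple web scraping service built with FastAPI and BeautifulSoup. Given a URL, it fetches the page, strips out clutter, and returns the main text content as JSON. It is designed to be deployed on Hugging Face Spaces using the Docker SDK.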

## Features

- **URL Scraping**: Extracts main content from a given URL.
- **Content Cleaning**: Removes ads, scripts, styles, and other clutter using heuristic rules.
- **JSON Output**: Returns clean text, title, and metadata.
- **Dockerized**: Easy to deploy and run anywhere.
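
The cleaning step can be illustrated with a small sketch. The service itself uses BeautifulSoup; this version uses the standard library's `html.parser` so it runs with no extra packages. The tags in `SKIP_TAGS` and the names `TextExtractor` and `clean_html` are assumptions for illustration, not what `main.py` actually does. It drops text inside clutter tags, keeps the `<title>`, and returns a dict shaped like the `/scrape` response.

```python
from html.parser import HTMLParser

# Assumed set of "clutter" tags whose text is dropped; the real service's
# heuristic rules may differ.
SKIP_TAGS = {"script", "style", "noscript", "nav", "footer", "aside"}


class TextExtractor(HTMLParser):
    """Collects the <title> and visible text, skipping clutter tags."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: list[str] = []
        self._skip_depth = 0  # > 0 while inside one or more skipped tags
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return  # ignore whitespace and anything inside clutter tags
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def clean_html(url: str, html: str) -> dict:
    """Return a dict with the same keys as the documented /scrape response."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return {
        "url": url,
        "title": parser.title,
        "content": "\n".join(parser.chunks),
        "status": "success",
    }
```

A depth counter is used instead of a boolean so that nested clutter (a `<nav>` inside an `<aside>`) stays skipped until the outermost skipped tag closes.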

## Local Development

1.  **Install dependencies**:
    ```bash
    pip install -r requirements.txt
    ```

2.  **Run the application**:
    ```bash
    uvicorn main:app --reload
    ```

3.  **Test**:
    Open `http://127.0.0.1:8000/docs` in your browser to see the interactive API documentation. Uvicorn listens on port 8000 by default; the Docker deployment below uses port 7860 instead.

## Deployment on Hugging Face Spaces

1.  Create a new Space on Hugging Face.
2.  Select **Docker** as the SDK.
3.  Upload the files in this repository to the Space.
    - `Dockerfile`
    - `requirements.txt`
    - `main.py`
    - `README.md`
4.  The Space builds the Docker image and starts the application automatically on port 7860, the port set by `app_port` in the front matter.

## API Usage

### Endpoint: `POST /scrape`

**Request Body:**
```json
{
  "url": "https://example.com/article"
}
```

**Response:**
```json
{
  "url": "https://example.com/article",
  "title": "Example Article Title",
  "content": "Extracted text content...",
  "status": "success"
}
```
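
A minimal Python client for `POST /scrape`, using only the standard library. The helper names, the `BASE_URL` default, and the choice to raise when `status` is not `"success"` are assumptions: this README only documents the success response, so check what `main.py` returns on failure. Point `BASE_URL` at `http://127.0.0.1:8000` for local development, or at your Space's `*.hf.space` URL once deployed.

```python
import json
import urllib.request

# Hypothetical default: the local uvicorn server. Swap in your Space URL
# (e.g. https://your-space-url.hf.space) after deployment.
BASE_URL = "http://127.0.0.1:8000"


def build_scrape_request(target_url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /scrape request whose JSON body is {"url": target_url}."""
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_scrape_response(raw: bytes) -> dict:
    """Decode the JSON body; raise if status is not "success" (assumed convention)."""
    payload = json.loads(raw)
    if payload.get("status") != "success":
        raise RuntimeError(f"scrape failed: {payload!r}")
    return payload


def scrape(target_url: str, base_url: str = BASE_URL, timeout: float = 30.0) -> dict:
    """Send the request and return the parsed response (needs a running server)."""
    req = build_scrape_request(target_url, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_scrape_response(resp.read())


# Example (requires the service to be running):
# print(scrape("https://example.com/article")["title"])
```

Building and parsing are kept separate from `scrape` so the request shape and response handling can be checked without a server. If the server replies with an HTTP error code, `urlopen` raises `urllib.error.HTTPError` before the body is parsed.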

### Endpoint: `GET /scrape`

**Query Parameter:** `url`

Example: `https://your-space-url.hf.space/scrape?url=https://example.com`
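
The example above works as written because `https://example.com` has no query string or fragment of its own. A target URL containing `&` or `#` must be percent-encoded: an unencoded `&` starts a new query parameter and a `#` starts the fragment, so the server would receive a truncated `url`. A small sketch using `urllib.parse.urlencode`; the function name and the placeholder Space URL are illustrative.

```python
from urllib.parse import urlencode


def build_get_url(base_url: str, target_url: str) -> str:
    """Return base_url + '/scrape?url=...' with target_url percent-encoded."""
    # urlencode() escapes ':', '/', '?', '&', '=' and '#' inside the value,
    # so the whole target URL travels as a single query parameter.
    query = urlencode({"url": target_url})
    return f"{base_url.rstrip('/')}/scrape?{query}"


print(build_get_url("https://your-space-url.hf.space", "https://example.com/a?b=1&c=2"))
# → https://your-space-url.hf.space/scrape?url=https%3A%2F%2Fexample.com%2Fa%3Fb%3D1%26c%3D2
```

FastAPI decodes query parameters before passing them to the handler, so the endpoint should see the original target URL.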