Update README.md
README.md
CHANGED
|
@@ -11,37 +11,50 @@ pinned: false
|
|
| 11 |
|
| 12 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
| 13 |
|
| 14 |
# Web Scraper for n8n
|
| 15 |
|
| 16 |
-
A
|
| 17 |
|
| 18 |
## Features
|
| 19 |
-
|
| 20 |
-
- β
|
| 21 |
-
- β
|
| 22 |
-
- β
|
| 23 |
-
- β
|
| 24 |
-
- β
|
| 25 |
|
| 26 |
## Usage with n8n
|
| 27 |
|
| 28 |
-
1.
|
| 29 |
-
|
| 30 |
-
|
| 31 |
-
- **
|
| 32 |
-
- **URL**: `https://your-username-space-name.hf.space/scrape`
|
| 33 |
-
- **Headers**: `Content-Type: application/json`
|
| 34 |
-
- **Body**:
|
| 35 |
```json
|
| 36 |
{
|
| 37 |
-
"url": "
|
| 38 |
}
|
| 39 |
```
|
| 40 |
|
| 41 |
-
|
| 42 |
-
|
| 43 |
-
|
| 44 |
-
|
| 45 |
-
|
| 46 |
-
|
| 47 |
-
|
| 11 |
|
| 12 |
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
|
| 13 |
|
| 14 |
+
|
| 15 |
+
---
|
| 16 |
+
title: Web Scraper for n8n
|
| 17 |
+
emoji: π
|
| 18 |
+
colorFrom: blue
|
| 19 |
+
colorTo: green
|
| 20 |
+
sdk: gradio
|
| 21 |
+
sdk_version: 4.19.0
|
| 22 |
+
app_file: app.py
|
| 23 |
+
pinned: false
|
| 24 |
+
license: mit
|
| 25 |
+
---
|
| 26 |
+
|
| 27 |
# Web Scraper for n8n
|
| 28 |
|
| 29 |
+
A simple web scraper API that extracts text from webpages, designed to work with n8n via HTTP requests.
|
| 30 |
|
| 31 |
## Features
|
| 32 |
+
|
| 33 |
+
- Extract text content from any webpage
|
| 34 |
+
- BeautifulSoup for smart HTML parsing
|
| 35 |
+
- Simple regex fallback
|
| 36 |
+
- JSON API for n8n integration
|
| 37 |
+
- Gradio web interface
|
| 38 |
|
| 39 |
## Usage with n8n
|
| 40 |
|
| 41 |
+
1. **HTTP Request Node Configuration:**
|
| 42 |
+
- **Method:** POST
|
| 43 |
+
- **URL:** `https://your-username-space-name.hf.space/scrape`
|
| 44 |
+
- **Body:**
|
| 45 |
```json
|
| 46 |
{
|
| 47 |
+
"url": "https://example.com"
|
| 48 |
}
|
| 49 |
```
|
| 50 |
|
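For testing the endpoint outside n8n, the HTTP Request node configuration above can be sketched in Python. This is a minimal sketch, not the Space's own client: it assumes the Space really serves `POST /scrape` as the README states, the `Content-Type: application/json` header is an assumption, and the helper names `build_scrape_request` and `scrape` are hypothetical.

```python
import json
import urllib.request

# Placeholder base URL copied from the README; replace it with your own Space.
SPACE_URL = "https://your-username-space-name.hf.space"


def build_scrape_request(target_url, space_url=SPACE_URL):
    """Build the POST request the README describes for the /scrape endpoint."""
    # The body mirrors the README's JSON payload: {"url": "<page to scrape>"}.
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        space_url.rstrip("/") + "/scrape",
        data=body,
        # Assumption: the endpoint expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(target_url, space_url=SPACE_URL, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_scrape_request(target_url, space_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `scrape("https://example.com")` against a running Space should return a dict parsed from the JSON response. A sleeping Space may need time to wake up, so a generous `timeout` is a reasonable choice.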
| 51 |
+
2. **Example Response:**
|
| 52 |
+
```json
|
| 53 |
+
{
|
| 54 |
+
"success": true,
|
| 55 |
+
"url": "https://example.com",
|
| 56 |
+
"execution_time": 0.45,
|
| 57 |
+
"method": "beautifulsoup",
|
| 58 |
+
"extracted_text": "...",
|
| 59 |
+
"text_length": 1234
|
| 60 |
+
}
|