ming committed
Commit 8ca69c7 · 1 parent: bd7d2c1
docs: Update README.md with comprehensive V4 API documentation
- Add V4 API endpoints documentation (stream and stream-ndjson)
- Document V4 configuration environment variables
- Add V4 performance metrics (memory, inference speed, GPU support)
- Include comprehensive V4 usage examples (Python, cURL)
- Document three summarization styles (executive, skimmer, eli5)
- Add V4 output schema with 6 structured fields
- Update deployment options for different memory constraints
- Update security section with SSRF protection details
- Correct V2 warmup default to false
- Add troubleshooting section for V4-specific issues
README.md
CHANGED
|
@@ -11,11 +11,15 @@ app_port: 7860
|
|
| 11 |
|
| 12 |
# Text Summarizer API
|
| 13 |
|
| 14 |
-
A FastAPI-based text summarization service
|
| 15 |
|
| 16 |
## Features
|
| 17 |
|
| 18 |
-
- **
|
| 19 |
- **RESTful API** with FastAPI
|
| 20 |
- **Health monitoring** and logging
|
| 21 |
- **Docker containerized** for easy deployment
|
|
@@ -45,6 +49,12 @@ POST /api/v2/summarize/stream
|
|
| 45 |
POST /api/v3/scrape-and-summarize/stream
|
| 46 |
```
|
| 47 |
|
| 48 |
## Live Deployment
|
| 49 |
|
| 50 |
**✅ Successfully deployed and tested on Hugging Face Spaces!**
|
|
@@ -56,14 +66,28 @@ POST /api/v3/scrape-and-summarize/stream
|
|
| 56 |
|
| 57 |
### Quick Test
|
| 58 |
```bash
|
| 59 |
-
# Test the live deployment
|
| 60 |
curl https://colin730-SummarizerApp.hf.space/health
|
|
|
|
|
|
|
| 61 |
curl -X POST https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream \
|
| 62 |
-H "Content-Type: application/json" \
|
| 63 |
-d '{"text":"This is a test of the live API.","max_tokens":50}'
|
| 64 |
```
|
| 65 |
|
| 66 |
-
**Request
|
|
|
|
|
|
|
| 67 |
```json
|
| 68 |
{
|
| 69 |
"text": "Your long text to summarize here...",
|
|
@@ -72,6 +96,33 @@ curl -X POST https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream \
|
|
| 72 |
}
|
| 73 |
```
|
| 74 |
|
| 75 |
### API Documentation
|
| 76 |
- **Swagger UI**: `/docs`
|
| 77 |
- **ReDoc**: `/redoc`
|
|
@@ -105,6 +156,15 @@ The service uses the following environment variables:
|
|
| 105 |
- `SCRAPING_UA_ROTATION`: Enable user-agent rotation (default: `true`)
|
| 106 |
- `SCRAPING_RATE_LIMIT_PER_MINUTE`: Rate limit per IP (default: `10`)
|
| 107 |
|
| 108 |
### Server Configuration
|
| 109 |
- `SERVER_HOST`: Server host (default: `127.0.0.1`)
|
| 110 |
- `SERVER_PORT`: Server port (default: `8000`)
|
|
@@ -132,13 +192,29 @@ This app is optimized for deployment on Hugging Face Spaces using Docker SDK.
|
|
| 132 |
- Optimized for free tier resource limits
|
| 133 |
|
| 134 |
**Environment Variables for HF Spaces:**
|
|
|
|
|
|
|
| 135 |
```bash
|
| 136 |
ENABLE_V1_WARMUP=false
|
| 137 |
-
ENABLE_V2_WARMUP=
|
|
|
|
|
|
|
| 138 |
HF_MODEL_ID=sshleifer/distilbart-cnn-6-6
|
| 139 |
HF_HOME=/tmp/huggingface
|
| 140 |
```
|
| 141 |
|
| 142 |
## Performance
|
| 143 |
|
| 144 |
### V1 (Ollama + Transformers Pipeline)
|
|
@@ -160,12 +236,30 @@ HF_HOME=/tmp/huggingface
|
|
| 160 |
- **Total latency**: 2-5 seconds (scrape + summarize)
|
| 161 |
- **Success rate**: 95%+ article extraction
|
| 162 |
|
| 163 |
### Memory Optimization
|
| 164 |
- **V1 warmup disabled by default** (`ENABLE_V1_WARMUP=false`)
|
| 165 |
-
- **V2 warmup
|
| 166 |
-
- **
|
| 167 |
-
- **
|
| 168 |
-
-
|
| 169 |
|
| 170 |
## 🛠️ Development
|
| 171 |
|
|
@@ -312,6 +406,142 @@ for line in response.iter_lines():
|
|
| 312 |
- Website blocks scrapers
|
| 313 |
- User has already extracted the text manually
|
| 314 |
|
| 315 |
### Android Client (SSE)
|
| 316 |
```kotlin
|
| 317 |
// Android SSE client example
|
|
@@ -366,6 +596,16 @@ curl -X POST "https://colin730-SummarizerApp.hf.space/api/v3/scrape-and-summariz
|
|
| 366 |
curl -X POST "https://colin730-SummarizerApp.hf.space/api/v3/scrape-and-summarize/stream" \
|
| 367 |
-H "Content-Type: application/json" \
|
| 368 |
-d '{"text": "Your article text here (minimum 50 characters)...", "max_tokens": 256}'
|
| 369 |
```
|
| 370 |
|
| 371 |
### Test Script
|
|
@@ -378,7 +618,10 @@ curl -X POST "https://colin730-SummarizerApp.hf.space/api/v3/scrape-and-summariz
|
|
| 378 |
|
| 379 |
- Non-root user execution
|
| 380 |
- Input validation and sanitization
|
| 381 |
-
-
|
| 382 |
- API key authentication (optional)
|
| 383 |
|
| 384 |
## Monitoring
|
|
@@ -393,10 +636,16 @@ The service includes:
|
|
| 393 |
|
| 394 |
### Common Issues
|
| 395 |
|
| 396 |
-
1. **Model not loading**: Check if Ollama is running and model is pulled
|
| 397 |
-
2. **Out of memory**:
|
| 398 |
3. **Slow startup**: Normal on first run due to model download
|
| 399 |
-
4. **
|
|
|
|
|
|
|
| 400 |
|
| 401 |
### Logs
|
| 402 |
View application logs in the Hugging Face Spaces interface or check the health endpoint for service status.
|
|
@@ -420,15 +669,25 @@ MIT License - see LICENSE file for details.
|
|
| 420 |
**Successfully deployed and tested on Hugging Face Spaces!**
|
| 421 |
|
| 422 |
- ✅ **Proxy-aware FastAPI** with `root_path` support
|
| 423 |
-
- ✅ **All endpoints working** (health, docs,
|
| 424 |
- ✅ **Real-time streaming** summarization
|
|
|
|
|
|
|
| 425 |
- ✅ **No 404 errors** - all paths correctly configured
|
| 426 |
- ✅ **Test script included** for easy verification
|
| 427 |
|
| 428 |
-
###
|
| 429 |
-
-
|
| 430 |
-
-
|
| 431 |
-
-
|
| 432 |
-
-
|
| 433 |
|
| 434 |
**Live Space:** https://colin730-SummarizerApp.hf.space 🎯
|
|
|
|
| 11 |
|
| 12 |
# Text Summarizer API
|
| 13 |
|
| 14 |
+
A FastAPI-based text summarization service with several summarization back ends: Ollama, Hugging Face Transformers, and Qwen models for structured output, plus optional web scraping of article URLs.
|
| 15 |
|
| 16 |
## Features
|
| 17 |
|
| 18 |
+
- **Multiple Summarization Engines**: Ollama, HuggingFace Transformers, and Qwen models
|
| 19 |
+
- **Structured JSON Output**: V4 API returns rich metadata (title, key points, category, sentiment, reading time)
|
| 20 |
+
- **Web Scraping Integration**: V3 and V4 APIs can scrape articles directly from URLs
|
| 21 |
+
- **Real-time Streaming**: All endpoints support Server-Sent Events (SSE) streaming
|
| 22 |
+
- **GPU Acceleration**: V4 supports CUDA and MPS (Apple Silicon), with automatic quantization
|
| 23 |
- **RESTful API** with FastAPI
|
| 24 |
- **Health monitoring** and logging
|
| 25 |
- **Docker containerized** for easy deployment
|
|
|
|
| 49 |
POST /api/v3/scrape-and-summarize/stream
|
| 50 |
```
|
| 51 |
|
| 52 |
+
### V4 API (Structured Output with Qwen)
|
| 53 |
+
```
|
| 54 |
+
POST /api/v4/scrape-and-summarize/stream
|
| 55 |
+
POST /api/v4/scrape-and-summarize/stream-ndjson
|
| 56 |
+
```
|
| 57 |
+
|
| 58 |
## Live Deployment
|
| 59 |
|
| 60 |
**✅ Successfully deployed and tested on Hugging Face Spaces!**
|
|
|
|
| 66 |
|
| 67 |
### Quick Test
|
| 68 |
```bash
|
| 69 |
+
# Test the live deployment - health check
|
| 70 |
curl https://colin730-SummarizerApp.hf.space/health
|
| 71 |
+
|
| 72 |
+
# Test V2 API (lightweight streaming)
|
| 73 |
curl -X POST https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream \
|
| 74 |
-H "Content-Type: application/json" \
|
| 75 |
-d '{"text":"This is a test of the live API.","max_tokens":50}'
|
| 76 |
+
|
| 77 |
+
# Test V3 API (web scraping)
|
| 78 |
+
curl -X POST https://colin730-SummarizerApp.hf.space/api/v3/scrape-and-summarize/stream \
|
| 79 |
+
-H "Content-Type: application/json" \
|
| 80 |
+
-d '{"url":"https://example.com/article","max_tokens":128}'
|
| 81 |
+
|
| 82 |
+
# Test V4 API (structured output, if enabled)
|
| 83 |
+
curl -X POST https://colin730-SummarizerApp.hf.space/api/v4/scrape-and-summarize/stream-ndjson \
|
| 84 |
+
-H "Content-Type: application/json" \
|
| 85 |
+
-d '{"text":"This is a test article. It contains important information.","style":"executive","max_tokens":256}'
|
| 86 |
```
|
| 87 |
|
| 88 |
+
**Request Formats by API Version:**
|
| 89 |
+
|
| 90 |
+
V1/V2 (Simple text summarization):
|
| 91 |
```json
|
| 92 |
{
|
| 93 |
"text": "Your long text to summarize here...",
|
|
|
|
| 96 |
}
|
| 97 |
```
|
| 98 |
|
| 99 |
+
V3 (URL scraping or text):
|
| 100 |
+
```json
|
| 101 |
+
{
|
| 102 |
+
"url": "https://example.com/article",
|
| 103 |
+
"max_tokens": 256,
|
| 104 |
+
"include_metadata": true,
|
| 105 |
+
"use_cache": true
|
| 106 |
+
}
|
| 107 |
+
```
|
| 108 |
+
|
| 109 |
+
V4 (Structured output with styles):
|
| 110 |
+
```json
|
| 111 |
+
{
|
| 112 |
+
"url": "https://example.com/article",
|
| 113 |
+
"style": "executive",
|
| 114 |
+
"max_tokens": 512,
|
| 115 |
+
"include_metadata": true,
|
| 116 |
+
"use_cache": true
|
| 117 |
+
}
|
| 118 |
+
```
|
| 119 |
+
|
| 120 |
+
**Which API to Use?**
|
| 121 |
+
- **V1**: Local deployment with Ollama (requires external service)
|
| 122 |
+
- **V2**: Lightweight cloud deployment, simple text summaries
|
| 123 |
+
- **V3**: When you need to scrape articles from URLs + simple summaries
|
| 124 |
+
- **V4**: When you need rich metadata (category, sentiment, key points) + GPU acceleration
|
| 125 |
+
|
| 126 |
### API Documentation
|
| 127 |
- **Swagger UI**: `/docs`
|
| 128 |
- **ReDoc**: `/redoc`
|
|
|
|
| 156 |
- `SCRAPING_UA_ROTATION`: Enable user-agent rotation (default: `true`)
|
| 157 |
- `SCRAPING_RATE_LIMIT_PER_MINUTE`: Rate limit per IP (default: `10`)
|
| 158 |
|
| 159 |
+
### V4 Configuration (Structured Summarization)
|
| 160 |
+
- `ENABLE_V4_STRUCTURED`: Enable V4 API (default: `true`)
|
| 161 |
+
- `ENABLE_V4_WARMUP`: Load model at startup (default: `false` to save memory)
|
| 162 |
+
- `V4_MODEL_ID`: Model to use (default: `Qwen/Qwen2.5-1.5B-Instruct`, alternative: `Qwen/Qwen2.5-3B-Instruct`)
|
| 163 |
+
- `V4_MAX_TOKENS`: Max tokens to generate (default: `256`, range: 128-2048)
|
| 164 |
+
- `V4_TEMPERATURE`: Sampling temperature (default: `0.2` for consistent output)
|
| 165 |
+
- `V4_ENABLE_QUANTIZATION`: Enable INT8 quantization on CPU or 4-bit NF4 on CUDA (default: `true`)
|
| 166 |
+
- `V4_USE_FP16_FOR_SPEED`: Use FP16 precision for 2-3x faster inference on GPU (default: `false`)
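
The V4 variables listed above could be read at startup roughly as follows. This is a minimal sketch, not the service's actual code: the `V4Settings` class, the `load_v4_settings` helper, the set of accepted truthy strings, and the clamping of `V4_MAX_TOKENS` to the documented 128-2048 range are all assumptions made for illustration. Only the variable names and defaults come from the list above.

```python
import os
from dataclasses import dataclass


def _env_bool(name: str, default: bool) -> bool:
    """Read a boolean flag; unset variables fall back to the documented default."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    # Assumption: these strings count as "true"; the real service may differ.
    return raw.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class V4Settings:
    """Hypothetical settings container (not the service's real class)."""
    enabled: bool
    warmup: bool
    model_id: str
    max_tokens: int
    temperature: float
    quantization: bool
    fp16_for_speed: bool


def load_v4_settings() -> V4Settings:
    max_tokens = int(os.environ.get("V4_MAX_TOKENS", "256"))
    # Assumption: out-of-range values are clamped to the documented 128-2048 range.
    max_tokens = max(128, min(2048, max_tokens))
    return V4Settings(
        enabled=_env_bool("ENABLE_V4_STRUCTURED", True),
        warmup=_env_bool("ENABLE_V4_WARMUP", False),
        model_id=os.environ.get("V4_MODEL_ID", "Qwen/Qwen2.5-1.5B-Instruct"),
        max_tokens=max_tokens,
        temperature=float(os.environ.get("V4_TEMPERATURE", "0.2")),
        quantization=_env_bool("V4_ENABLE_QUANTIZATION", True),
        fp16_for_speed=_env_bool("V4_USE_FP16_FOR_SPEED", False),
    )


print(load_v4_settings())
```

Keeping the defaults in one place like this makes it easy to compare a deployment's effective configuration against the documented values.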
|
| 167 |
+
|
| 168 |
### Server Configuration
|
| 169 |
- `SERVER_HOST`: Server host (default: `127.0.0.1`)
|
| 170 |
- `SERVER_PORT`: Server port (default: `8000`)
|
|
|
|
| 192 |
- Optimized for free tier resource limits
|
| 193 |
|
| 194 |
**Environment Variables for HF Spaces:**
|
| 195 |
+
|
| 196 |
+
For memory-constrained deployments (free tier):
|
| 197 |
```bash
|
| 198 |
ENABLE_V1_WARMUP=false
|
| 199 |
+
ENABLE_V2_WARMUP=false
|
| 200 |
+
ENABLE_V3_SCRAPING=true
|
| 201 |
+
ENABLE_V4_STRUCTURED=false
|
| 202 |
HF_MODEL_ID=sshleifer/distilbart-cnn-6-6
|
| 203 |
HF_HOME=/tmp/huggingface
|
| 204 |
```
|
| 205 |
|
| 206 |
+
For GPU-enabled deployments (paid tier with 16GB+ RAM):
|
| 207 |
+
```bash
|
| 208 |
+
ENABLE_V1_WARMUP=false
|
| 209 |
+
ENABLE_V2_WARMUP=false
|
| 210 |
+
ENABLE_V3_SCRAPING=true
|
| 211 |
+
ENABLE_V4_STRUCTURED=true
|
| 212 |
+
ENABLE_V4_WARMUP=false
|
| 213 |
+
V4_MODEL_ID=Qwen/Qwen2.5-3B-Instruct
|
| 214 |
+
V4_ENABLE_QUANTIZATION=true
|
| 215 |
+
V4_USE_FP16_FOR_SPEED=true
|
| 216 |
+
```
|
| 217 |
+
|
| 218 |
## Performance
|
| 219 |
|
| 220 |
### V1 (Ollama + Transformers Pipeline)
|
|
|
|
| 236 |
- **Total latency**: 2-5 seconds (scrape + summarize)
|
| 237 |
- **Success rate**: 95%+ article extraction
|
| 238 |
|
| 239 |
+
### V4 (Structured Summarization with Qwen)
|
| 240 |
+
- **V4 Models**: Qwen/Qwen2.5-1.5B-Instruct (default) or Qwen/Qwen2.5-3B-Instruct (higher quality)
|
| 241 |
+
- **Memory usage**:
|
| 242 |
+
- 1.5B model: ~2-3GB RAM (FP16 on GPU), ~1GB (4-bit NF4 on CUDA)
|
| 243 |
+
- 3B model: ~6-7GB RAM (FP16 on GPU), ~3-4GB (4-bit NF4 on CUDA)
|
| 244 |
+
- **Inference speed**:
|
| 245 |
+
- 1.5B model: 20-46 seconds per request
|
| 246 |
+
- 3B model: 40-60 seconds per request
|
| 247 |
+
- NDJSON streaming: 43% faster time-to-first-token
|
| 248 |
+
- **GPU acceleration**: CUDA > MPS (Apple Silicon) > CPU (4x speed difference)
|
| 249 |
+
- **Output format**: Structured JSON with 6 fields (title, main_summary, key_points, category, sentiment, read_time_min)
|
| 250 |
+
- **Styles**: executive, skimmer, eli5
|
| 251 |
+
|
| 252 |
### Memory Optimization
|
| 253 |
- **V1 warmup disabled by default** (`ENABLE_V1_WARMUP=false`)
|
| 254 |
+
- **V2 warmup disabled by default** (`ENABLE_V2_WARMUP=false`)
|
| 255 |
+
- **V4 warmup disabled by default** (`ENABLE_V4_WARMUP=false`) - Saves 2-7GB RAM
|
| 256 |
+
- **HuggingFace Spaces deployment options**:
|
| 257 |
+
- V2-only: ~500MB (fits free tier)
|
| 258 |
+
- V2+V3: ~550MB (fits free tier)
|
| 259 |
+
- V2+V3+V4 (1.5B): ~3GB (requires paid tier)
|
| 260 |
+
- V2+V3+V4 (3B): ~7GB (requires paid tier)
|
| 261 |
+
- **Local development**: All versions can run simultaneously with 8-10GB RAM
|
| 262 |
+
- **GPU deployment**: V4 benefits significantly from CUDA or MPS acceleration
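
As a rough aid for choosing among the deployment options above, the approximate footprints can be put in a small lookup. This is only a sketch: the `PROFILES` table and the `largest_profile_that_fits` helper are hypothetical, the figures are the approximate values from the list above converted to megabytes (1GB taken as 1000MB), and no headroom for the OS or request buffers is included.

```python
from typing import Optional

# Approximate footprints in MB, copied from the deployment options listed above.
PROFILES = [
    ("V2-only", 500),
    ("V2+V3", 550),
    ("V2+V3+V4 (1.5B)", 3000),
    ("V2+V3+V4 (3B)", 7000),
]


def largest_profile_that_fits(available_mb: int) -> Optional[str]:
    """Return the most capable profile whose approximate footprint fits in memory."""
    fitting = [name for name, needed_mb in PROFILES if needed_mb <= available_mb]
    return fitting[-1] if fitting else None


print(largest_profile_that_fits(600))
print(largest_profile_that_fits(16000))
```

In practice it is safer to leave a margin above these figures, since they are estimates rather than measured limits.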
|
| 263 |
|
| 264 |
## 🛠️ Development
|
| 265 |
|
|
|
|
| 406 |
- Website blocks scrapers
|
| 407 |
- User has already extracted the text manually
|
| 408 |
|
| 409 |
+
### V4 API (Structured Output with Qwen) - High-Quality Summaries
|
| 410 |
+
|
| 411 |
+
**V4 supports two streaming formats and three summarization styles**
|
| 412 |
+
|
| 413 |
+
#### Streaming Format 1: Standard JSON Streaming (stream)
|
| 414 |
+
```python
import requests
import json

# V4 scrape article from URL and stream structured JSON
response = requests.post(
    "https://colin730-SummarizerApp.hf.space/api/v4/scrape-and-summarize/stream",
    json={
        "url": "https://example.com/article",
        "style": "executive",  # Options: "executive", "skimmer", "eli5"
        "max_tokens": 256,
        "include_metadata": True,
        "use_cache": True
    },
    stream=True
)

# Collect the streamed JSON tokens so the final document can be parsed
accumulated_content = ""

for line in response.iter_lines():
    if line.startswith(b'data: '):
        data = json.loads(line[6:])

        # First event: metadata
        if data.get("type") == "metadata":
            print(f"Style: {data['data']['style']}")
            print(f"Scrape time: {data['data']['scrape_latency_ms']}ms\n")

        # Content events (streaming JSON tokens)
        elif "content" in data:
            accumulated_content += data["content"]
            print(data["content"], end="")
            if data.get("done"):
                # Parse final JSON
                summary = json.loads(accumulated_content)
                print(f"\n\nTitle: {summary['title']}")
                print(f"Category: {summary['category']}")
                print(f"Sentiment: {summary['sentiment']}")
                print(f"Key Points: {summary['key_points']}")
                break
```
|
| 452 |
+
|
| 453 |
+
#### Streaming Format 2: NDJSON Patch Streaming (stream-ndjson) - 43% Faster Time-to-First-Token
|
| 454 |
+
```python
import requests
import json

# V4 NDJSON streaming - progressive JSON updates for real-time UI
response = requests.post(
    "https://colin730-SummarizerApp.hf.space/api/v4/scrape-and-summarize/stream-ndjson",
    json={
        "text": "Your article text here (minimum 50 characters)...",
        "style": "skimmer",  # Brief, fact-focused summary
        "max_tokens": 512,
        "include_metadata": True
    },
    stream=True
)

summary = {}

for line in response.iter_lines():
    if line.startswith(b'data: '):
        event = json.loads(line[6:])

        # First event: metadata
        if event.get("type") == "metadata":
            print(f"Input: {event['data']['input_type']}")
            print(f"Style: {event['data']['style']}\n")

        # NDJSON patch events
        elif "delta" in event:
            delta = event["delta"]
            state = event["state"]

            if delta and delta.get("op") == "set":
                # Field set operation
                field = delta["field"]
                value = delta["value"]
                summary[field] = value
                print(f"{field}: {value}")

            elif delta and delta.get("op") == "append":
                # Array append operation
                field = delta["field"]
                value = delta["value"]
                if field not in summary:
                    summary[field] = []
                summary[field].append(value)
                print(f"+ {field}: {value}")

            elif delta and delta.get("op") == "done":
                # Final state
                print(f"\n✅ Complete! Total time: {event.get('latency_ms', 0):.0f}ms")
                print(f"Tokens used: {event.get('tokens_used', 0)}")
                break
```
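
The patch handling in the example above can be isolated into a small function that is easy to unit-test without a network connection. This is a sketch under the assumption that every delta carries the `op`, `field`, and `value` keys shown in the example; the `apply_patch` name and the sample events are hypothetical and not part of the API.

```python
from typing import Any, Dict


def apply_patch(summary: Dict[str, Any], delta: Dict[str, Any]) -> bool:
    """Apply one delta in place; return True once the stream signals completion."""
    op = delta.get("op")
    if op == "set":
        # Overwrite (or create) a scalar field such as title or category
        summary[delta["field"]] = delta["value"]
    elif op == "append":
        # Grow a list field such as key_points, creating it on first use
        summary.setdefault(delta["field"], []).append(delta["value"])
    elif op == "done":
        return True
    return False


# Made-up deltas in the shape used by the example above
events = [
    {"op": "set", "field": "title", "value": "Example title"},
    {"op": "append", "field": "key_points", "value": "First point"},
    {"op": "append", "field": "key_points", "value": "Second point"},
    {"op": "done"},
]

summary: Dict[str, Any] = {}
for delta in events:
    if apply_patch(summary, delta):
        break

print(summary)
```

Separating the state update from the HTTP loop lets a UI re-render after each patch while the parsing logic stays testable on its own.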
|
| 508 |
+
|
| 509 |
+
#### Summarization Styles
|
| 510 |
+
|
| 511 |
+
**Executive Style** (`"executive"`):
|
| 512 |
+
- Target audience: Business professionals, decision makers
|
| 513 |
+
- Format: Concise, action-oriented, business impact focus
|
| 514 |
+
- Example output: Strategic insights, financial implications, market trends
|
| 515 |
+
|
| 516 |
+
**Skimmer Style** (`"skimmer"`):
|
| 517 |
+
- Target audience: Busy readers wanting quick facts
|
| 518 |
+
- Format: Bullet-point style, scannable, fact-dense
|
| 519 |
+
- Example output: Core facts, numbers, dates, names
|
| 520 |
+
|
| 521 |
+
**ELI5 Style** (`"eli5"`):
|
| 522 |
+
- Target audience: General public, non-technical readers
|
| 523 |
+
- Format: Simple explanations, analogies, relatable examples
|
| 524 |
+
- Example output: What it means, why it matters, real-world impact
|
| 525 |
+
|
| 526 |
+
#### V4 Output Schema
|
| 527 |
+
|
| 528 |
+
All V4 responses return structured JSON with these 6 fields:
|
| 529 |
+
|
| 530 |
+
```json
{
  "title": "Click-worthy title (<100 chars)",
  "main_summary": "2-4 sentence summary (<500 chars)",
  "key_points": [
    "Key point 1",
    "Key point 2",
    "Key point 3"
  ],
  "category": "Technology",
  "sentiment": "Positive",
  "read_time_min": 5
}
```
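
A client may want to check a parsed V4 response against this schema before using it. The sketch below assumes the field names and JSON types shown in the schema above; the `EXPECTED_FIELDS` table and `schema_problems` helper are hypothetical and do not enforce the length limits mentioned in the placeholder strings.

```python
from typing import Any, Dict, List

# Field names and Python types inferred from the schema above
EXPECTED_FIELDS = {
    "title": str,
    "main_summary": str,
    "key_points": list,
    "category": str,
    "sentiment": str,
    "read_time_min": int,
}


def schema_problems(payload: Dict[str, Any]) -> List[str]:
    """Return human-readable problems; an empty list means the payload looks valid."""
    problems = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in payload:
            problems.append(f"missing field: {name}")
            continue
        value = payload[name]
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(f"wrong type for {name}: {type(value).__name__}")
    key_points = payload.get("key_points")
    if isinstance(key_points, list) and not all(isinstance(p, str) for p in key_points):
        problems.append("key_points must contain only strings")
    return problems


example = {
    "title": "Example title",
    "main_summary": "A short summary.",
    "key_points": ["Key point 1", "Key point 2"],
    "category": "Technology",
    "sentiment": "Positive",
    "read_time_min": 5,
}
print(schema_problems(example))
```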
|
| 544 |
+
|
| 545 |
### Android Client (SSE)
|
| 546 |
```kotlin
|
| 547 |
// Android SSE client example
|
|
|
|
| 596 |
curl -X POST "https://colin730-SummarizerApp.hf.space/api/v3/scrape-and-summarize/stream" \
|
| 597 |
-H "Content-Type: application/json" \
|
| 598 |
-d '{"text": "Your article text here (minimum 50 characters)...", "max_tokens": 256}'
|
| 599 |
+
|
| 600 |
+
# V4 API - Standard JSON streaming (URL mode)
|
| 601 |
+
curl -X POST "https://colin730-SummarizerApp.hf.space/api/v4/scrape-and-summarize/stream" \
|
| 602 |
+
-H "Content-Type: application/json" \
|
| 603 |
+
-d '{"url": "https://example.com/article", "style": "executive", "max_tokens": 256}'
|
| 604 |
+
|
| 605 |
+
# V4 API - NDJSON patch streaming (Text mode) - 43% faster time-to-first-token
|
| 606 |
+
curl -X POST "https://colin730-SummarizerApp.hf.space/api/v4/scrape-and-summarize/stream-ndjson" \
|
| 607 |
+
-H "Content-Type: application/json" \
|
| 608 |
+
-d '{"text": "Your article text (minimum 50 chars)...", "style": "skimmer", "max_tokens": 512}'
|
| 609 |
```
|
| 610 |
|
| 611 |
### Test Script
|
|
|
|
| 618 |
|
| 619 |
- Non-root user execution
|
| 620 |
- Input validation and sanitization
|
| 621 |
+
- **SSRF protection**: V3 and V4 APIs block localhost and private IP ranges
|
| 622 |
+
- **Rate limiting**: Configurable per-IP rate limits for scraping endpoints
|
| 623 |
+
- **URL validation**: Strict URL format checking (HTTP/HTTPS only)
|
| 624 |
+
- **Content limits**: Maximum text lengths enforced (50,000 chars for V3/V4)
|
| 625 |
- API key authentication (optional)
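
To illustrate the kind of check the SSRF and URL-validation items above describe, here is a minimal sketch using only the Python standard library. It is not the service's implementation: the `is_url_allowed` name is hypothetical, and a production check would also need to resolve hostnames and test every resolved address, which this sketch deliberately skips.

```python
import ipaddress
from urllib.parse import urlparse


def is_url_allowed(url: str) -> bool:
    """Reject non-HTTP(S) URLs, localhost, and private or loopback IP literals."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    host = parsed.hostname
    if not host or host.lower() == "localhost":
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal. Assumption: hostnames pass here; a real check
        # would resolve the name and validate each resulting address too.
        return True
    return not (address.is_private or address.is_loopback or address.is_link_local)


print(is_url_allowed("https://example.com/article"))
print(is_url_allowed("http://127.0.0.1:8000/health"))
```

Checking the parsed hostname rather than the raw string avoids being fooled by credentials or ports embedded in the URL.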
|
| 626 |
|
| 627 |
## Monitoring
|
|
|
|
| 636 |
|
| 637 |
### Common Issues
|
| 638 |
|
| 639 |
+
1. **Model not loading**: Check if Ollama is running and model is pulled (V1 only)
|
| 640 |
+
2. **Out of memory**:
|
| 641 |
+
- V1: Ensure 2-4GB RAM available
|
| 642 |
+
- V2/V3: Ensure ~500-550MB RAM available
|
| 643 |
+
- V4 (1.5B): Ensure 2-3GB RAM available
|
| 644 |
+
- V4 (3B): Ensure 6-7GB RAM available
|
| 645 |
3. **Slow startup**: Normal on first run due to model download
|
| 646 |
+
4. **V4 slow inference**: Enable GPU acceleration (CUDA or MPS) and FP16 for 2-4x speedup
|
| 647 |
+
5. **V4 quantization slow**: Quantization takes 1-2 minutes on startup; disable warmup to defer until first request
|
| 648 |
+
6. **API errors**: Check the request schema in the Swagger UI (`/docs`) and review the application logs
|
| 649 |
|
| 650 |
### Logs
|
| 651 |
View application logs in the Hugging Face Spaces interface or check the health endpoint for service status.
|
|
|
|
| 669 |
**Successfully deployed and tested on Hugging Face Spaces!**
|
| 670 |
|
| 671 |
- ✅ **Proxy-aware FastAPI** with `root_path` support
|
| 672 |
+
- ✅ **All endpoints working** (health, docs, V1-V4 APIs)
|
| 673 |
- ✅ **Real-time streaming** summarization
|
| 674 |
+
- ✅ **Structured JSON output** with V4 API
|
| 675 |
+
- ✅ **GPU acceleration support** (CUDA, MPS, CPU fallback)
|
| 676 |
- ✅ **No 404 errors** - all paths correctly configured
|
| 677 |
- ✅ **Test script included** for easy verification
|
| 678 |
|
| 679 |
+
### API Versions Available
|
| 680 |
+
- **V1**: Ollama + Transformers (requires external Ollama service)
|
| 681 |
+
- **V2**: HuggingFace streaming (lightweight, ~500MB)
|
| 682 |
+
- **V3**: Web scraping + Summarization (lightweight, ~550MB)
|
| 683 |
+
- **V4**: Structured output with Qwen (GPU-optimized, 2-7GB depending on model)
|
| 684 |
+
|
| 685 |
+
### Recent Features
|
| 686 |
+
- Added V4 structured summarization API with Qwen models
|
| 687 |
+
- NDJSON patch streaming for 43% faster time-to-first-token
|
| 688 |
+
- Three summarization styles: executive, skimmer, eli5
|
| 689 |
+
- GPU optimization with CUDA/MPS/CPU auto-detection
|
| 690 |
+
- Automatic quantization (4-bit NF4, FP16, INT8)
|
| 691 |
+
- Rich metadata output (category, sentiment, reading time)
|
| 692 |
|
| 693 |
**Live Space:** https://colin730-SummarizerApp.hf.space 🎯
|