ming committed · Commit 8ca69c7 · 1 Parent(s): bd7d2c1

docs: Update README.md with comprehensive V4 API documentation


- Add V4 API endpoints documentation (stream and stream-ndjson)
- Document V4 configuration environment variables
- Add V4 performance metrics (memory, inference speed, GPU support)
- Include comprehensive V4 usage examples (Python, cURL)
- Document three summarization styles (executive, skimmer, eli5)
- Add V4 output schema with 6 structured fields
- Update deployment options for different memory constraints
- Update security section with SSRF protection details
- Correct V2 warmup default to false
- Add troubleshooting section for V4-specific issues

Files changed (1): README.md +278 -19
README.md CHANGED
@@ -11,11 +11,15 @@ app_port: 7860
 
 # Text Summarizer API
 
-A FastAPI-based text summarization service powered by Ollama and Mistral 7B model.
 
 ## 🚀 Features
 
-- **Fast text summarization** using local LLM inference
 - **RESTful API** with FastAPI
 - **Health monitoring** and logging
 - **Docker containerized** for easy deployment

@@ -45,6 +49,12 @@ POST /api/v2/summarize/stream
 POST /api/v3/scrape-and-summarize/stream
 ```
 
 ## 🌐 Live Deployment
 
 **✅ Successfully deployed and tested on Hugging Face Spaces!**

@@ -56,14 +66,28 @@ POST /api/v3/scrape-and-summarize/stream
 
 ### Quick Test
 ```bash
-# Test the live deployment
 curl https://colin730-SummarizerApp.hf.space/health
 curl -X POST https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream \
   -H "Content-Type: application/json" \
   -d '{"text":"This is a test of the live API.","max_tokens":50}'
 ```
 
-**Request Format (V1 and V2 compatible):**
 ```json
 {
   "text": "Your long text to summarize here...",

@@ -72,6 +96,33 @@ curl -X POST https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream \
   }
 }
 ```
 
 ### API Documentation
 - **Swagger UI**: `/docs`
 - **ReDoc**: `/redoc`

@@ -105,6 +156,15 @@ The service uses the following environment variables:
 - `SCRAPING_UA_ROTATION`: Enable user-agent rotation (default: `true`)
 - `SCRAPING_RATE_LIMIT_PER_MINUTE`: Rate limit per IP (default: `10`)
 
 ### Server Configuration
 - `SERVER_HOST`: Server host (default: `127.0.0.1`)
 - `SERVER_PORT`: Server port (default: `8000`)

@@ -132,13 +192,29 @@ This app is optimized for deployment on Hugging Face Spaces using Docker SDK.
 - Optimized for free tier resource limits
 
 **Environment Variables for HF Spaces:**
 ```bash
 ENABLE_V1_WARMUP=false
-ENABLE_V2_WARMUP=true
 HF_MODEL_ID=sshleifer/distilbart-cnn-6-6
 HF_HOME=/tmp/huggingface
 ```
 
 ## 📊 Performance
 
 ### V1 (Ollama + Transformers Pipeline)

@@ -160,12 +236,30 @@ HF_HOME=/tmp/huggingface
 - **Total latency**: 2-5 seconds (scrape + summarize)
 - **Success rate**: 95%+ article extraction
 
 ### Memory Optimization
 - **V1 warmup disabled by default** (`ENABLE_V1_WARMUP=false`)
-- **V2 warmup enabled by default** (`ENABLE_V2_WARMUP=true`)
-- **HuggingFace Spaces**: V2-only deployment (no Ollama)
-- **Local development**: V1 endpoints work if Ollama is running externally
-- **distilbart-cnn-6-6 model**: Optimized for HuggingFace Spaces free tier with CNN/DailyMail fine-tuning
 
 ## 🛠️ Development
 

@@ -312,6 +406,142 @@ for line in response.iter_lines():
 - Website blocks scrapers
 - User has already extracted the text manually
 
 ### Android Client (SSE)
 ```kotlin
 // Android SSE client example

@@ -366,6 +596,16 @@ curl -X POST "https://colin730-SummarizerApp.hf.space/api/v3/scrape-and-summariz
 curl -X POST "https://colin730-SummarizerApp.hf.space/api/v3/scrape-and-summarize/stream" \
   -H "Content-Type: application/json" \
   -d '{"text": "Your article text here (minimum 50 characters)...", "max_tokens": 256}'
 ```
 
 ### Test Script

@@ -378,7 +618,10 @@ curl -X POST "https://colin730-SummarizerApp.hf.space/api/v3/scrape-and-summariz
 
 - Non-root user execution
 - Input validation and sanitization
-- Rate limiting (configurable)
 - API key authentication (optional)
 
 ## 📈 Monitoring

@@ -393,10 +636,16 @@ The service includes:
 
 ### Common Issues
 
-1. **Model not loading**: Check if Ollama is running and model is pulled
-2. **Out of memory**: Ensure sufficient RAM (8GB+) for Mistral 7B
 3. **Slow startup**: Normal on first run due to model download
-4. **API errors**: Check logs via `/docs` endpoint
 
 ### Logs
 View application logs in the Hugging Face Spaces interface or check the health endpoint for service status.

@@ -420,15 +669,25 @@ MIT License - see LICENSE file for details.
 **Successfully deployed and tested on Hugging Face Spaces!** 🚀
 
 - ✅ **Proxy-aware FastAPI** with `root_path` support
-- ✅ **All endpoints working** (health, docs, V2 API)
 - ✅ **Real-time streaming** summarization
 - ✅ **No 404 errors** - all paths correctly configured
 - ✅ **Test script included** for easy verification
 
-### Recent Fixes Applied
-- Added `root_path=os.getenv("HF_SPACE_ROOT_PATH", "")` for HF Spaces proxy awareness
-- Ensured binding to `0.0.0.0:7860` as required by HF Spaces
-- Verified V2 router paths (`/api/v2/summarize/stream`) with no double prefixes
-- Created test script for external endpoint verification
 
 **Live Space:** https://colin730-SummarizerApp.hf.space 🎯
 
@@ -11,11 +11,15 @@ app_port: 7860
 
 # Text Summarizer API
 
+A FastAPI-based text summarization service with multiple summarization engines: Ollama, HuggingFace Transformers, Web Scraping, and Structured Output with Qwen models.
 
 ## 🚀 Features
 
+- **Multiple Summarization Engines**: Ollama, HuggingFace Transformers, and Qwen models
+- **Structured JSON Output**: V4 API returns rich metadata (title, key points, category, sentiment, reading time)
+- **Web Scraping Integration**: V3 and V4 APIs can scrape articles directly from URLs
+- **Real-time Streaming**: All endpoints support Server-Sent Events (SSE) streaming
+- **GPU Acceleration**: V4 supports CUDA and MPS (Apple Silicon), with automatic quantization
 - **RESTful API** with FastAPI
 - **Health monitoring** and logging
 - **Docker containerized** for easy deployment

@@ -45,6 +49,12 @@ POST /api/v2/summarize/stream
 POST /api/v3/scrape-and-summarize/stream
 ```
 
+### V4 API (Structured Output with Qwen)
+```
+POST /api/v4/scrape-and-summarize/stream
+POST /api/v4/scrape-and-summarize/stream-ndjson
+```
+
 ## 🌐 Live Deployment
 
 **✅ Successfully deployed and tested on Hugging Face Spaces!**

@@ -56,14 +66,28 @@ POST /api/v3/scrape-and-summarize/stream
 
 ### Quick Test
 ```bash
+# Test the live deployment - health check
 curl https://colin730-SummarizerApp.hf.space/health
+
+# Test V2 API (lightweight streaming)
 curl -X POST https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream \
   -H "Content-Type: application/json" \
   -d '{"text":"This is a test of the live API.","max_tokens":50}'
+
+# Test V3 API (web scraping)
+curl -X POST https://colin730-SummarizerApp.hf.space/api/v3/scrape-and-summarize/stream \
+  -H "Content-Type: application/json" \
+  -d '{"url":"https://example.com/article","max_tokens":128}'
+
+# Test V4 API (structured output, if enabled)
+curl -X POST https://colin730-SummarizerApp.hf.space/api/v4/scrape-and-summarize/stream-ndjson \
+  -H "Content-Type: application/json" \
+  -d '{"text":"This is a test article. It contains important information.","style":"executive","max_tokens":256}'
 ```
 
+**Request Formats by API Version:**
+
+V1/V2 (Simple text summarization):
 ```json
 {
   "text": "Your long text to summarize here...",

@@ -72,6 +96,33 @@ curl -X POST https://colin730-SummarizerApp.hf.space/api/v2/summarize/stream \
   }
 }
 ```
 
+V3 (URL scraping or text):
+```json
+{
+  "url": "https://example.com/article",
+  "max_tokens": 256,
+  "include_metadata": true,
+  "use_cache": true
+}
+```
+
+V4 (Structured output with styles):
+```json
+{
+  "url": "https://example.com/article",
+  "style": "executive",
+  "max_tokens": 512,
+  "include_metadata": true,
+  "use_cache": true
+}
+```
+
+**Which API to Use?**
+- **V1**: Local deployment with Ollama (requires external service)
+- **V2**: Lightweight cloud deployment, simple text summaries
+- **V3**: When you need to scrape articles from URLs + simple summaries
+- **V4**: When you need rich metadata (category, sentiment, key points) + GPU acceleration
+
 ### API Documentation
 - **Swagger UI**: `/docs`
 - **ReDoc**: `/redoc`

@@ -105,6 +156,15 @@ The service uses the following environment variables:
 - `SCRAPING_UA_ROTATION`: Enable user-agent rotation (default: `true`)
 - `SCRAPING_RATE_LIMIT_PER_MINUTE`: Rate limit per IP (default: `10`)
 
+### V4 Configuration (Structured Summarization)
+- `ENABLE_V4_STRUCTURED`: Enable V4 API (default: `true`)
+- `ENABLE_V4_WARMUP`: Load model at startup (default: `false` to save memory)
+- `V4_MODEL_ID`: Model to use (default: `Qwen/Qwen2.5-1.5B-Instruct`; alternative: `Qwen/Qwen2.5-3B-Instruct`)
+- `V4_MAX_TOKENS`: Max tokens to generate (default: `256`, range: 128-2048)
+- `V4_TEMPERATURE`: Sampling temperature (default: `0.2` for consistent output)
+- `V4_ENABLE_QUANTIZATION`: Enable INT8 quantization on CPU or 4-bit NF4 on CUDA (default: `true`)
+- `V4_USE_FP16_FOR_SPEED`: Use FP16 precision for 2-3x faster inference on GPU (default: `false`)
+
 ### Server Configuration
 - `SERVER_HOST`: Server host (default: `127.0.0.1`)
 - `SERVER_PORT`: Server port (default: `8000`)

@@ -132,13 +192,29 @@ This app is optimized for deployment on Hugging Face Spaces using Docker SDK.
 - Optimized for free tier resource limits
 
 **Environment Variables for HF Spaces:**
+
+For memory-constrained deployments (free tier):
 ```bash
 ENABLE_V1_WARMUP=false
+ENABLE_V2_WARMUP=false
+ENABLE_V3_SCRAPING=true
+ENABLE_V4_STRUCTURED=false
 HF_MODEL_ID=sshleifer/distilbart-cnn-6-6
 HF_HOME=/tmp/huggingface
 ```
 
+For GPU-enabled deployments (paid tier with 16GB+ RAM):
+```bash
+ENABLE_V1_WARMUP=false
+ENABLE_V2_WARMUP=false
+ENABLE_V3_SCRAPING=true
+ENABLE_V4_STRUCTURED=true
+ENABLE_V4_WARMUP=false
+V4_MODEL_ID=Qwen/Qwen2.5-3B-Instruct
+V4_ENABLE_QUANTIZATION=true
+V4_USE_FP16_FOR_SPEED=true
+```
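The CUDA > MPS > CPU preference these settings imply maps directly onto standard PyTorch capability checks. A minimal sketch, assuming `torch` is installed; the `pick_device` helper is illustrative, not the app's actual code:

```python
import torch

def pick_device() -> str:
    # Prefer CUDA, then Apple Silicon MPS, then fall back to CPU
    if torch.cuda.is_available():
        return "cuda"
    if getattr(torch.backends, "mps", None) and torch.backends.mps.is_available():
        return "mps"
    return "cpu"

# FP16 only pays off on an accelerator; CPU stays in FP32 (or INT8 if quantized)
device = pick_device()
dtype = torch.float16 if device in ("cuda", "mps") else torch.float32
```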
+
 ## 📊 Performance
 
 ### V1 (Ollama + Transformers Pipeline)

@@ -160,12 +236,30 @@ HF_HOME=/tmp/huggingface
 - **Total latency**: 2-5 seconds (scrape + summarize)
 - **Success rate**: 95%+ article extraction
 
+### V4 (Structured Summarization with Qwen)
+- **V4 Models**: Qwen/Qwen2.5-1.5B-Instruct (default) or Qwen/Qwen2.5-3B-Instruct (higher quality)
+- **Memory usage**:
+  - 1.5B model: ~2-3GB RAM (FP16 on GPU), ~1GB (4-bit NF4 on CUDA)
+  - 3B model: ~6-7GB RAM (FP16 on GPU), ~3-4GB (4-bit NF4 on CUDA)
+- **Inference speed**:
+  - 1.5B model: 20-46 seconds per request
+  - 3B model: 40-60 seconds per request
+  - NDJSON streaming: 43% faster time-to-first-token
+- **GPU acceleration**: CUDA > MPS (Apple Silicon) > CPU (4x speed difference)
+- **Output format**: Structured JSON with 6 fields (title, main_summary, key_points, category, sentiment, read_time_min)
+- **Styles**: executive, skimmer, eli5
+
 ### Memory Optimization
 - **V1 warmup disabled by default** (`ENABLE_V1_WARMUP=false`)
+- **V2 warmup disabled by default** (`ENABLE_V2_WARMUP=false`)
+- **V4 warmup disabled by default** (`ENABLE_V4_WARMUP=false`) - saves 2-7GB RAM
+- **HuggingFace Spaces deployment options**:
+  - V2-only: ~500MB (fits free tier)
+  - V2+V3: ~550MB (fits free tier)
+  - V2+V3+V4 (1.5B): ~3GB (requires paid tier)
+  - V2+V3+V4 (3B): ~7GB (requires paid tier)
+- **Local development**: All versions can run simultaneously with 8-10GB RAM
+- **GPU deployment**: V4 benefits significantly from CUDA or MPS acceleration
 
 ## 🛠️ Development
 

@@ -312,6 +406,142 @@ for line in response.iter_lines():
 - Website blocks scrapers
 - User has already extracted the text manually
 
+### V4 API (Structured Output with Qwen) - High-Quality Summaries
+
+**V4 supports two streaming formats and three summarization styles.**
+
+#### Streaming Format 1: Standard JSON Streaming (stream)
+```python
+import requests
+import json
+
+# V4: scrape an article from a URL and stream structured JSON
+response = requests.post(
+    "https://colin730-SummarizerApp.hf.space/api/v4/scrape-and-summarize/stream",
+    json={
+        "url": "https://example.com/article",
+        "style": "executive",  # Options: "executive", "skimmer", "eli5"
+        "max_tokens": 256,
+        "include_metadata": True,
+        "use_cache": True
+    },
+    stream=True
+)
+
+accumulated_content = ""
+
+for line in response.iter_lines():
+    if line.startswith(b'data: '):
+        data = json.loads(line[6:])
+
+        # First event: metadata
+        if data.get("type") == "metadata":
+            print(f"Style: {data['data']['style']}")
+            print(f"Scrape time: {data['data']['scrape_latency_ms']}ms\n")
+
+        # Content events (streaming JSON tokens)
+        elif "content" in data:
+            print(data["content"], end="")
+            accumulated_content += data["content"]
+            if data["done"]:
+                # Parse the final JSON
+                summary = json.loads(accumulated_content)
+                print(f"\n\nTitle: {summary['title']}")
+                print(f"Category: {summary['category']}")
+                print(f"Sentiment: {summary['sentiment']}")
+                print(f"Key Points: {summary['key_points']}")
+                break
+```
+
+#### Streaming Format 2: NDJSON Patch Streaming (stream-ndjson) - 43% Faster
+```python
+import requests
+import json
+
+# V4 NDJSON streaming - progressive JSON updates for real-time UI
+response = requests.post(
+    "https://colin730-SummarizerApp.hf.space/api/v4/scrape-and-summarize/stream-ndjson",
+    json={
+        "text": "Your article text here (minimum 50 characters)...",
+        "style": "skimmer",  # Brief, fact-focused summary
+        "max_tokens": 512,
+        "include_metadata": True
+    },
+    stream=True
+)
+
+summary = {}
+
+for line in response.iter_lines():
+    if line.startswith(b'data: '):
+        event = json.loads(line[6:])
+
+        # First event: metadata
+        if event.get("type") == "metadata":
+            print(f"Input: {event['data']['input_type']}")
+            print(f"Style: {event['data']['style']}\n")
+
+        # NDJSON patch events
+        elif "delta" in event:
+            delta = event["delta"]
+
+            if delta and delta.get("op") == "set":
+                # Field set operation
+                summary[delta["field"]] = delta["value"]
+                print(f"{delta['field']}: {delta['value']}")
+
+            elif delta and delta.get("op") == "append":
+                # Array append operation
+                summary.setdefault(delta["field"], []).append(delta["value"])
+                print(f"+ {delta['field']}: {delta['value']}")
+
+            elif delta and delta.get("op") == "done":
+                # Final state
+                print(f"\n✅ Complete! Total time: {event.get('latency_ms', 0):.0f}ms")
+                print(f"Tokens used: {event.get('tokens_used', 0)}")
+                break
+```
+
+#### Summarization Styles
+
+**Executive Style** (`"executive"`):
+- Target audience: Business professionals, decision makers
+- Format: Concise, action-oriented, business impact focus
+- Example output: Strategic insights, financial implications, market trends
+
+**Skimmer Style** (`"skimmer"`):
+- Target audience: Busy readers wanting quick facts
+- Format: Bullet-point style, scannable, fact-dense
+- Example output: Core facts, numbers, dates, names
+
+**ELI5 Style** (`"eli5"`):
+- Target audience: General public, non-technical readers
+- Format: Simple explanations, analogies, relatable examples
+- Example output: What it means, why it matters, real-world impact
+
+#### V4 Output Schema
+
+All V4 responses return structured JSON with these 6 fields:
+
+```json
+{
+  "title": "Click-worthy title (<100 chars)",
+  "main_summary": "2-4 sentence summary (<500 chars)",
+  "key_points": [
+    "Key point 1",
+    "Key point 2",
+    "Key point 3"
+  ],
+  "category": "Technology",
+  "sentiment": "Positive",
+  "read_time_min": 5
+}
+```
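Clients that assemble the streamed JSON may want a cheap sanity check against this schema before rendering. A minimal stdlib-only sketch; the `validate_v4_summary` helper is illustrative, with limits taken from the field descriptions above:

```python
def validate_v4_summary(obj: dict) -> list[str]:
    """Return a list of problems; an empty list means the object matches the schema."""
    problems = []
    required = {
        "title": str, "main_summary": str, "key_points": list,
        "category": str, "sentiment": str, "read_time_min": int,
    }
    # Check presence and type of all 6 fields
    for field, ftype in required.items():
        if field not in obj:
            problems.append(f"missing field: {field}")
        elif not isinstance(obj[field], ftype):
            problems.append(f"{field}: expected {ftype.__name__}")
    # Length limits from the schema comments
    if isinstance(obj.get("title"), str) and len(obj["title"]) >= 100:
        problems.append("title: should be under 100 chars")
    if isinstance(obj.get("main_summary"), str) and len(obj["main_summary"]) >= 500:
        problems.append("main_summary: should be under 500 chars")
    if isinstance(obj.get("key_points"), list) and not all(
        isinstance(p, str) for p in obj["key_points"]
    ):
        problems.append("key_points: all entries should be strings")
    return problems
```

Running this on the final parsed object catches truncated streams and malformed model output before they reach the UI.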
+
 ### Android Client (SSE)
 ```kotlin
 // Android SSE client example

@@ -366,6 +596,16 @@ curl -X POST "https://colin730-SummarizerApp.hf.space/api/v3/scrape-and-summariz
 curl -X POST "https://colin730-SummarizerApp.hf.space/api/v3/scrape-and-summarize/stream" \
   -H "Content-Type: application/json" \
   -d '{"text": "Your article text here (minimum 50 characters)...", "max_tokens": 256}'
+
+# V4 API - Standard JSON streaming (URL mode)
+curl -X POST "https://colin730-SummarizerApp.hf.space/api/v4/scrape-and-summarize/stream" \
+  -H "Content-Type: application/json" \
+  -d '{"url": "https://example.com/article", "style": "executive", "max_tokens": 256}'
+
+# V4 API - NDJSON patch streaming (Text mode) - 43% faster time-to-first-token
+curl -X POST "https://colin730-SummarizerApp.hf.space/api/v4/scrape-and-summarize/stream-ndjson" \
+  -H "Content-Type: application/json" \
+  -d '{"text": "Your article text (minimum 50 chars)...", "style": "skimmer", "max_tokens": 512}'
 ```
 
 ### Test Script

@@ -378,7 +618,10 @@ curl -X POST "https://colin730-SummarizerApp.hf.space/api/v3/scrape-and-summariz
 
 - Non-root user execution
 - Input validation and sanitization
+- **SSRF protection**: V3 and V4 APIs block localhost and private IP ranges
+- **Rate limiting**: Configurable per-IP rate limits for scraping endpoints
+- **URL validation**: Strict URL format checking (HTTP/HTTPS only)
+- **Content limits**: Maximum text lengths enforced (50,000 chars for V3/V4)
 - API key authentication (optional)
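The SSRF protection described above comes down to resolving the target host and refusing private, loopback, link-local, and reserved addresses. A minimal stdlib sketch; the `is_url_allowed` helper is illustrative, and the app's actual checks may differ:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_url_allowed(url: str) -> bool:
    """Reject non-HTTP(S) schemes and hosts that resolve to private/loopback ranges."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        # Resolve the hostname to every address it maps to
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        # Block anything that could reach internal services
        if ip.is_private or ip.is_loopback or ip.is_link_local or ip.is_reserved:
            return False
    return True
```

Checking every resolved address (not just the first) matters because a hostname can mix public and private records.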
 
 ## 📈 Monitoring

@@ -393,10 +636,16 @@ The service includes:
 
 ### Common Issues
 
+1. **Model not loading**: Check if Ollama is running and the model is pulled (V1 only)
+2. **Out of memory**:
+   - V1: Ensure 2-4GB RAM available
+   - V2/V3: Ensure ~500-550MB RAM available
+   - V4 (1.5B): Ensure 2-3GB RAM available
+   - V4 (3B): Ensure 6-7GB RAM available
 3. **Slow startup**: Normal on first run due to model download
+4. **V4 slow inference**: Enable GPU acceleration (CUDA or MPS) and FP16 for a 2-4x speedup
+5. **V4 quantization slow**: Quantization takes 1-2 minutes on startup; disable warmup to defer it until the first request
+6. **API errors**: Check logs via the `/docs` endpoint
 
 ### Logs
 View application logs in the Hugging Face Spaces interface or check the health endpoint for service status.

@@ -420,15 +669,25 @@ MIT License - see LICENSE file for details.
 **Successfully deployed and tested on Hugging Face Spaces!** 🚀
 
 - ✅ **Proxy-aware FastAPI** with `root_path` support
+- ✅ **All endpoints working** (health, docs, V1-V4 APIs)
 - ✅ **Real-time streaming** summarization
+- ✅ **Structured JSON output** with the V4 API
+- ✅ **GPU acceleration support** (CUDA, MPS, CPU fallback)
 - ✅ **No 404 errors** - all paths correctly configured
 - ✅ **Test script included** for easy verification
 
+### API Versions Available
+- **V1**: Ollama + Transformers (requires external Ollama service)
+- **V2**: HuggingFace streaming (lightweight, ~500MB)
+- **V3**: Web scraping + summarization (lightweight, ~550MB)
+- **V4**: Structured output with Qwen (GPU-optimized, 2-7GB depending on model)
+
+### Recent Features
+- Added V4 structured summarization API with Qwen models
+- NDJSON patch streaming for 43% faster time-to-first-token
+- Three summarization styles: executive, skimmer, eli5
+- GPU optimization with CUDA/MPS/CPU auto-detection
+- Automatic quantization (4-bit NF4, FP16, INT8)
+- Rich metadata output (category, sentiment, reading time)
 
 **Live Space:** https://colin730-SummarizerApp.hf.space 🎯