Vineeth Sai committed
Commit
501847e
·
0 Parent(s):

Initial deploy to HF Spaces (Docker)

Files changed (14)
  1. ## GitHub Copilot Chat.md +33 -0
  2. .dockerignore +10 -0
  3. .gitignore +21 -0
  4. Dockerfile +44 -0
  5. README.md +268 -0
  6. app.py +617 -0
  7. main.py +286 -0
  8. model_test.py +52 -0
  9. requirements.txt +16 -0
  10. setup_web_app.sh +59 -0
  11. start.sh +17 -0
  12. summarize_qwen.py +143 -0
  13. templates/index.html +528 -0
  14. templates/index0.html +551 -0
## GitHub Copilot Chat.md ADDED
@@ -0,0 +1,33 @@
+ ## GitHub Copilot Chat
+
+ - Extension Version: 0.22.4 (prod)
+ - VS Code: vscode/1.95.3
+ - OS: Mac
+
+ ## Network
+
+ User Settings:
+ ```json
+ "github.copilot.advanced": {
+ "debug.useElectronFetcher": true,
+ "debug.useNodeFetcher": false
+ }
+ ```
+
+ Connecting to https://api.github.com:
+ - DNS ipv4 Lookup: 140.82.116.5 (35 ms)
+ - DNS ipv6 Lookup: 64:ff9b::8c52:7405 (17 ms)
+ - Electron Fetcher (configured): HTTP 200 (125 ms)
+ - Node Fetcher: HTTP 200 (86 ms)
+ - Helix Fetcher: HTTP 200 (307 ms)
+
+ Connecting to https://api.individual.githubcopilot.com/_ping:
+ - DNS ipv4 Lookup: 140.82.114.22 (18 ms)
+ - DNS ipv6 Lookup: 64:ff9b::8c52:7216 (18 ms)
+ - Electron Fetcher (configured): HTTP 200 (233 ms)
+ - Node Fetcher: HTTP 200 (256 ms)
+ - Helix Fetcher: HTTP 200 (253 ms)
+
+ ## Documentation
+
+ In corporate networks: [Troubleshooting firewall settings for GitHub Copilot](https://docs.github.com/en/copilot/troubleshooting-github-copilot/troubleshooting-firewall-settings-for-github-copilot).
.dockerignore ADDED
@@ -0,0 +1,10 @@
+ venv/
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.pyd
+ *.log
+ .cache/
+ .huggingface/
+ .git/
+ .gitignore
.gitignore ADDED
@@ -0,0 +1,21 @@
+ # Python
+ venv/
+ __pycache__/
+ *.pyc
+ *.pyo
+ *.pyd
+ *.egg-info/
+ .cache/
+
+ # macOS
+ .DS_Store
+
+ # Audio & generated artifacts
+ *.wav
+ static/audio/*
+ static/summaries/*
+
+ # Git / tooling
+ .git/
Dockerfile ADDED
@@ -0,0 +1,44 @@
+ # Small, compatible base image
+ FROM python:3.10-slim
+
+ ENV PYTHONDONTWRITEBYTECODE=1 \
+     PYTHONUNBUFFERED=1 \
+     PIP_NO_CACHE_DIR=1 \
+     HF_HOME=/cache/hf \
+     TRANSFORMERS_CACHE=/cache/hf \
+     TORCH_HOME=/cache/torch \
+     PORT=7860 \
+     RUNNING_GUNICORN=1 \
+     # Optional: set 1 to allow proxy fallback for stubborn sites (non-paywalled).
+     ALLOW_PROXY_FALLBACK=0
+
+ # System deps: espeak-ng for Kokoro phonemizer, sndfile for soundfile, ffmpeg optional
+ RUN apt-get update && apt-get install -y --no-install-recommends \
+     espeak-ng ffmpeg libsndfile1 git build-essential \
+     && rm -rf /var/lib/apt/lists/*
+
+ WORKDIR /app
+
+ # Install Python deps first to leverage Docker layer caching
+ COPY requirements.txt .
+ RUN pip install --upgrade pip && pip install -r requirements.txt
+
+ # (Optional but useful) Preload models during build so first start is snappy
+ # If this step times out in your Space, just comment it out.
+ RUN python - <<'PY'
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ print("Downloading Qwen/Qwen3-0.6B…")
+ AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
+ AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B", torch_dtype="auto", device_map="auto")
+ print("Priming Kokoro…")
+ from kokoro import KPipeline
+ KPipeline(lang_code="a")
+ print("Preload done.")
+ PY
+
+ # Copy the app
+ COPY . .
+
+ # Flask will read PORT from env (default 7860). We already handle this in app.py.
+ EXPOSE 7860
+ CMD ["python", "app.py"]
README.md ADDED
@@ -0,0 +1,268 @@
+ # 🤖 AI Article Summarizer with Text-to-Speech
+
+ A beautiful web application that scrapes articles from URLs, generates concise summaries using Qwen3-0.6B, and optionally converts them to speech using Kokoro TTS.
+
+ ## ✨ Features
+
+ - **🌐 Web Scraping**: Extract clean article text from any URL
+ - **🤖 AI Summarization**: Generate concise summaries using Qwen3-0.6B
+ - **🎵 Text-to-Speech**: Convert summaries to natural speech with Kokoro TTS
+ - **🎭 Multiple Voices**: Choose from 8 different voice options
+ - **📱 Responsive Design**: Works on desktop and mobile devices
+ - **⚡ Real-time Processing**: Live status updates and progress indicators
+
+ ## 🚀 Quick Start
+
+ ### 1. Clone/Download Files
+
+ Create a new directory and save these files:
+ - `app.py` - Flask web application
+ - `templates/index.html` - Web interface
+ - `setup_web_app.sh` - Setup script
+
+ ### 2. Run Setup Script
+
+ ```bash
+ # Make setup script executable
+ chmod +x setup_web_app.sh
+
+ # Run setup (installs everything automatically)
+ ./setup_web_app.sh
+ ```
+
+ ### 3. Start the Web Application
+
+ ```bash
+ # Activate virtual environment
+ source venv/bin/activate
+
+ # Run the Flask app
+ python app.py
+ ```
+
+ ### 4. Open in Browser
+
+ Navigate to: **http://localhost:5000**
+
+ ## 📋 Manual Setup (Alternative)
+
+ If you prefer manual setup:
+
+ ### Prerequisites
+
+ - Python 3.8+
+ - macOS: `brew install espeak`
+ - Linux: `sudo apt-get install espeak-ng`
+
+ ### Installation
+
+ ```bash
+ # Create virtual environment
+ python3 -m venv venv
+ source venv/bin/activate
+
+ # Install packages
+ pip install Flask torch transformers trafilatura soundfile kokoro librosa
+
+ # Create directories
+ mkdir -p templates static/audio static/summaries
+ ```
+
+ ## 🎭 Available Voices
+
+ | Voice | Type | Quality | Description |
+ |-------|------|---------|-------------|
+ | af_heart ❤️ | Female | A | Warm, best quality (default) |
+ | af_bella 🔥 | Female | A- | Energetic, high quality |
+ | af_nicole 🎧 | Female | B- | Professional tone |
+ | am_michael | Male | C+ | Clear male voice |
+ | am_fenrir | Male | C+ | Strong male voice |
+ | af_sarah | Female | C+ | Gentle female voice |
+ | bf_emma 🇬🇧 | Female | B- | British accent |
+ | bm_george 🇬🇧 | Male | C | British male accent |
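Clients of the `/voices` endpoint can pick the best-graded option automatically; a minimal sketch (the `best_voice` helper and the grade ordering are illustrative, not part of the app):

```python
# Order grades from best to worst; unknown grades sort last.
GRADE_ORDER = ["A", "A-", "B+", "B", "B-", "C+", "C", "C-"]

def best_voice(voices):
    """Pick the id of the highest-graded voice from /voices-style entries."""
    def rank(v):
        grade = v.get("grade", "")
        return GRADE_ORDER.index(grade) if grade in GRADE_ORDER else len(GRADE_ORDER)
    return min(voices, key=rank)["id"]
```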
+
+ ## 🖥️ Web Interface Features
+
+ ### Main Interface
+ - Clean, modern design with gradient background
+ - Real-time model loading status
+ - URL input with validation
+ - Optional text-to-speech toggle
+ - Voice selection with quality indicators
+
+ ### Processing
+ - Live progress indicators
+ - Error handling with user-friendly messages
+ - Summary statistics (compression ratio, word count)
+ - Timestamp tracking
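The compression ratio in those statistics is a plain character-length percentage, mirroring the computation in `app.py`:

```python
def compression_ratio(article: str, summary: str) -> float:
    """Percent of the article's character length retained by the summary."""
    return round(len(summary) / len(article) * 100, 1)
```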
98
+
99
+ ### Results
100
+ - Beautiful summary display with syntax highlighting
101
+ - Integrated audio player for TTS output
102
+ - Downloadable audio files
103
+ - Responsive design for mobile devices
104
+
105
+ ## πŸ“Š Technical Details
106
+
107
+ ### AI Models Used
108
+ - **Qwen3-0.6B**: 600M parameter language model for summarization
109
+ - **Kokoro TTS**: 82M parameter text-to-speech model
110
+ - **Trafilatura**: Web scraping and content extraction
111
+
112
+ ### Performance
113
+ - **Model Loading**: ~1-2 minutes on first startup
114
+ - **Summarization**: ~2-5 seconds per article
115
+ - **TTS Generation**: ~1-3 seconds for typical summaries
116
+ - **Memory Usage**: ~2-4GB RAM (depending on hardware)
117
+
118
+ ### File Structure
119
+ ```
120
+ your-project/
121
+ β”œβ”€β”€ app.py # Flask web application
122
+ β”œβ”€β”€ templates/
123
+ β”‚ └── index.html # Web interface
124
+ β”œβ”€β”€ static/
125
+ β”‚ β”œβ”€β”€ audio/ # Generated audio files
126
+ β”‚ └── summaries/ # Cached summaries (optional)
127
+ β”œβ”€β”€ venv/ # Virtual environment
128
+ └── requirements.txt # Python dependencies
129
+ ```
130
+
131
+ ## πŸ”§ Configuration
132
+
133
+ ### Environment Variables (Optional)
134
+ ```bash
135
+ export FLASK_ENV=development # For debugging
136
+ export FLASK_PORT=5000 # Custom port
137
+ ```
138
+
139
+ ### Model Configuration
140
+ - Models are loaded automatically on startup
141
+ - First run downloads ~1.2GB of model files
142
+ - Models are cached locally for faster subsequent starts
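Outside Docker, the same cache locations the Dockerfile configures can be exported before starting the app, so downloads are reused across restarts (the `/cache` paths are examples; any writable directory works):

```bash
# Keep Hugging Face and Torch model downloads in a fixed cache directory
# so restarts and rebuilds reuse them (mirrors the ENV lines in the Dockerfile).
export HF_HOME=/cache/hf
export TRANSFORMERS_CACHE=/cache/hf
export TORCH_HOME=/cache/torch
```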
+
+ ## 🐛 Troubleshooting
+
+ ### Common Issues
+
+ **Models not loading**
+ ```bash
+ # Check internet connection and disk space
+ df -h
+ ping huggingface.co
+ ```
+
+ **espeak not found**
+ ```bash
+ # macOS
+ brew install espeak
+
+ # Linux
+ sudo apt-get install espeak-ng
+ ```
+
+ **Permission errors**
+ ```bash
+ # Make sure virtual environment is activated
+ source venv/bin/activate
+ which python # Should show venv path
+ ```
+
+ **Port already in use**
+ ```bash
+ # Kill process using port 5000
+ lsof -ti:5000 | xargs kill -9
+
+ # Or use different port
+ python app.py --port 5001
+ ```
+
+ ### Performance Optimization
+
+ **For faster startup:**
+ - Keep models in memory between requests
+ - Use SSD storage for model cache
+ - Ensure sufficient RAM (4GB+ recommended)
+
+ **For better quality:**
+ - Use `af_heart` or `af_bella` voices
+ - Keep summaries under 500 words for best TTS quality
+ - Use clean, well-formatted article URLs
+
+ ## 📱 Usage Tips
+
+ 1. **Best Article Sources**: News sites, blogs, Wikipedia work well
+ 2. **URL Format**: Use full URLs (https://example.com/article)
+ 3. **Summary Length**: Typically 100-300 words from longer articles
+ 4. **Audio Quality**: Higher-grade voices sound more natural
+ 5. **Mobile Use**: Interface is fully responsive
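Tip 2 (full URLs) can be checked before submitting; a minimal stdlib sketch, illustrative and not part of the app:

```python
from urllib.parse import urlparse

def is_full_url(url: str) -> bool:
    """Accept only absolute http(s) URLs with a host, per the usage tips."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)
```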
+
+ ## 🔒 Privacy & Security
+
+ - No article content is stored permanently
+ - Audio files are generated locally
+ - No data is sent to external services (except model downloads)
+ - All processing happens on your machine
+
+ ## 🚀 Advanced Usage
+
+ ### API Endpoints
+
+ The web app exposes these endpoints:
+
+ - `GET /` - Main interface
+ - `GET /status` - Model loading status
+ - `GET /voices` - Available voice list
+ - `POST /process` - Process article (JSON)
+
+ ### Example API Usage
+ ```javascript
+ fetch('/process', {
+   method: 'POST',
+   headers: {'Content-Type': 'application/json'},
+   body: JSON.stringify({
+     url: 'https://example.com/article',
+     generate_audio: true,
+     voice: 'af_heart'
+   })
+ });
+ ```
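The same call can be made from Python. A stdlib sketch that builds the POST without sending it, so the payload shape is easy to inspect (`build_process_request` is illustrative; the base URL is whatever host/port `app.py` is serving on):

```python
import json
import urllib.request

def build_process_request(base: str, url: str, generate_audio: bool = True,
                          voice: str = "af_heart") -> urllib.request.Request:
    """Build the POST /process request; send it with urllib.request.urlopen(...)."""
    body = json.dumps({"url": url, "generate_audio": generate_audio, "voice": voice}).encode()
    return urllib.request.Request(
        f"{base}/process",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Pass the returned request to `urllib.request.urlopen(...)` to actually issue it against a running instance.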
+
+ ## 📈 Future Enhancements
+
+ Possible improvements:
+ - Multiple language support
+ - Custom voice training
+ - Batch processing
+ - Summary length control
+ - Export options (PDF, EPUB)
+ - Integration with read-later apps
+
+ ## 🤝 Contributing
+
+ Feel free to:
+ - Report bugs
+ - Suggest features
+ - Submit improvements
+ - Share cool use cases
+
+ ## 📄 License
+
+ This project uses:
+ - Qwen3-0.6B (Apache 2.0)
+ - Kokoro TTS (Apache 2.0)
+ - Flask (BSD)
+ - Other open-source libraries
+
+ ## 🙏 Acknowledgments
+
+ - **Qwen Team** for the summarization model
+ - **hexgrad** for Kokoro TTS
+ - **Trafilatura** for web scraping
+ - **Flask** for the web framework
+
+ ---
+
+ **Enjoy your AI-powered article summarizer!** 🎉
+
+ For support or questions, check the troubleshooting section or open an issue.
app.py ADDED
@@ -0,0 +1,617 @@
+ # #!/usr/bin/env python3
+ # """
+ # Flask Web Application for Article Summarizer with TTS
+ # """
+
+ # from flask import Flask, render_template, request, jsonify, send_file, url_for
+ # import os
+ # import sys
+ # import torch
+ # import trafilatura
+ # import soundfile as sf
+ # import time
+ # import threading
+ # from datetime import datetime
+ # from transformers import AutoModelForCausalLM, AutoTokenizer
+ # from kokoro import KPipeline
+ # import logging
+
+ # # Configure logging
+ # logging.basicConfig(level=logging.INFO)
+ # logger = logging.getLogger(__name__)
+
+ # app = Flask(__name__)
+ # app.config['SECRET_KEY'] = 'your-secret-key-here'
+
+ # # Global variables to store models (load once, use many times)
+ # qwen_model = None
+ # qwen_tokenizer = None
+ # kokoro_pipeline = None
+ # model_loading_status = {"loaded": False, "error": None}
+
+ # # Create directories for generated files
+ # os.makedirs("static/audio", exist_ok=True)
+ # os.makedirs("static/summaries", exist_ok=True)
+
+ # def load_models():
+ #     """Load Qwen and Kokoro models on startup"""
+ #     global qwen_model, qwen_tokenizer, kokoro_pipeline, model_loading_status
+
+ #     try:
+ #         logger.info("Loading Qwen3-0.6B model...")
+ #         model_name = "Qwen/Qwen3-0.6B"
+
+ #         qwen_tokenizer = AutoTokenizer.from_pretrained(model_name)
+ #         qwen_model = AutoModelForCausalLM.from_pretrained(
+ #             model_name,
+ #             torch_dtype="auto",
+ #             device_map="auto"
+ #         )
+
+ #         logger.info("Loading Kokoro TTS model...")
+ #         kokoro_pipeline = KPipeline(lang_code='a')
+
+ #         model_loading_status["loaded"] = True
+ #         logger.info("All models loaded successfully!")
+
+ #     except Exception as e:
+ #         model_loading_status["error"] = str(e)
+ #         logger.error(f"Failed to load models: {e}")
+
+ # def scrape_article_text(url: str) -> tuple[str, str]:
+ #     """
+ #     Scrape article and return (content, error_message)
+ #     """
+ #     try:
+ #         downloaded = trafilatura.fetch_url(url)
+ #         if downloaded is None:
+ #             return None, "Failed to download the article content."
+
+ #         article_text = trafilatura.extract(downloaded, include_comments=False, include_tables=False)
+ #         if article_text:
+ #             return article_text, None
+ #         else:
+ #             return None, "Could not find main article text on the page."
+
+ #     except Exception as e:
+ #         return None, f"Error scraping article: {str(e)}"
+
+ # def summarize_with_qwen(text: str) -> tuple[str, str]:
+ #     """
+ #     Generate summary and return (summary, error_message)
+ #     """
+ #     try:
+ #         prompt = f"""
+ # Please provide a concise and clear summary of the following article.
+ # Focus on the main points, key findings, and conclusions. The summary should be
+ # easy to understand for someone who has not read the original text.
+
+ # ARTICLE:
+ # {text}
+ # """
+
+ #         messages = [{"role": "user", "content": prompt}]
+
+ #         text_input = qwen_tokenizer.apply_chat_template(
+ #             messages,
+ #             tokenize=False,
+ #             add_generation_prompt=True,
+ #             enable_thinking=False
+ #         )
+
+ #         model_inputs = qwen_tokenizer([text_input], return_tensors="pt").to(qwen_model.device)
+
+ #         generated_ids = qwen_model.generate(
+ #             **model_inputs,
+ #             max_new_tokens=512,
+ #             temperature=0.7,
+ #             top_p=0.8,
+ #             top_k=20
+ #         )
+
+ #         output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
+ #         summary = qwen_tokenizer.decode(output_ids, skip_special_tokens=True).strip()
+
+ #         return summary, None
+
+ #     except Exception as e:
+ #         return None, f"Error generating summary: {str(e)}"
+
+ # def generate_speech(summary: str, voice: str) -> tuple[str, str, float]:
+ #     """
+ #     Generate speech and return (filename, error_message, duration)
+ #     """
+ #     try:
+ #         generator = kokoro_pipeline(summary, voice=voice)
+
+ #         audio_chunks = []
+ #         total_duration = 0
+
+ #         for i, (gs, ps, audio) in enumerate(generator):
+ #             audio_chunks.append(audio)
+ #             total_duration += len(audio) / 24000
+
+ #         if len(audio_chunks) > 1:
+ #             combined_audio = torch.cat(audio_chunks, dim=0)
+ #         else:
+ #             combined_audio = audio_chunks[0]
+
+ #         # Generate unique filename
+ #         timestamp = int(time.time())
+ #         filename = f"summary_{timestamp}.wav"
+ #         filepath = os.path.join("static", "audio", filename)
+
+ #         sf.write(filepath, combined_audio.numpy(), 24000)
+
+ #         return filename, None, total_duration
+
+ #     except Exception as e:
+ #         return None, f"Error generating speech: {str(e)}", 0
+
+ # @app.route('/')
+ # def index():
+ #     """Main page"""
+ #     return render_template('index.html')
+
+ # @app.route('/status')
+ # def status():
+ #     """Check if models are loaded"""
+ #     return jsonify(model_loading_status)
+
+ # @app.route('/process', methods=['POST'])
+ # def process_article():
+ #     """Process article URL - scrape, summarize, and optionally generate speech"""
+
+ #     if not model_loading_status["loaded"]:
+ #         return jsonify({
+ #             "success": False,
+ #             "error": "Models not loaded yet. Please wait."
+ #         })
+
+ #     data = request.get_json()
+ #     url = data.get('url', '').strip()
+ #     generate_audio = data.get('generate_audio', False)
+ #     voice = data.get('voice', 'af_heart')
+
+ #     if not url:
+ #         return jsonify({"success": False, "error": "Please provide a valid URL."})
+
+ #     # Step 1: Scrape article
+ #     article_content, scrape_error = scrape_article_text(url)
+ #     if scrape_error:
+ #         return jsonify({"success": False, "error": scrape_error})
+
+ #     # Step 2: Generate summary
+ #     summary, summary_error = summarize_with_qwen(article_content)
+ #     if summary_error:
+ #         return jsonify({"success": False, "error": summary_error})
+
+ #     # Prepare response
+ #     response_data = {
+ #         "success": True,
+ #         "summary": summary,
+ #         "article_length": len(article_content),
+ #         "summary_length": len(summary),
+ #         "compression_ratio": round(len(summary) / len(article_content) * 100, 1),
+ #         "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S")
+ #     }
+
+ #     # Step 3: Generate speech if requested
+ #     if generate_audio:
+ #         audio_filename, audio_error, duration = generate_speech(summary, voice)
+ #         if audio_error:
+ #             response_data["audio_error"] = audio_error
+ #         else:
+ #             response_data["audio_file"] = f"/static/audio/{audio_filename}"
+ #             response_data["audio_duration"] = round(duration, 2)
+
+ #     return jsonify(response_data)
+
+ # @app.route('/voices')
+ # def get_voices():
+ #     """Get available voice options"""
+ #     voices = [
+ #         {"id": "af_heart", "name": "Female - Heart", "grade": "A", "description": "❤️ Warm female voice (best quality)"},
+ #         {"id": "af_bella", "name": "Female - Bella", "grade": "A-", "description": "🔥 Energetic female voice"},
+ #         {"id": "af_nicole", "name": "Female - Nicole", "grade": "B-", "description": "🎧 Professional female voice"},
+ #         {"id": "am_michael", "name": "Male - Michael", "grade": "C+", "description": "Clear male voice"},
+ #         {"id": "am_fenrir", "name": "Male - Fenrir", "grade": "C+", "description": "Strong male voice"},
+ #         {"id": "af_sarah", "name": "Female - Sarah", "grade": "C+", "description": "Gentle female voice"},
+ #         {"id": "bf_emma", "name": "British Female - Emma", "grade": "B-", "description": "🇬🇧 British accent"},
+ #         {"id": "bm_george", "name": "British Male - George", "grade": "C", "description": "🇬🇧 British male voice"}
+ #     ]
+ #     return jsonify(voices)
+
+ # # Kick off model loading when running under Gunicorn/containers
+ # if os.environ.get("RUNNING_GUNICORN", "0") == "1":
+ #     threading.Thread(target=load_models, daemon=True).start()
+
+ # if __name__ == '__main__':
+ #     import argparse
+
+ #     # Parse command line arguments
+ #     parser = argparse.ArgumentParser(description='AI Article Summarizer Web App')
+ #     parser.add_argument('--port', type=int, default=5001, help='Port to run the server on (default: 5001)')
+ #     parser.add_argument('--host', type=str, default='0.0.0.0', help='Host to bind to (default: 0.0.0.0)')
+ #     args = parser.parse_args()
+
+ #     # Load models in background thread
+ #     threading.Thread(target=load_models, daemon=True).start()
+
+ #     # Run Flask app
+ #     print("🚀 Starting Article Summarizer Web App...")
+ #     print("📚 Models are loading in the background...")
+ #     print(f"🌐 Open http://localhost:{args.port} in your browser")
+
+ #     try:
+ #         app.run(debug=True, host=args.host, port=args.port)
+ #     except OSError as e:
+ #         if "Address already in use" in str(e):
+ #             print(f"❌ Port {args.port} is already in use!")
+ #             print("💡 Try a different port:")
+ #             print(f"    python app.py --port {args.port + 1}")
+ #             print("📱 Or disable AirPlay Receiver in System Preferences → General → AirDrop & Handoff")
+ #         else:
+ #             raise
+
+
+ #!/usr/bin/env python3
+ """
+ Flask Web Application for Article Summarizer with TTS
+ """
+
+ from flask import Flask, render_template, request, jsonify
+ import os
+ import time
+ import threading
+ import logging
+ from datetime import datetime
+ import re
+
+ import torch
+ import trafilatura
+ import soundfile as sf
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from kokoro import KPipeline
+ import requests  # ensure requests>=2.32.0 in requirements.txt
+
+ # ---------------- Logging ----------------
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger("summarizer")
+
+ # ---------------- Flask ----------------
+ app = Flask(__name__)
+ app.config["SECRET_KEY"] = os.environ.get("SECRET_KEY", "change-me")
+
+ # ---------------- Globals ----------------
+ qwen_model = None
+ qwen_tokenizer = None
+ kokoro_pipeline = None
+
+ model_loading_status = {"loaded": False, "error": None}
+ _load_lock = threading.Lock()
+ _loaded_once = False  # idempotence guard across threads
+
+ # Voice whitelist
+ ALLOWED_VOICES = {
+     "af_heart", "af_bella", "af_nicole", "am_michael",
+     "am_fenrir", "af_sarah", "bf_emma", "bm_george"
+ }
+
+ # HTTP headers to look like a real browser for sites that block bots
+ BROWSER_HEADERS = {
+     "User-Agent": (
+         "Mozilla/5.0 (Macintosh; Intel Mac OS X 13_5) AppleWebKit/537.36 "
+         "(KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
+     ),
+     "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
+     "Accept-Language": "en-US,en;q=0.9",
+ }
+
+ # Create output dirs
+ os.makedirs("static/audio", exist_ok=True)
+ os.makedirs("static/summaries", exist_ok=True)
+
+ # ---------------- Helpers ----------------
+ def _get_device():
+     # Works for both CPU/GPU; safer than qwen_model.device
+     return next(qwen_model.parameters()).device
+
+ def _safe_trim_to_tokens(text: str, tokenizer, max_tokens: int) -> str:
+     ids = tokenizer.encode(text, add_special_tokens=False)
+     if len(ids) <= max_tokens:
+         return text
+     ids = ids[:max_tokens]
+     return tokenizer.decode(ids, skip_special_tokens=True)
+
+ # Remove any leaked <think>…</think> (with optional attributes) or similar tags
+ _THINK_BLOCK_RE = re.compile(
+     r"<\s*(think|reasoning|thought)\b[^>]*>.*?<\s*/\s*\1\s*>",
+     re.IGNORECASE | re.DOTALL,
+ )
+ _THINK_TAGS_RE = re.compile(r"</?\s*(think|reasoning|thought)\b[^>]*>", re.IGNORECASE)
+
+ def _strip_reasoning(text: str) -> str:
+     cleaned = _THINK_BLOCK_RE.sub("", text)    # remove full blocks
+     cleaned = _THINK_TAGS_RE.sub("", cleaned)  # remove any stray tags
+     # optionally collapse leftover triple-backtick blocks that only had think text
+     cleaned = re.sub(r"```(?:\w+)?\s*```", "", cleaned)
+     return cleaned.strip()
+
+ def _normalize_url_for_proxy(u: str) -> str:
+     # r.jina.ai expects 'http://<host>/<path>' after it; unify scheme-less
+     u2 = u.replace("https://", "").replace("http://", "")
+     return f"https://r.jina.ai/http://{u2}"
+
+ # ---------------- Model Load ----------------
+ def load_models():
+     """Load Qwen and Kokoro models on startup (idempotent)."""
+     global qwen_model, qwen_tokenizer, kokoro_pipeline, model_loading_status, _loaded_once
+     with _load_lock:
+         if _loaded_once:
+             return
+         try:
+             logger.info("Loading Qwen3-0.6B…")
+             model_name = "Qwen/Qwen3-0.6B"
+
+             qwen_tokenizer = AutoTokenizer.from_pretrained(model_name)
+             qwen_model = AutoModelForCausalLM.from_pretrained(
+                 model_name,
+                 torch_dtype="auto",
+                 device_map="auto",  # CPU or GPU automatically
+             )
+             qwen_model.eval()  # inference mode
+
+             logger.info("Loading Kokoro TTS…")
+             kokoro_pipeline = KPipeline(lang_code="a")
+
+             model_loading_status["loaded"] = True
+             model_loading_status["error"] = None
+             _loaded_once = True
+             logger.info("✅ Models ready")
+         except Exception as e:
+             err = f"{type(e).__name__}: {e}"
+             model_loading_status["loaded"] = False
+             model_loading_status["error"] = err
+             logger.exception("Failed to load models: %s", err)
+
+ # ---------------- Core Logic ----------------
+ def scrape_article_text(url: str) -> tuple[str | None, str | None]:
+     """
+     Try to fetch & extract article text.
+     Strategy:
+       1) trafilatura.fetch_url (vanilla)
+       2) requests.get with browser headers + trafilatura.extract
+       3) (optional) Proxy fallback if ALLOW_PROXY_FALLBACK=1
+     Returns (content, error)
+     """
+     try:
+         # --- 1) Direct fetch via Trafilatura ---
+         downloaded = trafilatura.fetch_url(url)
+         if downloaded:
+             text = trafilatura.extract(downloaded, include_comments=False, include_tables=False)
+             if text:
+                 return text, None
+
+         # --- 2) Raw requests + Trafilatura extract ---
+         try:
+             r = requests.get(url, headers=BROWSER_HEADERS, timeout=15)
+             if r.status_code == 200 and r.text:
+                 text = trafilatura.extract(r.text, include_comments=False, include_tables=False, url=url)
+                 if text:
+                     return text, None
+             elif r.status_code == 403:
+                 logger.info("Site returned 403; considering proxy fallback (if enabled).")
+         except requests.RequestException as e:
+             logger.info("requests.get failed: %s", e)
+
+         # --- 3) Optional proxy fallback (off by default) ---
+         if os.environ.get("ALLOW_PROXY_FALLBACK", "0") == "1":
+             proxy_url = _normalize_url_for_proxy(url)
+             try:
+                 pr = requests.get(proxy_url, headers=BROWSER_HEADERS, timeout=15)
+                 if pr.status_code == 200 and pr.text:
+                     extracted = trafilatura.extract(pr.text) or pr.text
+                     if extracted and extracted.strip():
+                         return extracted.strip(), None
+             except requests.RequestException as e:
+                 logger.info("Proxy fallback failed: %s", e)
+
+         return None, (
+             "Failed to download the article content (site may block automated fetches). "
+             "Try another URL, paste the text manually, or set ALLOW_PROXY_FALLBACK=1."
+         )
+
+     except Exception as e:
+         return None, f"Error scraping article: {e}"
+
+ def summarize_with_qwen(text: str) -> tuple[str | None, str | None]:
+     """Generate summary and return (summary, error)."""
+     try:
+         # Budget input tokens based on max context; fallback to 4096
+         try:
+             max_ctx = int(getattr(qwen_model.config, "max_position_embeddings", 4096))
+         except Exception:
+             max_ctx = 4096
+         # Leave room for prompt + output tokens
+         max_input_tokens = max(512, max_ctx - 1024)
+
+         prompt_hdr = (
+             "Please provide a concise and clear summary of the following article. "
+             "Focus on the main points, key findings, and conclusions. "
+             "Keep it easy to understand for someone who hasn't read the original.\n\nARTICLE:\n"
+         )
+
+         # Trim article to safe length
+         article_trimmed = _safe_trim_to_tokens(text, qwen_tokenizer, max_input_tokens)
+         user_content = prompt_hdr + article_trimmed
+
+         messages = [
+             {
+                 "role": "system",
+                 "content": (
+                     "You are a helpful assistant. Return ONLY the final summary as plain text. "
+                     "Do not include analysis, steps, or <think> tags."
+                 ),
+             },
+             {"role": "user", "content": user_content},  # important: pass the TRIMMED content
+         ]
+
+         # Build the chat prompt text (disable thinking if supported)
+         try:
+             text_input = qwen_tokenizer.apply_chat_template(
+                 messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
+             )
+         except TypeError:
+             text_input = qwen_tokenizer.apply_chat_template(
+                 messages, tokenize=False, add_generation_prompt=True
+             )
+
+         device = _get_device()
+         model_inputs = qwen_tokenizer([text_input], return_tensors="pt").to(device)
+
+         with torch.inference_mode():
+             generated_ids = qwen_model.generate(
+                 **model_inputs,
+                 max_new_tokens=512,
+                 temperature=0.7,
+                 top_p=0.8,
+                 top_k=20,
+                 do_sample=True,
+             )
+
+         output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
+         summary = qwen_tokenizer.decode(output_ids, skip_special_tokens=True).strip()
+         summary = _strip_reasoning(summary)  # remove any leaked <think>…</think>
+         return summary, None
+     except Exception as e:
+         return None, f"Error generating summary: {e}"
+
+ def generate_speech(summary: str, voice: str) -> tuple[str | None, str | None, float]:
+     """Generate speech and return (filename, error, duration_seconds)."""
+     try:
+         if voice not in ALLOWED_VOICES:
+             voice = "af_heart"
+         generator = kokoro_pipeline(summary, voice=voice)
+
+         audio_chunks = []
+         total_duration = 0.0
+
+         for _, _, audio in generator:
+             audio_chunks.append(audio)
+             total_duration += len(audio) / 24000.0
+
+         if not audio_chunks:
+             return None, "No audio generated.", 0.0
+
+         combined = audio_chunks[0] if len(audio_chunks) == 1 else torch.cat(audio_chunks, dim=0)
+
+         ts = int(time.time())
+         filename = f"summary_{ts}.wav"
+         filepath = os.path.join("static", "audio", filename)
+         sf.write(filepath, combined.numpy(), 24000)
+
+         return filename, None, total_duration
+     except Exception as e:
+         return None, f"Error generating speech: {e}", 0.0
+
+ # ---------------- Routes ----------------
+ @app.route("/")
+ def index():
+     return render_template("index.html")
+
+ @app.route("/status")
+ def status():
+     return jsonify(model_loading_status)
+
+ @app.route("/process", methods=["POST"])
+ def process_article():
+     if not model_loading_status["loaded"]:
+         return jsonify({"success": False, "error": "Models not loaded yet. Please wait."})
+
+     data = request.get_json(force=True, silent=True) or {}
+     url = (data.get("url") or "").strip()
+     generate_audio = bool(data.get("generate_audio", False))
+     voice = (data.get("voice") or "af_heart").strip()
+
+     if not url:
+         return jsonify({"success": False, "error": "Please provide a valid URL."})
+
+     # 1) Scrape
+     article_content, scrape_error = scrape_article_text(url)
+     if scrape_error:
+         return jsonify({"success": False, "error": scrape_error})
+
+     # 2) Summarize
+     summary, summary_error = summarize_with_qwen(article_content)
+     if summary_error:
+         return jsonify({"success": False, "error": summary_error})
551
+
552
+ resp = {
553
+ "success": True,
554
+ "summary": summary,
555
+ "article_length": len(article_content or ""),
556
+ "summary_length": len(summary or ""),
557
+ "compression_ratio": round(len(summary) / max(len(article_content), 1) * 100, 1),
558
+ "timestamp": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
559
+ }
560
+
561
+ # 3) TTS
562
+ if generate_audio:
563
+ audio_filename, audio_error, duration = generate_speech(summary, voice)
564
+ if audio_error:
565
+ resp["audio_error"] = audio_error
566
+ else:
567
+ resp["audio_file"] = f"/static/audio/{audio_filename}"
568
+ resp["audio_duration"] = round(duration, 2)
569
+
570
+ return jsonify(resp)
571
+
572
+ @app.route("/voices")
573
+ def get_voices():
574
+ voices = [
575
+ {"id": "af_heart", "name": "Female - Heart", "grade": "A", "description": "❀️ Warm female voice (best quality)"},
576
+ {"id": "af_bella", "name": "Female - Bella", "grade": "A-", "description": "πŸ”₯ Energetic female voice"},
577
+ {"id": "af_nicole", "name": "Female - Nicole", "grade": "B-", "description": "🎧 Professional female voice"},
578
+ {"id": "am_michael", "name": "Male - Michael", "grade": "C+", "description": "Clear male voice"},
579
+ {"id": "am_fenrir", "name": "Male - Fenrir", "grade": "C+", "description": "Strong male voice"},
580
+ {"id": "af_sarah", "name": "Female - Sarah", "grade": "C+", "description": "Gentle female voice"},
581
+ {"id": "bf_emma", "name": "British Female - Emma", "grade": "B-", "description": "πŸ‡¬πŸ‡§ British accent"},
582
+ {"id": "bm_george", "name": "British Male - George", "grade": "C", "description": "πŸ‡¬πŸ‡§ British male voice"},
583
+ ]
584
+ return jsonify(voices)
585
+
586
+ # Kick off model loading when running under Gunicorn/containers
587
+ if os.environ.get("RUNNING_GUNICORN", "0") == "1":
588
+ threading.Thread(target=load_models, daemon=True).start()
589
+
590
+ # ---------------- Dev entrypoint ----------------
591
+ if __name__ == "__main__":
592
+ import argparse
593
+ parser = argparse.ArgumentParser(description="AI Article Summarizer Web App")
594
+ parser.add_argument("--port", type=int, default=5001, help="Port to run the server on (default: 5001)")
595
+ parser.add_argument("--host", type=str, default="0.0.0.0", help="Host to bind to (default: 0.0.0.0)")
596
+ args = parser.parse_args()
597
+
598
+ # Load models in background thread
599
+ threading.Thread(target=load_models, daemon=True).start()
600
+
601
+ # Respect platform env PORT when present
602
+ port = int(os.environ.get("PORT", args.port))
603
+
604
+ print("πŸš€ Starting Article Summarizer Web App…")
605
+ print("πŸ“š Models are loading in the background…")
606
+ print(f"🌐 Open http://localhost:{port} in your browser")
607
+
608
+ try:
609
+ app.run(debug=True, host=args.host, port=port)
610
+ except OSError as e:
611
+ if "Address already in use" in str(e):
612
+ print(f"❌ Port {port} is already in use!")
613
+ print("πŸ’‘ Try a different port:")
614
+ print(f" python app.py --port {port + 1}")
615
+ print("πŸ“± Or disable AirPlay Receiver in System Settings β†’ General β†’ AirDrop & Handoff")
616
+ else:
617
+ raise
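The `summarize_with_qwen` path above calls a `_safe_trim_to_tokens` helper that is defined earlier in `app.py`, outside this hunk. A minimal sketch of what such a helper might look like — the function name mirrors the one in the diff, but the logic and the `DummyTokenizer` stand-in are illustrative assumptions, not the file's actual implementation:

```python
# Hedged sketch of a token-budget trim helper like the _safe_trim_to_tokens
# used in app.py. It works with any tokenizer exposing encode()/decode();
# DummyTokenizer is a hypothetical whitespace stand-in for a real
# Hugging Face tokenizer, used only so the sketch is self-contained.

def safe_trim_to_tokens(text, tokenizer, max_tokens):
    """Encode the text, truncate to max_tokens, and decode back to a string."""
    ids = tokenizer.encode(text)
    if len(ids) <= max_tokens:
        return text  # already within budget, return unchanged
    return tokenizer.decode(ids[:max_tokens])

class DummyTokenizer:
    """Whitespace 'tokenizer' standing in for a transformers tokenizer."""
    def encode(self, text):
        return text.split()

    def decode(self, ids):
        return " ".join(ids)

if __name__ == "__main__":
    tok = DummyTokenizer()
    print(safe_trim_to_tokens("one two three four five", tok, 3))  # → one two three
```

With a real `AutoTokenizer` the same shape applies; decoding a truncated ID list keeps the prompt within the `max_ctx - 1024` budget computed above.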
main.py ADDED
@@ -0,0 +1,286 @@
1
+ # #!/usr/bin/env python3
2
+ # """
3
+ # Article Summarizer with Text-to-Speech
4
+ # Scrapes articles, summarizes with Qwen3-0.6B, and reads aloud with Kokoro TTS
5
+ # """
6
+
7
+ # import sys
8
+ # import torch
9
+ # import trafilatura
10
+ # import soundfile as sf
11
+ # import time
12
+ # from transformers import AutoModelForCausalLM, AutoTokenizer
13
+ # from kokoro import KPipeline
14
+
15
+ # # --- Part 1: Web Scraping Function ---
16
+
17
+ # def scrape_article_text(url: str) -> str | None:
18
+ # """
19
+ # Downloads a webpage and extracts the main article text, removing ads,
20
+ # menus, and other boilerplate.
21
+
22
+ # Args:
23
+ # url: The URL of the article to scrape.
24
+
25
+ # Returns:
26
+ # The cleaned article text as a string, or None if it fails.
27
+ # """
28
+ # print(f"🌐 Scraping article from: {url}")
29
+ # # fetch_url downloads the content of the URL
30
+ # downloaded = trafilatura.fetch_url(url)
31
+
32
+ # if downloaded is None:
33
+ # print("❌ Error: Failed to download the article content.")
34
+ # return None
35
+
36
+ # # extract the main text, ignoring comments and tables for a cleaner summary
37
+ # article_text = trafilatura.extract(downloaded, include_comments=False, include_tables=False)
38
+
39
+ # if article_text:
40
+ # print("βœ… Successfully extracted article text.")
41
+ # return article_text
42
+ # else:
43
+ # print("❌ Error: Could not find main article text on the page.")
44
+ # return None
45
+
46
+ # # --- Part 2: Summarization Function ---
47
+
48
+ # def summarize_with_qwen(text: str, model, tokenizer) -> str:
49
+ # """
50
+ # Generates a summary for the given text using the Qwen3-0.6B model.
51
+
52
+ # Args:
53
+ # text: The article text to summarize.
54
+ # model: The pre-loaded transformer model.
55
+ # tokenizer: The pre-loaded tokenizer.
56
+
57
+ # Returns:
58
+ # The generated summary as a string.
59
+ # """
60
+ # print("πŸ€– Summarizing text with Qwen3-0.6B...")
61
+
62
+ # # 1. Create a detailed prompt for the summarization task
63
+ # prompt = f"""
64
+ # Please provide a concise and clear summary of the following article.
65
+ # Focus on the main points, key findings, and conclusions. The summary should be
66
+ # easy to understand for someone who has not read the original text.
67
+
68
+ # ARTICLE:
69
+ # {text}
70
+ # """
71
+
72
+ # messages = [{"role": "user", "content": prompt}]
73
+
74
+ # # 2. Apply the chat template. We set `enable_thinking=False` for direct summarization.
75
+ # # This is more efficient than the default reasoning mode for this task.
76
+ # text_input = tokenizer.apply_chat_template(
77
+ # messages,
78
+ # tokenize=False,
79
+ # add_generation_prompt=True,
80
+ # enable_thinking=False
81
+ # )
82
+
83
+ # # 3. Tokenize the formatted prompt and move it to the correct device (CPU or MPS on Mac)
84
+ # model_inputs = tokenizer([text_input], return_tensors="pt").to(model.device)
85
+
86
+ # # 4. Generate the summary using parameters recommended for non-thinking mode
87
+ # generated_ids = model.generate(
88
+ # **model_inputs,
89
+ # max_new_tokens=512, # Limit summary length
90
+ # temperature=0.7,
91
+ # top_p=0.8,
92
+ # top_k=20
93
+ # )
94
+
95
+ # # 5. Slice the output to remove the input prompt, leaving only the generated response
96
+ # output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
97
+
98
+ # # 6. Decode the token IDs back into a readable string
99
+ # summary = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
100
+
101
+ # print("βœ… Summary generated successfully.")
102
+ # return summary
103
+
104
+ # # --- Part 3: Text-to-Speech Function ---
105
+
106
+ # def speak_summary_with_kokoro(summary: str, voice: str = "af_heart") -> str:
107
+ # """
108
+ # Converts the summary text to speech using Kokoro TTS and saves as audio file.
109
+
110
+ # Args:
111
+ # summary: The text summary to convert to speech.
112
+ # voice: The voice to use (default: "af_heart").
113
+
114
+ # Returns:
115
+ # The filename of the generated audio file.
116
+ # """
117
+ # print("🎡 Converting summary to speech with Kokoro TTS...")
118
+
119
+ # try:
120
+ # # Initialize Kokoro TTS pipeline
121
+ # pipeline = KPipeline(lang_code='a') # 'a' for English
122
+
123
+ # # Generate speech
124
+ # generator = pipeline(summary, voice=voice)
125
+
126
+ # # Process audio chunks
127
+ # audio_chunks = []
128
+ # total_duration = 0
129
+
130
+ # for i, (gs, ps, audio) in enumerate(generator):
131
+ # audio_chunks.append(audio)
132
+ # chunk_duration = len(audio) / 24000
133
+ # total_duration += chunk_duration
134
+ # print(f" πŸ“Š Generated chunk {i+1}: {chunk_duration:.2f}s")
135
+
136
+ # # Combine all audio chunks
137
+ # if len(audio_chunks) > 1:
138
+ # combined_audio = torch.cat(audio_chunks, dim=0)
139
+ # else:
140
+ # combined_audio = audio_chunks[0]
141
+
142
+ # # Generate filename with timestamp
143
+ # timestamp = int(time.time())
144
+ # filename = f"summary_audio_{timestamp}.wav"
145
+
146
+ # # Save audio file
147
+ # sf.write(filename, combined_audio.numpy(), 24000)
148
+
149
+ # print(f"βœ… Audio generated successfully!")
150
+ # print(f"πŸ’Ύ Saved as: {filename}")
151
+ # print(f"⏱️ Duration: {total_duration:.2f} seconds")
152
+ # print(f"🎭 Voice used: {voice}")
153
+
154
+ # return filename
155
+
156
+ # except Exception as e:
157
+ # print(f"❌ Error generating speech: {e}")
158
+ # return None
159
+
160
+ # # --- Part 4: Voice Selection Function ---
161
+
162
+ # def select_voice() -> str:
163
+ # """
164
+ # Allows user to select from available voices or use default.
165
+
166
+ # Returns:
167
+ # Selected voice name.
168
+ # """
169
+ # available_voices = {
170
+ # '1': ('af_heart', 'Female - Heart (Grade A, default) ❀️'),
171
+ # '2': ('af_bella', 'Female - Bella (Grade A-) πŸ”₯'),
172
+ # '3': ('af_nicole', 'Female - Nicole (Grade B-) 🎧'),
173
+ # '4': ('am_michael', 'Male - Michael (Grade C+)'),
174
+ # '5': ('am_fenrir', 'Male - Fenrir (Grade C+)'),
175
+ # '6': ('af_sarah', 'Female - Sarah (Grade C+)'),
176
+ # '7': ('bf_emma', 'British Female - Emma (Grade B-)'),
177
+ # '8': ('bm_george', 'British Male - George (Grade C)')
178
+ # }
179
+
180
+ # print("\n🎭 Available voices (sorted by quality):")
181
+ # for key, (voice_id, description) in available_voices.items():
182
+ # print(f" {key}. {description}")
183
+
184
+ # print(" Enter: Use default voice (af_heart)")
185
+
186
+ # choice = input("\nSelect voice (1-8 or Enter): ").strip()
187
+
188
+ # if choice in available_voices:
189
+ # selected_voice, description = available_voices[choice]
190
+ # print(f"🎡 Selected: {description}")
191
+ # return selected_voice
192
+ # else:
193
+ # print("🎡 Using default voice: Female - Heart")
194
+ # return 'af_heart'
195
+
196
+ # # --- Main Execution Block ---
197
+
198
+ # if __name__ == "__main__":
199
+ # print("πŸš€ Article Summarizer with Text-to-Speech")
200
+ # print("=" * 50)
201
+
202
+ # # Check if a URL was provided as a command-line argument
203
+ # if len(sys.argv) < 2:
204
+ # print("Usage: python qwen_kokoro_summarizer.py <URL_OF_ARTICLE>")
205
+ # print("Example: python qwen_kokoro_summarizer.py https://example.com/article")
206
+ # sys.exit(1)
207
+
208
+ # article_url = sys.argv[1]
209
+
210
+ # # --- Load Qwen Model and Tokenizer ---
211
+ # print("\nπŸ“š Setting up the Qwen3-0.6B model...")
212
+ # print("Note: The first run will download the model (~1.2 GB). Please be patient.")
213
+
214
+ # model_name = "Qwen/Qwen3-0.6B"
215
+
216
+ # try:
217
+ # tokenizer = AutoTokenizer.from_pretrained(model_name)
218
+ # model = AutoModelForCausalLM.from_pretrained(
219
+ # model_name,
220
+ # torch_dtype="auto", # Automatically selects precision (e.g., float16)
221
+ # device_map="auto" # Automatically uses MPS (Mac GPU) if available
222
+ # )
223
+ # except Exception as e:
224
+ # print(f"❌ Failed to load the Qwen model. Error: {e}")
225
+ # print("Please ensure you have a stable internet connection and sufficient disk space.")
226
+ # sys.exit(1)
227
+
228
+ # # Inform the user which device is being used
229
+ # device = next(model.parameters()).device
230
+ # print(f"βœ… Qwen model loaded successfully on device: {str(device).upper()}")
231
+ # if "mps" in str(device):
232
+ # print(" (Running on Apple Silicon GPU)")
233
+
234
+ # # --- Run the Complete Process ---
235
+
236
+ # # Step 1: Scrape the article
237
+ # print(f"\nπŸ“° Step 1: Scraping article")
238
+ # article_content = scrape_article_text(article_url)
239
+
240
+ # if not article_content:
241
+ # print("❌ Failed to scrape article. Exiting.")
242
+ # sys.exit(1)
243
+
244
+ # # Step 2: Summarize the content
245
+ # print(f"\nπŸ€– Step 2: Generating summary")
246
+ # summary = summarize_with_qwen(article_content, model, tokenizer)
247
+
248
+ # # Step 3: Display the summary
249
+ # print("\n" + "="*60)
250
+ # print("✨ GENERATED SUMMARY ✨")
251
+ # print("="*60)
252
+ # print(summary)
253
+ # print("="*60)
254
+
255
+ # # Step 4: Ask if user wants TTS
256
+ # print(f"\n🎡 Step 3: Text-to-Speech")
257
+ # tts_choice = input("Would you like to hear the summary read aloud? (y/N): ").strip().lower()
258
+
259
+ # if tts_choice in ['y', 'yes']:
260
+ # # Let user select voice
261
+ # selected_voice = select_voice()
262
+
263
+ # # Generate speech
264
+ # audio_filename = speak_summary_with_kokoro(summary, voice=selected_voice)
265
+
266
+ # if audio_filename:
267
+ # print(f"\n🎧 Audio saved as: {audio_filename}")
268
+ # print("πŸ”Š You can now play this file to hear the summary!")
269
+
270
+ # # Optional: Try to play the audio automatically (macOS)
271
+ # try:
272
+ # import subprocess
273
+ # print("🎢 Attempting to play audio automatically...")
274
+ # subprocess.run(['afplay', audio_filename], check=True)
275
+ # print("βœ… Audio playback completed!")
276
+ # except (subprocess.CalledProcessError, FileNotFoundError):
277
+ # print("ℹ️ Auto-play not available. Please play the file manually.")
278
+ # else:
279
+ # print("❌ Failed to generate audio.")
280
+ # else:
281
+ # print("πŸ‘ Summary completed without audio generation.")
282
+
283
+ # print(f"\nπŸŽ‰ Process completed successfully!")
284
+ # print(f"πŸ“ Summary length: {len(summary)} characters")
285
+ # print(f"πŸ“Š Original article length: {len(article_content)} characters")
286
+ # print(f"πŸ“‰ Compression ratio: {len(summary)/len(article_content)*100:.1f}%")
model_test.py ADDED
@@ -0,0 +1,52 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ Simple Kokoro TTS Test Script
4
+ Run this after installing dependencies
5
+ """
6
+
7
+ from kokoro import KPipeline
8
+ import soundfile as sf
9
+ import time
10
+
11
+ def main():
12
+ print("🎡 Starting Kokoro TTS test...")
13
+
14
+ # Initialize the model
15
+ print("πŸ“¦ Loading model...")
16
+ start_time = time.time()
17
+ pipeline = KPipeline(lang_code='a') # 'a' for English
18
+ load_time = time.time() - start_time
19
+ print(f"βœ… Model loaded in {load_time:.2f} seconds")
20
+
21
+ # Test text
22
+ text = "Hello! This is a test of Kokoro text-to-speech. The model sounds quite natural!"
23
+
24
+ # Generate speech
25
+ print("πŸ—£οΈ Generating speech...")
26
+ gen_start = time.time()
27
+ generator = pipeline(text, voice='af_heart')
28
+
29
+ # Process and save audio
30
+ for i, (gs, ps, audio) in enumerate(generator):
31
+ gen_time = time.time() - gen_start
32
+ duration = len(audio) / 24000 # seconds
33
+
34
+ # Save audio file
35
+ filename = f"kokoro_test_output_{i}.wav"
36
+ sf.write(filename, audio, 24000)
37
+
38
+ print(f"βœ… Generated {duration:.2f}s of audio in {gen_time:.2f}s")
39
+ print(f"πŸ’Ύ Saved as: {filename}")
40
+ print(f"⚑ Real-time factor: {gen_time/duration:.2f}x")
41
+
42
+ print("πŸŽ‰ Test completed successfully!")
43
+ print("🎧 Play the generated .wav file to hear the result")
44
+
45
+ if __name__ == "__main__":
46
+ try:
47
+ main()
48
+ except Exception as e:
49
+ print(f"❌ Error: {e}")
50
+ print("\nπŸ”§ Make sure you've installed all dependencies:")
51
+ print(" brew install espeak")
52
+ print(" pip install torch torchaudio kokoro>=0.9.2 soundfile")
requirements.txt ADDED
@@ -0,0 +1,16 @@
1
+ # === Runtime ===
2
+ Flask==2.3.3
3
+ gunicorn==21.2.0
4
+
5
+ # === ML / TTS ===
6
+ transformers>=4.41.0
7
+ accelerate>=0.31.0
8
+ safetensors>=0.4.3
9
+ torch>=2.2.0
10
+ trafilatura>=2.0.0
11
+ soundfile>=0.12.1
12
+ kokoro>=0.9.4
13
+ numpy>=1.24.0
14
+ scipy>=1.10.0
15
+ librosa>=0.10.0.post2
16
+ huggingface-hub>=0.23.0
setup_web_app.sh ADDED
@@ -0,0 +1,59 @@
1
+ #!/bin/bash
2
+
3
+ echo "πŸš€ Setting up AI Article Summarizer Web App"
4
+ echo "============================================="
5
+
6
+ # Create project structure
7
+ echo "πŸ“ Creating project structure..."
8
+ mkdir -p templates static/audio static/summaries
9
+
10
+ # Create requirements.txt
11
+ echo "πŸ“ Creating requirements.txt..."
12
+ cat > requirements.txt << EOF
13
+ Flask==2.3.3
14
+ torch>=2.0.0
15
+ transformers>=4.30.0
16
+ trafilatura>=1.6.0
17
+ soundfile>=0.12.1
18
+ kokoro>=0.9.2
19
+ librosa>=0.10.0
20
+ numpy>=1.24.0
21
+ scipy>=1.10.0
22
+ EOF
23
+
24
+ # Check if virtual environment exists
25
+ if [ ! -d "venv" ]; then
26
+ echo "🐍 Creating virtual environment..."
27
+ python3 -m venv venv
28
+ fi
29
+
30
+ echo "πŸ”„ Activating virtual environment..."
31
+ source venv/bin/activate
32
+
33
+ echo "πŸ“¦ Installing Python packages..."
34
+ pip install --upgrade pip
35
+ pip install -r requirements.txt
36
+
37
+ # Install system dependencies (macOS)
38
+ if [[ "$OSTYPE" == "darwin"* ]]; then
39
+ echo "🍎 Installing espeak for macOS..."
40
+ if ! command -v brew &> /dev/null; then
41
+ echo "❌ Homebrew not found. Please install Homebrew first:"
42
+ echo " /bin/bash -c \"\$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)\""
43
+ exit 1
44
+ fi
45
+ brew install espeak
46
+ elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
47
+ echo "🐧 Installing espeak for Linux..."
48
+ sudo apt-get update && sudo apt-get install -y espeak-ng
49
+ fi
50
+
51
+ echo "βœ… Setup complete!"
52
+ echo ""
53
+ echo "🌟 To run the web application:"
54
+ echo " 1. Activate virtual environment: source venv/bin/activate"
55
+ echo " 2. Run the app: python app.py"
56
+ echo " 3. Open http://localhost:5000 in your browser"
57
+ echo ""
58
+ echo "πŸ“ Note: The first run will download AI models (~1.2GB)"
59
+ echo "⏱️ Model loading may take 1-2 minutes on first startup"
start.sh ADDED
@@ -0,0 +1,17 @@
1
+ #!/usr/bin/env bash
2
+ set -euo pipefail
3
+
4
+ # Ensure HF cache lives on a writable path (persistent if your platform supports volumes)
5
+ export HF_HOME=${HF_HOME:-/root/.cache/huggingface}
6
+ export TRANSFORMERS_CACHE=${TRANSFORMERS_CACHE:-$HF_HOME/transformers}
7
+ export RUNNING_GUNICORN=1
8
+ export PYTHONUNBUFFERED=1
9
+ export TOKENIZERS_PARALLELISM=false
10
+ export WEB_CONCURRENCY=1 # Keep single worker so you don't load models multiple times
11
+
12
+ # Start gunicorn (threaded so /process stays responsive)
13
+ exec gunicorn --bind 0.0.0.0:${PORT:-8080} \
14
+ --workers ${WEB_CONCURRENCY} \
15
+ --threads 4 \
16
+ --timeout 0 \
17
+ app:app
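`start.sh` pins `WEB_CONCURRENCY=1` so the heavyweight Qwen and Kokoro models are loaded once per container rather than once per worker; within that single worker, gunicorn's threads then share the loaded models. A minimal sketch of the sharing pattern under those assumptions — the `load_model` stand-in is hypothetical, not the app's real loader:

```python
# Hedged sketch: with one gunicorn worker and several threads, a module-level
# cache plus double-checked locking ensures the expensive model load happens
# once and every request thread reuses the same object.
import threading

_model = None
_lock = threading.Lock()

def load_model():
    # Stand-in for an expensive AutoModelForCausalLM.from_pretrained() call.
    return object()

def get_model():
    global _model
    if _model is None:               # fast path: already loaded
        with _lock:
            if _model is None:       # re-check under the lock
                _model = load_model()
    return _model

if __name__ == "__main__":
    print(get_model() is get_model())  # → True
```

A second gunicorn worker would be a separate process with its own `_model`, which is why the script keeps the worker count at one.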
summarize_qwen.py ADDED
@@ -0,0 +1,143 @@
1
+ import sys
2
+ import torch
3
+ import trafilatura
4
+ from transformers import AutoModelForCausalLM, AutoTokenizer
5
+
6
+ # --- Part 1: Web Scraping Function ---
7
+
8
+ def scrape_article_text(url: str) -> str | None:
9
+ """
10
+ Downloads a webpage and extracts the main article text, removing ads,
11
+ menus, and other boilerplate.
12
+
13
+ Args:
14
+ url: The URL of the article to scrape.
15
+
16
+ Returns:
17
+ The cleaned article text as a string, or None if it fails.
18
+ """
19
+ print(f" Scraping article from: {url}")
20
+ # fetch_url downloads the content of the URL
21
+ downloaded = trafilatura.fetch_url(url)
22
+
23
+ if downloaded is None:
24
+ print("❌ Error: Failed to download the article content.")
25
+ return None
26
+
27
+ # extract the main text, ignoring comments and tables for a cleaner summary
28
+ article_text = trafilatura.extract(downloaded, include_comments=False, include_tables=False)
29
+
30
+ if article_text:
31
+ print("βœ… Successfully extracted article text.")
32
+ return article_text
33
+ else:
34
+ print("❌ Error: Could not find main article text on the page.")
35
+ return None
36
+
37
+ # --- Part 2: Summarization Function ---
38
+
39
+ def summarize_with_qwen(text: str, model, tokenizer) -> str:
40
+ """
41
+ Generates a summary for the given text using the Qwen3-0.6B model.
42
+
43
+ Args:
44
+ text: The article text to summarize.
45
+ model: The pre-loaded transformer model.
46
+ tokenizer: The pre-loaded tokenizer.
47
+
48
+ Returns:
49
+ The generated summary as a string.
50
+ """
51
+ print(" Summarizing text with Qwen3-0.6B...")
52
+
53
+ # 1. Create a detailed prompt for the summarization task
54
+ prompt = f"""
55
+ Please provide a concise and clear summary of the following article.
56
+ Focus on the main points, key findings, and conclusions. The summary should be
57
+ easy to understand for someone who has not read the original text.
58
+
59
+ ARTICLE:
60
+ {text}
61
+ """
62
+
63
+ messages = [{"role": "user", "content": prompt}]
64
+
65
+ # 2. Apply the chat template. We set `enable_thinking=False` for direct summarization.
66
+ # This is more efficient than the default reasoning mode for this task.
67
+ text_input = tokenizer.apply_chat_template(
68
+ messages,
69
+ tokenize=False,
70
+ add_generation_prompt=True,
71
+ enable_thinking=False
72
+ )
73
+
74
+ # 3. Tokenize the formatted prompt and move it to the correct device (CPU or MPS on Mac)
75
+ model_inputs = tokenizer([text_input], return_tensors="pt").to(model.device)
76
+
77
+ # 4. Generate the summary using parameters recommended for non-thinking mode
78
+ generated_ids = model.generate(
79
+ **model_inputs,
80
+ max_new_tokens=512, # Limit summary length
81
+ temperature=0.7,
82
+ top_p=0.8,
83
+ top_k=20
84
+ )
85
+
86
+ # 5. Slice the output to remove the input prompt, leaving only the generated response
87
+ output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
88
+
89
+ # 6. Decode the token IDs back into a readable string
90
+ summary = tokenizer.decode(output_ids, skip_special_tokens=True).strip()
91
+
92
+ print("βœ… Summary generated successfully.")
93
+ return summary
94
+
95
+
96
+ # --- Main Execution Block ---
97
+
98
+ if __name__ == "__main__":
99
+ # Check if a URL was provided as a command-line argument
100
+ if len(sys.argv) < 2:
101
+ print("Usage: python summarize_qwen.py <URL_OF_ARTICLE>")
102
+ sys.exit(1)
103
+
104
+ article_url = sys.argv[1]
105
+
106
+ # --- Load Model and Tokenizer ---
107
+ print("Setting up the Qwen3-0.6B model...")
108
+ print("Note: The first run will download the model (~1.2 GB). Please be patient.")
109
+
110
+ model_name = "Qwen/Qwen3-0.6B"
111
+
112
+ try:
113
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
114
+ model = AutoModelForCausalLM.from_pretrained(
115
+ model_name,
116
+ torch_dtype="auto", # Automatically selects precision (e.g., float16)
117
+ device_map="auto" # Automatically uses MPS (Mac GPU) if available
118
+ )
119
+ except Exception as e:
120
+ print(f"❌ Failed to load the model. Error: {e}")
121
+ print("Please ensure you have a stable internet connection and sufficient disk space.")
122
+ sys.exit(1)
123
+
124
+ # Inform the user which device is being used
125
+ device = next(model.parameters()).device
126
+ print(f"βœ… Model loaded successfully on device: {str(device).upper()}")
127
+ if "mps" in str(device):
128
+ print(" (Running on Apple Silicon GPU)")
129
+
130
+ # --- Run the Process ---
131
+ # Step 1: Scrape the article
132
+ article_content = scrape_article_text(article_url)
133
+
134
+ if article_content:
135
+ # Step 2: Summarize the content
136
+ final_summary = summarize_with_qwen(article_content, model, tokenizer)
137
+
138
+ # Step 3: Print the final result
139
+ print("\n" + "="*50)
140
+ print("✨ Article Summary ✨")
141
+ print("="*50)
142
+ print(final_summary)
143
+ print("="*50 + "\n")
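The slicing in step 5 of `summarize_with_qwen` exists because `generate()` returns the prompt tokens and the newly generated tokens as one sequence. Reduced to plain ID lists, the idea is just:

```python
# Hedged sketch of the prompt-stripping slice from summarize_with_qwen:
# generate() yields [prompt tokens] + [new tokens], so dropping the first
# len(prompt_ids) entries leaves only the generated continuation.

def strip_prompt(generated_ids, prompt_ids):
    return generated_ids[len(prompt_ids):]

if __name__ == "__main__":
    prompt_ids = [101, 2023, 2003]         # stand-in prompt token IDs
    generated = prompt_ids + [4248, 102]   # model output echoes the prompt
    print(strip_prompt(generated, prompt_ids))  # → [4248, 102]
```

The real code does the same with tensors: `generated_ids[0][len(model_inputs.input_ids[0]):]` before decoding with `skip_special_tokens=True`.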
templates/index.html ADDED
@@ -0,0 +1,528 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8" />
5
+ <meta name="viewport" content="width=device-width, initial-scale=1" />
6
+ <meta name="color-scheme" content="dark" />
7
+ <title>AI Article Summarizer Β· Qwen + Kokoro</title>
8
+ <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
9
+ <link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700;800&display=swap" rel="stylesheet" />
10
+ <style>
11
+ :root{
12
+ --bg-0:#0b0f17;
13
+ --bg-1:#0f1624;
14
+ --bg-2:#121a2b;
15
+ --glass: rgba(255,255,255,.04);
16
+ --muted: #9aa4bf;
17
+ --text: #e7ecf8;
18
+ --accent-1:#6d6aff;
19
+ --accent-2:#7b5cff;
20
+ --accent-3:#00d4ff;
21
+ --ok:#21d19f;
22
+ --warn:#ffb84d;
23
+ --err:#ff6b6b;
24
+ --ring: 0 0 0 1px rgba(255,255,255,.07), 0 0 0 6px rgba(124, 58, 237, .12);
25
+ --shadow: 0 20px 60px rgba(0,0,0,.45), 0 8px 20px rgba(0,0,0,.35);
26
+ --radius-xl:22px;
27
+ --radius-lg:16px;
28
+ --radius-md:12px;
29
+ --radius-sm:10px;
30
+ --grad: conic-gradient(from 220deg at 50% 50%, var(--accent-1), var(--accent-2), var(--accent-3), var(--accent-1));
31
+ }
32
+ *{box-sizing:border-box}
33
+ html,body{height:100%}
34
+ body{
35
+ margin:0;
36
+ font-family:Inter, system-ui, -apple-system, Segoe UI, Roboto, Ubuntu, Cantarell, Noto Sans, Helvetica, Arial, "Apple Color Emoji", "Segoe UI Emoji";
37
+ color:var(--text);
38
+ background:
39
+ radial-gradient(1200px 600px at -10% -10%, rgba(109,106,255,.20), transparent 50%),
40
+ radial-gradient(900px 500px at 120% -10%, rgba(0,212,255,.16), transparent 55%),
41
+ radial-gradient(1200px 900px at 50% 120%, rgba(123,92,255,.18), transparent 60%),
42
+ linear-gradient(180deg, var(--bg-0), var(--bg-1) 50%, var(--bg-2));
43
+ overflow-y:auto;
44
+ }
45
+
46
+ /* Top progress bar */
47
+ .bar{
48
+ position:fixed; inset:0 0 auto 0; height:3px; z-index:9999;
49
+ background: linear-gradient(90deg, var(--accent-3), var(--accent-2), var(--accent-1));
50
+ background-size:200% 100%;
51
+ transform:scaleX(0); transform-origin:left;
52
+ box-shadow:0 0 18px rgba(0,212,255,.45);
53
+ transition:transform .2s ease-out;
54
+ animation:bar-move 2.2s linear infinite;
55
+ }
56
+ @keyframes bar-move{0%{background-position:0 0}100%{background-position:200% 0}}
57
+
58
+ .wrap{
59
+ max-width:1080px; margin:72px auto; padding:0 24px;
60
+ }
61
+ .hero{
62
+ display:flex; flex-direction:column; align-items:center; gap:14px; margin-bottom:28px; text-align:center;
63
+ }
64
+ .hero-badge{
65
+ display:inline-flex; align-items:center; gap:10px; padding:8px 12px; border-radius:999px;
66
+ background:linear-gradient(180deg, rgba(255,255,255,.06), rgba(255,255,255,.02));
67
+ border:1px solid rgba(255,255,255,.08);
68
+ backdrop-filter: blur(8px);
69
+ box-shadow: var(--shadow);
70
+ }
71
+ .dot{width:8px;height:8px;border-radius:50%; background:var(--warn); box-shadow:0 0 0 6px rgba(255,184,77,.14)}
72
+ .dot.ready{background:var(--ok); box-shadow:0 0 0 6px rgba(33,209,159,.14)}
73
+ .hero h1{font-size: clamp(28px, 5vw, 44px); margin:0; font-weight:800; letter-spacing:-.02em; line-height:1.05}
74
+ .grad-text{
75
+ background: linear-gradient(92deg, #f0f3ff, #bfc8ff 30%, #9ad8ff 60%, #c2b5ff 90%);
76
+ -webkit-background-clip:text; background-clip:text; -webkit-text-fill-color:transparent;
77
+ }
78
+ .hero p{margin:0; color:var(--muted); font-size:15.5px}
79
+
80
+ .panel{
81
+ position:relative;
82
+ background:linear-gradient(180deg, rgba(255,255,255,.06), rgba(255,255,255,.03));
83
+ border:1px solid rgba(255,255,255,.08);
84
+ border-radius: var(--radius-xl);
85
+ padding:24px;
86
+ box-shadow: var(--shadow);
87
+ overflow:hidden;
88
+ }
89
+ .panel::before{
90
+ content:"";
91
+ position:absolute; inset:-1px;
92
+ border-radius:inherit;
93
+ padding:1px;
94
+ background:linear-gradient(180deg, rgba(175,134,255,.35) 0%, rgba(0,212,255,.18) 100%);
95
+ -webkit-mask:linear-gradient(#000 0 0) content-box, linear-gradient(#000 0 0);
96
+ -webkit-mask-composite:xor; mask-composite: exclude;
97
+ pointer-events:none;
98
+ }
99
+
100
+ .form-grid{display:grid; grid-template-columns:1fr auto; gap:12px; align-items:center}
101
+ .input{
102
+ width:100%;
103
+ background:rgba(0,0,0,.35);
104
+ border:1px solid rgba(255,255,255,.12);
105
+ border-radius:var(--radius-lg);
106
+ padding:14px 16px;
107
+ color:var(--text);
108
+ font-size:15.5px;
109
+ outline:none;
110
+ transition:border .2s ease, box-shadow .2s ease, background .2s ease;
111
+ }
112
+ .input::placeholder{color:#7f8aad}
113
+ .input:focus{border-color:rgba(0,212,255,.55); box-shadow: var(--ring)}
114
+
115
+ .btn{
116
+ position:relative;
117
+ display:inline-flex; align-items:center; justify-content:center; gap:10px;
118
+ padding:14px 18px;
119
+ border-radius:var(--radius-lg);
120
+ border:1px solid rgba(255,255,255,.12);
121
+ color:#0b0f17; font-weight:700; letter-spacing:.02em;
122
+ background: linear-gradient(135deg, #7b5cff 0%, #00d4ff 100%);
123
+ box-shadow: 0 10px 30px rgba(0,212,255,.35), inset 0 1px 0 rgba(255,255,255,.15);
124
+ cursor:pointer; user-select:none;
125
+ transition: transform .08s ease, filter .15s ease, box-shadow .2s ease, opacity .2s ease;
126
+ }
127
+ .btn:hover{transform: translateY(-1px)}
128
+ .btn:active{transform: translateY(0)}
129
+ .btn:disabled{opacity:.55; cursor:not-allowed; filter:grayscale(.2)}
130
+
131
+ .row{display:flex; flex-wrap:wrap; gap:12px; align-items:center; margin-top:14px}
132
+
133
+ /* Switch */
134
+ .switch{
135
+ display:inline-flex; align-items:center; gap:12px; cursor:pointer; user-select:none;
136
+ padding:10px 12px; border-radius:999px; background:rgba(255,255,255,.04); border:1px solid rgba(255,255,255,.08);
137
+ }
138
+ .switch .track{
139
+ width:44px; height:24px; background:rgba(255,255,255,.12); border-radius:999px; position:relative; transition: background .2s ease;
140
+ }
141
+ .switch .thumb{
142
+ width:18px; height:18px; border-radius:50%; background:white; position:absolute; top:3px; left:3px;
143
+ box-shadow:0 4px 16px rgba(0,0,0,.45);
144
+ transition:left .18s ease, background .2s ease, transform .18s ease;
145
+ }
146
+ .switch input{display:none}
147
+ .switch input:checked + .track{background:linear-gradient(90deg, #00d4ff, #7b5cff)}
148
+ .switch input:checked + .track .thumb{left:23px; background:#0b0f17; transform:scale(1.05)}
149
+
150
+ /* Collapsible voice panel */
151
+ .collapse{
152
+ overflow:hidden; max-height:0; opacity:0; transform: translateY(-4px);
153
+ transition:max-height .35s ease, opacity .25s ease, transform .25s ease;
154
+ }
155
+ .collapse.open{max-height:520px; opacity:1; transform:none}
156
+
157
+ .voices{
158
+ display:grid; gap:12px; margin-top:12px;
159
+ grid-template-columns: repeat(auto-fill, minmax(220px, 1fr));
160
+ }
161
+ .voice{
162
+ position:relative; padding:14px; border-radius:var(--radius-md);
163
+ background:rgba(255,255,255,.03); border:1px solid rgba(255,255,255,.08);
164
+ transition: transform .12s ease, box-shadow .2s ease, border .2s ease, background .2s ease;
165
+ cursor:pointer;
166
+ }
167
+ .voice:hover{transform: translateY(-2px); box-shadow: var(--shadow); border-color: rgba(0,212,255,.25)}
168
+ .voice.selected{background:linear-gradient(180deg, rgba(0,212,255,.08), rgba(123,92,255,.08)); border-color: rgba(123,92,255,.55)}
169
+ .voice .name{font-weight:700; letter-spacing:.01em}
170
+ .voice .meta{color:var(--muted); font-size:12.5px; margin-top:6px; display:flex; gap:10px; align-items:center}
171
+ .voice .badge{
172
+ font-size:11px; padding:3px 8px; border-radius:999px; border:1px solid rgba(255,255,255,.14);
173
+ background:rgba(255,255,255,.05);
174
+ }
175
+
176
+ /* Results */
177
+ .results{margin-top:18px}
178
+ .chips{display:flex; flex-wrap:wrap; gap:10px}
179
+ .chip{
180
+ font-size:12.5px; color:#cdd6f6;
181
+ padding:8px 12px; border-radius:999px; border:1px solid rgba(255,255,255,.08); background:rgba(255,255,255,.03);
182
+ }
183
+ .toolbar{
184
+ display:flex; gap:10px; flex-wrap:wrap; margin-top:12px
185
+ }
186
+ .tbtn{
187
+ display:inline-flex; align-items:center; gap:8px; padding:8px 12px; border-radius:10px;
188
+ background:rgba(255,255,255,.04); border:1px solid rgba(255,255,255,.1); color:var(--text);
189
+ cursor:pointer; font-size:13px; transition: background .15s ease, transform .08s ease;
190
+ }
191
+ .tbtn:hover{background:rgba(255,255,255,.08)}
192
+ .tbtn:active{transform: translateY(1px)}
193
+
194
+ .summary{
195
+ margin-top:14px;
196
+ background:rgba(0,0,0,.35);
197
+ border:1px solid rgba(255,255,255,.1);
198
+ border-radius:var(--radius-lg);
199
+ padding:18px;
200
+ line-height:1.7;
201
+ font-size:15.5px;
202
+ white-space:pre-wrap;
203
+ min-height:120px;
204
+ }
205
+
206
+ /* Skeleton */
207
+ .skeleton{
208
+ position:relative; overflow:hidden; background:rgba(255,255,255,.06); border-radius:10px;
209
+ }
210
+ .skeleton::after{
211
+ content:""; position:absolute; inset:0;
212
+ background:linear-gradient(100deg, transparent, rgba(255,255,255,.10), transparent);
213
+ transform:translateX(-100%); animation:shine 1.2s infinite;
214
+ }
215
+ @keyframes shine{to{transform:translateX(100%)}}
216
+
217
+ /* Messages */
218
+ .msg{
219
+ margin-top:14px; padding:12px 14px; border-radius:12px; border:1px solid rgba(255,255,255,.08);
220
+ display:none; font-size:14px;
221
+ }
222
+ .msg.err{color:#ffd8d8; background:rgba(255,107,107,.08)}
224
+ .msg.ok{color:#d9fff4; background:rgba(33,209,159,.08)}
224
+
225
+ /* Audio card */
226
+ .audio{
227
+ margin-top:14px; padding:16px;
228
+ background:rgba(255,255,255,.03);
229
+ border:1px solid rgba(255,255,255,.08); border-radius:var(--radius-lg);
230
+ }
231
+ audio{width:100%; height:40px; outline:none}
232
+
233
+ /* Footer note */
234
+ .foot{margin-top:14px; text-align:center; color:#7f8aad; font-size:12.5px}
235
+
236
+ @media (max-width:720px){
237
+ .form-grid{grid-template-columns: 1fr}
238
+ .btn{width:100%}
239
+ }
240
+ </style>
241
+ </head>
242
+ <body>
243
+ <div class="bar" id="bar"></div>
244
+
245
+ <div class="wrap">
246
+ <header class="hero">
247
+ <div class="hero-badge" id="statusBadge">
248
+ <span class="dot" id="statusDot"></span>
249
+ <span id="statusText">Loading AI models…</span>
250
+ </div>
251
+ <h1><span class="grad-text">AI Article Summarizer</span></h1>
252
+ <p>Qwen3-0.6B summarization · Kokoro neural TTS · smooth, private, fast</p>
253
+ </header>
254
+
255
+ <section class="panel">
256
+ <form id="summarizerForm" autocomplete="on">
257
+ <div class="form-grid">
258
+ <input id="articleUrl" class="input" type="url" inputmode="url"
259
+ placeholder="Paste an article URL (https://…)" required />
260
+ <button id="submitBtn" class="btn" type="submit">
261
+ ✨ Summarize
262
+ </button>
263
+ </div>
264
+
265
+ <div class="row">
266
+ <label class="switch" title="Generate audio with Kokoro TTS">
267
+ <input id="generateAudio" type="checkbox" />
268
+ <span class="track"><span class="thumb"></span></span>
269
+ <span>🎵 Text-to-Speech</span>
270
+ </label>
271
+
272
+ <span class="chip">Models: Qwen3-0.6B · Kokoro</span>
273
+ <span class="chip">On-device processing</span>
274
+ </div>
275
+
276
+ <div id="voiceSection" class="collapse" aria-hidden="true">
277
+ <div class="voices" id="voiceGrid">
278
+ <!-- Injected -->
279
+ </div>
280
+ </div>
281
+ </form>
282
+
283
+ <!-- Loading skeleton -->
284
+ <div id="loadingSection" style="display:none; margin-top:18px">
285
+ <div class="skeleton" style="height:18px; width:42%; margin-bottom:10px"></div>
286
+ <div class="skeleton" style="height:14px; width:90%; margin-bottom:8px"></div>
287
+ <div class="skeleton" style="height:14px; width:86%; margin-bottom:8px"></div>
288
+ <div class="skeleton" style="height:14px; width:88%; margin-bottom:8px"></div>
289
+ <div class="skeleton" style="height:14px; width:60%; margin-bottom:8px"></div>
290
+ </div>
291
+
292
+ <!-- Results -->
293
+ <div id="resultSection" class="results" style="display:none">
294
+ <div class="chips" id="stats"></div>
295
+
296
+ <div class="toolbar">
297
+ <button class="tbtn" id="copyBtn" type="button">📋 Copy summary</button>
298
+ <a class="tbtn" id="downloadAudioBtn" href="#" download style="display:none">⬇️ Download audio</a>
299
+ </div>
300
+
301
+ <div id="summaryContent" class="summary"></div>
302
+
303
+ <div id="audioSection" class="audio" style="display:none">
304
+ <div style="display:flex; justify-content:space-between; align-items:center; margin-bottom:6px">
305
+ <strong>🎧 Audio Playback</strong>
306
+ <span id="duration" style="color:var(--muted); font-size:12.5px"></span>
307
+ </div>
308
+ <audio id="audioPlayer" controls preload="none"></audio>
309
+ </div>
310
+ </div>
311
+
312
+ <div id="errorMessage" class="msg err"></div>
313
+ <div id="successMessage" class="msg ok"></div>
314
+ </section>
315
+
316
+ <p class="foot">Tip: turn on TTS and pick a voice you like. We’ll remember your last choice.</p>
317
+ </div>
318
+
319
+ <script>
320
+ // ---------------- State ----------------
321
+ let modelsReady = false;
322
+ let selectedVoice = localStorage.getItem("voiceId") || "af_heart";
323
+ const bar = document.getElementById("bar");
324
+
325
+ // --------------- Utilities --------------
326
+ const $ = (sel) => document.querySelector(sel);
327
+ function showBar(active) {
328
+ bar.style.transform = active ? "scaleX(1)" : "scaleX(0)";
329
+ }
330
+ function setStatus(ready, error){
331
+ const dot = $("#statusDot");
332
+ const text = $("#statusText");
333
+ const badge = $("#statusBadge");
334
+ if (error){
335
+ dot.classList.remove("ready");
336
+ text.textContent = "Model error: " + error;
337
+ badge.style.borderColor = "rgba(255,107,107,.45)";
338
+ return;
339
+ }
340
+ if (ready){
341
+ dot.classList.add("ready");
342
+ text.textContent = "Models ready";
343
+ } else {
344
+ dot.classList.remove("ready");
345
+ text.textContent = "Loading AI models…";
346
+ }
347
+ }
348
+ function chip(text){ const span = document.createElement("span"); span.className="chip"; span.textContent=text; return span; }
349
+ function fmt(x){ return new Intl.NumberFormat().format(x); }
350
+
351
+ // ------------- Model status poll ---------
352
+ async function checkModelStatus(){
353
+ try{
354
+ const res = await fetch("/status");
355
+ const s = await res.json();
356
+ modelsReady = !!s.loaded;
357
+ setStatus(modelsReady, s.error || null);
358
+ if (!modelsReady && !s.error) setTimeout(checkModelStatus, 1500);
359
+ if (modelsReady) { await loadVoices(); }
360
+ }catch(e){
361
+ setTimeout(checkModelStatus, 2000);
362
+ }
363
+ }
364
+
365
+ // ------------- Voice loading -------------
366
+ async function loadVoices(){
367
+ try{
368
+ const res = await fetch("/voices");
369
+ const voices = await res.json();
370
+ const grid = $("#voiceGrid");
371
+ grid.innerHTML = "";
372
+ voices.forEach(v=>{
373
+ const el = document.createElement("div");
374
+ el.className = "voice" + (v.id === selectedVoice ? " selected":"");
375
+ el.dataset.voice = v.id;
376
+ el.innerHTML = `
377
+ <div class="name">${v.name}</div>
378
+ <div class="meta">
379
+ <span class="badge">Grade ${v.grade}</span>
380
+ <span>${v.description || ""}</span>
381
+ </div>`;
382
+ el.addEventListener("click", ()=>{
383
+ document.querySelectorAll(".voice").forEach(x=>x.classList.remove("selected"));
384
+ el.classList.add("selected");
385
+ selectedVoice = v.id;
386
+ localStorage.setItem("voiceId", selectedVoice);
387
+ });
388
+ grid.appendChild(el);
389
+ });
390
+ }catch(e){
391
+ // ignore
392
+ }
393
+ }
394
+
395
+ // ------------- Collapsible voices --------
396
+ const generateAudio = $("#generateAudio");
397
+ const voiceSection = $("#voiceSection");
398
+ function toggleVoices(open){
399
+ voiceSection.classList.toggle("open", !!open);
400
+ voiceSection.setAttribute("aria-hidden", open ? "false" : "true");
401
+ }
402
+ generateAudio.addEventListener("change", e=> toggleVoices(e.target.checked));
403
+ toggleVoices(generateAudio.checked); // on load
404
+
405
+ // ------------- Form submit ----------------
406
+ const form = $("#summarizerForm");
407
+ const loading = $("#loadingSection");
408
+ const result = $("#resultSection");
409
+ const errorBox = $("#errorMessage");
410
+ const okBox = $("#successMessage");
411
+ const submitBtn = $("#submitBtn");
412
+ const urlInput = $("#articleUrl");
413
+
414
+ form.addEventListener("submit", async (e)=>{
415
+ e.preventDefault();
416
+ errorBox.style.display="none"; okBox.style.display="none";
417
+
418
+ if (!modelsReady){
419
+ errorBox.textContent = "Please wait for the AI models to finish loading.";
420
+ errorBox.style.display = "block";
421
+ return;
422
+ }
423
+ const url = urlInput.value.trim();
424
+ if (!url){ return; }
425
+
426
+ submitBtn.disabled = true;
427
+ showBar(true);
428
+ loading.style.display = "block";
429
+ result.style.display = "none";
430
+
431
+ try{
432
+ const res = await fetch("/process", {
433
+ method: "POST",
434
+ headers: {"Content-Type":"application/json"},
435
+ body: JSON.stringify({
436
+ url,
437
+ generate_audio: generateAudio.checked,
438
+ voice: selectedVoice
439
+ })
440
+ });
441
+ const data = await res.json();
442
+
443
+ loading.style.display = "none";
444
+ submitBtn.disabled = false;
445
+ showBar(false);
446
+
447
+ if (!data.success){
448
+ errorBox.textContent = data.error || "Something went wrong.";
449
+ errorBox.style.display = "block";
450
+ return;
451
+ }
452
+ renderResult(data);
453
+ okBox.textContent = "Done!";
454
+ okBox.style.display = "block";
455
+ setTimeout(()=> okBox.style.display="none", 1800);
456
+
457
+ }catch(err){
458
+ loading.style.display="none";
459
+ submitBtn.disabled=false;
460
+ showBar(false);
461
+ errorBox.textContent = "Network error: " + (err?.message || err);
462
+ errorBox.style.display = "block";
463
+ }
464
+ });
465
+
466
+ // ------------- Render results -------------
467
+ const stats = $("#stats");
468
+ const summaryEl = $("#summaryContent");
469
+ const audioWrap = $("#audioSection");
470
+ const audioEl = $("#audioPlayer");
471
+ const dlBtn = $("#downloadAudioBtn");
472
+ const durationLabel = $("#duration");
473
+ const copyBtn = $("#copyBtn");
474
+
475
+ function renderResult(r){
476
+ // Stats
477
+ stats.innerHTML = "";
478
+ stats.appendChild(chip(`📄 ${fmt(r.article_length)} → ${fmt(r.summary_length)} chars`));
479
+ stats.appendChild(chip(`📉 ${r.compression_ratio}% compression`));
480
+ stats.appendChild(chip(`🕒 ${r.timestamp}`));
481
+
482
+ // Summary
483
+ summaryEl.textContent = r.summary || "";
484
+ result.style.display = "block";
485
+
486
+ // Audio
487
+ if (r.audio_file){
488
+ audioEl.src = r.audio_file;
489
+ audioWrap.style.display = "block";
490
+ durationLabel.textContent = `${r.audio_duration}s`;
491
+ dlBtn.style.display = "inline-flex";
492
+ dlBtn.href = r.audio_file;
493
+ dlBtn.download = r.audio_file.split("/").pop() || "summary.wav";
494
+ } else {
495
+ audioWrap.style.display = "none";
496
+ dlBtn.style.display = "none";
497
+ }
498
+ }
499
+
500
+ // Copy summary
501
+ copyBtn.addEventListener("click", async ()=>{
502
+ try{
503
+ await navigator.clipboard.writeText(summaryEl.textContent || "");
504
+ copyBtn.textContent = "✅ Copied";
505
+ setTimeout(()=> copyBtn.textContent = "📋 Copy summary", 900);
506
+ }catch(e){
507
+ // ignore
508
+ }
509
+ });
510
+
511
+ // ------------- Quality of life -------------
512
+ // Paste on Cmd/Ctrl+V if input empty
513
+ window.addEventListener("paste", (e)=>{
514
+ if(document.activeElement !== urlInput && !urlInput.value){
515
+ const t = (e.clipboardData || window.clipboardData).getData("text");
516
+ if (t?.startsWith("http")){ urlInput.value = t; }
517
+ }
518
+ });
519
+
520
+ // Init
521
+ document.addEventListener("DOMContentLoaded", ()=>{
522
+ checkModelStatus();
523
+ // Restore voice toggle state hint
524
+ if (localStorage.getItem("voiceId")) selectedVoice = localStorage.getItem("voiceId");
525
+ });
526
+ </script>
527
+ </body>
528
+ </html>
templates/index0.html ADDED
@@ -0,0 +1,551 @@
1
+ <!DOCTYPE html>
2
+ <html lang="en">
3
+ <head>
4
+ <meta charset="UTF-8">
5
+ <meta name="viewport" content="width=device-width, initial-scale=1.0">
6
+ <title>AI Article Summarizer with Text-to-Speech</title>
7
+ <style>
8
+ * {
9
+ margin: 0;
10
+ padding: 0;
11
+ box-sizing: border-box;
12
+ }
13
+
14
+ body {
15
+ font-family: 'Segoe UI', Tahoma, Geneva, Verdana, sans-serif;
16
+ background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
17
+ min-height: 100vh;
18
+ padding: 20px;
19
+ }
20
+
21
+ .container {
22
+ max-width: 1200px;
23
+ margin: 0 auto;
24
+ background: rgba(255, 255, 255, 0.95);
25
+ border-radius: 20px;
26
+ padding: 40px;
27
+ box-shadow: 0 20px 40px rgba(0, 0, 0, 0.1);
28
+ backdrop-filter: blur(10px);
29
+ }
30
+
31
+ .header {
32
+ text-align: center;
33
+ margin-bottom: 40px;
34
+ }
35
+
36
+ .header h1 {
37
+ font-size: 2.5rem;
38
+ color: #2d3748;
39
+ margin-bottom: 10px;
40
+ background: linear-gradient(135deg, #667eea, #764ba2);
41
+ -webkit-background-clip: text;
42
+ -webkit-text-fill-color: transparent;
43
+ background-clip: text;
44
+ }
45
+
46
+ .header p {
47
+ color: #718096;
48
+ font-size: 1.1rem;
49
+ }
50
+
51
+ .status-indicator {
52
+ display: inline-flex;
53
+ align-items: center;
54
+ padding: 8px 16px;
55
+ border-radius: 25px;
56
+ font-size: 0.9rem;
57
+ font-weight: 600;
58
+ margin: 20px auto;
59
+ }
60
+
61
+ .status-loading {
62
+ background: #fed7d7;
63
+ color: #c53030;
64
+ }
65
+
66
+ .status-ready {
67
+ background: #c6f6d5;
68
+ color: #38a169;
69
+ }
70
+
71
+ .form-section {
72
+ background: #f7fafc;
73
+ padding: 30px;
74
+ border-radius: 15px;
75
+ margin-bottom: 30px;
76
+ border: 1px solid #e2e8f0;
77
+ }
78
+
79
+ .form-group {
80
+ margin-bottom: 20px;
81
+ }
82
+
83
+ .form-group label {
84
+ display: block;
85
+ margin-bottom: 8px;
86
+ font-weight: 600;
87
+ color: #2d3748;
88
+ }
89
+
90
+ .form-group input[type="url"] {
91
+ width: 100%;
92
+ padding: 12px 16px;
93
+ border: 2px solid #e2e8f0;
94
+ border-radius: 8px;
95
+ font-size: 1rem;
96
+ transition: border-color 0.3s ease;
97
+ }
98
+
99
+ .form-group input[type="url"]:focus {
100
+ outline: none;
101
+ border-color: #667eea;
102
+ box-shadow: 0 0 0 3px rgba(102, 126, 234, 0.1);
103
+ }
104
+
105
+ .checkbox-group {
106
+ display: flex;
107
+ align-items: center;
108
+ gap: 10px;
109
+ margin-bottom: 15px;
110
+ }
111
+
112
+ .checkbox-group input[type="checkbox"] {
113
+ width: 18px;
114
+ height: 18px;
115
+ accent-color: #667eea;
116
+ }
117
+
118
+ .voice-selector {
119
+ display: none;
120
+ margin-top: 15px;
121
+ padding: 15px;
122
+ background: white;
123
+ border-radius: 8px;
124
+ border: 1px solid #e2e8f0;
125
+ }
126
+
127
+ .voice-grid {
128
+ display: grid;
129
+ grid-template-columns: repeat(auto-fit, minmax(200px, 1fr));
130
+ gap: 10px;
131
+ margin-top: 10px;
132
+ }
133
+
134
+ .voice-option {
135
+ padding: 10px;
136
+ border: 2px solid #e2e8f0;
137
+ border-radius: 8px;
138
+ cursor: pointer;
139
+ transition: all 0.3s ease;
140
+ text-align: center;
141
+ }
142
+
143
+ .voice-option:hover {
144
+ border-color: #667eea;
145
+ background: #f7fafc;
146
+ }
147
+
148
+ .voice-option.selected {
149
+ border-color: #667eea;
150
+ background: #ebf4ff;
151
+ }
152
+
153
+ .voice-option .name {
154
+ font-weight: 600;
155
+ color: #2d3748;
156
+ }
157
+
158
+ .voice-option .grade {
159
+ font-size: 0.8rem;
160
+ color: #718096;
161
+ }
162
+
163
+ .voice-option .description {
164
+ font-size: 0.85rem;
165
+ color: #4a5568;
166
+ margin-top: 2px;
167
+ }
168
+
169
+ .submit-btn {
170
+ width: 100%;
171
+ padding: 15px;
172
+ background: linear-gradient(135deg, #667eea, #764ba2);
173
+ color: white;
174
+ border: none;
175
+ border-radius: 8px;
176
+ font-size: 1.1rem;
177
+ font-weight: 600;
178
+ cursor: pointer;
179
+ transition: transform 0.2s ease;
180
181
+ }
182
+
183
+ .submit-btn:hover:not(:disabled) {
184
+ transform: translateY(-2px);
185
+ box-shadow: 0 10px 20px rgba(102, 126, 234, 0.3);
186
+ }
187
+
188
+ .submit-btn:disabled {
189
+ opacity: 0.6;
190
+ cursor: not-allowed;
191
+ transform: none;
192
+ }
193
+
194
+ .loading {
195
+ display: none;
196
+ text-align: center;
197
+ padding: 30px;
198
+ }
199
+
200
+ .spinner {
201
+ display: inline-block;
202
+ width: 40px;
203
+ height: 40px;
204
+ border: 4px solid #f3f3f3;
205
+ border-top: 4px solid #667eea;
206
+ border-radius: 50%;
207
+ animation: spin 1s linear infinite;
208
+ }
209
+
210
+ @keyframes spin {
211
+ 0% { transform: rotate(0deg); }
212
+ 100% { transform: rotate(360deg); }
213
+ }
214
+
215
+ .result-section {
216
+ display: none;
217
+ background: #f7fafc;
218
+ padding: 30px;
219
+ border-radius: 15px;
220
+ margin-top: 30px;
221
+ border: 1px solid #e2e8f0;
222
+ }
223
+
224
+ .result-header {
225
+ display: flex;
226
+ justify-content: space-between;
227
+ align-items: center;
228
+ margin-bottom: 20px;
229
+ flex-wrap: wrap;
230
+ gap: 10px;
231
+ }
232
+
233
+ .result-title {
234
+ font-size: 1.5rem;
235
+ font-weight: 700;
236
+ color: #2d3748;
237
+ }
238
+
239
+ .stats {
240
+ display: flex;
241
+ gap: 20px;
242
+ font-size: 0.9rem;
243
+ color: #718096;
244
+ flex-wrap: wrap;
245
+ }
246
+
247
+ .summary-content {
248
+ background: white;
249
+ padding: 25px;
250
+ border-radius: 12px;
251
+ line-height: 1.6;
252
+ color: #2d3748;
253
+ font-size: 1.05rem;
254
+ border-left: 4px solid #667eea;
255
+ margin-bottom: 20px;
256
+ }
257
+
258
+ .audio-section {
259
+ display: none;
260
+ background: white;
261
+ padding: 20px;
262
+ border-radius: 12px;
263
+ text-align: center;
264
+ }
265
+
266
+ .audio-player {
267
+ width: 100%;
268
+ max-width: 500px;
269
+ margin: 15px auto;
270
+ }
271
+
272
+ .error-message {
273
+ background: #fed7d7;
274
+ color: #c53030;
275
+ padding: 15px;
276
+ border-radius: 8px;
277
+ margin: 20px 0;
278
+ border-left: 4px solid #c53030;
279
+ }
280
+
281
+ .success-message {
282
+ background: #c6f6d5;
283
+ color: #38a169;
284
+ padding: 15px;
285
+ border-radius: 8px;
286
+ margin: 20px 0;
287
+ border-left: 4px solid #38a169;
288
+ }
289
+
290
+ @media (max-width: 768px) {
291
+ .container {
292
+ padding: 20px;
293
+ margin: 10px;
294
+ }
295
+
296
+ .header h1 {
297
+ font-size: 2rem;
298
+ }
299
+
300
+ .voice-grid {
301
+ grid-template-columns: 1fr;
302
+ }
303
+
304
+ .result-header {
305
+ flex-direction: column;
306
+ align-items: flex-start;
307
+ }
308
+
309
+ .stats {
310
+ justify-content: space-between;
311
+ width: 100%;
312
+ }
313
+ }
314
+ </style>
315
+ </head>
316
+ <body>
317
+ <div class="container">
318
+ <div class="header">
319
+ <h1>🤖 AI Article Summarizer</h1>
320
+ <p>Powered by Qwen3-0.6B and Kokoro TTS</p>
321
+ <div id="modelStatus" class="status-indicator status-loading">
322
+ 🔄 Loading AI models...
323
+ </div>
324
+ </div>
325
+
326
+ <form id="summarizerForm" class="form-section">
327
+ <div class="form-group">
328
+ <label for="articleUrl">📰 Article URL</label>
329
+ <input
330
+ type="url"
331
+ id="articleUrl"
332
+ name="articleUrl"
333
+ placeholder="https://example.com/article"
334
+ required
335
+ >
336
+ </div>
337
+
338
+ <div class="checkbox-group">
339
+ <input type="checkbox" id="generateAudio" name="generateAudio">
340
+ <label for="generateAudio">🎵 Generate text-to-speech audio</label>
341
+ </div>
342
+
343
+ <div id="voiceSelector" class="voice-selector">
344
+ <label>🎭 Select Voice</label>
345
+ <div id="voiceGrid" class="voice-grid">
346
+ <!-- Voices will be loaded here -->
347
+ </div>
348
+ </div>
349
+
350
+ <button type="submit" id="submitBtn" class="submit-btn">
351
+ ✨ Summarize Article
352
+ </button>
353
+ </form>
354
+
355
+ <div id="loadingSection" class="loading">
356
+ <div class="spinner"></div>
357
+ <p>Processing your article...</p>
358
+ </div>
359
+
360
+ <div id="resultSection" class="result-section">
361
+ <div class="result-header">
362
+ <h2 class="result-title">📋 Summary</h2>
363
+ <div id="stats" class="stats">
364
+ <!-- Stats will be inserted here -->
365
+ </div>
366
+ </div>
367
+
368
+ <div id="summaryContent" class="summary-content">
369
+ <!-- Summary will be inserted here -->
370
+ </div>
371
+
372
+ <div id="audioSection" class="audio-section">
373
+ <h3>🎧 Audio Playback</h3>
374
+ <p>Listen to your summary:</p>
375
+ <audio id="audioPlayer" class="audio-player" controls>
376
+ Your browser does not support the audio element.
377
+ </audio>
378
+ </div>
379
+ </div>
380
+
381
+ <div id="errorMessage" class="error-message" style="display: none;">
382
+ <!-- Error messages will be shown here -->
383
+ </div>
384
+ </div>
385
+
386
+ <script>
387
+ let selectedVoice = 'af_heart';
388
+ let modelsReady = false;
389
+
390
+ // Check model status on page load
391
+ async function checkModelStatus() {
392
+ try {
393
+ const response = await fetch('/status');
394
+ const status = await response.json();
395
+
396
+ const statusEl = document.getElementById('modelStatus');
397
+
398
+ if (status.loaded) {
399
+ statusEl.textContent = '✅ AI models ready!';
400
+ statusEl.className = 'status-indicator status-ready';
401
+ modelsReady = true;
402
+ loadVoices();
403
+ } else if (status.error) {
404
+ statusEl.textContent = `❌ Error loading models: ${status.error}`;
405
+ statusEl.className = 'status-indicator status-loading';
406
+ } else {
407
+ statusEl.textContent = '🔄 Loading AI models...';
408
+ setTimeout(checkModelStatus, 2000);
409
+ }
410
+ } catch (error) {
411
+ console.error('Error checking status:', error);
412
+ setTimeout(checkModelStatus, 5000);
413
+ }
414
+ }
415
+
416
+ // Load available voices
417
+ async function loadVoices() {
418
+ try {
419
+ const response = await fetch('/voices');
420
+ const voices = await response.json();
421
+
422
+ const voiceGrid = document.getElementById('voiceGrid');
423
+ voiceGrid.innerHTML = '';
424
+
425
+ voices.forEach((voice, index) => {
426
+ const voiceEl = document.createElement('div');
427
+ voiceEl.className = `voice-option${voice.id === selectedVoice ? ' selected' : ''}`;
428
+ voiceEl.dataset.voice = voice.id;
429
+
430
+ voiceEl.innerHTML = `
431
+ <div class="name">${voice.name}</div>
432
+ <div class="grade">Grade: ${voice.grade}</div>
433
+ <div class="description">${voice.description}</div>
434
+ `;
435
+
436
+ voiceEl.addEventListener('click', () => selectVoice(voice.id, voiceEl));
437
+ voiceGrid.appendChild(voiceEl);
438
+ });
439
+ } catch (error) {
440
+ console.error('Error loading voices:', error);
441
+ }
442
+ }
443
+
444
+ // Select voice
445
+ function selectVoice(voiceId, element) {
446
+ document.querySelectorAll('.voice-option').forEach(el => {
447
+ el.classList.remove('selected');
448
+ });
449
+ element.classList.add('selected');
450
+ selectedVoice = voiceId;
451
+ }
452
+
453
+ // Toggle voice selector
454
+ document.getElementById('generateAudio').addEventListener('change', function() {
455
+ const voiceSelector = document.getElementById('voiceSelector');
456
+ voiceSelector.style.display = this.checked ? 'block' : 'none';
457
+ });
458
+
459
+ // Handle form submission
460
+ document.getElementById('summarizerForm').addEventListener('submit', async function(e) {
461
+ e.preventDefault();
462
+
463
+ if (!modelsReady) {
464
+ showError('Please wait for the AI models to finish loading.');
465
+ return;
466
+ }
467
+
468
+ const url = document.getElementById('articleUrl').value;
469
+ const generateAudio = document.getElementById('generateAudio').checked;
470
+
471
+ // Show loading
472
+ document.getElementById('loadingSection').style.display = 'block';
473
+ document.getElementById('resultSection').style.display = 'none';
474
+ document.getElementById('errorMessage').style.display = 'none';
475
+ document.getElementById('submitBtn').disabled = true;
476
+
477
+ try {
478
+ const response = await fetch('/process', {
479
+ method: 'POST',
480
+ headers: {
481
+ 'Content-Type': 'application/json',
482
+ },
483
+ body: JSON.stringify({
484
+ url: url,
485
+ generate_audio: generateAudio,
486
+ voice: selectedVoice
487
+ })
488
+ });
489
+
490
+ const result = await response.json();
491
+
492
+ // Hide loading
493
+ document.getElementById('loadingSection').style.display = 'none';
494
+ document.getElementById('submitBtn').disabled = false;
495
+
496
+ if (result.success) {
497
+ showResult(result);
498
+ } else {
499
+ showError(result.error);
500
+ }
501
+
502
+ } catch (error) {
503
+ document.getElementById('loadingSection').style.display = 'none';
504
+ document.getElementById('submitBtn').disabled = false;
505
+ showError(`Network error: ${error.message}`);
506
+ }
507
+ });
508
+
509
+ // Show results
510
+ function showResult(result) {
511
+ document.getElementById('summaryContent').textContent = result.summary;
512
+
513
+ // Update stats
514
+ const stats = document.getElementById('stats');
515
+ stats.innerHTML = `
516
+ <span>📊 ${result.article_length} → ${result.summary_length} chars</span>
517
+ <span>📉 ${result.compression_ratio}% compression</span>
518
+ <span>🕒 ${result.timestamp}</span>
519
+ `;
520
+
521
+ // Show/hide audio section
522
+ const audioSection = document.getElementById('audioSection');
523
+ if (result.audio_file) {
524
+ const audioPlayer = document.getElementById('audioPlayer');
525
+ audioPlayer.src = result.audio_file;
526
+ audioSection.style.display = 'block';
527
+
528
+ // Add duration info
529
+ const durationInfo = audioSection.querySelector('p');
530
+ durationInfo.textContent = `Listen to your summary (${result.audio_duration}s):`;
531
+ } else {
532
+ audioSection.style.display = 'none';
533
+ }
534
+
535
+ document.getElementById('resultSection').style.display = 'block';
536
+ }
537
+
538
+ // Show error
539
+ function showError(message) {
540
+ const errorEl = document.getElementById('errorMessage');
541
+ errorEl.textContent = message;
542
+ errorEl.style.display = 'block';
543
+ }
544
+
545
+ // Initialize
546
+ document.addEventListener('DOMContentLoaded', function() {
547
+ checkModelStatus();
548
+ });
549
+ </script>
550
+ </body>
551
+ </html>