metadata
title: Text Summarization
emoji: 📝
colorFrom: blue
colorTo: indigo
sdk: docker
app_port: 8501
pinned: false
license: mit
short_description: Summarize YouTube, web pages, and uploaded docs.

Text Summarization

This Space runs a Streamlit app for summarizing:

  • YouTube videos
  • website URLs
  • uploaded PDF, TXT, MD, CSV, and DOCX files

Required Secret

Add this secret in the Space settings:

  • GROQ_API_KEY

YouTube On Hugging Face Spaces

YouTube transcript loading may work locally but fail on Hugging Face Spaces, because YouTube frequently blocks or rate-limits datacenter IP ranges. The app retries transient HTTPS failures and accepts proxy configuration through these Space secrets:

  • YOUTUBE_HTTP_PROXY
  • YOUTUBE_HTTPS_PROXY

You can also use the standard HTTP_PROXY and HTTPS_PROXY environment variables if that matches your setup.
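
As a rough illustration of the proxy and retry behaviour described above, the sketch below resolves a proxies mapping from the environment and wraps a fetch in a simple backoff loop. It is not the app's actual code: the function names `resolve_youtube_proxies` and `with_retries`, the precedence of the YouTube-specific secrets over the standard variables, the retry count, the delays, and the choice of which exceptions count as transient are all assumptions made for the example.

```python
import os
import time


def resolve_youtube_proxies(env=None):
    """Build a requests-style proxies mapping from environment variables."""
    env = os.environ if env is None else env
    proxies = {}
    for scheme in ("http", "https"):
        # Assumption: the YouTube-specific secret wins over the standard variable.
        specific = env.get(f"YOUTUBE_{scheme.upper()}_PROXY", "").strip()
        generic = env.get(f"{scheme.upper()}_PROXY", "").strip()
        value = specific or generic
        if value:
            proxies[scheme] = value
    return proxies


def with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds, backing off after connection errors."""
    for attempt in range(attempts):
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            # Assumption: only these built-in errors are treated as transient.
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)
```

The resulting mapping has the shape that HTTP clients such as `requests` accept for their `proxies` argument. An empty mapping means no proxy was configured.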

Space-Only YouTube Fallbacks

The Hugging Face Space version supports four YouTube retrieval strategies:

  • Direct transcript fetch
  • External transcript API
  • Audio transcription via yt-dlp + Groq Whisper
  • Manual transcript paste/upload
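
One way to picture how these strategies could fit together is a simple fallback chain that returns the first transcript it gets. The sketch below is only illustrative: `fetch_transcript` and the stub strategies are hypothetical names, and trying the strategies in the listed order is an assumption rather than documented behaviour.

```python
def fetch_transcript(video_url, strategies):
    """Try each (name, callable) pair in order; return the first usable result."""
    errors = []
    for name, strategy in strategies:
        try:
            text = strategy(video_url)
        except Exception as exc:
            # A failing strategy should not stop the chain; record and move on.
            errors.append(f"{name}: {exc}")
            continue
        if text and text.strip():
            return name, text
        errors.append(f"{name}: empty transcript")
    raise RuntimeError("All YouTube strategies failed: " + "; ".join(errors))


# Hypothetical stand-ins for the real strategies.
def direct_fetch(url):
    raise ConnectionError("blocked by YouTube")


def external_api(url):
    return "transcript text from the external API"


used, transcript = fetch_transcript(
    "https://www.youtube.com/watch?v=VIDEO_ID",
    [("direct", direct_fetch), ("external_api", external_api)],
)
```

Collecting the per-strategy errors makes it possible to tell the user why every automatic route failed before asking for a manual transcript.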

Optional secrets for the external transcript API

  • YOUTUBE_TRANSCRIPT_API_URL
  • YOUTUBE_TRANSCRIPT_API_KEY
  • YOUTUBE_TRANSCRIPT_API_METHOD (GET or POST, default GET)
  • YOUTUBE_TRANSCRIPT_API_KEY_HEADER (default Authorization)
  • YOUTUBE_TRANSCRIPT_API_TIMEOUT (default 45)

YOUTUBE_TRANSCRIPT_API_URL may contain placeholders such as {video_id}, {url}, and {language_code}.
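
To make the placeholder substitution and defaults concrete, here is a minimal sketch of how a request could be assembled from these secrets. The function name `build_transcript_request` is hypothetical, and several details are assumptions not confirmed by this README: that placeholder values are percent-encoded, that the key is sent verbatim in the configured header, and that the timeout is in seconds.

```python
import os
from urllib.parse import quote


def build_transcript_request(video_id, video_url, language_code="en", env=None):
    """Return a dict describing the external API call, or None if unconfigured."""
    env = os.environ if env is None else env
    template = env.get("YOUTUBE_TRANSCRIPT_API_URL", "").strip()
    if not template:
        return None  # the external API strategy is not configured

    values = {
        "{video_id}": video_id,
        "{url}": video_url,
        "{language_code}": language_code,
    }
    url = template
    for placeholder, value in values.items():
        # Assumption: values are percent-encoded before substitution.
        url = url.replace(placeholder, quote(value, safe=""))

    method = (env.get("YOUTUBE_TRANSCRIPT_API_METHOD") or "GET").strip().upper()
    if method not in ("GET", "POST"):
        raise ValueError(f"Unsupported method: {method}")

    headers = {}
    api_key = env.get("YOUTUBE_TRANSCRIPT_API_KEY", "").strip()
    if api_key:
        header = (env.get("YOUTUBE_TRANSCRIPT_API_KEY_HEADER") or "Authorization").strip()
        headers[header] = api_key  # assumption: sent as-is, with no "Bearer " prefix

    # Assumption: the timeout value is expressed in seconds.
    timeout = float(env.get("YOUTUBE_TRANSCRIPT_API_TIMEOUT") or 45)
    return {"method": method, "url": url, "headers": headers, "timeout": timeout}
```

For example, a template such as `https://api.example.test/t?v={video_id}&lang={language_code}` (a made-up endpoint) would expand to `https://api.example.test/t?v=abc123&lang=en` for video ID `abc123`.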

Optional secrets for the Groq audio transcription fallback

  • GROQ_AUDIO_TRANSCRIPTION_MODEL

Default model: whisper-large-v3-turbo
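
The model override could be read along these lines. This is a small sketch, not the app's code: the helper name `resolve_audio_model` is hypothetical, and treating a blank secret the same as an unset one is an assumption.

```python
import os

DEFAULT_GROQ_AUDIO_MODEL = "whisper-large-v3-turbo"


def resolve_audio_model(env=None):
    """Return the configured Groq transcription model, or the default."""
    env = os.environ if env is None else env
    # Assumption: an unset or blank secret falls back to the default model.
    configured = env.get("GROQ_AUDIO_TRANSCRIPTION_MODEL", "").strip()
    return configured or DEFAULT_GROQ_AUDIO_MODEL
```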