MSDSF25M005_Ver2 / README.md
Ansnaeem's picture
Update README.md
5109fe0 verified

A newer version of the Gradio SDK is available: 6.13.0

Upgrade
metadata
title: MSDSF25M005_Ver2— Website & YouTube Q&A
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false

Version 2 — Website Scraper & YouTube Transcript Q&A

A Gradio app with two independent input modes:

Tab 1: Bot-Protected Website Scraper

  • Enter any URL in the text box
  • Fetches and parses the page using Bright Data Web Unlocker API + BeautifulSoup
  • Extracted content is used as context for Q&A with a Groq-hosted LLM

Tab 2: YouTube Transcript Q&A

  • Enter a valid YouTube Video ID (e.g., dQw4w9WgXcQ)
  • Uses youtube-transcript-api to fetch auto-generated or manual transcripts
  • Transcript is passed to Groq LLM for Q&A (e.g., "What is the main topic?", "Summarize in 3 bullet points")
  • Displays a clear error if no transcript is available

Setup (Hugging Face Spaces)

Add these secrets in your Space Settings → Repository secrets:

  • GROQ_API_KEY — Your Groq API key
  • BRIGHTDATA_API_KEY — Your Bright Data Web Unlocker API key (for Tab 1)

Create Hugging Face Space

  1. Go to huggingface.co/new-space
  2. Choose SDK: Gradio, Space name: scraper_bot_v2 (or any name)
  3. Clone the Space, then copy app.py, requirements.txt, and README.md from this folder into it
  4. In Space Settings → Repository secrets, add:
    • GROQ_API_KEY
    • BRIGHTDATA_API_KEY
  5. Push to the Space — it will build and deploy automatically

Local Run

pip install -r requirements.txt

Create a .env file with:

GROQ_API_KEY=your_groq_key
BRIGHTDATA_API_KEY=your_brightdata_key

Then:

python app.py