Spaces:
Sleeping
Sleeping
A newer version of the Gradio SDK is available: 6.13.0
metadata
title: MSDSF25M005_Ver2— Website & YouTube Q&A
emoji: 🤖
colorFrom: blue
colorTo: green
sdk: gradio
sdk_version: 4.44.0
app_file: app.py
pinned: false
Version 2 — Website Scraper & YouTube Transcript Q&A
A Gradio app with two independent input modes:
Tab 1: Bot-Protected Website Scraper
- Enter any URL in the text box
- Fetches and parses the page using Bright Data Web Unlocker API + BeautifulSoup
- Extracted content is used as context for Q&A with a Groq-hosted LLM
Tab 2: YouTube Transcript Q&A
- Enter a valid YouTube Video ID (e.g.,
dQw4w9WgXcQ) - Uses youtube-transcript-api to fetch auto-generated or manual transcripts
- Transcript is passed to Groq LLM for Q&A (e.g., "What is the main topic?", "Summarize in 3 bullet points")
- Displays a clear error if no transcript is available
Setup (Hugging Face Spaces)
Add these secrets in your Space Settings → Repository secrets:
GROQ_API_KEY— Your Groq API keyBRIGHTDATA_API_KEY— Your Bright Data Web Unlocker API key (for Tab 1)
Create Hugging Face Space
- Go to huggingface.co/new-space
- Choose SDK: Gradio, Space name:
scraper_bot_v2(or any name) - Clone the Space, then copy
app.py,requirements.txt, andREADME.mdfrom this folder into it - In Space Settings → Repository secrets, add:
GROQ_API_KEYBRIGHTDATA_API_KEY
- Push to the Space — it will build and deploy automatically
Local Run
pip install -r requirements.txt
Create a .env file with:
GROQ_API_KEY=your_groq_key
BRIGHTDATA_API_KEY=your_brightdata_key
Then:
python app.py