
metadata
title: Scrape Anythings
emoji: 
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: 1.35.0
python_version: '3.9'
app_file: app.py

✨ Scrape Anythings

A user-friendly Streamlit web application for extracting data from websites, with dedicated support for YouTube and Instagram URLs.

🌟 Features

  • Scrape Any URL: Paste any website, YouTube, or Instagram URL to start.
  • Multiple Data Types: Extract text, images, links, tables, numbers, and metadata.
  • Social Media Support: Scrape YouTube video info & comments, and Instagram profile details & posts.
  • Rich Data Export: Download your data in JSON, CSV, TXT, and structured Excel (.xlsx) formats.
  • Modern UI: A clean and simple interface for a smooth user experience.
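
Since the app accepts website, YouTube, and Instagram URLs, it presumably has to decide which scraper should handle a given URL. The sketch below shows one way that routing could work using only the standard library. The function name `classify_url` and the host lists are hypothetical and are not taken from the app's source.

```python
from urllib.parse import urlparse

# Hypothetical host lists; the app's real detection logic is not shown in this README.
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "youtu.be"}
INSTAGRAM_HOSTS = {"instagram.com"}


def classify_url(url: str) -> str:
    """Return "youtube", "instagram", or "website" for an http(s) URL."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"Not a valid http(s) URL: {url!r}")
    host = parsed.hostname  # urlparse already lowercases the hostname
    if host.startswith("www."):
        host = host[4:]
    if host in YOUTUBE_HOSTS:
        return "youtube"
    if host in INSTAGRAM_HOSTS:
        return "instagram"
    return "website"
```

A caller could use the returned label to pick between `scraper.py`, `youtube_scraper.py`, and `instagram_scraper.py`, although how the app actually wires these modules together is an assumption here.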

🚀 How to Deploy on Hugging Face Spaces

  1. Create a Hugging Face Account: If you don't have one, sign up at huggingface.co.

  2. Create a New Space:

    • Go to huggingface.co/new-space.
    • Enter a Space name (e.g., scrape-anythings).
    • Select Streamlit as the Space SDK.
    • Choose Create a new repository for this Space.
    • Click Create Space.
  3. Upload Your Files:

    • In your new Space, go to the Files tab.
    • Click Upload files.
    • Drag and drop all the files from your project folder:
      • app.py
      • scraper.py
      • youtube_scraper.py
      • instagram_scraper.py
      • instagram_scraper_v2.py
      • requirements.txt
      • README.md
    • Commit the files directly to the main branch.
  4. Done! Hugging Face will automatically build and launch your application. You can share the URL of your Space with anyone.
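
Before uploading, it can help to confirm that the project folder actually contains every file listed in step 3. The following is a minimal sketch of such a check. The helper `missing_files` is hypothetical and not part of the project; the file names are copied from the list above.

```python
from pathlib import Path

# File names copied from the upload list in step 3.
EXPECTED_FILES = [
    "app.py",
    "scraper.py",
    "youtube_scraper.py",
    "instagram_scraper.py",
    "instagram_scraper_v2.py",
    "requirements.txt",
    "README.md",
]


def missing_files(project_dir: str) -> list[str]:
    """Return the expected file names that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing before upload:", ", ".join(missing))
    else:
        print("All expected files are present.")
```

Run it from the project folder. An empty result means the folder matches the list in step 3.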

📋 How to Use the App

  1. Enter a URL: Paste the URL of the website, YouTube video, or Instagram profile you want to scrape.
  2. Select Data Types: Choose the data you want to extract.
  3. Click Scrape!: Let the app do the work.
  4. View & Download: See the results directly in the app and download them in your preferred format.
  • Real-time scraping status
  • Custom CSS selectors
  • Proxy support
  • Multi-language support
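
Steps 2 to 4 above (select data types, scrape, download) can be illustrated with a small, dependency-free sketch. It extracts links and images from an HTML string and serializes the result as JSON or CSV. This is not the app's implementation: the names `LinkImageParser`, `extract`, `to_json`, and `to_csv` are hypothetical, and the standard-library `html.parser` is used here in place of whatever parsing library the app relies on.

```python
import csv
import io
import json
from html.parser import HTMLParser


class LinkImageParser(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            self.links.append(attr_map["href"])
        elif tag == "img" and attr_map.get("src"):
            self.images.append(attr_map["src"])


def extract(html, data_types):
    """Return only the requested data types ("links", "images") from html."""
    parser = LinkImageParser()
    parser.feed(html)
    found = {"links": parser.links, "images": parser.images}
    return {name: found[name] for name in data_types if name in found}


def to_json(data):
    """Serialize the extracted data as an indented JSON string."""
    return json.dumps(data, indent=2)


def to_csv(data):
    """Serialize the extracted data as CSV with one (type, value) row per item."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["type", "value"])
    for data_type, values in data.items():
        for value in values:
            writer.writerow([data_type, value])
    return buffer.getvalue()


sample = '<a href="https://example.com">Home</a><img src="logo.png">'
result = extract(sample, ["links", "images"])
print(to_json(result))
print(to_csv(result))
```

In a Streamlit app, strings like these could be handed to `st.download_button` to offer each format as a file, though whether this app does so is an assumption.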

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Streamlit team for the amazing web app framework
  • BeautifulSoup and Selenium communities
  • Hugging Face for hosting capabilities

Made with ❤️ for the AI/ML community