---
title: Scrape Anythings
emoji: ✨
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.35.0"
python_version: "3.9"
app_file: app.py
---

# ✨ Scrape Anythings

A user-friendly Streamlit web application for extracting data from any website, including special support for YouTube and Instagram.

## 🌟 Features

- **Scrape Any URL**: Paste any website, YouTube, or Instagram URL to start.
- **Multiple Data Types**: Extract text, images, links, tables, numbers, and metadata.
- **Social Media Support**: Scrape YouTube video info & comments, and Instagram profile details & posts.
- **Rich Data Export**: Download your data in JSON, CSV, TXT, and structured Excel (.xlsx) formats.
- **Modern UI**: A clean and simple interface for a smooth user experience.

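
To make the extraction and export features above more concrete, here is a minimal sketch of pulling links and image sources out of an HTML string and serializing them to JSON and CSV. It is not the app's actual code: it uses only the Python standard library (`html.parser`, `json`, `csv`) rather than the libraries the project relies on, and the names `LinkImageParser`, `extract`, `to_json`, and `to_csv` are hypothetical.

```python
import csv
import io
import json
from html.parser import HTMLParser


class LinkImageParser(HTMLParser):
    """Collect href values from <a> tags and src values from <img> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "a" and attr_map.get("href"):
            self.links.append(attr_map["href"])
        elif tag == "img" and attr_map.get("src"):
            self.images.append(attr_map["src"])


def extract(html):
    """Return the links and image sources found in an HTML string."""
    parser = LinkImageParser()
    parser.feed(html)
    parser.close()
    return {"links": parser.links, "images": parser.images}


def to_json(data):
    """Serialize the extracted data as pretty-printed JSON."""
    return json.dumps(data, indent=2)


def to_csv(data):
    """Flatten the extracted data into (type, value) CSV rows."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["type", "value"])
    for kind, values in data.items():
        for value in values:
            writer.writerow([kind, value])
    return buffer.getvalue()


# Hypothetical sample input, used only for illustration.
SAMPLE_HTML = '<p><a href="https://example.com/a">A</a><img src="/logo.png"></p>'
result = extract(SAMPLE_HTML)
```

Calling `to_json(result)` or `to_csv(result)` then yields a string that could be offered as a download. The TXT and Excel formats are left out of this sketch.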
## 🚀 How to Deploy on Hugging Face Spaces

1. **Create a Hugging Face Account**: If you don't have one, sign up at [huggingface.co](https://huggingface.co/).
2. **Create a New Space**:
    * Go to [huggingface.co/new-space](https://huggingface.co/new-space).
    * Enter a **Space name** (e.g., `scrape-anythings`).
    * Select **Streamlit** as the Space SDK.
    * Choose **Create a new repository for this Space**.
    * Click **Create Space**.
3. **Upload Your Files**:
    * In your new Space, go to the **Files** tab.
    * Click **Upload files**.
    * Drag and drop all the files from your project folder:
        * `app.py`
        * `scraper.py`
        * `youtube_scraper.py`
        * `instagram_scraper.py`
        * `instagram_scraper_v2.py`
        * `requirements.txt`
        * `README.md`
    * Commit the files directly to the `main` branch.
4. **Done!** Hugging Face will automatically build and launch your application. You can share the URL of your Space with anyone.

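
Before uploading, it can help to confirm that every file listed in step 3 is actually present in the project folder. The following is a small optional sketch, not part of the project itself; the helper name `missing_files` is hypothetical, and the file list simply mirrors the one above.

```python
from pathlib import Path

# File names copied from the upload step above.
REQUIRED_FILES = [
    "app.py",
    "scraper.py",
    "youtube_scraper.py",
    "instagram_scraper.py",
    "instagram_scraper_v2.py",
    "requirements.txt",
    "README.md",
]


def missing_files(project_dir):
    """Return the required file names that are not present in project_dir."""
    root = Path(project_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Check the current working directory and report the result.
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All files present; ready to upload.")
```

Run it from the project folder; an empty result means the folder matches the list in step 3.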
## 📋 How to Use the App

1. **Enter a URL**: Paste the URL of the website, YouTube video, or Instagram profile you want to scrape.
2. **Select Data Types**: Choose the data you want to extract.
3. **Click Scrape!**: Let the app do the work.
4. **View & Download**: See the results directly in the app and download them in your preferred format.

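
Step 1 accepts three kinds of URL, so the app has to decide which scraper to use for a given input. How it does this is not documented here; the sketch below shows one plausible way to route a URL by hostname with the standard library. The function name `classify_url`, the return labels, and the set of recognized hostnames are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hostnames treated as YouTube in this sketch (an assumption, not an exhaustive list).
YOUTUBE_HOSTS = {"youtube.com", "m.youtube.com", "youtu.be"}


def classify_url(url):
    """Return 'youtube', 'instagram', or 'website' based on the URL's hostname."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in YOUTUBE_HOSTS:
        return "youtube"
    if host == "instagram.com":
        return "instagram"
    # Anything else, including input without a scheme, falls back to generic scraping.
    return "website"
```

For example, `classify_url("https://youtu.be/abc")` returns `"youtube"`, while any unrecognized host falls back to `"website"`.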
- [ ] Real-time scraping status
- [ ] Custom CSS selectors
- [ ] Proxy support
- [ ] Multi-language support

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- Streamlit team for the amazing web app framework
- BeautifulSoup and Selenium communities
- Hugging Face for hosting capabilities

---

**Made with ❤️ for the AI/ML community**