---
title: Scrape Anythings
emoji: 
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.35.0"
python_version: "3.9"
app_file: app.py
---

# ✨ Scrape Anythings

A Streamlit web application for extracting data from websites, with dedicated scrapers for YouTube and Instagram.

## 🌟 Features

- **Scrape Any URL**: Paste any website, YouTube, or Instagram URL to start.
- **Multiple Data Types**: Extract text, images, links, tables, numbers, and metadata.
- **Social Media Support**: Scrape YouTube video info & comments, and Instagram profile details & posts.
- **Rich Data Export**: Download your data in JSON, CSV, TXT, and structured Excel (.xlsx) formats.
- **Modern UI**: A clean and simple interface for a smooth user experience.
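
To illustrate the extraction and export features listed above, here is a minimal sketch that pulls links and images out of an HTML string and serializes them to JSON and CSV. It uses only the Python standard library, not the app's actual `scraper.py`; the names `LinkImageParser`, `extract`, `to_json`, and `to_csv` are hypothetical and chosen for this example only.

```python
import csv
import io
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkImageParser(HTMLParser):
    """Collects absolute link and image URLs from an HTML document."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.images = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.append(urljoin(self.base_url, attrs["src"]))


def extract(html, base_url):
    """Return a dict with the links and images found in the HTML."""
    parser = LinkImageParser(base_url)
    parser.feed(html)
    return {"links": parser.links, "images": parser.images}


def to_json(data):
    """Serialize the extracted data as pretty-printed JSON."""
    return json.dumps(data, indent=2)


def to_csv(data):
    """Serialize the extracted data as CSV with one row per URL."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["type", "url"])
    for kind in ("links", "images"):
        for url in data[kind]:
            writer.writerow([kind, url])
    return buffer.getvalue()


# Hypothetical sample input, used only for this demonstration.
sample_html = '<a href="/about">About</a><img src="logo.png">'
data = extract(sample_html, "https://example.com/")
print(to_json(data))
print(to_csv(data))
```

In a Streamlit app, strings like these could be passed to `st.download_button` to offer the files for download; whether this app does so is an assumption.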

## 🚀 How to Deploy on Hugging Face Spaces

1.  **Create a Hugging Face Account**: If you don't have one, sign up at [huggingface.co](https://huggingface.co/).
2.  **Create a New Space**:
    *   Go to [huggingface.co/new-space](https://huggingface.co/new-space).
    *   Enter a **Space name** (e.g., `scrape-anythings`).
    *   Select **Streamlit** as the Space SDK.
    *   Choose **Create a new repository for this Space**.
    *   Click **Create Space**.
3.  **Upload Your Files**:
    *   In your new Space, go to the **Files** tab.
    *   Click **Upload files**.
    *   Drag and drop all the files from your project folder:
        *   `app.py`
        *   `scraper.py`
        *   `youtube_scraper.py`
        *   `instagram_scraper.py`
        *   `instagram_scraper_v2.py`
        *   `requirements.txt`
        *   `README.md`
    *   Commit the files directly to the `main` branch.
4.  **Done!** Hugging Face will automatically build and launch your application. You can share the URL of your Space with anyone.
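
Before uploading, it can help to confirm that every file from step 3 is present in the project folder. The following is a small optional sketch, not part of the project itself; `REQUIRED_FILES` mirrors the list above, and `missing_files` is a hypothetical helper name.

```python
from pathlib import Path

# File names taken from the upload list in step 3.
REQUIRED_FILES = [
    "app.py",
    "scraper.py",
    "youtube_scraper.py",
    "instagram_scraper.py",
    "instagram_scraper_v2.py",
    "requirements.txt",
    "README.md",
]


def missing_files(folder):
    """Return the required file names that are absent from the folder."""
    root = Path(folder)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]


# Check the current directory; an empty list means everything is in place.
print(missing_files("."))
```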

## 📋 How to Use the App

1.  **Enter a URL**: Paste the URL of the website, YouTube video, or Instagram profile you want to scrape.
2.  **Select Data Types**: Choose the data you want to extract.
3.  **Click Scrape!**: Let the app do the work.
4.  **View & Download**: See the results directly in the app and download them in your preferred format.
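
Because the app accepts website, YouTube, and Instagram URLs in the same input, it presumably has to decide which scraper handles a given URL. The sketch below shows one way such routing could work, assuming it is based on the hostname; `classify_url` is a hypothetical function and may not match how `app.py` actually dispatches.

```python
from urllib.parse import urlparse


def classify_url(url):
    """Return "youtube", "instagram", or "website" based on the hostname."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in {"youtube.com", "m.youtube.com", "youtu.be"}:
        return "youtube"
    if host == "instagram.com":
        return "instagram"
    # Anything else falls back to the general-purpose scraper.
    return "website"


print(classify_url("https://www.youtube.com/watch?v=abc"))
print(classify_url("https://www.instagram.com/someprofile/"))
print(classify_url("https://example.com/page"))
```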

## 🗺️ Roadmap

- [ ] Real-time scraping status
- [ ] Custom CSS selectors
- [ ] Proxy support
- [ ] Multi-language support

## 🤝 Contributing

1. Fork the repository
2. Create a feature branch
3. Make your changes
4. Test thoroughly
5. Submit a pull request

## 📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

## 🙏 Acknowledgments

- Streamlit team for the amazing web app framework
- BeautifulSoup and Selenium communities
- Hugging Face for hosting capabilities

---

**Made with ❤️ for the AI/ML community**