---
title: BrowserPilot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
python_version: "3.10"
---
# BrowserPilot
> Ever wished you could tell your browser "Hey, go grab all the product prices from that e-commerce site" and it would just... do it? That's exactly what this does, but smarter.
[MIT License](https://opensource.org/licenses/MIT)
[Python](https://www.python.org/downloads/)
[PRs Welcome](http://makeapullrequest.com)
## What's This All About?
Tired of writing complex scrapers that break every time a website changes its layout? Yeah, me too.
This AI-powered browser actually *sees* web pages like you do. It doesn't care if Amazon redesigns their product pages or if LinkedIn adds new anti-bot measures. Just tell it what you want in plain English, and it figures out how to get it.
Think of it as having a really smart intern who never gets tired, never makes mistakes, and can handle any website you throw at them - even the ones with annoying CAPTCHAs.
## See It In Action
Trust me, it's pretty cool watching an AI navigate websites like a human:
https://github.com/user-attachments/assets/39d2ed68-e121-49b9-817e-2eb5edc25627
## Why You'll Love This
### It Actually "Sees" Websites
- Uses Google's Gemini AI to look at pages like you do
- Automatically figures out if it's looking at Amazon, LinkedIn, or your random blog
- Clicks the right buttons even when websites change their design
- Works on literally any website (yes, even the weird ones)
### Handles the Annoying Stuff
- Gets blocked by Cloudflare? No problem, switches proxies automatically
- Encounters a CAPTCHA? Solves it with AI vision
- Website thinks it's a bot? Laughs in artificial intelligence
- Proxy goes down? Switches to a backup faster than you can blink
### Gives You Data How You Want It
- Say "save as PDF" and boom, you get a PDF
- Ask for CSV and it structures everything perfectly
- Want JSON? It knows what you mean
- Organizes everything with timestamps and metadata (because details matter)
### Watch It Work Live
- Stream the browser view in real-time (it's oddly satisfying)
- Click and type remotely if you need to step in
- Multiple people can watch the same session
- Perfect for debugging or just showing off
## Getting Started (It's Actually Pretty Easy)
### 🐳 Quick Start with Docker (Recommended)
The easiest way to run BrowserPilot is with Docker:
```bash
# Clone and start with Docker Compose
git clone https://github.com/ai-naymul/BrowserPilot.git
cd BrowserPilot
echo 'GOOGLE_API_KEY=your_actual_api_key_here' > .env
docker-compose up -d
```
Open `http://localhost:8000` and you're ready to go!
[Full Docker Documentation](README.docker.md)
### 💻 Manual Installation
### What You'll Need
- Python 3.8 or newer (check with `python --version`)
- A Google AI API key (free to get, just sign up at ai.google.dev)
- Some proxies if you're planning to scrape heavily (optional but recommended)
### Let's Get This Running
1. **Grab the code**
```bash
git clone https://github.com/ai-naymul/BrowserPilot.git
cd BrowserPilot
```
2. **Install the good stuff**
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv && source .venv/bin/activate  # uv pip needs a virtual environment
uv pip install -r requirements.txt
```
3. **Add your secrets**
```bash
# Create a .env file (don't worry, it's gitignored)
echo 'GOOGLE_API_KEY=your_actual_api_key_here' > .env
echo 'SCRAPER_PROXIES=[{"server": "http://proxy1:port", "username": "user", "password": "pass"}]' >> .env
```
4. **Fire it up**
```bash
python -m uvicorn backend.main:app --reload
```
5. **See the magic**
Open `http://localhost:8000` and start telling it what to do
## Real Examples (Because Everyone Loves Examples)
### Just Getting Started
```javascript
"Go to Hacker News and save the top stories as JSON"
```
That's it. Seriously. It'll figure out the rest.
### Shopping for Data
```javascript
"Search Amazon for wireless headphones under $100 and export the results to CSV"
```
It'll navigate, search, filter, and organize everything nicely for you.
### Social Media Intel
```javascript
"Go to LinkedIn, find AI engineers in San Francisco, and save their profiles"
```
Don't worry, it handles all the login prompts and infinite scroll nonsense.
### The Wild West
```javascript
"Visit this random e-commerce site and grab all the product prices"
```
Even works on sites you've never seen before. That's the beauty of AI vision.
## Core Components
### Smart Browser Controller
- Automatic anti-bot detection using AI vision
- Proxy rotation on detection/blocking
- CAPTCHA solving capabilities
- Browser restart with new proxies
### Vision Model Integration
- Dynamic website analysis
- Anti-bot system detection
- Element interaction decisions
- CAPTCHA recognition and solving
### Universal Extractor
- AI-powered content extraction
- Multiple output format support
- Structured data organization
- Metadata preservation
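To make the "multiple formats plus metadata" idea concrete, here's a minimal sketch of what exporting extracted records could look like. This is not BrowserPilot's actual extractor: `export_records` and the metadata field names (`source_url`, `extracted_at`, `record_count`) are made up for illustration, and it only covers JSON and CSV.

```python
import csv
import io
import json
from datetime import datetime, timezone


def export_records(records, fmt, source_url):
    """Serialize a list of flat dicts to JSON or CSV text with basic metadata.

    Hypothetical helper for illustration; not part of BrowserPilot's API.
    """
    fmt = fmt.lower()
    extracted_at = datetime.now(timezone.utc).isoformat()

    if fmt == "json":
        payload = {
            "metadata": {
                "source_url": source_url,
                "extracted_at": extracted_at,
                "record_count": len(records),
            },
            "records": records,
        }
        return json.dumps(payload, indent=2, ensure_ascii=False)

    if fmt == "csv":
        # Union of all keys in first-seen order, so ragged records still fit.
        keys = [key for rec in records for key in rec]
        fieldnames = list(dict.fromkeys([*keys, "source_url", "extracted_at"]))
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for rec in records:
            # Metadata travels with every row because CSV has no header block.
            writer.writerow({**rec, "source_url": source_url, "extracted_at": extracted_at})
        return buffer.getvalue()

    raise ValueError(f"unsupported format: {fmt!r}")
```

JSON keeps the metadata in one place, while CSV repeats it per row so the file still makes sense when opened in a spreadsheet on its own.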
### Proxy Management
- Health tracking and statistics
- Performance-based selection
- Site-specific blocking lists
- Automatic failure recovery
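If you're curious how those four bullets could fit together, here's a rough sketch of a proxy pool. It's an assumption about the general shape, not the project's real code: `ProxyStats`, `ProxyPool`, the success-rate scoring, and the cooldown timer are all invented for this example.

```python
import time
from dataclasses import dataclass, field


@dataclass
class ProxyStats:
    """Per-proxy health record (hypothetical structure)."""
    server: str
    successes: int = 0
    failures: int = 0
    blocked_sites: set = field(default_factory=set)
    cooldown_until: float = 0.0

    @property
    def success_rate(self) -> float:
        total = self.successes + self.failures
        # Untested proxies get an optimistic score so they are tried at least once.
        return self.successes / total if total else 1.0


class ProxyPool:
    def __init__(self, servers, cooldown_seconds=60.0):
        self.stats = {s: ProxyStats(s) for s in servers}
        self.cooldown_seconds = cooldown_seconds

    def pick(self, site, now=None):
        """Return the best usable proxy for `site`, or None if none is usable."""
        now = time.monotonic() if now is None else now
        candidates = [
            p for p in self.stats.values()
            if site not in p.blocked_sites and p.cooldown_until <= now
        ]
        return max(candidates, key=lambda p: p.success_rate, default=None)

    def report(self, server, site, ok, blocked=False, now=None):
        """Record the outcome of one request made through `server`."""
        now = time.monotonic() if now is None else now
        proxy = self.stats[server]
        if ok:
            proxy.successes += 1
            return
        proxy.failures += 1
        if blocked:
            # The site rejected this proxy: stop using it there, keep it elsewhere.
            proxy.blocked_sites.add(site)
        else:
            # Generic failure: rest the proxy, then let it back in (recovery).
            proxy.cooldown_until = now + self.cooldown_seconds
```

The split between "blocked on this site" and "failed in general" is the main design choice here: a proxy that Amazon has flagged can still be perfectly good for Hacker News.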
## The Cool Technical Stuff
### Smart Format Detection
Just talk to it naturally:
- "save as PDF" → Gets you a beautiful PDF
- "export to CSV" → Perfectly structured spreadsheet
- "give me JSON" → Clean, organized data structure
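For a feel of the mapping from phrase to format, here's a deliberately naive keyword version. The real detection may well lean on the AI model rather than regexes, so treat `detect_format`, its pattern list, and the `"json"` fallback as assumptions made for this sketch.

```python
import re

# Keyword patterns checked in order; the first match wins.
# This list is illustrative, not the project's actual rule set.
_FORMAT_PATTERNS = [
    ("pdf", re.compile(r"\bpdf\b", re.IGNORECASE)),
    ("csv", re.compile(r"\b(csv|spreadsheet)\b", re.IGNORECASE)),
    ("json", re.compile(r"\bjson\b", re.IGNORECASE)),
]


def detect_format(prompt, default="json"):
    """Guess the requested output format from a plain-English prompt.

    The default is an arbitrary choice for this example.
    """
    for name, pattern in _FORMAT_PATTERNS:
        if pattern.search(prompt):
            return name
    return default


if __name__ == "__main__":
    for text in ("save as PDF", "export to CSV", "give me JSON"):
        print(text, "->", detect_format(text))
```

A keyword pass like this is cheap and predictable, which makes it a reasonable first check before spending AI tokens on anything smarter.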
### Anti-Bot Ninja Mode
- Spots Cloudflare challenges before they even load
- Solves CAPTCHAs like a human (but faster)
- Detects rate limits and backs off gracefully
- Switches identities when websites get suspicious
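"Backs off gracefully" usually means some flavor of exponential backoff. Here's a small sketch of that pattern, assuming the caller's fetch function raises a `RateLimited` exception when it sees a rate limit. That exception, `backoff_delays`, and `fetch_with_backoff` are hypothetical names, and the delay numbers are placeholders.

```python
import random
import time


class RateLimited(Exception):
    """Raised by the caller's fetch function when a site signals a rate limit."""


def backoff_delays(base=1.0, factor=2.0, cap=60.0, attempts=5):
    """Yield exponentially growing delays, each capped at `cap` seconds."""
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= factor


def fetch_with_backoff(fetch, url, sleep=time.sleep, jitter=random.uniform, **kwargs):
    """Call fetch(url); on RateLimited, wait and retry with a growing delay."""
    for delay in backoff_delays(**kwargs):
        try:
            return fetch(url)
        except RateLimited:
            # Up to 10% jitter keeps parallel sessions from retrying in lockstep.
            sleep(delay + jitter(0, delay * 0.1))
    # Final attempt: if this one is rate limited too, the error propagates.
    return fetch(url)
```

`sleep` and `jitter` are parameters only so the loop can be exercised without actually waiting; in normal use you'd leave them at their defaults.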
### Dashboard That Actually Helps
- See which proxies are working (and which ones suck)
- Watch your browser sessions live
- Track how much you're spending on AI tokens
- Performance stats that make sense
## Configuration
### Proxy Configuration
```json
{
  "SCRAPER_PROXIES": [
    {
      "server": "http://proxy1.example.com:8080",
      "username": "user1",
      "password": "pass1",
      "location": "US"
    },
    {
      "server": "http://proxy2.example.com:8080",
      "username": "user2",
      "password": "pass2",
      "location": "EU"
    }
  ]
}
```
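Here's a small sketch of how a config like this could be checked before use. It assumes the entries end up as Playwright proxy settings, which accept `server`, `username`, and `password` but have no `location` field, so the sketch returns the location separately. `load_proxies` is a hypothetical helper, and it accepts both the wrapped object above and the bare list used in the `.env` example.

```python
import json

# Keys that Playwright's proxy settings understand (it also accepts "bypass").
PLAYWRIGHT_PROXY_KEYS = ("server", "username", "password")


def load_proxies(raw_json):
    """Parse a proxy config and return a list of (settings, location) pairs.

    Hypothetical helper for illustration; not BrowserPilot's actual loader.
    """
    data = json.loads(raw_json)
    # Accept either {"SCRAPER_PROXIES": [...]} or a bare [...] list.
    entries = data["SCRAPER_PROXIES"] if isinstance(data, dict) else data
    proxies = []
    for index, entry in enumerate(entries):
        if not entry.get("server"):
            raise ValueError(f"proxy #{index} is missing 'server'")
        settings = {key: entry[key] for key in PLAYWRIGHT_PROXY_KEYS if key in entry}
        proxies.append((settings, entry.get("location")))
    return proxies
```

Each `settings` dict is in the shape you'd pass as the `proxy` argument when launching a Playwright browser, and the location stays available for picking a region.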
### Environment Variables
```bash
# Required
GOOGLE_API_KEY=your_gemini_api_key_here
# Optional
SCRAPER_PROXIES=your_proxy_configuration
```
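The required versus optional split can be enforced at startup with a few lines. This is only a sketch of the idea: `load_settings` and the keys of the dict it returns are assumptions, and it expects `SCRAPER_PROXIES` to hold a JSON array like the one written to `.env` in the install steps.

```python
import json
import os


def load_settings(env=os.environ):
    """Read BrowserPilot-style settings from a mapping of environment variables.

    Hypothetical helper; the returned dict layout is made up for this example.
    """
    api_key = env.get("GOOGLE_API_KEY", "").strip()
    if not api_key:
        # Required: fail fast with a clear message instead of failing mid-scrape.
        raise RuntimeError("GOOGLE_API_KEY is required")

    raw_proxies = env.get("SCRAPER_PROXIES", "").strip()
    # Optional: no value simply means "run without proxies".
    proxies = json.loads(raw_proxies) if raw_proxies else []
    if not isinstance(proxies, list):
        raise ValueError("SCRAPER_PROXIES must be a JSON array")

    return {"google_api_key": api_key, "proxies": proxies}
```

Passing the environment in as a parameter makes it easy to try the function with a plain dict before pointing it at the real `os.environ`.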
## Contributors
<a href="https://github.com/ai-naymul/BrowserPilot/graphs/contributors">
<img src="https://contrib.rocks/image?repo=ai-naymul/BrowserPilot" />
</a>
## 🤝 Want to Help Make This Better?
Found a bug? Have a crazy idea? Want to add support for your favorite website? I'd love the help!
Here's how to jump in:
1. Fork this repo (there's a button for that)
2. Create a branch with a name that makes sense (`git checkout -b fix-amazon-pagination`)
3. Make your changes (and please test them!)
4. Commit with a message that explains what you did
5. Push it up and open a pull request
For big changes, maybe open an issue first so we can chat about it.
## Acknowledgments
- [Playwright](https://playwright.dev/) for browser automation
- [Google Gemini](https://ai.google.dev/) for vision AI capabilities
- [FastAPI](https://fastapi.tiangolo.com/) for the backend framework
- Open source community for inspiration and tools