browserpilot / README.md
ncolex's picture
Add HF Spaces metadata
c66a865 verified
---
title: BrowserPilot
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: docker
python_version: 3.10
---
# BrowserPilot
> Ever wished you could tell your browser "Hey, go grab all the product prices from that e-commerce site" and it would just... do it? That's exactly what this does, but smarter.
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](http://makeapullrequest.com)
## What's This All About?
Tired of writing complex scrapers that break every time a website changes its layout? Yeah, me too.
This AI-powered browser actually *sees* web pages like you do. It doesn't care if Amazon redesigns their product pages or if LinkedIn adds new anti-bot measures. Just tell it what you want in plain English, and it figures out how to get it.
Think of it as having a really smart intern who never gets tired, never makes mistakes, and can handle any website you throw at them - even the ones with annoying CAPTCHAs.
## See It In Action
Trust me, it's pretty cool watching an AI navigate websites like a human
https://github.com/user-attachments/assets/39d2ed68-e121-49b9-817e-2eb5edc25627
## Why You'll Love This
### It Actually "Sees" Websites
- Uses Google's Gemini AI to look at pages like you do
- Automatically figures out if it's looking at Amazon, LinkedIn, or your random blog
- Clicks the right buttons even when websites change their design
- Works on literally any website (yes, even the weird ones)
### Handles the Annoying Stuff
- Gets blocked by Cloudflare? No problem, switches proxies automatically
- Encounters a CAPTCHA? Solves it with AI vision
- Website thinks it's a bot? Laughs in artificial intelligence
- Proxy goes down? Switches to a backup faster than you can blink
### Gives You Data How You Want It
- Say "save as PDF" and boom, you get a PDF
- Ask for CSV and it structures everything perfectly
- Want JSON? It knows what you mean
- Organizes everything with timestamps and metadata (because details matter)
### Watch It Work Live
- Stream the browser view in real-time (it's oddly satisfying)
- Click and type remotely if you need to step in
- Multiple people can watch the same session
- Perfect for debugging or just showing off
## Getting Started (It's Actually Pretty Easy)
### 🐳 Quick Start with Docker (Recommended)
The easiest way to run BrowserPilot is with Docker:
```bash
# Clone and start with Docker Compose
git clone https://github.com/ai-naymul/BrowserPilot.git
cd BrowserPilot
echo 'GOOGLE_API_KEY=your_actual_api_key_here' > .env
docker-compose up -d
```
Open `http://localhost:8000` and you're ready to go! πŸš€
[πŸ“– Full Docker Documentation](README.docker.md)
### πŸ’» Manual Installation
### What You'll Need
- Python 3.8 or newer (check with `python --version`)
- A Google AI API key (free to get, just sign up at ai.google.dev)
- Some proxies if you're planning to scrape heavily (optional but recommended)
### Let's Get This Running
1. **Grab the code**
```bash
git clone https://github.com/ai-naymul/BrowserPilot.git
cd BrowserPilot
```
2. **Install the good stuff**
```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv pip install -r requirements.txt
```
3. **Add your secrets**
```bash
# Create a .env file (don't worry, it's gitignored)
echo 'GOOGLE_API_KEY=your_actual_api_key_here' > .env
echo 'SCRAPER_PROXIES=[{"server": "http://proxy1:port", "username": "user", "password": "pass"}]' >> .env
```
4. **Fire it up**
```bash
python -m uvicorn backend.main:app --reload
```
5. **See the magic**
Open `http://localhost:8000` and start telling it what to do
## Real Examples (Because Everyone Loves Examples)
### Just Getting Started
```javascript
"Go to Hacker News and save the top stories as JSON"
```
That's it. Seriously. It'll figure out the rest.
### Shopping for Data
```javascript
"Search Amazon for wireless headphones under $100 and export the results to CSV"
```
It'll navigate, search, filter, and organize everything nicely for you.
### Social Media Intel
```javascript
"Go to LinkedIn, find AI engineers in San Francisco, and save their profiles"
```
Don't worry, it handles all the login prompts and infinite scroll nonsense.
### The Wild West
```javascript
"Visit this random e-commerce site and grab all the product prices"
```
Even works on sites you've never seen before. That's the beauty of AI vision.
## Core Components
### Smart Browser Controller
- Automatic anti-bot detection using AI vision
- Proxy rotation on detection/blocking
- CAPTCHA solving capabilities
- Browser restart with new proxies
### Vision Model Integration
- Dynamic website analysis
- Anti-bot system detection
- Element interaction decisions
- CAPTCHA recognition and solving
### Universal Extractor
- AI-powered content extraction
- Multiple output format support
- Structured data organization
- Metadata preservation
### Proxy Management
- Health tracking and statistics
- Performance-based selection
- Site-specific blocking lists
- Automatic failure recovery
## The Cool Technical Stuff
### Smart Format Detection
Just talk to it naturally:
- "save as PDF" β†’ Gets you a beautiful PDF
- "export to CSV" β†’ Perfectly structured spreadsheet
- "give me JSON" β†’ Clean, organized data structure
### Anti-Bot Ninja Mode
- Spots Cloudflare challenges before they even load
- Solves CAPTCHAs like a human (but faster)
- Detects rate limits and backs off gracefully
- Switches identities when websites get suspicious
### Dashboard That Actually Helps
- See which proxies are working (and which ones suck)
- Watch your browser sessions live
- Track how much you're spending on AI tokens
- Performance stats that make sense
## Configuration
### Proxy Configuration
```json
{
"SCRAPER_PROXIES": [
{
"server": "http://proxy1.example.com:8080",
"username": "user1",
"password": "pass1",
"location": "US"
},
{
"server": "http://proxy2.example.com:8080",
"username": "user2",
"password": "pass2",
"location": "EU"
}
]
}
```
### Environment Variables
```bash
# Required
GOOGLE_API_KEY=your_gemini_api_key_here
# Optional
SCRAPER_PROXIES=your_proxy_configuration
```
## Contributors
<a href="https://github.com/your-username/your-repo/graphs/contributors">
<img src="https://contrib.rocks/image?repo=ai-naymul/BrowserPilot" />
</a>
## 🀝 Want to Help Make This Better?
Found a bug? Have a crazy idea? Want to add support for your favorite website? I'd love the help!
Here's how to jump in:
1. Fork this repo (there's a button for that)
2. Create a branch with a name that makes sense (`git checkout -b fix-amazon-pagination`)
3. Make your changes (and please test them!)
4. Commit with a message that explains what you did
5. Push it up and open a pull request
For big changes, maybe open an issue first so we can chat about it.
## πŸ™ Acknowledgments
- [Playwright](https://playwright.dev/) for browser automation
- [Google Gemini](https://ai.google.dev/) for vision AI capabilities
- [FastAPI](https://fastapi.tiangolo.com/) for the backend framework
- Open source community for inspiration and tools