Spaces:

ncolex
/

browserpilot

Runtime error

File size: 7,379 Bytes

---
title: BrowserPilot
emoji: 🤖
colorFrom: blue
colorTo: purple
sdk: docker
python_version: 3.10
---

# BrowserPilot

> Ever wished you could tell your browser "Hey, go grab all the product prices from that e-commerce site" and it would just... do it? That's exactly what this does, but smarter.

[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](http://makeapullrequest.com)

## What's This All About?

Tired of writing complex scrapers that break every time a website changes its layout? Yeah, me too. 

This AI-powered browser actually *sees* web pages like you do. It doesn't care if Amazon redesigns their product pages or if LinkedIn adds new anti-bot measures. Just tell it what you want in plain English, and it figures out how to get it.

Think of it as having a really smart intern who never gets tired, never makes mistakes, and can handle any website you throw at them - even the ones with annoying CAPTCHAs.

## See It In Action

Trust me, it's pretty cool watching an AI navigate websites like a human


https://github.com/user-attachments/assets/39d2ed68-e121-49b9-817e-2eb5edc25627


## Why You'll Love This

### It Actually "Sees" Websites
- Uses Google's Gemini AI to look at pages like you do
- Automatically figures out if it's looking at Amazon, LinkedIn, or your random blog
- Clicks the right buttons even when websites change their design
- Works on literally any website (yes, even the weird ones)

### Handles the Annoying Stuff
- Gets blocked by Cloudflare? No problem, switches proxies automatically
- Encounters a CAPTCHA? Solves it with AI vision
- Website thinks it's a bot? Laughs in artificial intelligence
- Proxy goes down? Switches to a backup faster than you can blink

### Gives You Data How You Want It
- Say "save as PDF" and boom, you get a PDF
- Ask for CSV and it structures everything perfectly
- Want JSON? It knows what you mean
- Organizes everything with timestamps and metadata (because details matter)

### Watch It Work Live
- Stream the browser view in real-time (it's oddly satisfying)
- Click and type remotely if you need to step in
- Multiple people can watch the same session
- Perfect for debugging or just showing off

## Getting Started (It's Actually Pretty Easy)

### 🐳 Quick Start with Docker (Recommended)

The easiest way to run BrowserPilot is with Docker:

```bash
# Clone and start with Docker Compose
git clone https://github.com/ai-naymul/BrowserPilot.git
cd BrowserPilot
echo 'GOOGLE_API_KEY=your_actual_api_key_here' > .env
docker-compose up -d
```

Open `http://localhost:8000` and you're ready to go! 🚀

[📖 Full Docker Documentation](README.docker.md)

### 💻 Manual Installation

### What You'll Need
- Python 3.8 or newer (check with `python --version`)
- A Google AI API key (free to get, just sign up at ai.google.dev)
- Some proxies if you're planning to scrape heavily (optional but recommended)

### Let's Get This Running

1. **Grab the code**
   ```bash
   git clone https://github.com/ai-naymul/BrowserPilot.git
   cd BrowserPilot
   ```

2. **Install the good stuff**
   ```bash
   curl -LsSf https://astral.sh/uv/install.sh | sh
   uv pip install -r requirements.txt
   ```

3. **Add your secrets**
   ```bash
   # Create a .env file (don't worry, it's gitignored)
   echo 'GOOGLE_API_KEY=your_actual_api_key_here' > .env
   echo 'SCRAPER_PROXIES=[{"server": "http://proxy1:port", "username": "user", "password": "pass"}]' >> .env
   ```

4. **Fire it up**
   ```bash
   python -m uvicorn backend.main:app --reload
   ```

5. **See the magic**
   Open `http://localhost:8000` and start telling it what to do

## Real Examples (Because Everyone Loves Examples)

### Just Getting Started
```javascript
"Go to Hacker News and save the top stories as JSON"
```
That's it. Seriously. It'll figure out the rest.

### Shopping for Data
```javascript
"Search Amazon for wireless headphones under $100 and export the results to CSV"
```
It'll navigate, search, filter, and organize everything nicely for you.

### Social Media Intel
```javascript
"Go to LinkedIn, find AI engineers in San Francisco, and save their profiles"
```
Don't worry, it handles all the login prompts and infinite scroll nonsense.

### The Wild West
```javascript
"Visit this random e-commerce site and grab all the product prices"
```
Even works on sites you've never seen before. That's the beauty of AI vision.

## Core Components

### Smart Browser Controller
- Automatic anti-bot detection using AI vision
- Proxy rotation on detection/blocking
- CAPTCHA solving capabilities
- Browser restart with new proxies

### Vision Model Integration
- Dynamic website analysis
- Anti-bot system detection
- Element interaction decisions
- CAPTCHA recognition and solving

### Universal Extractor
- AI-powered content extraction
- Multiple output format support
- Structured data organization
- Metadata preservation

### Proxy Management
- Health tracking and statistics
- Performance-based selection
- Site-specific blocking lists
- Automatic failure recovery

## The Cool Technical Stuff

### Smart Format Detection
Just talk to it naturally:
- "save as PDF" → Gets you a beautiful PDF
- "export to CSV" → Perfectly structured spreadsheet
- "give me JSON" → Clean, organized data structure

### Anti-Bot Ninja Mode
- Spots Cloudflare challenges before they even load
- Solves CAPTCHAs like a human (but faster)
- Detects rate limits and backs off gracefully
- Switches identities when websites get suspicious

### Dashboard That Actually Helps
- See which proxies are working (and which ones suck)
- Watch your browser sessions live
- Track how much you're spending on AI tokens
- Performance stats that make sense

## Configuration

### Proxy Configuration
```json
{
  "SCRAPER_PROXIES": [
    {
      "server": "http://proxy1.example.com:8080",
      "username": "user1",
      "password": "pass1",
      "location": "US"
    },
    {
      "server": "http://proxy2.example.com:8080",
      "username": "user2",
      "password": "pass2",
      "location": "EU"
    }
  ]
}
```

### Environment Variables
```bash
# Required
GOOGLE_API_KEY=your_gemini_api_key_here

# Optional
SCRAPER_PROXIES=your_proxy_configuration
```

## Contributors

<a href="https://github.com/your-username/your-repo/graphs/contributors">
  <img src="https://contrib.rocks/image?repo=ai-naymul/BrowserPilot" />
</a>


## 🤝 Want to Help Make This Better?

Found a bug? Have a crazy idea? Want to add support for your favorite website? I'd love the help!

Here's how to jump in:
1. Fork this repo (there's a button for that)
2. Create a branch with a name that makes sense (`git checkout -b fix-amazon-pagination`)
3. Make your changes (and please test them!)
4. Commit with a message that explains what you did
5. Push it up and open a pull request

For big changes, maybe open an issue first so we can chat about it.

## 🙏 Acknowledgments

- [Playwright](https://playwright.dev/) for browser automation
- [Google Gemini](https://ai.google.dev/) for vision AI capabilities
- [FastAPI](https://fastapi.tiangolo.com/) for the backend framework
- Open source community for inspiration and tools