File size: 2,057 Bytes
504b397
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
# Getting Started

## Requirements

- Python 3.10+
- Chromium available through Playwright
- Network access to:
  - target sites you want to solve against
  - your configured OpenAI-compatible model endpoint

## Installation

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
playwright install --with-deps chromium
```

## Environment variables

| Variable | Description | Default |
| --- | --- | --- |
| `CLIENT_KEY` | Client auth key used as `clientKey` | unset |
| `CAPTCHA_BASE_URL` | OpenAI-compatible API base URL | `https://your-openai-compatible-endpoint/v1` |
| `CAPTCHA_API_KEY` | API key for your model provider | unset |
| `CAPTCHA_MODEL` | Strong text model | `gpt-5.4` |
| `CAPTCHA_MULTIMODAL_MODEL` | Multimodal model | `qwen3.5-2b` |
| `CAPTCHA_RETRIES` | Retry count | `3` |
| `CAPTCHA_TIMEOUT` | Model timeout in seconds | `30` |
| `BROWSER_HEADLESS` | Run Chromium headless | `true` |
| `BROWSER_TIMEOUT` | Browser timeout in seconds | `30` |
| `SERVER_HOST` | Bind host | `0.0.0.0` |
| `SERVER_PORT` | Bind port | `8000` |

## Start the service

```bash
export CLIENT_KEY="your-client-key"
export CAPTCHA_BASE_URL="https://your-openai-compatible-endpoint/v1"
export CAPTCHA_API_KEY="your-api-key"
export CAPTCHA_MODEL="gpt-5.4"
export CAPTCHA_MULTIMODAL_MODEL="qwen3.5-2b"
python main.py
```

## Verify startup

### Root endpoint

```bash
curl http://localhost:8000/
```

### Health endpoint

```bash
curl http://localhost:8000/api/v1/health
```

The health response should include the registered task types and current runtime model settings.

## Local and self-hosted model support

The image recognition path is built around **OpenAI-compatible APIs**. In practice, this means you can point `CAPTCHA_BASE_URL` at a hosted provider or a self-hosted/local multimodal gateway, as long as it exposes compatible chat-completions semantics and supports image input.

The project intentionally documents this in generic compatibility terms rather than claiming full validation for every provider stack.