File size: 4,095 Bytes
504b397
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
# Image Classification Usage

Image classification tasks send one or more captcha images to an OpenAI-compatible vision model and return the indices of matching cells or a boolean answer. No browser automation is involved — these are pure vision model API calls.

## Supported task types

| Task type | Description |
|-----------|-------------|
| `HCaptchaClassification` | hCaptcha 3x3 grid — returns matching cell indices |
| `ReCaptchaV2Classification` | reCAPTCHA v2 3x3 / 4x4 grid — returns matching cell indices |
| `FunCaptchaClassification` | FunCaptcha 2x3 grid — returns the correct cell index |
| `AwsClassification` | AWS CAPTCHA image selection |

## Solution fields

| Task type | Solution field | Example |
|-----------|---------------|---------|
| `HCaptchaClassification` | `objects` or `answer` | `[0, 2, 5]` or `true` |
| `ReCaptchaV2Classification` | `objects` | `[0, 3, 6]` |
| `FunCaptchaClassification` | `objects` | `[4]` |
| `AwsClassification` | `objects` | `[1]` |

## HCaptchaClassification

### Request shape

```json
{
  "clientKey": "your-client-key",
  "task": {
    "type": "HCaptchaClassification",
    "queries": ["<base64-image-1>", "<base64-image-2>", "<base64-image-3>"],
    "question": "Please click each image containing a bicycle"
  }
}
```

The `queries` field accepts a list of base64-encoded images (one per grid cell). The `question` field is the challenge prompt displayed to the user.

### Response

```json
{
  "errorId": 0,
  "status": "ready",
  "solution": {
    "objects": [1, 4]
  }
}
```

## ReCaptchaV2Classification

### Request shape

```json
{
  "clientKey": "your-client-key",
  "task": {
    "type": "ReCaptchaV2Classification",
    "image": "<base64-encoded-grid-image>",
    "question": "Select all images with traffic lights"
  }
}
```

The `image` field is a single base64-encoded image of the full reCAPTCHA grid (3×3 = 9 cells or 4×4 = 16 cells). Cells are numbered 0–8 (or 0–15), left-to-right, top-to-bottom.

### Response

```json
{
  "errorId": 0,
  "status": "ready",
  "solution": {
    "objects": [0, 3, 6]
  }
}
```

## FunCaptchaClassification

### Request shape

```json
{
  "clientKey": "your-client-key",
  "task": {
    "type": "FunCaptchaClassification",
    "image": "<base64-encoded-grid-image>",
    "question": "Pick the image that shows a boat facing left"
  }
}
```

The grid is typically 2×3 (6 cells). Usually one answer is expected.

### Response

```json
{
  "errorId": 0,
  "status": "ready",
  "solution": {
    "objects": [3]
  }
}
```

## AwsClassification

### Request shape

```json
{
  "clientKey": "your-client-key",
  "task": {
    "type": "AwsClassification",
    "image": "<base64-encoded-image>",
    "question": "Select the image that matches"
  }
}
```

### Response

```json
{
  "errorId": 0,
  "status": "ready",
  "solution": {
    "objects": [1]
  }
}
```

## Create and poll (generic example)

```bash
# Step 1: create task
TASK_ID=$(curl -s -X POST http://localhost:8000/createTask \
  -H "Content-Type: application/json" \
  -d '{
    "clientKey": "your-client-key",
    "task": {
      "type": "ReCaptchaV2Classification",
      "image": "'$(base64 -w0 captcha.png)'",
      "question": "Select all images with traffic lights"
    }
  }' | python -c "import sys,json; print(json.load(sys.stdin)['taskId'])")

# Step 2: poll result
curl -s -X POST http://localhost:8000/getTaskResult \
  -H "Content-Type: application/json" \
  -d "{\"clientKey\":\"your-client-key\",\"taskId\":\"$TASK_ID\"}"
```

## Operational notes

- All classification tasks are **synchronous from the model's perspective** — the `asyncio.create_task` wrapper means the HTTP response is immediate, but the actual model call happens in the background.
- Model accuracy depends entirely on the vision model configured via `CAPTCHA_MULTIMODAL_MODEL` (default: `qwen3.5-2b`).
- For best results with classification, the `CAPTCHA_MODEL` (`gpt-5.4`) can be substituted by setting `CAPTCHA_MULTIMODAL_MODEL=gpt-5.4`.
- Images should not be pre-resized — the solver handles normalization internally.