# Image Classification Usage
Image classification tasks send one or more captcha images to an OpenAI-compatible vision model and return the indices of matching cells or a boolean answer. No browser automation is involved — these are pure vision model API calls.
## Supported task types
| Task type | Description |
|-----------|-------------|
| `HCaptchaClassification` | hCaptcha 3×3 grid — returns matching cell indices |
| `ReCaptchaV2Classification` | reCAPTCHA v2 3×3 / 4×4 grid — returns matching cell indices |
| `FunCaptchaClassification` | FunCaptcha 2×3 grid — returns the correct cell index |
| `AwsClassification` | AWS CAPTCHA image selection |
## Solution fields
| Task type | Solution field | Example |
|-----------|---------------|---------|
| `HCaptchaClassification` | `objects` or `answer` | `[0, 2, 5]` or `true` |
| `ReCaptchaV2Classification` | `objects` | `[0, 3, 6]` |
| `FunCaptchaClassification` | `objects` | `[4]` |
| `AwsClassification` | `objects` | `[1]` |
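Because `HCaptchaClassification` can return either cell indices (`objects`) or a boolean verdict (`answer`), callers should handle both shapes. A minimal sketch in Python (the `extract_solution` helper is hypothetical, not part of the API):

```python
def extract_solution(solution: dict):
    """Return cell indices (list) or a yes/no verdict (bool) from a solution dict."""
    if "objects" in solution:
        return list(solution["objects"])
    if "answer" in solution:
        return bool(solution["answer"])
    raise KeyError("solution contains neither 'objects' nor 'answer'")
```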
## HCaptchaClassification
### Request shape
```json
{
  "clientKey": "your-client-key",
  "task": {
    "type": "HCaptchaClassification",
    "queries": ["<base64-image-1>", "<base64-image-2>", "<base64-image-3>"],
    "question": "Please click each image containing a bicycle"
  }
}
```
The `queries` field accepts a list of base64-encoded images (one per grid cell). The `question` field is the challenge prompt displayed to the user.
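Assembling the request amounts to base64-encoding each cell image. A minimal sketch, assuming raw image bytes are already in hand (the `build_hcaptcha_task` helper name is illustrative, not part of any client library):

```python
import base64


def build_hcaptcha_task(client_key: str, cell_images: list, question: str) -> dict:
    """Assemble an HCaptchaClassification request body from raw image bytes."""
    return {
        "clientKey": client_key,
        "task": {
            "type": "HCaptchaClassification",
            # one base64 string per grid cell, in grid order
            "queries": [base64.b64encode(img).decode("ascii") for img in cell_images],
            "question": question,
        },
    }
```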
### Response
```json
{
  "errorId": 0,
  "status": "ready",
  "solution": {
    "objects": [1, 4]
  }
}
```
## ReCaptchaV2Classification
### Request shape
```json
{
  "clientKey": "your-client-key",
  "task": {
    "type": "ReCaptchaV2Classification",
    "image": "<base64-encoded-grid-image>",
    "question": "Select all images with traffic lights"
  }
}
```
The `image` field is a single base64-encoded image of the full reCAPTCHA grid (3×3 = 9 cells or 4×4 = 16 cells). Cells are numbered 0–8 (or 0–15), left-to-right, top-to-bottom.
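The flat numbering maps to grid coordinates with integer division. A sketch of the conversion (the helper name is ours, not the API's):

```python
def cell_position(index: int, grid_size: int = 3) -> tuple:
    """Map a flat cell index (left-to-right, top-to-bottom) to (row, col)."""
    if not 0 <= index < grid_size * grid_size:
        raise ValueError(f"index {index} outside a {grid_size}x{grid_size} grid")
    # row is how many full rows precede the cell; col is the offset within the row
    return divmod(index, grid_size)
```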
### Response
```json
{
  "errorId": 0,
  "status": "ready",
  "solution": {
    "objects": [0, 3, 6]
  }
}
```
## FunCaptchaClassification
### Request shape
```json
{
  "clientKey": "your-client-key",
  "task": {
    "type": "FunCaptchaClassification",
    "image": "<base64-encoded-grid-image>",
    "question": "Pick the image that shows a boat facing left"
  }
}
```
The grid is typically 2×3 (6 cells). Usually one answer is expected.
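Since exactly one index is usually expected, a caller may want to validate the solution before acting on it. A hypothetical sketch (not part of the API surface):

```python
def funcaptcha_cell(solution: dict, cells: int = 6) -> int:
    """Validate a FunCaptcha solution and return its single cell index."""
    objects = solution.get("objects", [])
    if len(objects) != 1:
        raise ValueError(f"expected exactly one index, got {objects!r}")
    index = objects[0]
    if not 0 <= index < cells:
        raise ValueError(f"index {index} is outside the {cells}-cell grid")
    return index
```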
### Response
```json
{
  "errorId": 0,
  "status": "ready",
  "solution": {
    "objects": [3]
  }
}
```
## AwsClassification
### Request shape
```json
{
  "clientKey": "your-client-key",
  "task": {
    "type": "AwsClassification",
    "image": "<base64-encoded-image>",
    "question": "Select the image that matches"
  }
}
```
### Response
```json
{
  "errorId": 0,
  "status": "ready",
  "solution": {
    "objects": [1]
  }
}
```
## Create and poll (generic example)
```bash
# Step 1: create the task and capture its ID
# (base64 -w0 is GNU coreutils; on macOS use `base64 -i captcha.png` instead)
TASK_ID=$(curl -s -X POST http://localhost:8000/createTask \
  -H "Content-Type: application/json" \
  -d '{
    "clientKey": "your-client-key",
    "task": {
      "type": "ReCaptchaV2Classification",
      "image": "'"$(base64 -w0 captcha.png)"'",
      "question": "Select all images with traffic lights"
    }
  }' | python -c "import sys,json; print(json.load(sys.stdin)['taskId'])")

# Step 2: poll for the result
curl -s -X POST http://localhost:8000/getTaskResult \
  -H "Content-Type: application/json" \
  -d "{\"clientKey\":\"your-client-key\",\"taskId\":\"$TASK_ID\"}"
```
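The same create-and-poll flow can be written with only the Python standard library. A sketch against the local endpoint from the curl example above; the `solve` helper, its defaults, and the injectable `post` parameter are illustrative, not part of any client library:

```python
import json
import time
import urllib.request

API_BASE = "http://localhost:8000"  # assumed local endpoint, as in the curl example


def _post(path: str, body: dict) -> dict:
    """POST a JSON body to the API and decode the JSON response."""
    req = urllib.request.Request(
        API_BASE + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def solve(client_key: str, task: dict, poll_interval: float = 2.0,
          timeout: float = 60.0, post=_post) -> dict:
    """Create a classification task, then poll until its status is 'ready'."""
    task_id = post("/createTask", {"clientKey": client_key, "task": task})["taskId"]
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = post("/getTaskResult", {"clientKey": client_key, "taskId": task_id})
        if result.get("status") == "ready":
            return result["solution"]
        time.sleep(poll_interval)
    raise TimeoutError(f"task {task_id} not ready within {timeout}s")
```

The `post` parameter exists so the polling logic can be exercised without a live server.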
## Operational notes
- All classification tasks are **synchronous from the model's perspective** — the `asyncio.create_task` wrapper means the HTTP response is immediate, but the actual model call happens in the background.
- Model accuracy depends entirely on the vision model configured via `CAPTCHA_MULTIMODAL_MODEL` (default: `qwen3.5-2b`).
- For best classification accuracy, the stronger model configured as `CAPTCHA_MODEL` (`gpt-5.4`) can be used for vision tasks as well, by setting `CAPTCHA_MULTIMODAL_MODEL=gpt-5.4`.
- Images should not be pre-resized — the solver handles normalization internally.
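The fire-and-respond behavior described in the first note can be sketched as follows. This is an illustrative pattern, not the solver's actual code; all names here are hypothetical:

```python
import asyncio

RESULTS: dict = {}  # hypothetical in-memory result store


async def classify(task_id: str, payload: dict) -> None:
    """Stand-in for the background vision-model call."""
    await asyncio.sleep(0)  # the real call would await the model API here
    RESULTS[task_id] = {"errorId": 0, "status": "ready", "solution": {"objects": [1]}}


async def create_task_endpoint(task_id: str, payload: dict) -> dict:
    """Respond immediately; the classification runs in the background."""
    asyncio.create_task(classify(task_id, payload))
    return {"errorId": 0, "taskId": task_id, "status": "processing"}
```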