| # Image Classification Usage | |
| Image classification tasks send one or more captcha images to an OpenAI-compatible vision model and return the indices of matching cells or a boolean answer. No browser automation is involved; these are pure vision model API calls. | |
| ## Supported task types | |
| | Task type | Description | | |
| |-----------|-------------| | |
| | `HCaptchaClassification` | hCaptcha 3x3 grid; returns matching cell indices | | |
| | `ReCaptchaV2Classification` | reCAPTCHA v2 3x3 / 4x4 grid; returns matching cell indices | | |
| | `FunCaptchaClassification` | FunCaptcha 2x3 grid; returns the correct cell index | | |
| | `AwsClassification` | AWS CAPTCHA image selection | | |
| ## Solution fields | |
| | Task type | Solution field | Example | | |
| |-----------|---------------|---------| | |
| | `HCaptchaClassification` | `objects` or `answer` | `[0, 2, 5]` or `true` | | |
| | `ReCaptchaV2Classification` | `objects` | `[0, 3, 6]` | | |
| | `FunCaptchaClassification` | `objects` | `[4]` | | |
| | `AwsClassification` | `objects` | `[1]` | | |
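As a rough illustration of the table above, a client can normalize the two possible solution shapes with a small helper. This is a minimal sketch: the function name `extract_solution` is hypothetical, and it assumes the result body has the `errorId` / `status` / `solution` layout shown in the response examples in this document.

```python
from typing import Any, Dict, List, Union


def extract_solution(result: Dict[str, Any]) -> Union[List[int], bool]:
    """Return the cell indices (`objects`) or the boolean `answer`.

    Hypothetical helper; assumes the response layout shown in this document.
    """
    if result.get("errorId") != 0:
        raise RuntimeError(f"task failed: {result!r}")
    if result.get("status") != "ready":
        raise RuntimeError("task is not ready yet")

    solution = result.get("solution", {})
    if "objects" in solution:
        # Grid tasks: a list of matching cell indices.
        return [int(i) for i in solution["objects"]]
    if "answer" in solution:
        # HCaptchaClassification may return a yes/no answer instead.
        return bool(solution["answer"])
    raise KeyError("solution has neither 'objects' nor 'answer'")
```

Callers can then branch on the return type: a list means "click these cells", a boolean means a yes/no challenge.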
| ## HCaptchaClassification | |
| ### Request shape | |
| ```json | |
| { | |
| "clientKey": "your-client-key", | |
| "task": { | |
| "type": "HCaptchaClassification", | |
| "queries": ["<base64-image-1>", "<base64-image-2>", "<base64-image-3>"], | |
| "question": "Please click each image containing a bicycle" | |
| } | |
| } | |
| ``` | |
| The `queries` field accepts a list of base64-encoded images (one per grid cell). The `question` field is the challenge prompt displayed to the user. | |
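The request body above could be assembled in Python roughly as follows. This is a sketch under assumptions: `build_hcaptcha_task` is a hypothetical helper name, and it assumes the service accepts plain base64 strings (no `data:` URI prefix), which the source does not state either way.

```python
import base64
from typing import Any, Dict, List


def build_hcaptcha_task(client_key: str, images: List[bytes], question: str) -> Dict[str, Any]:
    """Build an HCaptchaClassification payload from raw image bytes.

    Hypothetical helper; one entry in `queries` per grid cell.
    """
    queries = [base64.b64encode(img).decode("ascii") for img in images]
    return {
        "clientKey": client_key,
        "task": {
            "type": "HCaptchaClassification",
            "queries": queries,
            "question": question,
        },
    }
```

The resulting dictionary can be serialized with `json.dumps` and posted to `createTask`.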
| ### Response | |
| ```json | |
| { | |
| "errorId": 0, | |
| "status": "ready", | |
| "solution": { | |
| "objects": [1, 4] | |
| } | |
| } | |
| ``` | |
| ## ReCaptchaV2Classification | |
| ### Request shape | |
| ```json | |
| { | |
| "clientKey": "your-client-key", | |
| "task": { | |
| "type": "ReCaptchaV2Classification", | |
| "image": "<base64-encoded-grid-image>", | |
| "question": "Select all images with traffic lights" | |
| } | |
| } | |
| ``` | |
| The `image` field is a single base64-encoded image of the full reCAPTCHA grid (3×3 = 9 cells or 4×4 = 16 cells). Cells are numbered 0–8 (or 0–15), left-to-right, top-to-bottom. | |
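The left-to-right, top-to-bottom numbering maps to grid coordinates with integer division. The helper names below are illustrative only and are not part of the service:

```python
from typing import Tuple


def cell_to_row_col(index: int, columns: int) -> Tuple[int, int]:
    """Convert a zero-based cell index to (row, column) for a row-major grid."""
    if index < 0 or columns <= 0:
        raise ValueError("index must be >= 0 and columns must be > 0")
    return divmod(index, columns)


def row_col_to_cell(row: int, col: int, columns: int) -> int:
    """Inverse of cell_to_row_col."""
    return row * columns + col
```

For a 3×3 grid, the indices `[0, 3, 6]` all have column 0, so they correspond to the left-hand column of the image.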
| ### Response | |
| ```json | |
| { | |
| "errorId": 0, | |
| "status": "ready", | |
| "solution": { | |
| "objects": [0, 3, 6] | |
| } | |
| } | |
| ``` | |
| ## FunCaptchaClassification | |
| ### Request shape | |
| ```json | |
| { | |
| "clientKey": "your-client-key", | |
| "task": { | |
| "type": "FunCaptchaClassification", | |
| "image": "<base64-encoded-grid-image>", | |
| "question": "Pick the image that shows a boat facing left" | |
| } | |
| } | |
| ``` | |
| The grid is typically 2×3 (6 cells). Usually one answer is expected. | |
| ### Response | |
| ```json | |
| { | |
| "errorId": 0, | |
| "status": "ready", | |
| "solution": { | |
| "objects": [3] | |
| } | |
| } | |
| ``` | |
| ## AwsClassification | |
| ### Request shape | |
| ```json | |
| { | |
| "clientKey": "your-client-key", | |
| "task": { | |
| "type": "AwsClassification", | |
| "image": "<base64-encoded-image>", | |
| "question": "Select the image that matches" | |
| } | |
| } | |
| ``` | |
| ### Response | |
| ```json | |
| { | |
| "errorId": 0, | |
| "status": "ready", | |
| "solution": { | |
| "objects": [1] | |
| } | |
| } | |
| ``` | |
| ## Create and poll (generic example) | |
| ```bash | |
| # Step 1: create task | |
| TASK_ID=$(curl -s -X POST http://localhost:8000/createTask \ | |
| -H "Content-Type: application/json" \ | |
| -d '{ | |
| "clientKey": "your-client-key", | |
| "task": { | |
| "type": "ReCaptchaV2Classification", | |
| "image": "'$(base64 -w0 captcha.png)'", | |
| "question": "Select all images with traffic lights" | |
| } | |
| }' | python -c "import sys,json; print(json.load(sys.stdin)['taskId'])") | |
| # Step 2: poll result | |
| curl -s -X POST http://localhost:8000/getTaskResult \ | |
| -H "Content-Type: application/json" \ | |
| -d "{\"clientKey\":\"your-client-key\",\"taskId\":\"$TASK_ID\"}" | |
| ``` | |
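The curl example issues a single `getTaskResult` call; in practice a client usually repeats it until the task is ready. A minimal Python sketch of that loop follows. It uses only the standard library and the same `http://localhost:8000` host as above. Assumptions are labeled in the comments: the names `post_json` and `wait_for_result` are hypothetical, and any status other than `"ready"` is treated as "still working", since the source only documents the `"ready"` value.

```python
import json
import time
import urllib.request
from typing import Any, Callable, Dict

BASE_URL = "http://localhost:8000"  # same host as the curl example


def post_json(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """POST a JSON body and decode the JSON response."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def wait_for_result(
    client_key: str,
    task_id: str,
    post: Callable[[str, Dict[str, Any]], Dict[str, Any]] = post_json,
    interval: float = 1.0,
    timeout: float = 60.0,
) -> Dict[str, Any]:
    """Poll getTaskResult until status is "ready" or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = post("/getTaskResult", {"clientKey": client_key, "taskId": task_id})
        if result.get("errorId", 0) != 0:
            raise RuntimeError(f"task failed: {result!r}")
        if result.get("status") == "ready":
            return result
        # Assumption: any other status means the task is still running.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} not ready after {timeout}s")
        time.sleep(interval)
```

The `post` parameter is injectable so the loop can be exercised without a running server.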
| ## Operational notes | |
| - All classification tasks are **synchronous from the model's perspective**: each task is a single model call. Because that call is wrapped in `asyncio.create_task`, the HTTP response is immediate and the actual model call happens in the background, so the solution must be fetched with `getTaskResult`. | |
| - Model accuracy depends entirely on the vision model configured via `CAPTCHA_MULTIMODAL_MODEL` (default: `qwen3.5-2b`). | |
| - For best classification results, the model configured as `CAPTCHA_MODEL` (`gpt-5.4`) can be used in place of the default by setting `CAPTCHA_MULTIMODAL_MODEL=gpt-5.4`. | |
| - Images should not be pre-resized β the solver handles normalization internally. | |
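The background-task behavior described in the first note can be pictured with a small, self-contained `asyncio` sketch. This is not the solver's actual code: the in-memory `RESULTS` store, the function names, the `"processing"` status, and the fake model call are all assumptions made for illustration.

```python
import asyncio
import uuid
from typing import Any, Dict, Set, Tuple

RESULTS: Dict[str, Dict[str, Any]] = {}   # hypothetical in-memory result store
_BACKGROUND: Set["asyncio.Task[None]"] = set()  # keeps references so tasks are not garbage-collected


async def classify(task_id: str, task: Dict[str, Any]) -> None:
    """Stand-in for the vision model call (assumed, not the real solver)."""
    await asyncio.sleep(0.01)
    RESULTS[task_id] = {"errorId": 0, "status": "ready", "solution": {"objects": [0]}}


async def handle_create_task(task: Dict[str, Any]) -> str:
    """Return a task ID immediately; the model call runs in the background."""
    task_id = str(uuid.uuid4())
    RESULTS[task_id] = {"errorId": 0, "status": "processing"}
    bg = asyncio.create_task(classify(task_id, task))
    _BACKGROUND.add(bg)
    bg.add_done_callback(_BACKGROUND.discard)
    return task_id


async def demo() -> Tuple[str, str]:
    task_id = await handle_create_task({"type": "AwsClassification"})
    status_before = RESULTS[task_id]["status"]   # model call has not run yet
    await asyncio.gather(*_BACKGROUND)           # wait for background work
    return status_before, RESULTS[task_id]["status"]
```

Running `asyncio.run(demo())` shows the status observed right after creation and the status once the background call has finished, which is why clients need the create-then-poll flow.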