Graf-J committed fa98216 (verified, parent: 802e393): Initial Commit

README.md ADDED
---
tags:
- ocr
- pytorch
license: mit
datasets:
- hammer888/captcha-data
metrics:
- accuracy
- cer
pipeline_tag: image-to-text
library_name: transformers
---

<div align="center">

# ✨ DeepCaptcha-Conv-Transformer: Sequential Vision for OCR
### Convolutional Transformer Base

[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Python 3.13+](https://img.shields.io/badge/python-3.13+-blue.svg)](https://www.python.org/downloads/release/python-3130/)
[![Hugging Face Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-orange)](https://huggingface.co/Graf-J/captcha-crnn-finetuned)

---

<img src="images/CAPTCHA.png" alt="Captcha Example" width="500">

*Advanced sequence recognition using a Convolutional Transformer Encoder with Connectionist Temporal Classification (CTC) loss.*

</div>

---

## 📋 Model Details
- **Task:** Alphanumeric captcha recognition
- **Input:** Captcha images (converted to grayscale and resized to 150×40 internally)
- **Output:** Character strings, 1–8 characters long
- **Vocabulary:** Alphanumeric (`a-z`, `A-Z`, `0-9`)
- **Architecture:** Convolutional Transformer Encoder (CNN feature extractor + Transformer Encoder + CTC head)

---
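The 62-character vocabulary plus the CTC blank symbol accounts for the model's 63 output classes (`num_chars` in `config.json`). A minimal sketch of the index mapping, mirroring the logic in `processing_captcha.py` (index 0 is reserved for the blank):

```python
import string

# Alphanumeric vocabulary in the order the processor uses it
vocab = string.ascii_lowercase + string.ascii_uppercase + string.digits

# Characters start at index 1; index 0 is the CTC blank
idx_to_char = {i + 1: c for i, c in enumerate(vocab)}
idx_to_char[0] = ""  # the blank decodes to the empty string

print(len(vocab) + 1)  # 63 output classes, matching num_chars
```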

## 📊 Performance Metrics

### **Test Set Results**

| Dataset | Sequence Accuracy | Character Error Rate (CER) |
| --- | --- | --- |
| **[hammer888/captcha-data](https://huggingface.co/datasets/hammer888/captcha-data)** | `97.38%` | `0.57%` |

### **Hardware & Efficiency**
| Metric | Value |
| --- | --- |
| **Model Parameters** | `12,279,551` |
| **Model Size (Disk)** | `51.7 MB` |
| **Throughput (Images/sec)** | `733.00 – 751.11` |
| **Compute Hardware** | **NVIDIA RTX A6000** |

---

## 🧪 Try It With Sample Images

The following images are sampled from the test set of the [hammer888/captcha-data](https://huggingface.co/datasets/hammer888/captcha-data) dataset. Click any image below to download it and test the model locally.

<div align="center">
<table>
<tr>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/46CN5W.jpg"><img src="images/46CN5W.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/5820.jpg"><img src="images/5820.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/6521.jpg"><img src="images/6521.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/abfsh.jpg"><img src="images/abfsh.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/67qas.jpg"><img src="images/67qas.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/75ke.jpg"><img src="images/75ke.jpg" width="120"/></a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/8JKM.jpg"><img src="images/8JKM.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/8jpwt0.jpg"><img src="images/8jpwt0.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/B1QAZ6.jpg"><img src="images/B1QAZ6.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/CCX8.jpg"><img src="images/CCX8.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/EPOD.jpg"><img src="images/EPOD.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/ER6Y.jpg"><img src="images/ER6Y.jpg" width="120"/></a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/EWSP.jpg"><img src="images/EWSP.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/GIOGp.jpg"><img src="images/GIOGp.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/HCDS.jpg"><img src="images/HCDS.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/JBWkEs.jpg"><img src="images/JBWkEs.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/kJtOfk.jpg"><img src="images/kJtOfk.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/MFMH.jpg"><img src="images/MFMH.jpg" width="120"/></a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/NJSEX.jpg"><img src="images/NJSEX.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/R6AB.jpg"><img src="images/R6AB.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/TVHF.jpg"><img src="images/TVHF.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/Vb4cG.jpg"><img src="images/Vb4cG.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/XaNqQx.jpg"><img src="images/XaNqQx.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/YULM.jpg"><img src="images/YULM.jpg" width="120"/></a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/b6yc.jpg"><img src="images/b6yc.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/bCWaLR.jpg"><img src="images/bCWaLR.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/d3no.jpg"><img src="images/d3no.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/3eplzv.jpg"><img src="images/3eplzv.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/iq1sZo.jpg"><img src="images/iq1sZo.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/KKh8Q.jpg"><img src="images/KKh8Q.jpg" width="120"/></a></td>
</tr>
</table>
</div>

---

## 🚀 Quick Start (Pipeline - Recommended)

The easiest way to run inference is the custom Hugging Face pipeline shipped with this repository.

```python
from transformers import pipeline
from PIL import Image

# Initialize the custom captcha-recognition pipeline
pipe = pipeline(
    task="captcha-recognition",
    model="Graf-J/captcha-conv-transformer-base",
    trust_remote_code=True,
)

# Load an image and predict
img = Image.open("path/to/image.png")
result = pipe(img)
print(f"Decoded Text: {result['prediction']}")
```

## 🔬 Advanced Usage (Raw Logits & Custom Decoding)

Use this path if you need access to the raw logits or want to plug in your own decoding logic.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Load the model and its custom processor
repo_id = "Graf-J/captcha-conv-transformer-base"
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

# Load and preprocess the image
img = Image.open("path/to/image.png")
inputs = processor(img)

# Inference
with torch.no_grad():
    outputs = model(inputs["pixel_values"])
    logits = outputs.logits  # (batch, time_steps, num_chars)

# Decode the prediction via greedy CTC logic
prediction = processor.batch_decode(logits)[0]
print(f"Prediction: '{prediction}'")
```
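Under the hood, `batch_decode` applies greedy CTC decoding: take the argmax class per timestep, collapse consecutive repeats, and drop the blank token (index 0). A minimal, framework-free sketch of that rule:

```python
def ctc_greedy_decode(token_ids, idx_to_char, blank=0):
    """Collapse repeats and drop blanks from a per-timestep argmax sequence."""
    chars = []
    prev = None
    for tok in token_ids:
        # Only emit a character when it differs from the previous timestep
        # (the CTC collapse rule) and is not the blank token
        if tok != prev and tok != blank:
            chars.append(idx_to_char[tok])
        prev = tok
    return "".join(chars)

# A blank between two identical tokens separates two genuine characters
idx_to_char = {1: "a", 2: "b"}
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2], idx_to_char))  # -> "aab"
```

This is why the model can emit the same character twice in a row in the final string: the blank acts as a separator between repeated letters.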

---

## ⚙️ Training
The base model was trained on a refined version of [hammer888/captcha-data](https://huggingface.co/datasets/hammer888/captcha-data) (1,365,874 images). The dataset underwent a specialized cleaning pass in which multiple pre-trained models were used to identify and prune inconsistent samples. Specifically, images on which the models were "confidently incorrect" about casing (upper/lower-case errors) were removed, giving high-fidelity ground truth for the final training run.

### **Parameters**
- **Optimizer:** Adam (lr=0.0005)
- **Scheduler:** ReduceLROnPlateau (factor=0.5, patience=3)
- **Batch Size:** 128
- **Loss Function:** CTCLoss
- **Augmentations:** ElasticTransform, random rotation, grayscale resize

---
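The parameters above can be sketched as a single PyTorch training step. This is an illustrative outline, not the author's exact script: the tensor shapes are assumptions derived from the architecture (150 px width pooled twice to 37 timesteps, 63 classes), and a dummy tensor stands in for the model's output. `nn.CTCLoss` expects log-probabilities shaped (time, batch, classes) plus per-sample input and target lengths:

```python
import torch
import torch.nn as nn

# Assumed shapes: 37 timesteps (150 px / 4 after pooling), 63 classes (62 chars + blank)
B, T, C = 128, 37, 63
logits = torch.randn(B, T, C)                  # stand-in for the model output
targets = torch.randint(1, C, (B, 6))          # dummy labels, indices 1..62 (0 = blank)
target_lengths = torch.full((B,), 6, dtype=torch.long)
input_lengths = torch.full((B,), T, dtype=torch.long)

criterion = nn.CTCLoss(blank=0)                # index 0 is the CTC blank
optimizer = torch.optim.Adam([logits.requires_grad_()], lr=5e-4)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=3
)

# One training step: CTCLoss wants (time, batch, classes) log-probabilities
log_probs = logits.log_softmax(-1).permute(1, 0, 2)
loss = criterion(log_probs, targets, input_lengths, target_lengths)
loss.backward()
optimizer.step()
scheduler.step(loss.item())                    # in practice, step on validation loss
```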

## 🔍 Error Analysis

The following confusion matrices illustrate character-level performance across the alphanumeric vocabulary on the test split of the Python-generated captcha images.

### **Full Confusion Matrix**
![Full-Confusion-Matrix](images/confusion-matrix.png)

### **Misclassification Deep Dive**

This matrix shows only the misclassification patterns, stripping away correct predictions to visualize which character pairs (such as '0' vs 'O' or '1' vs 'l') the model most frequently confuses. Although the dataset underwent a specialized cleaning process to minimize noisy labels, a residual pattern of confusions between visually similar upper- and lowercase characters remains.

![Misclassification-Confusion-Matrix](images/confusion-matrix-no-diagonal.png)

---

## ⚖️ **License & Citation**

This project is licensed under the **MIT License**. If you use this model in your research, portfolio, or applications, please attribute the author.
__pycache__/configuration_captcha.cpython-313.pyc ADDED
Binary file (1.11 kB). View file
 
__pycache__/modeling_captcha.cpython-313.pyc ADDED
Binary file (5.05 kB). View file
 
__pycache__/processing_captcha.cpython-313.pyc ADDED
Binary file (3.1 kB). View file
 
config.json ADDED
{
  "architectures": [
    "CaptchaConvolutionalTransformer"
  ],
  "d_model": 1280,
  "dim_feedforward": 2048,
  "dropout": 0.1,
  "dtype": "float32",
  "model_type": "captcha_convolutional_transformer",
  "nhead": 8,
  "num_chars": 63,
  "num_layers": 1,
  "transformers_version": "5.1.0",
  "auto_map": {
    "AutoConfig": "configuration_captcha.CaptchaConfig",
    "AutoModel": "modeling_captcha.CaptchaConvolutionalTransformer",
    "AutoProcessor": "processing_captcha.CaptchaProcessor"
  },
  "custom_pipelines": {
    "captcha-recognition": {
      "impl": "pipeline.CaptchaPipeline",
      "pt": ["AutoModel"],
      "type": "multimodal"
    }
  }
}
configuration_captcha.py ADDED
from transformers import PretrainedConfig


class CaptchaConfig(PretrainedConfig):
    model_type = "captcha_convolutional_transformer"

    def __init__(
        self,
        num_chars=63,
        d_model=1280,
        nhead=8,
        num_layers=1,
        dim_feedforward=2048,
        dropout=0.1,
        **kwargs
    ):
        self.num_chars = num_chars
        self.d_model = d_model
        self.nhead = nhead
        self.num_layers = num_layers
        self.dim_feedforward = dim_feedforward
        self.dropout = dropout
        super().__init__(**kwargs)
images/3eplzv.jpg ADDED
images/46CN5W.jpg ADDED
images/5820.jpg ADDED
images/6521.jpg ADDED
images/67qas.jpg ADDED
images/75ke.jpg ADDED
images/8JKM.jpg ADDED
images/8jpwt0.jpg ADDED
images/B1QAZ6.jpg ADDED
images/CAPTCHA.png ADDED
images/CCX8.jpg ADDED
images/EPOD.jpg ADDED
images/ER6Y.jpg ADDED
images/EWSP.jpg ADDED
images/GIOGp.jpg ADDED
images/HCDS.jpg ADDED
images/JBWkEs.jpg ADDED
images/KKh8Q.jpg ADDED
images/MFMH.jpg ADDED
images/NJSEX.jpg ADDED
images/R6AB.jpg ADDED
images/TVHF.jpg ADDED
images/Vb4cG.jpg ADDED
images/XaNqQx.jpg ADDED
images/YULM.jpg ADDED
images/abfsh.jpg ADDED
images/b6yc.jpg ADDED
images/bCWaLR.jpg ADDED
images/confusion-matrix-no-diagonal.png ADDED
images/confusion-matrix.png ADDED
images/d3no.jpg ADDED
images/iq1sZo.jpg ADDED
images/kJtOfk.jpg ADDED
model.safetensors ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:83da1e97cd334935d9c29b25d64d87fa1c993ab16966512f00710392d4f8bfc9
size 51685900
modeling_captcha.py ADDED
import math
import torch
import torch.nn as nn
from transformers import PreTrainedModel
from transformers.modeling_outputs import SequenceClassifierOutput
from .configuration_captcha import CaptchaConfig


class PositionalEncoding(nn.Module):
    def __init__(self, d_model, max_len=500):
        super().__init__()
        pe = torch.zeros(max_len, d_model)
        position = torch.arange(0, max_len).unsqueeze(1)
        div_term = torch.exp(
            torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model)
        )
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        pe = pe.unsqueeze(0)
        self.register_buffer("pe", pe)

    def forward(self, x):
        return x + self.pe[:, : x.size(1)]


class CaptchaConvolutionalTransformer(PreTrainedModel):
    config_class = CaptchaConfig

    def __init__(self, config):
        super().__init__(config)

        # CNN feature extractor: 1 -> 32 -> 64 -> 128 -> 256 channels
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.SiLU(),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.SiLU(),
            nn.MaxPool2d(2, 2),

            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.SiLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),  # halve height only, preserve width

            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.SiLU(),
        )

        # Positional encoding over the width (time) dimension
        self.positional_encoding = PositionalEncoding(config.d_model)

        # Transformer encoder
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=config.d_model,
            nhead=config.nhead,
            dim_feedforward=config.dim_feedforward,
            dropout=config.dropout,
            activation="gelu",
            batch_first=True,
            norm_first=True,
        )
        self.transformer = nn.TransformerEncoder(
            encoder_layer,
            num_layers=config.num_layers,
        )

        # Classification head: one set of character logits per timestep
        self.classifier = nn.Linear(config.d_model, config.num_chars)

        # Initialize weights and apply final processing
        self.post_init()

    def forward(self, pixel_values, labels=None):
        """
        pixel_values: (batch, 1, H, W)
        """
        # Extract features
        x = self.conv(pixel_values)  # (B, 256, H_final, W_final)

        # Prepare sequence: permute to (batch, width, channels, height)
        x = x.permute(0, 3, 1, 2)
        b, t, c, h = x.size()

        # Flatten channels and height into the d_model dimension
        x = x.reshape(b, t, c * h)  # (B, T, d_model)

        # Apply positional encoding and the transformer encoder
        x = self.positional_encoding(x)
        x = self.transformer(x)

        # Map to character logits
        logits = self.classifier(x)  # (B, T, num_chars)

        # Return an output object
        return SequenceClassifierOutput(logits=logits)
pipeline.py ADDED
from transformers import Pipeline
import torch


class CaptchaPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        return {}, {}, {}

    def preprocess(self, image):
        return self.processor(image)

    def _forward(self, model_inputs):
        with torch.no_grad():
            outputs = self.model(model_inputs["pixel_values"])
        return outputs

    def postprocess(self, model_outputs):
        logits = model_outputs.logits
        prediction = self.processor.batch_decode(logits)[0]
        return {"prediction": prediction}
processing_captcha.py ADDED
import string
import torch
import torchvision.transforms.functional as F
from transformers.processing_utils import ProcessorMixin


class CaptchaProcessor(ProcessorMixin):
    attributes = []

    def __init__(self, vocab=None, **kwargs):
        super().__init__(**kwargs)
        self.vocab = vocab or (string.ascii_lowercase + string.ascii_uppercase + string.digits)
        self.idx_to_char = {i + 1: c for i, c in enumerate(self.vocab)}
        self.idx_to_char[0] = ""  # index 0 is the CTC blank

    def __call__(self, images):
        """
        Converts PIL images to the tensor format the model expects.
        """
        if not isinstance(images, list):
            images = [images]

        processed_images = []
        for img in images:
            # Convert to grayscale
            img = img.convert("L")
            # Resize to the model's expected input (width, height)
            img = img.resize((150, 40))
            # Convert to tensor and scale to [0, 1]
            img_tensor = F.to_tensor(img)
            processed_images.append(img_tensor)

        return {"pixel_values": torch.stack(processed_images)}

    def batch_decode(self, logits):
        """
        Greedy CTC decoding: argmax per timestep, collapse repeats, drop blanks.
        """
        tokens = torch.argmax(logits, dim=-1)
        if len(tokens.shape) == 1:
            tokens = tokens.unsqueeze(0)

        decoded_strings = []
        for batch_item in tokens:
            char_list = []
            for i in range(len(batch_item)):
                token = batch_item[i].item()
                if token != 0:
                    # Skip repeats of the previous timestep (CTC collapse rule)
                    if i > 0 and batch_item[i] == batch_item[i - 1]:
                        continue
                    char_list.append(self.idx_to_char.get(token, ""))
            decoded_strings.append("".join(char_list))
        return decoded_strings
processor_config.json ADDED
{
  "processor_class": "CaptchaProcessor",
  "vocab": "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
  "auto_map": {
    "AutoProcessor": "processing_captcha.CaptchaProcessor"
  }
}