Graf-J committed on
Commit cba2240 · verified · 1 parent: ca6d2c3

Initial Commit
README.md ADDED
---
tags:
- ocr
- pytorch
license: mit
datasets:
- hammer888/captcha-data
metrics:
- accuracy
- cer
pipeline_tag: image-to-text
library_name: transformers
---

<div align="center">

# ✨ DeepCaptcha-CRNN: Sequential Vision for OCR
### CRNN Base

[![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![Python 3.13+](https://img.shields.io/badge/python-3.13+-blue.svg)](https://www.python.org/downloads/release/python-3130/)
[![Hugging Face Model](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-orange)](https://huggingface.co/Graf-J/captcha-crnn-finetuned)

---

<img src="images/CAPTCHA.png" alt="Captcha Example" width="500">

*Advanced sequence recognition using a Convolutional Recurrent Neural Network (CRNN) with Connectionist Temporal Classification (CTC) loss.*

</div>

---

## 📋 Model Details
- **Task:** Alphanumeric Captcha Recognition
- **Input:** Images
- **Output:** String sequences (length 1–8 characters)
- **Vocabulary:** Alphanumeric (`a-z`, `A-Z`, `0-9`)
- **Architecture:** CRNN (CNN + Bi-LSTM)

---

## 📊 Performance Metrics

### **Test Set Results**

| Dataset | Sequence Accuracy | Character Error Rate (CER) |
| --- | --- | --- |
| **[hammer888/captcha-data](https://huggingface.co/datasets/hammer888/captcha-data)** | `96.81%` | `0.70%` |

### **Hardware & Efficiency**
| Metric | Value |
| --- | --- |
| **Model Parameters** | `3,570,943` |
| **Model Size (Disk)** | `14.3 MB` |
| **Throughput (Images/sec)** | `447.26 – 467.29` |
| **Compute Hardware** | **NVIDIA RTX A6000** |

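CER in the table above is the edit distance between predicted and ground-truth strings divided by the total ground-truth length. A minimal sketch of how such a score can be computed (a generic Levenshtein-based CER, not necessarily the exact evaluation script used for these numbers):

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def cer(predictions: list[str], references: list[str]) -> float:
    """Total edit distance over total reference length."""
    total_edits = sum(levenshtein(p, r) for p, r in zip(predictions, references))
    total_chars = sum(len(r) for r in references)
    return total_edits / total_chars

print(cer(["46CN5W", "582O"], ["46CN5W", "5820"]))  # 1 error over 10 chars -> 0.1
```

A CER of `0.70%` therefore means roughly 7 character-level edits per 1,000 ground-truth characters.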
---

## 🧪 Try It With Sample Images

The following images are sampled from the test set of the [hammer888/captcha-data](https://huggingface.co/datasets/hammer888/captcha-data) dataset. Click any image below to download it and test the model locally.

<div align="center">
<table>
<tr>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/46CN5W.jpg"><img src="images/46CN5W.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/5820.jpg"><img src="images/5820.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/6521.jpg"><img src="images/6521.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/abfsh.jpg"><img src="images/abfsh.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/67qas.jpg"><img src="images/67qas.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/75ke.jpg"><img src="images/75ke.jpg" width="120"/></a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/8JKM.jpg"><img src="images/8JKM.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/8jpwt0.jpg"><img src="images/8jpwt0.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/B1QAZ6.jpg"><img src="images/B1QAZ6.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/CCX8.jpg"><img src="images/CCX8.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/EPOD.jpg"><img src="images/EPOD.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/ER6Y.jpg"><img src="images/ER6Y.jpg" width="120"/></a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/EWSP.jpg"><img src="images/EWSP.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/GIOGp.jpg"><img src="images/GIOGp.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/HCDS.jpg"><img src="images/HCDS.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/JBWkEs.jpg"><img src="images/JBWkEs.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/kJtOfk.jpg"><img src="images/kJtOfk.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/MFMH.jpg"><img src="images/MFMH.jpg" width="120"/></a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/NJSEX.jpg"><img src="images/NJSEX.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/R6AB.jpg"><img src="images/R6AB.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/TVHF.jpg"><img src="images/TVHF.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/Vb4cG.jpg"><img src="images/Vb4cG.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/XaNqQx.jpg"><img src="images/XaNqQx.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/YULM.jpg"><img src="images/YULM.jpg" width="120"/></a></td>
</tr>
<tr>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/b6yc.jpg"><img src="images/b6yc.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/bCWaLR.jpg"><img src="images/bCWaLR.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/d3no.jpg"><img src="images/d3no.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/3eplzv.jpg"><img src="images/3eplzv.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/iq1sZo.jpg"><img src="images/iq1sZo.jpg" width="120"/></a></td>
<td><a href="https://huggingface.co/Graf-J/captcha-crnn-finetuned/resolve/main/images/KKh8Q.jpg"><img src="images/KKh8Q.jpg" width="120"/></a></td>
</tr>
</table>
</div>

---

## 🚀 Quick Start (Pipeline - Recommended)

The easiest way to perform inference is to use the custom Hugging Face pipeline.

```python
from transformers import pipeline
from PIL import Image

# Initialize the pipeline
pipe = pipeline(
    task="captcha-recognition",
    model="Graf-J/captcha-crnn-base",
    trust_remote_code=True
)

# Load and predict
img = Image.open("path/to/image.png")
result = pipe(img)
print(f"Decoded Text: {result['prediction']}")
```

## 🔬 Advanced Usage (Raw Logits & Custom Decoding)

Use this method if you need access to the raw logits or internal hidden states.

```python
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

# Load Model & Custom Processor
repo_id = "Graf-J/captcha-crnn-base"
processor = AutoProcessor.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModel.from_pretrained(repo_id, trust_remote_code=True)

model.eval()

# Load and process image
img = Image.open("path/to/image.png")
inputs = processor(img)

# Inference
with torch.no_grad():
    outputs = model(inputs["pixel_values"])
    logits = outputs.logits

# Decode the prediction via CTC logic
prediction = processor.batch_decode(logits)[0]
print(f"Prediction: '{prediction}'")
```

---

## ⚙️ Training
The base model was trained on a refined version of the [hammer888/captcha-data](https://huggingface.co/datasets/hammer888/captcha-data) dataset (1,365,874 images). The dataset underwent a specialized cleaning pass in which multiple pre-trained models were used to identify and prune inconsistent samples. Specifically, images on which models were "confidently incorrect" about casing (upper-/lower-case errors) were removed, ensuring high-fidelity ground truth for the final training run.

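The pruning rule described above can be sketched as follows. This is an illustrative reconstruction, not the repository's actual cleaning script; the function name, the `confidence` input, and the `0.95` threshold are all assumptions:

```python
def should_prune(label: str, prediction: str, confidence: float,
                 threshold: float = 0.95) -> bool:
    """Drop samples where a model is highly confident, matches the label
    ignoring case, but disagrees on casing -- a sign of a mislabeled captcha.
    (Hypothetical sketch; threshold value is an assumption.)"""
    case_only_mismatch = (prediction != label) and (prediction.lower() == label.lower())
    return case_only_mismatch and confidence >= threshold

# A confidently case-wrong sample is pruned; a genuine recognition error is kept
print(should_prune("AbCd", "ABCD", confidence=0.99))  # True
print(should_prune("AbCd", "XbCd", confidence=0.99))  # False
```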
### **Parameters**
- **Optimizer:** Adam (lr=0.002)
- **Scheduler:** ReduceLROnPlateau (factor=0.5, patience=3)
- **Batch Size:** 128
- **Loss Function:** CTCLoss
- **Augmentations:** ElasticTransform, Random Rotation, Grayscale Resize

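The hyperparameters above can be wired into a single CTC training step roughly as follows. This is a hedged sketch, not the repository's training script; it assumes a model with the `CaptchaCRNN` interface (an output object carrying `.logits` of shape `(batch, time, num_chars)`) and CTC targets given as a concatenated 1-D tensor of label indices with blank = 0:

```python
import torch
import torch.nn as nn

def make_optimizer_and_scheduler(model: nn.Module):
    # Values taken from the Parameters list above
    optimizer = torch.optim.Adam(model.parameters(), lr=0.002)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, factor=0.5, patience=3)
    return optimizer, scheduler

def ctc_train_step(model, optimizer, images, targets, target_lengths):
    """One optimization step with CTC loss (blank index 0)."""
    ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)
    optimizer.zero_grad()
    logits = model(images).logits                            # (batch, time, num_chars)
    log_probs = logits.log_softmax(dim=-1).permute(1, 0, 2)  # CTCLoss wants (time, batch, classes)
    input_lengths = torch.full((logits.size(0),), logits.size(1), dtype=torch.long)
    loss = ctc_loss(log_probs, targets, input_lengths, target_lengths)
    loss.backward()
    optimizer.step()
    return loss.item()
```

After each validation pass one would call `scheduler.step(val_loss)`, which halves the learning rate once validation loss has plateaued for 3 epochs.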
---

## 🔍 Error Analysis

The following confusion matrices illustrate character-level performance across the alphanumeric vocabulary on the test set of Python-generated captcha images.

### **Full Confusion Matrix**
![Full-Confusion-Matrix](images/confusion-matrix.png)

### **Misclassification Deep Dive**

This matrix highlights only the misclassification patterns, stripping away correct predictions to visualize which character pairs (such as '0' vs 'O' or '1' vs 'l') the model most frequently confuses.
![Misclassification-Confusion-Matrix](images/confusion-matrix-no-diagonal.png)

---

## ⚖️ **License & Citation**

This project is licensed under the **MIT License**. If you use this model in your research, portfolio, or applications, please attribute the author.
__pycache__/configuration_captcha.cpython-313.pyc ADDED
Binary file (806 Bytes).

__pycache__/modeling_captcha.cpython-313.pyc ADDED
Binary file (2.92 kB).

__pycache__/processing_captcha.cpython-313.pyc ADDED
Binary file (3.08 kB).

config.json ADDED

{
  "architectures": [
    "CaptchaCRNN"
  ],
  "dtype": "float32",
  "model_type": "captcha_crnn",
  "num_chars": 63,
  "transformers_version": "5.1.0",
  "auto_map": {
    "AutoConfig": "configuration_captcha.CaptchaConfig",
    "AutoModel": "modeling_captcha.CaptchaCRNN",
    "AutoProcessor": "processing_captcha.CaptchaProcessor"
  },
  "custom_pipelines": {
    "captcha-recognition": {
      "impl": "pipeline.CaptchaPipeline",
      "pt": ["AutoModel"],
      "type": "multimodal"
    }
  }
}

configuration_captcha.py ADDED

from transformers import PretrainedConfig


class CaptchaConfig(PretrainedConfig):
    model_type = "captcha_crnn"

    # num_chars = 62 alphanumeric characters + 1 CTC blank token (index 0)
    def __init__(self, num_chars=63, **kwargs):
        super().__init__(**kwargs)
        self.num_chars = num_chars

images/3eplzv.jpg ADDED
images/46CN5W.jpg ADDED
images/5820.jpg ADDED
images/6521.jpg ADDED
images/67qas.jpg ADDED
images/75ke.jpg ADDED
images/8JKM.jpg ADDED
images/8jpwt0.jpg ADDED
images/B1QAZ6.jpg ADDED
images/CAPTCHA.png ADDED
images/CCX8.jpg ADDED
images/EPOD.jpg ADDED
images/ER6Y.jpg ADDED
images/EWSP.jpg ADDED
images/GIOGp.jpg ADDED
images/HCDS.jpg ADDED
images/JBWkEs.jpg ADDED
images/KKh8Q.jpg ADDED
images/MFMH.jpg ADDED
images/NJSEX.jpg ADDED
images/R6AB.jpg ADDED
images/TVHF.jpg ADDED
images/Vb4cG.jpg ADDED
images/XaNqQx.jpg ADDED
images/YULM.jpg ADDED
images/abfsh.jpg ADDED
images/b6yc.jpg ADDED
images/bCWaLR.jpg ADDED
images/confusion-matrix-no-diagonal.png ADDED
images/confusion-matrix.png ADDED
images/d3no.jpg ADDED
images/iq1sZo.jpg ADDED
images/kJtOfk.jpg ADDED
images/prediction.png ADDED
model.safetensors ADDED

version https://git-lfs.github.com/spec/v1
oid sha256:ff93abaec8ddbf2a5979a2c350178ba71c7bf8ca78873ab187a21bd2678df35b
size 14290964

modeling_captcha.py ADDED

import torch
import torch.nn as nn
from transformers import PreTrainedModel
from transformers.modeling_outputs import SequenceClassifierOutput

from .configuration_captcha import CaptchaConfig


class CaptchaCRNN(PreTrainedModel):
    config_class = CaptchaConfig

    def __init__(self, config):
        super().__init__(config)
        self.conv_layer = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32),
            nn.SiLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.SiLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.SiLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.SiLU()
        )
        # input_size = 256 channels x 5 remaining height rows = 1280 features per time step
        self.lstm = nn.LSTM(input_size=1280, hidden_size=256, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(512, config.num_chars)
        self.post_init()

    def forward(self, x, labels=None):
        x = self.conv_layer(x)                     # (batch, 256, height, width)
        x = x.permute(0, 3, 1, 2)                  # (batch, width, channels, height)
        batch, width, channels, height = x.size()
        x = x.view(batch, width, -1)               # one feature vector per horizontal step
        x, _ = self.lstm(x)
        logits = self.classifier(x)

        return SequenceClassifierOutput(logits=logits)
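The `input_size=1280` for the LSTM follows from the pooling arithmetic: with 40-px-high inputs (the processor resizes to `(150, 40)`), the three pooling layers reduce height to 5 rows, and each horizontal step then feeds `channels × height` features to the recurrent layer:

```python
# Trace the height of a 40-px input through the three pooling stages
height = 40
height //= 2          # MaxPool2d(2, 2)
height //= 2          # MaxPool2d(2, 2)
height //= 2          # MaxPool2d(kernel_size=(2, 1)) halves height only
channels = 256        # output channels of the final conv block

print(channels * height)  # 1280 -> matches nn.LSTM(input_size=1280, ...)
```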
pipeline.py ADDED

from transformers import Pipeline
import torch


class CaptchaPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        # No configurable parameters: empty kwargs for preprocess/forward/postprocess
        return {}, {}, {}

    def preprocess(self, image):
        return self.processor(image)

    def _forward(self, model_inputs):
        with torch.no_grad():
            outputs = self.model(model_inputs["pixel_values"])
        return outputs

    def postprocess(self, model_outputs):
        logits = model_outputs.logits
        prediction = self.processor.batch_decode(logits)[0]
        return {"prediction": prediction}
processing_captcha.py ADDED

import string

import torch
import torchvision.transforms.functional as F
from transformers.processing_utils import ProcessorMixin


class CaptchaProcessor(ProcessorMixin):
    attributes = []

    def __init__(self, vocab=None, **kwargs):
        super().__init__(**kwargs)
        self.vocab = vocab or (string.ascii_lowercase + string.ascii_uppercase + string.digits)
        # Index 0 is reserved for the CTC blank token
        self.idx_to_char = {i + 1: c for i, c in enumerate(self.vocab)}
        self.idx_to_char[0] = ""

    def __call__(self, images):
        """
        Converts PIL images to the tensor format the CRNN expects.
        """
        if not isinstance(images, list):
            images = [images]

        processed_images = []
        for img in images:
            # Convert to grayscale
            img = img.convert("L")
            # Resize to the model's expected input (width, height)
            img = img.resize((150, 40))
            # Convert to tensor and scale to [0, 1]
            img_tensor = F.to_tensor(img)
            processed_images.append(img_tensor)

        return {"pixel_values": torch.stack(processed_images)}

    def batch_decode(self, logits):
        """
        Greedy CTC decoding: drop blanks, collapse adjacent repeats.
        """
        tokens = torch.argmax(logits, dim=-1)
        if len(tokens.shape) == 1:
            tokens = tokens.unsqueeze(0)

        decoded_strings = []
        for batch_item in tokens:
            char_list = []
            for i in range(len(batch_item)):
                token = batch_item[i].item()
                if token != 0:
                    # Skip repeats that are not separated by a blank
                    if i > 0 and batch_item[i] == batch_item[i - 1]:
                        continue
                    char_list.append(self.idx_to_char.get(token, ""))
            decoded_strings.append("".join(char_list))
        return decoded_strings
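The collapse rule in `batch_decode` (drop blanks, merge repeats unless a blank separates them) can be illustrated without tensors; `ctc_collapse` below is a hypothetical standalone rewrite of the same logic over plain lists:

```python
def ctc_collapse(token_ids, idx_to_char, blank=0):
    """Greedy CTC collapse: drop blanks, merge adjacent repeated tokens."""
    chars = []
    prev = None
    for t in token_ids:
        if t != blank and t != prev:
            chars.append(idx_to_char[t])
        prev = t
    return "".join(chars)

idx_to_char = {1: "a", 2: "b"}
print(ctc_collapse([1, 1, 0, 1, 2, 2], idx_to_char))  # "aab": blank splits the two a's
```

This is why a captcha like "aab" remains decodable: the network emits a blank between the two `a` frames, so only genuinely adjacent duplicates are merged.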
processor_config.json ADDED

{
  "processor_class": "CaptchaProcessor",
  "vocab": "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
  "auto_map": {
    "AutoProcessor": "processing_captcha.CaptchaProcessor"
  }
}