| --- |
| license: apache-2.0 |
| language: |
| - en |
| metrics: |
| - accuracy |
| library_name: adapter-transformers |
| pipeline_tag: image-to-text |
| --- |
| # Model Card for Pixelated Captcha Digit Detection |
|
|
| ## Model Details |
|
|
| - **License:** Apache-2.0 |
| - **Developed by:** Saidi Souhaieb |
| - **Finetuned from model:** YOLOv8 |
|
|
| ## Uses |
|
|
| This model is designed to detect pixelated captcha digits by showing bounding boxes and extracting the coordinates of the detections. |
|
|
| ## How to Get Started with the Model |
|
|
| ```python |
| import torch |
| import torch.nn as nn |
| import torch.optim as optim |
| from torch.utils.data import DataLoader |
| import torchvision.transforms as transforms |
| from torchvision.datasets import ImageFolder |
| from tqdm import tqdm |
| from PIL import Image |
| import torch.nn.functional as F |
| import os |
| |
| class CNN(nn.Module): |
| def __init__(self): |
| super(CNN, self).__init__() |
| self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=1, padding=1) |
| self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=1, padding=1) |
| self.conv3 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1) |
| self.pool = nn.MaxPool2d(2, 2) |
| self.fc1 = nn.Linear(64 * 4 * 4, 500) |
| self.fc2 = nn.Linear(500, 10) # 10 classes for example |
| |
| def forward(self, x): |
| x = self.pool(F.relu(self.conv1(x))) |
| x = self.pool(F.relu(self.conv2(x))) |
| x = self.pool(F.relu(self.conv3(x))) |
| x = x.view(-1, 64 * 4 * 4) |
| x = F.relu(self.fc1(x)) |
| x = self.fc2(x) |
| return x |
| |
| transform = transforms.Compose([ |
| transforms.Resize((32, 32)), # Adjust the size accordingly |
| transforms.ToTensor(), |
| transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) |
| ]) |
| |
| transform = transforms.Compose([ |
| transforms.Resize((32, 32)), # Adjust the size accordingly |
| transforms.ToTensor(), |
| transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) |
| ]) |
| |
| model = CNN() |
| model.load_state_dict(torch.load('models/99acc_model.pth')) |
| |
| def predict_number(folder_path): |
| """ |
| Predict the numbers in the images in the folder |
| """ |
| predict_numbers = [] |
| for file in os.listdir(folder_path): |
| input_image = Image.open(f"temp/{file}").convert('RGB') |
| # Load and preprocess the input image |
| input_tensor = transform(input_image) |
| input_batch = input_tensor.unsqueeze(0) # Add a batch dimension |
| |
| # Perform inference |
| with torch.no_grad(): |
| output = model(input_batch) |
| |
| # Get the predicted class label |
| _, predicted = torch.max(output, 1) |
| |
| # Print the predicted class label |
| print("Predicted class label:", predicted.item(), "file", file) |
| predict_numbers.append(predicted.item()) |
| |
| return predict_numbers |
| |
| ``` |
|
|
| ## Training Details |
|
|
| ### Training Data |
|
|
| Pixel Digit Captcha Data [] |
|
|
| ## Model Card Authors |
|
|
| [Saidi Souhaieb] |