---
tags:
- ocr
- pytorch
license: mit
datasets:
- hammer888/captcha-data
metrics:
- accuracy
- cer
pipeline_tag: image-to-text
library_name: transformers
---
# ✨ DeepCaptcha-Conv-Transformer: Sequential Vision for OCR
### Convolutional Transformer Base
[License: MIT](https://opensource.org/licenses/MIT)
[Python 3.13](https://www.python.org/downloads/release/python-3130/)
[Hugging Face Model](https://huggingface.co/Graf-J/captcha-crnn-finetuned)

---

*Advanced sequence recognition using a Convolutional Transformer Encoder with Connectionist Temporal Classification (CTC) loss.*

---
## Model Details
- **Task:** Alphanumeric Captcha Recognition
- **Input:** Images
- **Output:** String sequences (Length 1–8 characters)
- **Vocabulary:** Alphanumeric (`a-z`, `A-Z`, `0-9`)
- **Architecture:** Convolutional Transformer Encoder (CNN + Transformer Encoder)
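
Because the model is trained with CTC loss, its per-frame predictions have to be collapsed into a final string. The following is a minimal sketch of greedy (best-path) CTC decoding over the vocabulary listed above. It assumes that index 0 is the CTC blank and that characters are ordered `a-z`, `A-Z`, `0-9`; the actual checkpoint may use a different mapping, and the helper names `ctc_greedy_decode` and `argmax_frames` are hypothetical rather than part of this repository.

```python
import string

# Vocabulary from the model card: a-z, A-Z, 0-9 (62 symbols).
# ASSUMPTION: index 0 is the CTC blank and characters follow in this order;
# the released checkpoint may use a different index mapping.
VOCAB = string.ascii_lowercase + string.ascii_uppercase + string.digits
BLANK = 0


def argmax_frames(logits):
    """Pick the highest-scoring class index for each time step."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


def ctc_greedy_decode(frame_ids):
    """Collapse consecutive repeats, then drop blanks (CTC best-path decoding)."""
    chars = []
    prev = BLANK
    for idx in frame_ids:
        # Emit a character only when the id changes and is not the blank.
        if idx != prev and idx != BLANK:
            chars.append(VOCAB[idx - 1])
        prev = idx
    return "".join(chars)


# A blank between two identical ids keeps both characters.
print(ctc_greedy_decode([1, 1, 0, 1, 2, 2, 0]))  # prints "aab"
```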
---
## Performance Metrics
This project features four models that explore the trade-offs between recurrent (LSTM) and attention-based (Transformer) architectures, as well as the effects of fine-tuning on CAPTCHAs generated by the [Python Captcha Library](https://captcha.lepture.com/).
| Metric | **CRNN (Base)** | **CRNN (Finetuned)** | **Conv-Transformer (Base)** | **Conv-Transformer (Finetuned)** |
|--------|-----------------|----------------------|-----------------------------|----------------------------------|
| Architecture | CRNN | CRNN | Convolutional Transformer | Convolutional Transformer |
| Training Data | [hammer888/captcha-data](https://huggingface.co/datasets/hammer888/captcha-data) | [hammer888/captcha-data](https://huggingface.co/datasets/hammer888/captcha-data)