---
license: apache-2.0
tags:
- gguf
- ocr
- scene-text
- parseq
- crispembed
base_model: baudm/parseq
---
# PARSeq — Scene Text Recognition (GGUF)
GGUF conversions of [PARSeq](https://github.com/baudm/parseq) (ECCV 2022) for use with [CrispEmbed](https://github.com/CrispStrobe/CrispEmbed).
PARSeq is a scene text recognition model that reads text from natural images (signs, labels, documents). It recognizes 94 printable ASCII characters (digits, letters, punctuation).
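To make the 94-character figure concrete, the set can be rebuilt from Python's standard `string` constants. This is only an illustration: the ordering below is an assumption, and the order stored in the GGUF files may differ.

```python
import string

# Assumed ordering: digits, lowercase, uppercase, punctuation.
# The order used inside the GGUF files is not stated in this README.
CHARSET = (
    string.digits
    + string.ascii_lowercase
    + string.ascii_uppercase
    + string.punctuation
)

print(len(CHARSET))    # 10 digits + 52 letters + 32 punctuation marks
print(" " in CHARSET)  # the space character is not part of the set
```

Because the space is excluded, each prediction is a single word-like string rather than a multi-word line.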
## Architecture
- **Encoder**: 12-layer pre-LN ViT (patch 4×8, input 32×128 RGB, 128 tokens, GELU FFN)
- **Decoder**: 1-layer two-stream Transformer (XLNet-style position queries + context self-attention, then cross-attention to encoder memory)
- **Head**: Linear → 95 classes (94 printable ASCII chars + EOS)
- **Inference**: Autoregressive greedy decode (max 25 characters)
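The token count and the decoding loop described above can be sketched in a few lines. This is a minimal illustration, not CrispEmbed's implementation: `step_logits` is a hypothetical callable standing in for the decoder plus head, and the EOS index used in the toy example is an assumption.

```python
from typing import Callable, List, Sequence


def num_patch_tokens(height: int = 32, width: int = 128,
                     patch_h: int = 4, patch_w: int = 8) -> int:
    """Number of ViT tokens for a given input size and patch size."""
    return (height // patch_h) * (width // patch_w)


def greedy_decode(step_logits: Callable[[List[int]], Sequence[float]],
                  eos_id: int, max_len: int = 25) -> List[int]:
    """Pick the highest-scoring class at each step until EOS or max_len."""
    tokens: List[int] = []
    for _ in range(max_len):
        logits = step_logits(tokens)  # scores over the 95 classes
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens


EOS_ID = 0  # assumed index for this toy example only


def toy_step(prefix: List[int]) -> List[float]:
    """Hypothetical stand-in for the model: emits 3, 1, 4, then EOS."""
    script = [3, 1, 4, EOS_ID]
    logits = [0.0] * 95
    logits[script[len(prefix)]] = 1.0
    return logits


print(num_patch_tokens())               # (32 / 4) * (128 / 8) tokens
print(greedy_decode(toy_step, EOS_ID))  # class ids decoded before EOS
```

The `max_len` cap is what enforces the 25-character limit when the model never emits EOS.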
## Variants
| File | Variant | Params | Size | Notes |
|------|---------|--------|------|-------|
| `parseq-f32.gguf` | Base | 24M | 91 MB | Full precision |
| `parseq-q8_0.gguf` | Base | 24M | 24 MB | Best quantized |
| `parseq-q4_k.gguf` | Base | 24M | 13 MB | Smallest base |
| `parseq-tiny-f16.gguf` | Tiny | 6M | 12 MB | Half precision |
| `parseq-tiny-q8_0.gguf` | Tiny | 6M | 6 MB | Smallest overall |
All quantization levels produce identical output on test images.
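The file sizes in the table can be sanity-checked from the parameter counts. The sketch below assumes the nominal bits per weight of each GGML format (Q8_0 stores 32 int8 weights plus one fp16 scale, about 8.5 bits per weight; Q4_K averages about 4.5) and ignores metadata and any tensors kept at higher precision, so it only gives a rough estimate.

```python
# Nominal storage cost per weight for each format (assumption: every tensor
# uses the named format, which real GGUF files do not always do).
BITS_PER_WEIGHT = {"f32": 32.0, "f16": 16.0, "q8_0": 8.5, "q4_k": 4.5}


def estimate_size_mb(n_params: float, quant: str) -> float:
    """Rough file size in megabytes (10**6 bytes), excluding metadata."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e6


for name, params, quant in [
    ("parseq-f32.gguf", 24e6, "f32"),
    ("parseq-q8_0.gguf", 24e6, "q8_0"),
    ("parseq-q4_k.gguf", 24e6, "q4_k"),
    ("parseq-tiny-f16.gguf", 6e6, "f16"),
    ("parseq-tiny-q8_0.gguf", 6e6, "q8_0"),
]:
    print(f"{name}: ~{estimate_size_mb(params, quant):.1f} MB")
```

The estimates come out close to, and mostly a little above, the listed sizes, which is plausible given that the parameter counts in the table are rounded.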
## Usage
```bash
# CLI
crispembed -m parseq-q8_0.gguf --ocr image.png
# Auto-download
crispembed -m parseq --auto-download --ocr image.png
```
```python
from crispembed import CrispMathOcr

# Load a GGUF file from the table above
ocr = CrispMathOcr("parseq-q8_0.gguf")
# Recognize the text in a cropped word image
text = ocr.recognize("sign.png")
print(text)
```
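For several cropped word images, a thin wrapper around the call shown above may be convenient. This is a sketch that assumes `recognize` accepts a file path and returns a string, as the example suggests; `recognize_many` and the `crops` directory are hypothetical names, not part of CrispEmbed.

```python
from pathlib import Path
from typing import Dict, Iterable, Protocol, Union


class Recognizer(Protocol):
    """Anything with a recognize(path) -> str method (assumed interface)."""

    def recognize(self, path: str) -> str: ...


def recognize_many(ocr: Recognizer,
                   paths: Iterable[Union[str, Path]]) -> Dict[str, str]:
    """Run the recognizer on each path and map path -> recognized text."""
    return {str(p): ocr.recognize(str(p)) for p in paths}


# Intended use (requires crispembed and a model file):
# from crispembed import CrispMathOcr
# ocr = CrispMathOcr("parseq-q8_0.gguf")
# results = recognize_many(ocr, sorted(Path("crops").glob("*.png")))
```

Loading the model once and reusing the same object avoids re-reading the GGUF file for every image.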
## Benchmark (94-char, PARSeq-base)
| Dataset | Accuracy |
|---------|----------|
| IIIT5k | 99.1% |
| SVT | 97.9% |
| IC13-1015 | 98.1% |
| IC15-2077 | 89.2% |
| SVTP | 96.9% |
| CUTE80 | 98.6% |
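The README does not state the metric behind these figures. Assuming it is the usual word accuracy for these benchmarks (a prediction counts only if it matches the label exactly, case included for the 94-character setting), it can be computed as in this sketch.

```python
from typing import Sequence


def word_accuracy(predictions: Sequence[str], labels: Sequence[str]) -> float:
    """Percentage of predictions that match their label exactly."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        return 0.0
    correct = sum(p == t for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Toy data: "Exit" vs "EXIT" is a miss under case-sensitive matching.
print(word_accuracy(["STOP", "Exit", "cafe"], ["STOP", "EXIT", "cafe"]))
```

A single wrong character makes the whole word count as an error, so the metric is stricter than character-level accuracy.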
## Source
- Paper: [Scene Text Recognition with Permuted Autoregressive Sequence Models](https://arxiv.org/abs/2207.06966) (ECCV 2022)
- Code: [baudm/parseq](https://github.com/baudm/parseq) (Apache-2.0)
- Converted with `models/convert-parseq-to-gguf.py` from CrispEmbed