---
license: apache-2.0
language:
  - en
tags:
  - chemistry
  - ocr
  - chemical-structure
  - document-understanding
  - image-to-text
  - vision-language-model
  - patent-analysis
  - smoldocling
  - idefics3
library_name: transformers
pipeline_tag: image-to-text
base_model: ds4sd/SmolDocling-256M-preview
---

<p align="center">
  <img src="ChemicalOCR_banner.png" alt="ChemicalOCR Banner" width="100%">
</p>


**ChemicalOCR** is a compact vision-language model fine-tuned specifically for optical character recognition (OCR) in chemical structure images. It extracts text and bounding boxes from molecular drawings, enabling the recognition of atom labels, abbreviations, and descriptive text within chemical structures.

## Overview

ChemicalOCR is fine-tuned from [SmolDocling](https://huggingface.co/ds4sd/SmolDocling-256M-preview), a lightweight 256M-parameter model originally developed for end-to-end document conversion. Specializing the model for chemical images lets it outperform general-purpose OCR systems such as PaddleOCR v5 and EasyOCR on chemical structure recognition tasks (see [Performance](#performance)).

### What it does

- Extracts **character-level text and bounding boxes** from chemical structure images
- Recognizes **atom labels**, **chemical abbreviations**, and **descriptive text** near or below molecular structures

## Architecture

- **Base model**: SmolDocling-256M (Idefics3ForConditionalGeneration)
- **Parameters**: 256M
- **Input**: Chemical structure image
- **Output**: Text detections with bounding boxes

## Performance

ChemicalOCR scores 86.5 to 93.0 F1 at an IoU threshold of 0.5 on three chemical structure benchmarks, where PaddleOCR v5 and EasyOCR reach at most 18.4:

| Benchmark | ChemicalOCR (Ours) | PaddleOCR v5 | EasyOCR |
|---|---|---|---|
| **M2S** (103 images) — F1@IoU₀.₅ | **87.2** | 7.7 | 10.2 |
| **USPTO-M** (74 images) — F1@IoU₀.₅ | **93.0** | 1.2 | 18.0 |
| **IP5-M** (1000 images) — F1@IoU₀.₅ | **86.5** | 1.9 | 18.4 |

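### Detailed Metrics

All values in this table are for ChemicalOCR; the F1 column repeats the scores from the comparison above.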

| Benchmark | Precision | Recall | F1 | Accuracy@IoU₀.₅ |
|---|---|---|---|---|
| M2S | 86.9 | 87.4 | 87.2 | 32.0 |
| USPTO-M | 93.5 | 92.6 | 93.0 | 63.5 |
| IP5-M | 85.6 | 87.4 | 86.5 | 69.5 |
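
To make the F1@IoU₀.₅ metric concrete, here is a minimal sketch of how precision, recall, and F1 could be computed for text detections. The matching protocol is an assumption: this card does not state how the benchmarks pair predictions with ground truth, so the sketch uses greedy one-to-one matching and requires both exact text equality and IoU ≥ 0.5. The `Detection` class and the function names are hypothetical helpers, not part of the released code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical container for one recognized text span."""
    text: str
    box: tuple  # (x1, y1, x2, y2) in pixels


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_f1(preds, gts, iou_threshold=0.5):
    """Greedy one-to-one matching (assumed protocol, see lead-in)."""
    matched = set()  # indices of ground-truth items already used
    true_positives = 0
    for pred in preds:
        best_idx, best_iou = None, iou_threshold
        for idx, gt in enumerate(gts):
            if idx in matched or gt.text != pred.text:
                continue
            score = iou(pred.box, gt.box)
            if score >= best_iou:
                best_idx, best_iou = idx, score
        if best_idx is not None:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(preds) if preds else 0.0
    recall = true_positives / len(gts) if gts else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Toy data: two of four predictions match two of three ground-truth labels.
gts = [
    Detection("OH", (10, 10, 30, 20)),
    Detection("N", (50, 50, 60, 60)),
    Detection("Me", (80, 10, 100, 20)),
]
preds = [
    Detection("OH", (11, 10, 31, 20)),   # slightly shifted, still a match
    Detection("N", (70, 70, 80, 80)),    # right text, wrong location
    Detection("Me", (80, 10, 100, 20)),  # exact match
    Detection("Cl", (0, 0, 5, 5)),       # spurious detection
]
precision, recall, f1 = precision_recall_f1(preds, gts)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In this toy case a prediction with the right text but the wrong location counts against both precision and recall, which is why general-purpose OCR systems that misplace or merge atom labels can score low on this kind of metric.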

## Usage

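```python
from transformers import AutoProcessor, AutoModelForVision2Seq
from PIL import Image

# Load the fine-tuned model and its matching processor from the Hub.
model = AutoModelForVision2Seq.from_pretrained("docling-project/ChemicalOCR")
processor = AutoProcessor.from_pretrained("docling-project/ChemicalOCR")

# Convert to RGB so grayscale or RGBA drawings are handled uniformly.
image = Image.open("chemical_structure.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Generate the output sequence and decode it to a string.
outputs = model.generate(**inputs)
result = processor.decode(outputs[0], skip_special_tokens=True)
print(result)
```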
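
The decoded string still has to be turned into text and box pairs. The sketch below shows one way to do that, under a loudly stated assumption: that the output uses SmolDocling-style location tokens of the form `<loc_x1><loc_y1><loc_x2><loc_y2>` followed by the text, on a 0 to 500 coordinate grid. This card does not document the output format, so the pattern, the `grid` default, and the `parse_detections` helper are all hypothetical and should be checked against real model output. If the location tokens are registered as special tokens, they may only survive decoding with `skip_special_tokens=False`.

```python
import re

# Assumed format: four location tokens followed by the recognized text.
LOC_PATTERN = re.compile(
    r"<loc_(\d+)><loc_(\d+)><loc_(\d+)><loc_(\d+)>([^<]*)"
)


def parse_detections(decoded, image_width, image_height, grid=500):
    """Return (text, (x1, y1, x2, y2)) pairs with boxes scaled to pixels.

    `grid` is the assumed size of the normalized coordinate grid.
    """
    detections = []
    for match in LOC_PATTERN.finditer(decoded):
        x1, y1, x2, y2 = (int(match.group(i)) for i in range(1, 5))
        text = match.group(5).strip()
        if not text:
            continue  # skip location tokens with no text attached
        box = (
            x1 * image_width / grid,
            y1 * image_height / grid,
            x2 * image_width / grid,
            y2 * image_height / grid,
        )
        detections.append((text, box))
    return detections


# Synthetic string written for illustration, not real model output.
sample = (
    "<loc_50><loc_100><loc_100><loc_150>OH"
    "<loc_250><loc_250><loc_300><loc_300>NH2"
)
parsed = parse_detections(sample, image_width=1000, image_height=500)
for text, box in parsed:
    print(text, box)
```

Scaling by the original image size keeps the boxes in pixel coordinates, which is convenient for overlaying detections on the input drawing.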

## Part of MarkushGrapher-2

ChemicalOCR is a key component of [MarkushGrapher-2](https://huggingface.co/docling-project/MarkushGrapher-2), an end-to-end model for multimodal recognition of chemical structures from patent documents. In the MarkushGrapher-2 pipeline, ChemicalOCR provides the text and layout information that is fused with visual features for Markush structure recognition.


## Citation

```bibtex
@inproceedings{strohmeyer2026markushgrapher2,
  title     = {MarkushGrapher-2: End-to-end Multimodal Recognition of Chemical Structures},
  author    = {Strohmeyer, Tim and Morin, Lucas and Meijer, Gerhard Ingmar and Weber, Valery and Nassar, Ahmed and Staar, Peter W. J.},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}
```

## License

This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).