FlowFigTabMiner YOLO Models
Five custom-trained YOLOv11 object detection models for extracting structured data from scientific flow chemistry publications. These models are core components of the FlowFigTabMiner pipeline.
Models
| Directory | Backbone | Task | Classes | Input Size |
|---|---|---|---|---|
| fig-seg/ | YOLOv11m | Figure macro segmentation | 6: caption, legend, subfigure_marker, target_image, x_axis_title, y_axis_title | 1024 px |
| fig-sca/ | YOLOv11m | Figure micro detection (scatter) | 4: data_point, data_value, x_tick_label, y_tick_label | 1024 px |
| tab-seg/ | YOLOv11m | Table segmentation | 4: table_body, table_caption, table_note, table_scheme | 1024 px |
| tab-mol/ | YOLOv11s | Molecular structure detection | 1: Structure | 1024 px |
| tab-scheme-seg/ | YOLOv11n | Reaction scheme segmentation | 4: arrow, molecule, table-condition, table-mark | 1024 px |
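When wiring these models into code, it can help to keep the class lists as a small lookup. The sketch below copies the directory names and classes from the table; `MODEL_CLASSES` and `check_names` are illustrative names, not part of the released pipeline. The index order of classes inside each checkpoint is not stated here, so the helper compares only the set of names.

```python
# Class lists copied from the model table. The order is as listed there;
# the actual id order stored in each checkpoint is an assumption to verify
# against `model.names` after loading.
MODEL_CLASSES = {
    "fig-seg": ["caption", "legend", "subfigure_marker", "target_image",
                "x_axis_title", "y_axis_title"],
    "fig-sca": ["data_point", "data_value", "x_tick_label", "y_tick_label"],
    "tab-seg": ["table_body", "table_caption", "table_note", "table_scheme"],
    "tab-mol": ["Structure"],
    "tab-scheme-seg": ["arrow", "molecule", "table-condition", "table-mark"],
}


def check_names(model_key, names):
    """Return True if an id -> name mapping contains exactly the expected classes.

    `names` is a dict shaped like the `names` attribute of a loaded
    Ultralytics model (class id -> class name).
    """
    return set(names.values()) == set(MODEL_CLASSES[model_key])
```

After loading a checkpoint with `YOLO(...)`, calling `check_names("fig-seg", model.names)` gives a quick sanity check that the expected weights were loaded.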
Performance
| Model | Precision | Recall | F1 | mAP50 | mAP50-95 |
|---|---|---|---|---|---|
| fig-seg | 88.2% | 91.7% | 89.9% | 91.7% | 66.9% |
| fig-sca | 92.1% | 91.9% | 92.0% | 92.5% | 69.3% |
| tab-seg | 91.2% | 89.9% | 90.5% | 94.8% | 77.7% |
| tab-mol | 96.2% | 95.4% | 95.8% | 95.6% | 81.2% |
| tab-scheme-seg | 93.9% | 94.3% | 94.1% | 96.5% | 72.1% |
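The F1 column can be reproduced from the precision and recall columns as their harmonic mean. The following is a minimal sketch of that check; the `REPORTED` dictionary simply restates the table, and the function name is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the performance table.
REPORTED = {
    "fig-seg": (88.2, 91.7, 89.9),
    "fig-sca": (92.1, 91.9, 92.0),
    "tab-seg": (91.2, 89.9, 90.5),
    "tab-mol": (96.2, 95.4, 95.8),
    "tab-scheme-seg": (93.9, 94.3, 94.1),
}

for name, (p, r, f1) in REPORTED.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.1f}%, reported = {f1}%")
```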
tab-scheme-seg Per-Class Performance
| Class | Images | Instances | Precision | Recall | mAP50 | mAP50-95 |
|---|---|---|---|---|---|---|
| arrow | 16 | 19 | 100% | 88.4% | 93.7% | 63.4% |
| molecule | 17 | 67 | 97.0% | 96.8% | 99.1% | 91.1% |
| table-condition | 15 | 31 | 85.5% | 95.1% | 96.1% | 73.7% |
| table-mark | 15 | 69 | 93.0% | 96.7% | 97.1% | 60.1% |
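The per-class rows appear to relate to the aggregate tab-scheme-seg row in the Performance table as an unweighted (macro) average over the four classes. This is an assumption about how the summary was computed, not something stated by the authors; the sketch below checks that the numbers agree within rounding.

```python
# (precision, recall, mAP50, mAP50-95) per class, copied from the table above.
PER_CLASS = {
    "arrow": (100.0, 88.4, 93.7, 63.4),
    "molecule": (97.0, 96.8, 99.1, 91.1),
    "table-condition": (85.5, 95.1, 96.1, 73.7),
    "table-mark": (93.0, 96.7, 97.1, 60.1),
}

# Aggregate tab-scheme-seg row from the Performance table.
REPORTED_OVERALL = (93.9, 94.3, 96.5, 72.1)


def macro_average(rows):
    """Unweighted mean of each metric column across classes."""
    n = len(rows)
    return tuple(sum(column) / n for column in zip(*rows))


means = macro_average(list(PER_CLASS.values()))
for label, mean, reported in zip(
    ("precision", "recall", "mAP50", "mAP50-95"), means, REPORTED_OVERALL
):
    print(f"{label}: macro average {mean:.2f}%, reported {reported}%")
```

Because the average is unweighted, the `arrow` class (19 instances) influences the overall score as much as `table-mark` (69 instances).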
Training Details
All models were trained on manually annotated images from flow chemistry publications.
| Model | Epochs | Batch Size | Optimizer | LR | Key Augmentation |
|---|---|---|---|---|---|
| fig-seg | 150 | 32 | AdamW | 0.01 | mosaic=0, mixup=0 |
| fig-sca | 200 | 24 | AdamW | 0.01 | mosaic=0, mixup=0.15 |
| tab-seg | 200 | 24 | AdamW | 0.001 | mosaic=0.5, mixup=0.15 |
| tab-mol | 200 | 32 | AdamW | 0.001 | mosaic=0.5, cos_lr=True |
| tab-scheme-seg | 300 | auto (-1) | AdamW | 0.0005 | mosaic=1.0, mixup=0.1, rect=True |
Full training configurations are provided in args.yaml within each model directory.
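For readers who want to retrain or fine-tune, the table can be expressed as keyword arguments for the Ultralytics `train` method. This is a sketch under stated assumptions: the "LR" column is mapped to `lr0`, the training image size is taken to be the 1024 px input size, and `TRAIN_ARGS` and `train` are illustrative names. The `args.yaml` files remain the authoritative record of the settings used.

```python
# Hyperparameters transcribed from the training table. Mapping "LR" to
# Ultralytics' `lr0` (initial learning rate) is an assumption.
TRAIN_ARGS = {
    "fig-seg": dict(epochs=150, batch=32, optimizer="AdamW", lr0=0.01,
                    mosaic=0.0, mixup=0.0),
    "fig-sca": dict(epochs=200, batch=24, optimizer="AdamW", lr0=0.01,
                    mosaic=0.0, mixup=0.15),
    "tab-seg": dict(epochs=200, batch=24, optimizer="AdamW", lr0=0.001,
                    mosaic=0.5, mixup=0.15),
    "tab-mol": dict(epochs=200, batch=32, optimizer="AdamW", lr0=0.001,
                    mosaic=0.5, cos_lr=True),
    "tab-scheme-seg": dict(epochs=300, batch=-1, optimizer="AdamW", lr0=0.0005,
                           mosaic=1.0, mixup=0.1, rect=True),
}


def train(model_key, weights, data_yaml):
    """Train `weights` on `data_yaml` using the settings listed for `model_key`."""
    # Imported lazily so the settings table can be inspected without ultralytics.
    from ultralytics import YOLO

    model = YOLO(weights)
    # imgsz=1024 assumes training used the same size as the stated input size.
    return model.train(data=data_yaml, imgsz=1024, **TRAIN_ARGS[model_key])
```

A call might look like `train("tab-mol", "yolo11s.pt", "path/to/dataset.yaml")`, where both the starting weights and the dataset path are placeholders.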
Usage
```python
from ultralytics import YOLO

# Load a model
model = YOLO("fig-seg/best.pt")

# Run inference on a figure image
results = model("path/to/figure.png", imgsz=1024)

# Access detections
for box in results[0].boxes:
    cls = int(box.cls)
    conf = float(box.conf)
    xyxy = box.xyxy[0].tolist()
    print(f"Class: {cls}, Confidence: {conf:.3f}, Box: {xyxy}")
```
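Downstream steps usually need detections grouped by class name rather than by numeric id. The helper below is a minimal sketch of that post-processing; `group_by_class` and the `min_conf` threshold are illustrative choices, not part of the released pipeline. It works on plain tuples so it can be used without loading a model.

```python
from collections import defaultdict


def group_by_class(detections, names, min_conf=0.25):
    """Group (class_id, confidence, xyxy) tuples by class name.

    `names` maps class id -> class name. Detections below `min_conf`
    are dropped, and each group is sorted by descending confidence.
    """
    grouped = defaultdict(list)
    for cls_id, conf, xyxy in detections:
        if conf >= min_conf:
            grouped[names[cls_id]].append((conf, xyxy))
    for items in grouped.values():
        items.sort(key=lambda item: item[0], reverse=True)
    return dict(grouped)
```

With Ultralytics results, the inputs can be built as `detections = [(int(b.cls), float(b.conf), b.xyxy[0].tolist()) for b in results[0].boxes]` and `names = results[0].names`.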
Pipeline Architecture
These models work together in the FlowFigTabMiner pipeline:
- fig-seg isolates the chart region from captions, legends, and axis titles
- fig-sca detects data points and tick labels within the cleaned chart
- Coordinate mapping converts pixel positions to physical values using OCR on tick labels
- tab-seg separates table body, caption, footnotes, and reaction schemes
- tab-mol detects molecular structure images for SMILES conversion via MolNexTR
- tab-scheme-seg segments reaction scheme diagrams into arrows, molecules, and conditions
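The coordinate mapping step can be illustrated with a small sketch. It assumes linear axes and that each tick label yields a pixel position (for example the center of its `x_tick_label` or `y_tick_label` box) paired with the numeric value read by OCR. The function names are hypothetical, and the pipeline's actual implementation may differ, for instance in how it handles logarithmic axes or OCR errors.

```python
def fit_axis(ticks):
    """Least-squares fit of value = a * pixel + b from (pixel, value) tick pairs."""
    n = len(ticks)
    if n < 2:
        raise ValueError("at least two tick labels are needed to calibrate an axis")
    mean_p = sum(p for p, _ in ticks) / n
    mean_v = sum(v for _, v in ticks) / n
    var_p = sum((p - mean_p) ** 2 for p, _ in ticks)
    if var_p == 0:
        raise ValueError("tick labels must sit at distinct pixel positions")
    a = sum((p - mean_p) * (v - mean_v) for p, v in ticks) / var_p
    b = mean_v - a * mean_p
    return a, b


def pixel_to_value(pixel, axis):
    """Convert a pixel coordinate to a physical value with a fitted (a, b) axis."""
    a, b = axis
    return a * pixel + b


# Illustrative tick positions (pixel, value); not taken from a real figure.
x_axis = fit_axis([(100.0, 0.0), (300.0, 10.0), (500.0, 20.0)])
# Image y grows downward, so larger values sit at smaller pixel rows.
y_axis = fit_axis([(400.0, 0.0), (200.0, 50.0)])

point_px = (200.0, 300.0)  # center of a detected data_point box
x_val = pixel_to_value(point_px[0], x_axis)
y_val = pixel_to_value(point_px[1], y_axis)
print(f"x = {x_val:.2f}, y = {y_val:.2f}")
```

Fitting over all available ticks rather than just two makes the calibration less sensitive to a single misread label.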
Other Models in FlowFigTabMiner (not included here)
These third-party pretrained models are also used in the pipeline and should be obtained from their original sources:
| Model | Source | Purpose |
|---|---|---|
| TF-ID (Florence-2-base) | yifeihu/TF-ID-base | Figure/table detection in PDF pages |
| Table Transformer (TATR) | microsoft/table-transformer-structure-recognition-v1.1-all | Table row/column/header detection |
| MolNexTR | CYF200127/MolNexTR | Molecular image to SMILES conversion |
| PaddleOCR | PaddlePaddle/PaddleOCR | Text recognition (PP-OCRv4/v5) |
Citation
If you use these models, please cite:
```bibtex
@article{zhao2025flowfigtabminer,
  title={FlowFigTabMiner: Multimodal Extraction of Structured Flow Chemistry Data from Figures, Tables, and Text Enables Organolithium Lifetime Prediction},
  author={Zhao, Wenyuan and Zhong, Xianzhu and Wang, Simeng and Nagaki, Aiichiro},
  year={2025}
}
```
License
Apache 2.0