FlowFigTabMiner YOLO Models

Five custom-trained YOLOv11 object detection models for extracting structured data from scientific flow chemistry publications. These models are core components of the FlowFigTabMiner pipeline.

Models

Directory	Backbone	Task	Classes	Input Size
`fig-seg/`	YOLOv11m	Figure macro segmentation	6: caption, legend, subfigure_marker, target_image, x_axis_title, y_axis_title	1024 px
`fig-sca/`	YOLOv11m	Figure micro detection (scatter)	4: data_point, data_value, x_tick_label, y_tick_label	1024 px
`tab-seg/`	YOLOv11m	Table segmentation	4: table_body, table_caption, table_note, table_scheme	1024 px
`tab-mol/`	YOLOv11s	Molecular structure detection	1: Structure	1024 px
`tab-scheme-seg/`	YOLOv11n	Reaction scheme segmentation	4: arrow, molecule, table-condition, table-mark	1024 px

Performance

Model	Precision	Recall	F1	mAP50	mAP50-95
fig-seg	88.2%	91.7%	89.9%	91.7%	66.9%
fig-sca	92.1%	91.9%	92.0%	92.5%	69.3%
tab-seg	91.2%	89.9%	90.5%	94.8%	77.7%
tab-mol	96.2%	95.4%	95.8%	95.6%	81.2%
tab-scheme-seg	93.9%	94.3%	94.1%	96.5%	72.1%
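
The F1 column is the harmonic mean of precision and recall. As a quick consistency check, the sketch below recomputes it from the rounded table values; the function name `f1_score` is illustrative and not part of the pipeline, and small differences are possible if the reported F1 was derived from unrounded metrics.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units in, same units out)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs in percent, copied from the table above
reported = {
    "fig-seg": (88.2, 91.7),
    "fig-sca": (92.1, 91.9),
    "tab-seg": (91.2, 89.9),
    "tab-mol": (96.2, 95.4),
    "tab-scheme-seg": (93.9, 94.3),
}

f1_by_model = {name: round(f1_score(p, r), 1) for name, (p, r) in reported.items()}

for name, f1 in f1_by_model.items():
    print(f"{name}: F1 = {f1}%")
```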

tab-scheme-seg Per-Class Performance

Class	Images	Instances	Precision	Recall	mAP50	mAP50-95
arrow	16	19	100%	88.4%	93.7%	63.4%
molecule	17	67	97.0%	96.8%	99.1%	91.1%
table-condition	15	31	85.5%	95.1%	96.1%	73.7%
table-mark	15	69	93.0%	96.7%	97.1%	60.1%
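
The per-class rows can be related back to the aggregate tab-scheme-seg row in the Performance table. The sketch below assumes the aggregate figures are unweighted (macro) means over the four classes, which is how Ultralytics usually reports its summary row; the helper name `macro_average` is illustrative.

```python
# Per-class metrics in percent, copied from the table above
per_class = {
    "arrow":           {"precision": 100.0, "recall": 88.4, "mAP50": 93.7, "mAP50-95": 63.4},
    "molecule":        {"precision": 97.0,  "recall": 96.8, "mAP50": 99.1, "mAP50-95": 91.1},
    "table-condition": {"precision": 85.5,  "recall": 95.1, "mAP50": 96.1, "mAP50-95": 73.7},
    "table-mark":      {"precision": 93.0,  "recall": 96.7, "mAP50": 97.1, "mAP50-95": 60.1},
}


def macro_average(rows: dict, metric: str) -> float:
    """Unweighted mean of one metric across classes (assumed aggregation)."""
    return sum(row[metric] for row in rows.values()) / len(rows)


for metric in ("precision", "recall", "mAP50", "mAP50-95"):
    print(f"{metric}: {macro_average(per_class, metric):.3f}%")
```

Under this assumption, the class means agree with the aggregate row to within rounding of the per-class values.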

Training Details

All models were trained on manually annotated images from flow chemistry publications.

Model	Epochs	Batch Size	Optimizer	LR	Key Settings
fig-seg	150	32	AdamW	0.01	mosaic=0, mixup=0
fig-sca	200	24	AdamW	0.01	mosaic=0, mixup=0.15
tab-seg	200	24	AdamW	0.001	mosaic=0.5, mixup=0.15
tab-mol	200	32	AdamW	0.001	mosaic=0.5, cos_lr=True
tab-scheme-seg	300	auto (-1)	AdamW	0.0005	mosaic=1.0, mixup=0.1, rect=True

Full training configurations are provided in `args.yaml` within each model directory.
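
As a rough illustration, the tab-mol row might map onto Ultralytics training arguments as sketched below. Several things here are assumptions rather than facts from this card: that the "LR" column corresponds to the initial learning rate `lr0`, that training used the 1024 px input size, that training started from the stock `yolo11s.pt` checkpoint, and the dataset file path passed as `data_yaml`. The `args.yaml` in each model directory remains the authoritative record.

```python
# Hyperparameters for tab-mol, transcribed from the training table.
# Assumption: "LR" maps to Ultralytics' `lr0`; `imgsz` is taken from the
# Input Size column of the Models table.
TAB_MOL_TRAIN_ARGS = {
    "epochs": 200,
    "batch": 32,
    "imgsz": 1024,
    "optimizer": "AdamW",
    "lr0": 0.001,
    "mosaic": 0.5,
    "cos_lr": True,
}


def train_tab_mol(data_yaml: str, weights: str = "yolo11s.pt"):
    """Start a training run.

    `data_yaml` (dataset definition) and `weights` (starting checkpoint)
    are hypothetical placeholders, not values documented in this card.
    """
    # Imported lazily so the config can be inspected without ultralytics installed
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, **TAB_MOL_TRAIN_ARGS)
```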

Usage

from ultralytics import YOLO

# Load a model
model = YOLO("fig-seg/best.pt")

# Run inference on a figure image
results = model("path/to/figure.png", imgsz=1024)

# Access detections
for box in results[0].boxes:
    cls = int(box.cls)
    conf = float(box.conf)
    xyxy = box.xyxy[0].tolist()
    print(f"Class: {cls}, Confidence: {conf:.3f}, Box: {xyxy}")
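
Class indices are easier to work with once mapped to class names, which Ultralytics exposes as `results[0].names`. The sketch below groups detections by name and drops low-confidence boxes. The helper `group_by_class`, the 0.25 threshold, the index order in the example `names` mapping, and the sample detections are all illustrative assumptions; read the real mapping from the loaded model.

```python
from collections import defaultdict


def group_by_class(detections, names, min_conf=0.25):
    """Group (class_id, confidence, xyxy) tuples by class name.

    Boxes with confidence below `min_conf` are discarded.
    """
    grouped = defaultdict(list)
    for cls_id, conf, xyxy in detections:
        if conf >= min_conf:
            grouped[names[cls_id]].append((conf, xyxy))
    return dict(grouped)


# With real results, the inputs could be built like this:
#   names = results[0].names
#   detections = [(int(b.cls), float(b.conf), b.xyxy[0].tolist())
#                 for b in results[0].boxes]

# Assumed index order for fig-seg; check model.names for the actual one
names = {0: "caption", 1: "legend", 2: "subfigure_marker",
         3: "target_image", 4: "x_axis_title", 5: "y_axis_title"}

# Made-up detections for illustration only
detections = [
    (3, 0.97, [40.0, 30.0, 900.0, 700.0]),
    (0, 0.88, [40.0, 720.0, 900.0, 800.0]),
    (1, 0.12, [700.0, 50.0, 880.0, 120.0]),  # below threshold, dropped
]

grouped = group_by_class(detections, names)
for class_name, boxes in grouped.items():
    print(class_name, boxes)
```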

Pipeline Architecture

These models work together in the FlowFigTabMiner pipeline:

1. fig-seg isolates the chart region from captions, legends, and axis titles
2. fig-sca detects data points and tick labels within the cleaned chart
3. Coordinate mapping converts pixel positions to physical values using OCR on tick labels
4. tab-seg separates table body, caption, footnotes, and reaction schemes
5. tab-mol detects molecular structure images for SMILES conversion via MolNexTR
6. tab-scheme-seg segments reaction scheme diagrams into arrows, molecules, and conditions
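
The coordinate mapping step can be illustrated with a simple linear calibration. This is a minimal sketch and not necessarily the pipeline's actual method: it assumes linear (not logarithmic) axes, that each tick label has already been reduced to a pixel position (for example, the center of its box) and an OCR value parsed to a float, and the function names `fit_axis` and `pixel_to_value` are hypothetical.

```python
def fit_axis(pixels, values):
    """Least-squares fit of value = slope * pixel + intercept for one axis."""
    n = len(pixels)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (pixel, value) tick pairs")
    mean_p = sum(pixels) / n
    mean_v = sum(values) / n
    var_p = sum((p - mean_p) ** 2 for p in pixels)
    if var_p == 0:
        raise ValueError("tick pixel positions must not all be identical")
    cov = sum((p - mean_p) * (v - mean_v) for p, v in zip(pixels, values))
    slope = cov / var_p
    return slope, mean_v - slope * mean_p


def pixel_to_value(pixel, axis):
    """Apply a fitted (slope, intercept) pair to a pixel coordinate."""
    slope, intercept = axis
    return slope * pixel + intercept


# Made-up tick positions and OCR readings for illustration only.
# Image y grows downward, so the y-axis slope comes out negative.
x_axis = fit_axis(pixels=[100, 300, 500], values=[0.0, 10.0, 20.0])
y_axis = fit_axis(pixels=[600, 400, 200], values=[0.0, 50.0, 100.0])

# Center of a detected data_point box, in pixels
point_px = (350, 300)
x_val = pixel_to_value(point_px[0], x_axis)
y_val = pixel_to_value(point_px[1], y_axis)
print(f"data point: x = {x_val:.2f}, y = {y_val:.2f}")
```

Fitting over all detected ticks, rather than using only two of them, dampens the effect of a single misread tick label; a logarithmic axis would need the fit applied to the logarithm of the tick values instead.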

Other Models in FlowFigTabMiner (not included here)

These third-party pretrained models are also used in the pipeline and should be obtained from their original sources:

Model	Source	Purpose
TF-ID (Florence-2-base)	yifeihu/TF-ID-base	Figure/table detection in PDF pages
Table Transformer (TATR)	microsoft/table-transformer-structure-recognition-v1.1-all	Table row/column/header detection
MolNexTR	CYF200127/MolNexTR	Molecular image to SMILES conversion
PaddleOCR	PaddlePaddle/PaddleOCR	Text recognition (PP-OCRv4/v5)

Citation

If you use these models, please cite:

@article{zhao2025flowfigtabminer,
  title={FlowFigTabMiner: Multimodal Extraction of Structured Flow Chemistry Data from Figures, Tables, and Text Enables Organolithium Lifetime Prediction},
  author={Zhao, Wenyuan and Zhong, Xianzhu and Wang, Simeng and Nagaki, Aiichiro},
  year={2025}
}

License

Apache 2.0
