FlowFigTabMiner YOLO Models

Five custom-trained YOLOv11 object detection models for extracting structured data from scientific flow chemistry publications. These models are core components of the FlowFigTabMiner pipeline.

Models

| Directory | Backbone | Task | Classes | Input Size |
|---|---|---|---|---|
| fig-seg/ | YOLOv11m | Figure macro segmentation | 6: caption, legend, subfigure_marker, target_image, x_axis_title, y_axis_title | 1024 px |
| fig-sca/ | YOLOv11m | Figure micro detection (scatter) | 4: data_point, data_value, x_tick_label, y_tick_label | 1024 px |
| tab-seg/ | YOLOv11m | Table segmentation | 4: table_body, table_caption, table_note, table_scheme | 1024 px |
| tab-mol/ | YOLOv11s | Molecular structure detection | 1: Structure | 1024 px |
| tab-scheme-seg/ | YOLOv11n | Reaction scheme segmentation | 4: arrow, molecule, table-condition, table-mark | 1024 px |

Performance

| Model | Precision | Recall | F1 | mAP50 | mAP50-95 |
|---|---|---|---|---|---|
| fig-seg | 88.2% | 91.7% | 89.9% | 91.7% | 66.9% |
| fig-sca | 92.1% | 91.9% | 92.0% | 92.5% | 69.3% |
| tab-seg | 91.2% | 89.9% | 90.5% | 94.8% | 77.7% |
| tab-mol | 96.2% | 95.4% | 95.8% | 95.6% | 81.2% |
| tab-scheme-seg | 93.9% | 94.3% | 94.1% | 96.5% | 72.1% |
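
The F1 column is not explained in the text. The short check below assumes it is the usual harmonic mean of the reported precision and recall; under that assumption the computed values agree with the table to within rounding. The dictionary and function names are illustrative only.

```python
# Precision, recall, and reported F1 (all in percent), copied from the table above.
REPORTED = {
    "fig-seg": (88.2, 91.7, 89.9),
    "fig-sca": (92.1, 91.9, 92.0),
    "tab-seg": (91.2, 89.9, 90.5),
    "tab-mol": (96.2, 95.4, 95.8),
    "tab-scheme-seg": (93.9, 94.3, 94.1),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


for name, (p, r, f1_reported) in REPORTED.items():
    # Assumption: the reported F1 is derived from the rounded P and R shown here.
    print(f"{name}: computed F1 = {f1_score(p, r):.2f}%, reported = {f1_reported}%")
```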

tab-scheme-seg Per-Class Performance

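| Class | Images | Instances | Precision | Recall | mAP50 | mAP50-95 |
|---|---|---|---|---|---|---|
| arrow | 16 | 19 | 100% | 88.4% | 93.7% | 63.4% |
| molecule | 17 | 67 | 97.0% | 96.8% | 99.1% | 91.1% |
| table-condition | 15 | 31 | 85.5% | 95.1% | 96.1% | 73.7% |
| table-mark | 15 | 69 | 93.0% | 96.7% | 97.1% | 60.1% |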
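
The per-class numbers appear consistent with the overall tab-scheme-seg row being an unweighted (macro) mean over the four classes. That relationship is an inference from the numbers, not something the model card states; the sketch below simply recomputes the means so the reader can compare them.

```python
# Per-class (precision, recall, mAP50, mAP50-95) in percent, from the table above.
PER_CLASS = {
    "arrow": (100.0, 88.4, 93.7, 63.4),
    "molecule": (97.0, 96.8, 99.1, 91.1),
    "table-condition": (85.5, 95.1, 96.1, 73.7),
    "table-mark": (93.0, 96.7, 97.1, 60.1),
}
# Overall tab-scheme-seg row from the Performance table.
OVERALL = (93.9, 94.3, 96.5, 72.1)
METRICS = ("precision", "recall", "mAP50", "mAP50-95")


def macro_average(rows):
    """Unweighted mean of each metric column across classes."""
    rows = list(rows)
    return tuple(sum(column) / len(rows) for column in zip(*rows))


means = macro_average(PER_CLASS.values())
for metric, mean, reported in zip(METRICS, means, OVERALL):
    print(f"{metric}: macro mean = {mean:.3f}%, reported overall = {reported}%")
```

Because the mean is unweighted, the 19 arrow instances count as much as the 69 table-mark instances, so the arrow recall of 88.4% pulls the overall recall down more than its instance share alone would suggest.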

Training Details

All models were trained on manually annotated images from flow chemistry publications.

| Model | Epochs | Batch Size | Optimizer | LR | Key Augmentation |
|---|---|---|---|---|---|
| fig-seg | 150 | 32 | AdamW | 0.01 | mosaic=0, mixup=0 |
| fig-sca | 200 | 24 | AdamW | 0.01 | mosaic=0, mixup=0.15 |
| tab-seg | 200 | 24 | AdamW | 0.001 | mosaic=0.5, mixup=0.15 |
| tab-mol | 200 | 32 | AdamW | 0.001 | mosaic=0.5, cos_lr=True |
| tab-scheme-seg | 300 | auto (-1) | AdamW | 0.0005 | mosaic=1.0, mixup=0.1, rect=True |

Full training configurations are provided in args.yaml within each model directory.
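
As a rough illustration of how the table rows could map onto Ultralytics training arguments, the sketch below collects them into a dictionary. Several things here are assumptions rather than facts from this card: that the LR column corresponds to `lr0`, that training used `imgsz=1024`, that training started from the stock `yolo11*.pt` weights, and the dataset YAML path. The `args.yaml` files remain the authoritative source.

```python
# Hyperparameters transcribed from the Training Details table.
# "weights" entries are assumed starting checkpoints, not confirmed by the card.
TRAIN_OVERRIDES = {
    "fig-seg": dict(weights="yolo11m.pt", epochs=150, batch=32, lr0=0.01,
                    mosaic=0.0, mixup=0.0),
    "fig-sca": dict(weights="yolo11m.pt", epochs=200, batch=24, lr0=0.01,
                    mosaic=0.0, mixup=0.15),
    "tab-seg": dict(weights="yolo11m.pt", epochs=200, batch=24, lr0=0.001,
                    mosaic=0.5, mixup=0.15),
    "tab-mol": dict(weights="yolo11s.pt", epochs=200, batch=32, lr0=0.001,
                    mosaic=0.5, cos_lr=True),
    "tab-scheme-seg": dict(weights="yolo11n.pt", epochs=300, batch=-1, lr0=0.0005,
                           mosaic=1.0, mixup=0.1, rect=True),
}


def build_train_kwargs(name: str, data_yaml: str):
    """Return (weights, kwargs) for one model; data_yaml is a hypothetical path."""
    cfg = dict(TRAIN_OVERRIDES[name])
    weights = cfg.pop("weights")
    # Assumption: all models were trained at 1024 px with AdamW, as the tables suggest.
    cfg.update(data=data_yaml, imgsz=1024, optimizer="AdamW")
    return weights, cfg


def train(name: str, data_yaml: str):
    """Launch training; ultralytics is imported lazily so the config is usable without it."""
    from ultralytics import YOLO

    weights, kwargs = build_train_kwargs(name, data_yaml)
    return YOLO(weights).train(**kwargs)
```

Calling `train("tab-mol", "path/to/dataset.yaml")` would start a run with these settings; any argument not listed falls back to the Ultralytics default, which may differ from what `args.yaml` records.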

Usage

from ultralytics import YOLO

# Load a model
model = YOLO("fig-seg/best.pt")

# Run inference on a figure image
results = model("path/to/figure.png", imgsz=1024)

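# Access detections and map class indices to class names
names = results[0].names  # dict: class index -> class name
for box in results[0].boxes:
    cls = int(box.cls)
    conf = float(box.conf)
    xyxy = box.xyxy[0].tolist()  # [x1, y1, x2, y2] in pixels
    print(f"Class: {names[cls]} ({cls}), Confidence: {conf:.3f}, Box: {xyxy}")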
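
A common next step is to pull out one region by class name, for example the `target_image` box from fig-seg, and crop it for a downstream model. The helpers below are a minimal sketch of that idea, not the pipeline's actual code: the function names, the 0.25 confidence threshold, and the choice of the single highest-confidence box are all assumptions.

```python
from typing import List, Optional, Tuple

# (class name, confidence, [x1, y1, x2, y2] in pixels)
Detection = Tuple[str, float, List[float]]


def to_detections(result) -> List[Detection]:
    """Flatten one Ultralytics result into (name, confidence, xyxy) tuples."""
    names = result.names  # maps class index -> class name
    return [(names[int(b.cls)], float(b.conf), b.xyxy[0].tolist())
            for b in result.boxes]


def best_box(detections: List[Detection], class_name: str,
             min_conf: float = 0.25) -> Optional[List[float]]:
    """Return the highest-confidence box of the given class, or None if absent."""
    candidates = [d for d in detections if d[0] == class_name and d[1] >= min_conf]
    if not candidates:
        return None
    return max(candidates, key=lambda d: d[1])[2]


def crop_chart(image_path: str, weights: str = "fig-seg/best.pt"):
    """Run fig-seg and crop the detected target_image region (None if not found)."""
    from PIL import Image          # imported lazily; requires Pillow
    from ultralytics import YOLO   # imported lazily; requires ultralytics

    result = YOLO(weights)(image_path, imgsz=1024)[0]
    box = best_box(to_detections(result), "target_image")
    if box is None:
        return None
    return Image.open(image_path).crop(tuple(int(round(v)) for v in box))
```

Keeping `best_box` independent of Ultralytics makes the selection logic easy to test with hand-written tuples.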

Pipeline Architecture

These models work together in the FlowFigTabMiner pipeline:

  1. fig-seg isolates the chart region from captions, legends, and axis titles
  2. fig-sca detects data points and tick labels within the cleaned chart
  3. Coordinate mapping converts pixel positions to physical values using OCR on tick labels
  4. tab-seg separates table body, caption, footnotes, and reaction schemes
  5. tab-mol detects molecular structure images for SMILES conversion via MolNexTR
  6. tab-scheme-seg segments reaction scheme diagrams into arrows, molecules, and conditions
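
Step 3 can be illustrated with a small sketch. It assumes linear axes and that OCR has already turned each detected tick label into a numeric value paired with the pixel position of its box centre; the function names and the least-squares fit are illustrative choices, not necessarily what FlowFigTabMiner does.

```python
from typing import Sequence, Tuple


def fit_axis(pixels: Sequence[float], values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line value = slope * pixel + intercept for one axis."""
    n = len(pixels)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (pixel, value) tick pairs")
    mean_p = sum(pixels) / n
    mean_v = sum(values) / n
    var_p = sum((p - mean_p) ** 2 for p in pixels)
    if var_p == 0:
        raise ValueError("tick pixel positions must not all coincide")
    cov = sum((p - mean_p) * (v - mean_v) for p, v in zip(pixels, values))
    slope = cov / var_p
    return slope, mean_v - slope * mean_p


def pixel_to_value(pixel: float, axis: Tuple[float, float]) -> float:
    """Map a pixel coordinate to a physical value using a fitted axis."""
    slope, intercept = axis
    return slope * pixel + intercept


# Hypothetical tick data: pixel centres of x/y tick labels and their OCR'd values.
x_axis = fit_axis([100.0, 300.0, 500.0], [0.0, 10.0, 20.0])
y_axis = fit_axis([400.0, 250.0, 100.0], [0.0, 50.0, 100.0])  # image y grows downward

# Centre of one detected data_point box, in pixels.
point_px = (200.0, 325.0)
point_xy = (pixel_to_value(point_px[0], x_axis), pixel_to_value(point_px[1], y_axis))
print(point_xy)
```

Fitting across all ticks rather than using just two helps average out small OCR box offsets, and the negative slope on the y axis accounts for image coordinates increasing downward. Logarithmic axes would need the values transformed before fitting.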

Other Models in FlowFigTabMiner (not included here)

These third-party pretrained models are also used in the pipeline and should be obtained from their original sources:

| Model | Source | Purpose |
|---|---|---|
| TF-ID (Florence-2-base) | yifeihu/TF-ID-base | Figure/table detection in PDF pages |
| Table Transformer (TATR) | microsoft/table-transformer-structure-recognition-v1.1-all | Table row/column/header detection |
| MolNexTR | CYF200127/MolNexTR | Molecular image to SMILES conversion |
| PaddleOCR | PaddlePaddle/PaddleOCR | Text recognition (PP-OCRv4/v5) |

Citation

If you use these models, please cite:

@article{zhao2025flowfigtabminer,
  title={FlowFigTabMiner: Multimodal Extraction of Structured Flow Chemistry Data from Figures, Tables, and Text Enables Organolithium Lifetime Prediction},
  author={Zhao, Wenyuan and Zhong, Xianzhu and Wang, Simeng and Nagaki, Aiichiro},
  year={2025}
}

License

Apache 2.0
