---
license: cc-by-nc-sa-4.0
datasets:
- UniParser/MolDet-Bench
base_model:
- UniParser/MolDet
- Ultralytics/YOLO11
tags:
- chemistry
---

# Molecule Detection YOLO in MolParser2.0

Compared to [MolDet](https://huggingface.co/UniParser/MolDet), the new **MolDetv2** model is trained on more manually annotated data and is further optimized to reduce false molecule detections and improve bounding-box regression, achieving stronger performance with a smaller model.

## [MolDet-General] universal molecule structure detection

YOLO11-n weights trained on more than 100k human-annotated image crops and synthetic molecule images.



features:

* 640x640 input resolution
* supports detection of handwritten molecules
* **multiscale input** (inputs can be single/multiple molecular cutouts, reaction or table cutouts, or single-page PDF images)
* *update: MolDetv2 substantially reduces false positives on formulas, ball-and-stick diagrams, etc.*

usage:

```python
from ultralytics import YOLO

# For CPU-only inference, load `moldet_v2_yolo11n_640_general.onnx` instead for faster speed.
model = YOLO("/path/to/moldet_v2_yolo11n_640_general.pt")
model.predict("path/to/image.png", save=True, imgsz=640, conf=0.5)
```

For further usage instructions, please refer to the [official Ultralytics documentation](https://docs.ultralytics.com/modes/predict/).
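
If the detected boxes are passed on to a downstream step such as structure recognition, it can help to crop each molecule with a small margin. The following is a minimal sketch, not part of the released code: the helper names `pad_and_clamp` and `crop_molecules` and the default 5% margin are assumptions made for illustration. It relies only on the `orig_shape` and `boxes.xyxy` attributes of an Ultralytics result and on `PIL.Image.crop`.

```python
import math


def pad_and_clamp(box, width, height, pad=0.05):
    """Grow an (x1, y1, x2, y2) box by `pad` of its size per side, clamped to the image."""
    x1, y1, x2, y2 = box
    dx = (x2 - x1) * pad  # horizontal margin in pixels
    dy = (y2 - y1) * pad  # vertical margin in pixels
    return (
        max(0, math.floor(x1 - dx)),
        max(0, math.floor(y1 - dy)),
        min(width, math.ceil(x2 + dx)),
        min(height, math.ceil(y2 + dy)),
    )


def crop_molecules(result, image, pad=0.05):
    """Crop every detection of one Ultralytics result out of `image` (a PIL.Image).

    Hypothetical helper: `image` must be the same picture the result was computed on.
    """
    height, width = result.orig_shape  # Ultralytics stores (height, width)
    return [
        image.crop(pad_and_clamp(box, width, height, pad))
        for box in result.boxes.xyxy.tolist()
    ]
```

With the model loaded as above, `crop_molecules(model.predict(path, imgsz=640, conf=0.5)[0], Image.open(path))` would return one PIL crop per detected molecule.
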

## [MolDet-Doc] document molecule structure detection

YOLO11-n weights trained on more than 60k human-annotated PDF pages (patents, papers, and books) and 10k synthetic PDF pages with molecule images.



features:

* 960x960 input resolution
* prefers **single-page PDF image** input
* better at small molecule detection
* *update: MolDetv2 substantially reduces false positives on formulas, ball-and-stick diagrams, and graphical symbols, with tighter bounding box alignment to molecular edges.*

usage:

```python
from ultralytics import YOLO
import fitz  # PyMuPDF

pdf = fitz.open("doc.pdf")
# For CPU-only inference, load `moldet_v2_yolo11n_960_doc.onnx` instead for faster speed.
model = YOLO("/path/to/moldet_v2_yolo11n_960_doc.pt")

bboxes = []
for i, page in enumerate(pdf):
    # Render the page to a PNG file, then run detection on it.
    img = f"page_{i}.png"
    page.get_pixmap().save(img)
    for r in model.predict(img, imgsz=960, conf=0.5):
        for box in r.boxes:
            bboxes.append({"page": img, "conf": float(box.conf), "bbox": box.xyxy[0].tolist()})
```

For further usage instructions, please refer to the [official Ultralytics documentation](https://docs.ultralytics.com/modes/predict/).
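
In the snippet above, `get_pixmap()` is called with default arguments, so PyMuPDF renders at 72 dpi and one pixel corresponds to one PDF point. If a page is instead rendered at a higher zoom, for example with `page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))`, the predicted boxes come back in pixel coordinates and need to be divided by the zoom factor to locate them on the PDF page. The sketch below illustrates that bookkeeping. The helper names are hypothetical, choosing the zoom so that the long page side matches the 960 input size is an assumption rather than a documented recommendation, and the mapping assumes an unrotated page whose origin is at (0, 0).

```python
def zoom_for_long_side(page_width, page_height, target=960):
    """Zoom factor that renders the longer page side at `target` pixels (assumed heuristic)."""
    return target / max(page_width, page_height)


def pixel_bbox_to_pdf_points(bbox, zoom=1.0):
    """Map an [x1, y1, x2, y2] box from rendered-pixel coordinates to PDF points.

    Assumes the page was rendered with a uniform scale `zoom` and no rotation.
    """
    return [value / zoom for value in bbox]
```

For example, with `zoom = zoom_for_long_side(page.rect.width, page.rect.height)`, each `box.xyxy[0].tolist()` can be passed through `pixel_bbox_to_pdf_points(..., zoom)` before it is stored.
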

## Benchmark Results

Please refer to [MolDet-Bench](https://huggingface.co/datasets/UniParser/MolDet-Bench).

## License

MolDet & MolDetv2 model weights are provided for **non-commercial use only**.

For commercial use, please contact [fangxi@dp.tech](mailto:fangxi@dp.tech) or open a discussion on Hugging Face.

## Citation

If you use this model in your work, please cite:

```
Coming soon!
```