
---
license: cc-by-nc-sa-4.0
datasets:
- UniParser/MolDet-Bench
base_model:
- UniParser/MolDet
- Ultralytics/YOLO11
tags:
- chemistry
---


# Molecule Detection YOLO in MolParser2.0

Compared with [MolDet](https://huggingface.co/UniParser/MolDet), the new MolDetv2 models are trained on more manually annotated data and are further optimized to reduce false molecule detections and improve bounding-box regression, achieving stronger performance with a smaller model.


## [MolDet-General] universal molecule structure detection

YOLO11-n weights trained on more than 100k human-annotated image crops and synthetic molecule images.

![image](https://cdn-uploads.huggingface.co/production/uploads/65f7f16fb6941db5c2e7c4bf/iZqZ8rUsD6jacIJr8Hbag.png)

Features:
* 640x640 input resolution
* supports handwritten molecule detection
* multiscale input (inputs can be single or multiple molecule cutouts, reaction or table cutouts, or single-page PDF images)
* update: MolDetv2 substantially reduces false positives on formulas, ball-and-stick diagrams, etc.

Usage:
```python
from ultralytics import YOLO

# For CPU-only inference, load `moldet_v2_yolo11n_640_general.onnx` instead for faster speed
model = YOLO("/path/to/moldet_v2_yolo11n_640_general.pt")
model.predict("path/to/image.png", save=True, imgsz=640, conf=0.5)
```
For further usage instructions, please refer to the [official Ultralytics documentation](https://docs.ultralytics.com/modes/predict/).
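Downstream recognition (for example, passing each detected molecule to a structure recognizer) usually works on image crops. The helper below is a minimal sketch and not part of MolDet: the name `pad_and_clip_box` and its 5% default margin are hypothetical choices. It expands each detected box slightly, which may help keep a small margin around the structure now that MolDetv2 boxes hug molecular edges more tightly, then clips the result to the image bounds.

```python
import math


def pad_and_clip_box(xyxy, img_w, img_h, pad_ratio=0.05):
    """Expand an (x1, y1, x2, y2) pixel box and clip it to the image.

    Each side grows by pad_ratio times the box width (left/right) or height
    (top/bottom). The result is rounded outward to integer pixel indices and
    clipped to [0, img_w] x [0, img_h] so it can be used directly for cropping.
    pad_ratio=0.05 is an illustrative default, not a MolDet setting.
    """
    x1, y1, x2, y2 = xyxy
    pad_x = (x2 - x1) * pad_ratio
    pad_y = (y2 - y1) * pad_ratio
    left = max(0, math.floor(x1 - pad_x))
    top = max(0, math.floor(y1 - pad_y))
    right = min(img_w, math.ceil(x2 + pad_x))
    bottom = min(img_h, math.ceil(y2 + pad_y))
    return left, top, right, bottom


# A box near the right/bottom edge of a 200 x 100 image is padded, then clipped.
print(pad_and_clip_box([160, 80, 200, 100], img_w=200, img_h=100, pad_ratio=0.25))  # (150, 75, 200, 100)
```

With Ultralytics results, `box.xyxy[0].tolist()` gives the box in original-image pixels and `r.orig_shape` gives `(height, width)`, so a call might look like `pad_and_clip_box(box.xyxy[0].tolist(), img_w=r.orig_shape[1], img_h=r.orig_shape[0])`. The returned `(left, top, right, bottom)` tuple matches the argument order of PIL's `Image.crop`.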



## [MolDet-Doc] document molecule structure detection

YOLO11-n weights trained on more than 60k human-annotated PDF pages (patents, papers, and books) and 10k synthetic PDF pages containing molecule images.

![image](https://cdn-uploads.huggingface.co/production/uploads/65f7f16fb6941db5c2e7c4bf/rKZjaZ0EingRtxdIe5Ptz.png)

Features:
* 960x960 input resolution
* works best with single-page PDF images as input
* better at detecting small molecules
* update: MolDetv2 substantially reduces false positives on formulas, ball-and-stick diagrams, and graphical symbols, with tighter bounding-box alignment to molecular edges.
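One way to see why a larger input size helps with small molecules on full pages is to estimate how much a page shrinks before it reaches the network. The sketch below assumes an aspect-preserving "letterbox" resize in which the longer image side is scaled to `imgsz` (padding is ignored); the page size (an A4 page rendered at 300 dpi) and the 300 px molecule are illustrative numbers, not MolDet settings, and the function names are hypothetical.

```python
def letterbox_scale(img_w, img_h, imgsz):
    """Scale factor when the longer image side is resized to imgsz (aspect ratio kept)."""
    return min(imgsz / img_w, imgsz / img_h)


def apparent_size(obj_px, img_w, img_h, imgsz):
    """Approximate size, in network-input pixels, of an object spanning obj_px in the original image."""
    return obj_px * letterbox_scale(img_w, img_h, imgsz)


# Illustrative example: A4 page at 300 dpi (2480 x 3508 px) with a ~300 px wide molecule drawing.
page_w, page_h, mol_px = 2480, 3508, 300
for imgsz in (640, 960):
    print(imgsz, round(apparent_size(mol_px, page_w, page_h, imgsz), 1))
```

Under these assumptions the molecule shrinks to roughly 55 px at 640 but stays around 82 px at 960, which leaves the detector more pixels per small structure on a full page.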

Usage:
```python
from ultralytics import YOLO
import fitz  # PyMuPDF

pdf = fitz.open("doc.pdf")
# For CPU-only inference, load `moldet_v2_yolo11n_960_doc.onnx` instead for faster speed
model = YOLO("/path/to/moldet_v2_yolo11n_960_doc.pt")
bboxes = []
for i, page in enumerate(pdf):
    # Render each page to a PNG (get_pixmap() uses 72 dpi by default)
    img = f"page_{i}.png"
    page.get_pixmap().save(img)
    for r in model.predict(img, imgsz=960, conf=0.5):
        for box in r.boxes:
            bboxes.append({"page": img, "conf": float(box.conf), "bbox": box.xyxy[0].tolist()})
```
For further usage instructions, please refer to the [official Ultralytics documentation](https://docs.ultralytics.com/modes/predict/).
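The `bbox` values collected in the example above are pixel coordinates of the rendered page image. If you need them in the PDF's own coordinate space (for example to crop or annotate the original page), a minimal sketch follows. It assumes an unrotated page and relies on PyMuPDF rendering at `dpi / 72` pixels per point with a top-left page origin; the helper name `pixel_box_to_pdf_points` is hypothetical.

```python
def pixel_box_to_pdf_points(bbox, dpi=72):
    """Map an (x1, y1, x2, y2) box in rendered-page pixels to PyMuPDF page coordinates (points).

    One point is 1/72 inch, so a page rendered at `dpi` has dpi/72 pixels per point;
    converting back is a division by that zoom factor. Assumes the page is unrotated.
    """
    zoom = dpi / 72
    return [v / zoom for v in bbox]


print(pixel_box_to_pdf_points([150, 300, 450, 600], dpi=144))  # [75.0, 150.0, 225.0, 300.0]
```

With the default `get_pixmap()` (72 dpi) the mapping is the identity. If you render at a higher resolution, e.g. `page.get_pixmap(dpi=150)` in recent PyMuPDF releases, pass the same `dpi` here; the result can then be wrapped in `fitz.Rect(...)` to highlight or extract the detected region.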


## 📊 Benchmark Results

For benchmark results, please refer to [MolDet-Bench](https://huggingface.co/datasets/UniParser/MolDet-Bench).


## 📜 License

MolDet and MolDetv2 model weights are released under CC BY-NC-SA 4.0 and are provided for non-commercial use only.

For commercial use, please contact [fangxi@dp.tech](mailto:fangxi@dp.tech) or open a discussion on Hugging Face.


## 📖 Citation

If you use this model in your work, please cite:

```
Coming soon!
```