---
license: cc-by-nc-sa-4.0
datasets:
- UniParser/MolDet-Bench
base_model:
- UniParser/MolDet
- Ultralytics/YOLO11
tags:
- chemistry
---

# Molecule Detection YOLO in MolParser2.0

Compared with [MolDet](https://huggingface.co/UniParser/MolDet), **MolDetv2** is trained on more manually annotated data and is further optimized to reduce false-positive molecule detections and improve bounding box regression, achieving stronger performance with a smaller model.

## [MolDet-General] General molecule structure detection models

YOLO11n weights trained on more than 100k human-annotated image crops and synthetic molecule images.

* 640x640 input resolution
* supports handwritten molecule detection
* **multiscale input** (inputs can be single or multiple molecule crops, reaction or table crops, or single-page PDF images)
* *Update: MolDetv2 substantially reduces false positives on formulas, ball-and-stick diagrams, etc.*

Usage:

```python
from ultralytics import YOLO

# For CPU-only inference, load `moldet_v2_yolo11n_640_general.onnx` instead for faster speed
model = YOLO("/path/to/moldet_v2_yolo11n_640_general.pt")
model.predict("path/to/image.png", save=True, imgsz=640, conf=0.5)
```
For further usage instructions, please refer to the [official Ultralytics documentation](https://docs.ultralytics.com/modes/predict/).
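
Downstream steps such as molecule recognition usually need each detection as its own image crop. The sketch below is a hypothetical helper (`pad_and_clip_box` is not part of Ultralytics or MolDet) that expands an `xyxy` box by a small margin and clips it to the image bounds, returning integer pixel coordinates suitable for slicing. With Ultralytics, the boxes would come from `r.boxes.xyxy.tolist()` and the image size from `r.orig_shape` (height, width); the 5-pixel default margin is an arbitrary assumption, not a MolDet setting.

```python
import math


def pad_and_clip_box(box, width, height, pad=5):
    """Expand an (x1, y1, x2, y2) box by `pad` pixels and clip it to the image.

    Hypothetical helper: the padding value is an assumption, not a MolDet setting.
    Returns integer pixel coordinates usable as image[y1:y2, x1:x2].
    """
    x1, y1, x2, y2 = box
    # Round outward so the crop never cuts into the detected structure
    x1 = max(0, math.floor(x1) - pad)
    y1 = max(0, math.floor(y1) - pad)
    x2 = min(width, math.ceil(x2) + pad)
    y2 = min(height, math.ceil(y2) + pad)
    return x1, y1, x2, y2


# Example with made-up detections on a 640x480 image
detections = [[10.4, 20.7, 200.2, 180.9], [600.0, 400.0, 639.5, 479.8]]
crops = [pad_and_clip_box(b, width=640, height=480) for b in detections]
print(crops)  # → [(5, 15, 206, 186), (595, 395, 640, 480)]
```

With a NumPy image such as `r.orig_img`, each crop is then `r.orig_img[y1:y2, x1:x2]`.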

## [MolDet-Doc] PDF molecule structure detection models

YOLO11n weights trained on more than 60k human-annotated PDF pages (patents, papers, and books) and 10k synthetic PDF pages containing molecule images.

* 960x960 input resolution
* works best with **single-page PDF images** as input
* better at detecting small molecules
* *Update: MolDetv2 substantially reduces false positives on formulas, ball-and-stick diagrams, and graphical symbols, and its bounding boxes align more tightly with molecular edges.*
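
How tightly a box hugs a molecule is commonly quantified with intersection over union (IoU) against a reference annotation. The following is a generic IoU sketch for `xyxy` boxes, not the metric definition used by MolDet-Bench; the example boxes are made up for illustration.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Overlap rectangle (width/height clamp to 0 when the boxes are disjoint)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


reference = (100, 100, 200, 200)  # hypothetical ground-truth box
loose = (90, 90, 210, 210)        # box with extra margin around the molecule
tight = (98, 99, 201, 200)        # box hugging the molecule edges
print(round(box_iou(reference, loose), 3), round(box_iou(reference, tight), 3))  # → 0.694 0.961
```

A looser box around the same molecule lowers IoU even though it fully contains the structure, which is why tighter regression matters for IoU-thresholded metrics.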

Usage:

```python
import fitz  # PyMuPDF
from ultralytics import YOLO

pdf = fitz.open("doc.pdf")
# For CPU-only inference, load `moldet_v2_yolo11n_960_doc.onnx` instead for faster speed
model = YOLO("/path/to/moldet_v2_yolo11n_960_doc.pt")

bboxes = []
for i, page in enumerate(pdf):
    # Render each page to a PNG (default resolution: 72 dpi)
    img = f"page_{i}.png"
    page.get_pixmap().save(img)
    for r in model.predict(img, imgsz=960, conf=0.5):
        for box in r.boxes:
            bboxes.append({"page": img, "conf": float(box.conf), "bbox": box.xyxy[0].tolist()})
```
For further usage instructions, please refer to the [official Ultralytics documentation](https://docs.ultralytics.com/modes/predict/).
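
The boxes collected in the PDF example are pixel coordinates on the rendered page image. `Page.get_pixmap()` renders at 72 dpi by default, where one pixel corresponds to one PDF point, so those boxes already match page coordinates. If pages are rendered at a higher resolution (for example `page.get_pixmap(dpi=144)`), each coordinate must be scaled by 72/dpi to map back to the page, e.g. before building a `fitz.Rect` to crop or annotate the original PDF. A minimal sketch, assuming unrotated pages; `pixel_box_to_pdf` is a hypothetical helper.

```python
def pixel_box_to_pdf(box, dpi=72):
    """Map an (x1, y1, x2, y2) box from a page image rendered at `dpi`
    back to PDF points (1 point = 1/72 inch).

    Hypothetical helper; assumes an unrotated page rendered over its full rect.
    """
    scale = 72.0 / dpi
    return [coord * scale for coord in box]


# A box detected on a page rendered with page.get_pixmap(dpi=144)
print(pixel_box_to_pdf([144.0, 288.0, 432.0, 576.0], dpi=144))  # → [72.0, 144.0, 216.0, 288.0]
```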

## 📊 Benchmark Results

Please refer to [MolDet-Bench](https://huggingface.co/datasets/UniParser/MolDet-Bench) for benchmark results.

## 📖 Citation

If you use this model in your work, please cite:

```
Coming soon!
```