update README

Changed files:
- .gitattributes +1 -0
- BioMedGPT-Mol.png +3 -0
- README.md +32 -30

.gitattributes
CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
+BioMedGPT-Mol.png filter=lfs diff=lfs merge=lfs -text
BioMedGPT-Mol.png
ADDED
Git LFS Details

README.md
CHANGED
![BioMedGPT-Mol](BioMedGPT-Mol.png)

BioMedGPT-Mol is a multimodal molecular language model jointly released by PharMolix Inc. and the Institute of AI Industry Research (AIR), Tsinghua University. It is built for both molecular understanding and generation, supporting a wide range of tasks including chemical name conversion, molecular captioning, property prediction, reaction modeling, molecule editing, and property optimization. Trained with a well-structured multi-task curriculum, BioMedGPT-Mol shows strong performance across diverse molecule-centric discovery benchmarks. More technical details can be found in the [technical report](https://arxiv.org/pdf/2512.04629).
### Get started

* Download the model and config files.

* Evaluation on Benchmarks

  * The test sets are available in [testset](./evaluation/datasets/).

    If you use the datasets for evaluation, please consider citing the related papers:
```
@article{yu2024llasmol,
  title={Llasmol: Advancing large language models for chemistry with a large-scale, comprehensive, high-quality instruction tuning dataset},
  author={Yu, Botao and Baker, Frazier N and Chen, Ziqi and Ning, Xia and Sun, Huan},
  journal={arXiv preprint arXiv:2402.09391},
  year={2024}
}

@article{li2024tomg,
  title={TOMG-Bench: Evaluating LLMs on text-based open molecule generation},
  author={Li, Jiatong and Li, Junxian and Liu, Yunqing and Zhou, Dongzhan and Li, Qing},
  journal={arXiv preprint arXiv:2412.14642},
  year={2024}
}

@article{dey2025mathtt,
  title={$\mathtt{GeLLM^3O}$: Generalizing Large Language Models for Multi-property Molecule Optimization},
  author={Dey, Vishal and Hu, Xiao and Ning, Xia},
  journal={arXiv preprint arXiv:2502.13398},
  year={2025}
}

@article{biomedgpt-mol,
  title={BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation},
  author={Zuo, Chenyang and Fan, Siqi and Nie, Zaiqing},
  journal={arXiv preprint arXiv:2512.04629},
  year={2025}
}
```

* Update the configuration and run inference with the provided scripts; the outputs will be saved in the `logs` directory.
```bash
- logs
---- biomedgpt_mol
-------- mumoinstruct
------------ logs
------------ results
...
```
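The output tree above can be pre-created before running inference (a convenience sketch only, using the one benchmark path shown; the inference scripts may well create these directories themselves):

```shell
# Pre-create the output layout sketched above for the MuMoInstruct run.
# (Assumption: the scripts are run from the repository root.)
mkdir -p logs/biomedgpt_mol/mumoinstruct/logs \
         logs/biomedgpt_mol/mumoinstruct/results
```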
```bash
# SMolInstruction
bash evaluation/scripts/inference_smolinstruct.sh

# OpenMolInstruct
bash evaluation/scripts/inference_openmolinst.sh

# MuMoInstruct
bash evaluation/scripts/inference_mumoinstruct.sh
```

* Update the configuration accordingly and execute the evaluation scripts. The computed metrics will be stored as `metrics.json` in the results directory, e.g., `logs/biomedgpt_mol/mumoinstruct/results/metrics.json`.
```bash
# SMolInstruction
bash evaluation/scripts/evaluate_smolinstruct.sh

# OpenMolInstruct
bash evaluation/scripts/evaluate_openmolinst.sh

# MuMoInstruct
bash evaluation/scripts/evaluate_mumoinstruct.sh
```
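After the evaluation scripts finish, the resulting `metrics.json` files can be collected and pretty-printed in one pass (a minimal sketch, assuming the `logs/<model>/<benchmark>/results/` layout described above and a `python3` on PATH):

```shell
# Walk the results directories described above and pretty-print every
# metrics.json the evaluation scripts have produced.
for f in logs/*/*/results/metrics.json; do
  [ -e "$f" ] || continue        # the glob matched nothing yet
  echo "== $f =="
  python3 -m json.tool "$f"      # pretty-print the JSON metrics
done
```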

* 🔥Explore our [OpenBioMed](https://github.com/PharMolix/OpenBioMed) platform for more discovery tasks.

### Cite Us

If you find our open-sourced models helpful to your research, please consider citing:

```
@article{biomedgpt-mol,
  title={BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation},
  author={Zuo, Chenyang and Fan, Siqi and Nie, Zaiqing},
  journal={arXiv preprint arXiv:2512.04629},
  year={2025}
}
```