leofansq committed
Commit b27553e · verified · 1 parent: 3824ea0

update README

Files changed (3)
  1. .gitattributes +1 -0
  2. BioMedGPT-Mol.png +3 -0
  3. README.md +32 -30
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ BioMedGPT-Mol.png filter=lfs diff=lfs merge=lfs -text
BioMedGPT-Mol.png ADDED

Git LFS Details

  • SHA256: 04b6e530e36891578315e0a7e858c73ed038f4a57156da5e8369b3dbe13e5f61
  • Pointer size: 131 Bytes
  • Size of remote file: 214 kB
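The LFS details above describe the small pointer stub that git stores in place of the 214 kB image. As a minimal illustration (not part of this repository), such a pointer can be parsed like this; the `parse_lfs_pointer` helper is hypothetical and the byte size in the sample is illustrative, since only the rounded "214 kB" figure is shown above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer stub (the ~131-byte file stored in git).

    Each line is 'key value'; the oid field carries the SHA256 of the
    actual file held on the LFS server.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Sample pointer mirroring the details above (size in bytes is illustrative).
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:04b6e530e36891578315e0a7e858c73ed038f4a57156da5e8369b3dbe13e5f61
size 214000
"""
fields = parse_lfs_pointer(pointer)
```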
README.md CHANGED
@@ -2,47 +2,47 @@
 
  ![BioMedGPT-Mol](./BioMedGPT-Mol.png)
 
- BioMedGPT-Mol is a molecular language model jointly released by PharMolix Inc. and the Institute of AI Industry Research (AIR), Tsinghua University. It is built for both molecular understanding and generation, supporting a wide range of tasks including chemical name conversion, molecular captioning, property prediction, reaction modeling, molecule editing, and property optimization. Trained with a well-structured multi-task curriculum, BioMedGPT-Mol demonstrates remarkable performance across diverse molecule-centric discovery benchmarks.
+ BioMedGPT-Mol is a multimodal molecular language model jointly released by PharMolix Inc. and the Institute of AI Industry Research (AIR), Tsinghua University. It is built for both molecular understanding and generation, supporting a wide range of tasks including chemical name conversion, molecular captioning, property prediction, reaction modeling, molecule editing, and property optimization. Trained with a well-structured multi-task curriculum, BioMedGPT-Mol shows remarkable performance across diverse molecule-centric discovery benchmarks. More technical details can be found in the [technical report](https://arxiv.org/pdf/2512.04629).
 
  ### Get started
  * Download the model and config files.
 
  * Evaluation on Benchmarks
- * The testset data is provided in [testset](./evaluation/datasets/).
-   If you use the dataset for evaluation, please consider citing:
+ * The test set is available in [testset](./evaluation/datasets/).
+   If you use the dataset for evaluation, please consider citing the related papers:
  ```
  @article{yu2024llasmol,
    title={LlaSMol: Advancing large language models for chemistry with a large-scale, comprehensive, high-quality instruction tuning dataset},
    author={Yu, Botao and Baker, Frazier N and Chen, Ziqi and Ning, Xia and Sun, Huan},
    journal={arXiv preprint arXiv:2402.09391},
    year={2024}
  }
 
  @article{li2024tomg,
    title={TOMG-Bench: Evaluating LLMs on text-based open molecule generation},
    author={Li, Jiatong and Li, Junxian and Liu, Yunqing and Zhou, Dongzhan and Li, Qing},
    journal={arXiv preprint arXiv:2412.14642},
    year={2024}
  }
 
  @article{dey2025mathtt,
    title={$\mathtt{GeLLM^3O}$: Generalizing Large Language Models for Multi-property Molecule Optimization},
    author={Dey, Vishal and Hu, Xiao and Ning, Xia},
    journal={arXiv preprint arXiv:2502.13398},
    year={2025}
  }
 
- @article{zuo2025biomedgptmol,
+ @article{biomedgpt-mol,
    title={BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation},
    author={Zuo, Chenyang and Fan, Siqi and Nie, Zaiqing},
    journal={arXiv preprint arXiv:2512.04629},
    year={2025}
  }
  ```
- * Update the config and inference with scripts. The results will be saved in the 'logs' directory.
+ * Update the configuration and run inference using the provided scripts; the outputs will be saved in the `logs` directory.
  ```bash
  - logs
- ---- biomedgpt-mol
+ ---- biomedgpt_mol
  -------- mumoinstruct
  ------------ logs
  ------------ results
@@ -55,31 +55,33 @@ BioMedGPT-Mol is a molecular language model jointly released by PharMolix Inc. a
  ```
  ```bash
  # SMolInstruction
- bash evaluation/biomedgpt-mol/scripts/inference_smolinstruct.sh
+ bash evaluation/scripts/inference_smolinstruct.sh
 
  # OpenMolInstruct
- bash evaluation/biomedgpt-mol/scripts/inference_openmolinst.sh
+ bash evaluation/scripts/inference_openmolinst.sh
 
  # MuMoInstruct
- bash evaluation/biomedgpt-mol/scripts/inference_mumoinstruct.sh
+ bash evaluation/scripts/inference_mumoinstruct.sh
  ```
- * Update the config and evaluate with scripts. The metrics will be saved as `metrics.json` in the result directory, e.g., `/logs/biomedgpt-mol/mumoinstruct/results/metrics.json`.
+ * Update the configuration accordingly and execute the evaluation scripts. The computed metrics will be stored as `metrics.json` in the results directory, e.g., `/logs/biomedgpt_mol/mumoinstruct/results/metrics.json`.
  ```bash
  # SMolInstruction
- bash evaluation/biomedgpt-mol/scripts/evaluate_smolinstruct.sh
+ bash evaluation/scripts/evaluate_smolinstruct.sh
 
  # OpenMolInstruct
- bash evaluation/biomedgpt-mol/scripts/evaluate_openmolinst.sh
+ bash evaluation/scripts/evaluate_openmolinst.sh
 
  # MuMoInstruct
- bash evaluation/biomedgpt-mol/scripts/evaluate_mumoinstruct.sh
  ```
 
+ * 🔥 Explore our [OpenBioMed](https://github.com/PharMolix/OpenBioMed) platform for more discovery tasks.
+
  ### Cite Us
  If you find our open-sourced models helpful to your research, please consider citing:
 
  ```
- @article{zuo2025biomedgptmol,
+ @article{biomedgpt-mol,
    title={BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation},
    author={Zuo, Chenyang and Fan, Siqi and Nie, Zaiqing},
    journal={arXiv preprint arXiv:2512.04629},
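The updated README has the evaluation scripts write one `metrics.json` per benchmark under `logs/<model>/<benchmark>/results/`. A minimal sketch for collecting those files across benchmarks; the `collect_metrics` helper is hypothetical, and since the README does not specify the metrics schema, each file is loaded as opaque JSON:

```python
import json
from pathlib import Path


def collect_metrics(logs_dir):
    """Gather every metrics.json under <logs_dir>/<model>/<benchmark>/results/.

    Returns {(model, benchmark): parsed_json}, following the directory
    layout shown in the README (e.g. logs/biomedgpt_mol/mumoinstruct/results/).
    """
    results = {}
    for path in Path(logs_dir).glob("*/*/results/metrics.json"):
        benchmark = path.parent.parent.name         # e.g. mumoinstruct
        model = path.parent.parent.parent.name      # e.g. biomedgpt_mol
        results[(model, benchmark)] = json.loads(path.read_text())
    return results
```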