leofansq committed
Commit b27553e · verified · 1 parent: 3824ea0

update README

Files changed (3)
  1. .gitattributes +1 -0
  2. BioMedGPT-Mol.png +3 -0
  3. README.md +32 -30
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ BioMedGPT-Mol.png filter=lfs diff=lfs merge=lfs -text
BioMedGPT-Mol.png ADDED

Git LFS Details

  • SHA256: 04b6e530e36891578315e0a7e858c73ed038f4a57156da5e8369b3dbe13e5f61
  • Pointer size: 131 Bytes
  • Size of remote file: 214 kB
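The LFS details above describe the small pointer stub that git stores in place of the 214 kB image. As a minimal illustration (not part of this repository), such a pointer can be parsed like this; the `parse_lfs_pointer` helper is hypothetical and the byte size in the sample is illustrative, since only the rounded "214 kB" figure is shown above:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer stub (the ~131-byte file stored in git).

    Each line is 'key value'; the oid field carries the SHA256 of the
    actual file held on the LFS server.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


# Sample pointer mirroring the details above (size in bytes is illustrative).
pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:04b6e530e36891578315e0a7e858c73ed038f4a57156da5e8369b3dbe13e5f61
size 214000
"""
fields = parse_lfs_pointer(pointer)
```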
README.md CHANGED
@@ -2,47 +2,47 @@
 
  ![BioMedGPT-Mol](./BioMedGPT-Mol.png)
 
- BioMedGPT-Mol is a molecular language model jointly released by PharMolix Inc. and the Institute of AI Industry Research (AIR), Tsinghua University. It is built for both molecular understanding and generation, supporting a wide range of tasks including chemical name conversion, molecular captioning, property prediction, reaction modeling, molecule editing, and property optimization. Trained with a well-structured multi-task curriculum, BioMedGPT-Mol demonstrates remarkable performance across diverse molecule-centric discovery benchmarks.
+ BioMedGPT-Mol is a multimodal molecular language model jointly released by PharMolix Inc. and the Institute of AI Industry Research (AIR), Tsinghua University. It is built for both molecular understanding and generation, supporting a wide range of tasks including chemical name conversion, molecular captioning, property prediction, reaction modeling, molecule editing, and property optimization. Trained with a well-structured multi-task curriculum, BioMedGPT-Mol shows remarkable performance across diverse molecule-centric discovery benchmarks. More technical details can be found in the [technical report](https://arxiv.org/pdf/2512.04629).
 
  ### Get started
  * Download the model and config files.
 
  * Evaluation on Benchmarks
- * The testset data is provided in [testset](./evaluation/datasets/).
-   If you use the dataset for evaluation, please consider citing:
+ * The test set is available in [testset](./evaluation/datasets/).
+   If you use the dataset for evaluation, please consider citing the related papers:
  ```
  @article{yu2024llasmol,
    title={LlaSMol: Advancing large language models for chemistry with a large-scale, comprehensive, high-quality instruction tuning dataset},
    author={Yu, Botao and Baker, Frazier N and Chen, Ziqi and Ning, Xia and Sun, Huan},
    journal={arXiv preprint arXiv:2402.09391},
    year={2024}
  }
 
  @article{li2024tomg,
    title={TOMG-Bench: Evaluating LLMs on text-based open molecule generation},
    author={Li, Jiatong and Li, Junxian and Liu, Yunqing and Zhou, Dongzhan and Li, Qing},
    journal={arXiv preprint arXiv:2412.14642},
    year={2024}
  }
 
  @article{dey2025mathtt,
    title={$\mathtt{GeLLM^3O}$: Generalizing Large Language Models for Multi-property Molecule Optimization},
    author={Dey, Vishal and Hu, Xiao and Ning, Xia},
    journal={arXiv preprint arXiv:2502.13398},
    year={2025}
  }
 
- @article{zuo2025biomedgptmol,
+ @article{biomedgpt-mol,
    title={BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation},
    author={Zuo, Chenyang and Fan, Siqi and Nie, Zaiqing},
    journal={arXiv preprint arXiv:2512.04629},
    year={2025}
  }
  ```
- * Update the config and inference with scripts. The results will be saved in the 'logs' directory.
+ * Update the configuration and run inference using the provided scripts; the outputs will be saved in the `logs` directory.
  ```bash
  - logs
- ---- biomedgpt-mol
+ ---- biomedgpt_mol
  -------- mumoinstruct
  ------------ logs
  ------------ results
@@ -55,31 +55,33 @@ BioMedGPT-Mol is a molecular language model jointly released by PharMolix Inc. a
  ```
  ```bash
  # SMolInstruction
- bash evaluation/biomedgpt-mol/scripts/inference_smolinstruct.sh
+ bash evaluation/scripts/inference_smolinstruct.sh
 
  # OpenMolInstruct
- bash evaluation/biomedgpt-mol/scripts/inference_openmolinst.sh
+ bash evaluation/scripts/inference_openmolinst.sh
 
  # MuMoInstruct
- bash evaluation/biomedgpt-mol/scripts/inference_mumoinstruct.sh
+ bash evaluation/scripts/inference_mumoinstruct.sh
  ```
- * Update the config and evaluate with scripts. The metrics will be saved as `metrics.json` in the result directory, e.g., `/logs/biomedgpt-mol/mumoinstruct/results/metrics.json`.
+ * Update the configuration accordingly and execute the evaluation scripts. The computed metrics will be stored as `metrics.json` in the results directory, e.g., `/logs/biomedgpt_mol/mumoinstruct/results/metrics.json`.
  ```bash
  # SMolInstruction
- bash evaluation/biomedgpt-mol/scripts/evaluate_smolinstruct.sh
+ bash evaluation/scripts/evaluate_smolinstruct.sh
 
  # OpenMolInstruct
- bash evaluation/biomedgpt-mol/scripts/evaluate_openmolinst.sh
+ bash evaluation/scripts/evaluate_openmolinst.sh
 
  # MuMoInstruct
- bash evaluation/biomedgpt-mol/scripts/evaluate_mumoinstruct.sh
  ```
 
+ * 🔥 Explore our [OpenBioMed](https://github.com/PharMolix/OpenBioMed) platform for more discovery tasks.
+
  ### Cite Us
  If you find our open-sourced models helpful to your research, please consider citing:
 
  ```
- @article{zuo2025biomedgptmol,
+ @article{biomedgpt-mol,
    title={BioMedGPT-Mol: Multi-task Learning for Molecular Understanding and Generation},
    author={Zuo, Chenyang and Fan, Siqi and Nie, Zaiqing},
    journal={arXiv preprint arXiv:2512.04629},
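The updated README has the evaluation scripts write one `metrics.json` per benchmark under `logs/<model>/<benchmark>/results/`. A minimal sketch for collecting those files across benchmarks; the `collect_metrics` helper is hypothetical, and since the README does not specify the metrics schema, each file is loaded as opaque JSON:

```python
import json
from pathlib import Path


def collect_metrics(logs_dir):
    """Gather every metrics.json under <logs_dir>/<model>/<benchmark>/results/.

    Returns {(model, benchmark): parsed_json}, following the directory
    layout shown in the README (e.g. logs/biomedgpt_mol/mumoinstruct/results/).
    """
    results = {}
    for path in Path(logs_dir).glob("*/*/results/metrics.json"):
        benchmark = path.parent.parent.name         # e.g. mumoinstruct
        model = path.parent.parent.parent.name      # e.g. biomedgpt_mol
        results[(model, benchmark)] = json.loads(path.read_text())
    return results
```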