---
tags:
- generated_from_trainer
model-index:
- name: ROBERTA_SMILES_LARGE
  results: []
widget:
- text: <mask>1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)O)O[C@H]3[C@H](C=C4)O
pipeline_tag: fill-mask
---

# ROBERTA_SMILES_LARGE

This model is an 83.5M-parameter RoBERTa model fine-tuned on a dataset of 1.1M SMILES (Simplified Molecular-Input Line-Entry System) strings for masked language modeling (MLM). It builds on BERT_SMILES, which was fine-tuned on only 50k SMILES.

If you find this model useful, I would really appreciate you giving it a like!

Evaluation loss: 0.482

Example: Morphine

```
CN1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)O)O[C@H]3[C@H](C=C4)O
```
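
The widget example above masks the leading `CN` of morphine's SMILES. A minimal sketch of querying the model the same way via the `fill-mask` pipeline follows; the repo id `"ROBERTA_SMILES_LARGE"` is a placeholder for the model's actual path on the Hugging Face Hub:

```python
MORPHINE = "CN1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)O)O[C@H]3[C@H](C=C4)O"


def mask_prefix(smiles: str, n: int = 2) -> str:
    """Replace the first n characters of a SMILES string with the <mask> token."""
    return "<mask>" + smiles[n:]


if __name__ == "__main__":
    from transformers import pipeline

    # Placeholder repo id -- substitute the model's actual Hub path.
    fill = pipeline("fill-mask", model="ROBERTA_SMILES_LARGE")

    # Masking the leading "CN" reproduces the widget example above.
    for pred in fill(mask_prefix(MORPHINE)):
        print(pred["token_str"], round(pred["score"], 3))
```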

## Intended uses & limitations

With further fine-tuning, this model can be used to predict physical or chemical properties of molecules.

### Framework versions

- Transformers 4.37.0.dev0
- Pytorch 2.1.0+cu121
- Tokenizers 0.15.0