---
tags:
- generated_from_trainer
model-index:
- name: ROBERTA_SMILES_LARGE
  results: []
widget:
- text: <mask>1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)O)O[C@H]3[C@H](C=C4)O

pipeline_tag: fill-mask
---


# ROBERTA_SMILES_LARGE

This model is an 83.5M-parameter RoBERTa model fine-tuned on a dataset of 1.1M SMILES (Simplified Molecular-Input Line-Entry System) strings for masked language modeling (MLM). It builds on BERT_SMILES, which was fine-tuned on only 50k SMILES.

If you find this model useful, I would really appreciate you giving it a like!

Evaluation Loss: 0.482

Example (morphine):

```
CN1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)O)O[C@H]3[C@H](C=C4)O
```
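The masked input shown in the widget above can be rebuilt from this full SMILES string in plain Python. A minimal sketch follows; the commented-out pipeline call uses a placeholder repository id, which is an assumption:

```python
# Rebuild the widget's masked input from morphine's full SMILES by
# replacing the leading "CN" atoms with RoBERTa's <mask> token.
smiles = "CN1CC[C@]23[C@@H]4[C@H]1CC5=C2C(=C(C=C5)O)O[C@H]3[C@H](C=C4)O"
masked = "<mask>" + smiles[2:]
print(masked)

# To query the model, the masked string can be passed to a transformers
# fill-mask pipeline (substitute this model's actual Hub id for "<repo-id>"):
#   from transformers import pipeline
#   unmasker = pipeline("fill-mask", model="<repo-id>")
#   print(unmasker(masked)[0]["token_str"])
```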

## Intended uses & limitations

With further fine-tuning, this model can be used to predict physical or chemical properties of molecules.
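One way to set up such fine-tuning is to reload the MLM checkpoint with a task head via `RobertaForSequenceClassification`. In the sketch below, a tiny randomly initialised config stands in for the real checkpoint so the example runs offline; the hyperparameters and the commented repository id are assumptions:

```python
from transformers import RobertaConfig, RobertaForSequenceClassification

# In practice you would load the pretrained weights, e.g.:
#   RobertaForSequenceClassification.from_pretrained("<repo-id>", num_labels=1)
# Here a small random-initialised config keeps the sketch self-contained.
config = RobertaConfig(
    vocab_size=600,        # assumption: small SMILES token vocabulary
    hidden_size=64,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=128,
    num_labels=1,          # single scalar target -> regression head
    problem_type="regression",
)
model = RobertaForSequenceClassification(config)
print(f"parameters: {model.num_parameters():,}")
```

The resulting model can then be trained on (SMILES, property) pairs with the standard `Trainer` workflow.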

### Framework versions

- Transformers 4.37.0.dev0
- Pytorch 2.1.0+cu121
- Tokenizers 0.15.0