---
license: mit
---
This repository contains the weights of a BERT model trained with masked language modelling (MLM) on 60% of the [GuacaMol](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.8b00839) dataset.
Further information can be found in our [publication](https://arxiv.org/abs/2503.03360).
```python
import torch
from transformers import AutoModel, AutoTokenizer

# Example molecules as SMILES strings
mols = [
    "CCOc1cc2nn(CCC(C)(C)O)cc2cc1NC(=O)c1cccc(C(F)F)n1",
    "CN(c1ncc(F)cn1)[C@H]1CCCNC1",
    "CC(C)(Oc1ccc(-c2cnc(N)c(-c3ccc(Cl)cc3)c2)cc1)C(=O)O",
    "CC(C)(O)CCn1cc2cc(NC(=O)c3cccc(C(F)(F)F)n3)c(C(C)(C)O)cc2n1",
    # ...
]

tokenizer = AutoTokenizer.from_pretrained("UdS-LSV/da4mt-mlm-60")
model = AutoModel.from_pretrained("UdS-LSV/da4mt-mlm-60")

# Tokenize, padding or truncating every molecule to 128 tokens
inputs = tokenizer(mols, add_special_tokens=True, truncation=True, max_length=128, padding="max_length", return_tensors="pt")

# Use the hidden state of the first token as the molecule embedding;
# no gradients are needed for inference
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state[:, 0, :]
```
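
Once the embeddings are extracted, one possible next step is to compare molecules by cosine similarity. The following is a minimal sketch, not part of the published method: `cosine_similarity_matrix` is a hypothetical helper, and the random array is stand-in data with an arbitrary dimension so that the snippet runs without downloading the model.

```python
import numpy as np


def cosine_similarity_matrix(vectors: np.ndarray) -> np.ndarray:
    """Return pairwise cosine similarities between the rows of an (n, d) array."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    # Clip the norms to avoid division by zero for all-zero rows
    unit = vectors / np.clip(norms, 1e-12, None)
    return unit @ unit.T


# Stand-in data (assumption): 4 random vectors of arbitrary dimension 8.
# With the real model, pass `embeddings.detach().numpy()` instead.
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(4, 8))

sim = cosine_similarity_matrix(fake_embeddings)
print(sim.shape)
```

Entry `sim[i, j]` is the similarity between molecules `i` and `j`; the matrix is symmetric and has ones on the diagonal for non-zero vectors.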

### See also
- https://huggingface.co/UdS-LSV/da4mt-mlm-30
- https://huggingface.co/UdS-LSV/da4mt-mtr-30
- https://huggingface.co/UdS-LSV/da4mt-mtr-60