Instructions to use DeepChem/ChemBERTa-77M-MLM with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use DeepChem/ChemBERTa-77M-MLM with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("fill-mask", model="DeepChem/ChemBERTa-77M-MLM")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("DeepChem/ChemBERTa-77M-MLM")
model = AutoModelForMaskedLM.from_pretrained("DeepChem/ChemBERTa-77M-MLM")
```
- Inference
- Notebooks
- Google Colab
- Kaggle
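To use the fill-mask pipeline above on SMILES strings, the input must contain the tokenizer's mask token. A minimal sketch of a helper that masks a character span of a SMILES string (the `mask_smiles` name is hypothetical, and the default `[MASK]` token is an assumption; check `tokenizer.mask_token` for the actual token this model uses):

```python
def mask_smiles(smiles: str, start: int, end: int, mask_token: str = "[MASK]") -> str:
    """Replace the characters smiles[start:end] with a single mask token.

    The resulting string can be fed to a fill-mask pipeline, which then
    predicts what belongs at the masked position.
    """
    if not (0 <= start < end <= len(smiles)):
        raise ValueError("span out of range")
    return smiles[:start] + mask_token + smiles[end:]

# Mask the middle atom of ethanol ("CCO"):
print(mask_smiles("CCO", 1, 2))  # C[MASK]O
```

Note that masking by character span is only a convenience for experimenting; token boundaries produced by the model's BPE tokenizer do not necessarily align with single characters.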
The Tokenizer is Broken
#3
by ribesstefano - opened
As already mentioned in other DeepChem model repositories (see here), the model's tokenizer is broken.
Snippet to reproduce:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('DeepChem/ChemBERTa-77M-MLM')

sample_smiles = 'CN(CCCNc1nc(Nc2ccc([*:1])cc2)ncc1Br)C(=O)C1CCC1'
tokens = tokenizer(sample_smiles)
print(tokens)

decoded_smiles = tokenizer.decode(tokens['input_ids'], skip_special_tokens=True)
print(f"Original: {sample_smiles}")
print(f"Decoded: {decoded_smiles}")
assert sample_smiles == decoded_smiles
```
Output:

```
Original: CN(CCCNc1nc(Nc2ccc([*:1])cc2)ncc1Br)C(=O)C1CCC1
Decoded: CN(CCCNc1nc(Nc2ccc(*1)cc2)ncc1B)C(=O)C1CCC1
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[154], line 10
      8 print(f"Original: {sample_smiles}")
      9 print(f"Decoded: {decoded_smiles}")
---> 10 assert sample_smiles == decoded_smiles

AssertionError:
```
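To see exactly which characters the round trip loses, the original and decoded strings can be diffed character by character. A quick diagnostic sketch using only the standard library (the `lost_characters` helper is hypothetical, not part of any DeepChem or Transformers API), which shows that the attachment-point brackets of `[*:1]` and the second letter of `Br` are the characters being dropped:

```python
import difflib

def lost_characters(original: str, decoded: str) -> list:
    """Return the characters of `original` that do not survive
    the encode/decode round trip, in order of appearance."""
    matcher = difflib.SequenceMatcher(None, original, decoded)
    lost = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        # "delete"/"replace" opcodes cover spans of the original
        # that have no counterpart in the decoded string
        if op in ("delete", "replace"):
            lost.extend(original[i1:i2])
    return lost

original = "CN(CCCNc1nc(Nc2ccc([*:1])cc2)ncc1Br)C(=O)C1CCC1"
decoded = "CN(CCCNc1nc(Nc2ccc(*1)cc2)ncc1B)C(=O)C1CCC1"
print(lost_characters(original, decoded))
```

Characters missing from the tokenizer's vocabulary are mapped to special tokens during encoding, so `skip_special_tokens=True` silently drops them on decode; this diff makes that loss visible.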
Yes bro, this is not working for me either.