Tokenizer trained on BindingDB SMILES encodings. Trained on 1008081 samples with one blank space after each character in the SMILES string