Update README.md
README.md CHANGED
@@ -52,6 +52,17 @@ mask = inputs['attention_mask']
 embeddings = ((full_embeddings * mask.unsqueeze(-1)).sum(1) / mask.sum(-1).unsqueeze(-1))
 ```

+### WARNING
+
+This model was trained with `bos` and `eos` tokens around SMILES inputs. The `GPT2TokenizerFast` tokenizer DOES NOT ADD special tokens,
+even when `add_special_tokens=True`. Hugging Face says this is [intended behavior](https://github.com/huggingface/transformers/issues/3311#issuecomment-693719190).
+
+It may be necessary to add these tokens manually:
+
+```python
+inputs = collator(tokenizer([tokenizer.bos_token+i+tokenizer.eos_token for i in smiles]))
+```
+
 ## Model Performance

 To test generation performance, 1 million compounds were generated at various temperature values. Generated compounds were checked for uniqueness and structural validity.
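
The `### WARNING` section added in this change recommends wrapping each SMILES string in the `bos` and `eos` tokens by hand. A minimal sketch of that step in plain Python is shown here. The helper name `wrap_smiles` and the placeholder tokens `<bos>` and `<eos>` are assumptions made for illustration, not part of the README; in real use the token strings would presumably come from `tokenizer.bos_token` and `tokenizer.eos_token`.

```python
from typing import Iterable, List

# Placeholder special tokens, assumed for illustration only. In practice,
# pass tokenizer.bos_token and tokenizer.eos_token from the loaded tokenizer.
BOS = "<bos>"
EOS = "<eos>"


def wrap_smiles(smiles: Iterable[str], bos: str = BOS, eos: str = EOS) -> List[str]:
    """Surround each SMILES string with bos/eos, skipping tokens already present.

    Hypothetical helper; assumes bos and eos are distinct, non-empty strings.
    """
    wrapped = []
    for s in smiles:
        if not s.startswith(bos):
            s = bos + s
        if not s.endswith(eos):
            s = s + eos
        wrapped.append(s)
    return wrapped


smiles = ["CCO", "c1ccccc1"]
wrapped = wrap_smiles(smiles)
print(wrapped)  # ['<bos>CCO<eos>', '<bos>c1ccccc1<eos>']

# Wrapping twice leaves the strings unchanged, so tokens are never duplicated.
assert wrap_smiles(wrapped) == wrapped
```

With the README's own `tokenizer` and `collator` objects, the wrapped list could then be passed along as `collator(tokenizer(wrap_smiles(smiles, tokenizer.bos_token, tokenizer.eos_token)))`. The check for existing tokens is a design choice that guards against double wrapping if inputs were already prepared elsewhere.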