---
license: mit
tags:
- chemistry
- gpt2
- representation-consistency
---

# Consistency

- **Architecture:** GPT-2 small
- **Task:** Forward reaction prediction, with the input given in either SMILES or IUPAC representation
- **Training data:** 80k mapped reactions
- **Checkpoint size:** 124M parameters

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Each variant lives in its own subfolder; load only the one you need.
# The tokenizer and the model must come from the same subfolder.

# Model trained on SMILES input, without KL divergence loss
tok = AutoTokenizer.from_pretrained("bing-yan/consistency", subfolder="nokl-smiles")
model = AutoModelForCausalLM.from_pretrained("bing-yan/consistency", subfolder="nokl-smiles")

# Model trained on IUPAC input, without KL divergence loss
tok = AutoTokenizer.from_pretrained("bing-yan/consistency", subfolder="nokl-iupac")
model = AutoModelForCausalLM.from_pretrained("bing-yan/consistency", subfolder="nokl-iupac")

# Model trained on SMILES input, with KL divergence loss
tok = AutoTokenizer.from_pretrained("bing-yan/consistency", subfolder="kl-smiles")
model = AutoModelForCausalLM.from_pretrained("bing-yan/consistency", subfolder="kl-smiles")

# Model trained on IUPAC input, with KL divergence loss
tok = AutoTokenizer.from_pretrained("bing-yan/consistency", subfolder="kl-iupac")
model = AutoModelForCausalLM.from_pretrained("bing-yan/consistency", subfolder="kl-iupac")
```

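Because the four checkpoints follow a `{kl|nokl}-{smiles|iupac}` subfolder naming pattern, variant selection can be wrapped in a small helper that keeps the tokenizer and model in sync. This is a minimal sketch and not part of the repository: `variant_subfolder` and `load_variant` are hypothetical names introduced here, and the code assumes only the four subfolder names listed in this section.

```python
REPO_ID = "bing-yan/consistency"
REPRESENTATIONS = ("smiles", "iupac")


def variant_subfolder(representation: str, use_kl: bool) -> str:
    """Build the subfolder name for one checkpoint variant (hypothetical helper)."""
    rep = representation.lower()
    if rep not in REPRESENTATIONS:
        raise ValueError(
            f"representation must be one of {REPRESENTATIONS}, got {representation!r}"
        )
    prefix = "kl" if use_kl else "nokl"
    return f"{prefix}-{rep}"


def load_variant(representation: str, use_kl: bool):
    """Load tokenizer and model from the same subfolder (hypothetical helper)."""
    # Imported lazily so the naming helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    subfolder = variant_subfolder(representation, use_kl)
    tok = AutoTokenizer.from_pretrained(REPO_ID, subfolder=subfolder)
    model = AutoModelForCausalLM.from_pretrained(REPO_ID, subfolder=subfolder)
    return tok, model
```

For example, `tok, model = load_variant("iupac", use_kl=True)` would load the `kl-iupac` checkpoint. Deriving both `from_pretrained` calls from a single `subfolder` value avoids pairing a tokenizer from one variant with weights from another.
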
## Citations

- GitHub: [github.com/bingyan4science/consistency](https://github.com/bingyan4science/consistency)
- Dataset (Zenodo): [https://doi.org/10.5281/zenodo.14430369](https://doi.org/10.5281/zenodo.14430369)
- Paper: [Inconsistency of LLMs in Molecular Representations](https://chemrxiv.org/engage/chemrxiv/article-details/675b9de27be152b1d0ced2b5)