---
license: mit
tags:
  - chemistry
  - gpt2
  - representation-consistency
---

# Consistency

**Architecture:** GPT-2 small  
**Task:** Forward reaction prediction (SMILES or IUPAC representation for the input)  
**Training data:** 80k mapped reactions  
**Checkpoint size:** 124M params

## Usage
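
The four checkpoints in this repository differ only in the `subfolder` argument, which pairs the loss setting (`kl` or `nokl`) with the input representation (`smiles` or `iupac`). As a minimal sketch, the subfolder name could be built from those two choices so that the tokenizer and model always come from the same variant. The helper names `variant_subfolder` and `load_variant` are hypothetical and not part of this repository.

```python
def variant_subfolder(representation: str, use_kl: bool) -> str:
    """Return the subfolder name for a variant, e.g. 'kl-smiles'.

    Hypothetical helper: it only mirrors the four subfolder names
    listed in this model card.
    """
    rep = representation.lower()
    if rep not in ("smiles", "iupac"):
        raise ValueError(f"representation must be 'smiles' or 'iupac', got {representation!r}")
    prefix = "kl" if use_kl else "nokl"
    return f"{prefix}-{rep}"


def load_variant(representation: str, use_kl: bool, repo_id: str = "bing-yan/consistency"):
    """Load the tokenizer and model from the same subfolder (hypothetical helper)."""
    # Imported here so the naming helper above works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    subfolder = variant_subfolder(representation, use_kl)
    tok = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
    model = AutoModelForCausalLM.from_pretrained(repo_id, subfolder=subfolder)
    return tok, model
```

For example, `load_variant("iupac", use_kl=True)` would request the `kl-iupac` subfolder for both the tokenizer and the model.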

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load ONE of the four variants below; each block reassigns `tok` and `model`.

# model trained on SMILES input, without KL divergence loss
tok = AutoTokenizer.from_pretrained("bing-yan/consistency", subfolder="nokl-smiles")
model = AutoModelForCausalLM.from_pretrained("bing-yan/consistency", subfolder="nokl-smiles")

# model trained on IUPAC input, without KL divergence loss
tok = AutoTokenizer.from_pretrained("bing-yan/consistency", subfolder="nokl-iupac")
model = AutoModelForCausalLM.from_pretrained("bing-yan/consistency", subfolder="nokl-iupac")

# model trained on SMILES input, with KL divergence loss
tok = AutoTokenizer.from_pretrained("bing-yan/consistency", subfolder="kl-smiles")
model = AutoModelForCausalLM.from_pretrained("bing-yan/consistency", subfolder="kl-smiles")

# model trained on IUPAC input, with KL divergence loss
tok = AutoTokenizer.from_pretrained("bing-yan/consistency", subfolder="kl-iupac")
model = AutoModelForCausalLM.from_pretrained("bing-yan/consistency", subfolder="kl-iupac")
```

## Citations

- GitHub: [github.com/bingyan4science/consistency](https://github.com/bingyan4science/consistency)
- Dataset (Zenodo): [https://doi.org/10.5281/zenodo.14430369](https://doi.org/10.5281/zenodo.14430369)
- Paper: [Inconsistency of LLMs in Molecular Representations](https://chemrxiv.org/engage/chemrxiv/article-details/675b9de27be152b1d0ced2b5)