SMILES2IUPAC-canonical-base

SMILES2IUPAC-canonical-base is designed to accurately translate SMILES chemical notation into IUPAC names.

Model Details

Model Description

SMILES2IUPAC-canonical-base is based on the mT5 model, modified to use separate tokenizers for the encoder and the decoder.

Developed by: Knowledgator Engineering
Model type: Encoder-Decoder with attention mechanism
Language(s) (NLP): SMILES, IUPAC (English)
License: Apache License 2.0

Model Sources

Paper: coming soon
Demo: ChemicalConverters

Quickstart

First, install the library:

pip install chemical-converters

SMILES to IUPAC

Preferred IUPAC style

To choose the preferred IUPAC style, place style tokens before your SMILES sequence.

Style Token	Description
`<BASE>`	The most commonly used name of the substance; sometimes a mixture of traditional and systematic styles
`<SYST>`	Fully systematic style, without trivial names
`<TRAD>`	Style based on trivial names of the substance's parts

To perform a simple translation, follow the example:
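
Prefixing a style token is plain string concatenation. The following is a minimal sketch, assuming a hypothetical helper `with_style` (it is not part of the chemical-converters library) and the three tokens listed in the table:

```python
# Hypothetical helper, not part of the chemical-converters library.
STYLE_TOKENS = {"BASE": "<BASE>", "SYST": "<SYST>", "TRAD": "<TRAD>"}


def with_style(smiles: str, style: str = "BASE") -> str:
    """Return the SMILES string prefixed with the requested style token."""
    try:
        token = STYLE_TOKENS[style.upper()]
    except KeyError:
        raise ValueError(
            f"unknown style {style!r}; expected one of {sorted(STYLE_TOKENS)}"
        ) from None
    return token + smiles


inputs = [with_style("CCO", s) for s in ("SYST", "TRAD", "BASE")]
print(inputs)  # ['<SYST>CCO', '<TRAD>CCO', '<BASE>CCO']
```

The resulting list can be passed to the converter in place of hand-written prefixed strings.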

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac('CCO'))
print(converter.smiles_to_iupac(['<SYST>CCO', '<TRAD>CCO', '<BASE>CCO']))

['ethanol']
['ethanol', 'ethanol', 'ethanol']

Processing in batches:

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac(["<BASE>C=CC=C" for _ in range(10)], num_beams=1, 
                                process_in_batch=True, batch_size=1000))

['buta-1,3-diene', 'buta-1,3-diene'...]

Validating SMILES to IUPAC translations

You can validate a translation by translating the predicted IUPAC name back into SMILES and calculating the Tanimoto similarity between the fingerprints of the two molecules.

from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac('CCO', validate=True))

['ethanol'] 1.0

The higher the Tanimoto similarity, the more likely it is that the prediction is correct.

You can also run the validation manually:
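
For reference, the Tanimoto similarity of two fingerprints is the number of on-bits they share divided by the number of on-bits present in either one. The following is a minimal sketch, assuming fingerprints are represented as Python sets of on-bit indices. The bit values are made up for illustration, and how the library computes its fingerprints is not shown here:

```python
from __future__ import annotations


def tanimoto(fp_a: set[int], fp_b: set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of on-bit indices."""
    union = fp_a | fp_b
    if not union:
        # Convention chosen for this sketch: two empty fingerprints are identical.
        return 1.0
    return len(fp_a & fp_b) / len(union)


# Made-up fingerprints, for illustration only.
original = {1, 5, 9, 12}
round_trip = {1, 5, 9, 20}

print(tanimoto(original, original))    # 1.0
print(tanimoto(original, round_trip))  # 0.6
```

A score of 1.0 means the two fingerprints are identical; lower scores indicate that the round-trip molecule differs from the input.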

from chemicalconverters import NamesConverter

validation_model = NamesConverter(model_name="knowledgator/IUPAC2SMILES-canonical-base")
print(NamesConverter.validate_iupac(input_sequence='CCO', predicted_sequence='CCO', validation_model=validation_model))

1.0

Bias, Risks, and Limitations

This model has limited accuracy on large molecules and currently does not support isomeric or isotopic SMILES.

Training Procedure

The model was trained on 100M examples of SMILES-IUPAC pairs with lr=0.00001, batch_size=512 for 2 epochs.
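
As a rough sanity check on the scale of this run, the number of optimizer steps can be estimated from the figures above. This sketch assumes that "100M" means exactly 100,000,000 pairs seen per epoch, that 512 is the effective global batch size, and that no gradient accumulation was used. None of these assumptions is stated in the source:

```python
import math

num_examples = 100_000_000  # assumption: "100M" taken as exactly 1e8 pairs
batch_size = 512            # assumption: effective global batch size
epochs = 2

# The last batch of each epoch may be partial, hence the ceiling.
steps_per_epoch = math.ceil(num_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(steps_per_epoch)  # 195313
print(total_steps)      # 390626
```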

Evaluation

Model	Accuracy	BLEU-4 score	Size (MB)
SMILES2IUPAC-canonical-small	75%	0.93	23
SMILES2IUPAC-canonical-base	86.9%	0.964	180
STOUT V2.0*	66.65%	0.92	128
STOUT V2.0 (according to our tests)		0.89	128
*According to the original paper https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00512-4

Citation

Coming soon.

Model Card Authors

Mykhailo Shtopko

Model Card Contact

info@knowledgator.com