Instructions to use knowledgator/SMILES2IUPAC-canonical-base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers

How to use knowledgator/SMILES2IUPAC-canonical-base with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text2text-generation", model="knowledgator/SMILES2IUPAC-canonical-base")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("knowledgator/SMILES2IUPAC-canonical-base")
model = AutoModelForSeq2SeqLM.from_pretrained("knowledgator/SMILES2IUPAC-canonical-base")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use knowledgator/SMILES2IUPAC-canonical-base with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "knowledgator/SMILES2IUPAC-canonical-base"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "knowledgator/SMILES2IUPAC-canonical-base",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker:

```shell
docker model run hf.co/knowledgator/SMILES2IUPAC-canonical-base
```
- SGLang
How to use knowledgator/SMILES2IUPAC-canonical-base with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "knowledgator/SMILES2IUPAC-canonical-base" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "knowledgator/SMILES2IUPAC-canonical-base",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "knowledgator/SMILES2IUPAC-canonical-base" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "knowledgator/SMILES2IUPAC-canonical-base",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use knowledgator/SMILES2IUPAC-canonical-base with Docker Model Runner:
```shell
docker model run hf.co/knowledgator/SMILES2IUPAC-canonical-base
```
SMILES2IUPAC-canonical-base
SMILES2IUPAC-canonical-base was designed to accurately translate SMILES strings into IUPAC chemical names.
Model Details
Model Description
SMILES2IUPAC-canonical-base is based on the MT5 architecture, adapted to use different tokenizers for the encoder (SMILES input) and the decoder (IUPAC output).
- Developed by: Knowledgator Engineering
- Model type: Encoder-Decoder with attention mechanism
- Language(s) (NLP): SMILES, IUPAC (English)
- License: Apache License 2.0
Model Sources
- Paper: coming soon
- Demo: ChemicalConverters
Quickstart
First, install the library:

```shell
pip install chemical-converters
```
SMILES to IUPAC
Preferred IUPAC style
To choose the preferred IUPAC style, prepend a style token to your SMILES sequence.
| Style Token | Description |
|---|---|
| `<BASE>` | The most common name of the substance, sometimes a mixture of traditional and systematic styles |
| `<SYST>` | A fully systematic style without trivial names |
| `<TRAD>` | A style based on trivial names of the parts of the substance |
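Attaching a style token is plain string concatenation in front of the SMILES sequence. The helper below is hypothetical (`tag_smiles` is not part of the chemical-converters library) and only illustrates how the tokens from the table above are attached:

```python
# Hypothetical helper, not part of the chemical-converters library:
# maps a short style name to the corresponding token and prepends it.
STYLE_TOKENS = {
    "base": "<BASE>",  # most common name
    "syst": "<SYST>",  # fully systematic name
    "trad": "<TRAD>",  # trivial-name-based style
}

def tag_smiles(smiles: str, style: str = "base") -> str:
    """Prepend the chosen IUPAC-style token to a SMILES string."""
    return STYLE_TOKENS[style] + smiles

print(tag_smiles("CCO", "syst"))  # <SYST>CCO
```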
To perform a simple translation, follow the example:

```python
from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac('CCO'))
print(converter.smiles_to_iupac(['<SYST>CCO', '<TRAD>CCO', '<BASE>CCO']))
```

```
['ethanol']
['ethanol', 'ethanol', 'ethanol']
```
Processing in batches:

```python
from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac(["<BASE>C=CC=C" for _ in range(10)], num_beams=1,
                                process_in_batch=True, batch_size=1000))
```

```
['buta-1,3-diene', 'buta-1,3-diene'...]
```
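With `process_in_batch=True`, the input list is presumably split into slices of at most `batch_size` items before being fed to the model. The splitting step itself can be sketched in plain Python (this is an illustration, not the library's internal code):

```python
from typing import Iterator, List

def chunks(items: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

smiles = ["<BASE>C=CC=C" for _ in range(10)]
# 10 inputs with batch_size=4 -> batches of 4, 4, and 2
print([len(batch) for batch in chunks(smiles, 4)])  # [4, 4, 2]
```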
Validating SMILES to IUPAC translations
Translations can be validated by translating the predicted IUPAC name back into SMILES and computing the Tanimoto similarity of the fingerprints of the two molecules.
```python
from chemicalconverters import NamesConverter

converter = NamesConverter(model_name="knowledgator/SMILES2IUPAC-canonical-base")
print(converter.smiles_to_iupac('CCO', validate=True))
```

```
['ethanol'] 1.0
```
The higher the Tanimoto similarity, the more likely the prediction is correct.
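Tanimoto similarity itself is simply the intersection-over-union of the set bits in the two fingerprints. A minimal pure-Python illustration of the metric follows (a real pipeline would compute the fingerprints with a cheminformatics library such as RDKit; the fingerprints here are toy values):

```python
def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints,
    each given as a set of on-bit indices."""
    if not fp_a and not fp_b:
        return 1.0  # two empty fingerprints are trivially identical
    return len(fp_a & fp_b) / len(fp_a | fp_b)

# Identical fingerprints give 1.0; partial overlap gives a fraction.
print(tanimoto({1, 5, 9}, {1, 5, 9}))  # 1.0
print(tanimoto({1, 5, 9}, {2, 5, 9}))  # 0.5
```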
You can also run validation manually:

```python
from chemicalconverters import NamesConverter

validation_model = NamesConverter(model_name="knowledgator/IUPAC2SMILES-canonical-base")
print(NamesConverter.validate_iupac(input_sequence='CCO', predicted_sequence='CCO', validation_model=validation_model))
```

```
1.0
```
Bias, Risks, and Limitations
This model has limited accuracy on large molecules and currently does not support isomeric or isotopic SMILES.
Training Procedure
The model was trained on 100M SMILES-IUPAC pairs with a learning rate of 0.00001 and a batch size of 512, for 2 epochs.
Evaluation
| Model | Accuracy | BLEU-4 score | Size (MB) |
|---|---|---|---|
| SMILES2IUPAC-canonical-small | 75% | 0.93 | 23 |
| SMILES2IUPAC-canonical-base | 86.9% | 0.964 | 180 |
| STOUT V2.0* | 66.65% | 0.92 | 128 |
| STOUT V2.0 (according to our tests) | | 0.89 | 128 |

*According to the original paper: https://jcheminf.biomedcentral.com/articles/10.1186/s13321-021-00512-4
Citation
Coming soon.
Model Card Authors
Model Card Contact