Text Generation
Transformers
PyTorch
bart
mass-spectrometry
GC-EI-MS
Transformer
molecular-structure-reconstruction
compound-identification
How to use LMHHHHHH/SpecTUS_pretrained_only with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="LMHHHHHH/SpecTUS_pretrained_only")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("LMHHHHHH/SpecTUS_pretrained_only")
model = AutoModelForSeq2SeqLM.from_pretrained("LMHHHHHH/SpecTUS_pretrained_only")
```
license: cc-by-4.0
datasets:
- MS-ML/synth1_2x4.7M
- MS-ML/synth2_2x4.8M
metrics:
- accuracy
library_name: transformers
tags:
- mass-spectrometry
- GC-EI-MS
- Transformer
- molecular-structure-reconstruction
- compound-identification

This is the SpecTUS model, pretrained on synth1_2x4.7M and synth2_2x4.8M combined for 448k steps.

The model is a Transformer-based neural network trained to elucidate molecular structures from GC-EI-MS spectra.
It was pretrained on a large dataset of 17.2M synthetic training spectra generated from two identical sets of 8.6M
compounds using the [NEIMS] and [RASSP] models.
Our main aim was to give the model an understanding of the chemical space of small molecules. The training was
conducted with a batch size of 128 for 448,000 steps, allowing the model to process each of the 17.2 million spectra approximately three times.
The entire pretraining process, including control evaluations every 16,000 steps, took 58 hours on a single Nvidia H100 GPU.
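
As a quick sanity check of the "approximately three times" figure, the arithmetic can be sketched as below. The variable names are illustrative and not taken from the SpecTUS code; the numbers are the ones quoted above, and the evaluation count assumes one control evaluation at every multiple of 16,000 steps.

```python
# Numbers quoted in the text above; variable names are illustrative only.
batch_size = 128
training_steps = 448_000
dataset_size = 17_200_000  # 17.2M synthetic training spectra

# Total spectra seen during pretraining and the resulting number of passes.
spectra_seen = batch_size * training_steps
passes_over_dataset = spectra_seen / dataset_size

# Assumption: one control evaluation at every multiple of 16,000 steps.
eval_interval = 16_000
num_evaluations = training_steps // eval_interval

print(f"spectra processed: {spectra_seen:,}")
print(f"passes over the dataset: {passes_over_dataset:.2f}")
print(f"control evaluations: {num_evaluations}")
```

The result is roughly 3.3 passes over the training spectra, consistent with the figure stated above.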

During pretraining, the percentage of correctly reconstructed structures increased steadily but remained relatively low at the
end of the stage: 38% for RASSP-generated spectra, 29% for NEIMS-generated spectra, and 3% for NIST spectra. However, 96% of
the generated SMILES strings (RASSP, NEIMS) were valid canonical molecules, with 91% (RASSP), 78% (NEIMS), and 14% (NIST) having
correct molecular formulas, though possibly incorrect structures. These results suggest that during the pretraining phase, the model
successfully learned molecular structure rules and the relationship between atomic weight and m/z values, forming a good foundation
for subsequent finetuning.
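
The link between molecular formula and m/z mentioned above can be illustrated with a small sketch: for a singly charged molecular ion, the expected m/z is approximately the nominal mass of the formula. The helper `nominal_mass` and the short element table are hypothetical illustrations, not part of the SpecTUS code base, and the parser only handles simple formulas without brackets, isotope labels, or charges.

```python
import re

# Nominal (integer) masses of the most abundant isotopes. This is a
# deliberately small, illustrative table, not a complete periodic table.
NOMINAL_MASS = {"H": 1, "C": 12, "N": 14, "O": 16, "F": 19,
                "P": 31, "S": 32, "Cl": 35, "Br": 79}

def nominal_mass(formula: str) -> int:
    """Sum nominal isotope masses for a simple formula such as 'C2H6O'."""
    total = 0
    matched = 0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += NOMINAL_MASS[symbol] * (int(count) if count else 1)
        matched += len(symbol) + len(count)
    # Reject anything the simple pattern could not fully consume.
    if matched != len(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    return total

# Ethanol and dimethyl ether are isomers: same formula, different structure.
print(nominal_mass("C2H6O"))
print(nominal_mass("C6H6"))
```

Isomers such as ethanol and dimethyl ether share the formula C2H6O and therefore the same nominal mass, which is one way a prediction can have the correct molecular formula while the structure is still wrong.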

We suggest finetuning the model further on experimental data (NIST, Wiley) to reach the performance reported in our [preprint]. We cannot
make the final model available, since it was finetuned on a proprietary dataset (NIST). If you have purchased the NIST GC-EI-MS license, you can
either finetune the model yourself using the code in [our GitHub repository] or contact us with proof of the license and we will share the final
model with you. The code we used for data processing, finetuning, evaluation, model comparison, and more can also be found in [our GitHub repository].
Our [preprint] provides more information about the task background, the final finetuned model, and the experiments.

How to cite:
```text
@misc{hájek2025spectusspectraltranslatorunknown,
      title={SpecTUS: Spectral Translator for Unknown Structures annotation from EI-MS spectra},
      author={Adam Hájek and Helge Hecht and Elliott J. Price and Aleš Křenek},
      year={2025},
      eprint={2502.05114},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.05114},
}
```

[NEIMS]: https://github.com/brain-research/deep-molecular-massspec
[RASSP]: https://github.com/thejonaslab/rassp-public
[our GitHub repository]: https://github.com/hejjack/SpecTUS/
[preprint]: https://arxiv.org/abs/2502.05114