Instructions to use MS-ML/SpecTUS_pretrained_only with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use MS-ML/SpecTUS_pretrained_only with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

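# The model is an encoder-decoder (loaded below with AutoModelForSeq2SeqLM), so use the text2text task
pipe = pipeline("text2text-generation", model="MS-ML/SpecTUS_pretrained_only")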

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("MS-ML/SpecTUS_pretrained_only")
model = AutoModelForSeq2SeqLM.from_pretrained("MS-ML/SpecTUS_pretrained_only")
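The loading snippet above stops before inference. Below is a minimal sketch of the usual encoder-decoder generation call. It assumes the model is driven through the standard `generate()` method. The helper name `generate_from_text` is hypothetical. This page does not document the input encoding of a mass spectrum that SpecTUS expects, so `source_text` is only a placeholder for whatever preprocessed form the model card specifies.

```python
from typing import Any


def generate_from_text(
    model: Any,
    tokenizer: Any,
    source_text: str,
    max_new_tokens: int = 128,
    num_beams: int = 1,
) -> str:
    """Hypothetical helper: encode one input, generate, and decode the result.

    Assumption: `source_text` is already in the input format the model was
    trained on; that format is not specified on this page.
    """
    # Tokenize the encoder input; return_tensors="pt" yields PyTorch tensors.
    inputs = tokenizer(source_text, return_tensors="pt")
    # The decoder generates up to max_new_tokens tokens (beam search if num_beams > 1).
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, num_beams=num_beams)
    # Decode the first (and only) sequence in the batch, dropping special tokens.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as above, `generate_from_text(model, tokenizer, encoded_spectrum)` returns the decoded output string. The README text quoted in the commit diff on this page refers to the model's outputs as generated SMILES strings.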

Local Apps

vLLM

How to use MS-ML/SpecTUS_pretrained_only with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "MS-ML/SpecTUS_pretrained_only"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MS-ML/SpecTUS_pretrained_only",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
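The curl call above can also be made from Python using only the standard library. This is a minimal sketch, assuming the server returns a standard OpenAI-style completions body with the generated text under `choices[0]["text"]`. The names `build_completion_request` and `complete` are hypothetical. The defaults mirror the curl payload.

```python
import json
import urllib.request
from typing import Any


def build_completion_request(
    base_url: str,
    model: str,
    prompt: str,
    max_tokens: int = 512,
    temperature: float = 0.5,
) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/completions endpoint."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(request: urllib.request.Request, timeout_s: float = 60.0) -> str:
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        body: dict[str, Any] = json.load(response)
    # OpenAI-style completion responses put generated text under choices[i]["text"].
    return body["choices"][0]["text"]
```

With the server running, `complete(build_completion_request("http://localhost:8000", "MS-ML/SpecTUS_pretrained_only", "Once upon a time,"))` sends the same request as the curl example. Only `base_url` has to change for another OpenAI-compatible server on a different port.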


SGLang

How to use MS-ML/SpecTUS_pretrained_only with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "MS-ML/SpecTUS_pretrained_only" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MS-ML/SpecTUS_pretrained_only",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
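`launch_server` can take a while to load the model before it answers requests, so a client may want to wait for the server before sending completions. Below is a minimal sketch of such a readiness poll. It assumes that an HTTP 200 from the OpenAI-compatible `GET /v1/models` route means the server is ready, so check that route against your SGLang version. The names `http_ok` and `wait_for_server` are hypothetical.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout_s: float = 5.0) -> bool:
    """Return True if GET url answers with HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except OSError:
        # URLError/HTTPError, refused connections and socket timeouts all derive from OSError.
        return False


def wait_for_server(
    base_url: str,
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    probe: Callable[[str], bool] = http_ok,
) -> bool:
    """Poll base_url/v1/models until probe() succeeds (True) or timeout_s elapses (False)."""
    url = base_url.rstrip("/") + "/v1/models"
    deadline = time.monotonic() + timeout_s
    while True:
        if probe(url):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

For the server started above, `wait_for_server("http://localhost:30000")` blocks until the route responds or ten minutes pass. The `probe` parameter lets you replace the HTTP check, for example with SGLang's own health route if you prefer it.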

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "MS-ML/SpecTUS_pretrained_only" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "MS-ML/SpecTUS_pretrained_only",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'

Docker Model Runner
How to use MS-ML/SpecTUS_pretrained_only with Docker Model Runner:
```
docker model run hf.co/MS-ML/SpecTUS_pretrained_only
```

hajekad committed on Dec 2, 2024

Commit aab53be (verified), 1 parent: 1de3670

Update README.md

Files changed (1): README.md +2 -2

@@ -23,8 +23,8 @@ We mainly aimed to give the model an understanding of the chemical space of smal
 conducted with a batch size of 128 for 224,000 steps, allowing the model to process each of the 9.4 million spectra approximately three times.
 The entire pretraining process, including control evaluations every 16,000 steps, took 33 hours on a single Nvidia H100 GPU.
-During pretraining, the percentage of correctly reconstructed validation spectra steadily increased, but remained relatively low at the end
-of the stage: 27\% for RASSP-generated spectra, 13\% for NEIMS-generated spectra, and 2\% for NIST spectra. However, 94\% of the generated SMILES
+During pretraining, the percentage of correctly reconstructed validation spectra steadily increased, but remained relatively low at the end: 27\%
+for RASSP-generated spectra, 13\% for NEIMS-generated spectra, and 2\% for NIST spectra. However, 94\% of the generated SMILES
 strings (RASSP, NEIMS) were valid canonical molecules, with 83\% (RASSP), 65\% (NEIMS), and 11\% (NIST) having correct molecular formulas.
 These results suggest that during the pretraining phase, the model successfully learned molecular structure rules and the relationship between atomic
 weight and m/z values, forming a good foundation for subsequent finetuning.
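A few lines of arithmetic can cross-check the training figures in the README text quoted in the diff above. This sketch uses only the numbers stated there: batch size 128, 224,000 steps, 9.4 million spectra, evaluations every 16,000 steps, and 33 hours. The derived steps-per-second value is an average that includes evaluation time, not a measured training throughput.

```python
BATCH_SIZE = 128
TOTAL_STEPS = 224_000
NUM_SPECTRA = 9_400_000  # "9.4 million", so the epoch count below is approximate
EVAL_EVERY = 16_000
WALL_CLOCK_HOURS = 33

samples_seen = BATCH_SIZE * TOTAL_STEPS  # spectra processed in total
epochs = samples_seen / NUM_SPECTRA  # passes over the pretraining set
num_evals = TOTAL_STEPS // EVAL_EVERY  # evaluation points, if the first falls at step 16,000
steps_per_second = TOTAL_STEPS / (WALL_CLOCK_HOURS * 3600)  # average, evaluations included

print(f"samples seen: {samples_seen:,}")
print(f"epochs: {epochs:.2f}")
print(f"evaluations: {num_evals}")
print(f"steps/s: {steps_per_second:.2f}")
```

The script prints 28,672,000 samples seen and about 3.05 epochs, which matches "approximately three times". It also prints 14 evaluations and roughly 1.89 steps per second averaged over the whole run.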