ProteinForceGPT: Generative strategies for modeling, design and analysis of protein mechanics
Basic information
This protein language model is a 454M-parameter, GPT-style autoregressive transformer, trained on a large number of protein sequences to analyze and predict their mechanical properties. The model has both forward and inverse capabilities: it can predict mechanical properties from a sequence and, using the Generate tasks, design novel proteins that meet one or more mechanical constraints.
This protein language foundation model is based on the GPT-NeoX architecture and uses rotary positional embeddings (RoPE). It has 16 attention heads, 36 hidden layers, a hidden size of 1024, an intermediate size of 4096, and a GELU activation function.
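As a rough sanity check on the 454M figure, the weight matrices of the transformer blocks can be counted from the sizes quoted above. This is a back-of-the-envelope sketch, assuming the standard GPT-style block layout (four hidden-by-hidden attention projections and a two-matrix feed-forward network). Biases, layer norms and token embeddings are left out because the vocabulary size is not stated here.

```python
# Back-of-the-envelope parameter count from the quoted architecture sizes.
# Assumption: standard GPT-style blocks; biases, layer norms and embeddings
# are left out because the vocabulary size is not given in this card.
hidden_size = 1024
intermediate_size = 4096
num_layers = 36

# Attention: query, key, value and output projections, each hidden x hidden
attention_weights = 4 * hidden_size * hidden_size
# Feed-forward: hidden -> intermediate and intermediate -> hidden
mlp_weights = 2 * hidden_size * intermediate_size

per_layer = attention_weights + mlp_weights
block_weights = num_layers * per_layer

print(f"per layer:  {per_layer:,}")      # 12,582,912
print(f"all layers: {block_weights:,}")  # 452,984,832
```

The block weights alone come to roughly 453M, which is consistent with the stated total once the smaller omitted terms are added.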
The pretraining task is defined as "Sequence<...>" where ... is an amino acid sequence.
Pretraining dataset: https://huggingface.co/datasets/lamm-mit/GPTProteinPretrained
Pretrained model: https://huggingface.co/lamm-mit/GPTProteinPretrained
This fine-tuned model supports the following mechanics-related forward ("Calculate...") and inverse ("Generate...") tasks:
CalculateForce<GEECDCGSPSNP...>
CalculateEnergy<GEECDCGSPSNP...>
CalculateForceEnergy<GEECDCGSPSNP...>
CalculateForceHistory<GEECDCGSPSNP...>
GenerateForce<0.262>
GenerateForce<0.220>
GenerateForceEnergy<0.262,0.220>
GenerateForceHistory<0.004,0.034,0.125,0.142,0.159,0.102,0.079,0.073,0.131,0.105,0.071,0.058,0.072,0.060,0.049,0.114,0.122,0.108,0.173,0.192,0.208,0.153,0.212,0.222,0.244>
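The task strings above all follow a `TaskName<payload>` pattern, so they can be assembled programmatically. The sketch below shows one way to do that. The helper names `calculate_prompt` and `generate_prompt` are hypothetical and not part of the model's code, and the three-decimal number format is an assumption read off the examples above.

```python
from typing import Sequence


def calculate_prompt(task: str, sequence: str) -> str:
    """Build a forward-task prompt such as 'CalculateForce<GEECDC>'."""
    return f"{task}<{sequence}>"


def generate_prompt(task: str, targets: Sequence[float]) -> str:
    """Build an inverse-task prompt such as 'GenerateForceEnergy<0.262,0.220>'."""
    # Assumption: targets are written with three decimals, as in the examples
    payload = ",".join(f"{value:.3f}" for value in targets)
    return f"{task}<{payload}>"


print(calculate_prompt("CalculateForce", "GEECDC"))
print(generate_prompt("GenerateForceEnergy", [0.262, 0.220]))
```

Keeping prompt construction in one place makes it harder to introduce stray characters or inconsistent number formatting into a task string.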
Load model
The model can be loaded with the following code:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Use a GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

ForceGPT_model_name = 'lamm-mit/ProteinForceGPT'

tokenizer = AutoTokenizer.from_pretrained(ForceGPT_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token

model = AutoModelForCausalLM.from_pretrained(
    ForceGPT_model_name,
    trust_remote_code=True
).to(device)
model.config.use_cache = False
Inference
Sample inference using the "Sequence<...>" task. Here the model simply autocompletes a sequence that starts with "GEECDC":
import torch

# Assumes model, tokenizer and device from the loading step
prompt = "Sequence<GEECDC"
# Encode the prompt and add a batch dimension
generated = torch.tensor(tokenizer.encode(prompt, add_special_tokens=False)).unsqueeze(0).to(device)
print(generated.shape, generated)

sample_outputs = model.generate(
    inputs=generated,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=500,
    max_length=300,
    top_p=0.9,
    num_return_sequences=1,
    temperature=1,
)

for i, sample_output in enumerate(sample_outputs):
    print("{}: {}\n\n".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
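The decoded text contains the prompt followed by the completion, so the amino acid sequence still has to be pulled out of the `Sequence<...>` wrapper. A minimal sketch, assuming the model closes the sequence with `>` (if it does not, everything after `<` is returned). `extract_sequence` is a hypothetical helper, not part of the model's code, and the example strings are made up for illustration rather than taken from model output.

```python
def extract_sequence(decoded: str) -> str:
    """Return the text between 'Sequence<' and the first '>' (or end of text)."""
    marker = "Sequence<"
    start = decoded.find(marker)
    if start == -1:
        raise ValueError("no 'Sequence<' marker found")
    body = decoded[start + len(marker):]
    end = body.find(">")
    # Fall back to the full remainder when generation stopped before '>'
    return body if end == -1 else body[:end]


# Illustrative strings only, not real model output
print(extract_sequence("Sequence<GEECDCGSPSNP>"))  # GEECDCGSPSNP
print(extract_sequence("Sequence<GEECDC"))         # GEECDC
```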
Sample inference using the "CalculateForce<...>" task, in which the model calculates the maximum unfolding force of a given sequence:
prompt = "CalculateForce<GEECDCGSPSNPCCDAATCKLRPGAQCADGLCCDQCRFKKKRTICRIARGDFPDDRCTGQSADCPRWN>"
# Encode the prompt and add a batch dimension
generated = torch.tensor(tokenizer.encode(prompt, add_special_tokens=False)).unsqueeze(0).to(device)

sample_outputs = model.generate(
    inputs=generated,
    eos_token_id=tokenizer.eos_token_id,
    do_sample=True,
    top_k=500,
    max_length=300,
    top_p=0.9,
    num_return_sequences=3,
    temperature=1,
)

for i, sample_output in enumerate(sample_outputs):
    print("{}: {}\n\n".format(i, tokenizer.decode(sample_output, skip_special_tokens=True)))
Output:
0: CalculateForce<GEECDCGSPSNPCCDAATCKLRPGAQCADGLCCDQCRFKKKRTICRIARGDFPDDRCTGQSADCPRWN> [0.262]
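To use the prediction downstream, the bracketed value can be parsed out of the decoded string. This is a small sketch using a regular expression, assuming the output always has the `Task<sequence> [value]` shape shown above and that multi-value results, if any, are comma-separated inside the brackets. `parse_result` is a hypothetical helper, not part of the model's code.

```python
import re
from typing import List


def parse_result(decoded: str) -> List[float]:
    """Extract the numbers inside the first '[...]' of a decoded output."""
    match = re.search(r"\[([^\]]*)\]", decoded)
    if match is None:
        raise ValueError("no bracketed result found")
    # Assumption: several values, if present, are separated by commas
    return [float(item) for item in match.group(1).split(",") if item.strip()]


line = "CalculateForce<GEECDCGSPSNPCCDAATCKLRPGAQCADGLCCDQCRFKKKRTICRIARGDFPDDRCTGQSADCPRWN> [0.262]"
print(parse_result(line))  # [0.262]
```

Because sampling is enabled in the generation call, repeated runs may return different values. Parsing each returned sequence with the same helper makes it easy to compare them.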
Citations
To cite this work:
@article{GhafarollahiBuehler_2024,
    title = {ProtAgents: Protein discovery via large language model multi-agent collaborations combining physics and machine learning},
    author = {A. Ghafarollahi and M.J. Buehler},
    journal = {},
    year = {2024},
    volume = {},
    pages = {},
    url = {}
}
The dataset used to fine-tune the model is described in:
@article{NiKaplanBuehler_2024,
    title = {ForceGen: End-to-end de novo protein generation based on nonlinear mechanical unfolding responses using a protein language diffusion model},
    author = {B. Ni and D.L. Kaplan and M.J. Buehler},
    journal = {Science Advances},
    year = {2024},
    volume = {},
    pages = {},
    url = {}
}