AMPLIFY 120M

FLAIR Lab · Website · GitHub · Paper

A 120M-parameter protein language model pre-trained on UR100P using masked language modeling. Trained for 1M steps (~2T tokens) with context length 2,048.

This model was trained using the AMPLIFY training codebase. The original models and code were released under chandar-lab/AMPLIFY. See also flair-bio/AMPLIFY_350M.

| Property | Value |
|---|---|
| Architecture | BERT-style encoder (RoPE, SwiGLU, RMSNorm) |
| Parameters | 120M |
| Training tokens | ~2T |
| Vocabulary size | 32 (amino acid alphabet + special tokens) |
| Context length | 2,048 |
| Training steps | 1,000,000 |
| License | Apache 2.0 |

Quick Start

from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("flair-bio/amplify-120m", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("flair-bio/amplify-120m", trust_remote_code=True)
model.eval()

How to Use

Extract Embeddings

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("flair-bio/amplify-120m", trust_remote_code=True)
model = AutoModel.from_pretrained("flair-bio/amplify-120m", trust_remote_code=True)

sequences = ["MKTAYIAK", "MVLSPADKTNVK"]
inputs = tokenizer(sequences, return_tensors="pt", padding=True, truncation=True, max_length=2048)

with torch.no_grad():
    outputs = model(**inputs)

embeddings = outputs.last_hidden_state  # [batch, seq_len, 640]
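
For per-sequence representations, one common option is to average the per-residue embeddings while ignoring padding. The sketch below is not taken from the AMPLIFY codebase. `mean_pool` is a hypothetical helper, and it assumes the tokenizer returns an `attention_mask` with 1 for real tokens and 0 for padding. Random tensors stand in for `outputs.last_hidden_state` and `inputs["attention_mask"]` so the block runs without downloading the model.

```python
import torch


def mean_pool(hidden, attention_mask):
    """Average per-residue embeddings over non-padding positions.

    hidden:         [batch, seq_len, dim] float tensor
    attention_mask: [batch, seq_len], 1 for real tokens, 0 for padding (assumed convention)
    """
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # [batch, seq_len, 1]
    summed = (hidden * mask).sum(dim=1)                   # [batch, dim]
    counts = mask.sum(dim=1).clamp(min=1.0)               # guard against all-padding rows
    return summed / counts


# Toy stand-ins for the model outputs (shapes only, values are random).
hidden = torch.randn(2, 14, 640)
attention_mask = torch.ones(2, 14, dtype=torch.long)
attention_mask[0, 10:] = 0  # pretend the first sequence is shorter and padded

pooled = mean_pool(hidden, attention_mask)  # [batch, 640]
```

This average includes any special tokens the tokenizer adds. Whether to exclude them is a design choice that depends on the downstream task.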

Masked Language Modeling

from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("flair-bio/amplify-120m", trust_remote_code=True)
model = AutoModelForMaskedLM.from_pretrained("flair-bio/amplify-120m", trust_remote_code=True)

sequence = "MKTAY<mask>AKQRQISFVK"
inputs = tokenizer(sequence, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

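# Column indices of every <mask> token in the single input sequence
mask_idx = (inputs["input_ids"] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
# Most likely token at each masked position, decoded back to text
predicted = tokenizer.decode(logits[0, mask_idx].argmax(dim=-1))
print(predicted)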
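
Beyond the single best prediction, it can be useful to look at the top few candidates and their probabilities at a masked position. The following is a minimal sketch under stated assumptions. `top_k_residues` and the four-token `toy_vocab` are hypothetical and exist only for illustration. With the real model, the logits row would be `logits[0, mask_idx[0]]` and the id-to-token mapping would come from the tokenizer.

```python
import torch


def top_k_residues(mask_logits, id_to_token, k=3):
    """Return the k most probable tokens at one masked position.

    mask_logits: 1-D tensor of vocabulary logits for a single position
    id_to_token: mapping from token id to token string (hypothetical here)
    """
    probs = torch.softmax(mask_logits, dim=-1)
    top = torch.topk(probs, k)
    return [(id_to_token[i], p) for i, p in zip(top.indices.tolist(), top.values.tolist())]


# Toy vocabulary and logits, not the real AMPLIFY vocabulary.
toy_vocab = {0: "A", 1: "K", 2: "L", 3: "V"}
toy_logits = torch.tensor([0.1, 2.0, 1.0, -1.0])

ranked = top_k_residues(toy_logits, toy_vocab, k=2)
for token, prob in ranked:
    print(f"{token}: {prob:.3f}")
```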

Model Description

Architecture

AMPLIFY 120M is a BERT-style transformer encoder with 24 layers, 640-dimensional hidden states, and 10 attention heads. It uses rotary positional embeddings (RoPE), SwiGLU feed-forward blocks, and RMSNorm. Tokenization is at the amino acid level with a vocabulary of 32 tokens.

| Config | Layers | Hidden dim | Heads | FFN dim | Context |
|---|---|---|---|---|---|
| AMPLIFY 120M | 24 | 640 | 10 | 1,712 | 512 → 2,048 |
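
As a sanity check, the dimensions above can be turned into a rough parameter count. The sketch below rests on assumptions that this card does not state: bias-free attention projections (query, key, value, output), a three-matrix SwiGLU block, two RMSNorm layers per block plus a final one, and an output head that is not tied to the input embedding. `estimate_parameters` is a hypothetical helper, not part of the AMPLIFY code.

```python
def estimate_parameters(layers=24, hidden=640, ffn=1712, vocab=32):
    """Rough parameter count for a bias-free pre-norm encoder (assumed layout)."""
    attention = 4 * hidden * hidden   # Q, K, V and output projections
    swiglu = 3 * hidden * ffn         # gate, up and down projections
    norms = 2 * hidden                # two RMSNorm weight vectors per block
    per_layer = attention + swiglu + norms

    embeddings = vocab * hidden       # token embedding table
    final_norm = hidden
    lm_head = hidden * vocab          # assumed untied from the embeddings
    return layers * per_layer + embeddings + final_norm + lm_head


total = estimate_parameters()
print(f"{total / 1e6:.1f}M parameters")
```

Under these assumptions the estimate comes to roughly 118M, in line with the nominal 120M. The embedding table and output head contribute very little because the vocabulary has only 32 tokens.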

Intended Use

This model is intended for extracting per-residue or per-sequence representations for downstream tasks, zero-shot variant effect prediction via pseudo-log-likelihood scoring, and fine-tuning on protein fitness, stability, binding, or functional annotation tasks.
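
Pseudo-log-likelihood (PLL) scoring masks one position at a time and sums the log-probability the model assigns to the true residue at each masked position. The sketch below illustrates the idea and is not the authors' evaluation code. `pseudo_log_likelihood` and `logits_fn` are hypothetical names, and a toy uniform predictor stands in for the model so the block runs offline.

```python
import torch


def pseudo_log_likelihood(logits_fn, input_ids, mask_token_id, positions):
    """Sum of log p(true token | sequence with that position masked).

    logits_fn:  callable mapping [n, seq_len] ids to [n, seq_len, vocab] logits
    input_ids:  1-D LongTensor holding one tokenized sequence
    positions:  indices to score (typically residues, not special tokens)
    """
    positions = list(positions)
    rows = torch.arange(len(positions))
    cols = torch.tensor(positions)

    # One copy of the sequence per scored position, each with a single mask.
    batch = input_ids.repeat(len(positions), 1)
    batch[rows, cols] = mask_token_id

    with torch.no_grad():
        logits = logits_fn(batch)

    log_probs = torch.log_softmax(logits[rows, cols], dim=-1)  # [n, vocab]
    return log_probs[rows, input_ids[cols]].sum().item()


# Toy predictor: uniform over a 6-token vocabulary (illustrative only).
vocab_size, mask_id = 6, 5


def uniform_logits(batch):
    return torch.zeros(batch.shape[0], batch.shape[1], vocab_size)


ids = torch.tensor([1, 2, 3, 4])
score = pseudo_log_likelihood(uniform_logits, ids, mask_id, range(len(ids)))
```

With the real model, `logits_fn` could be something like `lambda b: model(input_ids=b).logits`, assuming the model accepts a bare `input_ids` batch. A zero-shot variant score is then commonly taken as the PLL of the mutant minus the PLL of the wild type. Long sequences may need the masked copies split into smaller batches.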


Training

Data

Pre-trained on UR100P (chandar-lab/UR100P), a deduplicated union of UniRef100, OAS, and SCOPe.

Training Procedure

| Hyperparameter | Value |
|---|---|
| Hardware | 8× H100 80GB |
| Optimizer | AdamW |
| Learning rate | 1e-3 (peak) |
| LR schedule | Linear warmup + cosine decay |
| Batch size (tokens) | ~2M per step |
| Masking rate | 15% |
| Training objective | Masked language modeling |
| Precision | BF16 |
| Framework | PyTorch + HuggingFace Transformers |
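
The schedule and token budget listed above can be sketched in a few lines. The peak learning rate, step count, and tokens per step come from this card. The warmup length (`warmup_steps=1_000`) and the final learning rate (`final_lr=0.0`) are not stated here and are placeholder assumptions, and `learning_rate` is a hypothetical helper rather than the training code.

```python
import math


def learning_rate(step, total_steps=1_000_000, peak_lr=1e-3,
                  warmup_steps=1_000, final_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to final_lr.

    warmup_steps and final_lr are assumed values, not taken from the model card.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return final_lr + (peak_lr - final_lr) * cosine


# Token budget: steps times tokens per step, using the approximate card values.
tokens_per_step = 2_000_000
total_tokens = 1_000_000 * tokens_per_step  # about 2e12, i.e. ~2T tokens

for step in (0, 500, 1_000, 500_000, 1_000_000):
    print(step, f"{learning_rate(step):.2e}")
```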

Training logs are available on Weights & Biases.


Citation

If you use this model in your work, please cite:

@article{Fournier2024.09.23.614603,
  title        = {Protein Language Models: Is Scaling Necessary?},
  author       = {Fournier, Quentin and Vernon, Robert M. and van der Sloot, Almer and Schulz, Benjamin and Chandar, Sarath and Langmead, Christopher James},
  year         = {2024},
  journal      = {bioRxiv},
  publisher    = {Cold Spring Harbor Laboratory},
  doi          = {10.1101/2024.09.23.614603},
  url          = {https://www.biorxiv.org/content/early/2024/09/23/2024.09.23.614603}
}
