Instructions to use togethercomputer/evo-1-131k-base with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use togethercomputer/evo-1-131k-base with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="togethercomputer/evo-1-131k-base", trust_remote_code=True)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("togethercomputer/evo-1-131k-base", trust_remote_code=True, dtype="auto")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use togethercomputer/evo-1-131k-base with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "togethercomputer/evo-1-131k-base"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "togethercomputer/evo-1-131k-base",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker

```shell
docker model run hf.co/togethercomputer/evo-1-131k-base
```
- SGLang
How to use togethercomputer/evo-1-131k-base with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "togethercomputer/evo-1-131k-base" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "togethercomputer/evo-1-131k-base",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
  --model-path "togethercomputer/evo-1-131k-base" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "togethercomputer/evo-1-131k-base",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use togethercomputer/evo-1-131k-base with Docker Model Runner:
```shell
docker model run hf.co/togethercomputer/evo-1-131k-base
```
Evo-1 (Phase 2)
News
We identified and fixed an issue related to a wrong permutation of some projections, which affected generation quality. To use the new model revision, please load as follows:

```python
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "togethercomputer/evo-1-131k-base"

config = AutoConfig.from_pretrained(model_name, trust_remote_code=True, revision="1.1_fix")
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    config=config,
    trust_remote_code=True,
    revision="1.1_fix",
)
```
About
Evo is a biological foundation model capable of long-context modeling and design.
Evo uses the StripedHyena architecture to enable modeling of sequences at a single-nucleotide, byte-level resolution with near-linear scaling of compute and memory relative to context length. Evo has 7 billion parameters and is trained on OpenGenome, a prokaryotic whole-genome dataset containing ~300 billion tokens.
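Byte-level, single-nucleotide resolution means one token per base, with no learned subword merges. A minimal sketch of what that tokenization looks like in pure Python (illustrative only; the actual Evo tokenizer is loaded via `trust_remote_code` and may differ in details such as special tokens):

```python
def tokenize(sequence: str) -> list[int]:
    """Map each nucleotide character to its ASCII byte value: one token per base."""
    return list(sequence.encode("ascii"))

def detokenize(tokens: list[int]) -> str:
    """Invert the byte-level mapping back to a nucleotide string."""
    return bytes(tokens).decode("ascii")

seq = "ACGTACGT"
tokens = tokenize(seq)
assert len(tokens) == len(seq)    # one token per nucleotide
assert detokenize(tokens) == seq  # lossless round trip
```

Because every base is its own token, a 131k-token context corresponds directly to 131k nucleotides of sequence.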
We describe Evo in the paper "Sequence modeling and design from molecular to genome scale with Evo".
As part of our commitment to open science, we release weights of 15 intermediate pretraining checkpoints for phase 1 and phase 2 of pretraining. The checkpoints are available as branches of the corresponding HuggingFace repository.
Evo-1 (Phase 2) is the longer-context model in the Evo family, trained at a context length of 131k and tested on the generation of sequences longer than 650k tokens.
We provide the following model checkpoints:
| Checkpoint Name | Description |
|---|---|
| `evo-1-8k-base` | A model pretrained with 8,192 context. We use this model as the base model for molecular-scale finetuning tasks. |
| `evo-1-131k-base` | A model pretrained with 131,072 context using `evo-1-8k-base` as the base model. We use this model to reason about and generate sequences at the genome scale. |
| `evo-1-8k-crispr` | A model finetuned using `evo-1-8k-base` as the base model to generate CRISPR-Cas systems. |
| `evo-1-8k-transposon` | A model finetuned using `evo-1-8k-base` as the base model to generate IS200/IS605 transposons. |
Model Architecture
StripedHyena is a hybrid deep signal processing architecture composed of multi-head attention and gated convolutions arranged in Hyena blocks, improving over decoder-only Transformers.
StripedHyena is designed to leverage the specialization of each of its layer classes, with Hyena layers implementing the bulk of the computation required for sequence processing and attention layers supplementing the ability to perform targeted pattern recall.
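For intuition, the core operation of a gated convolution layer can be sketched in a few lines of pure Python: a causal convolution of the input with a filter, followed by an elementwise gate. This is a heavily simplified toy (real Hyena layers use long implicit filters, input-dependent gating, and FFT-based evaluation):

```python
def causal_conv(x, h):
    """y[t] = sum_k h[k] * x[t-k]: causal convolution of input x with filter h."""
    return [sum(h[k] * x[t - k] for k in range(len(h)) if t - k >= 0)
            for t in range(len(x))]

def gated_conv_block(x, h, gate):
    """Toy gated convolution: an elementwise gate modulates the conv output."""
    y = causal_conv(x, h)
    return [g * v for g, v in zip(gate, y)]

# Each output depends only on current and past inputs (causality),
# which is what allows autoregressive generation.
print(causal_conv([1, 2, 3], [1, 1]))  # [1, 3, 5]
```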
Some highlights of the architecture:
- Efficient autoregressive generation via a recurrent mode (>500k tokens generated on a single 80GB GPU)
- Significantly faster training and finetuning at long context (>3x at 131k context length)
- Improved scaling laws over state-of-the-art architectures (e.g., Transformer++) on both natural language and biological sequences
- Robust to training beyond the compute-optimal frontier, e.g., training well beyond Chinchilla-optimal token counts (see the preprint for details)
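A back-of-envelope sketch of why the recurrent mode enables such long generations: an attention KV cache grows linearly with the number of generated tokens, while a recurrent state has fixed size. The layer counts and dimensions below are hypothetical round numbers for illustration, not Evo's actual configuration:

```python
def kv_cache_bytes(seq_len, n_layers=32, n_heads=32, head_dim=128, bytes_per_value=2):
    """Attention KV-cache memory: 2 (K and V) values per head per position."""
    return seq_len * n_layers * 2 * n_heads * head_dim * bytes_per_value

def recurrent_state_bytes(state_dim, n_layers=32, bytes_per_value=4):
    """Fixed-size recurrent state: independent of how many tokens are generated."""
    return n_layers * state_dim * bytes_per_value

# Under these assumptions, a pure-attention cache at 500k context needs ~262 GB,
# far beyond a single 80GB GPU, while the recurrent state stays constant.
assert kv_cache_bytes(500_000) == 262_144_000_000
```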
How to use Evo
Example usage is provided in the standalone repo.
Parametrization for Inference and Finetuning
One of the advantages of deep signal processing models is their flexibility. Different parametrizations of convolutions can be used depending on the memory, expressivity and causality requirements of pretraining, finetuning or inference workloads.
The main classes are:
- Modal canonical: unconstrained poles (reference, reference), or constrained poles (reference, reference).
- Companion canonical / rational: TBA.
- Hypernetworks: hypernetwork (reference), modulated hypernetwork (reference).
- Explicit: modulated explicit (reference).
StripedHyena is a mixed precision model. Make sure to keep your poles and residues in float32 precision, especially for longer prompts or training.
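For intuition, a modal canonical parametrization expresses a long convolution filter as a sum of exponentials defined by poles and residues, h[t] = Σᵢ Rᵢ · pᵢᵗ; since the poles are raised to powers up to the context length, small precision errors compound over thousands of timesteps, which is why float32 matters. A minimal pure-Python sketch in generic state-space-model notation (not Evo's actual implementation):

```python
def modal_filter(poles, residues, length):
    """Evaluate h[t] = sum_i residues[i] * poles[i]**t for t = 0..length-1.
    Poles with |p| < 1 give a decaying filter; exponentiation over long
    contexts is where low precision degrades the filter."""
    return [sum(r * p**t for p, r in zip(poles, residues))
            for t in range(length)]

h = modal_filter(poles=[0.9, 0.5], residues=[1.0, -1.0], length=4)
# h[0] = 1.0 - 1.0 = 0.0, h[1] = 0.9 - 0.5 = 0.4, and so on.
```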
Disclaimer
To use StripedHyena, you will need to install custom kernels. Please follow the instructions from the standalone repository.
Cite
```
@article{nguyen2024sequence,
   author = {Eric Nguyen and Michael Poli and Matthew G. Durrant and Brian Kang and Dhruva Katrekar and David B. Li and Liam J. Bartie and Armin W. Thomas and Samuel H. King and Garyk Brixi and Jeremy Sullivan and Madelena Y. Ng and Ashley Lewis and Aaron Lou and Stefano Ermon and Stephen A. Baccus and Tina Hernandez-Boussard and Christopher Ré and Patrick D. Hsu and Brian L. Hie},
   title = {Sequence modeling and design from molecular to genome scale with Evo},
   journal = {Science},
   volume = {386},
   number = {6723},
   pages = {eado9336},
   year = {2024},
   doi = {10.1126/science.ado9336},
   URL = {https://www.science.org/doi/abs/10.1126/science.ado9336}
}
```