ESM-C 300M - GGUF (esmc.cpp)

GGUF conversions of ESM Cambrian (ESM-C) 300M, an encoder-only protein language model, for fast, low-memory per-residue and per-sequence embeddings on CPU and Apple Metal, with no Python or PyTorch needed at inference time.

These files use a custom GGUF architecture (general.architecture = "esmc") and are not loadable by stock llama.cpp / llama-cli. Use the esmc.cpp runtime (the esmc-embed tool) shown below.

Which file should I download?

| File | Size (MiB) | sha256 (first 16) | When to use |
| --- | --- | --- | --- |
| esmc-300m-Q4_K_M.gguf | 237.5 | 96c08911822906dc | Smallest with good quality; best 4-bit choice. |
| esmc-300m-Q4_K_S.gguf | 228.1 | 02328ea3555903ef | Smallest footprint; lowest peak RAM. |
| esmc-300m-Q8_0.gguf | 336.9 | d7a57a5ab21c172b | Recommended default: near-F16 quality at ~half the size. |
| esmc-300m-f16.gguf | 633.5 | 7c37c24e156920bd | Highest fidelity; numerical reference. |
| esmc-300m-f32.gguf | 1266.4 | 7e3e319c9bd00abb | Full precision; mainly the quantization source (largest). |

If unsure, start with esmc-300m-Q8_0.gguf (near-identical to the PyTorch reference at roughly half the size of F16). Use Q4_K_M for the smallest deployment with good quality, or F16 when you want the closest possible match to the reference.

Quick start

1. Build the esmc.cpp runtime

git clone --recursive https://github.com/AnanyaP-WDW/esmc.cpp
cd esmc.cpp
cmake -B build -DCMAKE_BUILD_TYPE=Release && cmake --build build -j8

2. Download a model

pip install -U huggingface_hub
huggingface-cli download AnanyaPathak/esmc-300m-gguf esmc-300m-Q8_0.gguf --local-dir ./models

3. Embed a protein sequence

# Mean-pooled sequence embedding -> one vector per sequence ([n_embd])
./build/esmc-embed -m ./models/esmc-300m-Q8_0.gguf \
    -s "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGY" \
    --pool mean --output embedding.npy

# Per-residue embeddings -> matrix ([n_tokens, n_embd])
./build/esmc-embed -m ./models/esmc-300m-Q8_0.gguf \
    -s "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGY" \
    --pool none --output residues.npy

# Force CPU (skip the Metal/GPU backend)
./build/esmc-embed -m ./models/esmc-300m-Q8_0.gguf -s "..." --pool mean --no-metal

Outputs are NumPy .npy arrays. Mean pooling strips the <cls>/<eos> tokens.
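If you drive esmc-embed from a script, a thin wrapper can assemble the same command lines shown above. This is a minimal sketch: the helper names `build_embed_cmd` and `embed` are hypothetical, and only the flags that appear in this README (`-m`, `-s`, `--pool`, `--output`, `--no-metal`) are used.

```python
import subprocess


def build_embed_cmd(model, sequence, pool="mean", output=None,
                    no_metal=False, binary="./build/esmc-embed"):
    """Assemble an esmc-embed argument list from the flags shown in this README."""
    cmd = [str(binary), "-m", str(model), "-s", sequence, "--pool", pool]
    if output is not None:
        cmd += ["--output", str(output)]
    if no_metal:
        cmd.append("--no-metal")  # force CPU, as in the example above
    return cmd


def embed(model, sequence, **kwargs):
    """Run esmc-embed; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_embed_cmd(model, sequence, **kwargs), check=True)


# Only builds and prints the command; call embed(...) to actually run the binary.
cmd = build_embed_cmd("./models/esmc-300m-Q8_0.gguf", "MKTVRQERLK",
                      pool="none", output="residues.npy")
print(" ".join(cmd))
```

Passing the arguments as a list avoids shell quoting issues with long sequences.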

4. Load the embedding in Python

import numpy as np

emb = np.load("embedding.npy")   # mean pool: shape (960,)
res = np.load("residues.npy")    # per-residue: shape (n_tokens, 960)
print(emb.shape, res.shape)
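
The mean-pooled vector can in principle be recomputed from the per-residue matrix. The sketch below assumes that residues.npy includes one row for <cls> at the start and one for <eos> at the end. That layout is an assumption, not something this README states, so check n_tokens against your sequence length before relying on it. The function name `mean_pool` is hypothetical.

```python
import numpy as np


def mean_pool(residues, strip_special=True):
    """Average per-token rows into one vector.

    Assumption: row 0 is <cls> and the last row is <eos>; both are
    dropped before averaging when strip_special is True.
    """
    residues = np.asarray(residues, dtype=np.float32)
    if strip_special:
        if residues.shape[0] < 3:
            raise ValueError("need at least one residue between <cls> and <eos>")
        residues = residues[1:-1]
    return residues.mean(axis=0)


# Synthetic stand-in for np.load("residues.npy"): 5 tokens, 960 dimensions.
rng = np.random.default_rng(0)
fake_residues = rng.standard_normal((5, 960)).astype(np.float32)
pooled = mean_pool(fake_residues)
print(pooled.shape)
```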

Benchmarks (300M)

Measured on an Apple M1 (16 GB) against the official PyTorch ESM-C 300M. Full methodology and per-sequence data are in the esmc.cpp repository.

Numerical fidelity vs PyTorch (per-residue cosine, 100 Swiss-Prot sequences)

| Precision | Aggregate mean cosine | Worst min cosine | Max mean-pool L2 | Pass rate |
| --- | --- | --- | --- | --- |
| F16 | 0.99999 | 0.9997 | 0.0030 | 100/100 |
| Q8_0 | 0.99971 | 0.9943 | 0.0164 | 100/100 |
| Q4_K_M | 0.99597 | 0.9401 | 0.0656 | 91/100 |
| Q4_K_S | 0.99523 | 0.9281 | 0.0709 | 75/100 |

F16 and Q8_0 clear per-sequence mean cosine > 0.999; Q4_K_M / Q4_K_S clear the aggregate > 0.995 (4-bit misses concentrate in very short sequences).
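
To spot-check fidelity on your own sequences, you can compare a per-residue matrix from a quantized file against one from a reference run (for example F16). The sketch below computes per-sequence quantities that correspond to the table columns: mean and minimum per-residue cosine, and the L2 distance between mean-pooled vectors. The exact benchmark procedure lives in the esmc.cpp repository; the function names and the pooling over all rows here are assumptions for illustration.

```python
import numpy as np


def per_residue_cosine(ref, test, eps=1e-12):
    """Cosine similarity between matching rows of two (n_tokens, n_embd) arrays."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    if ref.shape != test.shape:
        raise ValueError("reference and test embeddings must have the same shape")
    dots = np.sum(ref * test, axis=1)
    norms = np.linalg.norm(ref, axis=1) * np.linalg.norm(test, axis=1)
    return dots / np.maximum(norms, eps)


def fidelity_summary(ref, test):
    """Per-sequence mean/min cosine and L2 distance between row-averaged vectors."""
    ref = np.asarray(ref, dtype=np.float64)
    test = np.asarray(test, dtype=np.float64)
    cos = per_residue_cosine(ref, test)
    pool_l2 = np.linalg.norm(ref.mean(axis=0) - test.mean(axis=0))
    return {
        "mean_cosine": float(cos.mean()),
        "min_cosine": float(cos.min()),
        "mean_pool_l2": float(pool_l2),
    }


# Synthetic example: a reference matrix plus small noise standing in for quantization error.
rng = np.random.default_rng(0)
reference = rng.standard_normal((50, 960))
quantized = reference + 0.01 * rng.standard_normal((50, 960))
print(fidelity_summary(reference, quantized))
```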

Throughput (seq/s, best esmc.cpp config vs PyTorch)

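| Bucket | Tokens | Best esmc.cpp config | esmc.cpp seq/s | PyTorch CPU seq/s | PyTorch MPS seq/s | esmc.cpp vs PyTorch CPU |
| --- | --- | --- | --- | --- | --- | --- |
| short | 47 | metal/q4_k_s | 14.54 | 10.31 | 29.29 | 1.41x |
| medium | 235 | metal/q4_k_m | 5.62 | 4.56 | 10.11 | 1.23x |
| long | 850 | metal/q8_0 | 1.33 | 1.74 | 2.83 | 0.76x |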

Peak memory (long sequences, 16 GiB budget)

  • Lowest peak RAM: pytorch/pytorch_mps/f32 at 282 MiB (long sequences).
  • Highest peak RAM: esmc.cpp/cpu/f16 at 7426 MiB.
  • All 12/12 measured configurations fit within a 16 GiB machine.

Downstream variant-effect preservation (ProteinGym, 10 assays x 1000 variants)

| Precision | Assays | Mean abs Spearman delta | Max abs Spearman delta | Metric rows pass |
| --- | --- | --- | --- | --- |
| F16 | 10 | 0.0006 | 0.0014 | 50/50 |
| Q8_0 | 10 | 0.0031 | 0.0092 | 45/50 |
| Q4_K_M | 10 | 0.0068 | 0.0231 | 38/50 |
| Q4_K_S | 10 | 0.0110 | 0.0258 | 32/50 |

Variants are scored by the cosine between mean-pooled mutant and wild-type embeddings; deltas are versus the PyTorch reference (preservation probe).
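
The scoring rule described above can be sketched as follows. A variant's score is the cosine between its mean-pooled embedding and the wild-type embedding, and a Spearman delta compares how well reference and quantized scores rank-correlate with a set of assay measurements. Correlating against assay values is an assumption about the benchmark, the helper names are hypothetical, and this simple rank function does not average tied values.

```python
import numpy as np


def cosine(a, b, eps=1e-12):
    """Cosine similarity between two 1-D embedding vectors."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(a @ b / max(np.linalg.norm(a) * np.linalg.norm(b), eps))


def spearman(x, y):
    """Spearman rank correlation (no tie averaging; adequate for distinct scores)."""
    rx = np.argsort(np.argsort(x)).astype(np.float64)
    ry = np.argsort(np.argsort(y)).astype(np.float64)
    return float(np.corrcoef(rx, ry)[0, 1])


def abs_spearman_delta(ref_scores, quant_scores, assay_values):
    """Absolute change in Spearman correlation when swapping reference for quantized scores."""
    return abs(spearman(ref_scores, assay_values) - spearman(quant_scores, assay_values))


# Synthetic example: 20 mutant embeddings scored against one wild-type embedding.
rng = np.random.default_rng(0)
wild_type = rng.standard_normal(960)
mutants = wild_type + 0.1 * rng.standard_normal((20, 960))
ref_scores = [cosine(m, wild_type) for m in mutants]
quant_scores = [cosine(m + 0.001 * rng.standard_normal(960), wild_type) for m in mutants]
assay_values = rng.standard_normal(20)  # placeholder for measured fitness values
print(abs_spearman_delta(ref_scores, quant_scores, assay_values))
```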

Model details

  • Architecture: encoder-only transformer; 30 layers, d_model 960, 15 heads (head dim 64), SwiGLU FFN (width 2560), pre-LayerNorm, RoPE-NeoX (theta 10000), query/key LayerNorm, no biases, context length 2048.
  • Tokenizer: 33-token amino-acid alphabet; <cls> prepended and <eos> appended (direct character lookup, no subword splitting).
  • Provenance: converted from the upstream safetensors checkpoint to GGUF (fused QKV and SwiGLU projections split); quantized variants use ggml block quantization. Weight values are otherwise unchanged from the upstream release.
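
As a sanity check, the architecture numbers above roughly account for the F32 file size. The sketch counts only the large weight matrices. It assumes four d_model x d_model attention projections (query, key, value, output) and three d_model x FFN-width SwiGLU matrices (gate, up, down) per layer, and it ignores embeddings, normalization weights, the output head, and GGUF metadata.

```python
# Values taken from the architecture line above.
n_layers, d_model, ffn_width, n_heads = 30, 960, 2560, 15

head_dim = d_model // n_heads            # per-head width
attn_params = 4 * d_model * d_model      # assumed Q, K, V, output projections (no biases)
ffn_params = 3 * d_model * ffn_width     # assumed SwiGLU gate, up, down projections
per_layer = attn_params + ffn_params
total_params = n_layers * per_layer

f32_mib = total_params * 4 / 2**20       # 4 bytes per weight
f16_mib = total_params * 2 / 2**20       # 2 bytes per weight

print(f"head_dim={head_dim}")
print(f"matrix parameters: {total_params:,}")
print(f"estimated size: f32 {f32_mib:.1f} MiB, f16 {f16_mib:.1f} MiB")
```

The estimates land just under the 1266.4 MiB and 633.5 MiB listed for the f32 and f16 files, which is consistent with the omitted tensors being small.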

Verify downloads

shasum -a 256 models/*.gguf   # compare against the sha256 column above
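
On systems without shasum, the same check can be done in Python. This is a minimal sketch using the 16-character prefixes from the file table; the names `EXPECTED_PREFIX`, `sha256_hex`, and `verify` are hypothetical. A prefix match is a convenience check, not a full-hash comparison.

```python
import hashlib
from pathlib import Path

# First 16 hex characters of each file's sha256, copied from the table above.
EXPECTED_PREFIX = {
    "esmc-300m-Q4_K_M.gguf": "96c08911822906dc",
    "esmc-300m-Q4_K_S.gguf": "02328ea3555903ef",
    "esmc-300m-Q8_0.gguf": "d7a57a5ab21c172b",
    "esmc-300m-f16.gguf": "7c37c24e156920bd",
    "esmc-300m-f32.gguf": "7e3e319c9bd00abb",
}


def sha256_hex(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large models are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's digest starts with the prefix listed for its name."""
    path = Path(path)
    return sha256_hex(path).startswith(EXPECTED_PREFIX[path.name])


# Check whatever known files are present under ./models (prints nothing if none are).
for model_path in sorted(Path("models").glob("*.gguf")):
    if model_path.name in EXPECTED_PREFIX:
        print(model_path.name, "OK" if verify(model_path) else "MISMATCH")
```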

Reproduce

The full replication guide (convert, quantize, validate, benchmark) is in the esmc.cpp README.

License

Built with ESM.

These GGUF files are Derivative Works of the ESM-C 300M Open Model and are distributed under the EvolutionaryScale Cambrian Open License Agreement (the permissive license that governs ESM-C 300M), subject to the Acceptable Use Policy. The ESMC 300M Model is licensed under the EvolutionaryScale Cambrian Open License Agreement.

Citation

If you use these models, please cite the ESM Cambrian work by EvolutionaryScale and link the esmc.cpp runtime.
