metadata
license: mit
language:
- en
tags:
- pharmacore
- sparse
- drug-discovery
- apple-silicon
- protein-language-model
- esm2
- bioinformatics
- computational-biology
- pruning
- efficient-inference
library_name: transformers
pipeline_tag: feature-extraction
base_model: facebook/esm2_t6_8M_UR50D
model-index:
- name: esm2-8m-sparse50
  results:
  - task:
      type: feature-extraction
      name: Protein Embedding
    metrics:
    - type: cosine_similarity
      value: 0.975
      name: Quality Retention vs Dense
ESM-2 8M Sparse 50% — PharmaCore
A 50% magnitude-pruned version of facebook/esm2_t6_8M_UR50D optimized for efficient drug discovery inference on Apple Silicon.
Why This Model?
| Metric | Dense (original) | Sparse (this model) | Change |
|---|---|---|---|
| Nonzero parameters | 7.8M | 3.9M | 50% reduction |
| Inference latency (M4, MPS) | ~10 ms | ~8 ms | ~20% faster |
| Quality retention (embedding cosine similarity vs. dense) | 100% | 97.5% | Minimal loss |
| Memory | 30 MB | 30 MB | None (unstructured sparsity keeps the dense tensor layout) |
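The Memory row can be sanity-checked with simple arithmetic. The sketch below assumes the weights are stored as 32-bit floats (an assumption; the card does not state the dtype) and uses the rounded parameter counts from the table. `dense_storage_mb` is a hypothetical helper written for this illustration.

```python
# Back-of-the-envelope check of the memory figures, ASSUMING float32 storage.
BYTES_PER_FP32 = 4


def dense_storage_mb(num_params: float, bytes_per_param: int = BYTES_PER_FP32) -> float:
    """Size of a densely stored tensor collection in megabytes (1 MB = 1e6 bytes)."""
    return num_params * bytes_per_param / 1e6


total_params = 7.8e6    # parameter count quoted in the table
nonzero_params = 3.9e6  # nonzero parameters after pruning, quoted in the table

dense_mb = dense_storage_mb(total_params)
# Unstructured pruning leaves the zeros in place, so every entry is still stored.
pruned_dense_layout_mb = dense_storage_mb(total_params)
# Hypothetical lower bound if only nonzero values were stored (ignores index overhead).
values_only_mb = dense_storage_mb(nonzero_params)

print(f"dense: {dense_mb:.1f} MB")
print(f"pruned, dense layout: {pruned_dense_layout_mb:.1f} MB")
print(f"nonzero values alone: {values_only_mb:.1f} MB")
```

Under that assumption the estimate lands close to the 30 MB quoted in the table. A smaller footprint would need a sparse storage format or structured pruning, neither of which this card describes.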
Use Case
Protein target encoding in the PharmaCore drug discovery pipeline:
- Encode protein sequences into embeddings for drug-target compatibility scoring
- Fast screening of drug candidates against protein targets
- Runs entirely on consumer Apple Silicon hardware (M1/M2/M3/M4)
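Running on Apple Silicon with PyTorch usually means selecting the Metal (MPS) backend when it is available. The following is a minimal sketch of that device selection with a CPU fallback; `pick_device` is a hypothetical helper and is not taken from the PharmaCore code base.

```python
import torch


def pick_device() -> torch.device:
    """Prefer Apple's Metal backend (MPS) when available, otherwise use the CPU."""
    if torch.backends.mps.is_available():
        return torch.device("mps")
    return torch.device("cpu")


device = pick_device()

# Small smoke test: allocate a tensor on the chosen device and reduce it.
x = torch.ones(2, 3, device=device)
print(device.type, x.sum().item())
```

With a loaded model, the same device would typically be applied with `model.to(device)` and by moving each tokenizer output tensor to `device` before the forward pass.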
Usage
from transformers import AutoModel, AutoTokenizer
import torch

# Pruned weights come from this repository; the tokenizer comes from the base model
model = AutoModel.from_pretrained("stephenjun8192/esm2-8m-sparse50")
tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t6_8M_UR50D")

# Encode a protein sequence
sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVL"
inputs = tokenizer(sequence, return_tensors="pt")

# Inference only, so skip gradient tracking
with torch.no_grad():
    outputs = model(**inputs)

# Mean-pool over the token dimension to get one vector per sequence
embedding = outputs.last_hidden_state.mean(dim=1)  # [1, 320]
print(f"Embedding shape: {embedding.shape}")
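The plain `mean(dim=1)` above is fine for a single sequence, but in a padded batch it would also average over padding positions. A possible way to handle that is masked mean pooling, sketched here on random tensors so it runs without downloading the model; `masked_mean_pool` is a hypothetical helper, not part of this repository or of Transformers.

```python
import torch


def masked_mean_pool(hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over non-padding positions only.

    hidden:         [batch, seq_len, dim] token embeddings
    attention_mask: [batch, seq_len], 1 for real tokens and 0 for padding
    """
    mask = attention_mask.unsqueeze(-1).to(hidden.dtype)  # [batch, seq_len, 1]
    summed = (hidden * mask).sum(dim=1)                    # [batch, dim]
    counts = mask.sum(dim=1).clamp(min=1.0)                # guard against empty rows
    return summed / counts


# Random tensors standing in for `outputs.last_hidden_state` (hidden size 320).
hidden = torch.randn(2, 5, 320)
attention_mask = torch.tensor([[1, 1, 1, 1, 1],
                               [1, 1, 1, 0, 0]])  # second row has two padding tokens

pooled = masked_mean_pool(hidden, attention_mask)
print(pooled.shape)
```

With the real model, the inputs would come from `tokenizer(sequences, padding=True, return_tensors="pt")`, and the function would receive `outputs.last_hidden_state` and `inputs["attention_mask"]`. Like the single-sequence example, this still includes the special tokens in the average.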
Sparsification Method
- Technique: Global magnitude pruning (unstructured)
- Sparsity: 50% of all weight parameters set to zero
- Layers pruned: All linear layers (attention Q/K/V/O, FFN)
- Validation: Cosine similarity of embeddings vs dense model ≥ 0.975
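The steps listed above can be approximated with PyTorch's built-in pruning utilities. This is a sketch under stated assumptions, not the script used to produce this checkpoint: it prunes a small toy network (not ESM-2) so that it runs offline, and it uses `torch.nn.utils.prune.global_unstructured` with L1 magnitude as one plausible reading of "global magnitude pruning".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.nn.utils.prune as prune

torch.manual_seed(0)


def make_model() -> nn.Sequential:
    """Toy feed-forward stand-in; NOT the ESM-2 architecture."""
    return nn.Sequential(nn.Linear(320, 1280), nn.GELU(), nn.Linear(1280, 320))


dense = make_model()
sparse = make_model()
sparse.load_state_dict(dense.state_dict())  # start from identical weights

# Target the weight matrix of every linear layer; biases are left untouched.
targets = [(m, "weight") for m in sparse.modules() if isinstance(m, nn.Linear)]

# Zero the 50% of weights with the smallest absolute value across all targets.
prune.global_unstructured(targets, pruning_method=prune.L1Unstructured, amount=0.5)

# Make the pruning permanent so each module holds a plain weight tensor again.
for module, name in targets:
    prune.remove(module, name)

zeros = sum((m.weight == 0).sum().item() for m, _ in targets)
total = sum(m.weight.numel() for m, _ in targets)
sparsity = zeros / total

# Validate by comparing dense and pruned outputs on the same random input.
x = torch.randn(8, 320)
with torch.no_grad():
    similarity = F.cosine_similarity(dense(x), sparse(x), dim=-1).mean().item()

print(f"sparsity: {sparsity:.3f}, mean cosine similarity: {similarity:.3f}")
```

Because the threshold is global, individual layers can end up more or less than 50% sparse even though the overall fraction is 50%. For the real model, the comparison would use pooled embeddings of protein sequences from the dense and pruned checkpoints rather than random inputs.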
Part of PharmaCore
PharmaCore is the first AI drug discovery platform that runs entirely on a MacBook: no cloud GPUs, no API keys, and no data leaves your machine.
Citation
@software{pharmacore2026,
title={PharmaCore: Apple Silicon-Native AI Drug Discovery},
author={Stephen Wu},
year={2026},
url={https://github.com/reacherwu/PharmaCore}
}