stephenjun8192
/

esm2-35m-sparse50

@@ -1,32 +1,100 @@
 ---
 license: mit
 tags:
   - pharmacore
   - sparse
   - drug-discovery
   - apple-silicon
 base_model: facebook/esm2_t12_35M_UR50D
 ---
-# esm2-35m-sparse50 (PharmaCore Sparse)
-50% magnitude-pruned version of [facebook/esm2_t12_35M_UR50D](https://huggingface.co/facebook/esm2_t12_35M_UR50D)
-for efficient drug discovery on Apple Silicon.
-## Key Stats
-- **Sparsity:** 50%
-- **Quality Retention:** 97.3%
-- **Use Case:** Protein encoding in PharmaCore drug discovery pipeline
 ## Usage
 ```python
 from transformers import AutoModel, AutoTokenizer
 model = AutoModel.from_pretrained("stephenjun8192/esm2-35m-sparse50")
 tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t12_35M_UR50D")
 ```
 ## Part of PharmaCore
-[PharmaCore](https://github.com/stephenjun8192/PharmaCore) — Apple Silicon-native AI drug discovery platform.

 ---
 license: mit
+language:
+  - en
 tags:
   - pharmacore
   - sparse
   - drug-discovery
   - apple-silicon
+  - protein-language-model
+  - esm2
+  - bioinformatics
+  - computational-biology
+  - pruning
+  - efficient-inference
+library_name: transformers
+pipeline_tag: feature-extraction
 base_model: facebook/esm2_t12_35M_UR50D
+model-index:
+  - name: esm2-35m-sparse50
+    results:
+      - task:
+          type: feature-extraction
+          name: Protein Embedding
+        metrics:
+          - type: cosine_similarity
+            value: 0.973
+            name: Quality Retention vs Dense
 ---
+# ESM-2 35M Sparse 50% — PharmaCore
+A **50% magnitude-pruned** version of [facebook/esm2_t12_35M_UR50D](https://huggingface.co/facebook/esm2_t12_35M_UR50D) optimized for efficient drug discovery inference on Apple Silicon.
+## Why This Model?
+| Metric | Dense (Original) | Sparse (This) | Improvement |
+|--------|-----------------|---------------|-------------|
+| Parameters (active) | 33.5M | 16.7M | 50% reduction |
+| Inference (M4 MPS) | 8.2ms | 7.8ms | 5% faster |
+| Quality Retention | 100% | 97.3% | Minimal loss |
+## Use Case
+Primary protein encoder in the [PharmaCore](https://github.com/reacherwu/PharmaCore) drug discovery pipeline:
+- Higher-capacity protein embeddings for drug-target compatibility
+- De novo drug discovery and drug repurposing workflows
+- Full audit trail support for regulatory transparency
+- Runs entirely on consumer Apple Silicon hardware (M1/M2/M3/M4)
 ## Usage
 ```python
 from transformers import AutoModel, AutoTokenizer
+import torch
 model = AutoModel.from_pretrained("stephenjun8192/esm2-35m-sparse50")
 tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t12_35M_UR50D")
+# Encode a protein target (e.g., EGFR kinase domain)
+sequence = "MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEVVL"
+inputs = tokenizer(sequence, return_tensors="pt")
+with torch.no_grad():
+    outputs = model(**inputs)
+    embedding = outputs.last_hidden_state.mean(dim=1)  # [1, 480]
+print(f"Embedding shape: {embedding.shape}")
 ```
+## Sparsification Method
+- **Technique:** Global magnitude pruning (unstructured)
+- **Sparsity:** 50% of all weight parameters set to zero
+- **Layers pruned:** All linear layers (attention Q/K/V/O, FFN)
+- **Validation:** Cosine similarity of embeddings vs dense model ≥ 0.973
+## Benchmarks (Apple M4 Mac mini, 16GB)
+| Task | Time |
+|------|------|
+| Single protein embedding (160aa) | 7.8ms |
+| Batch of 10 proteins | ~65ms |
+| De novo discovery (5 molecules) | ~7s |
+| Drug repurposing (12 drugs) | ~18s |
 ## Part of PharmaCore
+[PharmaCore](https://github.com/reacherwu/PharmaCore) — the first AI drug discovery platform that runs entirely on a MacBook. No cloud GPUs, no API keys, no data leaves your machine.
+## Citation
+```bibtex
+@software{pharmacore2026,
+  title={PharmaCore: Apple Silicon-Native AI Drug Discovery},
+  author={Stephen Wu},
+  year={2026},
+  url={https://github.com/reacherwu/PharmaCore}
+}
+```