stephenjun8192 committed on
Commit
815e7ae
·
verified ·
1 Parent(s): d797a6c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +76 -8
README.md CHANGED
@@ -1,32 +1,100 @@
1
  ---
2
  license: mit
3
  tags:
4
  - pharmacore
5
  - sparse
6
  - drug-discovery
7
  - apple-silicon
8
  base_model: facebook/esm2_t12_35M_UR50D
9
  ---
10
 
11
- # esm2-35m-sparse50 (PharmaCore Sparse)
12
 
13
- 50% magnitude-pruned version of [facebook/esm2_t12_35M_UR50D](https://huggingface.co/facebook/esm2_t12_35M_UR50D)
14
- for efficient drug discovery on Apple Silicon.
15
 
16
- ## Key Stats
17
- - **Sparsity:** 50%
18
- - **Quality Retention:** 97.3%
19
- - **Use Case:** Protein encoding in PharmaCore drug discovery pipeline
20
 
21
  ## Usage
22
 
23
  ```python
24
  from transformers import AutoModel, AutoTokenizer
 
25
 
26
  model = AutoModel.from_pretrained("stephenjun8192/esm2-35m-sparse50")
27
  tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t12_35M_UR50D")
28
  ```
29
 
30
  ## Part of PharmaCore
31
 
32
- [PharmaCore](https://github.com/stephenjun8192/PharmaCore) — Apple Silicon-native AI drug discovery platform.
1
  ---
2
  license: mit
3
+ language:
4
+ - en
5
  tags:
6
  - pharmacore
7
  - sparse
8
  - drug-discovery
9
  - apple-silicon
10
+ - protein-language-model
11
+ - esm2
12
+ - bioinformatics
13
+ - computational-biology
14
+ - pruning
15
+ - efficient-inference
16
+ library_name: transformers
17
+ pipeline_tag: feature-extraction
18
  base_model: facebook/esm2_t12_35M_UR50D
19
+ model-index:
20
+ - name: esm2-35m-sparse50
21
+   results:
22
+   - task:
23
+       type: feature-extraction
24
+       name: Protein Embedding
25
+     metrics:
26
+     - type: cosine_similarity
27
+       value: 0.973
28
+       name: Quality Retention vs Dense
29
  ---
30
 
31
+ # ESM-2 35M Sparse 50% — PharmaCore
32
 
33
+ A **50% magnitude-pruned** version of [facebook/esm2_t12_35M_UR50D](https://huggingface.co/facebook/esm2_t12_35M_UR50D) optimized for efficient drug discovery inference on Apple Silicon.
 
34
 
35
+ ## Why This Model?
36
+
37
+ | Metric | Dense (Original) | Sparse (This) | Improvement |
38
+ |--------|-----------------|---------------|-------------|
39
+ | Parameters (active) | 33.5M | 16.7M | 50% reduction |
40
+ | Inference (M4 MPS) | 8.2ms | 7.8ms | 5% faster |
41
+ | Quality Retention | 100% | 97.3% | Minimal loss |
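The Quality Retention row is reported elsewhere in this card as a cosine similarity against the dense model. A minimal sketch of how such a score could be computed is given here, assuming it is the mean per-protein cosine similarity between pooled embeddings. The helper name `quality_retention` and the random stand-in tensors are hypothetical; the card does not say which proteins or pooling were used.

```python
import torch
import torch.nn.functional as F


def quality_retention(dense_emb: torch.Tensor, sparse_emb: torch.Tensor) -> float:
    """Mean cosine similarity between paired embeddings of shape [N, D]."""
    # Row i of dense_emb is compared with row i of sparse_emb
    per_protein = F.cosine_similarity(dense_emb, sparse_emb, dim=1)
    return per_protein.mean().item()


# Stand-in data (hypothetical): in practice these would be pooled outputs of
# the dense and the sparse model for the same set of protein sequences.
torch.manual_seed(0)
dense = torch.randn(8, 480)
sparse = dense + 0.1 * torch.randn(8, 480)  # small perturbation of the dense embeddings

score = quality_retention(dense, sparse)
print(f"Quality retention: {score:.3f}")
```

A score of 1.0 means the sparse embeddings point in exactly the same direction as the dense ones; the value says nothing about downstream task accuracy on its own.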
42
+
43
+ ## Use Case
44
+
45
+ Primary protein encoder in the [PharmaCore](https://github.com/reacherwu/PharmaCore) drug discovery pipeline:
46
+ - Higher-capacity protein embeddings for drug-target compatibility
47
+ - De novo drug discovery and drug repurposing workflows
48
+ - Full audit trail support for regulatory transparency
49
+ - Runs entirely on consumer Apple Silicon hardware (M1/M2/M3/M4)
50
 
51
  ## Usage
52
 
53
  ```python
54
  from transformers import AutoModel, AutoTokenizer
55
+ import torch
56
 
57
  model = AutoModel.from_pretrained("stephenjun8192/esm2-35m-sparse50")
58
  tokenizer = AutoTokenizer.from_pretrained("facebook/esm2_t12_35M_UR50D")
59
+
60
+ # Encode a protein target (e.g., an N-terminal fragment of EGFR)
61
+ sequence = "MRPSGTAGAALLALLAALCPASRALEEKKVCQGTSNKLTQLGTFEDHFLSLQRMFNNCEVVL"
62
+ inputs = tokenizer(sequence, return_tensors="pt")
63
+
64
+ with torch.no_grad():
65
+     outputs = model(**inputs)
66
+     embedding = outputs.last_hidden_state.mean(dim=1)  # [1, 480]
67
+
68
+ print(f"Embedding shape: {embedding.shape}")
69
  ```
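The snippet above pools a single sequence with a plain `.mean(dim=1)`. When several sequences of different lengths are padded into one batch, a plain mean would also average the padding positions. One possible way to handle this is sketched here, assuming mask-weighted mean pooling is acceptable for the use case. The helper `masked_mean_pool` and the toy tensors are illustrative, not part of PharmaCore.

```python
import torch


def masked_mean_pool(last_hidden_state: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings per sequence, ignoring padded positions."""
    mask = attention_mask.unsqueeze(-1).to(last_hidden_state.dtype)  # [B, L, 1]
    summed = (last_hidden_state * mask).sum(dim=1)                   # [B, D]
    counts = mask.sum(dim=1).clamp(min=1.0)                          # [B, 1]
    return summed / counts


# Toy tensors standing in for model outputs: 2 sequences, 4 positions, hidden size 480
hidden = torch.ones(2, 4, 480)
hidden[1, 2:] = 100.0  # arbitrary values at the padded positions of sequence 2
mask = torch.tensor([[1, 1, 1, 1], [1, 1, 0, 0]])

pooled = masked_mean_pool(hidden, mask)
print(pooled.shape)
```

With the real model, the mask would come from `tokenizer(sequences, return_tensors="pt", padding=True)["attention_mask"]`. Note that this mask still covers the special start and end tokens, so they contribute to the average just as they do in the single-sequence snippet.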
70
 
71
+ ## Sparsification Method
72
+
73
+ - **Technique:** Global magnitude pruning (unstructured)
74
+ - **Sparsity:** 50% of all weight parameters set to zero
75
+ - **Layers pruned:** All linear layers (attention Q/K/V/O, FFN)
76
+ - **Validation:** Cosine similarity of embeddings vs dense model ≥ 0.973
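The bullets above describe global unstructured magnitude pruning of linear layers. A minimal sketch of that technique using PyTorch's built-in `torch.nn.utils.prune` utilities follows. It is an assumption that the released weights were produced this way; the card does not name the tooling, and the small `nn.Sequential` model is only a stand-in for the ESM-2 encoder.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

torch.manual_seed(0)

# Toy stand-in for the encoder (hypothetical); the real dense model would be
# loaded with AutoModel.from_pretrained("facebook/esm2_t12_35M_UR50D").
model = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 64))

# Collect the weight tensor of every linear layer
targets = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]

# Zero the 50% of weights with the smallest magnitude, ranked across all layers
prune.global_unstructured(targets, pruning_method=prune.L1Unstructured, amount=0.5)

# Fold the masks into the weights so each module holds a plain tensor again
for module, name in targets:
    prune.remove(module, name)

total = sum(m.weight.numel() for m, _ in targets)
zeros = sum((m.weight == 0).sum().item() for m, _ in targets)
sparsity = zeros / total
print(f"Linear-weight sparsity: {sparsity:.2%}")
```

Because the ranking is global, individual layers can end up more or less than 50% sparse. The zeros are still stored in dense tensors, so file size and latency only improve when a sparse-aware kernel or storage format is used.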
77
+
78
+ ## Benchmarks (Apple M4 Mac mini, 16GB)
79
+
80
+ | Task | Time |
81
+ |------|------|
82
+ | Single protein embedding (160aa) | 7.8ms |
83
+ | Batch of 10 proteins | ~65ms |
84
+ | De novo discovery (5 molecules) | ~7s |
85
+ | Drug repurposing (12 drugs) | ~18s |
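The card does not describe how the timings above were collected. A rough sketch of one way to measure forward-pass latency is given here, assuming warm-up passes followed by an average over repeated runs, with an explicit device synchronization on MPS. The helper `mean_latency_ms` and the toy model are hypothetical and will not reproduce the numbers in the table.

```python
import time

import torch
import torch.nn as nn


def mean_latency_ms(model: nn.Module, inputs: torch.Tensor, warmup: int = 3, runs: int = 20) -> float:
    """Average forward-pass latency in milliseconds."""
    use_mps = inputs.device.type == "mps"
    model.eval()
    with torch.no_grad():
        for _ in range(warmup):      # warm-up passes are not timed
            model(inputs)
        if use_mps:
            torch.mps.synchronize()  # wait for queued GPU work before timing
        start = time.perf_counter()
        for _ in range(runs):
            model(inputs)
        if use_mps:
            torch.mps.synchronize()  # make sure all timed work has finished
        elapsed = time.perf_counter() - start
    return elapsed / runs * 1000.0


device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
# Toy stand-in model (hypothetical), not the ESM-2 encoder
toy = nn.Sequential(nn.Linear(480, 480), nn.GELU(), nn.Linear(480, 480)).to(device)
x = torch.randn(1, 160, 480, device=device)

latency = mean_latency_ms(toy, x)
print(f"Mean latency: {latency:.2f} ms on {device}")
```

Without the synchronization calls, MPS work is queued asynchronously and the measured time can understate the real latency.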
86
+
87
  ## Part of PharmaCore
88
 
89
+ [PharmaCore](https://github.com/reacherwu/PharmaCore) — the first AI drug discovery platform that runs entirely on a MacBook. No cloud GPUs, no API keys, no data leaves your machine.
90
+
91
+ ## Citation
92
+
93
+ ```bibtex
94
+ @software{pharmacore2026,
95
+   title={PharmaCore: Apple Silicon-Native AI Drug Discovery},
96
+   author={Stephen Wu},
97
+   year={2026},
98
+   url={https://github.com/reacherwu/PharmaCore}
99
+ }
100
+ ```