leexiaohua/subloc_small


leexiaohua committed on Mar 6

Commit 8970aae · verified · 1 Parent(s): 798ea70

Create README.md

Files changed (1)

README.md +74 -0

README.md ADDED

	@@ -0,0 +1,74 @@

+---
+license: apache-2.0
+---
+A protein subcellular localisation prediction model fine-tuned from the [ESM2-8M model](https://www.science.org/doi/full/10.1126/science.ade2574). Model deployment follows Synthyra's [fastESM](https://huggingface.co/Synthyra) series.
+The dataset comes from the [DeepLoc project](https://services.healthtech.dtu.dk/services/DeepLoc-2.1/).
+![evaluation_metrics](./ESM2_Subloc_Metrics.png)
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model_id = "leexiaohua/subloc_small"
+tokenizer = AutoTokenizer.from_pretrained(model_id)
+# trust_remote_code=True runs the custom model code shipped with the repository.
+model = AutoModelForSequenceClassification.from_pretrained(
+    model_id,
+    trust_remote_code=True
+)
+# Move the model to the same device the inputs are sent to in predict_sublocation.
+model.to(device)
+model.eval()
+```
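+```python
+def predict_sublocation(sequence, model, tokenizer, device):
+    """Return {label: probability} for each label scoring above 0.5, or the single top label if none does."""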
+    inputs = tokenizer(sequence, return_tensors="pt", truncation=True, max_length=1024)
+    inputs = {k: v.to(device) for k, v in inputs.items()}
+    with torch.no_grad():
+        outputs = model(**inputs)
+        logits = outputs.logits if hasattr(outputs, "logits") else outputs
+        # Sigmoid scores each label independently, so several labels can be returned.
+        probs = torch.sigmoid(logits).cpu().numpy()[0]
+    id2label = model.config.id2label
+    results = {}
+    for i, prob in enumerate(probs):
+        if prob > 0.5:
+            label = id2label.get(i) or id2label.get(str(i))
+            if label:
+                results[label] = float(prob)
+            else:
+                results[f"Unknown_{i}"] = float(prob)
+    # Nothing cleared the 0.5 threshold: fall back to the single highest-scoring label.
+    if not results:
+        max_idx = int(probs.argmax())
+        label = id2label.get(max_idx) or id2label.get(str(max_idx))
+        results[label or f"Unknown_{max_idx}"] = float(probs[max_idx])
+    return results
+```
+An example:
+```python
+test_seq = "MSRLEAKKPSLCKSEPLTTERVRTTLSVLKRIVTSCYGPSGRLKQLHNGFGGYVCTTSQSSALLSHLLVTHPILKILTASIQNHVSSFSDCGLFTAILCCNLIENVQRLGLTPTTVIRLNKHLLSLCISYLKSETCGCRIPVDFSSTQILLCLVRSILTSKPACMLTRKETEHVSALILRAFLLTIPENAEGHIILGKSLIVPLKGQRVIDSTVLPGILIEMSEVQLMRLLPIKKSTALKVALFCTTLSGDTSDTGEGTVVVSYGVSLENAVLDQLLNLGRQLISDHVDLVLCQKVIHPSLKQFLNMHRIIAIDRIGVTLMEPLTKMTGTQPIGSLGSICPNSYGSVKDVCTAKFGSKHFFHLIPNEATICSLLLCNRNDTAWDELKLTCQTALHVLQLTLKEPWALLGGGCTETHLAAYIRHKTHNDPESILKDDECTQTELQLIAEAFCSALESVVGSLEHDGGEILTDMKYGHLWSVQADSPCVANWPDLLSQCGCGLYNSQEELNWSFLRSTRRPFVPQSCLPHEAVGSASNLTLDCLTAKLSGLQVAVETANLILDLSYVIEDKN"
+predictions = predict_sublocation(test_seq, model, tokenizer, device)
+print(f"Result: {predictions}")
+```
+The output will be similar to:
+```text
+Result: {'Cytoplasm': 0.9772326350212097, 'Soluble': 0.998727023601532}
+```
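
The label-selection rule inside `predict_sublocation` (sigmoid per label, keep everything above 0.5, otherwise fall back to the top-scoring label) can be checked without downloading the model. The following is a minimal sketch of that rule in plain Python. It is not part of the committed README: `select_labels` and `ID2LABEL` are hypothetical names, the label map is invented for illustration (the real one comes from `model.config.id2label`), and the logits are made-up numbers rather than model output.

```python
import math

# Hypothetical label map for illustration only; the real mapping is
# model.config.id2label in the committed README.
ID2LABEL = {0: "Cytoplasm", 1: "Nucleus", 2: "Soluble"}


def sigmoid(x: float) -> float:
    """Logistic function mapping a logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_labels(logits, id2label, threshold=0.5):
    """Mirror the README's rule: keep labels above the threshold,
    or the single best label when none clears it."""
    probs = [sigmoid(x) for x in logits]
    picked = {
        id2label.get(i, f"Unknown_{i}"): p
        for i, p in enumerate(probs)
        if p > threshold
    }
    if not picked:
        # Fallback: index of the highest probability.
        best = max(range(len(probs)), key=probs.__getitem__)
        picked[id2label.get(best, f"Unknown_{best}")] = probs[best]
    return picked


# Made-up logits: two labels clear the threshold.
multi = select_labels([3.0, -2.0, 5.0], ID2LABEL)
# Made-up logits: no label clears the threshold, so the top one is returned.
fallback = select_labels([-1.0, -3.0, -2.0], ID2LABEL)

print(sorted(multi))   # ['Cytoplasm', 'Soluble']
print(list(fallback))  # ['Cytoplasm']
```

Because of the fallback, the function never returns an empty result, so a returned probability below 0.5 signals a low-confidence prediction rather than a confident one.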