Dark Proteins PCI/LIFA Prototype

This repository contains a working prototype for genome-scale protein-chemical interaction (PCI) and ligand-induced functional activity (LIFA) prediction for dark proteins.

Architecture: frozen ESM-2 t6 8M protein encoder + hashed SMILES ligand encoder + multitask heads for binding, activity, and affinity.
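The ligand branch is described only as a "hashed SMILES" encoder. One common way to realise that idea is feature hashing over character n-grams, and the sketch below shows that technique using only the standard library. The function name hashed_smiles_features, the 256-dimensional output, the 3-character n-grams, and the md5 hash are all illustrative assumptions, not details taken from this repository.

```python
import hashlib


def hashed_smiles_features(smiles: str, dim: int = 256, ngram: int = 3) -> list[float]:
    """Hash overlapping character n-grams of a SMILES string into a fixed-length vector."""
    vec = [0.0] * dim
    if len(smiles) < ngram:
        # Strings shorter than one n-gram contribute a single token (or nothing if empty).
        grams = [smiles] if smiles else []
    else:
        grams = [smiles[i:i + ngram] for i in range(len(smiles) - ngram + 1)]
    for gram in grams:
        # md5 is stable across runs, unlike Python's salted built-in hash().
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "little") % dim
        vec[index] += 1.0
    # L2-normalise so long and short SMILES land on a comparable scale.
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm > 0 else vec


# SMILES taken from the candidate tables in this README.
features = hashed_smiles_features("CN(C)CCc1ccc(O)cc1")
print(len(features))
```

A vector like this could be concatenated with a pooled protein embedding before the multitask heads. Hash collisions are the usual trade-off: a smaller dim saves memory but merges more n-grams into the same slot.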

Out-of-distribution (OOD) dark-protein evaluation: {"binding_accuracy": 0.8888888888888888, "activity_accuracy": 0.8888888888888888, "affinity_rmse": 1.5511236557645622}
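The metric names suggest plain classification accuracy for the binding and activity heads and root-mean-square error for the affinity head. How the repository actually computes them is not stated, so the following is a minimal sketch under that assumption. The helper names and the toy labels are hypothetical. The toy set uses 9 examples with 8 correct only because that reproduces the reported ratio of 0.888...; the real test-set size is not given here.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy labels: 9 examples, 8 predicted correctly.
true_binding = [1, 1, 0, 1, 0, 0, 1, 1, 0]
pred_binding = [1, 1, 0, 1, 0, 0, 1, 1, 1]
binding_accuracy = accuracy(true_binding, pred_binding)

# Hypothetical pAffinity values (negative log10 of molar affinity is assumed).
true_paff = [6.0, 7.5, 5.0]
pred_paff = [7.0, 7.5, 4.0]
affinity_rmse = rmse(true_paff, pred_paff)

print({"binding_accuracy": binding_accuracy, "affinity_rmse": affinity_rmse})
```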

Artifacts

  • dark_protein_model.pt: trained PyTorch checkpoint
  • avil_screen.csv: ranked AVIL inhibitor/ligand candidates
  • drd2_drd3_dual_antagonists.csv: ranked DRD2/DRD3 dual antagonist candidates
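The ranked CSV artifacts can be inspected with the standard library alone. The sketch below assumes the CSV columns carry the same names as the table headers shown later in this README (target, compound, smiles, binding_probability, and so on). That correspondence is an assumption, and load_ranked is a hypothetical helper. Two rows copied from the AVIL table stand in for the file contents so the example runs without downloading anything.

```python
import csv
import io


def load_ranked(handle, score_column="binding_probability"):
    """Read screening rows and sort them by a numeric score, highest first."""
    rows = list(csv.DictReader(handle))
    return sorted(rows, key=lambda row: float(row[score_column]), reverse=True)


# Stand-in for avil_screen.csv, using two rows from the AVIL table in this README.
sample = io.StringIO(
    "target,compound,smiles,binding_probability,predicted_pAffinity,activity,activity_confidence\n"
    "AVIL,tyramine,CN(C)CCc1ccc(O)cc1,0.417845,6.48015,inactive,0.612231\n"
    "AVIL,benzamide,CC(C)N1CCC(C(=O)Nc2ccccc2N2CCCCC2)CC1,0.417894,6.50255,inactive,0.613166\n"
)
ranked = load_ranked(sample)
print([row["compound"] for row in ranked])  # ['benzamide', 'tyramine']
```

For the real artifact, open the downloaded file with open("avil_screen.csv", newline="") and pass the handle to load_ranked.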

Top AVIL candidates

target  compound          smiles                                    binding_probability  predicted_pAffinity  activity  activity_confidence
AVIL    benzamide         CC(C)N1CCC(C(=O)Nc2ccccc2N2CCCCC2)CC1     0.417894             6.50255              inactive  0.613166
AVIL    tyramine          CN(C)CCc1ccc(O)cc1                        0.417845             6.48015              inactive  0.612231
AVIL    haloperidol-like  Clc1ccc(C2CCNCC2)cc1                      0.416993             6.52308              inactive  0.614537
AVIL    naltrexone-like   C=CCN1CC(C(=O)OC)C=C2c3cc4c(cc3CC21)OCO4  0.416627             6.53851              inactive  0.615551
AVIL    arylpiperazine    CCOc1ccccc1N1CCN(C)CC1                    0.41661              6.52149              inactive  0.614814

Top DRD2/DRD3 dual candidates

target_DRD2  compound          smiles                                    binding_probability_DRD2  predicted_pAffinity_DRD2  activity_DRD2  activity_confidence_DRD2  target_DRD3  binding_probability_DRD3  predicted_pAffinity_DRD3  activity_DRD3  activity_confidence_DRD3  dual_score
DRD2         benzamide         CC(C)N1CCC(C(=O)Nc2ccccc2N2CCCCC2)CC1     0.589879                  6.9194                    inactive       0.392142                  DRD3         0.677743                  7.30553                   antagonist     0.409369                  0.633811
DRD2         tyramine          CN(C)CCc1ccc(O)cc1                        0.589791                  6.89811                   inactive       0.391389                  DRD3         0.67738                   7.28465                   antagonist     0.409597                  0.633586
DRD2         haloperidol-like  Clc1ccc(C2CCNCC2)cc1                      0.589155                  6.93862                   inactive       0.393289                  DRD3         0.677397                  7.32427                   antagonist     0.408889                  0.633276
DRD2         naltrexone-like   C=CCN1CC(C(=O)OC)C=C2c3cc4c(cc3CC21)OCO4  0.588766                  6.95339                   inactive       0.394221                  DRD3         0.677305                  7.33871                   antagonist     0.408469                  0.633035
DRD2         arylpiperazine    CCOc1ccccc1N1CCN(C)CC1                    0.58875                   6.9371                    inactive       0.393589                  DRD3         0.677092                  7.32269                   antagonist     0.408721                  0.632921
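The README does not define dual_score, but the tabulated values are consistent with the arithmetic mean of the DRD2 and DRD3 binding probabilities. The check below recomputes that mean for the five rows above and compares it with the reported score. The dual_score function is a hypothetical reconstruction, and the tolerance allows for the six-decimal rounding of the printed values.

```python
# (compound, binding_probability_DRD2, binding_probability_DRD3, reported dual_score)
rows = [
    ("benzamide", 0.589879, 0.677743, 0.633811),
    ("tyramine", 0.589791, 0.67738, 0.633586),
    ("haloperidol-like", 0.589155, 0.677397, 0.633276),
    ("naltrexone-like", 0.588766, 0.677305, 0.633035),
    ("arylpiperazine", 0.58875, 0.677092, 0.632921),
]


def dual_score(p_drd2: float, p_drd3: float) -> float:
    """Assumed definition: mean of the two binding probabilities."""
    return (p_drd2 + p_drd3) / 2


# Largest gap between the recomputed mean and the score reported in the table.
max_deviation = max(abs(dual_score(p2, p3) - reported) for _, p2, p3, reported in rows)
consistent = max_deviation < 1.5e-6  # rounding slack for 6-decimal inputs
print(f"max deviation: {max_deviation:.2e}, consistent with mean: {consistent}")
```

If the mean is indeed the scoring rule, the activity calls do not enter the ranking. That would explain why compounds labelled inactive at DRD2 still appear as top dual antagonist candidates.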

Generated by ML Intern

This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.

Usage

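import torch
from huggingface_hub import hf_hub_download

model_id = 'vedatonuryilmaz/dark-proteins-model'
# Download the trained checkpoint listed under Artifacts.
checkpoint_path = hf_hub_download(repo_id=model_id, filename='dark_protein_model.pt')
# The checkpoint layout is not documented here; inspect it before rebuilding the model.
checkpoint = torch.load(checkpoint_path, map_location='cpu')

This downloads the raw PyTorch checkpoint. Because the model is a custom multitask architecture rather than a transformers causal language model, AutoModelForCausalLM is unlikely to load it; rebuild the network from its training code and load the checkpoint weights into it.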
