Dark Proteins PCI/LIFA Prototype
This repository contains a working prototype for genome-scale protein-chemical interaction (PCI) and ligand-induced functional activity (LIFA) prediction for dark proteins.
Architecture: frozen ESM-2 t6 8M protein encoder + hashed SMILES ligand encoder + multitask heads for binding, activity, and affinity.
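The README does not say how the ligand encoder hashes SMILES strings. As a rough illustration of the general idea only, the sketch below hashes overlapping character n-grams of a SMILES string into a fixed-length, L2-normalised count vector. The function name `hashed_smiles_features`, the vector size, the n-gram length, and the MD5-based bucket assignment are all assumptions made for this example and may differ from what the repository actually does.

```python
import hashlib


def hashed_smiles_features(smiles: str, dim: int = 256, ngram: int = 3) -> list[float]:
    """Hash character n-grams of a SMILES string into a fixed-length vector.

    Hypothetical illustration: `dim`, `ngram`, and the MD5-based bucket
    assignment are assumptions, not the repository's settings.
    """
    vec = [0.0] * dim
    # Pad so that very short strings still yield at least one n-gram.
    padded = smiles if len(smiles) >= ngram else smiles.ljust(ngram, "_")
    for i in range(len(padded) - ngram + 1):
        gram = padded[i : i + ngram]
        # A stable hash (unlike the built-in hash()) keeps features reproducible across runs.
        digest = hashlib.md5(gram.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    # L2-normalise so that molecule size does not dominate the feature scale.
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm > 0 else vec


features = hashed_smiles_features("CN(C)CCc1ccc(O)cc1")
```

A vector like this would then be concatenated with, or otherwise combined with, the protein embedding before the multitask heads; how the prototype combines the two is not stated here.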
OOD dark-protein evaluation: {"binding_accuracy": 0.8888888888888888, "activity_accuracy": 0.8888888888888888, "affinity_rmse": 1.5511236557645622}
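For readers who want to reproduce numbers of this kind, here is a minimal sketch of the two metric types reported above, assuming the standard definitions (fraction of matching labels for accuracy, root-mean-square error for affinity). The helper names and the toy inputs are illustrative; the repository's evaluation script and test split are not shown in this README.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Toy values for illustration only, not the repository's evaluation data.
toy_acc = accuracy([1, 0, 1, 1], [1, 0, 0, 1])
toy_rmse = rmse([6.0, 7.0], [6.5, 6.5])
```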
Artifacts
- dark_protein_model.pt: trained PyTorch checkpoint
- avil_screen.csv: ranked AVIL inhibitor/ligand candidates
- drd2_drd3_dual_antagonists.csv: ranked DRD2/DRD3 dual antagonist candidates
Top AVIL candidates
| target | compound | smiles | binding_probability | predicted_pAffinity | activity | activity_confidence |
|---|---|---|---|---|---|---|
| AVIL | benzamide | CC(C)N1CCC(C(=O)Nc2ccccc2N2CCCCC2)CC1 | 0.417894 | 6.50255 | inactive | 0.613166 |
| AVIL | tyramine | CN(C)CCc1ccc(O)cc1 | 0.417845 | 6.48015 | inactive | 0.612231 |
| AVIL | haloperidol-like | Clc1ccc(C2CCNCC2)cc1 | 0.416993 | 6.52308 | inactive | 0.614537 |
| AVIL | naltrexone-like | C=CCN1CC(C(=O)OC)C=C2c3cc4c(cc3CC21)OCO4 | 0.416627 | 6.53851 | inactive | 0.615551 |
| AVIL | arylpiperazine | CCOc1ccccc1N1CCN(C)CC1 | 0.41661 | 6.52149 | inactive | 0.614814 |
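
The rows above appear in descending order of binding_probability. Assuming avil_screen.csv uses the same column names as this table (not confirmed by the README), a ranking like this could be reproduced with the standard library alone. The helper name `rank_by_binding` is hypothetical, and the sample text reuses two rows from the table rather than reading the real file.

```python
import csv
import io


def rank_by_binding(csv_text: str, top_k: int = 5) -> list[dict]:
    """Parse screening rows and return the top_k by binding_probability, highest first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: float(r["binding_probability"]), reverse=True)
    return rows[:top_k]


# Two rows copied from the table above, given in reverse order to show the sort.
sample = (
    "target,compound,smiles,binding_probability,predicted_pAffinity,activity,activity_confidence\n"
    "AVIL,arylpiperazine,CCOc1ccccc1N1CCN(C)CC1,0.41661,6.52149,inactive,0.614814\n"
    "AVIL,benzamide,CC(C)N1CCC(C(=O)Nc2ccccc2N2CCCCC2)CC1,0.417894,6.50255,inactive,0.613166\n"
)
top = rank_by_binding(sample)
```

To run this on the real artifact, the file contents can be read with `open("avil_screen.csv").read()` and passed in place of `sample`.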
Top DRD2/DRD3 dual candidates
| target_DRD2 | compound | smiles | binding_probability_DRD2 | predicted_pAffinity_DRD2 | activity_DRD2 | activity_confidence_DRD2 | target_DRD3 | binding_probability_DRD3 | predicted_pAffinity_DRD3 | activity_DRD3 | activity_confidence_DRD3 | dual_score |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DRD2 | benzamide | CC(C)N1CCC(C(=O)Nc2ccccc2N2CCCCC2)CC1 | 0.589879 | 6.9194 | inactive | 0.392142 | DRD3 | 0.677743 | 7.30553 | antagonist | 0.409369 | 0.633811 |
| DRD2 | tyramine | CN(C)CCc1ccc(O)cc1 | 0.589791 | 6.89811 | inactive | 0.391389 | DRD3 | 0.67738 | 7.28465 | antagonist | 0.409597 | 0.633586 |
| DRD2 | haloperidol-like | Clc1ccc(C2CCNCC2)cc1 | 0.589155 | 6.93862 | inactive | 0.393289 | DRD3 | 0.677397 | 7.32427 | antagonist | 0.408889 | 0.633276 |
| DRD2 | naltrexone-like | C=CCN1CC(C(=O)OC)C=C2c3cc4c(cc3CC21)OCO4 | 0.588766 | 6.95339 | inactive | 0.394221 | DRD3 | 0.677305 | 7.33871 | antagonist | 0.408469 | 0.633035 |
| DRD2 | arylpiperazine | CCOc1ccccc1N1CCN(C)CC1 | 0.58875 | 6.9371 | inactive | 0.393589 | DRD3 | 0.677092 | 7.32269 | antagonist | 0.408721 | 0.632921 |
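
The README does not define dual_score. The tabulated values are, however, consistent with the arithmetic mean of binding_probability_DRD2 and binding_probability_DRD3, to within the rounding of the displayed figures. The check below illustrates that inference; it is an observation about the table, not a statement of how the screening code computes the score.

```python
def dual_score(p_drd2: float, p_drd3: float) -> float:
    """Arithmetic mean of the two binding probabilities (inferred from the table, not confirmed)."""
    return (p_drd2 + p_drd3) / 2.0


# (binding_probability_DRD2, binding_probability_DRD3, dual_score) copied from the table above.
table_rows = [
    (0.589879, 0.677743, 0.633811),
    (0.589791, 0.67738, 0.633586),
    (0.589155, 0.677397, 0.633276),
    (0.588766, 0.677305, 0.633035),
    (0.58875, 0.677092, 0.632921),
]

# Largest gap between the recomputed mean and the tabulated dual_score.
max_abs_diff = max(abs(dual_score(a, b) - s) for a, b, s in table_rows)
```

Note that a mean of binding probabilities does not take the predicted activity labels into account, so a high dual_score by this reading reflects predicted binding at both receptors rather than predicted antagonism at both.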
Generated by ML Intern
This model repository was generated by ML Intern, an agent for machine learning research and development on the Hugging Face Hub.
- Try ML Intern: https://smolagents-ml-intern.hf.space
- Source code: https://github.com/huggingface/ml-intern
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = 'vedatonuryilmaz/dark-proteins-model'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
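This snippet assumes a causal language model stored in transformers format. For non-causal architectures, replace AutoModelForCausalLM with the appropriate AutoModel class. The model artifact listed under Artifacts, dark_protein_model.pt, is a plain PyTorch checkpoint, so the Auto classes may not be able to load it as-is.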
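Because the Artifacts section lists dark_protein_model.pt as a PyTorch checkpoint, one alternative is to download that file directly and open it with `torch.load`. The sketch below assumes the file sits at the repository root under that name and that `huggingface_hub` and `torch` are installed. The helper name is hypothetical, and the internal layout of the checkpoint (state dict or something else) is not documented in this README.

```python
def load_dark_protein_checkpoint(
    repo_id: str = "vedatonuryilmaz/dark-proteins-model",
    filename: str = "dark_protein_model.pt",
):
    """Download the checkpoint file from the Hub and load it on CPU.

    Assumption: the file is readable by torch.load; whether it holds a
    state dict or another object is not documented in this README.
    """
    # Imported lazily so the helper can be defined without the packages installed.
    from huggingface_hub import hf_hub_download
    import torch

    local_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return torch.load(local_path, map_location="cpu")
```

Calling `load_dark_protein_checkpoint()` needs network access. It is worth inspecting the returned object (its type and, for a dict, its keys) before building a model around it. Recent PyTorch releases default `torch.load` to `weights_only=True`, which may reject a checkpoint that stores more than tensors.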