---
language: en
license: mit
tags:
- explainability
- interpretability
- protein-protein-interaction
- deeplift
- integrated-gradients
- captum
- prochlorococcus
- cyanobacteria
- pytorch
library_name: pytorch
pipeline_tag: other
---

# Explainability Analysis of ppiGPT Interaction Predictions in *Prochlorococcus* MED4

This repository hosts large result files and the model checkpoint used in the explainability analyses described in Daakour et al., "Topological entrenchment of adaptive proteins in the streamlined interactome of *Prochlorococcus* MED4." Analysis code and source data are in the companion GitHub repository.

## What This Repository Contains

### Explainability Results

These files are the outputs of interpretability analyses applied to ppiGPT predictions:

| File | Size | Description |
|------|------|-------------|
| `results/deeplift_motif_analysis_results.pkl` | 78 MB | Captum DeepLift per-residue attribution scores, motif discovery results, and position-wise statistics for all 2,168 protein pairs (1,084 PRS + 1,084 RRS) |
| `results/integrated_gradients_random_ppi_per_token_attributions.csv` | 174 MB | Captum Integrated Gradients per-token attribution scores for the 1,084 random reference set (RRS) pairs |
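
Because the internal layout of these files is not described here, a cautious first step is to inspect them rather than assume a schema. The sketch below is a minimal example, assuming the files have already been downloaded to a local `results/` directory; the helper names `summarize_pickle` and `csv_header_and_row_count` are hypothetical and not part of the companion code.

```python
import csv
import pickle
from pathlib import Path


def summarize_pickle(path):
    """Return the top-level type name and, for a dict, its sorted keys."""
    with open(path, "rb") as fh:
        obj = pickle.load(fh)  # only unpickle files from a source you trust
    keys = sorted(str(k) for k in obj) if isinstance(obj, dict) else []
    return type(obj).__name__, keys


def csv_header_and_row_count(path):
    """Stream a large CSV once; return its header row and number of data rows."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh)
        header = next(reader, [])         # empty list if the file is empty
        n_rows = sum(1 for _ in reader)   # counts rows without loading them all
    return header, n_rows


if __name__ == "__main__":
    # Local paths are an assumption: they mirror the repository layout above.
    results = Path("results")
    pkl_path = results / "deeplift_motif_analysis_results.pkl"
    csv_path = results / "integrated_gradients_random_ppi_per_token_attributions.csv"
    if pkl_path.exists():
        print(summarize_pickle(pkl_path))
    if csv_path.exists():
        print(csv_header_and_row_count(csv_path))
```

Unpickling the DeepLift results may require the libraries that were used to create them to be installed in the same environment. The CSV is read as a stream here so that the 174 MB file never has to fit in memory at once.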

### ppiGPT Model Checkpoint (for reproducibility)

The ppiGPT model was created by **Kourosh Salehi-Ashtiani** and is included here solely to enable reproduction of the explainability analyses. It is not a product of the explainability work.

| File | Size | Description |
|------|------|-------------|
| `model/out_3e/ckpt.pt` | 1.0 GB | ppiGPT model checkpoint (3 epochs) |
| `model/data/meta.pkl` | 343 B | Character-level tokenizer metadata (29-token vocabulary) |
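
To illustrate how character-level tokenizer metadata of this kind is typically used, the sketch below builds `encode` and `decode` functions from a pickled mapping. It assumes that `meta.pkl` is a dictionary with `stoi` (character to index) and `itos` (index to character) entries, a common convention for character-level GPT training setups; this layout is an assumption and should be checked against the actual file. The function name `load_char_tokenizer` is hypothetical.

```python
import pickle
from pathlib import Path


def load_char_tokenizer(meta_path):
    """Build encode/decode functions from pickled character-level metadata.

    Assumes the pickle holds a dict with 'stoi' and 'itos' mappings
    (an assumption, not confirmed by this repository's documentation).
    """
    with open(meta_path, "rb") as fh:
        meta = pickle.load(fh)  # only unpickle files from a source you trust
    stoi, itos = meta["stoi"], meta["itos"]

    def encode(text):
        # One token ID per character; raises KeyError on unknown characters.
        return [stoi[ch] for ch in text]

    def decode(ids):
        return "".join(itos[i] for i in ids)

    return encode, decode, len(stoi)


if __name__ == "__main__":
    meta_path = Path("model/data/meta.pkl")  # local path mirroring the table above
    if meta_path.exists():
        encode, decode, vocab_size = load_char_tokenizer(meta_path)
        print("vocabulary size:", vocab_size)
```

If the assumed keys are present, the reported vocabulary size should be 29, matching the table above.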

**ppiGPT architecture:** GPT-2-style decoder-only transformer with 12 layers, 12 attention heads, an embedding dimension of 768, and ~84.98M parameters. Trained from scratch on *Prochlorococcus* MED4 protein sequences with a 29-token character-level vocabulary (20 amino acids + 9 special tokens).
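
The stated parameter count can be sanity-checked with simple arithmetic. The sketch below counts the weights of a GPT-2-style model under three assumptions that are not confirmed by this README: linear and normalization layers have no bias terms, the output head shares weights with the token embedding, and position embeddings are excluded from the reported total. The function `gpt2_param_count` is a hypothetical helper written for this illustration.

```python
def gpt2_param_count(n_layer, n_embd, vocab_size, bias=False):
    """Count parameters of a GPT-2-style decoder, excluding position embeddings.

    Assumes a 4x MLP expansion and an output head tied to the token embedding.
    The number of attention heads does not change the count.
    """
    d = n_embd
    attn = 4 * d * d + (4 * d if bias else 0)  # QKV projection (3d^2) + output projection (d^2)
    mlp = 8 * d * d + (5 * d if bias else 0)   # d -> 4d and 4d -> d
    norms = 2 * d * (2 if bias else 1)         # two LayerNorms per block
    block = attn + mlp + norms
    final_norm = d * (2 if bias else 1)
    tok_emb = vocab_size * d                   # shared with the output head
    return n_layer * block + final_norm + tok_emb


if __name__ == "__main__":
    n = gpt2_param_count(n_layer=12, n_embd=768, vocab_size=29)
    print(n)                    # 84976128
    print(f"{n / 1e6:.2f}M")    # 84.98M
```

Under these assumptions the total is 84,976,128, which rounds to the ~84.98M quoted above. With bias terms enabled the same formula gives 85,078,272, so the bias-free configuration is the one consistent with the stated figure.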

## Code Repository

Analysis scripts, source datasets, publication figures, and documentation are available at:
https://github.com/olympus-terminal/Prochlorococcus_interactome_model_explainability

## Citation

This repository is part of:

> Daakour et al., "Topological entrenchment of adaptive proteins in the streamlined interactome of *Prochlorococcus* MED4."

## License

MIT