---
language: en
license: mit
tags:
- explainability
- interpretability
- protein-protein-interaction
- deeplift
- integrated-gradients
- captum
- prochlorococcus
- cyanobacteria
- pytorch
library_name: pytorch
pipeline_tag: other
---

# Explainability Analysis of ppiGPT Interaction Predictions in *Prochlorococcus* MED4

This repository hosts large result files and the model checkpoint used in the explainability analyses described in Daakour et al., "Topological entrenchment of adaptive proteins in the streamlined interactome of *Prochlorococcus* MED4." Analysis code and source data are in the companion GitHub repository.

## What This Repository Contains

### Explainability Results

These files are the outputs of interpretability analyses applied to ppiGPT predictions:

| File | Size | Description |
|------|------|-------------|
| `results/deeplift_motif_analysis_results.pkl` | 78 MB | Captum DeepLift per-residue attribution scores, motif discovery results, and position-wise statistics for all 2,168 protein pairs (1,084 PRS + 1,084 RRS) |
| `results/integrated_gradients_random_ppi_per_token_attributions.csv` | 174 MB | Captum Integrated Gradients per-token attribution scores for the 1,084 random reference set pairs |

### ppiGPT Model Checkpoint (for reproducibility)

The ppiGPT model was created by **Kourosh Salehi-Ashtiani** and is included here solely to enable reproduction of the explainability analyses. It is not a product of the explainability work.

| File | Size | Description |
|------|------|-------------|
| `model/out_3e/ckpt.pt` | 1.0 GB | ppiGPT model checkpoint (3 epochs) |
| `model/data/meta.pkl` | 343 B | Character-level tokenizer metadata (29-token vocabulary) |

**ppiGPT architecture:** GPT-2 decoder-only transformer, 12 layers, 12 attention heads, 768 embedding dimensions, ~84.98M parameters. Trained from scratch on *Prochlorococcus* MED4 protein sequences with a 29-token character-level vocabulary (20 amino acids + 9 special tokens).

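
As a sanity check on the architecture figures above, the parameter count can be estimated from the layer count, embedding width, and vocabulary size alone. The sketch below is not taken from the ppiGPT code. It assumes a standard GPT-2 block layout (fused QKV projection, 4x MLP expansion), bias-free linear and LayerNorm layers, tied input/output embeddings, and that position embeddings are left out of the reported total. None of these assumptions is confirmed by this repository, and `gpt2_param_count` is a hypothetical helper name.

```python
# Back-of-the-envelope parameter count for a GPT-2-style decoder.
# ASSUMPTIONS (not stated in this README): no bias terms, tied token
# embedding / output head, position embeddings excluded from the total.

def gpt2_param_count(n_layer: int, n_embd: int, vocab_size: int) -> int:
    attn = 4 * n_embd * n_embd   # fused QKV (3*d*d) + output projection (d*d)
    mlp = 8 * n_embd * n_embd    # d -> 4d and 4d -> d projections
    norms = 2 * n_embd           # two LayerNorm weight vectors per block
    per_block = attn + mlp + norms
    token_embedding = vocab_size * n_embd  # shared with the output head
    final_norm = n_embd                    # LayerNorm before the output head
    return n_layer * per_block + token_embedding + final_norm


# Values from the architecture description: 12 layers, 768 dims, 29 tokens.
total = gpt2_param_count(n_layer=12, n_embd=768, vocab_size=29)
print(f"{total:,} parameters (~{total / 1e6:.2f}M)")
```

Under these assumptions the estimate comes to 84,976,128, which rounds to the ~84.98M quoted above. The number of attention heads does not enter the count, because the heads only partition the same projection matrices.
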
## Code Repository

Analysis scripts, source datasets, publication figures, and documentation:
https://github.com/olympus-terminal/Prochlorococcus_interactome_model_explainability

## Citation

This repository is part of:

> Daakour et al., "Topological entrenchment of adaptive proteins in the streamlined interactome of *Prochlorococcus* MED4."

## License

MIT