---
language: en
license: mit
tags:
- explainability
- interpretability
- protein-protein-interaction
- deeplift
- integrated-gradients
- captum
- prochlorococcus
- cyanobacteria
- pytorch
library_name: pytorch
pipeline_tag: other
---
# Explainability Analysis of ppiGPT Interaction Predictions in *Prochlorococcus* MED4
This repository hosts large result files and the model checkpoint used in the explainability analyses described in Daakour et al., "Topological entrenchment of adaptive proteins in the streamlined interactome of *Prochlorococcus* MED4." Analysis code and source data are in the companion GitHub repository.
## What This Repository Contains
### Explainability Results
These files are the outputs of interpretability analyses applied to ppiGPT predictions:
| File | Size | Description |
|------|------|-------------|
| `results/deeplift_motif_analysis_results.pkl` | 78 MB | Captum DeepLift per-residue attribution scores, motif discovery results, and position-wise statistics for all 2,168 protein pairs (1,084 positive reference set [PRS] + 1,084 random reference set [RRS] pairs) |
| `results/integrated_gradients_random_ppi_per_token_attributions.csv` | 174 MB | Captum Integrated Gradients per-token attribution scores for the 1,084 random reference set (RRS) pairs |
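
Integrated Gradients attributes a model's output to its inputs by averaging gradients along a straight path from a baseline to the input and scaling by the input-minus-baseline difference. As a result, the attributions sum (approximately) to the output difference between input and baseline. The following is a minimal, self-contained sketch of the technique. It is neither Captum's implementation nor the pipeline that produced these files. The scalar model `toy_score`, its weights, and all helper names are hypothetical. The path integral uses a midpoint Riemann sum, whereas Captum offers several approximation methods. Per-token scores are obtained here by summing over embedding dimensions, which is one common reduction. Whether the CSV uses the same reduction is an assumption to check against the companion repository.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per token, one column per embedding dimension


def toy_score(x: Matrix, w: Matrix) -> float:
    """Hypothetical stand-in for a model's scalar output: tanh(sum_ij w_ij * x_ij)."""
    z = sum(wij * xij for wrow, xrow in zip(w, x) for wij, xij in zip(wrow, xrow))
    return math.tanh(z)


def toy_grad(x: Matrix, w: Matrix) -> Matrix:
    """Analytic gradient of toy_score with respect to every embedding entry."""
    d = 1.0 - toy_score(x, w) ** 2  # derivative of tanh at the current pre-activation
    return [[d * wij for wij in wrow] for wrow in w]


def integrated_gradients(x: Matrix, baseline: Matrix, w: Matrix, steps: int = 50) -> Matrix:
    """Approximate IG along the straight path baseline -> x with a midpoint Riemann sum."""
    summed = [[0.0] * len(row) for row in x]
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th interval on [0, 1]
        point = [[b + alpha * (xi - b) for xi, b in zip(xrow, brow)]
                 for xrow, brow in zip(x, baseline)]
        for i, grow in enumerate(toy_grad(point, w)):
            for j, gij in enumerate(grow):
                summed[i][j] += gij
    # Average gradient along the path, scaled by (input - baseline).
    return [[(xi - b) * s / steps for xi, b, s in zip(xrow, brow, srow)]
            for xrow, brow, srow in zip(x, baseline, summed)]


def per_token(attributions: Matrix) -> List[float]:
    """Collapse embedding-dimension attributions into one score per token."""
    return [sum(row) for row in attributions]


# Usage on three toy "tokens" with 2-dimensional embeddings and an all-zero baseline.
x = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
w = [[0.6, 0.1], [-0.3, 0.9], [0.2, 0.4]]
zero = [[0.0, 0.0] for _ in x]
scores = per_token(integrated_gradients(x, zero, w))
# Completeness check: the attributions should sum to f(x) - f(baseline).
print(abs(sum(scores) - (toy_score(x, w) - toy_score(zero, w))) < 1e-3)
```

The completeness check is a useful sanity test when reading per-token scores. If the summed attributions for a pair drift far from the score difference, the number of integration steps is usually too small.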
### ppiGPT Model Checkpoint (for reproducibility)
The ppiGPT model was created by **Kourosh Salehi-Ashtiani** and is included here solely to enable reproduction of the explainability analyses. It is not a product of the explainability work.
| File | Size | Description |
|------|------|-------------|
| `model/out_3e/ckpt.pt` | 1.0 GB | ppiGPT model checkpoint (3 epochs) |
| `model/data/meta.pkl` | 343 B | Character-level tokenizer metadata (29-token vocabulary) |
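
The `ckpt.pt` / `meta.pkl` layout resembles nanoGPT's character-level setup, where `meta.pkl` is a pickled dict with `vocab_size`, `stoi` (character to id), and `itos` (id to character). The sketch below assumes that layout. The key names are an assumption rather than something this card states, and the helper function names are hypothetical. Because unpickling can execute arbitrary code, open only files from a trusted source.

```python
import pickle
from typing import Dict, List, Tuple


def load_meta(path: str) -> dict:
    """Load tokenizer metadata; pickle can execute code, so only open trusted files."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


def vocab_from_meta(meta: dict) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Return (stoi, itos), assuming nanoGPT-style 'stoi'/'itos'/'vocab_size' keys."""
    stoi, itos = meta["stoi"], meta["itos"]
    if "vocab_size" in meta and meta["vocab_size"] != len(stoi):
        raise ValueError(f"vocab_size={meta['vocab_size']} but stoi has {len(stoi)} entries")
    return stoi, itos


def encode(seq: str, stoi: Dict[str, int]) -> List[int]:
    """Map each character of a protein sequence to its token id."""
    unknown = sorted(set(seq) - set(stoi))
    if unknown:
        raise ValueError(f"characters not in vocabulary: {unknown}")
    return [stoi[ch] for ch in seq]


def decode(ids: List[int], itos: Dict[int, str]) -> str:
    """Inverse of encode: map token ids back to characters."""
    return "".join(itos[i] for i in ids)


# Example (path from the table above):
# stoi, itos = vocab_from_meta(load_meta("model/data/meta.pkl"))
# print(len(stoi))  # the card states a 29-token vocabulary
```

Rejecting unknown characters up front makes a mismatch between input sequences and the 29-token vocabulary fail loudly instead of silently dropping residues.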
**ppiGPT architecture:** GPT-2 decoder-only transformer, 12 layers, 12 attention heads, 768 embedding dimensions, ~84.98M parameters. Trained from scratch on *Prochlorococcus* MED4 protein sequences with a 29-token character-level vocabulary (20 amino acids + 9 special tokens).
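
The stated ~84.98M figure can be reproduced arithmetically under several assumptions that this card does not confirm. The sketch assumes a nanoGPT-style configuration with bias-free linear and LayerNorm layers and a 4× MLP expansion. It also assumes token embeddings tied to the output projection and counted once, with position embeddings excluded from the reported count (nanoGPT's convention). The function name is hypothetical. The number of attention heads does not appear because the 12 heads split the 768 dimensions rather than adding parameters.

```python
def gpt_param_count(n_layer: int, n_embd: int, vocab_size: int,
                    bias: bool = False, mlp_ratio: int = 4) -> int:
    """Count parameters of a nanoGPT-style decoder under the assumptions above.

    Position embeddings are excluded, and the token embedding is counted once
    because it is assumed to be tied to the output projection.
    """
    n = n_embd
    attn = 3 * n * n + n * n        # fused Q/K/V projection + attention output projection
    mlp = 2 * mlp_ratio * n * n     # MLP up-projection + down-projection
    norms = 2 * n                   # two LayerNorm weight vectors per block
    if bias:
        attn += 3 * n + n
        mlp += mlp_ratio * n + n
        norms += 2 * n
    per_block = attn + mlp + norms
    final_norm = n * (2 if bias else 1)
    token_embedding = vocab_size * n
    return n_layer * per_block + final_norm + token_embedding


count = gpt_param_count(n_layer=12, n_embd=768, vocab_size=29)
print(count, f"~{count / 1e6:.2f}M")
```

With these settings the count is 84,976,128, which rounds to the stated 84.98M. With biases enabled it would be about 85.08M. The match is consistent with, but does not prove, the assumed configuration. Inspecting the `model_args` stored in the checkpoint, if present, would settle it.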
## Code Repository
Analysis scripts, source datasets, publication figures, and documentation:
https://github.com/olympus-terminal/Prochlorococcus_interactome_model_explainability
## Citation
This repository is part of:
> Daakour et al., "Topological entrenchment of adaptive proteins in the streamlined interactome of *Prochlorococcus* MED4."
## License
MIT