# Embedding exploration
This folder contains all the data and code needed to run embedding exploration (Fig. S3).
### Data download
To help select TF (transcription factor) and Kinase-containing fusions for investigation (Fig. S3a), Supplementary Table 3 from [Salokas et al. 2020](https://doi.org/10.1038/s41598-020-71040-8) was downloaded as a reference of transcription factors and kinases.
```
benchmarking/
└── embedding_exploration/
    └── data/
        ├── salokas_2020_tableS3.csv
        ├── tf_and_kinase_fusions.csv
        └── top_genes.csv
```
- **`data/salokas_2020_tableS3.csv`**: Supplementary Table 3 from [Salokas et al. 2020](https://doi.org/10.1038/s41598-020-71040-8)
- **`data/tf_and_kinase_fusions.csv`**: set of TF::TF and Kinase::Kinase fusion oncoproteins from the FusOn-DB database, curated in `plot.py`
- **`data/top_genes.csv`**: fusion oncoproteins (and their head and tail components) visualized in Fig. S3b. Sequences for head and tail components were pulled from the best-aligned sequences in `fuson_plm/data/blast/blast_outputs/best_htg_alignments_swissprot_seqs.pkl`
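The selection of TF::TF and Kinase::Kinase fusions described above can be sketched roughly as follows. This is only an illustration and not the logic in `plot.py`: the column names (`gene`, `type`), the `HEAD::TAIL` name format, and the placeholder gene symbols are all assumptions made for the example.

```python
import csv
import io

# Toy stand-in for the reference table. Column names and gene symbols
# are placeholders (assumptions), not the contents of salokas_2020_tableS3.csv.
REFERENCE_CSV = """gene,type
TF1,TF
TF2,TF
KIN1,Kinase
KIN2,Kinase
"""

def load_gene_types(handle):
    """Map each gene symbol to its annotated type ("TF" or "Kinase")."""
    return {row["gene"]: row["type"] for row in csv.DictReader(handle)}

def classify_fusion(fusion_name, gene_types):
    """Return "TF::TF" or "Kinase::Kinase" if both partners share a type, else None."""
    parts = fusion_name.split("::")
    if len(parts) != 2:
        return None
    head_type = gene_types.get(parts[0])
    tail_type = gene_types.get(parts[1])
    if head_type is not None and head_type == tail_type:
        return f"{head_type}::{tail_type}"
    return None

gene_types = load_gene_types(io.StringIO(REFERENCE_CSV))

# Hypothetical fusion names: two same-type pairs, one mixed pair, one with an unknown head.
fusions = ["TF1::TF2", "KIN1::KIN2", "TF1::KIN1", "OTHER::KIN1"]
selected = {}
for name in fusions:
    label = classify_fusion(name, gene_types)
    if label is not None:
        selected[name] = label

print(selected)
```

Fusions with mixed partner types, or with a partner missing from the reference table, are dropped in this sketch.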
### Plotting
Run `plot.py` to regenerate the plots in Figure S3. The configuration options are:
```
# Dictionary: key = run name, values = epochs (use this option if you've trained your own model),
# or "FusOn-pLM" to use the official model
FUSON_PLM_CKPT = "FusOn-pLM"
# Type of dim reduction
PLOT_UMAP = True
PLOT_TSNE = False
# Overwriting configs
PERMISSION_TO_OVERWRITE = False # if False, script will halt if it believes these embeddings have already been made.
```
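As an illustration of how options like these might be consumed, the sketch below expands the checkpoint setting into individual (run, epoch) pairs and applies an overwrite guard. The helper names `resolve_checkpoints` and `should_compute`, the assumption that dictionary values are lists of epochs, and the file name `embeddings.pkl` are hypothetical and not taken from `plot.py`.

```python
import os
import tempfile

def resolve_checkpoints(ckpt):
    """Expand the checkpoint setting into (run_name, epoch) pairs.

    A plain string such as "FusOn-pLM" denotes the official model (no epoch).
    A dict is assumed to map run names to lists of epochs.
    """
    if isinstance(ckpt, str):
        return [(ckpt, None)]
    return [(run, epoch) for run, epochs in ckpt.items() for epoch in epochs]

def should_compute(output_path, permission_to_overwrite):
    """Return True if results may be written to output_path."""
    return permission_to_overwrite or not os.path.exists(output_path)

official = resolve_checkpoints("FusOn-pLM")
custom = resolve_checkpoints({"my_run": [1, 5]})  # hypothetical run name and epochs

with tempfile.TemporaryDirectory() as tmp:
    existing = os.path.join(tmp, "embeddings.pkl")  # hypothetical output file
    open(existing, "w").close()
    blocked = should_compute(existing, permission_to_overwrite=False)
    allowed = should_compute(existing, permission_to_overwrite=True)
    fresh = should_compute(os.path.join(tmp, "new.pkl"), permission_to_overwrite=False)
```

With `PERMISSION_TO_OVERWRITE = False`, only outputs that do not yet exist would be computed in this sketch.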
To run, use:
```
nohup python plot.py > plot.out 2> plot.err &
```
- All **results** are stored in `embedding_exploration/results/<timestamp>`, where `timestamp` is a unique string encoding the date and time when you started the run.
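A timestamped results folder of this kind could be created along the following lines. This is a minimal sketch: the function name `make_results_dir` and the timestamp format are assumptions, and the actual format used by `plot.py` may differ.

```python
import os
import tempfile
from datetime import datetime

def make_results_dir(root, now=None):
    """Create and return root/results/<timestamp> for the current run."""
    now = now or datetime.now()
    timestamp = now.strftime("%Y-%m-%d_%H-%M-%S")  # assumed format
    path = os.path.join(root, "results", timestamp)
    os.makedirs(path, exist_ok=True)
    return path

# Use a temporary directory and a fixed time so the example is reproducible.
with tempfile.TemporaryDirectory() as tmp:
    results_dir = make_results_dir(tmp, now=datetime(2024, 1, 2, 3, 4, 5))
    dir_name = os.path.basename(results_dir)
    created = os.path.isdir(results_dir)
```

Because the folder name is derived from the start time, runs started at different times write to separate folders.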
Below are the FusOn-pLM paper results in `results/final/umap_plots/fuson_plm/best/`:
```
benchmarking/
└── embedding_exploration/
    └── results/final/umap_plots/fuson_plm/best/
        ├── favorites/
        │   ├── umap_favorites_source_data.csv
        │   └── umap_favorites_visualization.png
        └── tf_and_kinase/
            ├── umap_tf_and_kinase_fusions_source_data.csv
            └── umap_tf_and_kinase_fusions_visualization.png
```
- **`favorites/umap_favorites_visualization.png`**: Fig. S3b. The plotted data is stored in `favorites/umap_favorites_source_data.csv`.
- **`tf_and_kinase/umap_tf_and_kinase_fusions_visualization.png`**: Fig. S3a. The plotted data is stored in `tf_and_kinase/umap_tf_and_kinase_fusions_source_data.csv`.