This directory holds functions for downloading raw data needed to train the binding site predictor (classifier) on processed ChIP-seq peaks.
Human genome
genome.py
- Download the sequences of all chromosomes for a given genome assembly (e.g. hg38, the assembly used for this project)
- Configurations associated with this download can be found in
./configs/data_task/download/genome.yaml
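As a rough illustration of what a per-chromosome download involves, the sketch below builds one FASTA URL per chromosome. It is not the code in genome.py: the helper names are hypothetical, the UCSC goldenPath URL layout is an assumed source, and only the primary chromosomes (chr1 to chr22, chrX, chrY, chrM) are listed. The real source and chromosome list are whatever genome.yaml specifies.

```python
# Hypothetical helpers for illustration only; not the project's genome.py.
# Assumption: sequences are fetched from UCSC's goldenPath directory layout.
UCSC_BASE = "https://hgdownload.soe.ucsc.edu/goldenPath"


def chromosome_names():
    """Return the primary human chromosome names: chr1..chr22, chrX, chrY, chrM."""
    return [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY", "chrM"]


def chromosome_urls(genome="hg38"):
    """Map each chromosome name to its assumed per-chromosome FASTA URL."""
    return {
        name: f"{UCSC_BASE}/{genome}/chromosomes/{name}.fa.gz"
        for name in chromosome_names()
    }


urls = chromosome_urls("hg38")
print(len(urls), "chromosome files")
print(urls["chr1"])
```

Building the URL table separately from the download step keeps the list easy to inspect before any network traffic happens.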
Running the download
To run this download, please change directory to DPACMAN/dpacman and run:
python -u -m scripts.preprocess data_task=download/genome
ReMap 2022
remap.py
- Download non-redundant peaks: remap2022_nr_macs2_hg38_v1_0.bed
- Download cis-regulatory modules (CRMs): remap2022_crm_macs2_hg38_v1_0.bed
- Configurations associated with this download can be found in
./configs/data_task/download/remap.yaml
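Both downloaded files are BED files, so a quick sanity check after downloading is to read the first few columns of each record. The sketch below is a minimal, hypothetical reader (Peak and parse_bed_line are not part of remap.py) that relies only on the standard BED layout: tab-separated chrom, start, end, and an optional name column. The example rows are made up and are not real ReMap records.

```python
from typing import NamedTuple, Optional


class Peak(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive
    name: str   # empty string when the file has only three columns


def parse_bed_line(line: str) -> Optional[Peak]:
    """Parse one BED line; return None for blank, comment, or header lines."""
    line = line.rstrip("\r\n")
    if not line or line.startswith(("#", "track", "browser")):
        return None
    fields = line.split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 BED columns, got {len(fields)}")
    name = fields[3] if len(fields) > 3 else ""
    return Peak(fields[0], int(fields[1]), int(fields[2]), name)


# Made-up rows for illustration; not taken from the ReMap files.
example = "# comment\nchr1\t100\t250\tTF_A\nchr2\t500\t900\tTF_B\n"
peaks = [p for p in map(parse_bed_line, example.splitlines()) if p is not None]
total_bp = sum(p.end - p.start for p in peaks)
print(len(peaks), "peaks covering", total_bp, "bp")
```

For the real files, the same function can be applied line by line while iterating over an open file handle, which avoids loading the whole BED file into memory.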
Running the download
To run this download, please change directory to DPACMAN/dpacman and run:
python -u -m scripts.preprocess data_task=download/remap