This directory contains the code for downloading the raw data needed to train the binding site predictor (classifier) on processed ChIP-seq peaks.

Human genome

genome.py

  • Download the sequences of all chromosomes for a specified genome assembly (e.g. hg38, the assembly used for this project)
  • Configurations associated with this download can be found in ./configs/data_task/download/genome.yaml

Running the download

To run this download, please change directory to DPACMAN/dpacman and run:

python -u -m scripts.preprocess data_task=download/genome
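
The data_task=download/genome argument follows the group=option pattern used by Hydra-style config groups, where the option names a YAML file under a directory named after the group. Assuming the project follows that convention (the config path listed above suggests it does), the relationship between the argument and the config file can be sketched as below. The helper name config_path_for_override is hypothetical and is not part of this repository.

```python
def config_path_for_override(override: str, config_root: str = "./configs") -> str:
    """Map a 'group=option' override to '<config_root>/<group>/<option>.yaml'.

    Illustrative only: this mirrors the naming convention, it does not
    load or validate the config file.
    """
    group, sep, option = override.partition("=")
    if not sep or not group or not option:
        raise ValueError(f"expected 'group=option', got {override!r}")
    return f"{config_root}/{group}/{option}.yaml"


if __name__ == "__main__":
    # prints ./configs/data_task/download/genome.yaml
    print(config_path_for_override("data_task=download/genome"))
```

Under this reading, editing the YAML file changes the download settings without touching the command itself.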

ReMap 2022

remap.py

  • Download non-redundant peaks: remap2022_nr_macs2_hg38_v1_0.bed
  • Download cis-regulatory modules (CRMs): remap2022_crm_macs2_hg38_v1_0.bed
  • Configurations associated with this download can be found in ./configs/data_task/download/remap.yaml

Running the download

To run this download, please change directory to DPACMAN/dpacman and run:

python -u -m scripts.preprocess data_task=download/remap
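
After the download finishes, a quick sanity check on the BED files can catch truncated or malformed output early. The sketch below relies only on the standard BED layout (tab-separated, with chromosome, start, and end in the first three columns) and counts peaks per chromosome. The function name count_peaks_per_chromosome and the sample lines are hypothetical, and where the downloaded files end up depends on the settings in remap.yaml.

```python
from collections import Counter
from typing import Iterable


def count_peaks_per_chromosome(lines: Iterable[str]) -> Counter:
    """Count BED records per chromosome, validating the first three columns."""
    counts: Counter = Counter()
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines and optional BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        if len(fields) < 3:
            raise ValueError(f"line {line_number}: expected at least 3 columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if start < 0 or end < start:
            raise ValueError(f"line {line_number}: invalid interval {start}-{end}")
        counts[chrom] += 1
    return counts


if __name__ == "__main__":
    # Made-up records for illustration; not taken from ReMap.
    sample = [
        "chr1\t100\t250\tpeakA\n",
        "chr1\t900\t1200\tpeakB\n",
        "chr2\t50\t75\tpeakC\n",
    ]
    print(count_peaks_per_chromosome(sample))
```

To check a real file, pass an open file handle instead of the sample list, since a text file iterates line by line.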