DeepMS2 / document /documentation.md
kairongLi's picture
Upload 44 files
fb5f46a

DeepMASS2 Documentation

Oct 16th, 2023

Package: Version: 1.0.0, released at Oct 16th, 2023
Maintainer: Ji Hongchao (jihongchao@caas.cn)

Introduction

DeepMASS is an innovative software tool offering a powerful solution for annotating and discovering metabolites within complex biological systems. Its foundation lies in a sophisticated deep-learning-based semantic similarity model, which seamlessly connects mass spectra to structurally related compounds, effectively mapping the chemical space of the unknown. DeepMASS maximizes the utility of mass spectrometry big data, positioning itself for further development as data scales continue to expand.

Installation

System Recommended:
Operating Systems:

- Windows 11
- MacOS

Recommended Hardware:

- Intel Core i5 or greater
- 16 GB RAM or more
- 5 GB hard drive space

Please follow the following installation steps:

  1. Install Anaconda or Miniconda

  2. Create a new conda environment and activate:

     conda create -n deepmass python=3.8.13
     conda activate deepmass
    
  3. Clone the repository and enter:

     git clone https://github.com/hcji/DeepMASS2_GUI.git
     cd DeepMASS2_GUI
    
  4. Install dependency (for MacOS, some dependency may install with conda manually):

     pip install -r requirements.txt
     
    
  5. Download the dependent data.

    1. put the following files into data folder:

           DeepMassStructureDB-v1.0.csv
           references_index_negative_spec2vec.bin
           references_index_positive_spec2vec.bin
           references_spectrums_negative.pickle
           references_spectrums_positive.pickle
      
    2. put the following files into model folder:

           Ms2Vec_allGNPSnegative.hdf5
           Ms2Vec_allGNPSnegative.hdf5.syn1neg.npy
           Ms2Vec_allGNPSnegative.hdf5.wv.vectors.npy
           Ms2Vec_allGNPSpositive.hdf5
           Ms2Vec_allGNPSpositive.hdf5.syn1neg.npy
           Ms2Vec_allGNPSpositive.hdf5.wv.vectors.npy
      

Please note, these dependent data are introduced as public version of the published paper. It means they are based on the GNPS dataset only. If you test on CASMI dataset with these version, there may be some difference compared with the reported results in the paper. If you have the accessibility of NIST 20, please referred the introduction of the Advanced usage part, and re-train the model.

  1. Run DeepMASS

     python DeepMASS2.py
    

Quick start

  1. DeepMASS may need some time for auto-loading the dependent data. Please wait until the buttons become active.
  1. Click Open button, select a mgf file containing one or multiple MS/MS spectra. See this for an example. Except SMILES and INCHIKEY lines, the other meta information is necessary for each spectrum.
  1. Click Run DeepMASS for annotating with DeepMASS algorithm, or click Run MatchMS for library matching. Wait for the progress bar to finish.
  1. Click Save button, select the folder path to save the annotation results.

Advanced usage

  1. Constructing mgf file from MS-DIAL results.

    1. Process your ms files of DDA/DIA mode metabolomic study following the MS-DIAL tutorial.
    2. Export the alignment result with txt format. Refer the tutorial-section 5-6-(B).
    3. Refer the scripts here.
  2. Training models with NIST 20 spectra.

    1. Use LIB2NIST tool to export NIST 20 database to mgf format.
    2. Refer the scripts here, and transform the data into DeepMASS required format.
    3. Refer the scripts here, and train your ms2vec model.
    4. Refer the scripts here, and build index for the spectra of NIST 20.
    5. Copy all the generated files into corresponding folder of DeepMASS.

Reference

Comming soon ...