# DeepMASS2 Documentation ### Oct 16th, 2023 **Package:** Version: 1.0.0, released at Oct 16th, 2023 Maintainer: Ji Hongchao *(jihongchao@caas.cn)* ## Introduction DeepMASS is an innovative software tool offering a powerful solution for annotating and discovering metabolites within complex biological systems. Its foundation lies in a sophisticated deep-learning-based semantic similarity model, which seamlessly connects mass spectra to structurally related compounds, effectively mapping the chemical space of the unknown. DeepMASS maximizes the utility of mass spectrometry big data, positioning itself for further development as data scales continue to expand. ## Installation **System Recommended:** Operating Systems: - Windows 11 - MacOS Recommended Hardware: - Intel Core i5 or greater - 16 GB RAM or more - 5 GB hard drive space **Please follow the following installation steps:** 1. Install [Anaconda](https://www.anaconda.com/) or [Miniconda](https://docs.conda.io/en/latest/miniconda.html) 2. Create a new conda environment and activate: conda create -n deepmass python=3.8.13 conda activate deepmass 3. Clone the repository and enter: git clone https://github.com/hcji/DeepMASS2_GUI.git cd DeepMASS2_GUI 4. Install dependency (for *MacOS*, some dependency may install with conda manually): pip install -r requirements.txt 5. Download the [dependent data](https://github.com/hcji/DeepMASS2_GUI/releases/tag/v0.99.0). 1) put the following files into *data* folder: DeepMassStructureDB-v1.0.csv references_index_negative_spec2vec.bin references_index_positive_spec2vec.bin references_spectrums_negative.pickle references_spectrums_positive.pickle 2) put the following files into *model* folder: Ms2Vec_allGNPSnegative.hdf5 Ms2Vec_allGNPSnegative.hdf5.syn1neg.npy Ms2Vec_allGNPSnegative.hdf5.wv.vectors.npy Ms2Vec_allGNPSpositive.hdf5 Ms2Vec_allGNPSpositive.hdf5.syn1neg.npy Ms2Vec_allGNPSpositive.hdf5.wv.vectors.npy Please note, these dependent data are introduced as public version of the published paper. It means they are based on the GNPS dataset only. If you test on CASMI dataset with these version, there may be some difference compared with the reported results in the paper. If you have the accessibility of NIST 20, please referred the introduction of the *Advanced usage* part, and re-train the model. 6. Run DeepMASS python DeepMASS2.py ## Quick start 1. DeepMASS may need some time for auto-loading the dependent data. Please wait until the buttons become active.