bilalsm commited on
Commit
1cac55b
·
verified ·
1 Parent(s): 0a52369

Upload README.md with huggingface_hub

Browse files
Files changed (1) hide show
  1. README.md +2 -2
README.md CHANGED
@@ -7,7 +7,7 @@ tags:
7
  - knn
8
  - scikit-learn
9
  library_name: sklearn
10
- pipeline_tag: formula-assignment
11
  ---
12
 
13
  # DOM Formula Assignment using K-Nearest Neighbors
@@ -28,7 +28,7 @@ pipeline_tag: formula-assignment
28
  Dissolved organic matter (DOM) is a critical component of aquatic ecosystems, with the fulvic acid fraction (FA-DOM) exhibiting high mobility and ready bioavailability to microbial communities. While understanding the molecular composition is a vital area of study, the heterogeneity of the material, with a vast number of diverse compounds, makes this task challenging. Existing methods often struggle with incomplete formula assignment or reduced coverage highlighting the need for a better approach. In this study, we developed a machine learning approach using the k-nearest neighbors (KNN) algorithm to predict molecular formulas from ultra-high-resolution mass spectrometry data. The model was trained on chemical formulas assigned to multiple DOM samples using 7 Tesla(7T) and a 21 Tesla(21T) Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) system, and tested on an independent 9.4 T FT-ICR MS Fulvic Acid dataset. A synthetic dataset of plausible elemental combinations (C, H, O, N, S) was also generated to enhance generalization. Our approach achieved a 99.9% assignment rate on the labeled test set and assigned a total of 13,605 formulas for unlabeled peaks compared to the existing approach, which assigned 5914 formulas, achieving up to a 2.3X improvement in formula assignment coverage compared to existing methods.
29
 
30
 
31
- ![Architecture](architecture.png)
32
 
33
  ---
34
 
 
7
  - knn
8
  - scikit-learn
9
  library_name: sklearn
10
+ pipeline_tag: feature-extraction
11
  ---
12
 
13
  # DOM Formula Assignment using K-Nearest Neighbors
 
28
  Dissolved organic matter (DOM) is a critical component of aquatic ecosystems, with the fulvic acid fraction (FA-DOM) exhibiting high mobility and ready bioavailability to microbial communities. While understanding the molecular composition is a vital area of study, the heterogeneity of the material, with a vast number of diverse compounds, makes this task challenging. Existing methods often struggle with incomplete formula assignment or reduced coverage highlighting the need for a better approach. In this study, we developed a machine learning approach using the k-nearest neighbors (KNN) algorithm to predict molecular formulas from ultra-high-resolution mass spectrometry data. The model was trained on chemical formulas assigned to multiple DOM samples using 7 Tesla(7T) and a 21 Tesla(21T) Fourier transform ion cyclotron resonance mass spectrometry (FT-ICR MS) system, and tested on an independent 9.4 T FT-ICR MS Fulvic Acid dataset. A synthetic dataset of plausible elemental combinations (C, H, O, N, S) was also generated to enhance generalization. Our approach achieved a 99.9% assignment rate on the labeled test set and assigned a total of 13,605 formulas for unlabeled peaks compared to the existing approach, which assigned 5914 formulas, achieving up to a 2.3X improvement in formula assignment coverage compared to existing methods.
29
 
30
 
31
+ ![Architecture](Architecture.png)
32
 
33
  ---
34