TARA-XGBoost-Bidirectional

Bidirectional XGBoost ensemble models linking marine environmental variables to microalgal protein domain (Pfam) abundance profiles from the TARA Oceans metagenomic dataset.

Model Description

Forward Models (Environment → Pfam)

  • Input: 32 Google Earth Engine oceanographic variables
  • Output: CLR-transformed abundance of 100 top-variance Pfam domains
  • 100 independent XGBoost regressors (one per target domain)
  • Median test R² = 0.20 (IQR: 0.16–0.29; max R² = 0.54)
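The centred log-ratio (CLR) transform applied to the forward-model targets can be sketched as below. This is only an illustration of the transform, not the preprocessing code used for these models: the function name `clr` and the pseudocount of 1.0 for zero counts are assumptions, since this card does not state how zeros were handled.

```python
import math

def clr(counts, pseudocount=1.0):
    """Centred log-ratio transform of one sample's abundance vector.

    The pseudocount is an assumption for handling zeros; the zero-handling
    used for the published models is not documented in this card.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    # Subtracting the mean log is equivalent to dividing by the geometric mean.
    return [v - mean_log for v in logs]

sample = [120, 30, 0, 5]  # hypothetical Pfam domain counts for one sample
transformed = clr(sample)
```

By construction the CLR values of a sample sum to zero, so each target expresses a domain's abundance relative to the sample's geometric mean rather than as an absolute count.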

Reverse Models (Pfam → Environment)

  • Input: 9,611 Pfam domain abundances, reduced to 100 PCA components (72.2% of variance retained)
  • Output: 31 environmental variables
  • 31 independent XGBoost regressors (one per target variable)
  • Best targets: MODIS SST (R² = 0.61), mean SST (R² = 0.61), bathymetry (R² = 0.53)
  • Independent-subset validation: bathymetry R² = 0.25 across disjoint ocean basins
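For reading the R² values reported above, the standard coefficient of determination can be sketched as follows. The card does not say which implementation produced the reported scores, so this is the textbook definition (the same one scikit-learn's `r2_score` uses), and the data are hypothetical.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

y_true = [10.0, 12.0, 14.0, 16.0]            # hypothetical held-out values
perfect = r2_score(y_true, y_true)           # exact predictions
baseline = r2_score(y_true, [13.0] * 4)      # always predict the mean
reversed_fit = r2_score(y_true, [16.0, 14.0, 12.0, 10.0])  # worse than the mean
```

A score of 0 means the model does no better than predicting the mean of the held-out values, and scores can be negative, so an R² of 0.25 on disjoint ocean basins indicates a modest but real signal beyond that baseline.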

XGBoost Hyperparameters

Parameter          Value
n_estimators       200
max_depth          6
learning_rate      0.1
subsample          0.8
colsample_bytree   0.8
min_child_weight   3
reg_alpha          0.1
reg_lambda         1.0
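As a rough sketch, the settings above map directly onto keyword arguments of `xgboost.XGBRegressor`. The names `XGB_PARAMS` and `make_regressor` are illustrative, and anything not listed in the table (objective, random seed, tree method) is left at the library default here, which is an assumption rather than a documented choice.

```python
# Hyperparameters copied from the table above.
XGB_PARAMS = {
    "n_estimators": 200,
    "max_depth": 6,
    "learning_rate": 0.1,
    "subsample": 0.8,
    "colsample_bytree": 0.8,
    "min_child_weight": 3,
    "reg_alpha": 0.1,
    "reg_lambda": 1.0,
}

def make_regressor(**overrides):
    """Build one XGBRegressor with the listed settings (requires xgboost)."""
    from xgboost import XGBRegressor  # imported lazily so the dict is usable alone
    return XGBRegressor(**{**XGB_PARAMS, **overrides})
```

Calling `make_regressor()` once per target would give the one-regressor-per-target layout described earlier; passing, for example, `random_state=0` as an override is one way to make a refit reproducible.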

Files

  • xgboost_forward_models_20260124_104452.joblib: 100 forward models (49 MB)
  • xgboost_reverse_models_20260124_104452.joblib: 31 reverse models (15 MB)
  • model_manifest_20260124_104452.json: Feature lists and hyperparameters

Usage

import joblib

# Load reverse models (Pfam → Environment)
reverse_bundle = joblib.load("xgboost_reverse_models_20260124_104452.joblib")

# Load forward models (Environment → Pfam)
forward_bundle = joblib.load("xgboost_forward_models_20260124_104452.joblib")
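The internal layout of the loaded bundles is not described in this card, so it may help to inspect an object before indexing into it. The helper below is a hypothetical sketch (the name `summarize` and the stand-in dictionary are not part of the released files) that reports what a loaded object is without assuming its structure.

```python
def summarize(obj, max_keys=5):
    """Return a short description of a loaded object without assuming its layout."""
    if isinstance(obj, dict):
        keys = list(obj)
        return f"dict with {len(keys)} keys, e.g. {keys[:max_keys]}"
    if isinstance(obj, (list, tuple)):
        return f"{type(obj).__name__} of {len(obj)} items"
    return type(obj).__name__

# Stand-in for a loaded bundle; the real layout may differ.
example = {"models": [object(), object()], "feature_names": ["sst", "depth"]}
summary = summarize(example)
```

With the files downloaded, `print(summarize(reverse_bundle))` shows whether the bundle is a dictionary, a list of models, or something else. The manifest JSON listed under Files is the place to look for the feature order each model expects.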

Dataset

The models were trained on AlgaGPT-extracted proteomes from 2,044 TARA Oceans metagenomic assemblies. Environmental variables were retrieved from Google Earth Engine (GEE) for the 1,279 samples with complete metadata.

Related Models

Citation

LA4SR classification models:

Nelson DR, Jaiswal AK, Ismail NS, Mystikou A, Salehi-Ashtiani K. Patterns. 2024;6(11).

License

Apache 2.0
