Model Card for MuScaRi

MuScaRi (Multi-Scale species Richness estimation, also named after the Muscari genus of perennial bulbous plants) is a deep learning model that estimates vascular plant species richness at arbitrary spatial scales from ecological survey data and environmental covariates.

Model Description

MuScaRi composes a fully connected feedforward neural network with a four-parameter Weibull rarefaction model. Given the area of a spatial unit and summary statistics of environmental covariates within it, the neural network predicts the parameters of the rarefaction curve, which in turn predicts expected species richness as a function of sampling effort. Evaluating the curve at infinite sampling effort yields total (asymptotic) species richness predictions.
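To make the role of the rarefaction curve concrete, here is a minimal sketch of one common four-parameter Weibull growth curve. The parameterization and the parameter names (b, c, d, e) are assumptions chosen for illustration; the exact form used by MuScaRi is described in the paper and may differ. In this assumed form, c is the value at zero sampling effort, d is the asymptote, e is a scale and b is a shape parameter, so evaluating the curve at infinite effort simply returns d.

```python
import numpy as np

def weibull4(area, b, c, d, e):
    """Assumed four-parameter Weibull growth curve (illustrative only).

    area : sampling effort (e.g. area in m²), must be >= 0
    b    : shape parameter (> 0)
    c    : value of the curve at zero effort
    d    : asymptote, i.e. the value as effort goes to infinity
    e    : scale parameter (> 0)
    """
    area = np.asarray(area, dtype=float)
    return c + (d - c) * (1.0 - np.exp(-np.power(area / e, b)))

def asymptotic_richness(b, c, d, e):
    """Limit of weibull4 as area -> infinity (for b > 0, e > 0)."""
    return d

# Hypothetical parameter values; in MuScaRi the network would predict
# the curve parameters from the area and environmental covariates.
params = dict(b=0.5, c=0.0, d=100.0, e=50.0)
areas = np.array([0.0, 1.0, 10.0, 100.0, 1e4, 1e12])
richness = weibull4(areas, **params)
```

The curve rises monotonically from c towards d when d > c, which is why the same set of parameters can serve both interpolation (finite effort) and extrapolation (asymptotic richness).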

The pretrained model is an ensemble of 5 members, one per spatial cross-validation fold, trained on ~350k European vegetation plots from the European Vegetation Archive (EVA). Ensemble predictions are aggregated by arithmetic mean, and the standard deviation across members quantifies prediction uncertainty.
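The aggregation step can be sketched with plain NumPy. The member predictions below are made-up numbers for illustration only, and the use of the population standard deviation (ddof=0) is an assumption; the library may compute it differently.

```python
import numpy as np

# Hypothetical richness predictions: 5 ensemble members (rows)
# for 3 spatial units (columns). Values are illustrative only.
member_preds = np.array([
    [120.0, 80.0, 45.0],
    [130.0, 78.0, 50.0],
    [125.0, 82.0, 47.0],
    [118.0, 85.0, 44.0],
    [127.0, 75.0, 49.0],
])

# Arithmetic mean across members gives the ensemble prediction per unit.
ensemble_mean = member_preds.mean(axis=0)

# Spread across members is used as an uncertainty estimate per unit
# (population standard deviation assumed here).
ensemble_std = member_preds.std(axis=0, ddof=0)
```

Because each member is trained on a different spatial fold, a large spread for a given unit suggests that the prediction depends strongly on which regions were seen during training.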

See the paper for full architecture details and benchmarks, and the muscari-data dataset card for the dataset used during training.

Quick Start

from muscari import MuScaRiEnsemble
from muscari.data_processing.utils_features import EnvironmentalFeatureDataset
import pandas as pd

model = MuScaRiEnsemble.from_pretrained("vboussange/muscari")
print(f"Ensemble with {model.n_models} members")
print("Required features:", model.feature_names)

# Predict total species richness for a spatial unit
# df must contain columns listed in model.feature_names
df = pd.DataFrame([...])  # one row per spatial unit; see Colab demo for how to build it
sr_mean = model.predict_mean_sr_tot(df)   # asymptotic richness
sr_std  = model.get_std_sr_tot(df)        # ensemble uncertainty

For an end-to-end walkthrough, see the Colab demo.

Inputs and Outputs

Inputs: a pandas.DataFrame (df) with the following columns (see the Colab demo for more details):

Feature group | Columns | Description
Spatial unit area | log_observed_area | Log of sampling effort (m²); omit for asymptotic prediction
Mean environmental conditions | mean of bio1, bio12, sfcWind, pet, elevation | Mean of CHELSA/EU-DEM variables within the spatial unit
Environmental heterogeneity | std of bio1, bio12, sfcWind, pet, elevation | Std of CHELSA/EU-DEM variables within the spatial unit
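As a rough sketch of how one row of such a table could be assembled, the snippet below summarizes per-pixel covariate values for a single spatial unit. The helper summarize_unit, the column names of the form bio1_mean / bio1_std, the natural logarithm and the random pixel values are all assumptions for illustration; the authoritative column names are those listed in model.feature_names, and the Colab demo shows the actual workflow.

```python
import numpy as np
import pandas as pd

COVARIATES = ["bio1", "bio12", "sfcWind", "pet", "elevation"]

def summarize_unit(pixels, area_m2):
    """Build one feature row for a spatial unit (hypothetical helper).

    pixels  : dict mapping covariate name -> array of pixel values in the unit
    area_m2 : area of the spatial unit in square metres
    """
    # Natural log assumed; check the library for the expected log base.
    row = {"log_observed_area": np.log(area_m2)}
    for name in COVARIATES:
        values = np.asarray(pixels[name], dtype=float)
        # Hypothetical column names; compare against model.feature_names.
        row[f"{name}_mean"] = np.nanmean(values)  # mean conditions
        row[f"{name}_std"] = np.nanstd(values)    # heterogeneity
    return row

# Synthetic pixel values standing in for CHELSA/EU-DEM rasters.
rng = np.random.default_rng(0)
pixels = {name: rng.normal(size=100) for name in COVARIATES}

df = pd.DataFrame([summarize_unit(pixels, area_m2=1e6)])
```

Using NaN-aware statistics is a design choice here so that masked raster pixels (for example, sea or missing data) do not propagate into the summary features.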

Outputs:

  • model.predict_mean_sr(df): expected species richness at a given sampling effort (interpolation mode)
  • model.predict_mean_sr_tot(df): total species richness under asymptotic sampling effort (extrapolation mode)
  • model.get_std_sr_tot(df): standard deviation of the total species richness prediction across ensemble members

Training Data and Evaluation

Full performance tables are in the paper.

Limitations

  • Trained on European vascular plants; performance outside Europe is untested.
  • Environmental predictors use a 1981-2010 climatological baseline.
  • Predictions are less reliable in data-sparse regions (e.g. parts of France, Spain, Scandinavia).

Citation

@misc{boussange2025muscari,
  title         = {Multi-scale species richness estimation with deep learning},
  author        = {Victor Boussange and Bert Wuyts and Philipp Brun and
                   Johanna T. Malle and Gabriele Midolo and Jeanne Portier and
                   Théophile Sanchez and Niklaus E. Zimmermann and
                   Irena Axmanová and Helge Bruelheide and Milan Chytrý and
                   Stephan Kambach and Zdeňka Lososová and Martin Večeřa and
                   Idoia Biurrun and Klaus T. Ecker and Jonathan Lenoir and
                   Jens-Christian Svenning and Dirk Nikolaus Karger},
  year          = {2025},
  eprint        = {2507.06358},
  archivePrefix = {arXiv},
  primaryClass  = {q-bio.PE},
  url           = {https://arxiv.org/abs/2507.06358},
}