PM2.5 Air Quality Forecasting Models

Pre-trained models for predicting PM2.5 concentrations 1-24 hours ahead across European cities.

Model Overview

These models were trained on European Environment Agency (EEA) air quality data from 2018-2022 and evaluated on 2023-2024 data. They predict PM2.5 at multiple forecast horizons: 1h, 3h, 6h, 12h, and 24h.
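
The multi-horizon targets and the chronological train/evaluation split described above could be set up roughly as follows. This is a minimal sketch, not the authors' pipeline: the `pm25` column name, the `make_targets_and_split` helper, and the exact boundary of 2023-01-01 are assumptions, and the frame is assumed to hold one station's gap-free hourly series indexed by timestamp.

```python
import pandas as pd

HORIZONS = (1, 3, 6, 12, 24)  # forecast horizons in hours, as listed above


def make_targets_and_split(df, value_col="pm25", horizons=HORIZONS,
                           test_start="2023-01-01"):
    """Add one target column per horizon, then split chronologically.

    Assumes df has a gap-free hourly DatetimeIndex for a single station;
    with gaps, shifting by rows would not equal shifting by hours.
    """
    out = df.sort_index().copy()
    for h in horizons:
        # The target at time t is the observed value h hours later.
        out[f"target_h{h}"] = out[value_col].shift(-h)
    boundary = pd.Timestamp(test_start)
    train = out[out.index < boundary]   # training years
    test = out[out.index >= boundary]   # evaluation years
    return train, test
```

Rows whose target falls past the end of the series come out as NaN and would need to be dropped before fitting. Training rows within `h` hours of the boundary take their targets from the evaluation period, so a stricter split may want to drop those as well.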

Training Data

  • Countries: 5 (AT, BE, ES, FI, FR)
  • Cities: Wien, Paris, Madrid, Antwerpen, Helsinki
  • Stations: 38 monitoring stations
  • Records: 1.9M+ hourly observations

Available Models

| Model | Type | File Pattern | Description |
|---|---|---|---|
| Linear Regression | Statistical | lr_h{horizon}.pkl | Baseline linear model |
| GAM | Statistical | gam_h{horizon}.pkl | Generalized Additive Model |
| Random Forest | ML | rf_h{horizon}.pkl | Tuned Random Forest |
| XGBoost | ML | xgb_h{horizon}.pkl | Tuned XGBoost |
| LightGBM | ML | lgb_h{horizon}.pkl | Tuned LightGBM |
| LSTM | Deep Learning | lstm_global_h{horizon}.keras | Basic LSTM (168h lookback) |
| LSTM-Residual | Deep Learning | lstm_residual_h{horizon}.keras | Residual connections |
| LSTM-Attention | Deep Learning | lstm_attention_h{horizon}.keras | Global attention mechanism |
| LSTM-CNN | Deep Learning | lstm_cnn_h{horizon}.keras | Hybrid CNN-LSTM |
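
The `{horizon}` placeholder in each file pattern is filled with the forecast horizon in hours. A small helper along these lines can expand the patterns; the `FILE_PATTERNS` mapping and the `model_filename` function are illustrative names, not part of the repository.

```python
HORIZONS = (1, 3, 6, 12, 24)  # forecast horizons in hours

# File patterns copied from the table above.
FILE_PATTERNS = {
    "Linear Regression": "lr_h{horizon}.pkl",
    "GAM": "gam_h{horizon}.pkl",
    "Random Forest": "rf_h{horizon}.pkl",
    "XGBoost": "xgb_h{horizon}.pkl",
    "LightGBM": "lgb_h{horizon}.pkl",
    "LSTM": "lstm_global_h{horizon}.keras",
    "LSTM-Residual": "lstm_residual_h{horizon}.keras",
    "LSTM-Attention": "lstm_attention_h{horizon}.keras",
    "LSTM-CNN": "lstm_cnn_h{horizon}.keras",
}


def model_filename(model_name, horizon):
    """Return the file name for a model at a given horizon (hypothetical helper)."""
    if horizon not in HORIZONS:
        raise ValueError(f"Unsupported horizon: {horizon}")
    return FILE_PATTERNS[model_name].format(horizon=horizon)


print(model_filename("LightGBM", 1))
```

The files sit inside per-model subdirectories of the repository, so the directory name has to be prepended to the file name when downloading.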

Performance (1-hour horizon)

Protocol A: Full Dataset (606,635 test samples)

| Model | MAE (µg/m³) | RMSE (µg/m³) | R² |
|---|---|---|---|
| Persistence | 1.50 | 2.64 | 0.872 |
| Linear Regression | 1.49 | 2.51 | 0.885 |
| LightGBM | 1.44 | 2.45 | 0.890 |
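
For reference, a persistence baseline predicts that the value `h` hours ahead equals the most recent observation. The sketch below shows that baseline together with MAE and RMSE in plain Python. The coefficient of determination is included on the assumption that it is the third value reported in each row. Function names are illustrative and the evaluation code actually used for these results may differ.

```python
import math


def persistence_pairs(series, horizon):
    """Pair each observation with the value observed `horizon` steps earlier."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    y_true = series[horizon:]
    y_pred = series[:-horizon]  # last known value carried forward
    return y_true, y_pred


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```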

Protocol B: Sequence-Eligible Subset (375,906 test samples)

| Model | MAE (µg/m³) | RMSE (µg/m³) | R² |
|---|---|---|---|
| LSTM-Attention | 1.19 | 2.18 | 0.916 |

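Protocol B evaluates only stations with sufficient sequential data for the LSTM models (continuous sequences of 168 hours or more). Because the two protocols use different test sets, their scores are not directly comparable. Full results are available in the GitHub repository.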
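
One way to picture the 168-hour continuity requirement is a run-length check over a station's timestamps. The sketch below is an assumption about how such a filter might work, not the rule used to build the Protocol B subset, and `eligible_indices` is a hypothetical helper.

```python
from datetime import datetime, timedelta


def eligible_indices(timestamps, lookback=168):
    """Return indices whose trailing `lookback` timestamps are consecutive hours.

    `timestamps` must be sorted in ascending order and belong to one station.
    """
    step = timedelta(hours=1)
    eligible = []
    run = 0          # length of the current gap-free hourly run
    previous = None
    for i, ts in enumerate(timestamps):
        if previous is not None and ts - previous == step:
            run += 1
        else:
            run = 1  # a gap (or the first sample) restarts the run
        if run >= lookback:
            eligible.append(i)
        previous = ts
    return eligible
```

With a lookback of 168, a sample only qualifies once a full week of uninterrupted hourly data precedes it, so any gap makes the following 167 samples ineligible.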

Usage

Download Models

import joblib
from huggingface_hub import hf_hub_download

# Download a specific model file from the Hub
model_path = hf_hub_download(
    repo_id="cosuleabianca/eea-pm25-models",
    filename="models_lgb/lgb_h1.pkl",
)

# Load with joblib (for sklearn/xgboost/lightgbm models)
model = joblib.load(model_path)

Load Keras Models

from huggingface_hub import hf_hub_download
from tensorflow import keras

model_path = hf_hub_download(
    repo_id="cosuleabianca/eea-pm25-models",
    filename="lstm_attention_models/lstm_attention_h1.keras"
)
model = keras.models.load_model(model_path)

Input Features

All models expect the same feature set (81 features total):

Pollutant Features

  • PM2.5: lag_1h, lag_2h, lag_3h, lag_6h, lag_12h, lag_24h, lag_168h, rolling_mean_3h/6h/12h/24h, rolling_std_3h/6h/12h/24h
  • NO2: current, lags (1h-168h), rolling_mean_3h/6h/12h/24h, rolling_std_3h/6h/12h/24h
  • PM10: current, lags (1h-168h), rolling_mean_3h/6h/12h/24h, rolling_std_3h/6h/12h/24h
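
Lag and rolling features of this kind are commonly built with pandas `shift` and `rolling`. The following is a minimal sketch under stated assumptions: the column naming scheme is invented for illustration, the input is one station's gap-free hourly series, and rolling statistics are computed over past values only. The feature definitions used to train these models may differ, so check the repository before reproducing inputs.

```python
import pandas as pd


def add_lag_and_rolling_features(df, column,
                                 lags=(1, 2, 3, 6, 12, 24, 168),
                                 windows=(3, 6, 12, 24)):
    """Append lag, rolling-mean and rolling-std columns for one pollutant.

    Assumes rows are sorted hourly observations without gaps, so that
    shifting by k rows equals shifting by k hours.
    """
    out = df.copy()
    series = out[column]
    for lag in lags:
        out[f"{column}_lag_{lag}h"] = series.shift(lag)
    # Assumption: windows end at the previous hour, excluding the current value.
    past = series.shift(1)
    for w in windows:
        rolled = past.rolling(window=w, min_periods=w)
        out[f"{column}_rolling_mean_{w}h"] = rolled.mean()
        out[f"{column}_rolling_std_{w}h"] = rolled.std()
    return out
```

For multi-station data the same function would typically be applied per station (for example inside a `groupby`) so that lags never cross station boundaries.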

Weather Features (Open-Meteo)

  • temperature_2m, relative_humidity_2m, dew_point_2m
  • wind_u, wind_v (east-west and north-south components)
  • precipitation, surface_pressure
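
Wind components are usually derived from wind speed and direction. The sketch below uses the standard meteorological convention, in which direction is the bearing the wind blows from, measured clockwise from north. Whether `wind_u` and `wind_v` were computed exactly this way for these models is an assumption, and `wind_components` is an illustrative name.

```python
import math


def wind_components(speed, direction_deg):
    """Convert wind speed and meteorological direction into (u, v).

    u is positive toward the east, v is positive toward the north.
    direction_deg is where the wind comes from (0 = north, 90 = east).
    """
    theta = math.radians(direction_deg)
    u = -speed * math.sin(theta)
    v = -speed * math.cos(theta)
    return u, v


# A westerly wind (coming from 270 degrees) blows toward the east.
u, v = wind_components(5.0, 270.0)
```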

Temporal Features

  • hour_sin, hour_cos, month_sin, month_cos
  • day_of_week, is_weekend, season
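
The sine/cosine pairs encode hour and month as points on a circle, so that 23:00 sits next to 00:00 and December next to January. A minimal sketch follows; the exact phase for the month, the Monday-as-zero weekday numbering, and the integer season codes (0 = winter through 3 = autumn, meteorological seasons) are assumptions rather than the documented encoding.

```python
import math
from datetime import datetime


def temporal_features(ts):
    """Build cyclical and calendar features from a datetime (illustrative)."""
    hour_angle = 2 * math.pi * ts.hour / 24
    month_angle = 2 * math.pi * ts.month / 12
    day_of_week = ts.weekday()  # Monday = 0, Sunday = 6
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "month_sin": math.sin(month_angle),
        "month_cos": math.cos(month_angle),
        "day_of_week": day_of_week,
        "is_weekend": int(day_of_week >= 5),
        # Dec-Feb -> 0, Mar-May -> 1, Jun-Aug -> 2, Sep-Nov -> 3
        "season": (ts.month % 12) // 3,
    }


features = temporal_features(datetime(2023, 1, 7, 6))
```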

Station Metadata

  • Latitude, Longitude, Altitude
  • StationType (background, industrial, traffic)
  • StationArea (rural, suburban, urban)

Repository Structure

├── models_rf/              # Random Forest models
├── models_lgb/             # LightGBM models
├── models_gam/             # GAM models
├── lstm_global_models/     # Basic LSTM
├── lstm_residual_models/   # Residual LSTM
├── lstm_attention_models/  # Attention LSTM
├── lstm_cnn_models/        # CNN-LSTM hybrid
└── scalers/                # Per-station scalers (for LSTM)

Citation

If you use these models, please cite:

@misc{eea-pm25-forecasting,
  author = {Chisilev Bianca-Iuliana},
  title = {PM2.5 Air Quality Forecasting Models for Europe},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/cosuleabianca/eea-pm25-models}
}

License

CC BY 4.0 - You are free to share and adapt, with attribution.
