Commit db752fa (verified), committed by cosuleabianca
1 Parent(s): 8907d5d

Update README.md

Files changed (1)
  1. README.md +151 -3
README.md CHANGED
@@ -1,3 +1,151 @@
- ---
- license: cc-by-4.0
- ---
+ ---
+ license: cc-by-4.0
+ task_categories:
+ - time-series-forecasting
+ - tabular-regression
+ tags:
+ - air-quality
+ - pm25
+ - forecasting
+ - environment
+ - europe
+ language:
+ - en
+ pretty_name: PM2.5 Air Quality Forecasting Models (Europe)
+ ---
+
+ # PM2.5 Air Quality Forecasting Models
+
+ Pre-trained models for forecasting PM2.5 concentrations 1 to 24 hours ahead in five European cities.
+
+ ## Model Overview
+
+ These models were trained on European Environment Agency (EEA) air quality data from 2018-2022 and evaluated on 2023-2024 data. They predict PM2.5 at multiple forecast horizons: **1h, 3h, 6h, 12h, and 24h**.
+
+ ### Training Data
+ - **Countries**: 5 (AT, BE, ES, FI, FR)
+ - **Cities**: Wien (AT), Antwerpen (BE), Madrid (ES), Helsinki (FI), Paris (FR)
+ - **Stations**: 38 monitoring stations
+ - **Records**: 1.9M+ hourly observations
+
+ ## Available Models
+
+ | Model | Type | File Pattern | Description |
+ |-------|------|--------------|-------------|
+ | **Linear Regression** | Statistical | `lr_h{horizon}.pkl` | Baseline linear model |
+ | **GAM** | Statistical | `gam_h{horizon}.pkl` | Generalized Additive Model |
+ | **Random Forest** | ML | `rf_h{horizon}.pkl` | Tuned Random Forest |
+ | **XGBoost** | ML | `xgb_h{horizon}.pkl` | Tuned XGBoost |
+ | **LightGBM** | ML | `lgb_h{horizon}.pkl` | Tuned LightGBM |
+ | **LSTM** | Deep Learning | `lstm_global_h{horizon}.keras` | Basic LSTM (168h lookback) |
+ | **LSTM-Residual** | Deep Learning | `lstm_residual_h{horizon}.keras` | Residual connections |
+ | **LSTM-Attention** | Deep Learning | `lstm_attention_h{horizon}.keras` | Global attention mechanism |
+ | **LSTM-CNN** | Deep Learning | `lstm_cnn_h{horizon}.keras` | Hybrid CNN-LSTM |
+
+ ## Performance (1-hour horizon)
+
+ ### Protocol A: Full Dataset (606,635 test samples)
+
+ | Model | MAE (µg/m³) | RMSE (µg/m³) | R² |
+ |-------|-------------|--------------|-----|
+ | Persistence | 1.50 | 2.64 | 0.872 |
+ | Linear Regression | 1.49 | 2.51 | 0.885 |
+ | LightGBM | 1.44 | 2.45 | 0.890 |
+
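For reference, the three metrics in the table can be computed as sketched below. This is a minimal standard-library illustration, not code from the repository: the function names and the toy series are made up, and the persistence baseline is assumed to mean "the value `horizon` hours ahead equals the latest observed value", which is the usual definition.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def persistence_forecast(series, horizon):
    """Pair each observation with the value `horizon` steps later.

    Assumed definition: the forecast for time t + horizon is the value at t.
    """
    y_pred = series[:-horizon]
    y_true = series[horizon:]
    return y_true, y_pred

# Toy hourly PM2.5 values in µg/m³ (illustrative only).
pm25 = [10.0, 12.0, 11.0, 15.0, 14.0, 13.0]
y_true, y_pred = persistence_forecast(pm25, horizon=1)
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

On a series this short and noisy, persistence can score a negative R²; the values in the table come from the full test set.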
+ ### Protocol B: Sequence-Eligible Subset (375,906 test samples)
+
+ | Model | MAE (µg/m³) | RMSE (µg/m³) | R² |
+ |-------|-------------|--------------|-----|
+ | LSTM-Attention | 1.19 | 2.18 | 0.916 |
+
+ *Protocol B is limited to stations with enough uninterrupted data for the LSTM models (continuous sequences of 168 hours or more), so its scores are computed on a different test set than Protocol A. Full results are in the [GitHub repository](https://github.com/CosuleaBianca/eea-pm25).*
+
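The sequence-eligibility idea can be illustrated with a short sketch that splits a station's hourly timestamps into gap-free runs and keeps those of at least 168 hours. The helper names are hypothetical, and the exact rule used in the project (for example, whether the forecast horizon is added to the required length) is an assumption here; see the GitHub repository for the actual procedure.

```python
from datetime import datetime, timedelta

LOOKBACK_HOURS = 168  # lookback stated for the LSTM models

def continuous_runs(timestamps):
    """Split sorted hourly timestamps into runs with no missing hour."""
    runs, current = [], []
    for ts in timestamps:
        # Start a new run whenever the step from the previous timestamp is not exactly 1 hour.
        if current and ts - current[-1] != timedelta(hours=1):
            runs.append(current)
            current = []
        current.append(ts)
    if current:
        runs.append(current)
    return runs

def eligible_runs(timestamps, min_length=LOOKBACK_HOURS):
    """Keep only the runs long enough to supply a full lookback window."""
    return [run for run in continuous_runs(timestamps) if len(run) >= min_length]

# Toy example: 400 hourly timestamps with the observation at hour 100 missing.
start = datetime(2023, 1, 1)
hours = [start + timedelta(hours=i) for i in range(400) if i != 100]
eligible = eligible_runs(hours)
print([len(run) for run in eligible])
```

In this toy case the first run (100 hours) is too short and is dropped, while the second run (299 hours) is kept.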
+ ## Usage
+
+ ### Download Models
+
+ ```python
+ import joblib
+ from huggingface_hub import hf_hub_download
+
+ # Download a specific model
+ model_path = hf_hub_download(
+     repo_id="cosuleabianca/eea-pm25-models",
+     filename="models_lgb/lgb_h1.pkl"
+ )
+
+ # Load with joblib (for sklearn/xgboost/lightgbm models)
+ model = joblib.load(model_path)
+ ```
+
+ ### Load Keras Models
+
+ ```python
+ from huggingface_hub import hf_hub_download
+ from tensorflow import keras
+
+ model_path = hf_hub_download(
+     repo_id="cosuleabianca/eea-pm25-models",
+     filename="lstm_attention_models/lstm_attention_h1.keras"
+ )
+ model = keras.models.load_model(model_path)
+ ```
+
+ ## Input Features
+
+ All models expect the same feature set (81 features total):
+
+ ### Pollutant Features
+ - **PM2.5**: lag_1h, lag_2h, lag_3h, lag_6h, lag_12h, lag_24h, lag_168h, rolling_mean_3h/6h/12h/24h, rolling_std_3h/6h/12h/24h
+ - **NO2**: current, lags (1h-168h), rolling_mean_3h/6h/12h/24h, rolling_std_3h/6h/12h/24h
+ - **PM10**: current, lags (1h-168h), rolling_mean_3h/6h/12h/24h, rolling_std_3h/6h/12h/24h
+
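As an illustration of how the PM2.5 lag and rolling features listed above could be derived from an hourly series, here is a minimal standard-library sketch. It is not the project's feature pipeline. The function name is hypothetical, the rolling windows are assumed to cover the `w` hours before the current index, and the standard deviation is the sample version. With pandas, `Series.shift` and `Series.rolling` would be the more natural tools.

```python
from statistics import mean, stdev

LAGS = (1, 2, 3, 6, 12, 24, 168)   # lag offsets in hours, as listed for PM2.5
WINDOWS = (3, 6, 12, 24)           # rolling window sizes in hours

def pm25_features(series, t):
    """Build lag and rolling features for index t of an hourly series.

    Features without enough history are set to None.
    """
    feats = {}
    for lag in LAGS:
        feats[f"lag_{lag}h"] = series[t - lag] if t - lag >= 0 else None
    for w in WINDOWS:
        # Assumption: the window holds the w values before index t (current value excluded).
        window = series[t - w:t] if t - w >= 0 else None
        feats[f"rolling_mean_{w}h"] = mean(window) if window else None
        feats[f"rolling_std_{w}h"] = stdev(window) if window else None
    return feats

# Toy series: 200 hourly values 0.0, 1.0, 2.0, ...
series = [float(i) for i in range(200)]
features = pm25_features(series, t=170)
print(features["lag_1h"], features["lag_168h"], features["rolling_mean_3h"])
```

This yields 15 PM2.5 features per time step (7 lags, 4 rolling means, 4 rolling standard deviations).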
+ ### Weather Features (Open-Meteo)
+ - temperature_2m, relative_humidity_2m, dew_point_2m
+ - wind_u, wind_v (east-west and north-south components)
+ - precipitation, surface_pressure
+
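If the weather source reports wind as speed and direction, the `wind_u` and `wind_v` components can be obtained with the standard meteorological conversion sketched below. Whether the project derives them exactly this way is an assumption, and the function name is hypothetical.

```python
import math

def wind_components(speed, direction_deg):
    """Convert wind speed and direction to (u, v) components.

    Meteorological convention: direction is where the wind blows FROM,
    measured clockwise from north. u is positive toward the east,
    v is positive toward the north.
    """
    rad = math.radians(direction_deg)
    u = -speed * math.sin(rad)
    v = -speed * math.cos(rad)
    return u, v

# A westerly wind (coming from 270 degrees) blows toward the east: u > 0, v close to 0.
u, v = wind_components(10.0, 270.0)
print(round(u, 6), round(v, 6))
```

Using components instead of the raw angle avoids the discontinuity between 359° and 0°.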
+ ### Temporal Features
+ - hour_sin, hour_cos, month_sin, month_cos
+ - day_of_week, is_weekend, season
+
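The temporal features above can be sketched as follows. The cyclical encoding (sine and cosine of the hour and month) is standard, but the specific choices here are assumptions rather than the project's documented definitions: months are zero-based before encoding, `day_of_week` uses Monday = 0, and `season` is an integer with 0 = winter (Dec-Feb).

```python
import math
from datetime import datetime

def temporal_features(ts):
    """Build calendar features for one timestamp (assumed encodings, see lead-in)."""
    hour_angle = 2 * math.pi * ts.hour / 24
    month_angle = 2 * math.pi * (ts.month - 1) / 12
    day_of_week = ts.weekday()  # Monday = 0 ... Sunday = 6
    return {
        "hour_sin": math.sin(hour_angle),
        "hour_cos": math.cos(hour_angle),
        "month_sin": math.sin(month_angle),
        "month_cos": math.cos(month_angle),
        "day_of_week": day_of_week,
        "is_weekend": int(day_of_week >= 5),
        # 0 = winter (Dec-Feb), 1 = spring, 2 = summer, 3 = autumn
        "season": (ts.month % 12) // 3,
    }

# Saturday, 7 January 2023, 06:00
feats = temporal_features(datetime(2023, 1, 7, 6))
print(feats)
```

The sine/cosine pair keeps 23:00 and 00:00 (or December and January) close together, which a plain integer encoding would not.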
+ ### Station Metadata
+ - Latitude, Longitude, Altitude
+ - StationType (background, industrial, traffic)
+ - StationArea (rural, suburban, urban)
+
+ ## Repository Structure
+
+ ```
+ ├── models_rf/              # Random Forest models
+ ├── models_lgb/             # LightGBM models
+ ├── models_gam/             # GAM models
+ ├── lstm_global_models/     # Basic LSTM
+ ├── lstm_residual_models/   # Residual LSTM
+ ├── lstm_attention_models/  # Attention LSTM
+ ├── lstm_cnn_models/        # CNN-LSTM hybrid
+ └── scalers/                # Per-station scalers (for LSTM)
+ ```
+
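Combining the directory layout above with the file patterns from the Available Models table, a small helper can build the `filename` argument for `hf_hub_download`. This is a sketch: only the LightGBM and LSTM-Attention paths appear verbatim in the usage examples, so the other entries are assumed by analogy, and Linear Regression and XGBoost are left out because no directory is listed for them. `MODEL_LAYOUT` and `model_filename` are hypothetical names.

```python
HORIZONS = (1, 3, 6, 12, 24)  # forecast horizons in hours

# model key -> (directory, file prefix, extension); entries other than
# "lgb" and "lstm_attention" are assumptions based on the listed patterns.
MODEL_LAYOUT = {
    "rf": ("models_rf", "rf", "pkl"),
    "lgb": ("models_lgb", "lgb", "pkl"),
    "gam": ("models_gam", "gam", "pkl"),
    "lstm_global": ("lstm_global_models", "lstm_global", "keras"),
    "lstm_residual": ("lstm_residual_models", "lstm_residual", "keras"),
    "lstm_attention": ("lstm_attention_models", "lstm_attention", "keras"),
    "lstm_cnn": ("lstm_cnn_models", "lstm_cnn", "keras"),
}

def model_filename(model, horizon):
    """Return the repository-relative path for a model at a given horizon."""
    if horizon not in HORIZONS:
        raise ValueError(f"horizon must be one of {HORIZONS}, got {horizon}")
    directory, prefix, ext = MODEL_LAYOUT[model]
    return f"{directory}/{prefix}_h{horizon}.{ext}"

for h in HORIZONS:
    print(model_filename("lgb", h))
```

The returned string can be passed as `filename=` to `hf_hub_download` together with `repo_id="cosuleabianca/eea-pm25-models"`.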
+ ## Citation
+
+ If you use these models, please cite:
+
+ ```bibtex
+ @misc{eea-pm25-forecasting,
+   author = {Chisilev Bianca-Iuliana},
+   title = {PM2.5 Air Quality Forecasting Models for Europe},
+   year = {2025},
+   publisher = {Hugging Face},
+   url = {https://huggingface.co/cosuleabianca/eea-pm25-models}
+ }
+ ```
+
+ ## Links
+
+ - **GitHub Repository**: [CosuleaBianca/eea-pm25](https://github.com/CosuleaBianca/eea-pm25)
+ - **Dataset**: [cosuleabianca/eea-pm25-forecasting](https://huggingface.co/datasets/cosuleabianca/eea-pm25-forecasting)
+
+ ## License
+
+ CC BY 4.0: you are free to share and adapt the models, with attribution.