| --- |
| license: apache-2.0 |
| language: |
| - en |
| tags: |
| - solar-energy |
| - photovoltaic |
| - forecasting |
| - random-forest |
| - pycaret |
| - gfs |
| - nasa-power |
| - time-series |
| pipeline_tag: tabular-regression |
| --- |
| |
| # ☀️ Suncast — Hourly Solar PV Generation Forecasting Model (China Region) |
|
|
| A machine learning model that predicts **hourly solar PV power generation (kWh)** for any location across mainland China, given latitude, longitude, and a date range. |
|
|
| --- |
|
|
| ## 📌 Model Overview |
|
|
| | Item | Detail | |
| |------|--------| |
| | **Task** | Tabular Regression (Solar Irradiance → PV Power) | |
| | **Algorithm** | Random Forest Regressor (via PyCaret AutoML) | |
| | **Target Region** | Mainland China (UTC+8) | |
| | **Temporal Resolution** | 1-hour intervals | |
| | **Output Unit** | kWh (1 kW standard PV plant) | |
| | **Training Period** | 2024 full year | |
| | **Training Samples** | 4,861,296 | |
|
|
| --- |
|
|
| ## 📊 Performance |
|
|
| | Metric | Value | |
| |--------|-------| |
| | **MAE** | 76.19 W/m² | |
| | **RMSE** | 126.96 W/m² | |
| | **R²** | 0.748 | |
| | **MAPE** | 1.49% | |
|
|
| **Notable observations:** |
| - ✅ High accuracy during summer months (abundant solar irradiance) |
| - ⚠️ Increased error in winter (low irradiance, high meteorological variability) |
| - The seasonal structure of the model allows for long-term extensibility |
|
|
| --- |
|
|
| ## 🗂️ Data Sources |
|
|
| ### Input — GFS (Global Forecast System, NOAA) |
| - Spatial resolution: 1° × 1° |
| - Temporal resolution: 1 hour |
| - Coverage: Lat 19°–53° (2° step), Lon 74°–134° (2° step) → 558 grid points |
|
|
| | Variable | Unit | |
| |----------|------| |
| | Surface Pressure | Pa | |
| | Surface Temperature | K | |
| | Relative Humidity (2m) | % | |
| | U-Component of Wind (10m) | m/s | |
| | V-Component of Wind (10m) | m/s | |
| | Sunshine Duration | s | |
| | Low / Mid / High Cloud Cover | % | |
| | Downward Short-Wave Radiation Flux | W/m² | |
|
|
| > GFS DSWRF is a model-simulated value computed via the RRTMG radiation transfer scheme — not a direct satellite measurement. |
|
|
| ### Target — NASA POWER / CERES SYN1deg |
| - Source: CERES SYN1deg (Ed4.x), cross-calibrated with Terra/Aqua CERES, MODIS, and GEO satellites |
| - Spatial resolution: 1° × 1° (downsampled to 2° × 2°) |
| - Temporal resolution: 1 hour (linearly interpolated from 3-hour data) |
| - Time zone: UTC+8 fixed (unified across all of China) |
|
|
| --- |
|
|
| ## 🧠 Model Training Details |
|
|
| ### Feature Engineering |
| - Spatiotemporal alignment and standardization of GFS input variables |
| - Added temporal features: `hour_local`, `month_local`, `day_of_year`, `season` |
|
|
| ### Candidate Models Compared |
| - Extra Trees Regressor |
| - **Random Forest Regressor** ✅ (selected) |
| - LightGBM |
| - Gradient Boosting Regressor |
|
|
| Random Forest was selected for its strong resistance to overfitting and balanced performance across all evaluation metrics. |
|
|
| ### Training Configuration |
| | Setting | Value | |
| |---------|-------| |
| | Train / Test Split | 80% / 20% | |
| | Cross-Validation | k-fold (k=10) | |
| | Hyperparameter Tuning | Grid Search | |
|
|
| --- |
|
|
| ## ⚡ PV Power Conversion |
|
|
| Predicted solar irradiance (W/m²) is converted to power generation (kWh) using **[pvlib](https://pvlib-python.readthedocs.io/)**. |
|
|
| | Parameter | Value | |
| |-----------|-------| |
| | Panel Tilt | 25° | |
| | Panel Azimuth | 180° (south-facing) | |
| | Temperature Coefficient | −0.004 /°C | |
| | Capacity | 1 kW (standard) | |
|
|
| Power generation is set to **0 kWh before 06:00 and after 19:00** (local time). |
|
|
| --- |
|
|
| ## 🚀 How to Use |
|
|
| ### 1. Install dependencies |
|
|
| ```bash |
| pip install huggingface_hub pycaret[full] |
| ``` |
|
|
| ### 2. Download and load the model from Hugging Face Hub |
|
|
| ```python |
| from huggingface_hub import hf_hub_download |
| from pycaret.regression import load_model, predict_model |
| import pandas as pd |
| |
| # Download model from Hugging Face Hub |
| model_path = hf_hub_download( |
| repo_id="ryukkt62/Suncast", |
| filename="Suncast_v1.pkl" |
| ) |
| |
| # Load PyCaret pipeline (strip .pkl extension) |
| model = load_model(model_path.replace(".pkl", "")) |
| ``` |
|
|
| ### 3. Prepare input features and predict |
|
|
| ```python |
| # Prepare input features |
| input_data = pd.DataFrame([{ |
| "sp": 101325, # Surface Pressure [Pa] |
| "t": 300.15, # Surface Temperature [K] |
| "r2": 60.0, # Relative Humidity [%] |
| "u10": 2.0, # U-Wind [m/s] |
| "v10": -1.5, # V-Wind [m/s] |
| "SUNSD": 3200, # Sunshine Duration [s] |
| "lcc": 10.0, # Low Cloud Cover [%] |
| "mcc": 5.0, # Mid Cloud Cover [%] |
| "hcc": 20.0, # High Cloud Cover [%] |
| "sdswrf": 650.0, # DSWRF [W/m²] |
| "hour_local": 12, |
| "month_local": 7, |
| "day_of_year": 190 |
| }]) |
| |
| # Predict irradiance → PV power |
| prediction = predict_model(model, data=input_data) |
| print(prediction["prediction_label"]) |
| ``` |
|
|
| > **Note:** The model file is cached locally after the first download (`~/.cache/huggingface/hub/`), so subsequent calls will not re-download. |
|
|
| --- |
|
|
| ## 📁 Repository Files |
|
|
| | File | Description | |
| |------|-------------| |
| | `Suncast_v1.pkl` | Trained PyCaret Random Forest pipeline | |
| | `config.json` | Model metadata | |
|
|
| --- |
|
|
| ## ⚠️ Limitations |
|
|
| - Training data is limited to **2024 only** (originally planned for 2020–2024; reduced due to GFS server instability and storage constraints) |
| - Grid resolution is **2° × 2°** — predictions use the nearest grid point to the input coordinates |
| - Not applicable outside mainland China grid coverage |
|
|
| --- |
|
|
|
|
|  |
|
|
| ## 📄 License |
|
|
| This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0). |
|
|