--- license: apache-2.0 language: - en tags: - solar-energy - photovoltaic - forecasting - random-forest - pycaret - gfs - nasa-power - time-series pipeline_tag: tabular-regression --- # ☀️ Suncast — Hourly Solar PV Generation Forecasting Model (China Region) A machine learning model that predicts **hourly solar PV power generation (kWh)** for any location across mainland China, given latitude, longitude, and a date range. --- ## 📌 Model Overview | Item | Detail | |------|--------| | **Task** | Tabular Regression (Solar Irradiance → PV Power) | | **Algorithm** | Random Forest Regressor (via PyCaret AutoML) | | **Target Region** | Mainland China (UTC+8) | | **Temporal Resolution** | 1-hour intervals | | **Output Unit** | kWh (1 kW standard PV plant) | | **Training Period** | 2024 full year | | **Training Samples** | 4,861,296 | --- ## 📊 Performance | Metric | Value | |--------|-------| | **MAE** | 76.19 W/m² | | **RMSE** | 126.96 W/m² | | **R²** | 0.748 | | **MAPE** | 1.49% | **Notable observations:** - ✅ High accuracy during summer months (abundant solar irradiance) - ⚠️ Increased error in winter (low irradiance, high meteorological variability) - The seasonal structure of the model allows for long-term extensibility --- ## 🗂️ Data Sources ### Input — GFS (Global Forecast System, NOAA) - Spatial resolution: 1° × 1° - Temporal resolution: 1 hour - Coverage: Lat 19°–53° (2° step), Lon 74°–134° (2° step) → 558 grid points | Variable | Unit | |----------|------| | Surface Pressure | Pa | | Surface Temperature | K | | Relative Humidity (2m) | % | | U-Component of Wind (10m) | m/s | | V-Component of Wind (10m) | m/s | | Sunshine Duration | s | | Low / Mid / High Cloud Cover | % | | Downward Short-Wave Radiation Flux | W/m² | > GFS DSWRF is a model-simulated value computed via the RRTMG radiation transfer scheme — not a direct satellite measurement. ### Target — NASA POWER / CERES SYN1deg - Source: CERES SYN1deg (Ed4.x), cross-calibrated with Terra/Aqua CERES, MODIS, and GEO satellites - Spatial resolution: 1° × 1° (downsampled to 2° × 2°) - Temporal resolution: 1 hour (linearly interpolated from 3-hour data) - Time zone: UTC+8 fixed (unified across all of China) --- ## 🧠 Model Training Details ### Feature Engineering - Spatiotemporal alignment and standardization of GFS input variables - Added temporal features: `hour_local`, `month_local`, `day_of_year`, `season` ### Candidate Models Compared - Extra Trees Regressor - **Random Forest Regressor** ✅ (selected) - LightGBM - Gradient Boosting Regressor Random Forest was selected for its strong resistance to overfitting and balanced performance across all evaluation metrics. ### Training Configuration | Setting | Value | |---------|-------| | Train / Test Split | 80% / 20% | | Cross-Validation | k-fold (k=10) | | Hyperparameter Tuning | Grid Search | --- ## ⚡ PV Power Conversion Predicted solar irradiance (W/m²) is converted to power generation (kWh) using **[pvlib](https://pvlib-python.readthedocs.io/)**. | Parameter | Value | |-----------|-------| | Panel Tilt | 25° | | Panel Azimuth | 180° (south-facing) | | Temperature Coefficient | −0.004 /°C | | Capacity | 1 kW (standard) | Power generation is set to **0 kWh before 06:00 and after 19:00** (local time). --- ## 🚀 How to Use ### 1. Install dependencies ```bash pip install huggingface_hub pycaret[full] ``` ### 2. Download and load the model from Hugging Face Hub ```python from huggingface_hub import hf_hub_download from pycaret.regression import load_model, predict_model import pandas as pd # Download model from Hugging Face Hub model_path = hf_hub_download( repo_id="ryukkt62/Suncast", filename="Suncast_v1.pkl" ) # Load PyCaret pipeline (strip .pkl extension) model = load_model(model_path.replace(".pkl", "")) ``` ### 3. Prepare input features and predict ```python # Prepare input features input_data = pd.DataFrame([{ "sp": 101325, # Surface Pressure [Pa] "t": 300.15, # Surface Temperature [K] "r2": 60.0, # Relative Humidity [%] "u10": 2.0, # U-Wind [m/s] "v10": -1.5, # V-Wind [m/s] "SUNSD": 3200, # Sunshine Duration [s] "lcc": 10.0, # Low Cloud Cover [%] "mcc": 5.0, # Mid Cloud Cover [%] "hcc": 20.0, # High Cloud Cover [%] "sdswrf": 650.0, # DSWRF [W/m²] "hour_local": 12, "month_local": 7, "day_of_year": 190 }]) # Predict irradiance → PV power prediction = predict_model(model, data=input_data) print(prediction["prediction_label"]) ``` > **Note:** The model file is cached locally after the first download (`~/.cache/huggingface/hub/`), so subsequent calls will not re-download. --- ## 📁 Repository Files | File | Description | |------|-------------| | `Suncast_v1.pkl` | Trained PyCaret Random Forest pipeline | | `config.json` | Model metadata | --- ## ⚠️ Limitations - Training data is limited to **2024 only** (originally planned for 2020–2024; reduced due to GFS server instability and storage constraints) - Grid resolution is **2° × 2°** — predictions use the nearest grid point to the input coordinates - Not applicable outside mainland China grid coverage --- ![image](https://cdn-uploads.huggingface.co/production/uploads/67f388b8f87453e821718bb1/zL4Z6t9R2vO3mcon2NcHE.png) ## 📄 License This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).