---
license: apache-2.0
language:
- en
tags:
- solar-energy
- photovoltaic
- forecasting
- random-forest
- pycaret
- gfs
- nasa-power
- time-series
pipeline_tag: tabular-regression
---
# ☀️ Suncast — Hourly Solar PV Generation Forecasting Model (China Region)
A machine learning model that predicts **hourly solar PV power generation (kWh)** for any location across mainland China, given latitude, longitude, and a date range.
---
## 📌 Model Overview
| Item | Detail |
|------|--------|
| **Task** | Tabular Regression (Solar Irradiance → PV Power) |
| **Algorithm** | Random Forest Regressor (via PyCaret AutoML) |
| **Target Region** | Mainland China (UTC+8) |
| **Temporal Resolution** | 1-hour intervals |
| **Output Unit** | kWh (1 kW standard PV plant) |
| **Training Period** | 2024 full year |
| **Training Samples** | 4,861,296 |
---
## 📊 Performance
| Metric | Value |
|--------|-------|
| **MAE** | 76.19 W/m² |
| **RMSE** | 126.96 W/m² |
| **R²** | 0.748 |
| **MAPE** | 1.49% |
**Notable observations:**
- ✅ High accuracy during summer months (abundant solar irradiance)
- ⚠️ Increased error in winter (low irradiance, high meteorological variability)
- Seasonal features (`month_local`, `day_of_year`, `season`) are part of the input, so the model can be retrained on additional years without structural changes
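The metrics in the table above follow their standard definitions. The sketch below shows one way to compute them with the standard library only; it is illustrative, not the evaluation code used for this model. The function name `regression_metrics`, the sample values, and the choice to skip near-zero targets in MAPE are all assumptions.

```python
import math


def regression_metrics(y_true, y_pred, eps=1e-9):
    """Return MAE, RMSE, R2 and MAPE (%) for two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    # Assumption: targets at (or near) zero, such as night-time irradiance,
    # are skipped because the percentage error is undefined for them.
    nonzero = [(t, p) for t, p in zip(y_true, y_pred) if abs(t) > eps]
    mape = 100.0 * sum(abs((p - t) / t) for t, p in nonzero) / len(nonzero)

    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}


# Hypothetical irradiance values in W/m²
observed = [0.0, 200.0, 400.0, 600.0]
predicted = [10.0, 180.0, 420.0, 570.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

MAPE is sensitive to how zero and near-zero targets are handled, so it is worth stating that rule whenever the value is reported.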
---
## 🗂️ Data Sources
### Input — GFS (Global Forecast System, NOAA)
- Spatial resolution: 1° × 1°
- Temporal resolution: 1 hour
- Coverage: Lat 19°–53° (2° step), Lon 74°–134° (2° step) → 558 grid points
| Variable | Unit |
|----------|------|
| Surface Pressure | Pa |
| Surface Temperature | K |
| Relative Humidity (2m) | % |
| U-Component of Wind (10m) | m/s |
| V-Component of Wind (10m) | m/s |
| Sunshine Duration | s |
| Low / Mid / High Cloud Cover | % |
| Downward Short-Wave Radiation Flux | W/m² |
> GFS DSWRF is a model-simulated value computed via the RRTMG radiative transfer scheme, not a direct satellite measurement.
### Target — NASA POWER / CERES SYN1deg
- Source: CERES SYN1deg (Ed4.x), cross-calibrated with Terra/Aqua CERES, MODIS, and GEO satellites
- Spatial resolution: 1° × 1° (downsampled to 2° × 2°)
- Temporal resolution: 1 hour (linearly interpolated from 3-hour data)
- Time zone: UTC+8 fixed (unified across all of China)
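The 3-hour to 1-hour interpolation mentioned above can be sketched as follows. This is a minimal standard-library illustration of linear interpolation between consecutive 3-hourly samples; the function name `interpolate_3h_to_1h` and the sample values are hypothetical, and the original preprocessing may have used a different tool.

```python
def interpolate_3h_to_1h(values_3h):
    """Linearly interpolate 3-hourly samples onto an hourly grid."""
    if not values_3h:
        return []
    hourly = []
    for left, right in zip(values_3h, values_3h[1:]):
        # Emit the left sample plus the two interior hourly points.
        for step in range(3):
            hourly.append(left + (right - left) * step / 3)
    hourly.append(values_3h[-1])  # keep the final 3-hourly sample
    return hourly


# Hypothetical irradiance in W/m² at 06:00, 09:00, 12:00 and 15:00
three_hourly = [0.0, 300.0, 600.0, 450.0]
hourly = interpolate_3h_to_1h(three_hourly)
print(hourly)
```

Linear interpolation cannot recover variability within a 3-hour window, so short-lived cloud effects are smoothed out in the hourly target.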
---
## 🧠 Model Training Details
### Feature Engineering
- Spatiotemporal alignment and standardization of GFS input variables
- Added temporal features: `hour_local`, `month_local`, `day_of_year`, `season`
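The temporal features listed above could be derived from a UTC timestamp roughly as shown below. This is a sketch, not the original feature code: it uses the fixed UTC+8 offset stated for the target data, and the `season` encoding (meteorological seasons as strings) is an assumption, since the encoding is not documented here.

```python
from datetime import datetime, timedelta, timezone

CHINA_TZ = timezone(timedelta(hours=8))  # fixed UTC+8, no daylight saving

# Assumption: Northern Hemisphere meteorological seasons, encoded as strings.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def temporal_features(utc_time):
    """Derive local-time features from a timezone-aware UTC datetime."""
    local = utc_time.astimezone(CHINA_TZ)
    return {
        "hour_local": local.hour,
        "month_local": local.month,
        "day_of_year": local.timetuple().tm_yday,
        "season": SEASON_BY_MONTH[local.month],
    }


features = temporal_features(datetime(2024, 7, 8, 4, 0, tzinfo=timezone.utc))
print(features)
```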
### Candidate Models Compared
- Extra Trees Regressor
- **Random Forest Regressor** ✅ (selected)
- LightGBM
- Gradient Boosting Regressor
Random Forest was selected for its strong resistance to overfitting and balanced performance across all evaluation metrics.
### Training Configuration
| Setting | Value |
|---------|-------|
| Train / Test Split | 80% / 20% |
| Cross-Validation | k-fold (k=10) |
| Hyperparameter Tuning | Grid Search |
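A configuration like the one in the table could be expressed with PyCaret's regression API as sketched below. This is an approximation of the setup, not the original training script: the target column name `irradiance`, the `session_id`, and the function name `train_candidates` are hypothetical, and PyCaret is imported inside the function so the constants can be inspected without it installed.

```python
# PyCaret model IDs for the four candidates listed above
CANDIDATE_MODELS = ["et", "rf", "lightgbm", "gbr"]
TRAIN_SIZE = 0.8  # 80% / 20% train/test split
N_FOLDS = 10      # k-fold cross-validation


def train_candidates(df, target="irradiance"):
    """Compare the candidate models, then grid-search the Random Forest.

    `df` is a pandas DataFrame holding the features and the target column.
    The default target name is a placeholder, not the real column name.
    """
    from pycaret.regression import (
        compare_models,
        create_model,
        setup,
        tune_model,
    )

    setup(
        data=df,
        target=target,
        train_size=TRAIN_SIZE,
        fold=N_FOLDS,
        session_id=42,  # hypothetical seed, for reproducibility only
    )
    best_overall = compare_models(include=CANDIDATE_MODELS)
    random_forest = create_model("rf")
    tuned_forest = tune_model(
        random_forest,
        search_library="scikit-learn",
        search_algorithm="grid",
    )
    return best_overall, tuned_forest
```

With several million rows, a 10-fold grid search is expensive, so prototyping on a subsample first is a reasonable precaution.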
---
## ⚡ PV Power Conversion
Predicted solar irradiance (W/m²) is converted to power generation (kWh) using **[pvlib](https://pvlib-python.readthedocs.io/)**.
| Parameter | Value |
|-----------|-------|
| Panel Tilt | 25° |
| Panel Azimuth | 180° (south-facing) |
| Temperature Coefficient | −0.004 /°C |
| Capacity | 1 kW (standard) |
Power generation is set to **0 kWh before 06:00 and after 19:00** (local time).
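To make the conversion concrete, the sketch below applies a PVWatts-style DC formula in plain Python, using the capacity and temperature coefficient from the table and the night-time rule above. It is a simplification, not the pvlib pipeline itself: it treats the predicted irradiance as plane-of-array irradiance (skipping the tilt and azimuth transposition), takes cell temperature as a given input, and reads the night-time rule as "hours 6 to 19 inclusive are daytime". The function name `hourly_energy_kwh` is hypothetical.

```python
def hourly_energy_kwh(irradiance_wm2, cell_temp_c, hour_local,
                      capacity_kw=1.0, temp_coeff=-0.004):
    """Estimate energy (kWh) produced during one hour by a PV plant."""
    # Assumption: hours before 06:00 or after 19:00 local time are zeroed.
    if hour_local < 6 or hour_local > 19:
        return 0.0
    # PVWatts-style DC power: scale by irradiance relative to 1000 W/m²
    # and derate linearly with cell temperature above 25 °C.
    power_kw = (
        capacity_kw
        * (irradiance_wm2 / 1000.0)
        * (1.0 + temp_coeff * (cell_temp_c - 25.0))
    )
    # Constant power over a 1-hour interval, so kW equals kWh numerically.
    return max(power_kw, 0.0)


energy = hourly_energy_kwh(650.0, cell_temp_c=45.0, hour_local=12)
print(round(energy, 3))  # about 0.598 kWh
```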
---
## 🚀 How to Use
### 1. Install dependencies
```bash
pip install huggingface_hub "pycaret[full]"
```
### 2. Download and load the model from Hugging Face Hub
```python
import os

import pandas as pd
from huggingface_hub import hf_hub_download
from pycaret.regression import load_model, predict_model

# Download the model from Hugging Face Hub
model_path = hf_hub_download(
    repo_id="ryukkt62/Suncast",
    filename="Suncast_v1.pkl",
)

# PyCaret's load_model appends ".pkl" itself, so pass the path without the extension
model = load_model(os.path.splitext(model_path)[0])
```
### 3. Prepare input features and predict
```python
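# Prepare input features (one row per location and hour)
# Note: the `season` feature listed under Feature Engineering is not shown here;
# add it if the loaded pipeline expects it.
input_data = pd.DataFrame([{
    "sp": 101325,        # Surface Pressure [Pa]
    "t": 300.15,         # Surface Temperature [K]
    "r2": 60.0,          # Relative Humidity (2m) [%]
    "u10": 2.0,          # U-Component of Wind (10m) [m/s]
    "v10": -1.5,         # V-Component of Wind (10m) [m/s]
    "SUNSD": 3200,       # Sunshine Duration [s]
    "lcc": 10.0,         # Low Cloud Cover [%]
    "mcc": 5.0,          # Mid Cloud Cover [%]
    "hcc": 20.0,         # High Cloud Cover [%]
    "sdswrf": 650.0,     # Downward Short-Wave Radiation Flux [W/m²]
    "hour_local": 12,    # Local hour (UTC+8)
    "month_local": 7,    # Local month
    "day_of_year": 190,  # Local day of year
}])

# Run the pipeline; predictions are returned in the `prediction_label` column
prediction = predict_model(model, data=input_data)
print(prediction["prediction_label"])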
```
> **Note:** The model file is cached locally after the first download (`~/.cache/huggingface/hub/`), so subsequent calls will not re-download.
---
## 📁 Repository Files
| File | Description |
|------|-------------|
| `Suncast_v1.pkl` | Trained PyCaret Random Forest pipeline |
| `config.json` | Model metadata |
---
## ⚠️ Limitations
- Training data is limited to **2024 only** (originally planned for 2020–2024; reduced due to GFS server instability and storage constraints)
- Grid resolution is **2° × 2°** — predictions use the nearest grid point to the input coordinates
- Not applicable outside mainland China grid coverage
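The nearest-grid-point lookup described above can be sketched as follows, using the grid bounds and 2° step from the data section. The function name `nearest_grid_point` and the strict bounding-box check are assumptions; the original code may handle out-of-range coordinates differently.

```python
LAT_MIN, LAT_MAX = 19, 53
LON_MIN, LON_MAX = 74, 134
STEP = 2

GRID_LATS = list(range(LAT_MIN, LAT_MAX + 1, STEP))  # 18 latitudes
GRID_LONS = list(range(LON_MIN, LON_MAX + 1, STEP))  # 31 longitudes


def nearest_grid_point(lat, lon):
    """Snap a coordinate to the closest training grid point."""
    # Assumption: reject anything outside the grid's bounding box.
    if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        raise ValueError("coordinates fall outside the training grid")
    # On ties, min() returns the first (lower) grid value.
    nearest_lat = min(GRID_LATS, key=lambda g: abs(g - lat))
    nearest_lon = min(GRID_LONS, key=lambda g: abs(g - lon))
    return nearest_lat, nearest_lon


print(len(GRID_LATS) * len(GRID_LONS))  # 558 grid points
print(nearest_grid_point(39.9, 116.4))  # (39, 116)
```

A location can be up to 1° away from its grid point in each direction, which is worth keeping in mind in mountainous or coastal areas where conditions change over short distances.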
---
![image](https://cdn-uploads.huggingface.co/production/uploads/67f388b8f87453e821718bb1/zL4Z6t9R2vO3mcon2NcHE.png)
## 📄 License
This model is released under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0).