peterdudfield committed on
Commit b74423e · 1 Parent(s): 7b19f32

Delete experiments

Files changed (28)
  1. experiments/india/001_v1/india_pv_wind.md +0 -69
  2. experiments/india/002_wind_meteomatics/india_windnet_v2.md +0 -46
  3. experiments/india/003_wind_plevels/MAE.png +0 -3
  4. experiments/india/003_wind_plevels/MAEvstimesteps.png +0 -3
  5. experiments/india/003_wind_plevels/p10.png +0 -3
  6. experiments/india/003_wind_plevels/p50.png +0 -3
  7. experiments/india/003_wind_plevels/plevel.md +0 -54
  8. experiments/india/004_n_training_samples/log-plot.py +0 -14
  9. experiments/india/004_n_training_samples/mae_samples.png +0 -0
  10. experiments/india/004_n_training_samples/mae_step.png +0 -3
  11. experiments/india/004_n_training_samples/readme.md +0 -48
  12. experiments/india/005_extra_nwp_variables/mae_steps.png +0 -3
  13. experiments/india/005_extra_nwp_variables/mae_steps_grouped.png +0 -3
  14. experiments/india/005_extra_nwp_variables/readmd.md +0 -55
  15. experiments/india/006_da_only/bad.png +0 -3
  16. experiments/india/006_da_only/da_only.md +0 -37
  17. experiments/india/006_da_only/good.png +0 -3
  18. experiments/india/006_da_only/mae_steps.png +0 -3
  19. experiments/india/007_different_seeds/mae_all_steps.png +0 -3
  20. experiments/india/007_different_seeds/mae_steps.png +0 -3
  21. experiments/india/007_different_seeds/readme.md +0 -33
  22. experiments/india/008_coarse4/mae_step.png +0 -3
  23. experiments/india/008_coarse4/mae_step_smooth.png +0 -3
  24. experiments/india/008_coarse4/readme.md +0 -77
  25. experiments/mae_analysis.py +0 -152
  26. experiments/uk/011 - Extending forecast to 36 hours (updated ECMWF data)/PVNEt_national_XG_comparison.png +0 -3
  27. experiments/uk/011 - Extending forecast to 36 hours (updated ECMWF data)/PVNet_day_ahead.md +0 -22
  28. experiments/uk/011 - Extending forecast to 36 hours (updated ECMWF data)/PVNets_comparison.png +0 -3
experiments/india/001_v1/india_pv_wind.md DELETED
@@ -1,69 +0,0 @@
- # PVNet for Wind and PV Sites in India
-
- ## PVNet for sites
-
- ### Data
-
- We use PV generation data for India from April 2019 to Nov 2022 for training
- and Dec 2022 to Nov 2023 for validation. This uses only ECMWF NWP data and PV generation history.
-
- The forecast is every 15 minutes, out to 48 hours, for PV generation.
-
- The input NWP data is hourly, and 32x32 pixels (corresponding to around 320 km x 320 km) around a central
- point in NW India.
-
- [WandB Link](https://wandb.ai/openclimatefix/pvnet_india2.1/runs/o4xpvzrc)
-
- ### Results
-
- Overall MAE is 4.9% on the validation set, and the forecasts look good overall.
-
- ![batch_idx_1_all_892_2ca7e12db5de2cf2e244](https://github.com/openclimatefix/PVNet/assets/7170359/07e8199a-11b5-4400-9897-37b7738a4f39)
-
- ![W B Chart 05_02_2024, 10_07_12_pvnet](https://github.com/openclimatefix/PVNet/assets/7170359/abaefdc1-dedd-4a12-8a26-afaf36d7786b)
-
- ## WindNet
-
- ### April-29-2024 WindNet v1 Production Model
-
- [WandB Link](https://wandb.ai/openclimatefix/india/runs/5llq8iw6)
-
- Improvements: a larger input size (64x64) and a 7 hour delay for ECMWF NWP inputs, to match production.
- A new, much more efficient encoder for NWP allows for more filters and layers with fewer parameters.
- The 64x64 input size corresponds to 6.4 degrees x 6.4 degrees, which is around 700 km x 700 km. This allows the
- model to see the wind over the wind generation sites, which seems to be the biggest reason for the improvement in the model.
-
- MAE is 7.6%, with real improvements on the production side of things.
-
- There were other experiments with slightly different numbers of filters, model parameters and the like, but generally no
- improvements were seen.
-
- ## WindNet v1 Results
-
- ### Data
-
- We use wind generation data for India from April 2019 to Nov 2022 for training
- and Dec 2022 to Nov 2023 for validation. This uses only ECMWF data and wind generation history.
-
- The forecast is every 15 minutes, out to 48 hours, for wind generation.
-
- The input NWP data is hourly, and 32x32 pixels (corresponding to around 320 km x 320 km) around a central
- point in NW India. Note: the majority of the wind generation is likely not covered in the 320 km x 320 km area.
-
- [WandB Link](https://wandb.ai/openclimatefix/pvnet_india2.1/runs/otdx7axx)
-
- ### Results
-
- ![W B Chart 05_02_2024, 10_05_19](https://github.com/openclimatefix/PVNet/assets/7170359/6a8cd9c5-bdfe-41ab-996d-37fd1be2a07c)
-
- ![W B Chart 05_02_2024, 10_06_51_windnet](https://github.com/openclimatefix/PVNet/assets/7170359/77554ef0-4411-4432-af95-8530aef4a701)
-
- ![batch_idx_1_all_1730_379a9f881a7f01153f98](https://github.com/openclimatefix/PVNet/assets/7170359/243d9f3e-4cb9-405e-80c5-40c6c218c17f)
-
- MAE is around 10% overall, although the model doesn't seem to do very well on the ramps up and down.
experiments/india/002_wind_meteomatics/india_windnet_v2.md DELETED
@@ -1,46 +0,0 @@
- ### WindNet v2 Meteomatics + ECMWF Model
-
- [WandB Link](https://wandb.ai/openclimatefix/india/runs/v3mja33d)
-
- This newest experiment uses Meteomatics data in addition to ECMWF data. The Meteomatics data is at specific locations corresponding
- to the generation sites we know about. It is smartly downscaled ECMWF data, down to 15 minutes and at the few height levels we are
- interested in, primarily 10m, 100m, and 200m. The Meteomatics data is a semi-reanalysis, with each block of 6 hours coming from one forecast run.
- For example, in one day, hours 00-06 are from the 00 forecast run, and hours 06-12 are from the 06 forecast run. This is important to note:
- it is not a true reanalysis, but it also can't exactly match the live data, as any forecast steps beyond 6 hours are thrown away.
- This means these results should be taken as a best-case or better-than-best-case scenario, as every 6 hours, observations from the future
- are incorporated into the Meteomatics input data via the next NWP model run.
-
- For the purposes of WindNet, Meteomatics data is treated as sensor data that extends into the future.
- The model encodes the sensor information the same way as the historical PV, wind, and GSP generation, and has
- a simple, single attention head to encode the information. This is then concatenated with the rest of the data, as in
- previous experiments.
-
- This model also has an even larger ECMWF input size of 81x81 pixels, corresponding to around 810 km x 810 km.
- ![Screenshot_20240430_082855](https://github.com/openclimatefix/PVNet/assets/7170359/6981a088-8664-474b-bfea-c94c777fc119)
-
- MAE is 7.0% on the validation set, showing a slight improvement over the previous model.
-
- Comparison with the production model:
-
- | Timestep | Prod MAE % | No Meteomatics MAE % | Meteomatics MAE % |
- | --- | --- | --- | --- |
- | 0-0 minutes | 7.586 | 5.920 | 2.475 |
- | 15-15 minutes | 8.021 | 5.809 | 2.968 |
- | 30-45 minutes | 7.233 | 5.742 | 3.472 |
- | 45-60 minutes | 7.187 | 5.698 | 3.804 |
- | 60-120 minutes | 7.231 | 5.816 | 4.650 |
- | 120-240 minutes | 7.287 | 6.080 | 6.028 |
- | 240-360 minutes | 7.319 | 6.375 | 6.738 |
- | 360-480 minutes | 7.285 | 6.638 | 6.964 |
- | 480-720 minutes | 7.143 | 6.747 | 6.906 |
- | 720-1440 minutes | 7.380 | 7.207 | 6.962 |
- | 1440-2880 minutes | 7.904 | 7.507 | 7.507 |
-
- ![mae_per_timestep](https://github.com/openclimatefix/PVNet/assets/7170359/e3c942e8-65c6-4b95-8c51-f25d43e7a082)
-
- Example plot
-
- ![Screenshot_20240430_082937](https://github.com/openclimatefix/PVNet/assets/7170359/88db342e-bf82-414e-8255-5ad4af659fb8)
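The 6-hour block structure described above determines which forecast run supplies each Meteomatics value. A minimal sketch of that layout (the helper below is hypothetical, not part of the pipeline): hours 00-06 come from the 00 run, 06-12 from the 06 run, and so on, so the lead time of any value stays under 6 hours.

```python
from datetime import datetime


def meteomatics_block_run(ts: datetime) -> tuple[int, int]:
    """Return (forecast_run_hour, lead_time_hours) for a timestamp,
    assuming the semi-reanalysis layout described above: each 6 hour
    block of the day comes from the forecast run at its start."""
    run_hour = (ts.hour // 6) * 6  # 0, 6, 12 or 18
    return run_hour, ts.hour - run_hour
```

This makes the better-than-best-case caveat concrete: at hour 5 of a block the input was initialised only 5 hours earlier, whereas a live forecast keeps using a single run over its full horizon.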
experiments/india/003_wind_plevels/MAE.png DELETED

Git LFS Details

  • SHA256: b06d6f85c2ee708e9555969afd622353b950a744f604d6c31d3c32d9b1543c23
  • Pointer size: 131 Bytes
  • Size of remote file: 174 kB
experiments/india/003_wind_plevels/MAEvstimesteps.png DELETED

Git LFS Details

  • SHA256: 3646fe682b4d13b2e00d68cf6d19dec9d00e6c56cc4d3995c3903920b35b8707
  • Pointer size: 131 Bytes
  • Size of remote file: 219 kB
experiments/india/003_wind_plevels/p10.png DELETED

Git LFS Details

  • SHA256: cce6f27ce1bafc89e9b5cb75cc2dad7c1053bea931ea4f5dfa5a1ef404d1042b
  • Pointer size: 131 Bytes
  • Size of remote file: 150 kB
experiments/india/003_wind_plevels/p50.png DELETED

Git LFS Details

  • SHA256: ceae23a3f91f6bc56cf688bdbcaf5172f1a54736e412c5f0e80d8c056f7d9754
  • Pointer size: 131 Bytes
  • Size of remote file: 229 kB
experiments/india/003_wind_plevels/plevel.md DELETED
@@ -1,54 +0,0 @@
- # Running WindNet for RUVNL for different plevels
-
- https://wandb.ai/openclimatefix/india/runs/5llq8iw6 is the current production run.
- It has 7 plevels and a small patch size.
-
- ## Experiments
-
- 1. Only use plevel 50 (orange)
- https://wandb.ai/openclimatefix/india/runs/ziudzweq/
-
- 2. Use plevels of [2, 10, 25, 50, 75, 90, 98]. This is what is already used. (green)
- https://wandb.ai/openclimatefix/india/runs/xdlew7ib
-
- 3. Use plevels of [1, 2, 10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90, 98, 99] (brown)
- https://wandb.ai/openclimatefix/india/runs/pcr2zsrc
-
- ## Training
-
- Each epoch took about ~4 hours, so the training runs took several days.
-
- TODO add number of samples
-
- ## Results
-
- The MAE results show that using plevel 50 only gives better results.
- ![](MAE.png "MAE")
-
- The p50 results are about the same.
- ![](p50.png "p50")
-
- We can see that for p10 the results are not right, as they should converge to 0.1.
- ![](p10.png "p10")
-
- Interestingly, the more plevels you have, the better the results are before 4 hours,
- but the fewer plevels you have, the better the results for >= 8 hours.
-
- | Timestep | P50 only MAE % | 7 plevels MAE % | 15 plevels MAE % | 7 plevels small patch MAE % |
- | --- | --- | --- | --- | --- |
- | 0-0 minutes | 5.416 | 5.920 | 3.933 | 7.586 |
- | 15-15 minutes | 5.458 | 5.809 | 4.003 | 8.021 |
- | 30-45 minutes | 5.525 | 5.742 | 4.442 | 7.233 |
- | 45-60 minutes | 5.595 | 5.698 | 4.772 | 7.187 |
- | 60-120 minutes | 5.890 | 5.816 | 5.307 | 7.231 |
- | 120-240 minutes | 6.423 | 6.080 | 6.275 | 7.287 |
- | 240-360 minutes | 6.608 | 6.375 | 6.707 | 7.319 |
- | 360-480 minutes | 6.728 | 6.638 | 6.904 | 7.285 |
- | 480-720 minutes | 6.634 | 6.747 | 6.872 | 7.143 |
- | 720-1440 minutes | 6.940 | 7.207 | 7.176 | 7.380 |
- | 1440-2880 minutes | 7.446 | 7.507 | 7.735 | 7.904 |
-
- ![](MAEvstimesteps.png "MAEvstimesteps")
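The p10 calibration claim above (the exceedance fraction should converge to 0.1) can be checked directly: count how often the observed value falls below the p10 forecast. A minimal sketch with synthetic data (all names are illustrative, not from the training code):

```python
import numpy as np


def empirical_coverage(y_true: np.ndarray, p10_pred: np.ndarray) -> float:
    """Fraction of observations below the p10 forecast; a well
    calibrated 10th-percentile output gives roughly 0.1."""
    return float(np.mean(y_true < p10_pred))


# Synthetic check: the 10th percentile of N(0, 1) is about -1.2816,
# so using it as a constant "p10 forecast" should give coverage near 0.1.
rng = np.random.default_rng(0)
y = rng.normal(size=100_000)
coverage = empirical_coverage(y, np.full_like(y, -1.2816))
```

Plotting this fraction per forecast step would show where the p10 curve above drifts away from 0.1.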
experiments/india/004_n_training_samples/log-plot.py DELETED
@@ -1,14 +0,0 @@
- """Small script to make an MAE vs number of batches plot."""
-
- import pandas as pd
- import plotly.graph_objects as go
-
- data = [[100, 7.779], [300, 7.441], [1000, 7.181], [3000, 7.180], [6711, 7.151]]
- df = pd.DataFrame(data, columns=["n_samples", "MAE [%]"])
-
- fig = go.Figure()
- fig.add_trace(go.Scatter(x=df["n_samples"], y=df["MAE [%]"], mode="lines+markers"))
- fig.update_layout(title="MAE % for N samples", xaxis_title="N Samples", yaxis_title="MAE %")
- # use a log scale on the x axis
- fig.update_xaxes(type="log")
- fig.show(renderer="browser")
experiments/india/004_n_training_samples/mae_samples.png DELETED
Binary file (77 kB)
 
experiments/india/004_n_training_samples/mae_step.png DELETED

Git LFS Details

  • SHA256: 3a3180a382e4b2c1534524f92a633d488912475a1e8a4effb0b28caf44368834
  • Pointer size: 131 Bytes
  • Size of remote file: 325 kB
experiments/india/004_n_training_samples/readme.md DELETED
@@ -1,48 +0,0 @@
- # N samples experiments
-
- Kicked off an experiment that limits training to N samples.
- This is done by adding `limit_train_batches` to the `trainer/default.yaml`.
-
- I checked that when limiting the batches, the same batches are shown to the model each epoch.
-
- ## Experiments
-
- The original run uses 6711 batches.
-
- - 100: 3p6scx2r
- - 300: am46tno1
- - 1000: u04xlb6p
- - 3000: p11lhreo
-
- ## Results
-
- Overall
-
- | Experiment | MAE % |
- |------------|-------|
- | 100 | 7.779 |
- | 300 | 7.441 |
- | 1000 | 7.181 |
- | 3000 | 7.180 |
- | 6711 | 7.151 |
-
- Results by timestep
-
- | Timestep | 100 MAE % | 300 MAE % | 1000 MAE % | 3000 MAE % | 6711 MAE % |
- | --- | --- | --- | --- | --- | --- |
- | 0-0 minutes | 7.985 | 7.453 | 7.155 | 5.553 | 5.920 |
- | 15-15 minutes | 7.953 | 7.055 | 6.923 | 5.453 | 5.809 |
- | 30-45 minutes | 8.043 | 7.172 | 6.907 | 5.764 | 5.742 |
- | 45-60 minutes | 7.850 | 7.070 | 6.790 | 5.815 | 5.698 |
- | 60-120 minutes | 7.698 | 6.809 | 6.597 | 5.890 | 5.816 |
- | 120-240 minutes | 7.355 | 6.629 | 6.495 | 6.221 | 6.080 |
- | 240-360 minutes | 7.230 | 6.729 | 6.559 | 6.541 | 6.375 |
- | 360-480 minutes | 7.415 | 6.997 | 6.770 | 6.855 | 6.638 |
- | 480-720 minutes | 7.258 | 7.037 | 6.668 | 6.876 | 6.747 |
- | 720-1440 minutes | 7.659 | 7.362 | 7.038 | 7.142 | 7.207 |
- | 1440-2880 minutes | 8.027 | 7.745 | 7.518 | 7.535 | 7.507 |
-
- ![](mae_step.png "mae_steps")
-
- ![](mae_samples.png "mae_samples")
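The batch-limiting mechanism described above maps onto PyTorch Lightning's `limit_train_batches` trainer flag; a sketch of the config change (the exact keys and layout of the real `trainer/default.yaml` may differ):

```yaml
# trainer/default.yaml (sketch; exact layout of the real file may differ)
limit_train_batches: 1000  # cap batches per epoch; omit to use all 6711
```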
experiments/india/005_extra_nwp_variables/mae_steps.png DELETED

Git LFS Details

  • SHA256: 0ef7f7af4dafe38aac5a5df6cc74acc606cb4f0a1a9fc78972b09d68dd7574ad
  • Pointer size: 131 Bytes
  • Size of remote file: 215 kB
experiments/india/005_extra_nwp_variables/mae_steps_grouped.png DELETED

Git LFS Details

  • SHA256: 547d3aafbb1658602fe03ea1677589de4e208467756e9ce9cd1d8727f364dffa
  • Pointer size: 131 Bytes
  • Size of remote file: 133 kB
experiments/india/005_extra_nwp_variables/readmd.md DELETED
@@ -1,55 +0,0 @@
- # Adding extra NWP variables
-
- I wanted to run WindNet while testing some new NWP variables from ECMWF.
-
- General conclusion, although more experiments could be done:
- the current NWP variables are about right.
- Adding lots of variables makes it worse,
- and taking some away also makes it worse.
-
- ## Bugs
-
- Ran into a problem where some samples had
- `d.__getitem__('nwp-ecmwf__init_time_utc').values` of size 50, where it should be just one value. I removed these examples. This might
-
- ## Experiments
-
- The number of samples was 8000 when training.
-
- ### 15 variables
- Run WindNet with `'hcc', 'lcc', 'mcc', 'prate', 'sde', 'sr', 't2m', 'tcc', 'u10',
- 'v10', 'u100', 'v100', 'u200', 'v200', 'dlwrf', 'dswrf'`.
-
- The experiment on wandb is [here](https://wandb.ai/openclimatefix/india/runs/k91rdffo)
-
- ### 7 variables
- Run WindNet with the original 7 variables:
- `t2m, u10, u100, u200, v10, v100, v200`
-
- The experiment on wandb is [here](https://wandb.ai/openclimatefix/india/runs/miszfep5)
-
- ### 3 variables
- Run WindNet with only `t, u10, v100`
-
- The experiment on wandb is [here](https://wandb.ai/openclimatefix/india/runs/22v3a39g)
-
- ## Results
-
- | Timestep | 15 MAE % | 7 MAE % | 3 MAE % |
- | --- | --- | --- | --- |
- | 0-0 minutes | 7.450 | 6.623 | 7.529 |
- | 15-15 minutes | 7.348 | 6.441 | 7.408 |
- | 30-45 minutes | 7.242 | 6.544 | 7.294 |
- | 45-60 minutes | 7.134 | 6.567 | 7.185 |
- | 60-120 minutes | 7.058 | 6.295 | 7.009 |
- | 120-240 minutes | 6.965 | 6.290 | 6.800 |
- | 240-360 minutes | 6.807 | 6.374 | 6.580 |
- | 360-480 minutes | 6.749 | 6.482 | 6.548 |
- | 480-720 minutes | 6.892 | 6.686 | 6.685 |
- | 720-1440 minutes | 7.020 | 6.756 | 6.780 |
- | 1440-2880 minutes | 7.445 | 7.095 | 7.214 |
-
- ![](mae_steps_grouped.png "mae_steps")
-
- The raw data is here:
- ![](mae_steps.png "mae_steps")
experiments/india/006_da_only/bad.png DELETED

Git LFS Details

  • SHA256: 37cbbf51e7fa7dceb8b2074419267b4bde8186ddcd40b4a49c085735fdf72e43
  • Pointer size: 131 Bytes
  • Size of remote file: 358 kB
experiments/india/006_da_only/da_only.md DELETED
@@ -1,37 +0,0 @@
- ## DA forecasts only
-
- The idea was to create a day-ahead (DA) only forecast for WindNet.
- We hoped this would bring down the DA MAE values.
-
- We do this by not forecasting the first X hours.
-
- Unfortunately, it does not look like ignoring the first X hours makes the DA forecast better.
-
- ## Experiments
-
- 1. Baseline - [here](https://wandb.ai/openclimatefix/india/runs/miszfep5)
- 2. Ignore first 6 hours - [here](https://wandb.ai/openclimatefix/india/runs/uosk0qug)
- 3. Ignore first 12 hours - [here](https://wandb.ai/openclimatefix/india/runs/s9cnn4ei)
-
- ## Results
-
- | Timestep | all MAE % | 6 MAE % | 12 MAE % |
- | --- | --- | --- | --- |
- | 0-0 minutes | nan | nan | nan |
- | 15-15 minutes | nan | nan | nan |
- | 30-45 minutes | 0.065 | nan | nan |
- | 45-60 minutes | 0.066 | nan | nan |
- | 60-120 minutes | 0.063 | nan | nan |
- | 120-240 minutes | 0.063 | nan | nan |
- | 240-360 minutes | 0.064 | nan | nan |
- | 360-480 minutes | 0.065 | 0.068 | nan |
- | 480-720 minutes | 0.067 | 0.065 | nan |
- | 720-1440 minutes | 0.068 | 0.065 | 0.065 |
- | 1440-2880 minutes | 0.071 | 0.071 | 0.071 |
-
- ![](mae_steps.png "mae_steps")
-
- Here are two examples from the 6 hour ignore model, one that it forecast well and one that it didn't:
-
- ![](bad.png "bad")
- ![](good.png "good")
experiments/india/006_da_only/good.png DELETED

Git LFS Details

  • SHA256: 5f4b6a11ac1560dbea1214ce381602b9eab7334a74110052dda072f0f53c3de8
  • Pointer size: 131 Bytes
  • Size of remote file: 424 kB
experiments/india/006_da_only/mae_steps.png DELETED

Git LFS Details

  • SHA256: 5ca49fbc24530c3d75d0ec5cd2ba6345082c1747a600143afc40faf7bade0cd6
  • Pointer size: 131 Bytes
  • Size of remote file: 122 kB
experiments/india/007_different_seeds/mae_all_steps.png DELETED

Git LFS Details

  • SHA256: b06eaa2f75d645185bea5b874d6020bae3bccd7de25ec519cf348cde511f27c6
  • Pointer size: 131 Bytes
  • Size of remote file: 203 kB
experiments/india/007_different_seeds/mae_steps.png DELETED

Git LFS Details

  • SHA256: 3adfaa5394e9f45c684812e47e385c25d1796a6c772d04f4e7a3cbcbeffafda3
  • Pointer size: 131 Bytes
  • Size of remote file: 130 kB
experiments/india/007_different_seeds/readme.md DELETED
@@ -1,33 +0,0 @@
- # Training models with different seeds
-
- We want to see the effect of training a model with different seeds.
-
- We can see that the results for different seeds can vary by 0.5%,
- with some models being better at different time horizons than others.
-
- ## Experiments
- - seed 1 - [miszfep5](https://wandb.ai/openclimatefix/india/runs/miszfep5)
- - seed 2 - [cxshv2q4](https://wandb.ai/openclimatefix/india/runs/cxshv2q4)
- - seed 3 - [m46wdrr7](https://wandb.ai/openclimatefix/india/runs/m46wdrr7)
-
- These were trained with 1000 batches, and 300 batches for validation.
-
- ## Results
-
- | Timestep | s1 MAE % | s2 MAE % | s3 MAE % |
- | --- | --- | --- | --- |
- | 0-0 minutes | 0.066 | 0.061 | 0.066 |
- | 15-15 minutes | 0.064 | 0.058 | 0.064 |
- | 30-45 minutes | 0.065 | 0.060 | 0.063 |
- | 45-60 minutes | 0.066 | 0.060 | 0.063 |
- | 60-120 minutes | 0.063 | 0.060 | 0.063 |
- | 120-240 minutes | 0.063 | 0.063 | 0.065 |
- | 240-360 minutes | 0.064 | 0.066 | 0.065 |
- | 360-480 minutes | 0.065 | 0.066 | 0.066 |
- | 480-720 minutes | 0.067 | 0.066 | 0.065 |
- | 720-1440 minutes | 0.068 | 0.068 | 0.066 |
- | 1440-2880 minutes | 0.071 | 0.072 | 0.071 |
-
- ![](mae_steps.png "mae_steps")
-
- ![](mae_all_steps.png "mae_steps")
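The roughly 0.5% seed-to-seed spread can be summarised per horizon as the range across seeds; a minimal sketch using the first two rows of the table above:

```python
import numpy as np

# MAE for seeds s1, s2, s3 at the first two timestep groups (from the table above)
seed_mae = np.array([
    [0.066, 0.061, 0.066],  # 0-0 minutes
    [0.064, 0.058, 0.064],  # 15-15 minutes
])
# Per-horizon spread between the best and worst seed
spread = seed_mae.max(axis=1) - seed_mae.min(axis=1)
```

Here the spread is 0.005-0.006 in fractional units, i.e. 0.5-0.6 percentage points, matching the variation noted above.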
experiments/india/008_coarse4/mae_step.png DELETED

Git LFS Details

  • SHA256: 52e85df6c2ed7865e0f6f412ae47e7e5f0a1b12550b72702ebe7e166dec53636
  • Pointer size: 131 Bytes
  • Size of remote file: 179 kB
experiments/india/008_coarse4/mae_step_smooth.png DELETED

Git LFS Details

  • SHA256: 38e2772ac0c28684a10f8fc98fc55afc0401a403d025ac6e3d97a9d328ab8624
  • Pointer size: 131 Bytes
  • Size of remote file: 140 kB
experiments/india/008_coarse4/readme.md DELETED
@@ -1,77 +0,0 @@
- # Coarser data and more examples
-
- We downsampled the ECMWF data from 0.05 degrees to 0.2 degrees.
- In previous experiments we used a 0.1 resolution, as this is the same as the live ECMWF data.
-
- By reducing the resolution we can increase the number of samples we have to train on.
- We used 41408 samples to train and 10352 samples to validate.
- This is approximately 5 times more samples than in the previous experiments.
-
- ## Experiments
-
- ### b8_s1
- Batch size 8, with 0.2 degree NWP data.
- https://wandb.ai/openclimatefix/india/runs/w85hftb6
-
- ### b8_s2
- Batch size 8, different seed, with 0.2 degree NWP data.
- https://wandb.ai/openclimatefix/india/runs/k4x1tunj
-
- ### b32_s3
- Batch size 32, with 0.2 degree NWP data. The learning rate was also kept a bit higher.
- https://wandb.ai/openclimatefix/india/runs/ktale7pa
-
- ### epochs
- We raised the early stopping epochs from 10 to 15. This should mean the model trains a bit more.
- https://wandb.ai/openclimatefix/india/runs/8hfc83uv
-
- ### small model
- We made the model about 50% of the size by reducing the channels in the NWP encoder from 256 to 64 and reducing the hidden features in the output network from 1024 to 256.
- https://wandb.ai/openclimatefix/india/runs/sk5ek3pk
-
- ### early stopping on MAE/val
- Changed from quantile_loss to MAE/val as the early stopping metric. This should mean the model does more training epochs, and stops on the metric we are actually interested in.
- https://wandb.ai/openclimatefix/india/runs/a5nkkzj6
-
- ### old
- Old experiment with 0.1 degree NWP data.
- https://wandb.ai/openclimatefix/india/runs/m46wdrr7.
- Note the validation batches are different from those in the experiments above.
-
- Interestingly, the GPU memory did not increase much between experiments 2 and 3.
- We need to check that batches of 32 were actually being passed through.
-
- ## Results
-
- Coarsening the data does seem to improve the results in the first 10 hours of the forecast.
- The DA forecast looks very similar. Note the 0 hour forecast has a large amount of variation.
-
- There are still spiky results in the individual runs.
-
- | Timestep | b8_s1 MAE % | b8_s2 MAE % | b32_s3 MAE % | epochs MAE % | small MAE % | mae/val MAE % | old MAE % |
- | --- | --- | --- | --- | --- | --- | --- | --- |
- | 0-0 minutes | 0.052 | 0.047 | 0.027 | 0.030 | 0.041 | 0.041 | 0.066 |
- | 15-15 minutes | 0.052 | 0.049 | 0.031 | 0.033 | 0.041 | 0.041 | 0.064 |
- | 30-45 minutes | 0.052 | 0.051 | 0.037 | 0.039 | 0.043 | 0.043 | 0.063 |
- | 45-60 minutes | 0.053 | 0.052 | 0.040 | 0.043 | 0.044 | 0.044 | 0.063 |
- | 60-120 minutes | 0.056 | 0.054 | 0.048 | 0.052 | 0.048 | 0.048 | 0.063 |
- | 120-240 minutes | 0.061 | 0.060 | 0.060 | 0.064 | 0.057 | 0.057 | 0.065 |
- | 240-360 minutes | 0.061 | 0.062 | 0.063 | 0.065 | 0.061 | 0.061 | 0.065 |
- | 360-480 minutes | 0.062 | 0.062 | 0.062 | 0.063 | 0.063 | 0.063 | 0.066 |
- | 480-720 minutes | 0.063 | 0.063 | 0.062 | 0.064 | 0.064 | 0.064 | 0.065 |
- | 720-1440 minutes | 0.065 | 0.066 | 0.065 | 0.067 | 0.066 | 0.066 | 0.066 |
- | 1440-2880 minutes | 0.069 | 0.070 | 0.071 | 0.071 | 0.071 | 0.071 | 0.071 |
-
- ![](mae_step.png "mae_steps")
-
- ![](mae_step_smooth.png "mae_steps")
-
- It's worth noting the model training MAE is around 3% and the validation MAE is about 7%, so there is good reason to believe that the model is overfit to the training set.
- It would be good to plot some of the training examples, to see if they are less spiky.
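The downsampling from 0.05 to 0.2 degrees is a factor of 4 in each dimension; a minimal block-mean sketch (the real pipeline may coarsen differently, e.g. with xarray's `coarsen`):

```python
import numpy as np


def coarsen(field: np.ndarray, factor: int = 4) -> np.ndarray:
    """Block-mean a 2D grid by `factor` in each dimension, e.g.
    0.05 degree -> 0.2 degree with factor=4. Assumes both grid
    dimensions are divisible by `factor`."""
    h, w = field.shape
    return field.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
```

Coarsening shrinks each sample by the square of the factor, which is what makes room for the roughly 5x larger training set described above.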
experiments/mae_analysis.py DELETED
@@ -1,152 +0,0 @@
- """
- Script to generate analysis of MAE values for multiple model forecasts.
-
- Does this for 48 hour horizon forecasts with 15 minute granularity.
- """
-
- import argparse
-
- import matplotlib
- import matplotlib.pyplot as plt
- import numpy as np
- import pandas as pd
- import wandb
-
- matplotlib.rcParams["axes.prop_cycle"] = matplotlib.cycler(
-     color=[
-         "#FFD053",  # yellow
-         "#7BCDF3",  # blue
-         "#63BCAF",  # teal
-         "#086788",  # dark blue
-         "#FF9736",  # dark orange
-         "#E4E4E4",  # grey
-         "#14120E",  # black
-         "#FFAC5F",  # orange
-         "#4C9A8E",  # dark teal
-     ]
- )
-
-
- def main(project: str, runs: list[str], run_names: list[str]) -> None:
-     """
-     Compare MAE values for multiple model forecasts for a 48 hour horizon with 15 minute granularity.
-
-     Args:
-         project: name of W&B project
-         runs: W&B ids of runs
-         run_names: user specified names for runs
-     """
-     api = wandb.Api()
-     dfs = []
-     epoch_num = []
-     for run in runs:
-         run = api.run(f"openclimatefix/{project}/{run}")
-
-         df = run.history(samples=run.lastHistoryStep + 1)
-         # Get the columns that are in the format 'MAE_horizon/step_<number>/val'
-         mae_cols = [col for col in df.columns if "MAE_horizon/step_" in col and "val" in col]
-         mae_cols.sort()
-         df = df[mae_cols]
-         # Drop all rows that are entirely NaN
-         df = df.dropna(how="all")
-         # Find the row (epoch) with the smallest mean MAE across all steps
-         min_row_mean = np.inf
-         min_row_idx = 0
-         for idx, (_, row) in enumerate(df.iterrows()):
-             if row.mean() < min_row_mean:
-                 min_row_mean = row.mean()
-                 min_row_idx = idx
-         df = df.iloc[min_row_idx]
-         # Get the timedelta (in minutes) for each step from the column name
-         column_timesteps = [int(col.split("_")[-1].split("/")[0]) * 15 for col in mae_cols]
-         dfs.append(df)
-         epoch_num.append(min_row_idx)
-     # Timestep groupings in minutes
-     groupings = [
-         [0, 0],
-         [15, 15],
-         [30, 45],
-         [45, 60],
-         [60, 120],
-         [120, 240],
-         [240, 360],
-         [360, 480],
-         [480, 720],
-         [720, 1440],
-         [1440, 2880],
-     ]
-
-     groups_df = []
-     grouping_starts = [grouping[0] for grouping in groupings]
-     header = "| Timestep |"
-     separator = "| --- |"
-     for run_name in run_names:
-         header += f" {run_name} MAE % |"
-         separator += " --- |"
-     print(header)
-     print(separator)
-     for grouping in groupings:
-         group_string = f"| {grouping[0]}-{grouping[1]} minutes |"
-         # Select indices from column_timesteps that are within the grouping, inclusive
-         group_idx = [
-             idx
-             for idx, timestep in enumerate(column_timesteps)
-             if grouping[0] <= timestep <= grouping[1]
-         ]
-         data_one_group = []
-         for df in dfs:
-             mean_row = df.iloc[group_idx].mean()
-             group_string += f" {mean_row:0.3f} |"
-             data_one_group.append(mean_row)
-         print(group_string)
-
-         groups_df.append(data_one_group)
-
-     groups_df = pd.DataFrame(groups_df, columns=run_names, index=grouping_starts)
-
-     for idx, df in enumerate(dfs):
-         print(f"{run_names[idx]}: {df.mean() * 100:0.3f}")
-
-     # Plot the error per timestep
-     plt.figure()
-     for idx, df in enumerate(dfs):
-         plt.plot(
-             column_timesteps, df, label=f"{run_names[idx]}, epoch: {epoch_num[idx]}", linestyle="-"
-         )
-     plt.legend()
-     plt.xlabel("Timestep (minutes)")
-     plt.ylabel("MAE %")
-     plt.title("MAE % for each timestep")
-     plt.savefig("mae_per_timestep.png")
-     plt.show()
-
-     # Plot the error per grouped timestep
-     plt.figure()
-     for idx, run_name in enumerate(run_names):
-         plt.plot(
-             groups_df[run_name],
-             label=f"{run_name}, epoch: {epoch_num[idx]}",
-             marker="o",
-             linestyle="-",
-         )
-     plt.legend()
-     plt.xlabel("Timestep (minutes)")
-     plt.ylabel("MAE %")
-     plt.title("MAE % for each grouped timestep")
-     plt.savefig("mae_per_grouped_timestep.png")
-     plt.show()
-
-
- if __name__ == "__main__":
-     parser = argparse.ArgumentParser()
-     parser.add_argument("--project", type=str, default="")
-     # Arguments that take a list of strings
-     parser.add_argument("--list_of_runs", nargs="+")
-     parser.add_argument("--run_names", nargs="+")
-     args = parser.parse_args()
-     main(args.project, args.list_of_runs, args.run_names)
experiments/uk/011 - Extending forecast to 36 hours (updated ECMWF data)/PVNEt_national_XG_comparison.png DELETED

Git LFS Details

  • SHA256: eab8cf00defbfb39a9d5b9cea319f1b78db8b05e2baec7ef80351dc37eb041c4
  • Pointer size: 131 Bytes
  • Size of remote file: 169 kB
experiments/uk/011 - Extending forecast to 36 hours (updated ECMWF data)/PVNet_day_ahead.md DELETED
@@ -1,22 +0,0 @@
- PVNet day ahead was retrained to produce a 36 hour forecast. It was given its [previous configuration](https://huggingface.co/openclimatefix/pvnet_uk_region/tree/main) and data, except for being given ECMWF NWP data with a longer forecast horizon (max 85 hours, with 37 hours given to the model). Longer-horizon UKV NWP data was not available at the time of training and will be a further addition in the future.
-
- **Results** \
- [The training run](https://wandb.ai/openclimatefix/pvnet_day_ahead_36_hours/runs/m4d3wlft/overview) had 3.15% normalised mean absolute error (NMAE) on validation data (100,000 samples from May 2022 to May 2023); [the previous training of PVNet day ahead](https://wandb.ai/openclimatefix/pvnet2.1/runs/2ghzwbxg/overview?) had similar results of 3.19% NMAE.
-
- ![](PVNets_comparison.png "PVNets comparison")
-
- When comparing the two versions of PVNet day ahead (the new version in green) by forecast accuracy at each step on the validation dataset samples, we see some small differences up to 33 hours, such as in the first few steps and between steps 5 and 10, which could be explained by differences in the samples seen and evaluated on between the two versions.
-
- However, the larger difference is an improvement toward the end of the forecast horizon, from 33 hours onwards. This is likely due to ECMWF data now being available for this period, where previously no NWP data was given past 33 hours, due to the forecast horizon of the previous NWP data and factoring in NWP initialization times and production delays.
-
- The UKV NWP data used in the model currently extends to 30 hours; we would expect a further reduction in error from 30+ hours when training with longer-horizon UKV data covering up to 36 hours.
-
- A very rough comparison is also plotted between these two PVNet model versions and the National XG model, which is currently used for day-ahead predictions in production.
-
- ![](PVNEt_national_XG_comparison.png "PVNets national XG comparison")
-
- This comparison is rough and should not be seen as fair, as the National XG numbers are just an estimate derived from backtest data on different time periods. However, it can show roughly what relative improvement could be achieved from replacing the National XG day ahead model with a PVNet day ahead model.
experiments/uk/011 - Extending forecast to 36 hours (updated ECMWF data)/PVNets_comparison.png DELETED

Git LFS Details

  • SHA256: e604d9b403293bbac688dc9c786cb4f0c70e1a9c6b78188a1e4f228ad0ae4b1b
  • Pointer size: 131 Bytes
  • Size of remote file: 160 kB