Machine Learning for Renewable Energy Production

Abstract

Climate change is increasing the frequency, severity, and duration of wildfires in California. Smoke and ash from wildfires can decrease the power produced by photovoltaic (PV) solar systems. Improving predictions of the energy produced by PV systems during wildfires is therefore valuable for grid operators and system owners alike. This project trains and evaluates models that predict solar production during wildfires. We focus on the first 30 days of three California wildfires: the Soberanes Fire in 2016, the Valley Fire in 2015, and the Rough Fire in 2015. We use solar production measurements from PV systems in nearby counties as our target variable. The features include PV system characteristics and meteorological data as well as wildfire-related features such as PM2.5 concentration and distance from the point of ignition. Random forest was selected as the modeling approach for this analysis because it had a lower mean squared error (MSE) than ordinary least squares or ridge regression. We found that for each fire, the model that included wildfire-related features had a lower test MSE than the model that did not include those features. Our results suggest that including wildfire-related features improves predictions of solar energy production during the first 30 days of a wildfire and could help grid operators and system owners make more informed operations and investment decisions, such as whether to increase standby power generation capacity or invest in home battery backup power systems.

Prediction Modeling Results

Our results suggest that including wildfire-related features such as distance from the ignition point and PM2.5 concentration may increase the accuracy of predictions of solar energy generation in the early stages of a wildfire. The mean squared error of predictions from random forest models that included wildfire-related features was lower than that of predictions from random forest models that did not include those features, as shown in Table 2.
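For reference, the comparison metric can be written out as follows. The symbols are assumptions introduced here for illustration: y_i is the measured solar production for test observation i, ŷ_i is the corresponding model prediction, and n is the number of test observations. The finding reported above corresponds to the inequality on the right holding for each of the three fires.

```latex
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2,
\qquad
\mathrm{MSE}_{\text{with wildfire features}} < \mathrm{MSE}_{\text{without wildfire features}}
```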

The results of our model can help electricity grid operators and system owners make more informed operations and investment decisions. During wildfire events, grid operators may choose to offset the decreased generation from solar with an increase in natural gas or other standby power generation sources. Over the long term, this may lead utilities to purchase additional standby power generation capacity, especially if their service territory includes areas historically impacted by wildfires. In addition, our model can provide guidance to system owners who rely on electricity produced by behind-the-meter solar generation to meet their energy needs. Updating solar forecasting models to include some number of wildfire days each year may increase the accuracy of predictions of long-term electricity savings, which will likely be lower than predictions made with models that don’t incorporate wildfire effects. This could help potential solar customers make more informed investment decisions, such as whether home battery storage is a reasonable purchase for back-up generation needs during wildfire season.

Left: Sample time series of satellite-measured ERA5 data, shown for visualization purposes.

Applications of Data Science in Hydrology

A ridge regression model was developed to predict evapotranspiration (ET), the combined mass flux of water to the atmosphere via evaporation and transpiration, in California. ET is a key term in the hydrologic budget of the water cycle: it directly accounts for the return of nearly two-thirds of annual precipitation to the atmosphere in the US, and up to nearly 100 percent of precipitation in arid areas of the southwestern US. Accurate estimates of ET rates in time and space are therefore an important metric for hydrologists, biologists, agricultural economists, and others.
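To illustrate what the ridge penalty does, here is a minimal single-feature sketch in plain Python. It is not the project's model, which used multiple ERA5-Land features; the function name `fit_ridge_1d`, the `alpha` parameter, and the choice to leave the intercept unpenalized are assumptions made for this example.

```python
from statistics import fmean


def fit_ridge_1d(x, y, alpha=1.0):
    """Fit y ~ intercept + slope * x, penalizing slope**2 by alpha.

    Minimizes sum((y_i - intercept - slope * x_i)**2) + alpha * slope**2.
    The intercept is not penalized (an assumption of this sketch).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    if alpha < 0:
        raise ValueError("alpha must be non-negative")

    x_mean, y_mean = fmean(x), fmean(y)
    # Centered sums of products and squares.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)

    denominator = sxx + alpha
    if denominator == 0:
        raise ValueError("x is constant and alpha is 0; slope is undefined")

    # Closed-form ridge solution: alpha shrinks the slope toward zero.
    slope = sxy / denominator
    intercept = y_mean - slope * x_mean
    return slope, intercept
```

With `alpha=0` this reduces to ordinary least squares; larger values of `alpha` shrink the slope toward zero, which is the trade-off ridge regression makes to reduce variance when features are noisy or correlated.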

The model was developed using 145 of the California Department of Water Resources CIMIS stations over an investigation period of calendar year 2021. Each station’s hourly ET was converted to a daily average, yielding 365 daily-averaged ET values per station. Feature datasets were imported from the Copernicus Climate Data Store. Once the test MSE was calculated, ET predictions were computed for all grid cells. The animation below shows predicted ET for January 1-10, 2021.
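The hourly-to-daily conversion described above could be sketched as follows. This is an illustration only: the function name `daily_average_et`, the `(timestamp, value)` record layout, and the handling of missing readings as `None` are assumptions, not the CIMIS export format or the project's actual code.

```python
from collections import defaultdict
from datetime import date, datetime
from statistics import fmean


def daily_average_et(hourly_records):
    """Average one station's hourly ET readings by calendar day.

    hourly_records: iterable of (datetime, et_value) pairs (assumed layout).
    Returns a dict mapping each date to the mean of that day's readings.
    """
    readings_by_day = defaultdict(list)
    for timestamp, et_value in hourly_records:
        if et_value is None:
            # Skip missing readings rather than treating them as zero.
            continue
        readings_by_day[timestamp.date()].append(et_value)

    # Sort by date so the output is in chronological order.
    return {day: fmean(values) for day, values in sorted(readings_by_day.items())}
```

Applied to a full year of hourly readings from one station, this yields one value per day, matching the 365 values per station described above, provided every day has at least one valid reading.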

Test MSE: 1.112E-03 (squared inches)

Test R^2: 0.8613
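For readers who want to reproduce these two metrics on their own held-out data, a minimal plain-Python sketch follows. The function names `mse` and `r_squared` are assumptions for this example; libraries such as scikit-learn provide equivalent metrics.

```python
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = fmean(y_true)
    ss_residual = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_total = sum((t - mean_true) ** 2 for t in y_true)
    if ss_total == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_residual / ss_total
```

Note that MSE carries the squared units of the target variable, while R^2 is unitless, so the two numbers answer different questions: how large the typical error is, and how much of the variance in observed ET the model explains.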

Data Sources:

Muñoz Sabater, J. (2019): ERA5-Land hourly data from 1981 to present. Copernicus Climate Change Service (C3S) Climate Data Store (CDS). (Accessed on 01-12-2022). doi:10.24381/cds.e2161bac

https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land?tab=overview

California Department of Water Resources, CIMIS Unit; CIMIS data website, 2022. https://cimis.water.ca.gov/Default.aspx

Left: Plot showing accuracy of ridge model predictions.

Right: Network of CIMIS ET monitoring locations throughout California.