Use Machine Learning and Open Source Tools to Automate Load Prediction
- Dec 2, 2019 10:45 pm GMT
Several of our utility customers were interested in a solution that would add load estimation capabilities to their existing architectures. Accurate load prediction helps generators plan for future demand and participate in the energy markets more cost-effectively. When meters or the telemetry to them are down, a load estimator can also reconcile the missing load. We knew machine learning would be a good tool for sifting through the wide range of possible variables to determine the predicted load, and that open source tools would allow us to quickly develop a highly accurate load estimator at a cost-effective price. We call the tool AMBLE (Automated Machine Learning Based Load Estimator). This article discusses how we developed the system and the lessons we learned.
The project began during the summer of 2018 with 3 years of a customer’s load data. Our data scientist explored several machine learning algorithms and ultimately got the best results with random forest regression. The model was built with open source tools, including the Python packages numpy, pandas, seaborn, datetime, matplotlib, requests, xlrd, sklearn, and bs4, along with InfluxDB.
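As a rough illustration of the approach described above, the following sketch fits a random forest regressor with sklearn. It is not the AMBLE model: the data is synthetic, and the two input columns, the load formula, and the hyperparameters are all assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for historical data (assumption: not real customer data).
rng = np.random.default_rng(0)
n = 2000
temperature = rng.uniform(-5, 35, n)   # illustrative temperature in degrees C
hour = rng.integers(0, 24, n)          # hour of day, 0-23

# Illustrative load: grows as temperature moves away from 18 C,
# plus a daily cycle and some random noise.
load = (
    100
    + 3.0 * np.abs(temperature - 18)
    + 10 * np.sin(hour / 24 * 2 * np.pi)
    + rng.normal(0, 2, n)
)

X = np.column_stack([temperature, hour])
X_train, X_test, y_train, y_test = train_test_split(
    X, load, test_size=0.25, random_state=0
)

# Fit the regressor on the training split and score it on held-out data.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

r2 = model.score(X_test, y_test)
predictions = model.predict(X_test)
print(f"R^2 on held-out data: {r2:.3f}")
```

Holding out part of the history, as done here with train_test_split, is one common way to check that the model generalizes before comparing its predictions with actual load.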
With the scripting and workflow in place, we continued to refine the model through many simulations. We looked carefully at the results and at the input variables that most influenced the model. Through a series of experiments and a great collaborative exchange of ideas, we continued to improve how well the model's predictions correlated with the actual load, and we better understood which parameters had the greatest influence. The initial work showed promise: 95% of the data was within ±10% error and 99.6% was within ±20% error.
Figure 2 95% of data is within ±10% error
Figure 3 Original model error plot
95% within ±10% error
99.6% within ±20% error
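The error-band figures above can be computed along the following lines. This is a minimal sketch that assumes percent error is measured relative to the actual load; the article does not state the exact definition used, and the function name and sample values are hypothetical.

```python
def fraction_within(actual, predicted, tolerance_pct):
    """Return the fraction of points whose percent error is within +/- tolerance_pct.

    Assumption: percent error = (predicted - actual) / actual * 100.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    hits = 0
    counted = 0
    for a, p in zip(actual, predicted):
        if a == 0:
            continue  # percent error is undefined for zero load, so skip it
        counted += 1
        pct_error = (p - a) / a * 100.0
        if abs(pct_error) <= tolerance_pct:
            hits += 1
    return hits / counted if counted else 0.0


# Made-up values for illustration only.
actual = [100.0, 200.0, 150.0, 120.0]
predicted = [104.0, 230.0, 149.0, 100.0]

share_10 = fraction_within(actual, predicted, 10)
share_20 = fraction_within(actual, predicted, 20)
print(f"{share_10:.0%} within ±10% error, {share_20:.0%} within ±20% error")
```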
Moving into 2019, we tweaked the variables used to include date, hour, and day, as well as environmental variables: temperature, humidity, wind speed, wind direction, gust speed, precipitation, dew point, pressure, light conditions, and wind chill. The continued development effort led to faster performance and more accurate results, as indicated in the figures below.
Figure 5 Improved model error plot
99.36% within ±5% error
99.96% within ±10% error
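The calendar variables mentioned above (date, hour, and day) can be derived from each reading's timestamp and combined with the weather readings into one input row. The sketch below shows one possible encoding; the field names, the weekend flag, and the sample values are assumptions, not the encoding AMBLE uses.

```python
from datetime import datetime


def calendar_features(timestamp):
    """Derive simple calendar inputs from a timestamp (illustrative encoding)."""
    return {
        "month": timestamp.month,
        "day_of_month": timestamp.day,
        "hour": timestamp.hour,
        "day_of_week": timestamp.weekday(),      # Monday = 0 ... Sunday = 6
        "is_weekend": timestamp.weekday() >= 5,  # Saturday or Sunday
    }


def build_row(timestamp, weather):
    """Merge calendar features with a dict of weather readings into one row."""
    row = calendar_features(timestamp)
    row.update(weather)
    return row


# Hypothetical reading for a Monday afternoon.
row = build_row(
    datetime(2019, 3, 4, 14, 0),
    {"temperature": 12.5, "humidity": 60.0, "wind_speed": 4.2},
)
print(row)
```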
Temperature carries the most weight in determining load
Reviewing the impact of each variable helped us build confidence in the model. Standard industry practice holds that temperature normally accounts for about 70% of the influence on a load model. This is consistent with the data we observed, as shown below.
Figure 6 Temperature carries the most importance in load prediction.
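One way to inspect how much each variable contributes is the impurity-based feature_importances_ attribute that sklearn exposes on a fitted random forest. The article does not say how the importance in Figure 6 was calculated, so treat this as an assumed method; the data below is synthetic and deliberately built so that temperature dominates.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic inputs (assumption: illustrative ranges, not real weather data).
rng = np.random.default_rng(1)
n = 1500
features = {
    "temperature": rng.uniform(-5, 35, n),
    "humidity": rng.uniform(20, 100, n),
    "wind_speed": rng.uniform(0, 15, n),
}

# Synthetic load driven mostly by temperature, slightly by humidity.
load = (
    100
    + 4.0 * np.abs(features["temperature"] - 18)
    + 0.1 * features["humidity"]
    + rng.normal(0, 1, n)
)

names = list(features)
X = np.column_stack([features[name] for name in names])
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, load)

# Rank variables from most to least important; the weights sum to 1.
ranked = sorted(
    zip(names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, weight in ranked:
    print(f"{name:12s} {weight:.2%}")
```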
However, we also learned that not all cities and models are influenced by temperature in the same way. We explore some of the abnormal model data below. The examples provide strong evidence that black box machine learning models are a great fit for this class of problems because of the non-intuitive variance in the load over time, and even across the days of the week.
Machine Learning is Adaptable
We tested the adaptability of the model by comparing data from two time periods for the same city. As shown below, the 2019 load profile changed significantly from the original 2016 training data. The peak load doubled. In addition, the weekend to weekday load ratio changed dramatically. In 2016, the load went up and down on a regular daily cycle, but in the same period in 2019, Monday – Friday had almost 2X the load of Saturdays and Sundays. Our hypothesis is that a large industrial load, operating only Monday – Friday, entered the grid between 2016 and 2019. Despite these changes, AMBLE adapted and yielded greater accuracy than in our earlier simulations.
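A shift like the one described above can be spotted by comparing average weekday load with average weekend load for each period. The following sketch uses made-up readings and a hypothetical helper name; it is one simple way to compute the ratio, not the analysis behind the figures.

```python
from datetime import datetime, timedelta


def weekday_weekend_ratio(readings):
    """Return mean weekday load divided by mean weekend load.

    readings: iterable of (datetime, load) pairs.
    """
    weekday, weekend = [], []
    for ts, load in readings:
        # weekday() returns 5 for Saturday and 6 for Sunday.
        (weekend if ts.weekday() >= 5 else weekday).append(load)
    if not weekday or not weekend:
        raise ValueError("need at least one weekday and one weekend reading")
    return (sum(weekday) / len(weekday)) / (sum(weekend) / len(weekend))


# Two weeks of made-up daily readings starting on a Monday:
# 200 on weekdays, 100 on weekends.
start = datetime(2019, 3, 4)
readings = []
for day in range(14):
    ts = start + timedelta(days=day)
    load = 100.0 if ts.weekday() >= 5 else 200.0
    readings.append((ts, load))

ratio = weekday_weekend_ratio(readings)
print(f"Weekday load is {ratio:.1f}x weekend load")
```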
Open Source Tools Enable Collaboration
Open source tools can be easily integrated with closed source solutions. We generated the graphs in this article with an open source analytics tool. Once integrated, these tools enable universal data access and analysis. It is important to note that open source does not necessarily mean “free.” If you choose to integrate open source tools into your machine learning application, they will require either internal or third-party support. The good news is that these solutions are scalable and reliable. We have implemented open source time series solutions in commercial settings with proven results.
Figure 8 Open source tools can easily integrate with closed source solutions. They are scalable, reliable, and can be implemented to enable universal data access and analysis.
Automated Load Prediction
In environments where communications between the customer communities and the generation plant are not 100% reliable, load prediction serves as a backup (perhaps even a primary source) when actual load data is not received. Automating the process reduces manual effort and the probability of errors. Universal data access across the enterprise enables generation and load to be treated as a single entity.
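The backup role described above can be sketched as a simple reconciliation step: use the measured load when it arrives and fall back to the prediction when it does not. The function name, the use of None for a missing reading, and the sample values are assumptions for illustration.

```python
def reconcile_load(measured, predicted):
    """Return (value, source) per interval.

    Uses the measured load when present, otherwise the predicted load.
    Assumption: a missing measurement is represented as None.
    """
    if len(measured) != len(predicted):
        raise ValueError("measured and predicted must have the same length")
    result = []
    for m, p in zip(measured, predicted):
        if m is None:
            result.append((p, "predicted"))
        else:
            result.append((m, "measured"))
    return result


# Made-up intervals: telemetry was lost for the second and third readings.
measured = [410.0, None, None, 432.5]
predicted = [405.2, 415.8, 421.1, 430.0]

reconciled = reconcile_load(measured, predicted)
for value, source in reconciled:
    print(f"{value:7.1f}  ({source})")
```

Recording the source next to each value keeps it clear downstream which intervals were estimated rather than metered.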
Our example is one in which machine learning works with increasingly large data sets to predict load for a power plant. We made the application easy to integrate so that multiple stakeholders can access and analyze the data. As the number of devices and applications connecting to the grid continues to grow exponentially, so will the amounts and types of data available to utilities. Machine learning will gain popularity among utilities that seek to unlock the value of the ever-growing and increasingly complex data they have and continue to gather.