Asset Integrity Management: Case Studies in Data Science for Power Generators

Data science is a multidisciplinary field that combines mathematics, statistics, computer science, machine learning and domain expertise. Without domain expertise, it is difficult to extract meaningful insights from the data. Data science involves:

  • Data collection and preparation
  • Visualization
  • Management
  • Analysis

Within the energy field there are four aspects where data science can be applied:

  • Energy generation
  • Transmission
  • Distribution
  • Consumption

Plants that have applied data science and machine learning have cut operations and repair costs by refining their calculations by a few percent, which can lead to higher profit margins. As computing power has increased, data science has shifted from a descriptive tool to a predictive one: machine learning is used to predict failures and minimize outages. Today the objective is to be prescriptive, that is, to recommend actions that prevent incidents or optimize outcomes based on plant operating data trends and industry operating experience.
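The descriptive, predictive and prescriptive stages can be illustrated with a toy sketch. It assumes hypothetical hourly temperature readings for a single component. A least-squares straight-line trend stands in for a machine learning model, and the decision rule in `prescribe` is made up. None of this reflects how Intertek or any specific plant does it.

```python
from statistics import mean

def describe(readings):
    """Descriptive: summarize what has already happened."""
    return {"mean": mean(readings), "max": max(readings)}

def predict_hours_to_limit(readings, limit):
    """Predictive: fit a least-squares line through hourly readings (needs at least
    two) and estimate how many hours after the last reading the line crosses `limit`.
    Returns None if the trend is flat or falling."""
    n = len(readings)
    x_bar = (n - 1) / 2
    y_bar = mean(readings)
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(readings))
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = y_bar - slope * x_bar
    return (limit - intercept) / slope - (n - 1)

def prescribe(hours_to_limit, hours_to_next_outage):
    """Prescriptive: turn the prediction into a recommended action (hypothetical rule)."""
    if hours_to_limit is None:
        return "continue normal operation"
    if hours_to_limit < hours_to_next_outage:
        return "schedule an inspection before the next planned outage"
    return "inspect during the next planned outage"

readings = [100, 102, 104, 106, 108]  # hypothetical hourly temperatures, deg C
print(describe(readings))
eta = predict_hours_to_limit(readings, limit=120)
print(eta, prescribe(eta, hours_to_next_outage=48))  # 6.0 schedule an inspection ...
```

Each stage consumes the previous one's output. That is the sense in which prescriptive analytics builds on predictive analytics.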

State-of-the-art analytics should include real-time prediction and decision making, but this is still difficult to implement and not yet affordable for many companies.

Evaluation Process

A data scientist has to:

  • Identify the problem
  • Gather and prepare the data
  • Visualize it
  • Perform “feature engineering” by selecting the particular features for the model
  • Choose the appropriate model and build it
  • Implement the solution and test the hypothesis

At this point it is essential to interpret the results correctly, which is why domain expertise is a fundamental requirement. If the results are unexpected or unfavorable, the data scientist must iterate, refining the model or its inputs.
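The evaluation steps can be sketched end to end in plain Python. Everything here is hypothetical:

- synthetic records stand in for plant data;
- "starts per 1,000 operating hours" is an illustrative engineered feature;
- a one-feature threshold rule (a decision stump) stands in for whatever model a real study would choose.

The visualization step is left out to keep the example short.

```python
import random

# Step 1 (identify the problem), hypothetical: does start frequency separate
# units whose component failed from units whose component did not?

def make_records(n, seed=0):
    """Step 2 (gather and prepare data): synthetic records stand in for plant data."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        starts = rng.randint(0, 300)          # starts per year
        hours = rng.randint(2000, 8760)       # operating hours per year
        failed = 1000 * starts / hours > 40   # synthetic ground truth...
        if rng.random() < 0.1:                # ...with 10% label noise
            failed = not failed
        records.append({"starts": starts, "hours": hours, "failed": failed})
    return records

# Step 3 (visualize) is skipped here; in practice, plot the feature against the outcome.

def engineer_features(records):
    """Step 4 (feature engineering): starts per 1,000 operating hours."""
    return [(1000 * r["starts"] / r["hours"], r["failed"]) for r in records]

def fit_stump(data):
    """Step 5 (choose and build a model): a one-feature threshold rule that keeps
    the cutoff with the best training accuracy."""
    best_cut, best_acc = None, -1.0
    for cut, _ in data:
        acc = sum((x > cut) == y for x, y in data) / len(data)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut

def accuracy(cut, data):
    """Step 6 (test the hypothesis): accuracy on records the model never saw."""
    return sum((x > cut) == y for x, y in data) / len(data)

data = engineer_features(make_records(400))
train_set, test_set = data[:300], data[300:]
cut = fit_stump(train_set)
print(f"cutoff = {cut:.1f} starts per 1,000 h, test accuracy = {accuracy(cut, test_set):.2f}")
```

Holding out a test set is what makes step 6 a test rather than a restatement of the training fit. If the held-out accuracy is poor, you go back to step 4 or 5 and iterate.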

Plant Operational and Reliability Database (INGRID)

Using data science, Intertek created a database of hourly generation and emissions for all the plants in the US that report to the EPA. The database currently holds almost a billion records and can be used to plot:

  • Aggregated data, to see state-wide trends
  • Individual point plots, to see clusters and patterns
  • Plant outcome histograms, to see changes in operating regimes
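Two of these views can be sketched on hourly records:

- a state-wide aggregate, which sums over plants and then averages by hour of day;
- an output histogram, which bins hours by load as a percentage of capacity.

The record layout `(state, plant_id, date, hour_of_day, gross_load_mw)` and the 10% bin width are assumptions made for illustration. This is not INGRID's actual schema.

```python
from collections import Counter, defaultdict

def statewide_hourly_profile(records, state):
    """Aggregate view: sum output over all plants in `state` for each (date, hour),
    then average across dates to get a typical daily profile in MW."""
    totals = defaultdict(float)                    # (date, hour) -> MW
    for st, _plant, date, hour, mw in records:
        if st == state:
            totals[(date, hour)] += mw
    by_hour = defaultdict(list)
    for (_date, hour), mw in totals.items():
        by_hour[hour].append(mw)
    return {hour: sum(v) / len(v) for hour, v in sorted(by_hour.items())}

def output_histogram(outputs_mw, capacity_mw, bin_pct=10):
    """Operating-regime view: count hours in each load bin (percent of capacity).
    Bin 100 holds hours at 100-110% of capacity, i.e. above-capacity operation."""
    counts = Counter()
    for mw in outputs_mw:
        pct = 100 * mw / capacity_mw
        counts[int(pct // bin_pct) * bin_pct] += 1
    return dict(sorted(counts.items()))

# Hypothetical rows: (state, plant_id, date, hour_of_day, gross_load_mw)
records = [
    ("CA", "A", "2015-06-01", 17, 300.0),
    ("CA", "B", "2015-06-01", 17, 200.0),
    ("CA", "A", "2015-06-02", 17, 400.0),
    ("TX", "C", "2015-06-01", 17, 999.0),
]
print(statewide_hourly_profile(records, "CA"))                # {17: 450.0}
print(output_histogram([50, 95, 100, 105], capacity_mw=100))  # {50: 1, 90: 1, 100: 2}
```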

Case Studies Using Plant Database

Chart 1 - Statewide Generation

Using the database, Intertek created Chart 1 (below), which shows the average generation of all fossil plants, solar farms and wind farms in California over a single day, for several years. This plot is called the duck curve because of its shape. It shows how the generation mix keeps evolving, and it quantifies the cycling the fossil fleet will need to do if the solar contribution keeps growing at the observed rate. The chart shows that in 2015 the entire fossil fleet in California had to ramp up 1 TWh in only 3 hours (3 pm - 6 pm), even though solar represented only 5% of the total energy generated in the state.
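The ramp behind a duck curve can be quantified in two steps. First, subtract solar and wind from demand to get the net load left for the fossil fleet. Then find the largest rise over any 3-hour window. This minimal sketch uses made-up hourly averages in GW for six consecutive hours. It is not Intertek's calculation; a real analysis would run over full hourly data for each year of interest.

```python
def net_load(demand, solar, wind):
    """Demand left for dispatchable (mostly fossil) units in each hour."""
    return [d - s - w for d, s, w in zip(demand, solar, wind)]

def max_ramp(profile, window_hours=3):
    """Largest rise over any `window_hours`-hour window: returns (rise, start_index)."""
    best_rise, best_start = float("-inf"), None
    for t in range(len(profile) - window_hours):
        rise = profile[t + window_hours] - profile[t]
        if rise > best_rise:
            best_rise, best_start = rise, t
    return best_rise, best_start

# Hypothetical hourly averages in GW for six consecutive hours
demand = [30, 30, 32, 34, 36, 35]
solar = [0, 5, 8, 6, 2, 0]
wind = [2, 2, 2, 2, 2, 2]
net = net_load(demand, solar, wind)
print(net)            # [28, 23, 22, 26, 32, 33]
print(max_ramp(net))  # (11, 2)
```

As solar grows, the midday dip in net load deepens, so the evening rise that fossil units must follow gets steeper.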

Charts 2 & 3 - Cycling and Output Frequency

The database also allowed Intertek to create two more charts:

  • Chart 2 (below left) shows the estimated cycling damage for different unit types and sizes in Texas.
  • Chart 3 (below right) shows the output history and output frequency histogram of a unit that moved from full-load operation to lower loads and, recently, to operation above maximum capacity. These shifts have a large impact on cycling damage.
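One simple way to turn an hourly output history into a cycling measure is to count starts (offline-to-online transitions) and large load swings, then weight each event. The sketch assumes a swing threshold of 20% of capacity and uses placeholder weights. It is not the damage model behind Chart 2, which the post does not describe.

```python
def count_cycles(outputs_mw, capacity_mw, swing_frac=0.2):
    """Count starts (offline -> online) and large load swings between consecutive hours.
    A swing is a change of at least `swing_frac` of capacity while the unit stays online."""
    starts = swings = 0
    for prev, cur in zip(outputs_mw, outputs_mw[1:]):
        if prev <= 0 < cur:
            starts += 1
        elif prev > 0 and cur > 0 and abs(cur - prev) >= swing_frac * capacity_mw:
            swings += 1
    return starts, swings

def cycling_damage_index(starts, swings, start_weight=1.0, swing_weight=0.1):
    """Weighted event count; the default weights are placeholders, not calibrated values."""
    return starts * start_weight + swings * swing_weight

hourly_mw = [0, 0, 200, 400, 400, 150, 0, 300]  # hypothetical unit, 400 MW capacity
starts, swings = count_cycles(hourly_mw, capacity_mw=400)
print(starts, swings)                        # 2 2
print(cycling_damage_index(starts, swings))  # 2.2
```

Comparing such an index across units of different types and sizes is the kind of view Chart 2 summarizes. A credible version would calibrate the weights against inspection and failure records.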

Chart 4 - Wind Turbine

This chart is an example of temperature data measured on a wind turbine stator. The temperatures exceeded a maximum threshold close to the melting point of the insulating material, which led to ground faults in the generators. Using data science, Intertek quantified the total time the material was exposed to those excessive temperatures, determined when the exposure occurred and made a targeted recommendation for replacement.
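The exposure calculation described here comes down to two steps: find every run of samples above a temperature limit, and sum the time spent there. This minimal sketch assumes evenly spaced hourly samples. The 160 deg C limit and the temperatures are hypothetical, since the post does not give the actual threshold.

```python
from datetime import datetime, timedelta

def exceedances(samples, limit_c, step=timedelta(hours=1)):
    """samples: (timestamp, temperature) pairs sorted by time, spaced `step` apart.
    Returns (total time above limit_c, list of (first, last) timestamps per run).
    Each sample above the limit is counted as one `step` of exposure."""
    periods = []
    run = None  # [first, last] timestamps of the current run above the limit
    for ts, temp in samples:
        if temp > limit_c:
            if run is not None and ts - run[1] <= step:
                run[1] = ts                     # extend the current run
            else:
                if run is not None:
                    periods.append(tuple(run))  # close a run broken by a data gap
                run = [ts, ts]                  # start a new run
        elif run is not None:
            periods.append(tuple(run))
            run = None
    if run is not None:
        periods.append(tuple(run))
    total = step * sum(1 for _, temp in samples if temp > limit_c)
    return total, periods

t0 = datetime(2018, 7, 1, 0)
temps = [150, 162, 165, 158, 171, 149]  # hypothetical stator temperatures, deg C
samples = [(t0 + timedelta(hours=i), t) for i, t in enumerate(temps)]
total, periods = exceedances(samples, limit_c=160)
print(total)  # 3:00:00
for first, last in periods:
    print(first, "to", last)
```

The list of periods answers "when did it happen", and the total answers "how long". Together they are what a targeted replacement recommendation would rest on.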

Charts 5 & 6 - Gas Plants

Intertek studied the change in heat rate and the efficiency loss caused by cycling operation of a gas plant. Chart 5 plots heat rate against output (MW) on two kinds of days:

  • days with a start (plotted in blue);
  • days without a start, when the unit was working at full load (plotted in red).

Days with a start show a significant loss of efficiency: their heat rate trend lies above the trend for days without a start. The higher the heat rate, the lower the efficiency.
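Heat rate is fuel heat input per unit of electricity generated, commonly expressed in Btu/kWh. Thermal efficiency is 3,412.14 Btu (the heat equivalent of 1 kWh) divided by the heat rate. The sketch below computes both and compares the average daily heat rate on start and no-start days. The daily records and field names are hypothetical.

```python
from statistics import mean

BTU_PER_KWH = 3412.14  # heat content of 1 kWh of electricity

def heat_rate_btu_per_kwh(heat_input_mmbtu, net_generation_mwh):
    """Fuel heat input divided by net generation (1 MMBtu = 1e6 Btu, 1 MWh = 1e3 kWh)."""
    return heat_input_mmbtu * 1e6 / (net_generation_mwh * 1e3)

def thermal_efficiency(heat_rate):
    """Fraction of fuel energy converted to electricity; higher heat rate, lower efficiency."""
    return BTU_PER_KWH / heat_rate

def mean_heat_rate_by_start(days):
    """Average daily heat rate, split by whether the unit started that day."""
    groups = {True: [], False: []}
    for d in days:
        groups[d["had_start"]].append(heat_rate_btu_per_kwh(d["heat_input_mmbtu"], d["net_mwh"]))
    return {had_start: mean(v) for had_start, v in groups.items() if v}

days = [  # hypothetical daily totals for one gas unit
    {"had_start": True, "heat_input_mmbtu": 5600.0, "net_mwh": 700.0},
    {"had_start": True, "heat_input_mmbtu": 6480.0, "net_mwh": 800.0},
    {"had_start": False, "heat_input_mmbtu": 7000.0, "net_mwh": 1000.0},
    {"had_start": False, "heat_input_mmbtu": 7070.0, "net_mwh": 1010.0},
]
by_start = mean_heat_rate_by_start(days)
print(by_start)  # {True: 8050.0, False: 7000.0}
print({k: round(thermal_efficiency(v), 3) for k, v in by_start.items()})
```

Start days burn fuel while output is low or zero, which pushes their daily heat rate up.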

Chart 6 shows that the same unit's average heat rate increased by 3% (with a corresponding loss of efficiency) over the six-year period from 2011 to 2017. Part of this loss is attributable to general plant and component aging, but the most significant contributor is the increase in cycling operation over the same six years.
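A change like this can be expressed in two ways. One is the percentage change between the first and last years. The other is a least-squares trend in Btu/kWh per year, which is less sensitive to a single unusual year. The yearly averages below are hypothetical values chosen to produce a 3% rise. They are not the data behind Chart 6.

```python
def percent_change(old, new):
    """Percentage change from `old` to `new`."""
    return 100 * (new - old) / old

def least_squares_slope(xs, ys):
    """Slope of the least-squares line through (x, y) points, in y-units per x-unit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

years = list(range(2011, 2018))                             # 2011 through 2017
avg_heat_rate = [7000, 7035, 7070, 7105, 7140, 7175, 7210]  # hypothetical, Btu/kWh
print(percent_change(avg_heat_rate[0], avg_heat_rate[-1]))  # 3.0
print(least_squares_slope(years, avg_heat_rate))            # about 35 Btu/kWh per year
```

Separating aging from cycling would need more than a trend line. For example, you could regress heat rate on unit age and on the number of starts, but the post does not say how the attribution was made.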

Chart 7 - Machine Learning Applied to Gas and Coal Plant Failures

Boiler tubes are sometimes made of two different metals. The dissimilar metal welds joining them are prone to creep failure, driven by repeated heat-up/cool-down cycles as plant output varies. Intertek used data from 56 fossil plants with dissimilar metal weld failures and tested several machine learning algorithms. Using cycling-related variables as inputs, Intertek wanted to learn whether a model could predict both which welds were prone to failure and the time to failure. Of the models tested, neural networks performed best at predicting these failures.

The model fit the data with a 95% prediction rate. Intertek was able not only to predict whether a failure occurred at a given plant, but also to estimate the time to failure, which was quite accurate for failures occurring within 40 years. The model used only 10 cycling-related variables, and including more variables is expected to improve prediction accuracy further. An advantage of the neural network model is that it learns for itself which features are most important.
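Here is a deliberately tiny pure-Python sketch of the mechanics of fitting a neural network to cycling variables. It has one hidden layer of tanh units and a sigmoid output, trained by stochastic gradient descent on log loss. The data are synthetic: three scaled stand-ins for cycling variables, the third intentionally irrelevant. The sketch only classifies failed vs. not failed. The post does not describe Intertek's architecture, its 10 variables or how time to failure was modeled, so none of that is reproduced here.

```python
import math
import random

def make_data(n, seed=1):
    """Synthetic stand-in for plant records: three scaled cycling variables per row
    (feature 2 is deliberately irrelevant); label 1.0 means the weld failed."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = [rng.random() for _ in range(3)]
        y = 1.0 if 2 * x[0] + x[1] > 1.5 else 0.0
        data.append((x, y))
    return data

class TinyMLP:
    """One tanh hidden layer and a sigmoid output, trained by SGD on log loss."""

    def __init__(self, n_in, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return h, 1.0 / (1.0 + math.exp(-z))

    def loss(self, data):
        eps = 1e-12  # guards against log(0)
        total = 0.0
        for x, y in data:
            p = self.forward(x)[1]
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        return total / len(data)

    def train(self, data, epochs=200, lr=0.1):
        for _ in range(epochs):
            for x, y in data:
                h, p = self.forward(x)
                dz = p - y  # gradient of log loss w.r.t. the output pre-activation
                for j, hj in enumerate(h):
                    dh = dz * self.w2[j] * (1 - hj * hj)  # backprop through tanh, old w2
                    self.w2[j] -= lr * dz * hj
                    self.b1[j] -= lr * dh
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= lr * dh * xi
                self.b2 -= lr * dz

def accuracy(model, data):
    return sum((model.forward(x)[1] > 0.5) == (y == 1.0) for x, y in data) / len(data)

data = make_data(200)
train_set, test_set = data[:150], data[150:]
model = TinyMLP(n_in=3)
before = model.loss(train_set)
model.train(train_set)
print(f"train log loss {before:.3f} -> {model.loss(train_set):.3f}")
print(f"test accuracy {accuracy(model, test_set):.2f}")
```

In practice a library such as scikit-learn or PyTorch would replace the hand-written training loop; writing it out here just makes the forward and backward passes visible. The network is never told which input matters. Training adjusts the weights so that the informative inputs drive the output, which is the sense in which the model "decides" feature importance.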



Matt Chester on June 2, 2019

Some great insights here, Martin. For generators that might not be embracing data analytics to this extent, can you share what you think would be a good place for them to start? How can they get up to speed as quickly as they might need?

Martin Gascon on June 4, 2019

Hi Matt,

Thank you for your feedback. At a minimum, plants should document failures and inspections in a searchable format, and then benchmark against peer units. Data scientists can provide insights using new tools and open-source code that can change the traditional analysis performed at the plant. Tools such as allow plant owners and operators to gain strategic advantage through the understanding of their relative position in the market.


Munir Mujawar on June 12, 2019

Thanks, Martin, for giving me a good glimpse of the application of data analytics for power generators! I am working on similar studies of large commercial & industrial power loads. Your methodology has given me good foresight for moving in the right direction.
