Intelligent Utilities Are Data-Driven. Are You Ready?

Unless you've been living a cloistered life for more than a decade, you've heard the term data-driven organization and are familiar with its many benefits: better decision making, improved service, and operational efficiency, to name a few. But do you have the necessary data foundation to succeed as a data-driven organization? Too often, organizations take a proper foundation for granted. Using data to improve outcomes takes diligence to ensure the data are valid and accurate, and it takes time to get it right. Organizations unable to deliver the expected results can frequently trace the root cause to the assumptions made about the data.

So exactly what is a proper data foundation and how do you determine if you have one?

Determine Your Use Cases First

As a first step, you’ll need to clearly define what you are trying to accomplish so that you can accurately assess what data you need. Defining use cases helps frame and clarify business needs. Let’s examine a couple of use cases to help illustrate how outcomes are affected by data.

Use Case Example #1

Let’s say an organization has just completed the rollout of new assets and would like to take advantage of their advanced features to reduce maintenance costs. It plans to implement a condition-based maintenance (CBM) program for these assets. The equipment has the capability, but the question is: “Was it configured to capture, transmit and store the necessary data?”

A condition-based maintenance program is unlikely to succeed without data on which to base the assessment. Project leaders and stakeholders must ask whether other sources of data could or should be used, for example, records of maintenance performed, the cost of the work, vendor data about the asset’s life expectancy, and power quality data.

Use Case Example #2

Your organization would like to improve unplanned outage restoration times. It already has a system in place to manage outages that captures data about actual events (status, cause, etc.). This data is a good start; however, other factors also affect restoration time (e.g., crew availability and location, spare parts, route, weather conditions). Other data sources must be involved to meet the intent of this use case. Your organization already has systems for:

  • Tracking crew location in real time and determining the quickest route
  • Locating parts and assessing their availability.

The only piece missing is critical weather data, so your organization invests in the acquisition of weather data collected at hourly intervals from ten weather stations in the service territory.

Selecting Data Sources

Perhaps the most important aspect of implementing a data-driven paradigm is the selection of data sources. Sources may consist of structured and unstructured data, and they can be electronic or paper-based. For the purpose of this article, we’ll define structured data as any data formatted in a manner that a computer can easily process (e.g., XML, JSON, CSV, and Excel). Although structured data are the easiest to work with, there is still work involved in preparing them for analysis and reporting. Unstructured data are data that do not follow a predefined format, such as log data or free-form text, and are not easily processed by a computer. Naturally, unstructured data pose more of a challenge to prepare and use, but they can be well worth the effort. For example, using social media to quickly assess positive and negative sentiment about a proposed product or service is a use case for unstructured data. Whether you will need to mine unstructured data depends on your use case.

Collection Interval

Understanding the interval of data collection is important for two reasons:

  1. It can limit what you can accomplish.
  2. The frequency of the data from each source must be considered when combining sources.

To illustrate these two points, let’s examine the impact of weather on outages. The available weather data provides actual hourly readings of precipitation, temperature, average wind speed and relative humidity. However, improving restoration performance during an outage would require up-to-the-minute weather reports and forecasts. Hence, the weather source data that is available would not support this use case.
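The mismatch between hourly weather readings and an up-to-the-minute need can be made concrete with a small sketch. It is only an illustration: the timestamps, the one-minute freshness threshold, and the function names `latest_reading_before` and `staleness` are assumptions, not part of any particular outage or weather system.

```python
from bisect import bisect_right
from datetime import datetime, timedelta


def latest_reading_before(reading_times, event_time):
    """Return the most recent reading time at or before event_time, or None."""
    idx = bisect_right(reading_times, event_time)
    return reading_times[idx - 1] if idx else None


def staleness(reading_times, event_time):
    """Return how old the newest available reading is when the event occurs."""
    latest = latest_reading_before(reading_times, event_time)
    return None if latest is None else event_time - latest


# Hypothetical hourly weather readings for one station, sorted ascending.
hourly = [datetime(2024, 1, 1, hour) for hour in range(24)]
# Hypothetical outage start time.
outage_start = datetime(2024, 1, 1, 14, 47)

age = staleness(hourly, outage_start)
max_allowed = timedelta(minutes=1)  # assumed "up-to-the-minute" requirement

print(age)                 # 0:47:00
print(age <= max_allowed)  # False
```

In this made-up case the newest reading is 47 minutes old when the outage begins, which is the kind of gap that makes an hourly source unsuitable for decisions made during restoration.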

Combining Data Sources

Combining data from different sources can be challenging because data typically change at different intervals across data sets. Consider how you will deal with the timing variances. For example, let’s look at the condition-based maintenance use case. The organization is in the process of evaluating its CBM program. SCADA data will be combined with Work and Asset Management data to assess the program, and the timeframes must be properly aligned so that the reading, fault and remediation data all reflect the same specific asset. If the asset was replaced due to a manufacturer’s defect during the evaluation period, you would not want to include the faulty asset’s maintenance records in the evaluation, because the original asset’s records are not indicative of the current asset’s performance or of the asset class as a whole.
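One way to picture the alignment step is to filter work records by the installation date of the asset currently in service. The sketch below assumes a simplified, hypothetical `WorkOrder` record and made-up position identifiers such as `XFMR-12`; a real Work and Asset Management system will have its own schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class WorkOrder:
    asset_position: str  # hypothetical identifier for where the asset is installed
    completed_on: date
    cost: float


def records_for_current_asset(work_orders, asset_position, installed_on):
    """Keep only work orders completed on or after the current asset's install date."""
    return [
        wo
        for wo in work_orders
        if wo.asset_position == asset_position and wo.completed_on >= installed_on
    ]


orders = [
    WorkOrder("XFMR-12", date(2023, 2, 10), 1800.0),  # defective original unit
    WorkOrder("XFMR-12", date(2023, 6, 5), 250.0),    # replacement unit
    WorkOrder("XFMR-12", date(2023, 9, 20), 300.0),   # replacement unit
    WorkOrder("XFMR-07", date(2023, 7, 1), 400.0),    # different asset
]
replacement_installed = date(2023, 4, 1)  # assumed installation date

current = records_for_current_asset(orders, "XFMR-12", replacement_installed)
print(len(current), sum(wo.cost for wo in current))  # 2 550.0
```

The same cutoff date would also need to be applied to the SCADA readings so that both data sets describe the same physical unit.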


Completeness

Completeness may refer to the data itself. Are there gaps in the data? A device may be configured to transmit data to the SCADA system at 5-minute intervals, 24 hours a day. For the data to be deemed complete, there should be 288 entries for a 24-hour period (12 readings per hour × 24 hours). If this is not the case, the data is incomplete, and the impact of the gaps should be assessed.
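The 288-entry check can be sketched in a few lines. This is a minimal illustration that assumes readings arrive exactly on 5-minute boundaries; the function names and the simulated gap are hypothetical.

```python
from datetime import datetime, timedelta


def expected_count(interval_minutes, hours=24):
    """Number of readings expected over the period at a fixed interval."""
    return hours * 60 // interval_minutes


def missing_timestamps(received, start, interval, count):
    """List the expected timestamps that never arrived."""
    seen = set(received)
    expected = (start + i * interval for i in range(count))
    return [ts for ts in expected if ts not in seen]


start = datetime(2024, 1, 1)
interval = timedelta(minutes=5)
n = expected_count(5)

# Simulated feed that dropped the 01:00, 01:05 and 01:10 readings.
all_ts = [start + i * interval for i in range(n)]
received = all_ts[:12] + all_ts[15:]

gaps = missing_timestamps(received, start, interval, n)
print(n)               # 288
print(len(gaps))       # 3
print(gaps[0].time())  # 01:00:00
```

Real feeds rarely land exactly on the boundary, so a production check would likely bucket timestamps into intervals before comparing.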

Completeness may be used to describe coverage (e.g., geography, asset type/class) as well. For example, it may not be possible to collect data in remote locations due to a lack of communication infrastructure. Again, incomplete data will affect the type of analysis you’ll be able to conduct as well as the inferences you will be able to draw.

Reliability and Validity

Two fundamental concepts involved in data collection and use are reliability and validity. These concepts are crucial when considering the use of any form of data.


Reliability refers to whether something is being measured consistently. Do you get the same number repeatedly when you measure the distance between the same two objects? If so, the measurement is considered reliable.


Validity is naturally limited by the reliability of the measure, but it assesses whether you are measuring what you purport to measure. For example, a device sends a value every two seconds as expected, yet the value has not changed in several hours even though conditions changed in ways that should have altered it. In this case, the measure is reliable (a value is received every two seconds), but it would not be considered valid, since the value should have changed with the conditions.
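A simple screen for the "value arrives on schedule but never changes" situation is to measure the longest run of identical readings. The sketch below is an assumption-laden illustration: the one-hour threshold, the sample values, and the function names are made up, and a flat value is only suspicious when conditions are known to have changed.

```python
def longest_flat_run(values):
    """Length of the longest run of consecutive identical readings."""
    longest = current = 0
    previous = object()  # sentinel that compares unequal to any reading
    for value in values:
        current = current + 1 if value == previous else 1
        longest = max(longest, current)
        previous = value
    return longest


def looks_stuck(values, sample_seconds, max_flat_seconds):
    """Flag a series whose longest unchanged stretch exceeds the allowed duration."""
    flat_seconds = (longest_flat_run(values) - 1) * sample_seconds
    return flat_seconds > max_flat_seconds


# Hypothetical device reporting every 2 seconds; value frozen for about 3 hours.
frozen = [50.1, 50.3] + [50.2] * 5400
# Hypothetical healthy series in which consecutive readings always differ.
healthy = [50.0 + (i % 5) * 0.1 for i in range(5400)]

print(looks_stuck(frozen, sample_seconds=2, max_flat_seconds=3600))   # True
print(looks_stuck(healthy, sample_seconds=2, max_flat_seconds=3600))  # False
```

Both series would pass a delivery check, since a value arrives every two seconds; only the first raises a validity question.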


Data Availability

Two issues are critical:

  1. Are the data you need available?
  2. Can you get the data you need in the necessary timeframe?   

On the surface, these seem like simple questions, but they are not. Organizations often have data, but it isn’t in a usable form. For example, an organization may use a third party to test equipment and receive a summary of results in a paper-based report. The issue here is that data in this format cannot be processed by a computer, so the organization needs to create an electronic version. Digitizing the data in paper-based reports can be expensive and time-consuming, and it can delay progress toward your business objectives. Even if there is a system that collects the data, there is no guarantee that a suitable interface exists for extracting it. Furthermore, you may find that the data needed is not collected at the appropriate grain. For instance, if the grain of your data is 60-minute intervals and your use case necessitates 60-second intervals, you’ll need to address this gap.
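The grain problem is asymmetric: finer data can be rolled up to a coarser grain, but coarse data cannot be split into finer readings that were never collected. A minimal sketch, with hypothetical function names and sample values:

```python
from statistics import mean


def can_support(source_interval_s, required_interval_s):
    """A source supports a use case only if it is at least as fine as required."""
    return source_interval_s <= required_interval_s


def downsample_mean(values, factor):
    """Average each consecutive group of `factor` readings into one coarser reading."""
    return [
        mean(values[i:i + factor])
        for i in range(0, len(values) - factor + 1, factor)
    ]


# 60-minute source data against a 60-second requirement: not supported.
print(can_support(3600, 60))  # False
# 60-second source data against a 60-minute requirement: supported by rolling up.
print(can_support(60, 3600))  # True

# Rolling six fine-grained readings up into two coarser ones.
print(downsample_mean([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3))  # [2.0, 5.0]
```

Averaging is only one possible roll-up; totals, maxima or last-known values may suit a given measure better.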

Exploring Your Data 

Once you've determined that you have or can get the data needed, the next step is exploring it. What is the notion or hypothesis you are attempting to prove or disprove? To answer that, you need to address these questions:

  • What is your business objective?
  • What is the structure of the data?
  • What are the data types?
  • What does the distribution look like (e.g., normal, bimodal, exponential)?
  • Are there unusual or extreme values (outliers)? Are there unexpected or missing values?

These key questions must be answered as part of the data preparation process. Before starting, you will need to devise a plan for how you will deal with anomalies. The good news is that there are plenty of tools, both open-source and commercial, in the marketplace to help explore and customize the data based on your objectives. Once you have identified the issues and determined how to handle them, automate as many of the data cleansing tasks as feasible. If your organization does not have the expertise to automate these tasks or to configure the tools, you should consider supplementing your internal resources with external expertise.
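As a rough illustration of the missing-value and outlier questions, the sketch below profiles a small series using only the Python standard library. The interquartile-range rule with a multiplier of 1.5 is a common convention chosen here as an assumption, not something this article prescribes, and the readings are made up.

```python
from statistics import quantiles


def profile(values, k=1.5):
    """Count missing values and flag outliers with the interquartile-range rule."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return {
        "count": len(values),
        "missing": len(values) - len(present),
        "outliers": [v for v in present if v < low or v > high],
    }


# Hypothetical voltage readings; None marks a reading that never arrived.
readings = [118.9, 120.1, 119.8, None, 120.4, 119.5, 240.0, 120.0, None, 119.7]

result = profile(readings)
print(result["count"])     # 10
print(result["missing"])   # 2
print(result["outliers"])  # [240.0]
```

Whether a flagged value is an error or a real event is a judgment the plan for handling anomalies needs to cover; the code only surfaces candidates.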


For utilities, becoming a data-driven organization is an evolution, not a revolution. To be successful, they must consider the following basic principles before embarking on this journey:

  • Define and agree upon the business use cases
  • Validate you have the data to support what's promised
  • Routinize data collection to the greatest extent possible
  • Establish and maintain data accuracy and appropriateness.

Utility staff must not only have expertise in the technical aspects of data collection and analysis but also be experienced in the many pitfalls of using and combining large data sets. Management must objectively evaluate their staff to determine whether the needed expertise exists in-house or whether external resources with established and proven abilities should be added to augment the staff.

Don't be afraid to start with "small" steps and build from there. Becoming a data-driven organization can’t happen overnight – it’s a process that must be carefully planned and executed with the help of internal and external resources to achieve optimum success.

Stefanie Mabli, Senior Consultant at BRIDGE Energy Group, has more than 20 years of experience in helping clients define strategy as well as develop and execute tactical implementation plans. Ms. Mabli’s experience spans multiple industries including utilities, retail, telecommunications, healthcare, government and financial services.