DATA HYGIENE: THE REAL COST OF DIRTY DATA AND 5 TIPS TO IMPROVE DATA QUALITY

Growing up, we’re taught to keep our hands clean. As children we’re told ‘wash your hands before dinner’ and ‘wash your hands after playing outside.’ Even as adults, public health programs show us how to wash our hands effectively to prevent disease.

Now, as primary data custodians for our respective organizations, we still need to keep it clean.

This time we’re not talking about your hands, but your data.

The term dirty data is not just a catchy alliteration. It is, in fact, an extremely serious problem.

How serious, you might ask.

Numbers to Prove Dirty Data Hurts

According to Experian’s 2019 Global Data Management Research: “We see, year after year, that despite our ambitions, many businesses fail to take full advantage of the opportunity that data can provide to improve customer interactions to increase business performance.”

In the US alone, an IBM survey revealed that bad data costs the economy $3.1 trillion every year. Moreover, more than 30% of business leaders are not confident in the data they’re using to make key business decisions, while 27% of respondents are uncertain.

Dirty data typically refers to data that is poorly structured, has inaccuracies or is incomplete. It impacts various industries differently. However, whichever industry an enterprise is operating in, the negative repercussions are equally damaging to a business’ overall health.
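To make the three categories concrete, here is a minimal sketch in Python that flags records as incomplete, poorly structured, or inaccurate. The record layout (meter_id, timestamp, kwh) and the rules themselves are assumptions made up for illustration; real checks depend on your own data and definitions.

```python
from datetime import datetime

# Hypothetical schema for a meter reading; adjust to your own data.
REQUIRED_FIELDS = ("meter_id", "timestamp", "kwh")


def find_issues(record: dict) -> list[str]:
    """Return a list of data quality issues found in one record."""
    issues = []

    # Incomplete: a required field is absent or empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            issues.append(f"incomplete: missing {field}")

    # Poorly structured: the timestamp is not in ISO 8601 format.
    ts = record.get("timestamp")
    if ts not in (None, ""):
        try:
            datetime.fromisoformat(ts)
        except (TypeError, ValueError):
            issues.append("poorly structured: timestamp is not ISO 8601")

    # Inaccurate: consumption cannot be negative (assumed business rule).
    kwh = record.get("kwh")
    if isinstance(kwh, (int, float)) and kwh < 0:
        issues.append("inaccurate: negative kwh")

    return issues


clean = {"meter_id": "M-1", "timestamp": "2019-07-10T09:30:00", "kwh": 1.2}
dirty = {"meter_id": "M-2", "timestamp": "10/07/2019 09:30", "kwh": -3.0}
print(find_issues(clean))  # []
print(find_issues(dirty))
```

A check like this only detects problems. As the Redman quote further down points out, fixing them at the source is what removes the recurring cost.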

In the financial services industry, the damage from dirty data goes beyond financial loss. Inaccurate and incomplete data can lead to regulatory breaches, delayed decisions due to manual checks, and suboptimal trading strategies, to name a few.

Businesses that rely on a CRM for lead nurturing and customer segmentation are likewise negatively affected. Statistics show that while 67% of businesses use CRM data for customer targeting, 60% believe that their overall data health is unreliable.

Even the healthcare industry is not spared. According to Healthcare Finance, supplies management is one of the areas that dirty data affects the most. Supply costs account for 20% to 30% of operating expenses, yet this area is often mismanaged due to incorrect or incomplete data. Dirty data also affects inventory. Making sure that medical supplies are present where they are needed could spell the difference between life and death.

How can dirty data inflict so much damage? Data Doctor Thomas Redman explains, “The reason dirty data costs so much is that decision makers, managers, knowledge workers, data scientists, and others must accommodate it in their everyday work. And doing so is both time-consuming and expensive. The data they need has plenty of errors, and in the face of a critical deadline, many individuals simply make corrections themselves to complete the task at hand. They don’t think to reach out to the data creator, explain their requirements, and help eliminate root causes.”

Dirty data can affect all business types and industries, and it can wreak havoc even in today’s most advanced digital projects.

How Dirty Data Hampers the Progress of AI and Data Governance Projects

Initiatives involving artificial intelligence and modern data governance are prime examples of how dirty data impacts digital transformation projects.

According to a study by market research firm Dimensional Research, 8 out of 10 AI and machine learning projects have stalled due to poor data quality, while 96% “have run into problems with data quality, data labeling required to train AI, and building model confidence.”

This is bad news as AI, big data, and machine learning now rank highly in the top priorities of many companies’ digital transformation initiatives.

Dirty Data and the Utilities Industry: A Closer Look

The utility sector is one of the industries most affected by this AI, machine learning, and data governance stall caused by dirty data. Smart grids, smart meters, digital twins, and microgrids are amazing innovations, but with them comes a deluge of data. For example, electric companies used to read meters 12 times a year. Today, smart meters relay data every 15 minutes or even more often. They are creating terabytes of valuable data that needs to be managed.
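A quick back-of-the-envelope calculation shows the scale of that jump for a single meter. The sketch below assumes a 365-day year and exactly one reading per interval; the function name is ours.

```python
MINUTES_PER_DAY = 24 * 60


def reads_per_year(interval_minutes: int, days: int = 365) -> int:
    """Number of readings one meter produces per year at a fixed interval."""
    return (MINUTES_PER_DAY // interval_minutes) * days


monthly_reads = 12                    # one manual read per month
interval_reads = reads_per_year(15)   # one smart meter read every 15 minutes

print(interval_reads)                   # 35040
print(interval_reads // monthly_reads)  # 2920
```

That is 35,040 readings per meter per year instead of 12, a 2,920-fold increase before multiplying by the number of meters in the field.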

As a result, many T&D companies are finding their data handling and data provisioning capacities outpaced by the information they’re receiving.

Utility Dive’s Herman Trabish explains, “Figuring out how to manage those data could hold the key to new revenue streams and improved grid operation, if utilities can find software tools to integrate multiple grid technologies and handle ever-escalating quantities of information.”

Data Hygiene: 5 Tips to Improve Your Data Quality

Sanitizing dirty data may not be as easy as washing your hands, but it’s also not impossible. Here are 5 tips to help you clean your data:

1. Determine if you have the internal capacity and a modern platform for data provisioning

There’s no problem in admitting that you don’t have the manpower, technology, and other resources to ensure trustworthy and timely data delivery, i.e., data provisioning. There are two ways to put a data provisioning program in place, depending on your budget. You can either build the capability in-house by investing in a domain-specific iPaaS, bringing experts on board, and purchasing data provisioning technology, or partner with a company that can perform data provisioning on your behalf.

2. Explore the feasibility of using ML

As we noted above, machine learning is one of the digital transformation initiatives most affected by dirty data. However, it can also offer one of the most potent ways to prevent it. ML can create and enrich data assets efficiently. It supports data quality through proactive and reactive data maintenance protocols. Further, it encourages data use by relevant parties by making data easier to discover.

3. Empower data prep among those who know the information the best

Agile data preparation practices allow the experts in your organization to do the data provisioning or data preparation themselves. This ensures that the data is processed, organized, and presented accurately and in its most useful form.

4. Remove data preparation silos

Data access should never be an inter-departmental competition. Data should never be kept in silos, especially if there are multiple departments or stakeholders relying on the same information.

Establish working groups that are primarily in charge of data collection, preparation, and provisioning. This will break down data silos and promote collaboration.

5. Standardize data definitions

If your energy company defines “Report Week” as “calendar week beginning at 12:01 a.m. on Sunday and ending at midnight on Saturday,” make sure everyone who deals with data understands it the same way. Even the slightest deviation in the agreed definition can render your data incorrect and dirty.
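One way to keep everyone on the same definition is to encode it once and have every report call the same function. The sketch below is one possible reading of the “Report Week” definition quoted above: it assumes timestamps are in a single local time zone and that anything before 12:01 a.m. on Sunday still belongs to the previous week. The function name is hypothetical.

```python
from datetime import date, datetime, timedelta


def report_week_start(ts: datetime) -> date:
    """Return the Sunday on which the Report Week containing ts begins."""
    # Assumption: the week starts at 12:01 a.m., so shift back one minute
    # before looking up the calendar day.
    shifted = ts - timedelta(minutes=1)
    # datetime.weekday() is Monday=0 ... Sunday=6; convert to days since Sunday.
    days_since_sunday = (shifted.weekday() + 1) % 7
    return shifted.date() - timedelta(days=days_since_sunday)


print(report_week_start(datetime(2019, 7, 10, 9, 30)))  # 2019-07-07
print(report_week_start(datetime(2019, 7, 7, 0, 0)))    # 2019-06-30
print(report_week_start(datetime(2019, 7, 7, 0, 1)))    # 2019-07-07
```

The second and third calls differ by a single minute yet land in different weeks, which is exactly the kind of boundary that silently diverges when two teams implement the definition separately.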

It’s Time for a Data Detox

You’re only as good as the data you use. Muddled data equals muddled business decisions. Taking both proactive and reactive steps to ensure the completeness, accuracy, and usefulness of the data you collect allows you to gain a competitive advantage, expedite your digital transformation, and achieve better business results overall.

Article reposted with permission from Greenbird Integration Technology (Original).

Thorsten Heller's picture

Discussions

Matt Chester on July 10, 2019

Bad data is as useless as no data at all-- nay it might be worse because it could lead you down the wrong path with undue confidence. Utilities definitely must look for quality in their data, not just quantity-- thanks for some of these tips!

Thorsten Heller on July 11, 2019

Exactly! Many utilities focus too much on devices, sensors, and IoT but forget about the importance of making something meaningful out of the data gathered and the data available.

Bad data is one of the reasons why many utilities struggle with moving the PoC for AI or ML into production, too.

Let's keep our focus on the data hygiene and "make data great again".
