
Importance of Data Modeling in Analytics and Beyond!

This is the second in a series of articles on data management in the utility industry. As mentioned in the previous article, utility investments in analytics and other system implementation efforts cannot succeed without a proper Enterprise Information Management (EIM) strategy. An effective data management strategy must address multiple aspects, including data quality and data modeling.

The focus of this paper is on the importance of data modeling, an integral part of the Enterprise Information Management (EIM) strategy.

Why does data modeling matter?

It is unrealistic to expect that the various data required for developing analytical models are easily correlated and ready for use!

Existing utility IT systems across various functional domains are not designed to enable a natural correlation of the data. Often, some of the data elements required for building analytical models are not managed and maintained in enterprise systems.  

To illustrate the importance of data modeling, let’s take the example of a “Long term load forecasting” use case, which is an integral part of power system planning and operations.  One of the steps involved in this use case is “calculating the historical load and DER (generation) profiles.”

As illustrated in the diagram above, when developing an analytical model to calculate the load and DER profiles, there is a need to integrate a wide variety of data from different sources. Before the data can be used in the analytical model, it is crucial to model and prepare it: logically correlating the data and organizing it for use in the model, for easy access, and for acceptable performance.

Consider the scenario where an analytical model deployed in production takes hours and hours to produce results, a not-so-unusual situation. "Performance challenges" are the last thing a data scientist wants to experience. Having the data organized efficiently enables easy access and optimal performance while setting the stage for delivering clean, reusable data for analysis across the enterprise.
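To make the correlation-and-organization step concrete, here is a minimal sketch in plain Python. The source shapes, identifiers, and the meter-to-service-point mapping are purely illustrative (not an actual SCE schema); the point is that meter reads and DER telemetry only become a usable net-load profile after they are correlated through a shared key that must itself be modeled.

```python
from collections import defaultdict

# Hypothetical interval readings from two source systems: AMI meter
# data (load) and DER telemetry (generation). Field names are
# illustrative assumptions, not a real utility schema.
meter_reads = [
    {"meter_id": "M1", "hour": 0, "kwh": 1.2},
    {"meter_id": "M1", "hour": 1, "kwh": 0.9},
    {"meter_id": "M2", "hour": 0, "kwh": 2.0},
]
der_reads = [
    {"der_id": "D1", "hour": 0, "kwh": 0.5},
]

# The correlation key: in practice, the meter -> service point -> DER
# relationship is exactly the kind of linkage a data model must capture.
meter_to_point = {"M1": "SP1", "M2": "SP2"}
der_to_point = {"D1": "SP1"}

def hourly_profiles(meter_reads, der_reads, meter_to_point, der_to_point):
    """Aggregate load and DER generation per service point and hour,
    then derive the net load profile."""
    load = defaultdict(float)
    gen = defaultdict(float)
    for r in meter_reads:
        load[(meter_to_point[r["meter_id"]], r["hour"])] += r["kwh"]
    for r in der_reads:
        gen[(der_to_point[r["der_id"]], r["hour"])] += r["kwh"]
    keys = set(load) | set(gen)
    return {k: {"load": load[k], "der": gen[k], "net": load[k] - gen[k]}
            for k in sorted(keys)}
```

Pre-organizing the data this way (one keyed structure per profile point) is also what makes the downstream access pattern fast, rather than forcing the analytical model to re-join raw sources on every run.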

Now the question is, what is the right approach and methodology to develop a data model?

Approach for developing a data model

The decision regarding the right approach for developing the data model is critical, and there are multiple options available.

One of the approaches is to take an existing data model that was designed for some other purpose (e.g., power applications, GIS, asset management, and so on) and scale it up to an enterprise model. Key challenges with this approach include:

  • Time-consuming
  • Difficulty arriving at the collective agreement of semantics across all uses
  • Varying formats and change rates of mapping sources (i.e., inconsistencies due to revisions, upgrades, and replacements)
  • Understanding future requirements
  • Late realization of significant flaws in those models that become ‘showstoppers’ for enhancements

Another approach is to use an industry-standard information model such as the IEC Common Information Model (CIM; IEC 61968/61970/62325), which provides standard terminology for enterprise semantics. One advantage of CIM is that it offers an excellent foundational data model for most functional domains in the electric utility industry. With contributions from industry experts and product companies, CIM has become more comprehensive and more abstract than custom-built models, and it has the benefit of not pushing the integrator or analytics designer into the proverbial corner. Many product vendors have also implemented CIM (with a certain level of customization) as part of their software solutions. It scales up well because its foundational design is built to support multiple disparate business functions simultaneously.

At SCE, we have adopted the industry-standard IEC Common Information Model (IEC-CIM) as the foundation for our data model. However, our approach is not limited by what is available in the standard; instead, we use it as a foundational model and extend it to cover enterprise information needs. Take the example of Distributed Energy Resources (DER). CIM does not have enough coverage for this functional domain, but it provides some foundational building blocks. Hence, it is essential to extend the model to include the missing aspects. The extended model is referred to as the SCE Common Information Model (SCIM), which provides a shared vocabulary for all information assets to manage and facilitate various business processes. The image below is an example of an extended model for DER.


Note: The extension details and information modeling for DER will be covered as part of the forthcoming DER white paper.
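The extend-the-standard pattern can be sketched in code. `IdentifiedObject` (with its `mRID`) and `PowerSystemResource` are genuine IEC-CIM base classes; the `DERUnit` class and its attributes below are illustrative stand-ins for a SCIM-style extension, not the actual SCIM definitions.

```python
from dataclasses import dataclass

@dataclass
class IdentifiedObject:
    """CIM base class: everything carries a master resource identifier."""
    mRID: str
    name: str = ""

@dataclass
class PowerSystemResource(IdentifiedObject):
    """CIM base class for resources in the power system."""
    pass

@dataclass
class DERUnit(PowerSystemResource):
    """Hypothetical extension covering a DER gap in the standard.
    Attribute names here are assumptions for illustration only."""
    rated_kw: float = 0.0
    der_type: str = "unspecified"   # e.g., solar PV, battery storage

# An extended-model instance still behaves as a standard CIM object,
# so standards-aware tooling and mappings continue to work.
solar = DERUnit(mRID="DER-001", name="Rooftop PV", rated_kw=4.5,
                der_type="solarPV")
```

Because the extension inherits from the standard classes rather than replacing them, data described with the extended model remains traceable back to the IEC-CIM vocabulary.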

 

While the effort started with the primary goal of creating a data model for analytics, having an enterprise semantic model helps enable capabilities for both data-in-motion and data-at-rest. As shown in the figure below, the SCIM serves as the logical model on which all semantically aware design artifacts are based, such as those for integration services, data warehouses, Operational Data Stores (ODS), reporting, analytics, etc. For example, the model can be easily converted into an interface exchange model for system integration.

Figure 1: The SCIM provides a unified data model that integrates data from disparate sources to provide an end-to-end view of data (Data-at-Rest & Data-in-Motion)

 

The process and framework

We have all heard plenty of stories about failed attempts to develop enterprise semantic models, primarily at the implementation stage. Developing a data model in isolation, disconnected from the rest of the process and not integrated with real business or project goals, often leads to failure. A purely academic exercise, out of alignment with organizational goals, is a recipe for disappointment.

Success can only be ensured if the model can be successfully deployed (Data-at-Rest or Data-in-Motion) and used for system implementations in a timely and cost-effective manner.

Even though IEC-CIM provides a great start, adopting an industry standard like IEC-CIM does not by itself guarantee success. The adoption of IEC-CIM has its own challenges:

  • Gaining acceptance from stakeholders
  • Converting the logical model to an implementation model
  • Additional semantic mappings to develop and maintain
  • The complexity of understanding and using the standards
  • Differences in the format of mapping sources
  • Possible internal model vulnerability to external model changes
  • Integrating the modeling effort into the overall project effort
  • The effort to develop the model
  • Extending the standards to match requirements

To overcome these challenges, we have adopted a systematic and iterative data modeling approach. Instead of jumping in to develop a data model for all data subject areas applicable to the entire energy utility business domain, we followed a use-case-based approach, i.e., modeling the subject areas relevant to a use case. Going back to the use case "to calculate load and DER profiles," we focused on modeling Asset, Connectivity, etc. All the tasks associated with model development were included in the project plan, which ensured rigor, cost and schedule accountability, and visibility for the effort. The figure below illustrates the framework we have adopted.

Figure 2: Framework for data model development

The framework comprises the tools, technologies, standards, governance, roles/people, and processes that need to work together to achieve the desired results. At the core of this framework are the various roles involved and the processes driving the iterative data model development. It is important to note that the modeling steps illustrated above will go through multiple iterations before the model is used for analytical model deployment.

 

ROLES, RESPONSIBILITIES, AND SKILL SETS

Data Scientist

Responsibilities:
  • Variables or features selection
  • Work with the data engineer to optimize the data structure (access, performance, etc.)

Skill set:
  • Ability to design solutions that meet business requirements and specify system (non-functional) requirements

Business Subject Matter Expert

Responsibilities:
  • Share insight into how the selected data set is used for various business decision processes
  • Share the business value and challenges (data quality) associated with the data set
  • Share data classification information

Skill set:
  • Ability to clearly articulate data inputs, outputs, and classifications, and communicate desired business outcomes

Data Engineer

Responsibilities:
  • Gathering the data: identifying the system of record or system of truth for the identified variables or features
  • Work with the data scientist to identify data quality requirements
  • Gap analysis (what is or is not available in a system of record or system of truth)
  • Support the data modeler in logical data model development
  • Converting the logical model to the physical model (optimizing the data model for the selected deployment platform)
  • Organizing/preparing the data model for ease of access and performance
  • Work with the data scientist to integrate the data model with the analytical model
  • Designing the data integration process and overseeing the implementation

Skill set:
  • Understanding of the existing enterprise domain and system landscape
  • Utility domain knowledge (foundational)
  • Expert in the deployment platform of choice (database/data warehouse/data lake technologies, etc.)
  • Expert in data integration
  • Good understanding of data modeling

Data Modeler

Responsibilities:
  • Gap analysis between the selected foundational model (e.g., IEC-CIM) and business requirements
  • Map data elements (covering the requirements) to the system of record or system of truth information model to understand gaps and extension requirements
  • Extend the model for the gaps identified
  • Generate the logical model
  • Make sure the model is comprehensive (for example, when modeling a power transformer, it should contain all attributes applicable to a power transformer, not just those required for the selected use case)

Skill set:
  • Strong understanding of the utility domain
  • Expert in industry-standard models (e.g., IEC-CIM)
  • Expert in data modeling
  • Understanding of database/data warehouse/data lake technologies

Note: The roles and responsibilities listed above are for developing the data model and do not cover the end-to-end life cycle of analytical model development.

 

Aligning with analytical model development methodology: CRISP-DM

Now the question is how to integrate the data modeling effort into the analytical model development life cycle. Even though there are different approaches to data mining and developing analytical models, CRISP-DM (Cross-Industry Standard Process for Data Mining) is the most widely used methodology for analytical projects, including advanced analytics.

As per the CRISP-DM process, the "Data Preparation" phase consists of activities to prepare the final data set from the raw data received from multiple sources. Data modeling should be part of this phase, as it aligns with the other activities: gathering the data, discovering and assessing the state of the data, and transforming and enriching the data to meet the use case needs. Close collaboration between the data engineer, data modeler, and data scientist ensures that the data preparation phase is successful.
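The data preparation activities can be sketched as a small pipeline. The function names, source names, and cleaning rules below are illustrative assumptions (CRISP-DM prescribes the activities, not this code): gather raw records from multiple sources, assess them for quality, and transform them into a consistent shape.

```python
def gather(sources):
    """Collect raw records from multiple source systems into one list."""
    return [rec for recs in sources.values() for rec in recs]

def assess(records):
    """A simple data-quality check: drop records with missing values."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records):
    """Normalize units so all readings are in kWh."""
    out = []
    for r in records:
        r = dict(r)  # copy so the raw input is left untouched
        if r.get("unit") == "Wh":
            r["kwh"], r["unit"] = r.pop("wh") / 1000.0, "kWh"
        out.append(r)
    return out

# Two hypothetical sources with inconsistent units and a missing value.
sources = {
    "ami": [{"kwh": 1.5, "unit": "kWh"}, {"kwh": None, "unit": "kWh"}],
    "scada": [{"wh": 2500.0, "unit": "Wh"}],
}
prepared = transform(assess(gather(sources)))
```

The enterprise data model's role in this phase is to define the target shape that `transform` produces, so every use case prepares data to the same vocabulary instead of a per-project one.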

Data modeling beyond analytics

As mentioned above, the scope of data modeling is not limited to analytics; it goes beyond that. Take the example of system integration. The challenges with integrating different systems are many, and they begin with the way the systems are procured: when a specific vendor product is purchased, vendors are driven by the procurement process to meet user requirements at the lowest cost. Each acquired application has a unique mixture of platform technologies, databases, communications systems, data formats, and application program interfaces. While utilities prefer products that support industry-standard interfaces, another high priority is for product vendors to supply application interfaces that remain relatively stable across product releases.

Even though it may not be practical to expect every system-to-system interaction developed in the organization to use a standardized message model, having an enterprise semantic model helps start the discussion about the data and drive toward message standardization.

Take the example of asset and grid connectivity information, which is vital across many enterprise systems such as asset management, work management, mapping and GIS, mobile applications, engineering and planning, and more. At SCE, we have developed a system of truth for grid connectivity information, integrating data from GIS, asset management, EMS, and other operational applications. The system provides the information as APIs (Application Programming Interfaces) using a SCIM-based standard exchange model. These APIs provide a complete set of electrical network connectivity, covering Transmission, Sub-Transmission, Distribution Primary, Distribution Secondary, and Substation Internals, serving multiple application needs using a standards-based message model.

Figure 3: Grid Connectivity Information- System of Truth - Common model used as a physical data model and exchange model.
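The exchange-model idea can be illustrated with a short sketch: connectivity objects held in the common model are serialized into one message shape that every consumer reads. The payload structure, field names, and identifiers below are illustrative assumptions, not the actual SCE API contract.

```python
import json

# Illustrative connectivity records: line segments linking connectivity
# nodes, tagged with their network level (names are hypothetical).
segments = [
    {"mRID": "ACL-100", "fromNode": "CN-1", "toNode": "CN-2",
     "level": "DistributionPrimary"},
    {"mRID": "ACL-101", "fromNode": "CN-2", "toNode": "CN-3",
     "level": "DistributionSecondary"},
]

def connectivity_message(segments, network_level=None):
    """Build an exchange-model payload, optionally filtered by
    network level, so all consumers receive the same message shape."""
    selected = [s for s in segments
                if network_level is None or s["level"] == network_level]
    return json.dumps({"GridConnectivity": {"LineSegments": selected}})

msg = connectivity_message(segments, network_level="DistributionPrimary")
```

Because the message vocabulary comes from the shared model rather than from any one source system, a GIS replacement or EMS upgrade changes only the mapping into the model, not every consuming interface.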

Data modeling can also help to create a standard view of data across the enterprise, enabling data quality and data governance efforts. This includes defining common terminology, semantics, and implementation, along with developing semantic traceability and lineage of data maintained across the organization.

Summary

Data modeling not only helps to validate a shared understanding of the data between business and IT but is also a very useful tool for analyzing and extracting value from available data. It constitutes a crucial step in the analytics development cycle. Focusing on this step enables electric utilities to manage data systematically through the data lifecycle (capture, organize, analyze, and deliver) to achieve the desired outcome.

While venturing into analytics or major system integration projects, organizations need to focus on their Enterprise Information Management (EIM) strategy. A well-designed EIM requires business units and IT to treat enterprise data and information as assets and to understand the nature of the information and how it is used and controlled. This effort includes addressing critical issues around data definition, quality, integrity, security, compliance, access and generation, management, integration, and governance. These issues are interrelated and systemic, which requires business units and IT to work together to understand and solve the challenges. The process is iterative, which requires a holistic and evolutionary EIM strategy and framework to ensure a consistent and practical approach.

While utilities are looking ahead to incorporating unstructured data, such as HD drone video and LiDAR scans of assets, to better understand their condition, there is plenty of structured data already available in the organization that can enable advanced analytics, both real-time and historical. Furthermore, tremendous benefits can be derived by connecting structured data with unstructured information to enable deep and valuable insights. Having a comprehensive Enterprise Information Model, the SCIM, along with an EIM strategy, is helping SCE take advantage of these precious data assets.