Leveraging AI for Improved Recordkeeping and Analytics at Utilities

Gas and electric utilities manage vast numbers of unstructured documents, from project construction records with engineering drawings and materials specifications to operational documents such as inspection reports, maintenance logs, and procedural standards. These documents are often scattered across systems and formats, posing significant challenges in compliance, operational efficiency, and risk management. Artificial Intelligence (AI) offers a transformative opportunity not only to facilitate the standardized ingestion of new documents but also to assist in cleansing historical records, so that their key data can be unlocked and extracted.

This article explores a three-phase approach to incorporating an AI records and data management solution within the organization: (1) ingesting new unstructured content according to a standardized taxonomy, (2) cleaning and structuring historical records, and (3) leveraging clean data for business intelligence and decision-making. It also briefly highlights additional considerations, such as data governance, change management, system architecture, and ROI, that shape the landscape for AI record management solutions at a gas utility.

Although this article focuses on gas utilities, these strategies and considerations apply generally to all utilities.

Phase 1: Intelligent Ingestion of New Content

Problem Statement:
New documents enter a utility's ecosystem continuously through myriad channels in hundreds of record types, such as field inspections, contractor submissions, field test results, or service logs. Without an established taxonomy for accommodating this inflow, these documents end up inconsistently named, poorly attributed, and difficult to retrieve, with key structured information locked inside an unstructured format.

AI-Powered Ingestion Solution:

Alignment to a Defined Taxonomy:
Defining a standard for structured and unstructured data allows the required types, tags, and metadata fields to be applied to all documents entering the organization. Industry standards, such as ISO 15926 or CFIHOS (Capital Facilities Information Handover Specification), provide a good foundation and can even be adopted in full. Once the target state is known, AI can classify new content into structured digital repositories, and AI models can be trained to map incoming document attributes to the corporate taxonomy. Establishing this vision for corporate data standards is arguably the most important strategic component of this problem space.
Natural Language Processing (NLP) & Optical Character Recognition (OCR):
AI models equipped with OCR and NLP can ingest scanned documents and text files, recognize and apply document types, apply record naming standards, extract metadata required by the corporate taxonomy (e.g., date, location, financial attributes, asset attributes), and apply retention policies.
Smart Validation and Feedback Loop:
This ingestion process can include AI-based validation rules to flag anomalies, such as missing location data or mismatched pipeline IDs. Through human-in-the-loop learning, the system continuously improves its accuracy. Moreover, many data attributes extracted or predicted by AI will always require review and acceptance by a subject matter expert.
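
The validation step described above can be sketched as a small set of rules applied to the metadata extracted from each document. This is a minimal illustration, not a reference implementation: the required field names, the pipeline ID format, and the function name are all assumptions made for the example, and a real deployment would take them from the corporate taxonomy and asset system.

```python
import re

# Hypothetical ID format (e.g. "PL-00123"); a real utility would use its own.
PIPELINE_ID_PATTERN = re.compile(r"^PL-\d{5}$")

# Hypothetical required fields drawn from a corporate taxonomy.
REQUIRED_FIELDS = ("document_type", "date", "location", "pipeline_id")


def validate_metadata(metadata, known_pipeline_ids):
    """Return a list of anomaly flags for one document's extracted metadata."""
    flags = []
    # Flag any required field that is absent or empty.
    for name in REQUIRED_FIELDS:
        if not metadata.get(name):
            flags.append(f"missing {name}")
    # Check the pipeline ID's format, then check it against known assets.
    pipeline_id = metadata.get("pipeline_id")
    if pipeline_id:
        if not PIPELINE_ID_PATTERN.match(pipeline_id):
            flags.append(f"malformed pipeline_id: {pipeline_id}")
        elif pipeline_id not in known_pipeline_ids:
            flags.append(f"unknown pipeline_id: {pipeline_id}")
    return flags
```

A document that returns an empty list passes these checks; any flagged document would be routed to a reviewer, whose corrections feed the human-in-the-loop learning.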

Example Use Case:
A field inspector uploads a pipe condition report containing handwritten notes on a paper form. AI processes the image, extracts the form fields, validates the asset information, categorizes the findings, generates the metadata, and associates the document with the appropriate maintenance event in the asset management system. Related sketches, photos, or test reports can also be analyzed and evaluated to extract asset condition data.

Phase 2: Cleanup and Structuring of Historical Records

Problem Statement:
Decades of historical documentation exist in disparate formats (scanned paper documents, legacy systems, network drives, etc.) without consistent naming conventions or metadata. These records are hard to search, underutilized, and duplicative; they contain misaligned versions and may not comply with corporate or regulatory standards.

AI-Powered Historical Cleanup Solution:

Document Clustering and Deduplication:
AI models can scan large document sets to identify duplicates, near-duplicates, and logically similar documents. This helps reduce content bloat and highlights canonical versions. Versions can be sequenced to build a revision stack, highlighting incremental changes to the documents over time.
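
As one simple illustration of near-duplicate detection, the sketch below compares documents by the overlap of their word shingles (Jaccard similarity). This is a basic text-similarity technique chosen for the example, not necessarily what a production AI model would use; the function names, shingle size, and threshold are assumptions.

```python
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in the text."""
    words = text.lower().split()
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(docs, threshold=0.6):
    """Return (id_a, id_b, similarity) for pairs at or above the threshold."""
    sets = {doc_id: shingles(text) for doc_id, text in docs.items()}
    pairs = []
    for id_a, id_b in combinations(sorted(sets), 2):
        score = jaccard(sets[id_a], sets[id_b])
        if score >= threshold:
            pairs.append((id_a, id_b, round(score, 2)))
    return pairs
```

Comparing every pair of documents scales quadratically, so a large historical corpus would typically need an approximate method such as MinHash with locality-sensitive hashing.
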
Metadata Enrichment and Standardization:
Machine learning algorithms extract missing metadata, such as asset details, location information, work order references, or financial information, from document content and enrich the metadata fields accordingly. As paper forms transition to electronic capture, historical record attributes need to be extracted and aligned to the corporate data model.
Entity Recognition and Relationship Mapping:
NLP models identify key entities (e.g., contractors, pipeline IDs, regulatory codes) and establish relationships between documents and business processes (e.g., correlating a 2012 permit to a 2013 construction project at a specific location).
Confidence Scoring and Human Review:
Each AI-generated metadata assignment receives a confidence score. Lower-confidence items are flagged for manual validation, ensuring the integrity of critical regulatory records. Certain attributes or record types can be configured to always require a human-in-the-loop review by a qualified business subject-matter expert. Scoring human corrections to AI predictions provides the feedback loop for tuning the models or imposing additional front-end decision-making structures.
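
The routing logic described under Confidence Scoring and Human Review might look like the following sketch. The threshold value, the list of always-reviewed fields, and the `Prediction` structure are hypothetical; in practice they would be configured by the business.

```python
from dataclasses import dataclass

# Hypothetical settings; the business would choose its own threshold and fields.
REVIEW_THRESHOLD = 0.85
ALWAYS_REVIEW_FIELDS = {"pipe_material", "weld_type"}


@dataclass
class Prediction:
    document_id: str
    field_name: str
    value: str
    confidence: float  # model-reported score between 0.0 and 1.0


def route(prediction):
    """Return 'auto_accept' or 'human_review' for one predicted attribute."""
    # Some attributes always go to a subject-matter expert, whatever the score.
    if prediction.field_name in ALWAYS_REVIEW_FIELDS:
        return "human_review"
    # Otherwise only lower-confidence predictions are flagged for review.
    if prediction.confidence < REVIEW_THRESHOLD:
        return "human_review"
    return "auto_accept"
```

Checking the always-review list before the score means a high-confidence prediction for a critical attribute still reaches a reviewer.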

Example Use Case:
Historical inspection records are digitized and processed to extract locations, pipe materials, and weld details, correlating them with existing asset records to retroactively enhance asset tracking accuracy. This thorough cross-referencing at the data layer supports the Traceable, Verifiable, and Complete (TVC) recordkeeping required by PHMSA (Pipeline and Hazardous Materials Safety Administration).

Phase 3: Leveraging Cleaned Data for Strategic Advantage

Opportunity Statement:
While AI can be applied to as-is data at any point, the above companion strategies for ongoing ingestion and historical excavation against a taxonomy and data model are necessary to more fully realize AI’s potential. Clean, structured data enables powerful analytics, compliance tracking, and operational foresight once unstructured content has been leveraged to fill the gaps in existing structured data.

Sample AI-Driven Opportunities:

Predictive Maintenance:
AI models can analyze historical maintenance logs, failure reports, and sensor data to identify patterns that predict future asset failures. This can drive adjustments to maintenance schedules and reduce unplanned outages.
Automated Summaries for Compliance Reporting:
Some regulatory reporting could be automated by AI models that understand required document types and then extract and summarize data from the corpus of vetted records required for audits or filings.
Knowledge Retrieval & Decision Support:
Advanced AI-powered search enables business users to ask complex questions, like “Show me all corrosion-related failures on mains installed before 1995 in Zone C,” with links to supporting documents provided in the result set.
Safety Assurance Analysis:
AI systems can automatically flag inconsistencies in inspections, missing documentation, or deferred maintenance. These alerts form the basis of internal audits and safety assurance programs, such as those required per API RP 1173.
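
To make the knowledge-retrieval example concrete: once a question such as the corrosion query above has been translated into structured filters (that translation step is not shown here), answering it is a straightforward filter over clean records. The record fields and function name below are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class FailureRecord:
    asset_type: str       # e.g. "main" or "service"
    zone: str
    install_date: date
    failure_cause: str
    source_document: str  # ID or link of the supporting record


def find_failures(records, cause_keyword, asset_type, zone, installed_before):
    """Return failure records matching all four structured filters."""
    keyword = cause_keyword.lower()
    return [
        r for r in records
        if r.asset_type == asset_type
        and r.zone == zone
        and r.install_date < installed_before
        and keyword in r.failure_cause.lower()
    ]
```

For the example question, the call would be `find_failures(records, "corrosion", "main", "C", date(1995, 1, 1))`, and each hit carries its `source_document` so the result set can link back to the supporting record.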

Example Use Case:
For a specific location, a generative AI model extracts and summarizes the property rights from a body of right-of-way documents, streamlining a business user’s research and analysis.

Additional Considerations

When developing AI solutions for recordkeeping and analytics, there are several related topics that will be included in, or adjacent to, a successful solution implementation. The intent of this section is to briefly discuss these concepts and highlight some of the key considerations.

1. Data Governance and Taxonomy

Establish and enforce standardized data definitions and ownership to ensure consistent, compliant, and auditable data across departments.
Develop and implement a unified data taxonomy to enable efficient data classification and integration to support data-driven decision-making.
Implement audit trails and permission structures to restrict access to sensitive documents.

2. AI Governance and Data Security

Ensure AI tools comply with internal governance policies and external regulations like NERC CIP (North American Electric Reliability Corporation Critical Infrastructure Protection), PHMSA, or state oversight commissions.
Ensure AI tools deliver results appropriate to a business user’s permissions to structured and unstructured data.

3. Change Management and User Adoption

Provide training, intuitive tools, and user support.
Communicate benefits such as improved document access and regulatory readiness to motivate user engagement.

4. System Architecture

Develop an integration strategy using application programming interfaces (APIs) and modular architecture to interconnect key utility system pillars (GIS, SCADA [Supervisory Control and Data Acquisition], Asset Management, Record Management).
Choose flexible, cloud-ready platforms with support for future AI applications, like generative summaries, conversational agents, and real-time analytics.

5. Continuous Learning and Feedback Loops

Enable human feedback on AI decisions to continuously improve accuracy.
Monitor model drift and periodically re-train with new data.
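
One lightweight way to watch for drift is to track how often reviewers accept AI predictions unchanged and compare a recent window against a baseline. The sketch below assumes review outcomes are logged as booleans; the function names and the 5-point tolerance are illustrative assumptions, not recommended values.

```python
def acceptance_rate(review_outcomes):
    """Fraction of AI predictions that reviewers accepted unchanged."""
    if not review_outcomes:
        raise ValueError("no review outcomes supplied")
    return sum(review_outcomes) / len(review_outcomes)


def drift_detected(baseline_outcomes, recent_outcomes, max_drop=0.05):
    """Return True if the recent acceptance rate fell by more than max_drop."""
    drop = acceptance_rate(baseline_outcomes) - acceptance_rate(recent_outcomes)
    return drop > max_drop
```

A `True` result would be a prompt to investigate and possibly re-train, rather than proof that the model has degraded.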

6. ROI and Performance Metrics

Measure impacts, such as reduced document retrieval time, improved business performance, decreased redundant storage, and operational efficiencies.

Optimizing Record and Data Management with AI

AI has the potential to transform recordkeeping and data management at gas utilities, maturing fragmented, paper-heavy environments into streamlined, intelligent systems that support compliance, analytics, efficiency, and risk mitigation. Unlocking this potential begins with data standardization, allowing historical records to be evaluated against the same standards as new incoming documents. From this data foundation, AI’s powers can be fully applied.

Contact UDC for more information on how we can help your organization incorporate AI for recordkeeping and data management.