Note from Tom:
I have moved to Substack as my primary blog platform. If you want to see all my new posts, as well as my 1200+ legacy posts dating from 2013, please support me by becoming a paid subscriber to my Substack blog. The cost is $30 a year. Thanks!
One of the biggest concerns in the NERC CIP community today is the fact that, even though NERC entities may now store and utilize BES Cyber System Information (BCSI) in the cloud without fear of compliance consequences, there remain three important activities that NERC entities with high or medium impact BES environments are effectively not able to engage in, due to the impossibility of proving compliance with a large number of CIP requirements.
This is not because any CIP requirement prohibits cloud use, since no current requirement even mentions the cloud. Rather, it is because CIP version 5, the CIP version that introduced almost all the CIP terms used today, was drafted between 2010 and 2012, when the cloud was still considered by most of the NERC community to be experimental and too risky to use; moreover, almost nobody foresaw that the situation might be different in the future. Therefore, the team that drafted the requirements and definitions never even considered the possibility of systems being deployed anywhere but onsite.
Of course, the situation is different today, although most NERC entities make far more use of the cloud on the “IT side” of their business than on the “OT side”. What little OT use of the cloud there is includes:
A. OT systems that are not subject to CIP compliance at all (i.e., systems that do not meet the definition of BES Cyber System);
B. A modest amount of BCSI used in the cloud, mainly as data for one SaaS application. CIP-004-7, which came into effect on January 1, 2024, cleared away the three Requirement Parts[i] that had unintentionally prevented use of BCSI in the cloud and replaced them with the new CIP-004-7 Requirement 6 and the revised CIP-011-3 Requirement 1 Part 1.2. However, absent solid guidance on how to comply with these two requirements – and especially on the compliance documentation the SaaS provider will need to create for NERC entities with high or medium impact BES environments – most NERC entities have been reluctant to “test the waters” on BCSI in the cloud.
C. A handful of low impact Control Centers in the cloud. There should be no adverse compliance consequences for deploying low impact systems in the cloud, mainly because the Cloud Service Provider (CSP) should not need to provide any special compliance documentation to a NERC entity with low impact BES Cyber Systems; the entity should be able to produce all the documentation it needs on its own. However, as in the case of BCSI in the cloud, the fact that there has been almost no guidance on this topic has made these entities reluctant even to test implementing a low impact Control Center in the cloud.
“Forbidden” cloud activities
The main problem with CIP compliance in the cloud is that there are three activities that are effectively “forbidden” to NERC entities with high and medium impact BES environments today, although no CIP requirement specifically forbids any of them. This penalizes the NERC entity’s productivity, flexibility, and finances. Even more importantly, the fact that these three activities cannot be performed has led to a lower level of cybersecurity for many NERC entities, due to having fewer viable options for cybersecurity services (see below).
Of course, the NERC CIP standards aren’t supposed to diminish the power industry’s level of cybersecurity, but to improve it. Yet that is exactly what is happening in some cases.
The three “forbidden” activities are:
1. Medium and high impact BES Cyber Systems (BCS) deployed in the cloud. In my previous post, I defined “Cloud BCS” by using wording from the BES Cyber Asset definition: “A system that, if rendered unavailable, degraded, or misused would, within 15 minutes of its required operation, misoperation, or non-operation, adversely impact one or more Facilities, systems, or equipment, which, if destroyed, degraded, or otherwise rendered unavailable when needed, would affect the reliable operation of the Bulk Electric System. Redundancy of affected Facilities, systems, and equipment shall not be considered when determining adverse impact.”
Of course, “within 15 minutes” can informally be interpreted to mean “instantaneous”. Since electrical fields propagate at very high speeds, any disturbance in the BES will be felt immediately, sometimes at a long distance. In fact, the big Florida outage of 2008 was detected within microseconds in Alberta, which isn’t even in the Eastern Interconnect.
BES Cyber Systems deployed in the field have a sub-15-minute impact on the BES due to their connection to a device like a circuit breaker or voltage sensor, which controls or monitors the BES. I call this a “real-time connection to the BES”. If the system doesn’t have such a connection, it isn’t a BCS and is not in scope for the CIP standards.
Why is it a problem that medium and high impact BCS can’t be deployed in the cloud? The reason is that most enterprise software products are either already running in the cloud or will be in a few years. In fact, someone who works for a major software company (whose products are heavily used by the power industry) told me recently that it’s no longer possible for a developer to make money on software that is delivered only on premises. While it’s already the case that a lot of the best security software is only available in the cloud (i.e., as SaaS), that will be increasingly true for a lot of other software used in Control Centers[ii]. Reliability is bound to be impacted sooner or later.
2. Electronic Access Control or Monitoring Systems (EACMS) deployed in the cloud. EACMS are defined as “Cyber Assets that perform electronic access control or electronic access monitoring of the Electronic Security Perimeter(s) or BES Cyber Systems. This includes Intermediate Devices.” Of course, this term refers to devices (Cyber Assets) that are installed on premises.
However, the term has no meaning with respect to the cloud, since the term Electronic Security Perimeter (ESP) itself has no meaning in the cloud. An ESP by definition is the “logical border” surrounding a NERC entity’s high or medium impact BCS. The ESP needs to be wholly contained within a Physical Security Perimeter (PSP) that is under the control of the entity.
If a NERC entity has deployed one or more BCS in the cloud, the code for those systems will typically be distributed among multiple devices (physical and virtual) in multiple data centers; moreover, that code will often migrate between devices and even between data centers. There is no way that a NERC entity can even find out where one of their Cloud BCS is located at a given moment, let alone find out exactly what physical access protections are in place for it.
The CIP compliance problem regarding EACMS arises when an on-premises cybersecurity monitoring service or product, used by a NERC entity within a high or medium impact BES environment, moves primarily or entirely to the cloud. When this happens, the on-premises service is either discontinued altogether or deprecated, meaning it is no longer guaranteed to have the same functionality as the cloud-based service. The same problem arises when a NERC entity is searching for a security monitoring service and discovers that most of the available offerings are cloud-based; in some cases, there are no equivalent on-premises services or the available services are more expensive – or both.
The problem is that, even though a cloud-based cybersecurity monitoring service may offer greater functionality and/or lower cost than an on-premises service, if it meets the definition of EACMS – i.e., it monitors electronic access to an on-premises ESP or BCS – then, just like an on-premises EACMS, it must comply with each of the 50+ CIP Requirements and Requirement Parts that apply to EACMS. Therefore, the cloud service provider needs to furnish each NERC entity customer with documentary evidence of compliance with each of those Requirements and Requirement Parts.
Moreover, since the EACMS definition refers to devices, not “systems” or “services”, this probably means the CSP will have to provide evidence to each CIP customer that every physical or virtual device in any cloud data center that supports the cloud-based security monitoring service, or has supported it at any time within the three-year audit period, was compliant with each Requirement and Requirement Part throughout the audit period. Of course, this would be an astronomical amount of documentation, so no CSP will agree to provide it, even if they could do so.
Because of this, NERC entities with high and medium impact BES environments have not been able to utilize a cloud-based security monitoring service since 2016, when CIP version 5 (in which the term EACMS was introduced) became enforceable. In fact, one CIP auditor told me around 2017 that he had had no choice but to require a large electric utility in his NERC Region to discontinue use of a widely used security monitoring service they had been using for several years.
The utility had to replace it with a purely on-premises monitoring product that was less functional (since an on-premises product can’t make use of worldwide threat intelligence in real time, as some cloud-based services can). The auditor said it “broke his heart” to have to do this, because he knew the utility’s Control Centers would be less well protected going forward than they were previously. I’m sure this story is far from unique.
This is the worst aspect of the “CIP/cloud” problem: In many cases, it results in a NERC entity (usually a large one) having a lower level of security than an organization of similar size that does not have to comply with the CIP standards and can freely use cloud-based security services.[iii]
3. Physical Access Control Systems (PACS) deployed in the cloud. A PACS is defined as “Cyber Assets that control, alert, or log access to the Physical Security Perimeter(s), exclusive of locally mounted hardware or devices at the Physical Security Perimeter (PSP) such as motion sensors, electronic lock control mechanisms, and badge readers.”
The CIP compliance problem with PACS in the cloud is almost a mirror image of the problem with EACMS in the cloud. As with EACMS, the PACS definition refers to Cyber Assets (physical devices) that are installed on premises. However, if a NERC entity with a high or medium impact BES environment utilizes a cloud-based service that controls, alerts, or logs access to a PSP, the service provider will have to give the entity evidence that it complied with every CIP requirement that applies to PACS. Moreover, the provider will need to supply this evidence for every physical or virtual device that at any time executed or stored any part of the PACS code during the three-year audit period. This is just as impossible for a PACS service as it is for a service that meets the EACMS definition.
How, and when, can we fix these three problems?
The fact that the above three activities are “forbidden” constitutes the primary problem preventing some NERC entities from fully utilizing the cloud today. However, the good news is I believe the problem can be fully addressed with a) a new requirement in the CIP-002 standard, b) slight changes to three other CIP requirements, c) four new definitions (three of which are based on existing definitions), and d) slight changes to three existing definitions. There also needs to be one new requirement that I will describe in another post soon.
The best part of this news is that there might not be a need for any changes to the NERC Rules of Procedure, which I previously thought would be necessary. While nobody has been able to tell me how the Rules of Procedure can be changed, it will probably require the NERC Legal department to draft changes and get them approved by NERC entities, the NERC Board of Trustees, and FERC. That could easily be a two-year process in itself.
Here are the changes I’m suggesting:
1. The term “System” needs to be defined to include both on-premises and cloud-based systems (it’s not defined now).[iv]
2. A new term “Cloud BES Cyber System” (or similar wording) needs to be defined (perhaps using my suggestion above of basing this definition on the BES Cyber Asset definition, while replacing the reference to Cyber Asset with System).
3. A new requirement needs to be added to CIP-002, which requires the Responsible Entity to identify its Cloud BCS. Because there is no reason why a Cloud BCS needs to be classified as high, medium or low impact, the requirement does not have to reference the “bright line criteria” in Attachment 1 of CIP-002. Therefore, neither Requirement R1 nor Attachment 1 will need to change.
4. A new term, “Cloud Electronic Access Control or Monitoring System” (“Cloud EACMS”), needs to be defined. It will probably be based on the current EACMS definition, but with “Cyber Asset” replaced by “System”.
5. A new term, “Cloud Physical Access Control System” (“Cloud PACS”), needs to be defined. It will probably be based on the existing PACS definition, but with “Cyber Asset” replaced by “System”.
6. The existing definitions for BCS, EACMS and PACS need to be revised to make clear that they only apply to on-premises systems. This will allow almost all the existing CIP requirements to remain unchanged. If there are ever CIP requirements that apply to Cloud BCS, Cloud EACMS, and/or Cloud PACS, they will be separate from the on-premises requirements.
7. Almost no changes need to be made to the existing CIP requirements, meaning they will continue to apply only to on-premises systems. There are three exceptions: the three “BCSI requirements”, CIP-004-7 R6, CIP-011-3 R1 and CIP-011-3 R2. These requirements need to be made applicable to a) BCS and Cloud BCS, b) EACMS and Cloud EACMS, and c) PACS and Cloud PACS. The three requirements were revised effective January 1, 2024, to apply to BCSI stored both on-premises and in the cloud. However, when they were drafted (starting in 2019), there was still no discussion of having cloud-based BCS, EACMS or PACS.
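The pattern in items 4 and 5 above (take an existing definition and replace “Cyber Asset” with “System”) can be sketched in a few lines of Python. This is only an illustration, not proposed Glossary wording: the two definition strings are the ones quoted earlier in this post, while the helper name `cloud_variant` and the “Cloud ...” dictionary keys are hypothetical names chosen for the example.

```python
# Illustrative only: derive draft "Cloud" definitions by swapping
# "Cyber Assets" for "Systems", as items 4 and 5 suggest.
# The definition texts are the ones quoted in this post; the helper
# name and dictionary keys are hypothetical.

EXISTING_DEFINITIONS = {
    "EACMS": (
        "Cyber Assets that perform electronic access control or electronic "
        "access monitoring of the Electronic Security Perimeter(s) or BES "
        "Cyber Systems. This includes Intermediate Devices."
    ),
    "PACS": (
        "Cyber Assets that control, alert, or log access to the Physical "
        "Security Perimeter(s), exclusive of locally mounted hardware or "
        "devices at the Physical Security Perimeter (PSP) such as motion "
        "sensors, electronic lock control mechanisms, and badge readers."
    ),
}


def cloud_variant(definition: str) -> str:
    """Return the definition with the device wording replaced by "Systems"."""
    return definition.replace("Cyber Assets", "Systems")


# Build the draft cloud terms from the existing ones.
CLOUD_DEFINITIONS = {
    f"Cloud {term}": cloud_variant(text)
    for term, text in EXISTING_DEFINITIONS.items()
}

if __name__ == "__main__":
    for term, text in CLOUD_DEFINITIONS.items():
        print(f"{term}: {text}\n")
```

A mechanical substitution like this only shows how little wording would have to change. Any real definition would still have to go through drafting and balloting, and it says nothing about where the System runs, which is why item 6 separately limits the existing terms to on-premises systems.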
The looming deadline
The changes described above (as well as the new requirement that I will describe in a future post) should be fairly easy to get drafted and approved. That’s good, because there is now a relatively near-term deadline by which these changes need to be made. That deadline is October 1, 2028, the date that compliance with CIP-015-1 will become effective for Control Centers.
As you may know, CIP-015 is a new standard – approved by FERC earlier this year – for Internal Network Security Monitoring (INSM). Because some of the worst cyberattacks in recent years might have been avoided if the organization attacked had INSM in place, the NERC CIP community has shown a lot of interest in this standard, during both drafting and the approval process.
Like some other types of security monitoring, INSM is probably best performed from the cloud; this is why a large percentage of INSM services are exclusively cloud-based. However, it is also possible that many of those cloud-based services could be considered by a CIP auditor to perform “electronic access monitoring”, meaning they are covered by the EACMS definition. If that turns out to be the case, NERC entities will be afraid to use those services, since the service provider will never be able to document their compliance with the 50+ CIP Requirements and Requirement Parts that apply to EACMS. Thus, there is a real concern that, if the EACMS problem isn’t fixed (or at least on a clear path to being fixed) before the compliance date, NERC entities with high or medium impact BES environments will not be able to find quality INSM service options.
Can the EACMS problem, along with the BCS problem and the PACS problem, be fixed by October 1, 2028 (which is about two years and 11 months from today) – meaning changes to the CIP standards are drafted, approved by NERC and FERC, and have gone through whatever implementation period is specified in the Implementation Plan? I think it’s possible, but it will require the Risk Management for Third-Party Cloud Services Standards Drafting Team (SDT) to shift from what they’re working on now and to start drafting changes like what I’ve proposed. If they aren’t willing to do that, perhaps a new Standards Authorization Request (SAR) for these changes can be submitted and a new SDT constituted to address that SAR.
You may have noticed that the steps I’m proposing have little to do with cybersecurity and everything to do with fixing problems in the wording of the CIP requirements that have inadvertently prevented cloud use; these problems can be fixed with simple steps. However, I’m not denying there are fundamental risks inherent in cloud use that don’t apply in the on-premises world and thus are not addressed in the current CIP standards.
In fact, I’m not sure if these “purely cloud” risks are adequately addressed in any cybersecurity risk management frameworks, mandatory or otherwise. A good example of these risks was the risk that became apparent to the whole world a week ago; another is the “multi-tenancy” risk, which is huge but also very hard to define, let alone design mitigations for.
The problem with risks like these is it’s difficult even to state them properly, let alone to describe realistic mitigations for them; it’s even more difficult to draw up mandatory requirements (with potential million-dollar penalties!) based on them. Besides these difficulties, it’s certain that any requirements that the SDT drafts for these risks will meet with a lot of criticism from various points of view. This is almost as bad as the situation in the current US Congress, where even a bill to name a post office after George Washington might go down in a hailstorm of criticism.
This post started out as an attempt to revise my last estimate (from December 2024) of the time it will take for the current SDT to draft new standards, get them through the lengthy NERC balloting process, get them approved by FERC (which isn’t likely to approve them quickly), and finally to wait for what’s likely to be a 2-3 year implementation period: I came up with 6 ½ years.
However, as I started to revise my estimate, I looked at the 17 or 18 “purely cloud” risks that the SDT and I have identified so far. (The SDT’s 12 or so cloud risks are listed in their revised SAR, approved last December. I identified three more in this post and another in this post, plus two more risks I described in two posts that were linked just above.) I realized that coming up with mandatory requirements to mitigate each of those risks, along with at least a few new definitions related to each risk, will easily take six months of the SDT’s time per risk. (Both new definitions and new or revised standards need to be balloted by NERC entities, in a complicated process that will take at least a year and involve at least four ballots.) That means it will take the SDT about 9 years just to deal with the cloud risks identified so far. Moreover, I’m sure there are many other cloud risks that should be addressed but that neither I nor the SDT has identified so far. Plus, more cloud risks are being identified all the time (and they will continue to be identified in the future[v]); to be safe, I’ll round this estimate up to an even dozen years.
Coupled with everything else that’s on the SDT’s plate[vi], it’s safe to say the SDT will struggle to complete everything in much less than two decades. Of course, that’s ridiculous. In fact, even my previous estimate of 6 ½ years is ridiculous. NERC entities (admittedly not all of them) have been clamoring for years about their need to be able to make full use of the cloud. They might be persuaded to wait three or four more years, if NERC can show them there’s a clear path to fixing the problem in that time frame. If they wait 3-4 years and then NERC tells them that - Scout’s Honor - the problem will be fixed for sure in another 3-4 years, there’s going to be h___ to pay.
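For readers who want to check the arithmetic behind these estimates, here is a minimal sketch in Python. Every number is one of this post’s own estimates (including the figures in footnote vi); the variable names are illustrative, and the sketch assumes the work is done sequentially, so the result is a rough planning figure rather than a forecast.

```python
# Back-of-the-envelope check of the timeline figures quoted in this post.
# All inputs are the post's own estimates; names are illustrative, and
# the tasks are assumed to run one after another.

MONTHS_PER_RISK = 6           # SDT time to address one cloud risk
RISK_COUNTS = (17, 18)        # "purely cloud" risks identified so far
ROUNDED_RISK_YEARS = 12       # the post's rounded-up allowance for new risks
REWRITE_YEARS = 6             # rewriting the CIP standards (footnote vi)
RULES_OF_PROCEDURE_YEARS = 2  # Rules of Procedure changes (footnote vi)


def years_for_risks(risk_count: int, months_per_risk: int = MONTHS_PER_RISK) -> float:
    """Years of drafting time if each risk takes a fixed number of months."""
    return risk_count * months_per_risk / 12


low_years, high_years = (years_for_risks(n) for n in RISK_COUNTS)
rewrite_total = REWRITE_YEARS + RULES_OF_PROCEDURE_YEARS
overall_years = ROUNDED_RISK_YEARS + rewrite_total

if __name__ == "__main__":
    print(f"Known cloud risks: {low_years} to {high_years} years")
    print(f"Rewrite plus Rules of Procedure: {rewrite_total} years")
    print(f"Overall: about {overall_years} years")
```

The sketch leaves out the three years that footnote vi allots to the revisions described earlier in this post, since some of that work might overlap with the rest. Including it would only push the total further past two decades.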
However, there’s no way that the current SDT – or any SDT, for that matter – could deliver both my “quick fix” solution and a comprehensive treatment of all major cloud risks (which to my knowledge the SDT hasn’t discussed at all since they drafted the SAR) within 3-4 years, unless the SDT were composed of Merlin, Houdini, Penn and Teller, and David Copperfield (and I’m not even sure they could do it).
If NERC entities are finally going to be allowed to make full use of the cloud, the work of addressing cloud risks will need to be removed from the NERC standards development process altogether. That is, there will need to be some process in place by which the power industry, along with CSPs and SaaS providers, identifies current cloud risks to OT systems, assesses these for their relevance to the electric power industry, and gathers these into a comprehensive risk management framework. As new cloud OT risks are identified in the future, they will be assessed for relevance and (if found to be relevant) incorporated into the framework. Obviously, the body that does this will have to be constantly “in session”, although this will mostly involve discussing new risks through a secure portal, with maybe one in-person meeting every year.
Unlike the CIP standards, compliance with this framework will be the job of the Platform CSPs and SaaS providers, not the electric utilities and IPPs. It will be voluntary, since neither NERC nor FERC has any jurisdiction over any organization outside of the electric power industry (in NERC’s case) or interstate energy production and distribution (in FERC’s case). Because the CSPs and SaaS providers will be encouraged to participate in this body from the get-go, they should have plenty of incentive to incorporate this framework into their OT risk management process (you’ll notice I didn’t say “comply with” the framework, since “compliance” with a voluntary risk management framework is an oxymoron like “British cuisine” or “artificial intelligence”).
I expect to be discussing this idea a lot in the future. For one thing, there will need to be a nonprofit organization (either existing or new) that hosts this group and can accept tax-deductible contributions to fund it. I have a few ideas for who that organization might be, but I’d like to hear yours. The best way to do this would be to put them in the chat for my blog, since that will reach all of my subscribers, both free and paid (you will have to be a subscriber yourself, either free or paid). No promotions, please; I will remove any that I find. You can also email me if you don’t want to make what you have to say public.
See you in the chat!
If you would like to comment on what you have read here, I would love to hear from you. Please email me at [email protected] or leave your comment on this blog’s Substack community chat.
[i] These were CIP-004-6 R4.1.3, R4.4, and R5.3.
[ii] Systems that run in substations and generation facilities, with the possible exception of renewables generation, usually are so time-sensitive that physically locating them elsewhere (like in a Control Center or the cloud) would introduce too much latency. This is why I think Control Centers and renewables generation facilities (mainly wind and solar farms) are most likely to utilize software that could be outsourced to the cloud.
[iii] Another way in which the fact that NERC entities can’t make free use of the cloud in their OT environments actually harms the security of those entities is described in this post.
[iv] There may need to be a NERC definition of “cloud” as well, unless there’s agreement in the NERC community that the term is well enough understood that no definition is needed.
[v] This is another big question mark. Since new cloud risks are being identified all the time, we can’t rely on the NERC standards development process to address them; it’s way too slow. For example, the risk of a laptop being connected to a network and spreading malware to the other devices attached to the network was identified in the late 1990s. But when was CIP-010-2 R4, the CIP requirement that first addressed that risk, implemented? 2017, about 19 years later.
And how long did it take between when ransomware was recognized as a big risk and when the CIP ransomware requirement was implemented? That’s a trick question, since there still is no ransomware requirement, and none has even been proposed. The NERC standards development process has become such an exhausting slog that even protection against ransomware isn’t deemed worthy of the effort. This is despite the fact that in 2018 the Control Centers of one of the largest electric utilities in the country were completely shut down for close to 24 hours, forcing the staff to run a multistate power grid in real time via cell phones. Moreover, the ransomware never even penetrated the Control Center.
Meanwhile, the bad guys, with the help of AI, are continually reducing the time it takes to develop code to exploit a new vulnerability; in fact, that time, which used to be measured in months, is now down to…14 minutes.
Clearly, the NERC community can no longer afford to wait for the CIP standards to be revised whenever a new risk is discovered. There needs to be some body, consisting of NERC entities, NERC and Regional staff members, CSPs and SaaS providers, and maybe the general public, that identifies new cloud risks and decides whether they should be mitigated by the electric power industry, based on the risk profile of each electric utility and independent power producer.
[vi] This includes making the revisions I described earlier, which will take at least three years, as well as a new task that the SDT seems to have added recently (even though it’s not included in their SAR): rewriting the current CIP standards as objectives-based and incorporating both on-premises and cloud-based systems. I agree that needs to be done, but that process alone will easily take six years (which is the amount of time that was required the last time the CIP standards were completely rewritten, for CIP version 5).
However, my guess is that rewriting the CIP standards as the SDT wants to will take significantly longer than six years, since achieving their goal of making the new standards applicable both to on-premises and cloud-based systems is likely to entail multiple changes to the NERC Rules of Procedure. As I stated earlier in this post, nobody I’ve talked to seems to know how the RoP can be changed, but the process is likely to include balloting by the NERC Registered Entities, as well as approval by both the NERC Board of Trustees and FERC. Given this uncertainty, I think two years is a conservative estimate just for this step. So, 8 years is a conservative estimate just for rewriting the current CIP standards.