Three Big Band-Aids
- Posted on August 25, 2011
Self-Healing is a networking concept behind many designs intended to avoid loss of signal when a router or other relay device is unavailable to do its job. This gets the data to its destination, but it does not repair or replace the out-of-service node. Eventually the broken device has to be repaired and returned to service. Repairing a communications node, large or small, requires a technician and probably a "truck roll". The more inaccessible the location (atop a power pole or radio tower, for example), the more costly the repair.
Fault-Tolerant is a synonym for Redundant. A fault-tolerant system has additional components intended to take over in the event of hardware failure. As more and more applications are developed that require 7 x 24 availability, more and more equipment is being purchased with redundant parts. In these cases, high availability is being achieved by buying increasingly large band-aids that mitigate system downtime without improving reliability. Any failure in a fault-tolerant system still requires repair or replacement, or the system is no longer buffered against the next failure. If the equipment is located in a data center, the IT department is paying a hefty premium for vendor maintenance services, on the order of 12% to 15% of original cost each year. If the equipment is located in the field, the O&M department will be dispatching a truck.
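To put the maintenance premium in perspective, here is a minimal sketch of how a flat 12% to 15% yearly charge accumulates. The $100,000 purchase price and the function name are assumptions chosen for illustration; only the percentage range comes from the paragraph above.

```python
# Hypothetical purchase price in dollars; not a figure from any vendor or utility.
PURCHASE_PRICE = 100_000


def cumulative_maintenance(purchase_price: int, rate_percent: int, years: int) -> int:
    """Total maintenance paid over `years` at a flat yearly percentage of original cost."""
    return purchase_price * rate_percent * years // 100


for years in (1, 5, 7):
    low = cumulative_maintenance(PURCHASE_PRICE, 12, years)
    high = cumulative_maintenance(PURCHASE_PRICE, 15, years)
    print(f"{years} yr: ${low:,} to ${high:,}")
```

Under these assumptions, at the 15% rate the maintenance paid passes the original purchase price in the seventh year.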
Truck rolls are extremely costly. NARUC estimated, for purposes of discussion, that each truck roll costs a utility $275, not counting the cost of parts. The more redundant parts that fail, the more costly the system. We all understand that having a spare tire in the trunk is a great backup, but once the spare is in use one still has to replace the spare. We would strongly prefer not to have a flat. Everyone considering deployment of self-healing or redundant systems must take into account that these systems do not physically repair themselves, and must budget for repair costs just as if there were no fault-tolerant parts.
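The arithmetic can be sketched in a few lines. The $275 figure is the NARUC discussion estimate quoted above; the failure counts and the one-truck-roll-per-failed-part rule are assumptions made for illustration, not field data.

```python
TRUCK_ROLL_COST = 275  # dollars per truck roll (NARUC discussion estimate, parts excluded)


def truck_roll_cost(failed_parts: int, cost_per_roll: int = TRUCK_ROLL_COST) -> int:
    """Labor cost of field repairs, assuming one truck roll per failed part."""
    if failed_parts < 0:
        raise ValueError("failed_parts must not be negative")
    return failed_parts * cost_per_roll


# Hypothetical yearly failure counts for the same network built two ways.
single_path_failures = 20  # no redundant parts
redundant_failures = 40    # twice as many parts in the field, same per-part failure rate

single_cost = truck_roll_cost(single_path_failures)
redundant_cost = truck_roll_cost(redundant_failures)

print(f"Without redundancy: ${single_cost:,}")
print(f"With redundancy:    ${redundant_cost:,}")
```

In this made-up case the redundant design keeps the signal up, yet its yearly truck-roll bill is $11,000 against $5,500, because every failed spare still has to be replaced.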
Locating equipment that is suitably solid is a tremendous challenge at this time. There are no "Consumer Reports" to guide choices and no central clearinghouse (yet) for utilities to share information. There are efforts afoot to create these essential collaborations using private resources, but in the meantime utilities are on their own to make good decisions. The best advice we can offer at this time is for utilities to calculate the hardware failure rate of all their current equipment, and then make the same calculation for every product in test or deployment so that products can be compared on that basis. Even very small-scale deployments will yield useful results when tracked over time.
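One simple way to make that comparison is to express failures per unit-year of service. The sketch below assumes that definition of failure rate; the product names, counts, and the `Deployment` class are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    product: str             # hypothetical product label
    units: int               # number of units in service
    years_in_service: float  # observation period in years
    failures: int            # hardware failures seen in that period

    def annual_failure_rate(self) -> float:
        """Failures per unit per year (assumed definition of failure rate)."""
        unit_years = self.units * self.years_in_service
        if unit_years <= 0:
            raise ValueError("units and years_in_service must be positive")
        return self.failures / unit_years


def rank_by_failure_rate(deployments: list[Deployment]) -> list[Deployment]:
    """Return deployments ordered from lowest to highest annual failure rate."""
    return sorted(deployments, key=lambda d: d.annual_failure_rate())


# Made-up tracking data for two products, one of them a small pilot.
deployments = [
    Deployment("Relay B", units=20, years_in_service=1.0, failures=1),
    Deployment("Relay A", units=50, years_in_service=2.0, failures=3),
]

ranked = rank_by_failure_rate(deployments)
for d in ranked:
    print(f"{d.product}: {d.annual_failure_rate():.1%} per unit-year")
```

Normalizing by unit-years lets a 20-unit pilot sit in the same table as a larger, longer deployment, though a small sample will carry more uncertainty until it has been tracked for a while.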
Utilities interested in a wider collaboration and standardized sharing of equipment experience should contact their appropriate trade organization (APPA, NRECA, EEI, or a regional association), Grid21 (a new non-profit outgrowth of the GridWise Alliance), or TekTrakker.