Operational Resilience - What Is It and Why Now?
Operational resilience has become one of the buzzwords of 2023, particularly within enterprise leadership and technology teams. It is one of those rare business topics that cuts across boards of directors, business units, and technology teams alike. That is because operational resilience represents an existential imperative for the digital enterprise.
As businesses continue their relentless march toward digital transformation, new opportunities for growth, efficiency, and customer engagement have emerged. Yet, with the advantages comes a complex array of challenges that enterprises must face to ensure their operations remain stable and resilient. Operational resilience has never been more relevant than today as companies navigate the dynamic landscape of cyber threats, data privacy concerns, rapid technological changes, and complexity. Here’s why operational resilience is becoming increasingly vital in the digital age.
Defining operational resilience
Operational resilience can be defined as the ability of an organization to anticipate, prepare for, respond to, and adapt to incremental changes and sudden disruptions. In simpler terms, it is about being able to bounce back and adapt when things don’t go as planned, to avoid causing problems through everyday operations, and to consistently recognize and mitigate potential risks as they occur and before they cause impact.
This concept isn’t limited to technological systems alone, but since most enterprises are now digital, the solution often involves technology. In an enterprise setting, this resilience ensures that critical business functions and customer services can continue despite adverse conditions – safeguarding the enterprise, its customers, and the society and system in which it operates.
Operational resilience protects against both adversarial incidents, such as cyberattacks, and non-adversarial events, which are often operational in nature. In some technology circles, the term cyber resilience is used to describe the technical aspects of operational resilience, and to an extent the two terms are interchangeable.
A brief history of operational resilience
The term operational resilience began to be used frequently following the 2007-08 financial crisis as a holistic way to consider an institution’s ability to withstand various kinds of disruptions, be they technological, financial, or otherwise.
In more recent times, particularly since 2019, operational resilience has focused more specifically on digital systems. They form the basis of modern enterprise operations, and we have experienced an increasing number of incidents in which digital system failures have caused systemic, institutional, and societal harm.
Before 2000 – Military publications and doctrines describing the ability of forces or systems to maintain capability in the face of adversity. The military term situational awareness informs many modern and digital operational resilience strategies. National, societal, and governmental planning to avoid and respond to natural disasters and other physical threats.
The 1990s through the 2010s – Enterprise Disaster Recovery programs designed to respond to prescribed disasters and physical/environmental events. While earthquakes and tornadoes did occur, incidents not typically anticipated in disaster recovery planning (such as pandemics, degradation by cyberattacks, or complex systems failure) more often caused the most significant impacts.
2007 onwards – Following the 2007-08 financial crisis, operational resilience principles were enshrined within financial services regulations and laws such as the Basel accords and the Sarbanes-Oxley Act, the latter of which doesn’t specifically seek to improve operational resilience but does result in such an outcome. Around 2013, the North American Electric Reliability Corporation (NERC) produced its Critical Infrastructure Protection (CIP) Version 5 Standards to address reliability and resilience in critical infrastructure.
From 2019 – We began to see a slew of consistent guidance and regulation regarding digital operational resilience, particularly relating to the financial services industry:
- 2019 – US FFIEC “Business Continuity Management Handbook”
- 2019 – European Banking Authority “ICT and Security Risk Management Guidelines”
- 2020 – US Federal Reserve / OCC / FDIC “Sound Practices to Strengthen Operational Resilience”
- 2021 – US National Institute of Standards and Technology (NIST) “Developing Cyber-Resilient Systems” (SP 800-160 Vol. 2, Rev. 1), utilized broadly across critical infrastructure sectors
- 2022 to 2023 – Singapore MAS “Guidelines on Business Continuity Management (BCM)”
- 2022 to 2025 – UK FCA / PRA “Building Operational Resilience Rules”
- 2022 to 2025 – European Union “Digital Operational Resilience Act” (DORA)
From 2019, we see regulations and guidance specifically focusing on resilience as the ability to anticipate, withstand, recover from, and adapt to adverse conditions, stresses, attacks, or compromises instead of narrowly addressing recovery planning and testing. Recovery is a necessary aspect of any resilience program, but alone it is insufficient. The approaches taken within the financial services industry since 2019 also align with emerging best practices within other areas of Critical National Infrastructure, and we expect this approach to become the standard for all digital enterprises.
From here on, we focus specifically on digital operational resilience.
Digital operational resilience – the global common factors
The EU’s Digital Operational Resilience Act (DORA) represents a relatively prescriptive and comprehensive consolidation of operational resilience best practices. It is also top of mind for leadership due to its broad scope and severe penalties. We believe that any systematic approach to addressing the articles of the DORA legislation will position organizations well to address other global legislation. It should also provide a significant strategic advantage in terms of resilience, excellence in risk management, customer service, and the ability to innovate and make well-informed, risk-based decisions.
So, let’s look at the key tenets of this risk management framework for operational resilience.
As you can see from the reference architecture, Governance and a process for ICT Risk Management form the cross-cutting core of any operational resilience initiative. This approach ensures that:
a. Ownership and accountability are assigned and enforced from the senior executive level down to practice owners within risk and operational teams, consistent with the criticality identified by regulators and governments.
b. ICT Risk Management becomes a sustainable process that informs IT and business operations, mainly focused on critical business functions and essential business services to ensure that situational awareness is preserved, risks can be effectively managed, and that due diligence has occurred to ensure identified business functions can meet their availability goals under severe but plausible disruptions.
c. Organizations have implemented capabilities to anticipate, prepare for, respond to, and adapt to changes and sudden disruptions.
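One way to make point (b) concrete is to record an availability goal for each critical business function and check it against a set of severe but plausible scenarios. The sketch below is purely illustrative: the class names, the `find_breaches` helper, the function owners, and all of the hour figures are hypothetical assumptions, not values taken from DORA or any other regulation.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class BusinessFunction:
    name: str
    owner: str  # accountable executive or practice owner (hypothetical)
    max_tolerable_outage_hours: float  # hypothetical availability goal


@dataclass
class Scenario:
    name: str
    # Estimated outage in hours per business function if the scenario occurs.
    estimated_outage_hours: Dict[str, float]


def find_breaches(
    functions: List[BusinessFunction], scenarios: List[Scenario]
) -> List[Tuple[str, str, str, float]]:
    """Return (scenario, function, owner, overshoot_hours) for each missed goal."""
    breaches = []
    for scenario in scenarios:
        for fn in functions:
            # Functions not listed in a scenario are assumed to be unaffected.
            outage = scenario.estimated_outage_hours.get(fn.name, 0.0)
            overshoot = outage - fn.max_tolerable_outage_hours
            if overshoot > 0:
                breaches.append((scenario.name, fn.name, fn.owner, overshoot))
    return breaches


# Hypothetical register of critical functions and disruption scenarios.
functions = [
    BusinessFunction("payments", "Head of Payments", 2.0),
    BusinessFunction("customer_portal", "Head of Digital", 8.0),
]
scenarios = [
    Scenario("ransomware_on_core_systems", {"payments": 6.0, "customer_portal": 4.0}),
    Scenario("cloud_region_outage", {"payments": 1.0, "customer_portal": 12.0}),
]

for scenario_name, fn_name, owner, overshoot in find_breaches(functions, scenarios):
    print(f"{scenario_name}: {fn_name} exceeds its goal by {overshoot:.1f}h (owner: {owner})")
```

Keeping an owner on every function ties each identified gap back to the accountability described in point (a). In practice the outage estimates would come from testing and exercises rather than hand-entered figures.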
Many organizations are struggling to know where to begin with these requirements. We’re committed to helping you make your way through this process in the most painless way possible.
More reading and references