Measure Success: Key Cybersecurity Resilience Metrics
Cyber resilience is focused on recovery while cybersecurity is charged with defense against attacks. But in truth they are two parts of the same protective strategy. No matter how strong the cybersecurity defenses are, eventually an attack will get through. Cyber resilience means having a plan to counter, resolve, and rebound when it happens.
“Cybersecurity is ‘insecure’ by nature. As humans, we make mistakes every day whether it’s code bugs during programming or configuration errors during implementation. Often cyber resilience is more about how you limit the impact of a breach than being able to eliminate the risk of a cyberattack altogether,” says Gaurav Banga, Founder and CEO of Balbix, a provider of AI-powered cyber risk management products.
Cyber Resilience and Your Company
Strengthening a company’s cyber resilience is an ongoing exercise as attackers launch more sophisticated attacks nearly every day. It’s getting tougher to cope now that bad actors have added malicious AI to their arsenal. But surviving these attacks rests almost entirely on the competence and reliability of the cyber resilience strategy.
But how can you know if your cyber resilience is strong enough for your company to withstand relentless attacks?
First, make sure all the bases are covered. Second, put cybersecurity metrics in place to measure the effectiveness of the tools and tactics you have deployed.
“Before setting KPIs, an organization must first define its resilience objectives, such as what key assets and processes (aka the ‘Critical Path’) you need to protect, what potential threats and scenarios are relevant to it, and what levels of risk and impact the organization can allow itself to handle. These objectives should be set by the organization’s leadership, on the C-suite level, and by the board of directors, and dictated from the top down,” says David Kellerman, field CTO of Cymulate, a breach and attack simulation platform.
Understanding the Importance of Cyber Resilience
Sometimes the lines between cybersecurity and cyber resilience are so blurred that people have trouble telling what activity or rating applies to which of those categories.
“Cyber resilience is a newer concept. It can get thrown around when one really means cybersecurity, and also in cases where no one really cares about the difference between the two,” says Mike Macado, CISO at BeyondTrust, an identity and access security company.
“And to be fair, there can be some blurring between the two. For example, is a hardened server secured? Is a patched server secured? Is it resilient to attack? Is it both secured and also resilient to attack? Maybe one but not necessarily the other? Maybe both but one with a higher level of confidence than the other?” Macado adds.
The upshot is that cybersecurity is a detect, defend, and repel response, whereas cyber resilience is the cyber battle medic that stops the bleeding and puts you back in action. But just as that battle medic must learn trauma medicine and prepare treatment plans to use in the heat of battle, so too must plans and tools be put in place well ahead of a cyberattack.
First Step to Cyber Resilience
The first step is to identify what you need to recover the fastest to remain operational. Which data sets do you need back online first? Which people need access first? And so on. Identify everything you’ll need to make a fast and successful rebound after an attack. Does this sound similar to making business continuity (BC) plans? Cyber resilience should be an important part of your overall BC strategy.
“There is some overlap in the areas of cyber resilience and business continuity, which means that business continuity metrics can also be a source of ideas for cyber resilience metrics,” says Macado.
Preparing for an eventual defense failure is one thing; succeeding in bouncing back afterwards is quite another. You don’t want to discover, when the time comes, that the perfect plan failed perfectly. To make sure your cyber resilience strategy measures up ahead of an attack, you’ll need to use a few key metrics.
“The best KPIs are those capable of measuring enduring elements of an organization’s security posture. The objective should be establishing KPIs that predict future performance. They should be rooted in outcomes rather than simply measuring activities,” says Matthew Butkovic, technical director of risk and resilience in the CERT Division of the Software Engineering Institute at Carnegie Mellon University.
Core Metrics for Gauging Cyber Resilience
The work to identify what actions to take — and which to measure — begins with listing your organization’s resilience objectives.
“Once the resilience objectives are clear, KPIs can be set to measure them. While there are many abstract possible KPIs, it is crucial to set meaningful and measurable KPIs that can indicate your cyber resilience level and not only tick the box,” says Kellerman.
And what are the meaningful, core KPIs?
“These include mean time to detect, mean time to respond, recovery time objective, recovery point objective, percentage of critical systems with exposures, employee awareness and phishing click-rates, and an overall assessment of leadership. These KPIs will properly assess your security controls and whether they are protecting your critical path assets, helping to ensure they’re capable of preventing threats,” Kellerman adds.
Metric #1: Mean Time to Detect (MTTD)
This metric generally tops the list as it gets right to the heart of the meaning of cyber resiliency. In short, it measures the time between the start of an attack and the victim organization’s awareness of it.
“MTTD is a key indicator that can be used to determine whether an organization is properly prepared to respond to threats in a timely manner. A lower MTTD indicates better detection capabilities and an effective way of reducing the potential impact spread in a breach,” explains Jon Miller, CEO and co-founder of Halcyon, maker of an anti-ransomware and cyber resilience platform.
Metric #2: Mean Time to Acknowledge (MTTA)
This metric measures the second step in the sequence: the average time from when an alert is triggered to when the response team acknowledges it and begins to act. It is calculated by dividing the total time taken to acknowledge all incidents in a given timeframe by the number of incidents in that timeframe.
Kellerman points to a recent Change Healthcare cyberattack by the ALPHV/Blackcat ransomware group as a prime example of the usefulness of MTTD and MTTR. He says the impact and damage made it clear that the attack wasn’t detected immediately but was instead discovered because of the service disruption it caused.
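The MTTA calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the alert records, the `mean_time_to_acknowledge` function name, and the choice of a half-open reporting window are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Hypothetical alert records: (time the alert fired, time it was acknowledged).
alerts = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 9, 12)),
    (datetime(2024, 3, 2, 14, 30), datetime(2024, 3, 2, 14, 48)),
    (datetime(2024, 3, 9, 23, 5), datetime(2024, 3, 9, 23, 35)),
]


def mean_time_to_acknowledge(records, start, end):
    """Average acknowledgement delay for alerts triggered in [start, end)."""
    delays = [acked - fired for fired, acked in records if start <= fired < end]
    if not delays:
        return None  # no alerts in the window, so the metric is undefined
    # Total acknowledgement time divided by the number of incidents.
    return sum(delays, timedelta()) / len(delays)


mtta = mean_time_to_acknowledge(alerts, datetime(2024, 3, 1), datetime(2024, 4, 1))
print(mtta)
```

Returning `None` for an empty window, rather than zero, avoids reporting a perfect score for a period in which nothing was measured.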
“The time it took to contain and scope the incident demonstrates a poor score for its MTTD or MTTR KPIs and could have reflected differently if these were taken more seriously. Having such a metric in place would have prevented Change Healthcare from taking days to regain full functionality and recover all the services it provides,” Kellerman says.
Measuring Incident Response Effectiveness
Response begins once an attack is detected and acknowledged.
“Unlike cybersecurity, which focuses on prevention, resilience acknowledges that attacks cannot be fully controlled. It emphasizes preparedness for potential failures despite security measures,” says Tanu Kak, global marketing head, cybersecurity services at HCLTech, a global information technology provider.
“Resiliency solutions must utilize certain key protocols to detect a company’s vital business applications, infrastructure, and dependencies to build a robust cyber resilience framework,” Kak adds.
Metric #3: Mean Time to Contain (MTTC)
MTTC is a measure of the amount of time it takes your incident response team to detect and acknowledge the incident and prevent a cybercriminal from doing more harm.
Put another way, it is the average time it takes your response team or security provider to contain a threat. Obviously, taking the least amount of time for this process is preferable as the more time a threat has to work, the more damage it does.
Metric #4: Mean Time to Resolve (MTTR)
“To have an effective cyber resilience strategy, it is vital that an organization’s response plans — including containment time, communication effectiveness, and coordination among response teams — are effective and followed in order to decrease the time required to respond to and efficiently mitigate the issue,” says Miller.
The mean time to resolve is the average time it takes to fully resolve a failure or threat. This includes the time it takes to ensure the failure won’t happen again as well as time spent detecting, diagnosing and repairing the issue.
“This metric measures the time to fully recover an organization’s critical data assets and return to full operation after a successful cyberattack. There are several ‘mean time to’ metrics, including mean time to detect, mean time to acknowledge, mean time to contain, and mean time to resolve. However, the most important metric for an organization is how quickly the organization can recover to full operations for the business,” says Russ Kennedy, chief product officer at Nasuni, a privately held hybrid cloud storage company.
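As a rough sketch, the four "mean time to" metrics can be derived from one set of incident timestamps. The `Incident` record, its field names, and the sample data are hypothetical. The start and end points chosen for each metric are one reading of the definitions in this article (for instance, MTTC and MTTR are measured from the start of the attack, and the detection time is treated as the moment the alert fired); other organizations draw these boundaries differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    started: datetime       # attack begins (often estimated after the fact)
    detected: datetime      # organization becomes aware; alert fires
    acknowledged: datetime  # responders pick up the alert
    contained: datetime     # attacker can do no further harm
    resolved: datetime      # full operations restored


def mean_delta(incidents, start_field, end_field):
    """Average elapsed time between two lifecycle timestamps."""
    if not incidents:
        raise ValueError("at least one incident is required")
    gaps = [getattr(i, end_field) - getattr(i, start_field) for i in incidents]
    return sum(gaps, timedelta()) / len(gaps)


incidents = [
    Incident(datetime(2024, 5, 1, 8, 0), datetime(2024, 5, 1, 10, 0),
             datetime(2024, 5, 1, 10, 30), datetime(2024, 5, 1, 14, 0),
             datetime(2024, 5, 2, 8, 0)),
    Incident(datetime(2024, 6, 10, 0, 0), datetime(2024, 6, 10, 4, 0),
             datetime(2024, 6, 10, 4, 30), datetime(2024, 6, 10, 10, 0),
             datetime(2024, 6, 12, 0, 0)),
]

report = {
    "MTTD": mean_delta(incidents, "started", "detected"),
    "MTTA": mean_delta(incidents, "detected", "acknowledged"),
    "MTTC": mean_delta(incidents, "started", "contained"),
    "MTTR": mean_delta(incidents, "started", "resolved"),
}
for name, value in report.items():
    print(name, value)
```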
Performance and Compliance Indicators
“The ability to recover from a cybersecurity attack within a reasonable time that guarantees business continuity is a crucial indicator of resilience. This timeframe is typically a product of the organization’s cybersecurity hygiene,” says Joseph Nwankpa, director of cybersecurity initiatives and associate professor of information systems and analytics at the Farmer School of Business at Miami University in Ohio.
But that’s not to say there aren’t some serious obstacles to overcome along the way.
“The complexity and number of applications, such as a combination of legacy systems and new emerging technologies, continue to be a key proxy for cyber resilience. As the complexity of systems increases, the level of cyber resilience diminishes, as it becomes increasingly challenging to tackle misconfigurations, data leaks, system vulnerabilities, multiple attack vectors, and attack surfaces,” Nwankpa adds.
Metric #5: Security Policy Compliance Rate
This is a measure of how well the organization is adhering to its own security policy. It is typically measured through audits or automated evaluations. But there are those who argue that security policy compliance rate is less effective than security performance rates.
The security compliance rate is calculated as the ratio of complying entities to all entities covered by the policy, expressed as a percentage. For example, if you have 100 employees and 25 comply with a specific security requirement, then you have a security compliance rate of 25% for that requirement. By comparison, security performance rates measure performance in key actions such as the number of security incidents detected and resolved within a specific period.
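The arithmetic in the example above is simple enough to write down directly. The `compliance_rate` helper below is a hypothetical name used only for illustration.

```python
def compliance_rate(compliant: int, total: int) -> float:
    """Percentage of covered entities that meet a given security requirement."""
    if total <= 0:
        raise ValueError("total must be a positive number of entities")
    if not 0 <= compliant <= total:
        raise ValueError("compliant must be between 0 and total")
    return 100.0 * compliant / total


# The article's example: 25 of 100 employees comply with a requirement.
rate = compliance_rate(25, 100)
print(f"{rate:.0f}%")
```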
Metric #6: Access Management and Authentication Success
Access management ensures user identities are authenticated and access is granted to the degree warranted for that person or role.
It’s important to measure your success rate by auditing whether access has been revoked for employees who have left the company and how well each user’s access matches their job role.
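One way such an audit might be automated is sketched below. All of the data structures (the employee roster, the account grants, the per-role entitlements) and the `audit_access` function are assumptions for illustration; a real audit would pull this information from an HR system and an identity provider.

```python
# Hypothetical inputs.
active_employees = {"alice": "finance", "bob": "engineering"}  # user -> role
accounts = {                                                   # user -> granted systems
    "alice": {"finance-app"},
    "bob": {"source-control", "finance-app"},
    "carol": {"source-control"},  # carol has left the company
}
role_entitlements = {                                          # role -> allowed systems
    "finance": {"finance-app"},
    "engineering": {"source-control"},
}


def audit_access(active, granted_by_user, entitlements):
    """Return orphaned accounts, excess grants, and the share of clean accounts."""
    if not granted_by_user:
        raise ValueError("no accounts to audit")
    # Accounts still enabled for people who are no longer employed.
    orphaned = sorted(u for u in granted_by_user if u not in active)
    # Access that goes beyond what the user's role allows.
    excess = {}
    for user, granted in granted_by_user.items():
        if user in active:
            extra = granted - entitlements.get(active[user], set())
            if extra:
                excess[user] = extra
    clean = len(granted_by_user) - len(orphaned) - len(excess)
    return orphaned, excess, 100.0 * clean / len(granted_by_user)


orphaned, excess, success_rate = audit_access(
    active_employees, accounts, role_entitlements
)
print(orphaned, excess, round(success_rate, 1))
```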
Prevention and Defense Metrics
Essential cybersecurity KPIs in this area include the percentage of incidents prevented by proactive security measures, the number of false positives and false negatives, and the level of employee security awareness.
Metric #7: Number of Cybersecurity Incidents Reported
“Comparing security metrics with industry benchmarks or past performance helps CISOs understand how their organization’s security maturity compares to others. This information can guide the development of realistic security targets and strategies,” says Frank Kim, fellow at SANS Institute, the leading cybersecurity education organization.
“Security process improvement metrics like number of incidents with the same root cause are also beneficial in gauging effectiveness of a security strategy over time, by driving continuous improvement in security practices,” Kim adds.
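The "incidents with the same root cause" metric Kim mentions could be tracked along the following lines. The incident identifiers, the root-cause labels, and the `repeat_root_causes` helper are made up for this sketch.

```python
from collections import Counter

# Hypothetical incident log: (incident id, root cause assigned in the post-mortem).
incident_log = [
    ("INC-1", "unpatched server"),
    ("INC-2", "phishing"),
    ("INC-3", "unpatched server"),
    ("INC-4", "misconfiguration"),
    ("INC-5", "unpatched server"),
]


def repeat_root_causes(records):
    """Root causes that appear in more than one incident, with their counts."""
    counts = Counter(cause for _, cause in records)
    return {cause: n for cause, n in counts.items() if n > 1}


repeats = repeat_root_causes(incident_log)
print(repeats)
```

A count that keeps rising for the same cause suggests that earlier fixes addressed symptoms rather than the underlying weakness.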
Metric #8: Intrusion Attempt Frequency
It’s important to understand where vulnerabilities exist and how often specific areas are targeted in attacks.
“Monitoring metrics such as intrusion detection/prevention system (IDS/IPS) alerts, firewall rule effectiveness, and malware detection rates give you a good read on your security controls effectiveness,” says Miller.
The Financial Perspective on Cybersecurity
Even if an organization successfully rebounds in an acceptable amount of time, there are costs to track, analyze and pay.
“Developing a comprehensive cost model that takes into account all aspects of the cost of an incident should be a top priority in an organization’s pursuit of a cybersecurity strategy,” says Kennedy.
Metric #9: Cost per Incident
Cost per incident, or the cost avoided by mitigating an incident, is a primary metric for evaluating threats and tools.
“Cost of an incident or cost avoidance by preventing an incident is critical to justifying the investment in cybersecurity tools and processes to ensure continuous operations for the business,” says Kennedy.
It’s important to account for all the costs associated with an attack in the final calculation.
“There are the obvious costs of time and tools to assess the situation, mitigate the attack, recover from the attack and/or fines and fees that are encountered. However, there are less obvious costs that also need to be taken into account, including loss of productivity and loss of reputation in the market,” Kennedy adds.
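A simple cost model covering the categories Kennedy lists might look like the sketch below. The `IncidentCost` fields and the dollar figures are illustrative assumptions; the less obvious costs in particular (lost productivity and reputational loss) can only be estimated.

```python
from dataclasses import asdict, dataclass


@dataclass
class IncidentCost:
    assessment_and_mitigation: float  # staff time and tools to assess and mitigate
    recovery: float                   # restoring systems and data
    fines_and_fees: float
    lost_productivity: float          # estimate
    reputational_loss: float          # estimate

    def total(self) -> float:
        """Sum of every cost category for this incident."""
        return sum(asdict(self).values())


def cost_per_incident(costs):
    """Average total cost across a list of incidents."""
    if not costs:
        raise ValueError("at least one incident is required")
    return sum(c.total() for c in costs) / len(costs)


# Hypothetical figures in dollars.
costs = [
    IncidentCost(12_000, 5_000, 0, 8_000, 5_000),
    IncidentCost(23_000, 12_000, 15_000, 10_000, 10_000),
]
average_cost = cost_per_incident(costs)
print(average_cost)
```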
Metric #10: Phishing Attack Success Rate
This KPI measures the percentage of employees who fall victim to phishing attacks. These employees expose the company to attack by clicking on malicious links or downloading malicious attachments from phishing emails.
“Lower click-through rates indicate better resilience against phishing attacks. In addition, this KPI can also measure the percentage of employees who completed security awareness training and passed an exam,” says Kellerman.
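Both figures Kellerman describes can be computed from the results of a phishing simulation. The record layout and the `percentage` helper below are assumptions for the sake of the example.

```python
# Hypothetical results from a simulated phishing campaign.
results = [
    {"employee": "e1", "clicked": False, "passed_training": True},
    {"employee": "e2", "clicked": True, "passed_training": False},
    {"employee": "e3", "clicked": False, "passed_training": True},
    {"employee": "e4", "clicked": False, "passed_training": True},
]


def percentage(records, field):
    """Share of records, as a percentage, where the given flag is true."""
    if not records:
        raise ValueError("no records to measure")
    return 100.0 * sum(1 for r in records if r[field]) / len(records)


click_rate = percentage(results, "clicked")
training_pass_rate = percentage(results, "passed_training")
print(click_rate, training_pass_rate)
```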
Cyber Resilience Metrics: A Summary
Cybersecurity and cyber resilience are two separate things. The first is designed to detect and defend against attacks. The second is designed to help the company rebound quickly after an attack inevitably gets past cybersecurity defenses. There is some overlap between cyber resilience and business continuity plans as the former is one part of the latter and their goals dovetail.
It’s not enough to have a plan for cyber resilience. Regular testing and key metrics must be used to ensure the organization remains resilient.
“During a cyber event the two biggest unknowns for most organizations are whether they will be able to recover their data and environments, and how long it will take,” says Anneka Gupta, chief product officer at Rubrik.
“The first unknown requires zero-trust data protection for all critical assets. The second requires robust testing of scenarios to determine how quickly the organization can recover,” Gupta adds.