Key Vulnerability Management Metrics

JupiterOne’s analysis of over 291 million cyber assets shows that security vulnerabilities have grown by 589%. Edgescan, meanwhile, reports that 33% of vulnerabilities identified in 2022 were of high or critical severity. Taken together, these reports tell us two things: organizations face over five times more risks to their IT assets, and a sizable share of those risks exposes organizations and their infrastructures to high-cost incidents.

Vulnerability management is the key to tackling this rise in critical risks, and monitoring vulnerability metrics is an indispensable part of the process. Sadly, many organizations stumble here: they either lack a clearly defined set of metrics or fail to measure the most critical ones.

Keep reading as we explain what vulnerability management is, which key metrics organizations and enterprises should track, and why these metrics matter to vulnerability management. Let’s dive right in.

Vulnerability management and the role of metrics in vulnerability management

Vulnerability management is a series of practices and processes for identifying, prioritizing, and remediating vulnerabilities in software products or systems. Its main focus is to reduce an organization’s IT risk exposure to the bare minimum. It also ensures that vulnerability remediation workflows consume progressively less time and money, improving IT efficiency.

Effective vulnerability management, however, requires IT teams to work with the right metrics. These metrics are quantitative values that represent the performance of IT components, and it is through these metrics that IT teams measure the health of the overall IT infrastructure.  

More specifically, tracking key metrics helps with risk awareness, software planning, and compliance. With risk awareness, metrics inform you about the impact of each vulnerability and the risk exposure of each IT component. 

For software planning, tracking metrics provides organizations with the following:

  • Historical data for determining appropriate risk appetites
  • Optimal threat detection baselines
  • Adequate resource allocations
  • Safe vulnerability prioritizations

By facilitating efficient planning and response workflows, vulnerability metrics then help organizations meet both internal and external compliance requirements. This simply means internal compliance with service-level agreements (SLAs) and external compliance with recognized standards and regulations.

Examples of such standards bodies and regulations include:

  • The National Institute of Standards and Technology (NIST)
  • The International Organization for Standardization (ISO)
  • The General Data Protection Regulation (GDPR)

Key metrics for vulnerability management

Now, what are the key metrics your organization needs to measure? Here are 15 key performance indicators (KPIs) to pay attention to.

1. Average number of vulnerabilities per asset

The average number of vulnerabilities per asset is a metric that measures the average number of vulnerabilities each asset class faces over a distinct period. As opposed to measuring overall identified vulnerabilities across your IT infrastructure, the “vulnerability per asset” metric is a more granular key performance indicator (KPI). It helps you identify the number of risks each asset class faces, which asset class faces the most risks, and the types of vulnerabilities common to each asset class.

It is through this metric that JupiterOne’s report identifies data as the most vulnerable asset class, accounting for 60% of overall vulnerabilities. Granular information like this makes vulnerability management more efficient, as the most critically threatened assets are identified and prioritized.
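
To make this concrete, here is a minimal sketch of the calculation, assuming a hypothetical asset inventory (all names and counts are illustrative):

```python
# A minimal sketch of the "vulnerabilities per asset" metric, assuming a
# hypothetical inventory where each record lists an asset, its class, and its
# open vulnerability count.
from collections import defaultdict

inventory = [
    {"asset": "db-prod-01", "asset_class": "data", "open_vulns": 14},
    {"asset": "web-01",     "asset_class": "host", "open_vulns": 6},
    {"asset": "db-prod-02", "asset_class": "data", "open_vulns": 9},
    {"asset": "web-02",     "asset_class": "host", "open_vulns": 4},
]

totals = defaultdict(lambda: {"vulns": 0, "assets": 0})
for record in inventory:
    totals[record["asset_class"]]["vulns"] += record["open_vulns"]
    totals[record["asset_class"]]["assets"] += 1

# Average open vulnerabilities per asset, broken down by asset class.
for asset_class, t in totals.items():
    print(f"{asset_class}: {t['vulns'] / t['assets']:.1f} per asset")
```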

2. Mean Time to Detect

The mean time to detect (MTTD) metric measures the average amount of time it takes to identify a vulnerability or threat after a security incident occurs. Also known as the mean time to identify or the mean time to discover, it is a post-incident metric that tells you the amount of time a vulnerability has existed within the system before the IT team discovered it. 

The MTTD metric is simply calculated by the formula:

MTTD = sum of time from incident occurrence to vulnerability identification across all incidents / total number of incidents

New results may be compared with results from previous management periods. This helps to gauge IT performance and measure the effectiveness of improved vulnerability management strategies.  
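
As a minimal sketch, assuming hypothetical incident and detection timestamps exported from an incident tracker, the calculation looks like this:

```python
# A minimal MTTD sketch, assuming hypothetical (occurred_at, detected_at)
# timestamp pairs exported from an incident tracker.
from datetime import datetime, timedelta

incidents = [
    (datetime(2023, 5, 1, 9, 0),  datetime(2023, 5, 1, 17, 30)),
    (datetime(2023, 5, 3, 2, 15), datetime(2023, 5, 4, 8, 0)),
    (datetime(2023, 5, 7, 11, 0), datetime(2023, 5, 7, 15, 45)),
]

# Sum of (detection time - occurrence time), divided by the incident count.
deltas = [detected - occurred for occurred, detected in incidents]
mttd = sum(deltas, timedelta()) / len(deltas)
print(f"MTTD: {mttd}")  # 14:20:00
```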

3. Mean Time to Action

The mean time to action (MTTA) metric measures the average amount of time it takes the IT team to take steps toward remediating a vulnerability. It is the average amount of time between when a vulnerability is detected and when the team takes the first action in isolating and eliminating it. The MTTA metric is also known as the mean time to acknowledgement, and helps to gauge the IT team’s responsiveness to security issues. 

To measure MTTA, organizations use the formula:

MTTA = sum of time from detection to first action across all incidents / total number of incidents

Of course, the MTTA should be kept as low as possible.
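
The calculation mirrors MTTD; a minimal sketch, again with hypothetical timestamps:

```python
# A minimal MTTA sketch, assuming hypothetical (detected_at, first_action_at)
# timestamp pairs; "first action" is whenever the team begins isolating the issue.
from datetime import datetime, timedelta

acknowledgements = [
    (datetime(2023, 5, 1, 17, 30), datetime(2023, 5, 1, 18, 0)),
    (datetime(2023, 5, 4, 8, 0),   datetime(2023, 5, 4, 9, 15)),
]

deltas = [action - detected for detected, action in acknowledgements]
mtta = sum(deltas, timedelta()) / len(deltas)
print(f"MTTA: {mtta}")  # 0:52:30 -- lower is better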

4. Mean Time to Repair

The mean time to repair (MTTR) metric measures the average time it takes to eliminate identified vulnerabilities. Also called the mean time to remediation, it is a metric that measures how long it takes to fix compromised or downed IT systems. The longer the mean time to repair, the longer the IT downtime is and, hence, the less likely organizations are to meet service-level agreements (SLAs).

A crucial part of measuring MTTR is to also identify critical infrastructure components and each component’s risk appetite for vulnerabilities. This helps with planning, as data on MTTR can be combined with vulnerability appetites to understand which components need improved response workflows. The formula for calculating MTTR is:

MTTR = sum of time from incident occurrence to vulnerability remediation across all incidents / total number of incidents
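
Here is a minimal sketch that also folds in the per-component targets mentioned above; the components and target hours are assumptions for illustration:

```python
# A minimal MTTR sketch, assuming hypothetical remediation durations tagged by
# component, compared against assumed per-component remediation targets
# (the "risk appetite" discussed above).
from collections import defaultdict

remediations = [            # (component, hours from incident to remediation)
    ("auth-service", 12.0),
    ("auth-service", 20.0),
    ("billing-db", 48.0),
    ("billing-db", 60.0),
]
targets_hours = {"auth-service": 24, "billing-db": 36}  # assumed targets

by_component = defaultdict(list)
for component, hours in remediations:
    by_component[component].append(hours)

for component, durations in by_component.items():
    mttr = sum(durations) / len(durations)
    flag = "over target" if mttr > targets_hours[component] else "ok"
    print(f"{component}: MTTR {mttr:.1f}h ({flag})")
```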

Security orchestration, automation, and response (SOAR) tools are typically used to automate detection, response, and remediation workflows and make them more effective. They are crucial to reducing MTTD, MTTA, and MTTR.

If you are not familiar with SOAR, please check this guide for a deeper understanding.

5. Vulnerability Age

Vulnerability age is a measure of the amount of time a vulnerability has existed in the IT system without remediation. It is a metric used after detection but before remediation. It helps you understand both how long a vulnerability has existed and the potential damage it could have exposed the infrastructure to. The longer the vulnerability age, the more actors are likely to exploit it and the higher the costs it accrues.

The vulnerability age helps to monitor how close a critical component is to exceeding its vulnerability appetite. With this, you can prioritize and speed up remediation efforts to bring the compromised infrastructure component back to a safe state.
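
A minimal sketch of tracking vulnerability age against an assumed escalation threshold (IDs, dates, and the threshold are illustrative):

```python
# A minimal vulnerability-age sketch, assuming hypothetical open findings with
# detection dates; items past an assumed age threshold are surfaced first.
from datetime import date

THRESHOLD_DAYS = 30  # assumed acceptable age before escalation
today = date(2023, 6, 1)

open_findings = [
    ("CVE-2023-0001", date(2023, 5, 25)),
    ("CVE-2023-0002", date(2023, 4, 20)),
]

for vuln_id, detected_on in sorted(open_findings, key=lambda f: f[1]):
    age = (today - detected_on).days
    status = "escalate" if age > THRESHOLD_DAYS else "monitor"
    print(f"{vuln_id}: {age} days old -> {status}")
```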

6. Patching Rate

Patches in IT security management are the changes applied to IT software code to fix bugs, eliminate vulnerabilities, or update software components.

The patching rate is the number of times these changes to the IT infrastructure are made over a given period. It is a measure of the cadence of patch management workflows, showing how frequently the IT team changes the IT composition to make it more secure and reliable.

The following formula is used to calculate the patching rate:

Patching Rate = total number of patches installed / length of the period in review
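
For illustration, a minimal sketch with assumed figures:

```python
# A minimal patching-rate sketch; both figures are assumed for illustration.
patches_installed = 42   # patches applied during the review period
period_days = 30         # length of the review period

patching_rate = patches_installed / period_days
print(f"Patching rate: {patching_rate:.2f} patches/day")  # 1.40 patches/day
```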

7. Asset Inventory

Asset inventory, also known as asset coverage, refers to the total assets over which IT vulnerability detection and remediation processes apply. It shows what systems are covered by an individual patch, as well as the total systems covered by the security infrastructure.

Asset coverage informs IT teams about the number of components outside the protection of vulnerability management systems. It also helps to measure the effectiveness of auto-detection processes for new systems.

Other important patch management metrics include patch coverage (number of components patched), patch latency (amount of time between patch release and installation), patch success rate (ratio of patches successfully installed), and patch cost (the cost of patch management activities), among others.

8. Risk Score

A risk score is a numerical measurement of the risk exposure of IT components. It indicates how much operational and financial risk an IT vulnerability presents to an organization. Naturally, a higher risk score demands higher prioritization by the IT team.

The risk score is calculated with the formula:

Risk = Probability × Impact

However, different cybersecurity frameworks have different risk-scoring methodologies. 

For instance, the National Vulnerability Database (NVD), maintained by NIST, scores vulnerabilities using the Common Vulnerability Scoring System (CVSS). Under CVSS version 3.0, risk scores are derived from base metrics (covering exploitability and impact), temporal metrics, and environmental metrics. Vulnerabilities are then deemed to have either critical severity (9.0-10.0), high severity (7.0-8.9), medium severity (4.0-6.9), or low severity (0.1-3.9). Vulnerability management tools help to automatically determine risk scores, so the IT team only needs to focus on prioritizing vulnerabilities based on these scores.
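
As a small illustration of the final classification step, here is a sketch that maps an already-computed CVSS v3.0 base score to the severity bands above:

```python
# A minimal sketch mapping a CVSS v3.0 base score to the severity bands listed
# above. Real scanners compute the score itself from the base, temporal, and
# environmental metric groups; this only performs the final classification.
def cvss_severity(score: float) -> str:
    if score == 0.0:
        return "none"
    if score <= 3.9:
        return "low"
    if score <= 6.9:
        return "medium"
    if score <= 8.9:
        return "high"
    return "critical"  # 9.0-10.0

print(cvss_severity(7.5))  # high
```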

For more on NIST, please read this guide on the NIST cybersecurity framework.

9. Accepted Risk Score

The accepted risk score is the risk appetite of an IT component: the risk level considered acceptable for continued IT operations. Where a vulnerability exists but its accepted risk score has not been exceeded, that vulnerability may be deprioritized in favor of more critical threats.

An accepted risk score is typically relative to the IT asset or component it is associated with. Nonetheless, it is generally determined by considering regulatory compliance requirements, SLAs, and data on past incidents.

10. Internal vs external exposure

Metrics on internal and external exposure help with prioritizing vulnerabilities. While internal exposure concerns the risks posed by systems and teams working within the organization, external exposure refers to the risks of internet-facing components accessible to outside actors.

By comparing internal exposure against external exposure, organizations understand which area of the IT infrastructure experiences the most breaches. They can then prioritize IT management workflows and focus security strategies on the most exposed components. For instance, improving access control may be prioritized to reduce internal exposure if that is where the organization is most vulnerable, or priority may be placed on fortifying IT against external DDoS attacks.

11. Vulnerability Recurrence Rate

Also called the vulnerability reopen rate, the recurrence rate measures how many times a previously resolved vulnerability arises in the same or a different component/asset.

From a bird’s eye view, it tells organizations how successful vulnerability remediation strategies and processes are, and how much better IT patch management needs to be to reduce IT risks.

A more specific use of the vulnerability recurrence rate is to identify which vulnerabilities reappear the most and focus IT efforts toward eliminating them for as long as possible.  
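
A minimal sketch of one way to compute this, assuming a hypothetical remediation log keyed by vulnerability ID:

```python
# A minimal recurrence-rate sketch, assuming a hypothetical remediation log in
# which the same vulnerability ID appearing more than once means it reopened.
from collections import Counter

remediation_log = [
    "CVE-2023-1111", "CVE-2023-2222", "CVE-2023-1111",
    "CVE-2023-3333", "CVE-2023-1111", "CVE-2023-2222",
]

counts = Counter(remediation_log)
reopened = {vuln: n - 1 for vuln, n in counts.items() if n > 1}
recurrence_rate = sum(reopened.values()) / len(remediation_log)

print(reopened)                  # {'CVE-2023-1111': 2, 'CVE-2023-2222': 1}
print(f"{recurrence_rate:.0%}")  # 50% of log entries were reopenings
```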

12. Total Risk Remediated

The total risk remediated (or total closed vulnerabilities) is a metric that measures the number of vulnerabilities repaired over a given period. It is mainly used for comparison purposes. First, organizations may compare remediation results against those of previous periods to gauge whether remediation processes are becoming more effective.

Secondly, total remediated risks may be compared against new risks to understand how fast the organization eliminates vulnerabilities within its IT systems. Where new vulnerabilities consistently exceed closed vulnerabilities over multiple reporting periods, the IT system is plagued with a rising number of open vulnerabilities, increasing the chances of critical failures. Hence, working with information on the total risks remediated helps to measure efficiency and work toward reducing overall risk exposure.

13. False positive rates 

False positives are security alerts that indicate a vulnerability exists when there is no vulnerability or threat in the IT system. They are alerts, generated by SOAR tools for instance, that classify activities as malicious or components as breached when neither is the case. These alerts are typically based on rules set by the IT team to define risks and automatically notify or respond to them.

The pitfall with false positives is that they crowd security alert queues and divert the IT team’s attention away from actual vulnerabilities. Precious time that would have been spent detecting, responding to, and remediating breaches is spent attending to false vulnerability events. The false positive rate measures how many false positives are generated over a reporting period, and vulnerability management aims to reduce this as much as possible.

It is important to note, however, that false positive rates should not be extremely low. Extremely low false positive rates indicate that detection baselines may be set too high to surface an actionable number of vulnerabilities. According to Statistics.com, an optimal false positive rate is around 10%.
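
For illustration, a minimal sketch of the rate itself, assuming alerts that the team has already triaged as genuine or false:

```python
# A minimal false-positive-rate sketch, assuming each triaged alert has been
# labeled genuine or false by the IT team during the reporting period.
alerts = ["genuine", "false", "genuine", "genuine", "false",
          "genuine", "genuine", "genuine", "genuine", "genuine"]

false_positive_rate = alerts.count("false") / len(alerts)
print(f"False positive rate: {false_positive_rate:.0%}")  # 20%
```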

14. Cost per vulnerability 

The cost per vulnerability is the amount of financial resources expended toward remediating vulnerabilities. From a general standpoint, it is the average amount of money spent on detecting, mitigating, and remediating vulnerabilities. More specific cost metrics help you measure the overall cost of a certain class or type of vulnerability. With this, organizations know the type of vulnerabilities or threats that cost the most and can prioritize detection and remediation efforts based on cost findings.
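
A minimal sketch of the per-type breakdown, with hypothetical vulnerability types and costs:

```python
# A minimal cost-per-vulnerability sketch, assuming hypothetical remediation
# costs grouped by vulnerability type.
from collections import defaultdict

costs = [                      # (vulnerability type, remediation cost in USD)
    ("sql-injection", 4200.0),
    ("xss", 900.0),
    ("sql-injection", 3800.0),
    ("misconfiguration", 600.0),
]

by_type = defaultdict(list)
for vuln_type, cost in costs:
    by_type[vuln_type].append(cost)

# Costliest classes first: candidates for prioritized detection and prevention.
for vuln_type, amounts in sorted(
        by_type.items(), key=lambda kv: -sum(kv[1]) / len(kv[1])):
    print(f"{vuln_type}: ${sum(amounts) / len(amounts):,.0f} average")
```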

15. Service Level Agreement (SLA) adherence rate

SLAs are the promises organizations make to customers regarding the quality of services to be delivered to them. For instance, if there is an SLA of 99% uptime, there is a promise that customers will have access to services 99% of the time. A service level objective (SLO) is the internal goal the IT team sets to meet this agreement.

Hence, the SLA Adherence Rate measures how often organizations meet service-level objectives and how well they satisfy service-level agreements. It is calculated by the following formula:

SLA adherence rate = (number of agreements met / total number of agreements) × 100%

An optimal SLA adherence rate is 100%, as organizations need to fulfill all promises to customers to build loyalty and achieve critical business goals. By monitoring SLAs, organizations also know where to set optimal MTTR baselines and how to prioritize vulnerabilities to reduce downtime or SLA deviations.
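
For illustration, a minimal sketch with an assumed record of agreement outcomes:

```python
# A minimal SLA-adherence sketch, assuming a hypothetical record of whether
# each agreement was met during the reporting period.
agreements_met = [True, True, True, False, True]  # True = agreement satisfied

adherence = sum(agreements_met) / len(agreements_met) * 100
print(f"SLA adherence rate: {adherence:.0f}%")  # 80%
```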

Also Read: MSP Contracts Explained

The critical importance of vulnerability management 

Through vulnerability management, organizations satisfy customer expectations and improve business outcomes. 

The catch, however, is that threat landscapes, stakeholder expectations, and regulatory requirements change regularly. This dynamism translates into corresponding changes in vulnerability management requirements, giving rise to a need for continuous improvement.

It isn’t enough to track, centralize, and implement base metrics. You also need to go the extra step to update baselines and keep your use of metrics adaptive enough to fit ever-changing requirements.
