Downtime is costly for businesses that rely on IT services in any industry you can think of: manufacturing, health, finance, ecommerce, retail, transport and logistics, you name it.
Take, for example, the 12-hour Apple Store outage in March 2015 that cost the company $25 million, or the 2021 Facebook crash that cost the social media giant about $65 million.
As these examples illustrate, even short-lived downtime can cost your business dearly. Besides the financial costs, there are more catastrophic costs, including reputational damage, which can destroy your company in no time.
Perhaps the most painful part about downtime is that it’s often difficult to predict and prepare for effectively. But the good news is that it’s possible to significantly minimize downtime costs by understanding the factors that drive them, the types of cost involved and the remedies.
First, we define downtime cost.
What is downtime cost?
Downtime cost is the direct and indirect cost to an organization when systems or processes are unavailable due to an unexpected event. In some cases, downtime cost can even exceed the value of the damaged system. For this reason, it is important to have a plan in place to minimize the impact of disruptions.
Most importantly, understanding the factors that contribute most to the cost of downtime puts you in a good position to manage the risks. This will keep your business out of troublesome situations such as litigation and irreversible damage to your company’s reputation.
What factors contribute to the cost of downtime?
The cost of downtime can vary significantly depending on several factors. Here are the main ones:
1. Type of downtime
There are two major types of downtime that a business can experience: server downtime and system downtime.
Server downtime occurs in the servers, which are the nerve center of your data and the source of fuel for the entire business. Server downtime can be particularly lethal, because it means your systems cannot access the data they need to function. It can quickly escalate to bring down critical internal systems and multiply the downtime cost to unimaginable levels.
A 2021 server survey by ITIC, a security intelligence firm, established that 44% of corporations reported that server downtime cost them over $1 million per hour, while 91% said it cost them more than $300,000 per hour. This clearly demonstrates the importance of server uptime.
Systems refer to every other form of IT that a company relies on to run. This is basically the hardware and software networks that your employees, customers and partners use on a daily basis.
It could be the computers, websites, mobile devices, internal software such as ERP or security. So a system downtime means any of these functions going down.
Downtime in both servers and systems can be either unplanned or planned. It’s important to understand the difference between these two and the impact of each on cost. Unplanned downtime is when an IT function suddenly stops operating when it should be running. It is the most devastating type, and this is what most businesses are struggling with. It’s severely costly, threatens business existence and, worst of all, is highly unpredictable.
Planned downtime, on the other hand, is rarely a cause of major concern because it’s normally scheduled, and most companies send out messages to all their customers to inform them of the scheduled downtime. The only time planned downtime can slip into a costly nightmare is when it gets out of hand, meaning the maintenance fails to complete within the scheduled time. It means you will have to pay more and also risk the wrath of customers who are waiting to resume activity at the time you promised them.
2. Recovery time
How quickly your company can get back up and running after a disruption has a critical influence on downtime costs. The longer the downtime lasts, the more the costs keep rising, and this can mean millions for some companies, as indicated by the ITIC survey. So you want to make sure that your recovery time is as short as possible.
These are some of the ways recovery time contributes to the downtime cost:
- Lost productivity: Downtime means lost productivity, and the longer it takes to recover, the more productive hours are lost.
- Lost customers: If customers feel that they have waited too long for services to resume, they can easily switch to your competitors. This is most common among customers who have pressing needs or those who tend to be impatient.
- Damage to reputation: Extended outage can damage reputation and make it difficult to regain the trust.
- Overtime expenses: Extended recovery time can accelerate the overtime pay for employees or emergency repair teams.
- Lost sales: This is the mother of all costs, worse if your business depends on 100% uptime to make sales. For every second you are down, your sales revenue is also going down.
Of course how fast you recover will also depend on the severity of the downtime and what it takes to get out of it. Some downtime can force you to bring in outside help especially if you do not have sufficient in-house resources to handle the disruption.
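To see how recovery time drives cost, here is a minimal Python sketch. It assumes, purely for illustration, that cost accrues at a flat hourly rate, which real outages rarely do. The $300,000 hourly figure is borrowed from the ITIC survey mentioned earlier, and the function name is hypothetical.

```python
def downtime_cost(hourly_cost: float, hours_down: float) -> float:
    """Cost of an outage, assuming a flat (linear) hourly cost."""
    if hourly_cost < 0 or hours_down < 0:
        raise ValueError("inputs must be non-negative")
    return hourly_cost * hours_down


# Compare a quick recovery with a slow one at the same assumed hourly cost.
for hours in (0.5, 2, 8):
    print(f"{hours} h down -> ${downtime_cost(300_000, hours):,.0f}")
```

Even under this simple assumption, cutting recovery from eight hours to two removes three quarters of the bill.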
3. Business model
The effect of the business model depends on how heavily a company relies on its systems. For example, businesses that rely on just-in-time delivery or those with a high amount of online customer interaction are likely to incur higher costs.
A SaaS company or social media company is more dependent on its service website or app than a consulting company that relies more on physical interactions. An internet outage at a SaaS or social media company will therefore run into more costs compared to a traditional consulting company. If you're an online retailer, a database crash can mean missed sales and angry customers.
4. Size of business
A small business can typically expect to lose less per hour of downtime, while a larger company can lose much more, often running into hundreds of thousands or even millions.
The size component plays out in three major ways: number of employees, customer base and number of servers/systems.
- Number of employees: The more employees you have, the greater the cost of downtime. Each employee represents a loss of productivity and come pay time, you are always going to pay their full salaries including the hours they never worked during downtime.
- Customer base size: A large customer base means more potential lost sales. It also means massive pressure from the thousands, hundreds of thousands or even millions of customers that are affected by the downtime.
- Number of servers/systems: The more servers and systems you have down, the more costs you will incur to bring them back to operation.
5. Geographical location
There is a reason why many companies, especially the big corporations, take a lot of time to consider location. It’s a subtle but costly factor if gotten wrong, and downtime is one of those events that can expose a wrong choice of location.
Location can play a critical role depending on what is needed to restore services. A downtime that requires paying employees overtime may cost more for a company located in Los Angeles than a company located in Iowa. The cost of living is higher in Los Angeles, so companies typically pay more and this means the overtime rates will certainly be higher for most companies.
Here is a quick rundown of how location affects downtime costs:
- Transportation costs: If your company is based in a rural area, it will cost more to get technicians and replacement parts to your location than if you're based in a city where those parts are actually produced.
- Labor costs: Wages are higher in some parts of the world than others, so the cost of labor can be a significant factor in the cost of downtime.
- Taxes and regulations: Government regulations and taxes can vary widely from country to country, and can have a big impact on the cost of downtime.
6. Day of week and time of day
The cost of downtime varies greatly depending on the day of the week and the time of day that the downtime occurs. Generally speaking, downtime tends to be more expensive on weekdays and during peak hours than on weekends and off peak hours.
The day of the week: Businesses typically see a decrease in productivity on Fridays. Workers are eager to start their weekends, and many managers are out of the office. This means that an outage that starts on a Friday can take longer, and cost more, to resolve than one that starts on another weekday, even though fewer customers are affected over the weekend.
The time of day: The cost of downtime decreases as the day goes on. So morning hours are typically more expensive than evening hours. This is because businesses often have more customers during morning hours and only a limited number of slots in which to serve them.
The day/time factor is why companies schedule most maintenance tasks during off peak hours or over the weekends, because the systems are not at peak use.
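If you log how busy your systems are each hour, picking a maintenance window can be sketched in a few lines of Python. The traffic numbers below are made up, and the function name is hypothetical; the idea is simply to find the run of consecutive hours with the lowest total load.

```python
def quietest_window(hourly_load: list[int], window_hours: int) -> int:
    """Return the start hour of the window with the lowest total load.

    The window is allowed to wrap past midnight.
    """
    n = len(hourly_load)
    if not 0 < window_hours <= n:
        raise ValueError("window_hours must be between 1 and len(hourly_load)")

    def load_from(start: int) -> int:
        return sum(hourly_load[(start + i) % n] for i in range(window_hours))

    return min(range(n), key=load_from)


# Hypothetical requests per hour; index 0 is midnight, index 23 is 11 pm.
load = [5, 3, 2, 2, 4, 10, 30, 60, 90, 95, 90, 85,
        80, 85, 90, 80, 70, 60, 50, 40, 30, 20, 10, 8]

start = quietest_window(load, 3)
print(f"Quietest 3-hour window starts at {start:02d}:00")
```

With this sample data the quietest three hours begin at 01:00, which matches the usual practice of scheduling maintenance overnight.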
7. Industry
Some of the industries with the highest downtime costs include manufacturing, retail, healthcare, information technology, government and finance. According to IBM’s X-Force report, manufacturing is now the leading industry in downtime as attackers target it more.
This is linked to the COVID-19 pandemic stress factors that have attracted attackers to take advantage of the emerging weaknesses in the manufacturing sector.
8. The rollover effect: one factor can lead to another
The rollover effect occurs when one outage leads to another, creating a cascade of problems that can be difficult to repair. The longer the outage lasts, the more damage it does, and the greater the cost of recovery.
The Facebook crash escalated because the initial disruption to network traffic also had a cascading effect on the way the company's data centers communicate. Even the internal systems that the company’s engineers would have used to resolve the outage were affected as a result of the rollover effect.
This phenomenon can also be caused by a lack of resources or insufficient capacity. For example, if a business does not have enough servers to handle the demand during an outage, the resulting increase in traffic can lead to even more downtime.
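One way to reason about the rollover effect is to map which systems depend on which, then walk that map from the point of failure. The Python sketch below does this with a breadth-first search. The system names and the dependency map are hypothetical.

```python
from collections import deque


def affected_systems(dependents: dict[str, list[str]], failed: str) -> set[str]:
    """Return the failed system plus everything that depends on it,
    directly or indirectly (breadth-first walk)."""
    down = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for system in dependents.get(current, []):
            if system not in down:
                down.add(system)
                queue.append(system)
    return down


# Hypothetical map: each key lists the systems that depend on it.
dependents = {
    "network": ["database", "internal-tools"],
    "database": ["web-app", "reporting"],
    "web-app": ["checkout"],
}

print(sorted(affected_systems(dependents, "network")))
```

In this example a network failure takes every other system down with it, including the internal tools engineers would need for the repair, which is the pattern described in the Facebook case.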
What causes downtime?
It's probably impossible to exhaust the causes of downtime. Fact is they can easily run into hundreds and keep varying from business to business, industry to industry. But here are the top 3 major categories of downtime triggers that you need to pay attention to.
1. Human error
From incorrect shutdown procedures to missed updates and patches, human error can cause all sorts of downtime problems for your business.
These are the top ways human error can lead to downtime.
- Employee fatigue: When employees are tired, they're more likely to make mistakes that can cause system outages or process slowdowns. In some cases, employee fatigue can even lead to accidents or injuries. Try to create a workplace culture that encourages employees to take breaks and get enough sleep. You can also promote healthy eating habits and provide ergonomic workstations to help reduce fatigue.
- Inattention to detail: Careless mistakes do lead to increased downtime and the costs thereof. For example, an employee may not properly follow safety procedures, which can lead to an accident that causes downtime. Or, an employee might accidentally delete important files or corrupt data, leading to a system crash. And, of course, this can also include simply forgetting to do something important, like shutting down a machine for maintenance.
- Communication breakdown: In many cases, it's simply a matter of employees not being aware of the potential consequences of poor communication. For example, technicians may incorrectly assume that another team is aware of an issue they're working on, or a manager might make a decision without checking with their team. This type of communication breakdown can be avoided by making sure all employees have a clear communication protocol and that all understand their role in the company's overall operations.
- Lack of employee awareness training: All too often, teams are given inadequate or incorrect training. This can lead to mistakes that cause system outages that could have been avoided with awareness education on matters like cybersecurity and other important aspects.
2. Hardware and software failures
Hardware and software go hand in hand in most businesses, and it’s fair to say that most downtime incidents emerge from these two.
Downtime attributed to hardware failure originates in devices, and mainly manifests in the following ways:
- Lack of redundancy: If you don't have redundancies in place, a hardware failure can lead to downtime. Imagine if your company relied on a single server to store all its data. If that server failed, the company would lose access to all its data—and could potentially take weeks or months to recover it. You should always have multiple backup types of your data in different locations. This way, if one location experiences a hardware failure, you'll still have access to your data.
- Poor maintenance: A system that is not properly maintained will eventually fail, and when it does, it can take down other systems with it. This can cause a domino effect that leads to hours, or even days, of downtime. In manufacturing alone for instance, the world’s largest manufacturers are losing to the tune of $1 trillion per year due to machine failure. This is a scary cost of downtime that ought to be addressed.
- Improper change control: Unfortunately many companies either do it wrongly or simply don't have a formal process for changing or upgrading their hardware, which can lead to mistakes that breed downtime. For example, an engineer might forget to properly test a new piece of hardware before putting it into production, or they might not back up the data before making changes. These kinds of mistakes can easily lead to consequential downtime and data loss. In the Facebook crash of 2021 that we have already used extensively, the downtime was actually caused by configuration changes in the backbone routers.
- Inadequate capacity: Failure to have enough servers, storage, or networking hardware to meet the needs of your business can lead to overloads, slowdowns, and ultimately, hardware failures.
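The value of redundancy can be put into numbers. The Python sketch below uses the standard formula for components in parallel, and it rests on two loud assumptions: servers fail independently of each other, and a single working server is enough to keep the service running. The 99% availability figure is only an example.

```python
def combined_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant servers.

    Assumes independent failures and that one working server
    is enough to keep the service up.
    """
    if not 0.0 <= single <= 1.0 or copies < 1:
        raise ValueError("single must be in [0, 1] and copies at least 1")
    return 1.0 - (1.0 - single) ** copies


for n in (1, 2, 3):
    print(f"{n} server(s): {combined_availability(0.99, n):.4%}")
```

Real failures are often correlated (shared power, shared network, the same bad update), so treat the result as an upper bound rather than a promise.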
Software failure is even more worrying. In fact, a study by Synopsys in partnership with the Consortium for Information & Software Quality (CISQ) found that the total cost of poor software quality in the US alone is over $2 trillion, a staggering figure indeed.
Software failures can lead to downtime in the following ways:
- Unpatched vulnerabilities: Failure to patch system vulnerabilities is the largest contributor to the total cost of poor software quality among businesses in the US, according to the CISQ report.
- Lack of modularity: Some software failures can often be traced back to a lack of modularity. This occurs when a piece of software is not designed in a way that allows individual components to be replaced or updated easily. As a result, when a problem arises, it can be difficult or impossible to fix without causing other parts of the software to break.
- Inadequate functionality: This can manifest in a number of ways, such as software not being able to handle the expected load, not being able to communicate with other systems, or simply not working as intended. This can often lead to IT staff being forced to quickly come up with workarounds, which can be both time-consuming and costly. In some cases, it can even lead to systems having to be taken offline for emergency repairs.
- Outdated software: If software is outdated, it may not be compatible with the latest hardware or operating systems, which can lead to downtime.
3. Cyber attacks
According to a downtime survey conducted by ITIC, a security intelligence firm, 76% of companies cited security and data breaches as the leading causes of downtime in servers and systems. There is every reason to pay attention to cyber security developments, as this is an area where criminals are now costing businesses a fortune.
Cyber attacks can take many forms, from malware and ransomware to DDoS attacks. And they can have a devastating downtime impact on businesses of all sizes. The cost of downtime caused by a breach can be even higher. And it’s not just large businesses that are at risk. Small businesses are also targets for cyber attacks, and the cost of downtime can be even more damaging to them.
There are a number of factors that determine the amount of downtime cost a company is likely to incur as a result of a cyber attack, including the following:
- The amount of data lost or stolen.
- The number of employees affected.
- The length of time the company is down.
- The impact on customers or clients.
- The cost of recovering from the attack.
- The cost of damage to reputation or brand.
Some of the prominent downtime costs that result from cyber attacks include the cost of response and recovery. When your business is hit with a cyberattack, the initial cost of downtime is only the beginning. You also have to factor in the costs of response.
Part of the response might include hiring a cybersecurity firm to help you investigate the attack, determine its source, as well as to help you rebuild your defenses. It may also include the cost of notifying customers or clients that their data may have been compromised, as well as providing them with protection services. All these costs can quickly add up, making the impact even more costly.
Further reading: Dark web attacks
How to calculate cost of downtime
So, how much does downtime cost your company? While as we have seen it’s possible to estimate the general costs of downtime across industries, you might want to calculate the true cost of downtime to your organization.
Before we get to the formula, please recall that there is a difference between revenue losses as a result of downtime and downtime cost which takes into account the lost revenue plus other losses such as recovery and invisible costs. Most businesses tend to forget the other costs when calculating the cost of downtime, and this can complicate your accounting.
Here is the standard formula:
Cost of Downtime = Lost Revenue + Other Losses (Recovery Costs, Lost Employee Productivity Costs, Invisible Costs etc.)
Lost Revenue: This is the revenue your company fails to earn during downtime, calculated per hour for ease of computation. A key item that you must always include when calculating this cost is uptime, i.e. the percentage of time you normally need systems to be running in order to generate revenue. This is important because not all businesses depend on 100% uptime. Your employees could be coming to work 8 hours per day but maybe they only need 60% uptime to serve customers. But if your business depends on 100% uptime, like an online store, then you need to use 100%.
Lost Revenue = Hourly Revenue x Number of Downtime Hours x % Uptime
Let's say your company needs 60% uptime per day, makes an average of $4000 per hour and experiences a downtime equivalent to 10 hours per month. This is how you will calculate your lost revenue per month:
Lost Revenue = $4000 x 10 x 60% = $24,000
If your business needs 100% uptime to generate revenue, then the monthly loss will come to $40,000.
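The lost revenue formula above translates directly into a few lines of Python. This is a minimal sketch; the function and parameter names are hypothetical, and the uptime percentage is passed as a fraction (0.60 for 60%).

```python
def lost_revenue(hourly_revenue: float, downtime_hours: float,
                 uptime_share: float) -> float:
    """Lost Revenue = Hourly Revenue x Number of Downtime Hours x % Uptime."""
    if not 0.0 <= uptime_share <= 1.0:
        raise ValueError("uptime_share must be a fraction between 0 and 1")
    return hourly_revenue * downtime_hours * uptime_share


# The two scenarios from the example: 60% uptime and 100% uptime.
print(f"${lost_revenue(4000, 10, 0.60):,.0f}")
print(f"${lost_revenue(4000, 10, 1.00):,.0f}")
```

The two lines print $24,000 and $40,000, matching the worked example.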
Other losses: These are all the losses besides revenue. They can vary based on company size and industry. So in addition to lost revenue, the cost of downtime includes lost productivity costs, recovery costs, and any other costs that might not be immediately quantifiable but are significant (invisible costs).
i). Lost employee productivity costs: This is the portion of employee salary that represents the amount of time they didn't work because of downtime. You'll still pay your employees at the end of the month, regardless of the extent of downtime your company experiences. This cost is calculated the same way lost revenue is calculated. Simply compute the average hourly pay rate per employee, then multiply this figure by the number of employees, the total number of downtime hours and the percentage uptime.
Lost Employee Productivity = Hourly Pay Rate Per Employee x No. of Employees x Number of Downtime Hours x % Uptime
Using the same example for lost revenue, this is how you’ll calculate your lost productivity cost, if say your average hourly pay rate is $15 per employee with a total of 40 employees:
Lost Productivity = $15 x 40 x 10 x 60% = $3,600
ii). Recovery costs: These are the costs you incur to clear the downtime and get back to work, such as data restoration, repairs, replacements, consultation, overtime costs, and penalties. Recovery costs can escalate depending on the magnitude of the downtime.
iii). Invisible costs: These are the kinds of costs you might not be able to determine upfront, but potentially huge. Examples include lost future opportunities, litigations and reputational damage which can drive customers away in large numbers. The best you can do is to establish the impact the downtime will cause now and in the future, then assign a reasonable estimate to each impact. Consider using expert help, because these are critical costs that might come to haunt you in future if you fail to get them right.
Of course this approach of calculating downtime cost might be simplistic for some companies, especially those that don’t necessarily compute their revenue by the hourly model.
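Putting the pieces together, here is a minimal Python sketch of the full formula. The revenue and productivity inputs come from the worked examples above. The recovery and invisible cost figures ($5,000 and $10,000) are made-up placeholders, since those must come from your own estimates, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DowntimeEstimate:
    hourly_revenue: float
    hourly_pay_rate: float         # average pay per employee per hour
    employees: int
    downtime_hours: float
    uptime_share: float            # fraction, e.g. 0.60 for 60%
    recovery_costs: float = 0.0    # your own estimate
    invisible_costs: float = 0.0   # your own estimate

    def lost_revenue(self) -> float:
        return self.hourly_revenue * self.downtime_hours * self.uptime_share

    def lost_productivity(self) -> float:
        return (self.hourly_pay_rate * self.employees
                * self.downtime_hours * self.uptime_share)

    def total(self) -> float:
        # Cost of Downtime = Lost Revenue + Other Losses
        return (self.lost_revenue() + self.lost_productivity()
                + self.recovery_costs + self.invisible_costs)


estimate = DowntimeEstimate(
    hourly_revenue=4000, hourly_pay_rate=15, employees=40,
    downtime_hours=10, uptime_share=0.60,
    recovery_costs=5000,     # placeholder value
    invisible_costs=10000,   # placeholder value
)
print(f"Lost revenue:      ${estimate.lost_revenue():,.0f}")
print(f"Lost productivity: ${estimate.lost_productivity():,.0f}")
print(f"Total:             ${estimate.total():,.0f}")
```

Keeping each component as its own method makes it easy to swap in a different revenue model if your business does not earn by the hour.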
Types of downtime costs
Loss of revenue is the most familiar type of downtime cost, and rightly so, because revenue is the heart of any business.
But as illustrated by the downtime cost formula above, there are other types of downtime costs that companies incur besides revenue. Here are the top costs in addition to revenue.
1. Lost opportunity
It could be a missed meeting, a delayed reply to a crucial email, or a dropped lead. This type of cost can be difficult to quantify, as it's often impossible to know exactly what would have happened if the business had been running as usual. However, it's still an important cost to estimate and use when calculating the total cost of downtime.
2. Reputational damage
Any kind of incident that harms your company's reputation can be extremely costly in terms of the damage to your relationships with customers and partners. According to Kaspersky, the reputational cost of a single incident is about $200,000 for enterprises and $8,000 for small businesses. This was back in 2015, meaning these costs have likely soared through the years.
In the worst-case scenario, IT downtime can even lead to a full-blown public relations crisis. Angry customers can use any means to vent their frustrations, including targeting employees. Worst of all, news outlets may start writing negative articles about your company. This kind of publicity can be very costly to overcome and can do serious damage to your reputation.
3. Penalties, fines and litigations
Downtime costs can also come in the form of penalties and fines from government agencies or your service level agreement (SLA) provider. For example, if your company's website is down and it's affecting your ability to meet your SLA, you may be charged a penalty.
Litigation costs are the fees and settlements that a business pays as a result of being involved in a lawsuit that is triggered by a downtime that happens to be costly to customers or investors.
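Whether an SLA penalty applies usually comes down to measured availability against an agreed target. The Python sketch below shows that check. The 99.9% target and the 30-day month are assumptions chosen for illustration, not terms from any particular agreement, and the function names are hypothetical.

```python
def availability_percent(period_hours: float, downtime_hours: float) -> float:
    """Share of the period during which the service was up, in percent."""
    if period_hours <= 0 or not 0 <= downtime_hours <= period_hours:
        raise ValueError("downtime must be between 0 and the period length")
    return 100.0 * (period_hours - downtime_hours) / period_hours


def sla_breached(period_hours: float, downtime_hours: float,
                 target_percent: float) -> bool:
    """True when measured availability falls below the agreed target."""
    return availability_percent(period_hours, downtime_hours) < target_percent


hours_in_month = 30 * 24  # assumed 30-day month
print(f"Availability: {availability_percent(hours_in_month, 10):.2f}%")
print("SLA breached:", sla_breached(hours_in_month, 10, 99.9))
```

Ten hours of downtime in a month works out to roughly 98.61% availability, well short of a 99.9% target.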
4. Third party services
When your business shuts due to downtime, you're still paying for all the IT services that help in running your operations. It could be cloud services such as file storage, communication, or even computing power. In some cases, you may also need to hire a third party to come in and fix the problem, which can be very expensive.
5. Employee churn
When downtime occurs frequently in your company, employees can start to feel disoriented. As a result, they may start looking for other jobs, which can add to the cost of replacing them.
A sudden departure of many employees can actually lead to more downtime as it means key systems will be left unattended.
It's important to remember that employee churn isn't just costly in terms of the time and money it takes to find and train new employees. It also has a significant impact on productivity and morale. When employees are constantly leaving because of downtimes, it creates a negative work environment that can be difficult to overcome.
How to reduce downtime costs
It’s extremely difficult, or let's just say impossible, to fully stop downtime. Not even the big corporations with massive resources have been able to do this. So don’t feel alone; downtime is inevitable in business. But there are certain actions you can take to reduce downtime costs to minimal or negligible levels:
1. Establish the root causes of downtime
What is causing most of your downtime? Is it faulty hardware, human error?
Once you know the causes of downtime, you can start to brainstorm solutions. For example, if you find that a particular piece of hardware is causing most of the downtime, you can invest in a new one or make sure that the existing one is repaired and well-maintained.
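If you keep even a simple incident log, a short script can show which cause is responsible for most of your downtime. Here is a minimal Python sketch; the incident data is made up and the function name is hypothetical.

```python
from collections import defaultdict


def hours_by_cause(incidents: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Total downtime hours per cause, largest first."""
    totals: dict[str, float] = defaultdict(float)
    for cause, hours in incidents:
        totals[cause] += hours
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical incident log: (cause, hours of downtime).
incidents = [
    ("hardware", 4.0),
    ("human error", 1.5),
    ("hardware", 3.0),
    ("software", 2.0),
    ("human error", 0.5),
]

for cause, hours in hours_by_cause(incidents):
    print(f"{cause}: {hours} h")
```

In this sample, hardware accounts for 7 of the 11 hours lost, so that is where to look first.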
2. Train employees on downtime procedures
Make sure they understand the necessary emergency protocols such as where to find and shut down systems, and how to report any problems they encounter.
A clear delineation of what should be done in the event of failure or other disruptions will go a long way to keep downtime costs at a minimum.
3. Use technology
Try as much as possible to automate tasks and processes that are prone to human error. This will reduce the likelihood of errors that can lead to downtime. It’s equally important to invest in monitoring and management tools to identify potential issues in real-time before they get to the point of causing severe downtime.
IT services such as online backup, remote access and remote IT support can also go a long way to minimize downtime costs. And if you're ever faced with a hardware failure or other technical issue, you can simply call the support line for help.
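Dedicated monitoring tools do far more than this, but the core idea can be sketched with Python's standard library: probe a service regularly and raise an alert only after several failures in a row, so that one slow response does not wake anyone up. The URL in the comment, the three-failure threshold and the function names are all hypothetical.

```python
import time
import urllib.error
import urllib.request


def probe(url: str, timeout_s: float = 5.0) -> tuple[bool, float]:
    """Send one HTTP GET and return (is_up, seconds_taken)."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            is_up = 200 <= response.status < 400
    except (urllib.error.URLError, OSError, ValueError):
        # Covers HTTP errors, refused connections, timeouts and malformed URLs.
        is_up = False
    return is_up, time.monotonic() - started


def should_alert(recent_results: list[bool], max_failures: int = 3) -> bool:
    """Alert only after `max_failures` consecutive failed probes."""
    if len(recent_results) < max_failures:
        return False
    return not any(recent_results[-max_failures:])


# Example use (hypothetical URL, run on a schedule of your choosing):
# is_up, seconds = probe("https://example.com/health")
print(should_alert([True, False, False, False]))
```

Requiring consecutive failures is a design choice that trades a slightly slower alert for fewer false alarms.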
Further reading: Pricing guide for managed IT services
4. Enhance incident communication
When an incident occurs, it's critical that everyone is on the same page and knows what steps need to be taken to get the business back up and running as soon as possible.
A good way to improve communication is to establish a dedicated incident response team. This team should be composed of individuals from all departments within the company, so they can quickly communicate and coordinate their efforts during an incident.
In addition, everyone in the company should be familiar with your Incident Response Plan, so they know what to do if an incident occurs.
5. Stamp out defined factors
Some contributing factors to downtime costs can be rooted out completely. This could be a problem with some hardware, an issue with the workers, or repeated shortage of supplies.
Certain factors exist because they have been left to flourish either due to negligence or lack of strict procedures. For example, if you experience frequent server downtimes just because the technicians continuously forget to conduct routine maintenance, you can train the technicians, hold them accountable or replace them.
6. Get help from experts
You may be wondering if it's worth it to utilize experts to help minimize the cost of downtime in your company. The answer is YES. Here's why:
- Experts know what they're doing. They have the training and experience to help your company get back on its feet as quickly as possible.
- They can help you prevent future downtimes. Experts on your side means you're essentially arming yourself with the knowledge and resources you need to keep a check on avoidable downtimes.
- Experts are affordable. Compared to the average cost of downtime, experts are a relatively low cost solution that can save you a lot of money in the long run.
Use these tactics to go about finding the right experts:
- Ask your business network for recommendations. Chances are, someone you know has worked with a consultant or service provider who can help.
- Search online. There are a number of directories and databases full of qualified experts in different fields.
- Contact professional organizations. Many organizations have databases of their members, which can be a great resource for finding qualified professionals.
How to troubleshoot system/servers for downtime
First, make sure that your servers are adequately sized and configured to meet your current and projected needs. If you're having problems with CPU or memory usage, you may need to upgrade your hardware.
Next, make sure that your servers are running the latest software updates and security patches. Remember that out-of-date software is one of the most common causes of system and server downtime.
You should also have a disaster recovery plan in place in case of an unexpected outage. This plan should include procedures for backup and restoration of data, as well as for restarting services and restoring functionality.
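A backup is only useful if it can actually be restored, so part of any recovery plan is checking that backups match their originals. One simple approach, sketched below in Python, is to compare SHA-256 checksums. The function names are hypothetical, and a real plan would also test full restores.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """SHA-256 checksum of a file, read in 1 MB chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_matches(original: Path, backup: Path) -> bool:
    """True when the backup exists and has the same content as the original."""
    return backup.is_file() and sha256_of(original) == sha256_of(backup)
```

You could run a check like this after each backup job and treat a mismatch as an incident in its own right.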
If you're having recurring problems with system or server downtime, it may be time to consult with systems professionals. They can help you to identify the root cause of the problem and recommend solutions.
How to prevent server downtime
One of the best ways to prevent server downtime is to create and follow a regular maintenance checklist and employ server monitoring. This checklist can include tasks like server reboots, checking memory usage, and running virus scans.
Here's an example of a server maintenance checklist. You can tailor the checklist to use it daily, weekly or monthly depending on your company’s capacity:
Name of Technician/Engineer:
- Server scanned for malware
- Logs and temporary files emptied
- All doors to server room locked
- All user accounts safe
- Steady power supply on
- Backups working properly
- OS is up to date and working properly
- Control panel updated
- All remote management tools working properly
- Server usage checked (Disk, RAM, CPU, Network)
- Server passwords changed
- A/C unit working properly
- All power supplies and fans working properly
- RAID fault tolerance checked
- Cable integrity checked
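Some checklist items can be automated. As one example, the disk portion of the server usage check can be sketched with Python's standard library. The 85% limit is an arbitrary assumption, and the function names are hypothetical; RAM, CPU and network checks would need other tools.

```python
import shutil


def disk_used_percent(path: str = ".") -> float:
    """Percentage of disk space in use on the drive that holds `path`."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def over_threshold(used_percent: float, limit_percent: float = 85.0) -> bool:
    """True when usage has reached the (assumed) warning limit."""
    return used_percent >= limit_percent


used = disk_used_percent(".")
status = "WARNING" if over_threshold(used) else "OK"
print(f"Disk usage: {used:.1f}% ({status})")
```

A script like this can run on a schedule and record its result next to the manual checks.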
The cost of downtime remains a real and pressing concern for businesses of all sizes. And as we have seen, it’s not just the financial losses that can be devastating, but also the impact on areas such as customer trust and loyalty.
Since it’s technically impossible to get to 100% in terms of eliminating downtime, please strive to deploy protocols that align the entire organization towards the goal of minimal downtime at all times.