Business continuity - IT resilience and disaster recovery for beginners
Like most businesses, you are probably highly reliant on your computer systems and your data to function properly. But what will you do if something goes wrong? Having a plan for a range of scenarios is what business continuity, including disaster recovery, is all about. Alongside this, if you build resilience into your IT systems from the start, you’ll be able to prevent or minimise disruption and keep your business running smoothly.
What are the major potential risks to your systems and how can you mitigate them? Your systems will include your hardware or equipment, your software, your data storage and internet connections – essentially, all of the tech you use. The major risks to think about are:
- Physical risks – fire, flood, theft and other physical risks to your premises, which could see equipment damaged, destroyed or removed.
- System failure – no technology is 100% perfect; computers, servers, routers and devices can break down and stop working completely. Crucial information can be lost through equipment failure if it’s not backed up elsewhere.
- Cyber attacks – malicious attackers can cripple systems with a wide variety of malware, from ransomware that locks your systems until you pay the hackers, to programs that steal card details from the online forms your customers complete. You may have heard about attacks on large companies, but according to cyber security specialists Symantec [1], employees of smaller organisations are more likely to be hit by email threats - including spam, phishing, and email malware - than those in large organisations.
- People – companies can be at risk from employees who are careless, incompetent or malicious.
Evaluate your current system capacity and performance, then use a business impact assessment to identify what could happen in the event of severe disruption or failure of the system, and what the recovery requirements would be.
Managing risk in IT systems means thinking logically and strategically about how you can:
- reduce risk
- transfer and share risk
- prepare for potential problems and put in place strong management controls
- respond and recover efficiently
For example, you may be able to reduce risk through staff training, transfer or share risk by using a cloud provider to store your data, and prepare for problems by getting cyber insurance and writing a plan to follow in the event of each risk becoming a reality.
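One simple way to think logically about these risks is to keep a small risk register and rank the entries. The sketch below is illustrative only: the 1 to 5 scales, the "likelihood times impact" score and the example risks are assumptions made for this example, not a prescribed method.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    likelihood: int  # 1 (rare) to 5 (almost certain); illustrative scale
    impact: int      # 1 (minor) to 5 (business-stopping); illustrative scale
    response: str    # "reduce", "transfer", "prepare" or "respond"

    @property
    def score(self) -> int:
        # A common rule of thumb: likelihood multiplied by impact
        return self.likelihood * self.impact


def rank_risks(risks):
    """Return the risks ordered from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


# Hypothetical entries for a small business
register = [
    Risk("Office flood", likelihood=1, impact=5, response="transfer"),
    Risk("Phishing email opened by staff", likelihood=4, impact=4, response="reduce"),
    Risk("Server hard drive failure", likelihood=3, impact=4, response="prepare"),
]

for risk in rank_risks(register):
    print(f"{risk.score:>2}  {risk.name}  ({risk.response})")
```

The highest-scoring risks are the ones to address first; the scoring scheme itself should be adapted to whatever your own business impact assessment shows.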
You'll also need to consider the unknown risks – new threats or problems that you can’t yet identify. How will you keep yourself informed about emerging risks and what to do about them?
If your data is backed up regularly, you will be able to get back most of your information after any disaster. You should:
- Identify your essential data – the information that your business couldn't work properly without.
- Back up these files separately – keep copies away from your computer or local computer network, for example in the cloud or on different premises, so that if your premises are destroyed you can still recover your data and your business.
- Identify how much data you could afford to lose – then put routines in place so that files are backed up at least that often. Make back-ups automatic if possible; otherwise set up a daily or weekly manual routine.
- Use antivirus software – choose a reputable product and keep your software updated.
- Implement a firewall – firewalls put in place protections between your network and external networks such as the internet. Most of the popular computer operating systems now include a firewall, so make sure this is activated.
- Keep your systems up to date – apply updates to software and firmware promptly and make sure your staff do too. Set your systems and devices to 'automatically update' where possible. If the vendor or manufacturer has stopped supporting a system, then ideally replace the product with a newer equivalent.
- Use multi-factor authentication for key systems – so you are not reliant on a single password.
- Set staff policies – create strong guidelines so that staff know what is expected from them. Make sure these cover the potential risks you have identified, such as downloading unsafe apps, visiting risky websites, plugging in potentially infected USB drives or using easy-to-crack passwords.
- Control access – give users of the system (including staff and third-party suppliers) enough access to do their jobs and no more. This can help limit damage.
- Carry out vulnerability scans and penetration tests – these will help you identify possible system weaknesses. You can test staff awareness in a similar way, for example by sending mock-phishing emails. Many providers are able to support this.
- Put in place warning systems - these will help alert you to attacks at an early stage.
- Understand any dependencies and single points of failure in your system – see how you can make them more robust, particularly in business-critical systems. For example, make sure your web servers can cope with surges of traffic, or keep spare equipment that you can call on in case of breakdown.
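The back-up advice in the list above (decide how much data you could afford to lose, then back up at least that often) can be checked automatically. The following is a minimal sketch, assuming you can find out when the last successful back-up finished; the function name and the example dates are hypothetical.

```python
from datetime import datetime, timedelta


def backup_is_stale(last_backup: datetime, max_data_loss: timedelta, now: datetime) -> bool:
    """Return True if more time has passed since the last back-up than
    the amount of data loss the business has decided it can tolerate."""
    return now - last_backup > max_data_loss


# Hypothetical example values
now = datetime(2024, 3, 8, 9, 0)
last_backup = datetime(2024, 3, 6, 23, 0)
max_data_loss = timedelta(hours=24)  # "we can afford to lose one day of work"

if backup_is_stale(last_backup, max_data_loss, now):
    print("Warning: the last back-up is older than the acceptable data-loss window")
```

A check like this could feed one of the warning systems mentioned above, so that a missed back-up is noticed before the data is needed.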
Many businesses now rely on mobile phones, laptops and other connected devices. All these need to be kept secure. Business-level security software can help block malicious apps, limit access to sensitive parts of your system and help keep invaders out or contained. As well as investing in security protection, take other practical steps:
- Password protect all devices and change any default passwords that come with them.
- Implement tools that help protect lost and stolen devices. Phones and laptops can be easily lost or stolen, especially when your staff are travelling. Make sure that you can track and lock the device. Ideally you should also be able to retrieve, back up and then erase any data on the device remotely too.
- Keep on top of all critical security updates to software and operating systems, ideally automatically.
To build resilience you also need to think about the people involved in your IT security – not just your technical teams, but everyone who works for and with your organisation.
- Make sure staff know what is expected of them and are trained to minimise potential risks to systems when using email, apps, portable devices and so on. Think about other people-related risks too, such as staff talking about your information security with others or allowing access to your premises.
- Prevent system administrators from using an account that carries system privileges for day-to-day tasks such as reading email or accessing the web. This helps reduce the possibility of hackers accessing those accounts that allow wide access to your systems.
- Make sure your team have the knowledge and information needed to be able to deal with problems, including details of any service suppliers that might be called on to help with disaster recovery.
- Encourage staff to report any cyber security problems such as attacks or malware as soon as possible, as this can help minimise damage.
- Ensure you do not have any human single points of failure within your organisation, e.g. knowledge, capability, access.
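Human single points of failure can be spotted by listing who is able to do each critical task and looking for tasks that only one person covers. This is a minimal sketch; the names and tasks are invented for illustration.

```python
from collections import defaultdict


def single_points_of_failure(responsibilities):
    """Given {person: [tasks they can perform]}, return {task: person}
    for every task that only one person can perform."""
    holders = defaultdict(list)
    for person, tasks in responsibilities.items():
        for task in tasks:
            holders[task].append(person)
    return {task: people[0] for task, people in holders.items() if len(people) == 1}


# Hypothetical team and responsibilities
team = {
    "Asha": ["restore back-ups", "admin access to email"],
    "Ben": ["admin access to email"],
    "Chloe": ["contact internet provider"],
}

for task, person in single_points_of_failure(team).items():
    print(f"Only {person} can: {task}")
```

Any task that appears in the output is a candidate for cross-training, shared documentation or a second account with the same access.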
Use your risk assessment to create a recovery plan. For each risk, consider how you would limit the damage and fix the problems it causes.
- Set priorities for recovery based on the business-critical needs so that everyone understands which problems need addressing first.
- Decide on recovery strategies for each of the different parts of the system, such as hardware, software, operating systems, data, connectivity, power supplies and physical premises such as computer rooms.
- Make a list of everyone who will need to be contacted in the event of a serious incident, along with their contact details. Include your staff, customers, bank, insurers, regulators and suppliers, and anyone who will be able to help you resolve your issues.
- Make sure everyone understands their role and responsibility in helping to fix the problem and get back up to speed as soon as possible.
- Your plan should cover what happens immediately after a serious incident and then longer-term steps. Your IT recovery plan may be part of an overall disaster contingency plan and needs to integrate with that where necessary.
- Keep your plan safe and make sure you have a copy stored securely away from your workplace.
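The priorities and recovery strategies described above can be written down as a simple ordered list. In this sketch the systems, the tolerable downtime figures and the strategies are all hypothetical, and the ordering rule (business-critical systems first, then shortest tolerable downtime) is one reasonable choice rather than the only one.

```python
from dataclasses import dataclass


@dataclass
class SystemEntry:
    name: str
    business_critical: bool
    max_downtime_hours: int  # longest outage the business can tolerate
    strategy: str            # how this part of the system would be recovered


def recovery_order(systems):
    """Business-critical systems first, then by shortest tolerable downtime."""
    return sorted(systems, key=lambda s: (not s.business_critical, s.max_downtime_hours))


# Hypothetical plan entries
plan = [
    SystemEntry("Customer website", True, 4, "switch to standby hosting"),
    SystemEntry("Payroll software", False, 72, "reinstall and restore from back-up"),
    SystemEntry("Card payment terminal", True, 2, "use spare terminal"),
    SystemEntry("Email", True, 8, "use provider's web access"),
]

for position, system in enumerate(recovery_order(plan), start=1):
    print(f"{position}. {system.name}: {system.strategy}")
```

Keeping the list in a form like this makes it easy to re-sort when priorities change and to print a copy for storage away from the workplace.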
Following any business continuity incident:
- Review what took place and how your recovery plan worked.
- Make any updates to improve the speed and the effectiveness of recovery.
- If you can identify the root causes of the disruption, see what steps need to be taken to avoid a repetition and to mitigate the risk of a similar event in the future.
- Keep a record of your activity, so that there is a log of the incident, the response and the lessons learned.
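The incident log can be as simple as one structured record per incident. The field names below are assumptions chosen to mirror the points above (the incident, the response, the root cause and the lessons learned), and the example incident is invented.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class IncidentRecord:
    date: str
    what_happened: str
    response: str
    root_cause: str
    lessons_learned: list = field(default_factory=list)


def to_log_line(record: IncidentRecord) -> str:
    """Serialise one incident as a single line of JSON for an append-only log."""
    return json.dumps(asdict(record))


# Hypothetical incident
record = IncidentRecord(
    date="2024-03-08",
    what_happened="File server failed overnight",
    response="Restored files from cloud back-up onto spare equipment",
    root_cause="Hard drive failure",
    lessons_learned=["Test restores more often", "Keep a spare drive on site"],
)

line = to_log_line(record)
print(line)
```

Each line can be appended to a text file kept alongside the recovery plan, so that the history of incidents is available at the next review.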
Even if you are lucky enough to avoid any disruption, your plan still needs to be tested and reviewed regularly.
- There are many levels of testing, from desktop walkthroughs to full back-up recovery, all aimed at making sure the plan makes sense and is usable.
- Consider doing this every six months – you may have updated or changed your IT system or added new software or new staff members in that time, and contact details at suppliers for support are likely to change too.
- After testing, update your plans with any lessons learned.
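A reminder for the six-monthly review suggested above can also be automated. This is a minimal sketch that treats six months as roughly 182 days, which is an approximation made for simplicity; the dates are examples only.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=182)  # roughly six months (approximation)


def review_overdue(last_tested: date, today: date) -> bool:
    """Return True if the plan was last tested more than about six months ago."""
    return today - last_tested > REVIEW_INTERVAL


# Hypothetical dates
if review_overdue(last_tested=date(2024, 1, 10), today=date(2024, 9, 1)):
    print("The recovery plan is due for testing and review")
```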