Before looking at Disaster Recovery (DR) itself, it helps to run through the related terms and the wider domain that DR belongs to, the history of how these terms and concepts evolved, and how they apply to today’s cloud and data center based IT infrastructure.
Before we use the term DR, let us understand the term Business Continuity Planning (BCP).
Business Continuity Planning is best described as the processes and procedures that are carried out by an organization to ensure that essential business functions continue to operate during and after a disaster. By having a BCP, organizations seek to protect their mission critical services and give themselves their best chance of survival. This type of planning enables them to re-establish services to a fully functional level as quickly and smoothly as possible. BCPs generally cover most or all of an organization’s critical business processes and operations.
Conceptually, the test of whether something is a Business Continuity Plan is the question: “If we lost this building, how would we recommence our business?”
As part of the business continuity process, an organization will normally develop a series of Disaster Recovery Plans (DRPs). These are more technical plans, developed for specific groups within an organization to allow them to recover a particular business application. The best-known example of a DRP is the Information Technology (IT) DRP.
The typical test for an IT DR Plan would be: “If we lost our IT services, how would we recover them?”
IT DR plans only deliver technology services to the desk of employees. It is then up to the business units to have plans for the subsequent functions.
A mistake organizations often make is to assume, “We have an IT DR Plan, so we are all OK.” That is not the case. You also need a Business Continuity Plan covering critical personnel, key business processes, recovery of vital records, identification of critical suppliers, contacting key vendors, and so on.
It is critical that an organization clearly defines what sort of plan it is working on.
In short, BCP is the larger domain and DR is a sub-domain of it. DR covers the IT infrastructure part of recovering from a disaster.
People often use terms such as HA, DR, BCP and backup site interchangeably, but each is distinct and needs to be understood in the context of the solution it maps to.
Let us look at how they differ conceptually.
HA – High availability (HA) means building in redundancy so that the failure of an individual server, network or storage component does not interrupt service. Such failures are not disasters!
DR – The term disaster is used for calamities such as floods, earthquakes and fires, where we lose access to the premises (a data center or an office building).
BCP – As explained earlier, BCP encompasses all of the above and defines, for each area, what actions are required to continue the business when that area fails. This includes manual processes, new premises where employees can sit together and work, and the availability of IT infrastructure.
There is one classic example of BCP/DR that I normally use, from the days before sophisticated clustering and replication technologies had been developed.
A five-star hotel was running a legacy application as its complete Hotel Management System. Being a hotel, there were always groups arriving and departing (tours, airline crews, etc.), and the system was integrated with the other outlets in the hotel, such as room service, F&B, the bar and the restaurants. As a result, when guests checked out, the bill was accurate.
In the event of a hardware or software failure, most functions could be performed manually and entered into the system once it was back up. However, the “Guest Checkout” function would be affected, as the billing figures were not available in manual records.
Given the limits of the technology of the time for building redundancy, there was no alternative but to improve the process. After some brainstorming sessions, it was agreed that the system should generate a guest balance report every hour, providing an up-to-date guest ledger. This meant that when guests checked out at reception, their balance up to the last hour was available, and the bill could be prepared from it.
This example shows that, in order to define a proper BCP, a thorough understanding of all the business processes is a must, so that we can see what impact a failure would have on running the business.
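The hourly guest balance report from the hotel example can be sketched in a few lines of Python. This is only an illustration of the idea, not the hotel’s actual system: the `Charge` record, the function names and the report layout are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass(frozen=True)
class Charge:
    """One posting to a guest's ledger (hypothetical record layout)."""
    room: str
    outlet: str          # e.g. "Bar", "Restaurant", "Room service"
    amount: Decimal
    posted_at: datetime


def guest_balance_report(charges, as_of):
    """Return {room: balance} for every charge posted up to `as_of`."""
    balances = {}
    for charge in charges:
        if charge.posted_at <= as_of:
            balances[charge.room] = balances.get(charge.room, Decimal("0")) + charge.amount
    return balances


def format_report(balances, as_of):
    """Render the ledger as plain text that reception could print and file."""
    lines = [f"Guest balance report as of {as_of:%Y-%m-%d %H:%M}"]
    for room in sorted(balances):
        lines.append(f"Room {room}: {balances[room]:.2f}")
    return "\n".join(lines)
```

Run once an hour and printed, a report like this gives reception a fallback that is at most one hour stale if the system goes down. Only charges posted after the last report would have to be collected manually from the outlets.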
If we look back at the history of computerization, we need to go back to the mainframe era, when computers were first used mainly for academic and scientific calculations. As hardware and software evolved, the use of computers shifted towards commercial applications. Computers slowly became smaller and more powerful. Initially most applications were batch applications, so manually prepared source documents were always available on paper in files.
The rise of digital technologies also led to a rise in technological failures. As mentioned earlier, before this most businesses kept manual paper records which, although susceptible to fire and theft, did not depend on reliable IT infrastructure. It was in the 1970s that businesses became more aware of the potential disruption caused by technology downtime, and the decade saw the emergence of the first dedicated disaster recovery firms.
As in any industry, when a concept is introduced and technologies are developed, a number of organizations try to adopt them to gain a competitive edge. As many firms start adopting these concepts for their IT services, many vendors also enter the business, and eventually a regulatory body is formed (often initially by a group of top vendors) and regulations are defined to standardize the processes. Regulations were introduced in the US in 1983 stipulating that national banks must have a testable backup plan. Other industry verticals soon followed suit, driving further growth within the disaster recovery business.
Later, in the 1990s, the development of three-tier architecture separated data from the application layer and user interface. This made data maintenance and backup a far easier process.
In the 2000s, the 11 September attacks on the World Trade Center had a profound impact on disaster recovery strategy both in the US and abroad. Following the atrocity, businesses placed greater emphasis on being able to react and recover quickly in the event of unexpected disruption.
In particular, businesses looked to ensure that their critical processes and external communications could be recovered, both for altruistic and competitive reasons.
Server virtualization makes the recovery from a disaster a much faster process. With traditional tape systems, complete restoration can take days, but virtualized servers can be restored in a matter of hours because businesses no longer need to rebuild operating systems, servers and applications separately.
Server virtualization also makes it possible to switch processes to a redundant or standby server when the primary asset fails, which is an effective method for mitigating disruption.
The rise of cloud computing has allowed businesses to outsource their disaster recovery, a model known as disaster recovery as a service (DRaaS). As with other cloud services, this provides a number of benefits in terms of flexibility, recovery times and cost.
DRaaS is also easily scalable should businesses expand, and it is usually less resource intensive, as the cloud vendor or MSP will allocate the IT infrastructure, time and expertise needed to ensure your disaster recovery plan is implemented properly.
Recovery is now less about backups and standby servers and more about virtual machines and data sets that may have been replicated within minutes of the production systems and can be running as the live system within minutes. What was once a solution for only the largest organizations with the deepest pockets is now available to all.
However, along with improved offerings such as DRaaS, new technologies also bring new types of threat. With employees connected to both the internet and corporate systems, companies will see an increase in demands from auditors and insurance companies to protect against developing threats such as ransomware, which is now being targeted at company servers in addition to the desktop.
The recognition by all organizations of the importance of maintaining Business Continuity and having a credible Disaster Recovery plan is reflected in the growth forecast for the sector, with the DRaaS market predicted to be worth $6.4 billion by 2020.
Types of Disaster Recovery
Business continuity and disaster recovery are the processes and procedures that return your business systems – hardware, software and data – to full operation following a natural or man-made disaster. As businesses increasingly rely on IT for their mission-critical operations, it is essential to have plans in place to ensure your business viability is not put at risk by a critical incident. Here, we look at a few different levels of disaster recovery:
- No disaster plan at all
- No disaster plan, but good backup procedures
- A disaster plan, with no resources in place
- A ‘cold site’ disaster recovery solution
- A ‘split site’ disaster recovery solution
- A ‘warm site’ disaster recovery solution
- A ‘hot site’ disaster recovery solution
- What level of protection is right for you?
No disaster plan at all
Despite the risks, millions of businesses globally have no formal business continuity or disaster recovery plan in place. Should a disaster occur, panic and confusion tend to be the result and timely recovery of data, software and hardware is not possible. The chances are very high that these businesses will never recover.
A simple server crash, equipment failure, power surge or human error is all it takes for a critical database to be wiped. Fires, floods, viruses, unauthorized users or hackers can play havoc with your entire business systems. Hardware and software can be bought again, but data is an even more valuable asset and cannot be replaced.
If you work in a company whose IT department does not plan for disasters, it is essential that an effective plan is developed before it is too late. Fortunately, there are many reliable and cost-effective solutions available to safeguard your business.
No disaster plan, but good backup procedures
The absolute minimum any company must do, even the smallest business, to prevent a disaster from wiping out business information is to back up the data on its computers daily and store the backups offsite with a secure archival company. Never store them at employees’ homes.
That way, even if your hardware and software are ruined, you can replace them and reload all your irreplaceable data. If your IT department is not making good backups of at least the critical systems every single day, then it is simply not doing its job.
Another important thing to remember about backups is that they must be tested regularly to make sure they are working. Nothing is more frustrating than to need a backup and find that the data is corrupt or non-existent.
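One simple way to check that a backup copy is present and not corrupt is to compare checksums of the original files against the backed-up files. The sketch below assumes the backup is a plain directory copy of the source and that the source has not changed since the backup was taken. The function names are made up for this illustration. A full test would also include a trial restore onto separate hardware.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source_dir, backup_dir):
    """Return a list of problems; an empty list means every file matched."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for source_file in sorted(p for p in source_dir.rglob("*") if p.is_file()):
        relative = source_file.relative_to(source_dir)
        backup_file = backup_dir / relative
        if not backup_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(backup_file):
            problems.append(f"corrupt: {relative.as_posix()}")
    return problems
```

Calling `verify_backup` on a source directory and its backup directory after each backup run, and alerting when the returned list is not empty, turns “the backups are probably fine” into something that is actually checked.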
Another smart and reasonably simple step is to build fault tolerance into all of your critical systems. This means installing RAID arrays (disk drives configured to keep redundant copies of data), clustered systems and other types of local recovery measures that provide at least an extra layer of protection.
A disaster plan, with no resources in place
Once you have a good backup and archival procedure and your critical systems are fault tolerant, the next step is to put together procedures for remote disaster recovery. This simply means you ask and answer the question, “What do we do if the computer center is utterly destroyed?” You might, for example, make arrangements with another division or company to share equipment and space if either is struck by disaster. Agreements need to be made with critical computer vendors to quickly ship new systems in the event of an emergency. This kind of planning is a good first step, although recovery would be slow in the event of disaster so you need to be sure your business can afford a few days of downtime if required.
A ‘cold site’ disaster recovery solution
A simple yet effective business backup solution, a cold site is simply a reserved area in a data centre where your business can set up new equipment in the event of a disaster. This is a popular disaster recovery method because it tends to be less expensive than other options, yet still gives a company the ability to survive a true disaster.
If you outsource your disaster recovery to a third party, then odds are they will establish this form of disaster recovery solution. This will work as long as your planning is good, your backups are sound and your documentation is excellent. Of course, extended downtime in the event of a disaster must be acceptable for a cold site to be a valid option. Plan on 24 hours for critical systems and as long as a week for less important functions.
A ‘split site’ disaster recovery solution
If your organization is large enough, it may be feasible to house the IT department across more than one location. In the event of a disaster at one site, operations can then shift relatively simply to the other site, and any new equipment needed can be purchased as necessary, as long as the backups have been properly maintained. The advantage of this method is that it eliminates the major up-front costs of building a dedicated disaster center.
A ‘warm site’ disaster recovery solution
A warm site goes a step beyond a cold site: the equipment is already purchased or leased and in place at the site. This option therefore involves more set-up costs than a cold site, but has the advantage of being able to get your business systems up and running much faster. Even sites with multiple applications can generally be back to full operation within 24 hours.
A ‘hot site’ disaster recovery solution
A hot site is a premium level of disaster recovery where the business IT systems and up-to-date data are duplicated and maintained at a separate data center. In this scenario, a duplicate computer center is set up in a remote location with communication lines set up and actively copying data at all times. The site has a duplicate of every critical server, with data that is up-to-date to within hours, minutes or even seconds. At the highest level it even has desks, phones and whatever else is necessary for operations to continue if the worst happens.
Following a disaster, your business can very quickly ‘switch’ to the hot site with minimal disruption. This is the ultimate in disaster preparation, reserved for companies with excellent management and highly skilled IT staff. Hot sites are expensive, difficult to set up and require constant maintenance, but in the event of a disaster, operations can continue with a minimum of downtime. This is a popular option for institutions such as finance companies and stock exchanges where downtime is not an option.
What level of protection is right for you?
Before determining exactly what business continuity and disaster recovery plans you need in place for your business, it is essential to analyze your systems, data and requirements and develop a solution that cost-effectively meets your needs today and into the future.
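As a rough illustration of that analysis, the sketch below picks the cheapest site option whose recovery time fits the downtime a business can tolerate. The recovery hours and cost ranking are assumptions loosely based on the descriptions above (up to a week for a cold site, about 24 hours for a warm site, an hour or less for a hot site). They are not measured figures, and the names are invented for this example.

```python
from typing import NamedTuple, Optional


class SiteOption(NamedTuple):
    name: str
    full_recovery_hours: float  # assumed worst case, for illustration only
    relative_cost: int          # 1 = cheapest (assumed ranking)


# Assumed figures, loosely read from the descriptions in this article.
SITE_OPTIONS = [
    SiteOption("cold site", 7 * 24, 1),
    SiteOption("warm site", 24, 2),
    SiteOption("hot site", 1, 3),
]


def cheapest_option(max_downtime_hours, options=SITE_OPTIONS) -> Optional[SiteOption]:
    """Return the cheapest option that recovers within the tolerated downtime."""
    candidates = [o for o in options if o.full_recovery_hours <= max_downtime_hours]
    # default=None covers the case where no option is fast enough.
    return min(candidates, key=lambda o: o.relative_cost, default=None)
```

For example, `cheapest_option(48)` returns the warm site entry under these assumptions, while a business that can tolerate only half an hour of downtime gets `None`, a sign that even the assumed hot site figure would need to be improved.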