IT disaster recovery
IT disaster recovery is the process of maintaining or reestablishing vital infrastructure and systems following a
IT service continuity
IT Service Continuity
Principles of backup sites
Planning includes arranging for backup sites, whether they are "hot" (operating prior to a disaster), "warm" (ready to begin operating), or "cold" (requires substantial work to begin operating), and standby sites with hardware as needed for continuity.
In 2008, the British Standards Institution launched a specific standard supporting Business Continuity Standard BS 25999, titled BS25777, specifically to align computer continuity with business continuity. This was withdrawn following the publication in March 2011 of ISO/IEC 27031, "Security techniques — Guidelines for information and communication technology readiness for business continuity."[7]
ITIL has defined some of these terms.[8]
Recovery Time Objective
The Recovery Time Objective (RTO)
According to business continuity planning methodology, the RTO is established during the Business Impact Analysis (BIA) by the owner(s) of the process, including identifying time frames for alternate or manual workarounds.
RTO is a complement of RPO. The limits of acceptable or "tolerable"
Recovery Time Actual
Recovery Time Actual (RTA) is the critical metric for business continuity and disaster recovery.[9]
The business continuity group conducts timed rehearsals (or actuals), during which RTA gets determined and refined as needed.[9]
Recovery Point Objective
A Recovery Point Objective (RPO) is the maximum acceptable interval during which transactional data is lost from an IT service.[11]
For example, if RPO is measured in minutes, then in practice, off-site mirrored backups must be continuously maintained as a daily off-site backup will not suffice.[13]
Relationship to Recovery Time Objective
A recovery that is not instantaneous restores transactional data over some interval without incurring significant risks or losses.[11]
RPO measures the maximum time in which recent data might have been permanently lost and not a direct measure of loss quantity. For instance, if the BC plan is to restore up to the last available backup, then the RPO is the interval between such backups.
RPO is not determined by the existing backup regime. Instead
Data synchronization points
A data synchronization point[14] is a backup is completed. It halts update processing while a disk-to-disk copy is completed. The backup[15] copy reflects the earlier version of the copy operation; not when the data is copied to tape or transmitted elsewhere.
System design
RTO and the RPO must be balanced, taking business risk into account, along with other system design criteria.[16]
RPO is tied to the times backups are secured offsite. Sending synchronous copies to an offsite mirror allows for most unforeseen events. The use of physical transportation for tapes (or other transportable media) is common. Recovery can be activated at a predetermined site. Shared offsite space and hardware complete the package.[17]
For high volumes of high-value transaction data, hardware can be split across multiple sites.
History
Planning for disaster recovery and information technology (IT) developed in the mid to late 1970s as computer center managers began to recognize the dependence of their organizations on their computer systems.
At that time, most systems were batch-oriented mainframes. An offsite mainframe could be loaded from backup tapes pending recovery of the primary site; downtime was relatively less critical.
The disaster recovery industry[18][19] developed to provide backup computer centers. Sungard Availability Services was one of the earliest such centers, located in Sri Lanka (1978).[20][21]
During the 1980s and 90s, computing grew exponentially, including internal corporate timesharing, online data entry and real-time processing. Availability of IT systems became more important.
Regulatory agencies became involved; availability objectives of 2, 3, 4 or 5 nines (99.999%) were often mandated, and
IT service continuity became essential as part of Business Continuity Management (BCM) and Information Security Management (ICM) as specified in ISO/IEC 27001 and ISO 22301 respectively.
The rise of cloud computing since 2010 created new opportunities for system resiliency. Service providers absorbed the responsibility for maintaining high service levels, including availability and reliability. They offered highly resilient network designs. Recovery as a Service (RaaS) is widely availability and promoted by the Cloud Security Alliance.[22]
Classification
Disasters can be the result of three broad categories of threats and hazards.
- Natural hazards include acts of nature such as floods, hurricanes, tornadoes, earthquakes, and epidemics.
- Technological hazards include accidents or the failures of systems and structures such as pipeline explosions, transportation accidents, utility disruptions, dam failures, and accidental hazardous material releases.
- Human-caused threats that include intentional acts such as active assailant attacks, chemical or biological attacks, cyber attacks against data or infrastructure, sabotage, and war.
Preparedness measures for all categories and types of disasters fall into the five mission areas of prevention, protection, mitigation, response, and recovery.[23]
Planning
Research supports the idea that implementing a more holistic pre-disaster planning approach is more cost-effective. Every $1 spent on hazard mitigation (such as a
2015 disaster recovery statistics suggest that downtime lasting for one hour can cost[25]
- small companies $8,000,
- mid-size organizations $74,000, and
- large enterprises $700,000 or more.
As
Control measures
Control measures are steps or mechanisms that can reduce or eliminate threats. The choice of mechanisms is reflected in a disaster recovery plan (DRP).
Control measures can be classified as controls aimed at preventing an event from occurring, controls aimed at detecting or discovering unwanted events, and controls aimed at correcting or restoring the system after a disaster or an event.
These controls are documented and exercised regularly using so-called "DR tests".
Strategies
The disaster recovery strategy derives from the business continuity plan.
Common strategies include:
- backups to tape and sent off-site
- backups to disk on-site (copied to off-site disk) or off-site
- replication off-site, such that once the systems are restored or synchronized, possibly via storage area network technology
- private cloud solutions that replicate metadata (VMs, templates and disks) into the private cloud. Metadata are configured as an XML representation called Open Virtualization Format, and can be easily restored
- hybrid cloud solutions that replicate both on-site and to off-site data centers. This provides instant fail-over to on-site hardware or to cloud data centers.
- high availability systems which keep both the data and system replicated off-site, enabling continuous access to systems and data, even after a disaster (often associated with cloud storage).[29]
Precautionary strategies may include:
- local mirrors of systems and/or data and use of disk protection technology such as RAID
- surge protectors — to minimize the effect of power surges on delicate electronic equipment
- use of an uninterruptible power supply (UPS) and/or backup generator to keep systems going in the event of a power failure
- fire prevention/mitigation systems such as alarms and fire extinguishers
- anti-virus software and other security measures.
Disaster recovery as a service
See also
- Backup site
- BS 25999
- Business continuity planning
- Business continuity
- Continuous data protection
- Disaster recovery plan
- Disaster response
- Emergency management
- High availability
- Information System Contingency Plan
- Real-time recovery
- Recovery Consistency Objective
- Remote backup service
- Virtual tape library
References
- ^ Systems and Operations Continuity: Disaster Recovery. Georgetown University. University Information Services. Retrieved 3 August 2012.
- ^ Disaster Recovery and Business Continuity, version 2011. Archived January 11, 2013, at the Wayback Machine IBM. Retrieved 3 August 2012.
- ^ [1] 'What is Business Continuity Management', DRI International, 2017
- ^ M. Niemimaa; Steven Buchanan (March 2017). "Information systems continuity process". ACM.com (ACM Digital Library).
- ^ "2017 IT Service Continuity Directory" (PDF). Disaster Recovery Journal. Archived from the original (PDF) on 2018-11-30. Retrieved 2018-11-30.
- ^ "Defending The Data Strata". ForbesMiddleEast.com. December 24, 2013.
- ^ "ISO 22301 to be published Mid May - BS 25999-2 to be withdrawn". Business Continuity Forum. 2012-05-03. Retrieved 2021-11-20.
- ^ "ITIL glossary and abbreviations".
- ^ a b c "Like The NFL Draft, Is The Clock The Enemy Of Your Recovery Time". Forbes. April 30, 2015.
- ^ "Three Reasons You Can't Meet Your Disaster Recovery Time". Forbes. October 10, 2013.
- ^ a b c d "Understanding RPO and RTO". DRUVA. 2008. Retrieved February 13, 2013.
- ^ a b "How to fit RPO and RTO into your backup and recovery plans". SearchStorage. Retrieved 2019-05-20.
- ^ Richard May. "Finding RPO and RTO". Archived from the original on 2016-03-03.
- ^ "Data transfer and synchronization between mobile systems". May 14, 2013.
- ^ "Amendment #5 to S-1". SEC.gov.
real-time ... provide redundancy and back-up to ...
- ISBN 978-1118050637.
- ISBN 1349101370.
- ^ "Catastrophe? It Can't Possibly Happen Here". The New York Times. January 29, 1995.
.. patient records
- ^ "Commercial Property/Disaster Recovery". The New York Times. October 9, 1994.
...the disaster-recovery industry has grown to
- ^ Charlie Taylor (June 30, 2015). "US tech firm Sungard announces 50 jobs for Dublin". The Irish Times.
Sungard .. founded 1978
- ^ Cassandra Mascarenhas (November 12, 2010). "SunGard to be a vital presence in the banking industry". Wijeya Newspapers Ltd.
SunGard ... Sri Lanka's future.
- ^ SecaaS Category 9 // BCDR Implementation Guidance CSA, retrieved 14 July 2014.
- ^ "Threat and Hazard Identification and Risk Assessment (THIRA) and Stakeholder Preparedness Review (SPR): Guide Comprehensive Preparedness Guide (CPG) 201, 3rd Edition" (PDF). US Department of Homeland Security. May 2018.
- ^ "Post-Disaster Recovery Planning Forum: How-To Guide, Prepared by Partnership for Disaster Resilience". University of Oregon's Community Service Center, (C) 2007, www.OregonShowcase.org. Retrieved October 29, 2018.[permanent dead link]
- ^ "The Importance of Disaster Recovery". Retrieved October 29, 2018.
- ^ "IT Disaster Recovery Plan". FEMA. 25 October 2012. Retrieved 11 May 2013.
- ^ "Use of the Professional Practices framework to develop, implement, maintain a business continuity program can reduce the likelihood of significant gaps". DRI International. 2021-08-16. Retrieved 2021-09-02.
- ISBN 978-0-07-148755-9. Page 480.
- ^ Brandon, John (23 June 2011). "How to Use the Cloud as a Disaster Recovery Strategy". Inc. Retrieved 11 May 2013.
- ^ "Disaster Recovery as a Service (DRaaS)".
Further reading
- Barnes, James (2001). A guide to business continuity planning. Chichester, NY: John Wiley. OCLC 50321216.
- Bell, Judy Kay (2000). Disaster survival planning : a practical guide for businesses. Port Hueneme, CA, US: Disaster Survival Planning. OCLC 45755917.
- Fulmer, Kenneth (2015). Business Continuity Planning : a Step-by-Step Guide With Planning Forms. Brookfield, CT: Rothstein Associates, Inc. .
- DiMattia, Susan S (2001). "Planning for Continuity". Library Journal. 126 (19): 32–34. OCLC 425551440.
- Harney, John (July–August 2004). "Business Continuity and Disaster Recovery: Back Up Or Shut Down". AIIM E-DOC Magazine. OCLC 1058059544. Archived from the originalon 2008-02-04.
- "ISO 22301:2019(en), Security and resilience — Business continuity management systems — Requirements". ISO.
- "ISO/IEC 27001:2013(en) Information technology — Security techniques — Information security management systems — Requirements". ISO.
- "ISO/IEC 27002:2013(en) Information technology — Security techniques — Code of practice for information security controls". ISO.
External links
- "Glossary of terms for Business Continuity, Disaster Recovery and related data mirroring & z/OS storage technology solutions". recoveryspecialties.com. Archived from the original on 2020-11-14. Retrieved 2021-09-02.
- "IT Disaster Recovery Plan". Ready.gov. Retrieved 2021-09-02.
- "RPO (Recovery Point Objective) Explained". IBM. 2019-08-08. Retrieved 2021-09-02.