Disaster Recovery Overview, Research Paper Example
Developing a disaster recovery policy requires considering many aspects of the business, along with the actions and systems that must operate or be set into motion once a disaster is expected or has occurred. The disaster recovery plan includes all of the documented processes, procedures, plans, practices, roles, responsibilities, resources, and structures used to protect the IT infrastructure and the information the business relies on (Johnson and Merkow 2011). In essence, the plan provides a systematic procedure for bringing the business back up and running after a disaster. Maintaining a proactive approach and putting safeguards in place to mitigate risk to IT resources also lets the business respond more quickly. The return to service (RTS) timeframe is the primary objective and key performance indicator of the disaster recovery plan. It depends on the plan’s ability to mitigate potential risks before a disaster, safeguard assets during a disaster, and ultimately return all IT services and assets to operational status in the shortest timeframe possible. These key areas are shaped by the business’s infrastructure, composition, and organizational goals.
The corporation implementing this enhanced, stakeholder-supported disaster recovery plan has approximately fifty personnel who will be affected. Each of them requires the basic functionality of laptops, desktops, tablets, and smartphones that run key business applications hosted on IT assets such as mail servers, local area networks, data centers, and other typical network infrastructure. Another key consideration is the growing demand for data and information availability to support business decisions, along with the web-based and hosted applications used by internal business users as well as the corporation’s customers and suppliers. The goals and objectives of the business include providing key products and services to its customers while meeting demands for quality, availability, and support. Leadership’s objective of returning the business to operation as soon as possible is driven by those goals and objectives. Because the business consists of approximately fifty personnel and relies heavily on key business applications that affect internal and external users, the disaster recovery plan can focus on prioritizing IT resources and on how they are interconnected in terms of functionality and impact on business operations. Tying the strategic, operational, and tactical goals and objectives of the business to the IT resources begins to produce a roadmap for recovery.
The core goal of the business is to provide key services to its customers across multiple time zones spanning a global marketplace. Operating across multiple time zones lengthens the hours during which systems must be available for customer orders, account updates, shipment verifications, accounts payable and receivable transactions, invoicing, scheduling, shipping, or any other business-critical function that internal or external users may need. A system that provides this high level of availability must be able to survive and maintain operations during disasters. Maintaining this capability carries inherent costs and labor. To stratify and justify this cost of quality, these key business applications must tie directly to the business’s goals and objectives.
The highest-tier assets are tied directly to the primary objectives of the business. The lower tiers, while important to the efficiency of the business, do not contribute directly to sustaining its core competencies, so they fall further down the hierarchy for restoration to normal operational status. Progression through the recovery process follows a time-phased workflow that restores the higher-tier assets first. Once the higher-tier assets are up and running and the business is maintaining an acceptable level of operations, the next tier can be brought online to improve the efficiency and effectiveness of the business.
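The tiered, time-phased progression described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Asset` class, the tier numbers, and the example asset names are hypothetical and do not come from the plan itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    name: str  # hypothetical asset label
    tier: int  # 1 = tied directly to the primary business objectives


def recovery_waves(assets):
    """Group asset names into recovery waves, highest-priority tier first."""
    by_tier = {}
    for asset in assets:
        by_tier.setdefault(asset.tier, []).append(asset.name)
    return [sorted(by_tier[tier]) for tier in sorted(by_tier)]


def next_to_restore(assets, operational):
    """Return the assets to work on now.

    A lower tier is started only once every asset in the tiers
    above it is already operational.
    """
    for wave in recovery_waves(assets):
        pending = [name for name in wave if name not in operational]
        if pending:
            return pending
    return []


# Hypothetical inventory, used only to exercise the functions above.
EXAMPLE_ASSETS = [
    Asset("order_entry", 1),
    Asset("mail_server", 1),
    Asset("reporting", 2),
    Asset("intranet_wiki", 3),
]
```

With this inventory, `next_to_restore(EXAMPLE_ASSETS, {"mail_server"})` returns only the remaining tier 1 asset, so work on tier 2 does not begin until `order_entry` is also back in service.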
Disaster Recovery Policy
Before a disaster recovery plan can be implemented, a disaster must first be declared. This is accomplished through a Disaster Declaration. A disaster is an event that dramatically reduces the ability of IT assets to function and provide the tools needed to perform business actions. Disasters take many forms, such as tornadoes, floods, earthquakes, and other events. For this business, a disaster will be declared when IT resources encounter an outage extending 24 to 48 hours, depending on the level of availability contractually agreed with internal or external business units and the level of availability established in the goals and objectives tied to business operations. The disaster recovery plan is in effect only for disasters. Disasters do not include technology outages such as hardware failures, network failures, or software issues that were not caused by a disaster but instead result from faulty equipment, wrongly implemented releases, or other issues arising over the normal lifecycle of the hardware or software. A disaster declaration would also occur when there is a threat to the safety of the people working on or around the IT assets, a threat to the building housing these assets, or a need to activate redundant systems because of a threat to these IT resources or people. These areas of concern can justify the disaster declaration and the implementation of the disaster recovery plan.
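The declaration criteria above can be read as a simple decision rule. The following Python sketch is one possible interpretation, not the policy’s official logic: the `Event` fields and the `should_declare_disaster` function are hypothetical names, and the outage threshold is assumed to be set per availability agreement somewhere in the 24 to 48 hour range the policy gives.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # True for faulty equipment, a bad release, or other normal-lifecycle
    # issues, which the policy excludes from the definition of a disaster.
    routine_technology_fault: bool
    expected_outage_hours: float
    threatens_people: bool = False
    threatens_building: bool = False
    needs_redundant_systems: bool = False


def should_declare_disaster(event, outage_threshold_hours=24.0):
    """Apply the declaration criteria as interpreted in this sketch.

    outage_threshold_hours is an assumption: the policy only says the
    threshold lies between 24 and 48 hours, depending on the agreed
    level of availability.
    """
    if not 24.0 <= outage_threshold_hours <= 48.0:
        raise ValueError("threshold must be between 24 and 48 hours")
    if event.routine_technology_fault:
        return False
    if (event.threatens_people or event.threatens_building
            or event.needs_redundant_systems):
        return True
    return event.expected_outage_hours >= outage_threshold_hours
```

For example, a 30-hour outage caused by a flood would be declared under a 24-hour threshold but not under a 48-hour one, while a threat to personnel safety triggers a declaration regardless of the expected outage length.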
Assessment of Security
Securing the company is of utmost importance, and its security posture can be measured and defined using a few tools and techniques. The Capability Maturity Model Integration (CMMI) uses a process improvement method to iteratively increase the maturity of specific functions or systems within an organization. CMMI follows a stair-step approach with five distinct maturity levels (CMUSEI 2011): initial, managed, defined, quantitatively managed, and optimizing. Each level has distinct goals and objectives that must be met before reaching the next, ultimately moving the system toward the optimizing level, where process improvement is continuous. An organization can be appraised against CMMI and awarded a maturity rating of 1 to 5. The lowest level is the initial level. At this level the processes are unpredictable and each section has little if any control over them. Another key trait of the initial level is that the precautions and solutions the company generates are reactive, becoming “fire drills” to quickly mitigate the issue at hand. While a CMMI appraisal does not guarantee solutions to the issues, it does provide a framework within which solutions can be created. Specific process areas are associated with the type of CMMI appraisal being performed (Zimmie 2004).
Potential Disaster Scenarios/Process
A business can prepare for many scenarios and what-if situations. In disaster recovery, a few key areas are important to understand, along with the hierarchical ranking of each situation based on the impact experienced by personnel and IT resources (Calder 2009). The first is a catastrophic level of impact, which includes the structural loss of the business’s facility and the potential long-term loss of power to the main facility. This disaster recovery plan focuses on the IT assets and on bringing the business back up and running as quickly and safely as possible. If structures are lost, there must be a secondary location that can concurrently house and use the data that was available at the primary location. This secondary data center will have the core functionality the business needs to operate and will be located in a geographically separate area so that it does not suffer the same cataclysmic event as the primary data center.
At the next level, the structural integrity of the facility is maintained but operations are halted for an extended period of time. Core functionality would need to move to the remote secondary data center, while a core group of IT personnel would maintain a foothold at the primary location to prepare for moving operations back from the secondary location to the primary one.
The third level of disaster involves only a couple of days of lost capability. This may not require a full move of resources to a secondary location; the business may be able to use other web-based applications to maintain operations while staff at the primary location bring the systems back up in a short amount of time. Throughout the disaster recovery process, maintaining the integrity of the data is imperative. Re-establishing operations is critical, but ensuring the data is correct is also a primary objective.
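The three impact levels described above map naturally onto a small classifier. The sketch below is illustrative only: the enum names, the `classify` function, and the two-day cutoff used for “a couple of days” are assumptions, since the plan does not state an exact boundary between a short and an extended outage.

```python
from enum import Enum


class DisasterLevel(Enum):
    CATASTROPHIC = 1     # facility structurally lost
    EXTENDED_OUTAGE = 2  # facility intact, operations halted for a long period
    SHORT_OUTAGE = 3     # only a couple of days of lost capability


# Response summaries paraphrased from the scenarios in the text.
RESPONSE = {
    DisasterLevel.CATASTROPHIC:
        "Operate from the geographically separate secondary data center.",
    DisasterLevel.EXTENDED_OUTAGE:
        "Move core functionality to the secondary data center; keep a core "
        "IT group at the primary site to prepare the move back.",
    DisasterLevel.SHORT_OUTAGE:
        "Stay at the primary site; use web-based alternatives while "
        "systems are restored.",
}


def classify(facility_lost, expected_outage_days, short_outage_max_days=2.0):
    """Pick a disaster level; short_outage_max_days is a hypothetical cutoff."""
    if facility_lost:
        return DisasterLevel.CATASTROPHIC
    if expected_outage_days > short_outage_max_days:
        return DisasterLevel.EXTENDED_OUTAGE
    return DisasterLevel.SHORT_OUTAGE
```

A lookup such as `RESPONSE[classify(False, 10)]` then yields the response for an extended outage. Data integrity checks would apply at every level before operations are considered restored.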
Incident management is the process of establishing awareness of events that would impact the business, initiating the appropriate response to those key triggers, and returning the business to operating levels in a timely fashion. The Incident Response Team (IRT) will recognize the threat, react to mitigate it, and resume normal operations for the business. The team triages the damage from a disaster and applies the appropriate level of effort to prevent further damage and return the business to normal operations.
Respond, React and Repair incidents impacting business operations in order to continue the business’s mission.
An incident can follow the same protocol of hierarchical stratification, but as a scaled-down version of a full-blown disaster. An incident is any event that impacts the business adversely, regardless of the function or business unit involved. Incident management is likewise not confined to a specific function of the business but acts in an enterprise-wide role, crossing multiple functions and business units. An incident is officially declared if the breadth of the event crosses multiple functions or units in the business and disrupts a significant portion of the end users.
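The declaration rule in the preceding paragraph combines two conditions, which can be sketched as follows. The function name and the 25 percent cutoff for a “significant portion” of end users are hypothetical; the text does not quantify that term.

```python
def is_declared_incident(affected_units, affected_users, total_users,
                         significant_fraction=0.25):
    """Return True when an event meets both declaration conditions.

    significant_fraction is an assumed cutoff for what counts as a
    "significant portion" of end users.
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    crosses_units = len(set(affected_units)) > 1
    significant = affected_users / total_users >= significant_fraction
    return crosses_units and significant
```

For a business of roughly fifty users, an event affecting twenty users across two units would be declared under this assumed cutoff, while an event confined to a single unit would not, however many of its users are affected.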
The organization will be led by an incident manager, supported by one key IT resource from each function. The incident manager will hold this role as a full-time position and report directly to the Chief Information Officer (CIO), and will have one full-time employee serving as Deputy Incident Manager. The incident manager will be responsible for the incident management process as well as the key performance indicators associated with keeping the business operational. The deputy incident manager will assist the incident manager with key tasks while also maintaining communication with the key points of contact within each business unit.
Communication will be established through teleconference lines that will open during incidents. There will also be web-based forums to share data and open another line of communication.
The incident management team will identify issues, address them, and provide a solution that returns the business to operational status. These services will be coupled with finding the root cause of each issue and applying an appropriate framework for addressing it.
References
Cappelli, P. (2012). How to get a job? Beat the machines. Time: Business & Money. Retrieved from http://business.time.com/2012/06/11/how-to-get-a-job-beat-the-machines/
Carnegie Mellon University Software Engineering Institute. (2011). CMMI for development, version 1.3. Retrieved from http://www.sei.cmu.edu/library/abstracts/reports/10tr033.cfm
Calder, A. (2009). Implementing information security based on ISO 27001/ISO 27002 (Best Practice). Van Haren Publishing.
Calder, A. (2008). ISO27001/ISO27002: A pocket guide. IT Governance Publishing.
Chrissis, M., Konrad, M., & Shrum, S. (2011). CMMI: Guidelines for process integration and product improvement. Addison-Wesley Professional.
Office of Government Commerce. (2007). Service design. The Stationery Office.
Start with security policies. (n.d.). Retrieved 8/25/2012 from http://www.altiusit.com/files/blog/StartWithSecurityPolicies.htm
Johnson, R., & Merkow, M. (2011). Security policies and implementation issues (1st ed.). Jones & Bartlett Learning, LLC.
Project Management Institute. (2008). A guide to the project management body of knowledge (4th ed.). Newtown Square: Project Management Institute.
Zimmie, K. (2004). Secure and mature: Combining CMMI SCAMPI with an ISO/IEC 21827 (SSE-CMM) appraisal. Retrieved from http://www.sei.cmu.edu/library/assets/zimmie-secure.pdf