In Step 1, your department determined what IT assets are critical to the functioning of your department. In Step 2, you analyzed risks to those assets, and determined how to mitigate those risks or accept them where mitigation was infeasible or unaffordable. Now in Step 3, you will identify short- and long-term plans for continuing to provide your mission-critical functions in the event that the mitigation responses from Step 2 prove insufficient or if an unmitigated risk becomes a reality.
What is the impact of your department being down for hours or days? Do you have a way to restore your systems if they are destroyed? Do you have a manual way of performing critical functions in the meantime?
The current state IT security standard describes continuity planning in general:
The purpose of…continuity planning is to provide for the continuation of critical… functions in the event of disruptions…[,] minimize the effect of disruptions [and] to ensure that sensitive information is not compromised….
It also describes some of the common features of such planning:
Arrangements, procedures, and responsibilities, including data backup, off-site storage and contingency safeguards, that ensure critical operations can be continued and that sensitive information can be protected….
Recovery procedures and responsibilities to facilitate the rapid restoration of normal operations at the primary site, or if necessary, at a new facility….
Interim manual processes to enable the continuance of critical operations in the absence of data processing support.
In the event of a true disaster, entailing widespread damage to buildings and people, the University would activate its Critical Incident Management Plan (CIMP). However, departments are expected to plan for and coordinate recovery when problems are localized (what the CIMP refers to as a Level 1 incident). The CIMP requires critical incident planning at the departmental level, the IT component of which is included in this process.
The point of disaster recovery is to have your critical functions back up and running as quickly as possible. Interim manual procedures need to be prepared for highly critical processes that need to be performed before full recovery may be possible. Create (or update) a response plan for your department to use in the event that critical IT assets are lost, unavailable, corrupted or disclosed. Below are a series of questions to help you prepare and test this plan. (A copy of this template, as well as all the other templates required to complete your department’s report on the ITS-RM process, is available in Word format here and Adobe PDF format here.)
Note: The costs associated with mission continuity preparedness can be significant, and they increase dramatically the more rapid the recovery that is required. Such efforts do benefit from economies of scale, however, allowing larger organizations to put measures in place that would be cost-prohibitive for smaller ones. Having ITC or HS/CS host services or servers for your department can pay for itself when continuity preparedness costs are factored in, even in cases where the financial case is marginal based simply on day-to-day operational costs.
Unit Name: ___________________ Sub-Unit Name: ___________________ |
|
Mission Continuity Questions
The development of a plan for restoration of resources identified in the mission impact analysis and for interim manual processes for continuing critical mission functions during the restoration process. |
|
Documentation Location and/or Decision |
|
A. Interim Manual Process Components (aka Downtime Procedures) |
|
1. Does the department know how long it could function without department computers, servers, or network access? |
|
2. For each mission-critical departmental function, what is the maximum time the department can wait on recovery efforts before proceeding with manual alternatives? Note: Some functions may vary in criticality depending on the time of the year. Example: Class registration procedures may have a long recovery window some weeks, but a very short window in other weeks. |
|
3. How does the department proceed manually with mission-critical functions if critical IT assets are lost, unavailable, corrupted, etc.? How long can this be maintained? Repeat for each identified function. |
|
4. In the event of partial damage or disruption, are the department computers standardized so that users could work from another department or University computer without difficulty? |
|
B. Disaster Recovery Components |
|
1. Who are the members of your designated recovery team? Include name, title, responsibility, e-mail address and telephone number(s) of each member. |
|
2. Do you have the necessary University and departmental personnel contact lists?
See CIMP for official University notification procedures. (Those in the Health System should route notification through HS/CS.) All contacts with the public regarding the incident should be routed through University Relations (Media Relations in the Health System). |
|
3. Do you have hardware diagrams and system configurations, including physical and data security issues? |
|
4. Do you have infrastructure information about your facilities (requirements for power, cooling, network cabling, etc.)? |
|
5. Are installations and changes to those critical physical configurations governed by a formal change management process? (This will vary from simple chronological logging of changes to assist in troubleshooting or back-out, to a multilevel review involving significant testing for more complex and highly critical systems.) |
|
6. Do you have the necessary hardware and software vendor contact lists? |
|
7. Do you have a current inventory of your hardware, software and critical data files? Is it updated in real time? |
|
8. Does the department securely escrow passwords for accounts that may need to be accessed in the absence of their normal administrator or in an emergency situation? |
|
9. Do you have a plan for emergency procurement? |
|
10. Do you have recovery plans for each service to be restored (specific, complete, up-to-date)? Do they include a list identifying all system, application and data file systems that must be recovered for each system? |
|
11. Are all important data backed up, with secured off-site rotation? (Off-site rotation involves periodically and systematically moving backup media to a physically and environmentally secure facility at a significant distance from the asset being backed up.) |
|
12. Is system and recovery information stored off-site in a secured location? |
|
13. Do you test your plan annually? When was the last test? |
|
14. Do you update your plan after each test, or when there is a significant technology change? |
|
15. What training do you have for staff involved with the plan, including communicating and testing the plan? |
|
16. Do departmental personnel know what to do and whom to contact within the department and/or University if a computer security incident or a disaster should occur? |
|
17. Are recovery and continuing operations instructions written in simple, clear, complete sets of steps that upset, fatigued people could follow correctly? |
|
18. Do you have faculty or staff (e.g., researchers) who have critical data (e.g., on which valuable grants depend or which contain legally protected data) but provide their own computing support outside departmental resources? How do you ensure they are included in your plans or adopt plans of their own? |
|
Prepared by: Administrative contact Name: __________________________ |
Prepared by: Technical contact Name: __________________________ |
|
Name: _________________________ Signature: ______________________ |
|
Below are simple checklists outlining the key steps in disaster recovery and interim manual procedures. Any plan you develop will need to address these issues.
Disaster Recovery Plan Checklist
- Assess damage
- Notify all appropriate University personnel
- Assemble recovery teams
- Provide infrastructure (space, power, cooling, network, etc.)
- Secure needed hardware and supplies
- Return backup information from off-site storage (backup tapes, documentation)
- Install operating systems on restored servers
- Restore applications and institutional data
- Thoroughly test before going on-line
Interim Manual Procedures Checklist
- Identify the procedure
- Identify those with the knowledge, skill and ability to complete the procedure manually
- Determine how long the process can be interrupted before proceeding manually
- Develop detailed documentation on how the procedure will be performed
- Determine how data is reintegrated once the IT-based system is restored
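The step "determine how long the process can be interrupted before proceeding manually" can be made concrete with a small lookup. The following is a minimal sketch, not part of the ITS-RM templates: the class name, the function name, the dates and the time windows are all hypothetical, chosen only to mirror the class registration example in question A.2, where the tolerable outage is much shorter in some weeks than in others.

```python
from dataclasses import dataclass, field
from datetime import date, datetime, timedelta


@dataclass
class CriticalFunction:
    """A mission-critical function and how long it can wait on recovery."""
    name: str
    default_window: timedelta
    # Hypothetical peak periods: (first day, last day, shorter window).
    peak_windows: list[tuple[date, date, timedelta]] = field(default_factory=list)

    def window_on(self, day: date) -> timedelta:
        """Return the maximum tolerable outage for an outage starting on `day`."""
        for first, last, window in self.peak_windows:
            if first <= day <= last:
                return window
        return self.default_window


def should_go_manual(func: CriticalFunction, outage_start: datetime, now: datetime) -> bool:
    """True once the outage has lasted at least as long as the function can wait."""
    return now - outage_start >= func.window_on(outage_start.date())


# Illustrative values only: five days normally, four hours during registration.
registration = CriticalFunction(
    name="class registration",
    default_window=timedelta(days=5),
    peak_windows=[(date(2024, 8, 19), date(2024, 8, 30), timedelta(hours=4))],
)
```

A department would replace the sample values with the windows it recorded in its own answers; the point of the sketch is only that the decision to switch to downtime procedures depends on both the function and the date.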
Based on your answers to the Mission Continuity Questions and the steps outlined in the checklists, create (or update) your IT Mission Continuity Plan using the template below. (A copy of this template, as well as all the other templates required to complete your department’s report on the ITS-RM process, is available in Word format here and Adobe PDF format here.) This template was borrowed and adapted from a model created by HS/CS. The template is intentionally thorough to allow its use in complex situations, so some sections may not be applicable for simple and lower priority items. For example, many departments will not have items deemed critical enough to require interim manual procedures, assuming recovery can be completed within a few days. For an example, an executive summary of ITC’s disaster recovery plan is available at <http://www.itc.virginia.edu/security/disaster.phtml>.
Your department may also take advantage of any general disaster recovery or mission continuity plans you have in place, inserting or integrating IT assets and strategies as appropriate. A copy of your department’s general disaster recovery plan should be on file with the U.Va. Police Department, and your IT Mission Continuity Plan should be included in that filing.
Unit Name: ________________ Sub-Unit Name: _______________ |
|
IT Mission Continuity Plan Template
Based on your answers to the Mission Continuity Questions, replace the plain text below with the appropriate information. |
|
A. Mission Continuity Requirements
1. Mission Continuity Plan Overview
An overview of the departmental plan, identifying the systems it includes and the mission impact of their unavailability.
2. Scope of the Mission Continuity Plan
What the plan covers and does NOT cover.
3. Mission Continuity Plan Assumptions
Any assumptions implicit in the plan (e.g., nature of the service interruption; availability of staff; what backups are available…). This section should identify existing downtime procedures and include the time tolerance during which they may be used by departmental personnel.
4. Interfaces
List of any inbound or outbound interfaces to other systems required for the departmental application’s operation.
5. Escalation Plan
Steps taken to evaluate an outage, declare a disaster, and notify departmental and senior management of the event and the decision to invoke this plan.
6. Decision Timeframes for Plans
The timeframe in which an event is assessed for mission impact; if a disaster is declared, the timeframe in which staff must respond; the timeframe for notifying senior management.
7. Interim Manual Procedures (aka Downtime Procedures)
References to existing documented procedures to be used during a system outage.
B. Team Structure, Contacts, and Call Lists
1. Team Structure and Tasks
A description of the major activities that must be completed as part of the plan and the departmental teams that must be assembled for their completion; these teams may include people and vendors outside the department and the University.
2. Emergency Notification Plan/Call Lists
Lists of documentation required by the teams to accomplish the plan, including their physical location as both electronic and paper documents; contact information for all team members, including office, home, and pager telephone numbers.
3. Vendor Contact List
Contact information (names, phone, email, US Postal Service, web sites, etc.) for each vendor that may require contact during a mission continuity event. Include in an appendix a description of all software and hardware products with version and, if applicable, server/CPU serial information.
4. Assembly & Command Centers
Designation and description of locations to which staff should report in the event of a disaster or a required evacuation of a building housing departmental equipment subject to recovery; alternate sites should be included; these will be focal points for mission continuity activities when a disaster is declared.
5. Recovery Site(s)
Detailed information describing any alternate sites at which computer equipment will be located for recovery purposes; if these locations are provided by an organization outside the department (HS/CS, ITC or a Hot or Cold site vendor), notification procedures should be included.
C. Backup Procedures
1. Backup Procedures
Detailed description of tools/products used to regularly back up departmental software and data; location of any off-site tape libraries or tape storage; backup schedules; reference to any backup tasks performed by HS/CS, ITC or other entity on behalf of the department.
2. OS/Application Backup/Recovery Procedures
Step-by-step actions to be taken to recover operating system, application software, and departmental system data using the tools/products outlined in the previous section; this should contain enough detail so that a knowledgeable person unfamiliar with the daily backups could complete the recovery.
3. Hardware/System Software Plan Overview
Describes the computer hardware and operating system software necessary to restore a departmental system in the event of a disaster; includes procedures and controls to assure efficient and timely restoration at an alternate site; appendices may be used to list existing hardware and software and to detail what is available or required at an alternate site.
4. Operating Systems/Other Software
Technical references to required OS and application software that will be restored; these should include both electronic and paper copy references as well as material available at vendor web sites.
5. Data Communications Plan
Detailed requirements for alternative network connections that must be established in the event of a disaster; if common carrier connections are required, these should be detailed and contracted for in advance; departments should work with the HS/CS or ITC network team to detail and diagram any alternative network connections required.
D. Recovery Procedures
1. Hardware/Software Recovery Overview
An overview of the general steps to be taken to restore a departmental application’s operation; in general, this would include hardware configuration, OS restoral and initialization, application restoral, data restoral, and application operability.
2. System Recovery Procedures
Step-by-step actions to be taken to recover the hardware and operating system; this should contain enough detail so that a person with only general knowledge of the OS could complete the recovery.
3. System Initialization Procedures
Step-by-step actions to be taken to initialize the operating system; this should contain enough detail so that a person with only general knowledge of the OS could complete the initialization.
4. Storage Restore List
A list (or references to auxiliary documentation) identifying all system, application and data file systems that must be recovered for each system included in the plan.
5. Applications Recovery
Step-by-step actions to be taken to restore the departmental application; this should contain enough detail so that a person with only general knowledge of the application could complete the restoral.
E. Implementation Plan
1. Types of Recovery Tasks
Definitions of task types to be accomplished by the recovery teams; examples are recovery (hardware, OS, application) and support (security, transportation, procurement, etc.).
2. Recovery Team Tasks
A detailed listing of all recovery tasks needed to fully restore the departmental application to operability on an alternate (or redundant) computer platform. Each task should include an estimated start time after a disaster occurs; estimated time to complete the task; identification of the team responsible for the task; predecessor tasks that must be completed before each task is started; and a description of the task. Step-by-step instructions for completing each task are contained in previous sections of the plan.
F. Mission Continuity Plan Testing
1. Mission Continuity Plan Test Objective
Departmental disaster plans should be periodically tested. This section defines testing objectives and frequency.
2. Plan Test Requirements and Methodology
Testing may be accomplished in many ways (paper walk-throughs, scheduled tests, unannounced tests, tactical exercise, etc.). This section defines the plan testing requirements determined to meet the department’s needs to ensure plan success.
G. Mission Continuity Plan Maintenance
1. Plan Maintenance Objectives
Any disaster plan must be maintained. This section specifies departmental objectives for keeping the plan current and maintaining staff awareness of it.
2. Mission Continuity Plan Maintenance
Maintenance of the plan will be required on a scheduled basis (periodic reviews to detect the need for plan changes) and on an unscheduled basis (due to events such as an OS upgrade, an application upgrade, a network change, etc.). Periodic reviews should include verifying that recovery hardware capacity is sufficient to meet increasing application transaction processing volume.
3. Interdepartmental Relationships
Any required relationships with other departments necessary for the successful completion of a mission continuity plan should be included here. Examples include HS/CS or ITC, Procurement (Material Support Services in the Health System), Legal, and University Relations (Media Relations in the Health System).
4. Mission Impact Analysis (MIA)
Departments should periodically perform a Mission Impact Analysis of the effect of a departmental application failure on their operation. This section should contain a summary of the most recent MIA the department has conducted.
H. Relocation Plan
1. Returning to Normal Operations
Factors affecting a return to normal operations should be included here if temporary relocation to a Hot/Cold Site is part of the recovery plan.
I. Appendices
1. Appendix A: Call Lists/Contact Information
2. Appendix B: Equipment Inventory
3. Appendix C: Software Inventory
4. Appendix D: Network Diagrams
5. Appendix E: Mission Continuity Contracts |
|
|
Name: ______________________ |
Approved by: Unit head Name: ______________________ |
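The Recovery Team Tasks item in section E of the template asks for an estimated start time, an estimated duration and predecessor tasks for each recovery task. As a rough illustration of how those three fields fit together, the sketch below derives a valid task order and the earliest possible start time for each task from the predecessor list. The task names loosely follow the Disaster Recovery Plan Checklist; the durations, the `TASKS` table and the `schedule` function are hypothetical and are not taken from any University plan. It uses `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: name -> (estimated hours to complete, predecessor tasks).
TASKS = {
    "assess damage": (2, []),
    "provide infrastructure": (8, ["assess damage"]),
    "secure hardware": (24, ["assess damage"]),
    "retrieve off-site backups": (4, ["assess damage"]),
    "install operating systems": (6, ["provide infrastructure", "secure hardware"]),
    "restore applications and data": (10, ["install operating systems", "retrieve off-site backups"]),
    "test before going on-line": (4, ["restore applications and data"]),
}


def schedule(tasks):
    """Return (order, start): a valid task order and earliest start hours.

    A task can start only when all of its predecessors have finished, so its
    earliest start is the latest finish time among those predecessors.
    """
    graph = {name: set(preds) for name, (_, preds) in tasks.items()}
    # static_order() raises graphlib.CycleError if the predecessors are circular.
    order = list(TopologicalSorter(graph).static_order())
    start = {}
    for name in order:
        _, preds = tasks[name]
        start[name] = max((start[p] + tasks[p][0] for p in preds), default=0)
    return order, start


order, start = schedule(TASKS)
for name in order:
    print(f"{start[name]:>3} h  {name}")
```

With these sample figures the longest chain runs through hardware procurement, which is one reason the questionnaire asks about emergency procurement and vendor contact lists; a real plan would substitute its own tasks and estimates.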
