EZ Messenger maintains and tests a Business Continuity and Disaster Recovery Plan, the details of which are included here.
Business Continuity and Disaster Recovery Plan
EXECUTIVE PROCESS, LLC
Executive Process LLC dba EZ MESSENGER
9210 S. Western Ave., Suite A3
Oklahoma City, OK 73139
The overall objective of the Business Continuity/Disaster Recovery (BC/DR) Plan is to provide reassurance that in any situation and/or disaster, EZ Messenger will perform the necessary actions outlined in this document. To aid in the understanding of the BC/DR plan, consider the following terms:
Table 1: Terms
Daily the CLIENT will be providing data via web services, SFTP or physical documents (data entry) to EZ Messenger’s data warehouse hosted at LightEdge, a colocation provider. Also daily, the CLIENT field offices will be providing digital documents via EZ Messenger’s customer web portal, SFTP or via physical documents. The EZ Messenger web portal is also hosted on hardware at LightEdge.
Daily, EZ Messenger will be providing data and documents from its data warehouse back to the CLIENT via web services, SFTP or the customer portal; these will also originate from the LightEdge facility.
At the primary facility EZ Messenger staff will be reviewing the data and documents received and distributing those service requests electronically via the F.A.S.T. application to EZ Messenger’s other locations for service of process to be effected. Through this methodology the risk profile for various disasters is greatly reduced because no single location is carrying a substantial part of the workload.
For information regarding data retention and destruction, refer to EZ Messenger Data Destruction and Retention Plan.
In the event of a disaster at the Primary Facility, EZ Messenger will shift the electronic handling of inbound service requests to our Disaster Recovery Facility. The nature of the F.A.S.T. application and EZ Messenger procedures will allow the seamless transition of the production aspects of the batch import/web job queues. EZ Messenger’s cross training and similarity of procedures will allow a multitude of staff to assist with the Primary Facility downtime.
In the event of a disaster at the Primary Data Center, EZ Messenger will shift the F.A.S.T. application to the Amazon AWS Cloud Service. EZ Messenger maintains an Amazon Cloud Services account with an offline version of its application for this purpose. When such a disaster occurs, the Amazon AWS Instance will be enabled, DNS records updated, and the most recent data backup restored. Once complete, any EZ Messenger staff not impacted by the disaster will have immediate access to F.A.S.T. Some programming would also be required on the CLIENT side to update the web services and/or SFTP paths in use. These updates could take as long as several days and could interrupt the ability to send and receive data via web services or SFTP. However, these delays would not impact service requests already placed with EZ Messenger, and the EZ Messenger customer web portal could be used in an emergency to send new service requests via the Send Us Work Wizard.
The focus of this document is to provide a plan to respond to situations that adversely affect the continuity of critical business operations. Two (2) categories of situations are covered:
- ‘SOFT’ Disasters and
- ‘HARD’ Disasters.
A ‘SOFT’ disaster is a situation in which the critical business functions of the CLIENT contract are affected but can be restored in less than one (1) Business Day.
A ‘HARD’ disaster is a situation in which one (1) or more critical business functions are interrupted and the return to normal business operations is anticipated to take one (1) Business Day or more.
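The two categories above amount to a threshold check on the estimated restoration time. The following is a minimal sketch for illustration, not part of the Plan itself; the function name `classify_disaster` and the choice to express the estimate as a decimal number of Business Days are assumptions.

```python
def classify_disaster(estimated_business_days: float) -> str:
    """Classify a disaster from the estimated time to restore operations.

    Per the definitions above: restoration in less than one Business Day
    is 'SOFT'; one Business Day or more is 'HARD'.
    The parameter name and its unit (fractional Business Days) are
    assumptions made for this sketch.
    """
    if estimated_business_days < 0:
        raise ValueError("estimated restoration time cannot be negative")
    return "SOFT" if estimated_business_days < 1 else "HARD"
```

For example, `classify_disaster(0.5)` returns `"SOFT"` and `classify_disaster(1)` returns `"HARD"`.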
The following Mission Essential Functions will be addressed:
1) Web Services - web services are required to call the CLIENT application and retrieve service requests. Additionally, the web services are required to send data back to the CLIENT application. Alternatively, SFTP may be utilized and associated CLIENT and EZM services may be reading and writing to an EZ Messenger hosted SFTP with paths that may require updating.
2) Customer Portal - the EZ Messenger customer portal is utilized for the CLIENT offices to upload service documents as well as check on service status.
3) F.A.S.T. Application - the EZ Messenger F.A.S.T. application is critical for use by the EZ Messenger staff and process servers in the field in order to proceed with the service requests.
4) Production Staff - maintaining capacity of the production staff is critical to processing the inbound service requests in order to transfer the requests into the field with a process server.
5) Service Staff - service team staff must assist with assigning and managing the inventory with the process servers.
6) Courts Staff - courts team members are required to facilitate the electronic filing of the affidavits of service with the courts.
7) Production Hardware – hardware including computers, scanners and printers must be available for all staff to function.
1 BC/DR Plan Objectives
The following contact list will be utilized by employees or vendors to notify the respective EZ Messenger staff that a potential disaster has occurred that will impact the ability of either the Primary Facility or the Primary Data Center.
Immediately following the disaster, EZ Messenger planned events are as follows:
1. Key personnel and recovery teams are grouped to implement the plan.
2. Staff and Corporate Personnel will be informed of the situation and provided updates.
3. Employees at the Primary Facility in Oklahoma City, OK will be alerted of the situation and provided updates.
4. Employees at the Disaster Recovery Facility will be alerted of the situation and informed of their designation as the active recovery facility.
EZ Messenger will ensure that recovery workers are provided with resources to meet their physical and emotional needs. This plan calls for the appointment of a person in the Executive Team whose job will be to secure these resources so workers can concentrate on the tasks at hand.
A Safety Evacuation Plan is being drafted for each of the EZ Messenger sites to address unique floorplan and evacuation options. This plan will be updated on or before June 15, 2020 with said plans.
Initial efforts are targeted at assessing the wellbeing of the personnel at the site, then damage of the site, and then gauging the disaster as either Soft or Hard in nature. Each of the 7 Mission Essential Functions must be evaluated.
After completing the assessment of the Primary Facility and/or Primary Data Center, management personnel will determine whether the Primary Facility, Primary Data Center or both are jeopardized and gauge the disaster as soft or hard. The soft or hard designation may not be the same for the Primary Facility as it is for the Primary Data Center.
Assessments will include review of all resources such as power, water, communications, and staffing. In the event of a disaster that physically damages the Primary Facility or otherwise makes it uninhabitable, EZ Messenger will begin work quickly to repair, rebuild, or replace the Primary Facility. If the Primary Data Center is unusable, EZ Messenger will begin immediately sourcing and vetting a new location for the Primary Data Center.
Depending upon the nature of the disaster, EZ Messenger may choose to invoke an alternate facility or alternate data center other than the named alternates provided in the chart above. EZ Messenger has the ability to accommodate Remote Staff in multiple states, and depending on the nature of the disaster it may deem a temporary switch to Remote Staff to be the most suitable solution at that point in time.
The recovery process may rely heavily upon vendors to quickly provide replacements for the resources that cannot be salvaged. EZ Messenger will mitigate this risk by maintaining redundant equipment which can be brought in as needed. From time to time EZ Messenger will enact mock recoveries by having Remote Staff or staff at the Disaster Recovery Facility perform portions of the daily responsibilities of the Primary Facility as it relates to the CLIENT contract.
Contractually stipulated thresholds for periods of unavailability will dictate whether operations will be resumed at the Primary Site (‘SOFT’ disaster) or at the DR Site (‘HARD’ disaster).
- In the event of a ‘SOFT’ disaster, operations will be resumed at the Primary Facility using the equipment and services remaining after the disaster event.
- In the event of a ‘HARD’ disaster, operations will be transferred to the Disaster Recovery Facility. In the event of a ‘HARD’ disaster at the Primary Data Center, systems will be restored in the Amazon AWS Cloud while a new Primary Data Center is sourced and vetted.
In both circumstances, personnel will follow the instructions contained in this Plan. Since all plans of this type are subject to the inherent changes that occur in the computer industry, it may become necessary to update this plan to accommodate industry changes as well as a change in EZ Messenger’s facility map.
For more details, see Appendix A – Disaster Recovery Plan for Resumption of Operations.
Disaster at the Primary Facility will not require any restoration of data or programs as it relates to the CLIENT contract. Because EZ Messenger has the ability to immediately accommodate Remote Staff with redundant equipment, there will be no downtime due to a failure at the Primary Facility.
Disaster at the Primary Data Center will require the restoring of both the F.A.S.T. application for daily operations of existing service requests as well as restoration of web services that retrieve and place data with the CLIENT. The Primary Data Center is well fortified against ‘soft’ disasters with multiple power supplies, backup generators, multiple internet supplies and fire suppression. If a ‘soft’ disaster is encountered, temporary suspension of operations may be experienced while systems are restored onsite. With a ‘hard’ disaster at the Primary Data Center, EZ Messenger will restore a database backup to its Amazon AWS Cloud system, update DNS records and enable services to allow operations to recover. CLIENT web service access will be impacted in the event of a ‘soft’ or ‘hard’ disaster at the Primary Data Center.
Since some time may have elapsed between the time that the off-site backups were replicated and the time of the disaster, a Database Engineer will compare the data within the data warehouse to determine the last effective backup. This date and time will be reported to the CLIENT Contract Manager via the EZ Messenger Project Manager or Contract Manager. The CLIENT and EZ Messenger will work out details to identify the missing data and determine if it will be possible and feasible to replace the missing data from the last effective backup to the time of the disaster.
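The span reported to the CLIENT Contract Manager is the difference between the time of the disaster and the time of the last effective backup. The following is a minimal sketch of that arithmetic; the function name `backup_gap` and the timestamps are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def backup_gap(last_effective_backup: datetime, disaster_time: datetime) -> timedelta:
    """Return the span of time not covered by the last effective backup.

    Both timestamps must be expressed in the same time zone.
    """
    if last_effective_backup > disaster_time:
        raise ValueError("backup timestamp is later than the disaster")
    return disaster_time - last_effective_backup


# Hypothetical timestamps, for illustration only.
gap = backup_gap(datetime(2020, 3, 1, 2, 0), datetime(2020, 3, 1, 14, 30))
print(f"Data may be missing for the preceding {gap}")
```

Any data received during this span is what the CLIENT and EZ Messenger would need to identify and, where feasible, replace.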
Certain events may require the restoration of network connectivity or, in the event of a ‘HARD’ disaster that affects the entire operation or critical business component, all or part of the network will be compromised. EZ Messenger has deployed redundant domain controllers to each of its facilities which will allow each facility to operate independently of one another in the event of a disaster at the facility level. The loss of a single facility has limited impact to the local process serving and field operations. Restoring connectivity to a local facility can be done by restoring internet and delivering a router/firewall device to the location with limited resources and time.
Disaster at the Primary Data Center will cause temporary disruption to the F.A.S.T. application as well as EZ Messenger phone hardware. Phone lines will auto forward in the event of failure to key employee cell phones. Facilities will remain connected to one another in the event of Primary Data Center disaster via the mesh VPN network in place between local domain controllers. Email will be disrupted in the event of a Primary Data Center disaster and will need to be restored.
If the recovery process has taken place at the Disaster Recovery Facility after a ‘Hard’ disaster at the Primary Facility, EZ Messenger will first confirm that all systems, employees and other resources are fully functional at the restored Primary Facility. By allowing the restored Primary Facility to begin working virtually on a variety of work within the F.A.S.T. application, EZ Messenger can determine that all tools, such as scanners, phones, printers, copiers, faxes and all forms of hardware and software are operational. At such time EZ Messenger will reinstate the Primary Facility as the lead office with regards to the CLIENT Contract.
If recovery to the Disaster Recovery Data Center has occurred after a ‘Hard’ disaster, EZ Messenger will source and vet a replacement colocation facility, acquire necessary hardware, install and configure hardware and perform testing of all F.A.S.T. application elements as well as network connectivity, email and other tools. Once all systems are fully vetted, EZ Messenger will migrate data and systems from the Amazon AWS Cloud to the newly defined Primary Data Center. Once the original Primary Data Center is fully restored and recovered, the process will work as if it were an entirely new location.
To facilitate recovery from a ‘HARD’ or ‘SOFT’ disaster, advance preparations have been made to provide quick and orderly restoration of the services.
The following sections highlight the key components of our Disaster Recovery strategy:
- DR Site Arrangements;
- Replacement Equipment;
- Data Protection Strategy;
- Notification List.
2.1 DR Site Arrangements
In the event of a ‘HARD’ disaster (an event which adversely affects the availability of Service for a period estimated to exceed twenty-four (24) hours), EZ Messenger will immediately begin preparing disaster recovery sites. If a ‘Soft’ disaster occurs but no firm approximation of restoration is available within four hours of the disaster occurring, EZ Messenger will also begin preparing the disaster recovery site.
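The two triggers in the paragraph above can be written as a single decision. This is a minimal sketch under stated assumptions: the function name, parameter names, and the use of `None` to represent "no firm estimate available" are all hypothetical; only the 24-hour and four-hour thresholds come from the Plan.

```python
from typing import Optional

HARD_THRESHOLD_HOURS = 24         # 'HARD' if the outage is estimated to exceed this
SOFT_ESTIMATE_DEADLINE_HOURS = 4  # no firm estimate by this point also triggers preparation


def should_prepare_dr_site(estimated_outage_hours: Optional[float],
                           hours_since_disaster: float) -> bool:
    """Return True when preparation of the disaster recovery site should begin.

    estimated_outage_hours is None when no firm restoration estimate exists.
    """
    if estimated_outage_hours is not None:
        # A firm estimate exists: prepare only if it exceeds the 'HARD' threshold.
        return estimated_outage_hours > HARD_THRESHOLD_HOURS
    # No firm estimate: prepare once four hours have passed without one.
    return hours_since_disaster >= SOFT_ESTIMATE_DEADLINE_HOURS
```

For example, an outage with no estimate two hours after the disaster does not yet trigger preparation, while the same outage at four hours does.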
For the Disaster Recovery Facility, limited arrangements are required. First, the Department/Office Manager will notify all employees within the Disaster Recovery Facility that they will be acting as the Disaster Recovery Facility and provide them with an ETA of the recovery point. Second, procedures will be distributed and reviewed with employees at the facility to ensure they are up to date on the specifics of the CLIENT Contract. Third, adjustments in the F.A.S.T. system will be made, if necessary, to give those employees access to the CLIENT data. Next, applicable phone numbers and email addresses will be forwarded to those at the Disaster Recovery Facility. Then, the Disaster Recovery Facility will engage in performing the necessary production work for the CLIENT Contract. Finally, EZ Messenger will make sure the CLIENT Contract Manager is aware of the recovery facility and is also aware of implications in the field with process servers, in the disaster area, and locally.
In the event of Disaster Recovery at the Data Center, several arrangements will be made. First, the Sr. Systems Analyst will notify all IT Staff and Vendors that the restoration process is beginning. Second, the network team will start the Amazon AWS Instance and allocate the necessary resources. Third, the network team will begin transferring the last backup to the Disaster Recovery Data Center. Fourth, the database engineer will review the backup restoration and confirm the data availability and the date and time of the backup that was restored. The Sr. Systems Analyst will report the span between the backup and the disaster to the EZ Messenger Contract Manager and Project Manager. Next, the network team will update all DNS records to the new IP addresses of the Disaster Recovery Data Center, thus allowing functional facilities to resume access to the F.A.S.T. application. Following F.A.S.T. restoration, the network team will work with the CLIENT to restore web services at the Disaster Recovery Data Center. Operations on existing service requests will resume as soon as the F.A.S.T. application is restored, though web services restoration may be delayed.
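The order of the Data Center arrangements matters, since each step depends on the one before it. One way to keep that order explicit is an ordered checklist, sketched below. This is an illustration only: the `run_checklist` helper and its `confirm` callback are hypothetical, and the step descriptions are paraphrased from the paragraph above.

```python
from typing import Callable, List, Tuple

# (owner, description) pairs in the order given in the paragraph above.
DATA_CENTER_RECOVERY_STEPS: List[Tuple[str, str]] = [
    ("Sr. Systems Analyst", "Notify IT Staff and Vendors that restoration is beginning"),
    ("Network Team", "Start the Amazon AWS Instance and necessary resources"),
    ("Network Team", "Transfer the last backup to the Disaster Recovery Data Center"),
    ("Database Engineer", "Confirm data availability and the date/time of the restored backup"),
    ("Sr. Systems Analyst", "Report the backup-to-disaster span to the Contract and Project Managers"),
    ("Network Team", "Update DNS records to the Disaster Recovery Data Center IP addresses"),
    ("Network Team", "Work with the CLIENT to restore web services"),
]


def run_checklist(steps: List[Tuple[str, str]],
                  confirm: Callable[[str, str], bool]) -> List[str]:
    """Walk the steps in order and stop at the first one that is not confirmed.

    `confirm` is a hypothetical callback (for example, a prompt to the step
    owner) that returns True once the step is complete.
    Returns the descriptions of the completed steps.
    """
    completed: List[str] = []
    for owner, description in steps:
        if not confirm(owner, description):
            break
        completed.append(description)
    return completed
```

Stopping at the first unconfirmed step reflects the dependency between steps: DNS records, for instance, should not be updated before the restored backup has been verified.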
The inevitable changes that occur in the systems over time require that the plan be periodically updated to reflect the most current configuration. Where possible, EZ Messenger has entered into agreements with vendors to supply replacements on an emergency basis.
To avoid problems and delays in the recovery, EZ Messenger has on hand critical backup equipment such as routers, firewalls, server hardware and workstations. Additionally, EZ Messenger maintains an Amazon AWS Cloud Instance to accommodate the F.A.S.T. application and its associated packages.
The EZ Messenger backup process ensures that:
- All the critical data necessary to support the CLIENT Service of Process Project is backed up daily.
- The backup data is stored on encrypted drives at Stream IT’s location in Austin, TX.
- The daily backups will be retained for a minimum of 10 days.
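The 10-day minimum retention can be enforced by only ever pruning backups that fall outside the retention window. The following is a minimal sketch of that check; the function name `prunable_backups` and the representation of backups as calendar dates are assumptions, and the actual backup tooling may track retention differently.

```python
from datetime import date, timedelta
from typing import Iterable, List

MIN_RETENTION_DAYS = 10  # from the Plan: daily backups are retained at least 10 days


def prunable_backups(backup_dates: Iterable[date], today: date) -> List[date]:
    """Return backup dates older than the minimum retention window.

    Backups taken within the last MIN_RETENTION_DAYS days are never returned,
    so a caller that deletes only what this function returns cannot violate
    the minimum retention period.
    """
    cutoff = today - timedelta(days=MIN_RETENTION_DAYS)
    return sorted(d for d in backup_dates if d < cutoff)
```

A backup taken exactly 10 days ago is kept; the sketch errs on the side of retaining data at the boundary.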
The EZ Messenger Facilities do not retain any contract related data at the local level. All data is retained within the F.A.S.T. application at the Primary Data Center along with all emails, network directories and other file storage. Each EZ Messenger facility follows strict guidelines for not storing documents or data locally unless specifically required to complete Service of Process. The Primary Data Center follows the following schedule:
Table 2: EZ Messenger Standard Backup Procedures
Table 3: Texas/CLIENT Backup Procedures
EZ Messenger will use Backup For Workgroups for Windows. Linux systems are backed up via internally built cron scripts. Backups are stored both on-site and off-site. Off-site backups are encrypted with 256-bit AES.
A complete set of automated restoration procedures and accompanying documentation will be created. A hard/soft-copy version of each will be kept at the DR Site. Additionally, the CLIENT Contact will have an up-to-date set to use for restoring systems when necessary.
The disaster notification list for the State of Texas/CLIENT Project is shown below. These people are to be notified as soon as possible when disaster threatens or occurs.
Table 4: Disaster Recovery Notification List
1. Temporary loss of power;
2. Temporary loss of communications;
3. Loss of HVAC, Elevators, General Building Safety;
4. Severe Weather;
5. Short term staffing shortages caused by illness or ingress challenges;
6. Hardware error or failure; and
7. Software error or failure.
The Project Manager will be responsible for notifying the CLIENT of the disaster. Should the Project Manager be directly impacted by the disaster rendering them unavailable, the Chief Operating Officer and/or Client Services Director will perform the notification. The Project Manager, or the interim Project Manager, will work directly with the IT Team to assess the disaster and estimate the recovery.
EZ Messenger will establish notifications and procedures for staff to follow in the event a system outage or interruption occurs. Generally, these procedures will follow the following outline:
Table 5: Soft Disaster Outline
In the event that one (1) or all of the functions of the Service of Process Project is not available, a broadcast message can be recorded and placed on the phone system to provide information to callers. This function is available in the production or the disaster environments. In the event of temporary phone system outage, EZ Messenger key staff will utilize their cell phones for communication. EZ Messenger’s phone system will be automatically configured to fail over and forward calls to key personnel cell phones.
Due to the nature of EZ Messenger’s distributed workforce and resources, moving to the DR site will be limited in impact. Employees at the DR site include the Director of Compliance, Director of Court Services, and COO and are intimately familiar with the Service of Process Project. A SOFT disaster at the Primary Facility will naturally fail over to the Disaster Recovery Facility. The 60+ employees in the Disaster Recovery Facilities and the associated hardware are more than sufficient to absorb the 15 full-time employee workforce in Oklahoma City for a short window during a SOFT disaster. A SOFT disaster at the Primary Data Center at LightEdge will affect all EZ Messenger locations equally and will require necessary downtime to restore at the Primary Data Center. The Soft Disaster Outline in Table 5 explains the timeline for that recovery process.
Should either initial or evolving circumstances dictate that a ‘HARD’ disaster be declared and the provisions for relocating operation to the Disaster Recovery Data Center or Disaster Recovery Facility be put into motion, predefined steps occur. First, the Recovery Manager will be defined based on impacted locations in the Disaster. The primary Recovery Manager will be the COO. If the COO is unable to perform the duty, the Director of Compliance will become the Recovery Manager. In the event the Director of Compliance is unable to perform the duty, the Project Manager will become the Recovery Manager.
Examples of such events include:
1. Prolonged loss of power with no known restore date;
2. Prolonged loss of communications with no known restore date;
3. Irreparable building damage;
4. Long term staffing shortages resulting from a pandemic or mass casualty;
5. Catastrophic Hardware error or failure; and
6. Catastrophic Software error or failure.
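The order of succession for the Recovery Manager described above (COO, then Director of Compliance, then Project Manager) can be sketched as a first-available lookup. The function name and the idea of passing in a set of available roles are assumptions for illustration; the Plan does not specify how availability is determined.

```python
from typing import Iterable, Optional

# Order of succession stated in the Plan.
RECOVERY_MANAGER_SUCCESSION = ["COO", "Director of Compliance", "Project Manager"]


def select_recovery_manager(available_roles: Iterable[str]) -> Optional[str]:
    """Return the first role in the succession order that is available.

    Returns None if no listed role is available; the Plan does not say what
    happens in that case, so the caller would need to escalate.
    """
    available = set(available_roles)
    for role in RECOVERY_MANAGER_SUCCESSION:
        if role in available:
            return role
    return None
```

For example, if only the Director of Compliance and the Project Manager can be reached, `select_recovery_manager` returns `"Director of Compliance"`.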
One of the Recovery Manager's important early duties is to determine the status of personnel working at the time of the disaster. Safety personnel on-site after the disaster will effect any rescues or first aid necessary for people caught in the disaster. However, the Recovery Manager will, as soon as possible, prepare a list of all employees and begin contacting those employees to determine their wellbeing.
The Recovery Manager will also update the employee list to provide the operations group with a known list of employees that are still able to perform duties under the Service of Process Project.
One of the keys to a successful operation is to keep key people informed. Therefore, regularly scheduled update meetings will take place where the Recovery Manager, or the assigns of the Recovery Manager, will update employees on the disaster and the impacts to the operations as well as the progress of the Disaster Recovery Plan.
The Recovery Manager will be the point of contact with the CLIENT Contract Administrator to establish a schedule for these meetings. Key personnel will provide updates on the progress of recovery and timelines for completion. These meetings will start within two (2) to four (4) hours after a declaration of a ‘Hard’ Disaster.
The Recovery Control Center is the location from which the disaster recovery process is coordinated. The Recovery Manager will designate where the Recovery Control Center is to be established. The Recovery Control Center will be virtual to maximize recovery ability.
A primary goal of the recovery process is to restore all computer operations without the loss of any data. The Computer Recovery Team will begin analyzing the damage at the Primary Data Center as soon as access is permitted to the site. In the event no access is permitted within 4 hours of the disaster a total loss will be assumed at the Primary Data Center and recovery to the Disaster Recovery Data Center will commence.
As soon as practical, all salvageable equipment and supplies will be assessed at the Primary Data Center, Primary Facility and Disaster Recovery Facility. If the equipment has been damaged but can be repaired or refurbished, repair will commence immediately.
As soon as is practical, a complete inventory of all salvageable equipment, media, and remittance instruments must be performed and measures taken to secure them for later use. This inventory list will be delivered to the Technical Coordinator and Administrative Coordinator who will use it to determine which items from the DR hardware and supply lists must be procured to begin building the recovery systems.
The Recovery Manager, Casey Cox (COO), sets the plan into motion. Early steps to take include:
1. The Recovery Manager will call up a recovery team.
2. The Recovery Manager’s team will consist of a Corporate Oversight Team, Recovery Coordination Team, Facilities Team, Supplies and Office Recovery Team, Personnel and Staffing Team and Emergency Finance Team.
3. The Recovery Manager is to call a meeting of the Recovery Management Team at the Recovery Control Center or a designated DR site. The CLIENT Contract Administrator or his designee will be invited to this meeting. The following agenda is suggested for this meeting:
a. Each member of the team is to review the status of their respective areas of responsibility.
b. After this review, the Recovery Manager will clarify any questions or concerns with the team and, if necessary, replace team members who have stated that they are not able to perform the duties as discussed.
c. The Recovery Manager briefly reviews the Disaster Recovery Plan with the team.
d. Any adjustments to the Disaster Recovery Plan to accommodate special circumstances will be discussed and amended with the team.
e. Each member of the team is charged with fulfilling his/her respective role in the recovery and to begin work as scheduled in the Plan.
f. Each member of the team will review the makeup of their respective recovery teams. If key individuals are unavailable, the Recovery Manager will determine if other resources are available to assist and allocate those resources accordingly.
g. The next meeting of the Recovery Management Team is scheduled. It is suggested that the team meet at least once each day for the first week of the recovery process.
4. The Recovery Management Team members are to immediately start the process of executing the Disaster Recovery Plan and fully restoring operations at the Disaster Recovery location.
5. The Recovery Control Center should be supplied with the following equipment:
a. Desks/Tables and Chairs;
b. Land Line/Cellular Telephones;
c. Personal Computers/Laptop;
e. Wireless/Wired network;
Most of the key individuals that will be involved in the recovery carry both cellular telephones and laptops with them. Everything that is needed here will be highly dependent on the type of disaster that has occurred. As an example, there may not be cell phone communication available – in which case, in the event internet is available, communication will occur via email or Microsoft Teams. The personal satellite phone of the COO may be used in extreme circumstances as well.
If there were no land line phone service, there would also be no internet access. In that case, restoration of the WAN network will be delayed until internet access is restored. In lieu of internet recovery, all other aspects will be completed so that business can commence as soon as internet is available. The disaster recovery plan will be updated.
5.1 Recovery Teams Overview
This section describes the teams and their scope of responsibility for dealing with a ‘HARD’ disaster.
The division and assignment of tasks is to be performed according to a best-practices, knowledge-based approach. For example, staff who are technically proficient at Local Area Network (LAN) and/or Wide Area Network (WAN) connectivity are assigned to the Computer Recovery Team and charged with that recovery task. Staff familiar with capacity and staffing will be assigned to the Personnel and Staffing Team.
The teams that are used in this Plan include:
1. Corporate Oversight Team, COO as lead,
2. Recovery Coordination Team, COO as lead,
3. Facilities Team, VP of Operations as lead,
4. Computer Recovery Team, Network Engineer as lead,
5. Supplies and Office Recovery Team, VP of Operations as lead,
6. Personnel and Staffing Team, Human Resources Manager as lead, and
7. Emergency Finance Team, CFO as team lead.
This team is responsible on the EZ Messenger corporate level for general oversight and status of the recovery efforts. They will also assist other teams as necessary in ensuring the teams are fully prepared and adequately appropriated for the recovery effort. This team will be directed and coordinated by the COO.
The COO is the designated Recovery Manager/Coordinator for on-site recovery efforts. The responsibilities for this team include ensuring that all teams are assembled and staffed appropriately as well as monitoring the ongoing status of the recovery and reporting status to the Corporate Oversight team, client services and the project and contract managers to the CLIENT.
This team is responsible for ensuring power, water, internet/phones, and transportation are available for the disaster recovery location.
The tasks to be performed include validating with the electric company that power is available at the disaster recovery location, confirming with the telecom provider that phone/internet are in place and ensuring that transportation to and from the disaster recovery location is available. This team will be directed and coordinated by the VP of Operations.
The responsibilities of this team include restoring the LAN/WAN and providing necessary hardware in the form of computers, printers, copiers, monitors and other devices for the staff at the Disaster Recovery location. The team will be directed by Matt Wash, Systems Support, and assisted locally by the Supplies and Office Recovery Lead.
The responsibilities of this team will include providing necessary supplies such as paper, pens, postage, desks, chairs, and other items required in the day to day operation of the business.
The team will be directed by the VP of Operations, with local coordination from the Office Coordinator.
The team’s first priority is to ensure staff is accounted for and safe. The team also assesses availability of staff to resume work at the disaster recovery location and a schedule for resuming that activity. The team will be headed up by the Human Resources Manager. Local coordination and resources will be provided by the Office Coordinator.
The team’s responsibilities include contacting investors, banking relationships, merchant providers, vendors, and payroll providers. It will be the team’s priority to ensure that finance lines are in place to continue providing payroll to staff and key vendors as well as ensuring depositories are available for customer payments.
The team will be headed up by the CFO. Local coordination and resources will be provided by the Project Manager.
See Appendix B for a current Contact List.
The success or failure of this Plan's ability to ensure a successful and timely recovery of the central computer and network facilities hinges on the ability to purchase goods and services quickly.
Prior recognition of the need for emergency procurement, coupled with extensive Business Interruption Insurance, provides the Recovery Manager with a sound basis for aggressive recovery actions.
The Emergency Finance Team is responsible for all emergency procurement for the Service of Process Project. All Disaster Recovery Team members must provide the Emergency Finance Team with an itemized list of the items they believe will need to be acquired during the disaster recovery process. Where possible, EZ Messenger will maintain sufficient backup hardware and devices at various locations to ensure prompt recovery. Hardware and devices not kept in ready supply on site will be noted and acquired on an as-needed basis for recovery.
This section focuses on the preparation of the designated DR Site for the recovery of primary computing and network facilities after a disaster has occurred. In the event of a ‘Hard’ disaster, this site will be used for recovery after the disaster.
The WAN used to ensure the availability of data and connectivity is detailed in the diagram in Appendix C – WAN Specifications.
To be prepared for a disaster, the following procedures are in place to maintain availability and readiness.
1. 24 Hour on-call IT staff
2. Multiple potential DR locations available
3. Multiple daily offsite backups of all critical servers
The DR Site location will provide the following:
1) Redundant internet connectivity
2) 24-hour access to requisite EZM Staff, as needed
3) Idle services at the ready for spool up
DR Site recovery will provide connectivity and data availability to project staff. The proposed disaster recovery solution is as follows:
1. Most of the infrastructure is based on a virtualization platform, with regular offsite backups to allow services to be restored to any other hardware platform that meets the performance specifications of the existing virtualization platform.
2. The database server is a MySQL cluster that is backed up offsite and can also be restored to any other Linux platform with equivalent performance to the existing cluster.
3. The firewall configuration is stored offsite for quick recovery to an identical firewall, either in the same location or a different one, should the old one be unusable or inaccessible for any reason.
4. DNS records are configured with a TTL of 30 minutes or less so that IP address changes propagate more rapidly. Additionally, public DNS caches (Google, TWC, etc.) will be flushed immediately for the relevant DNS records in the event of an unplanned IP address change.
5. In the event of a disaster that disabled or destroyed the existing datacenter, Stream IT would be notified. The virtual machines/MySQL databases and configuration will be restored to the DR platform from offsite backups. Concurrent changes to DNS would be made to ensure that when services are back online, public DNS names are pointed to the new IP addresses. Additionally, and concurrently, the firewall configuration would be restored and modified to reflect the new IP range, and the VPN tunnels to each EZM branch office restored.
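The TTL requirement in item 4 can be checked mechanically before a disaster occurs. The following is a minimal sketch, assuming the DNS records have been exported as (name, TTL in seconds) pairs. The record names and the `MAX_TTL_SECONDS` constant are hypothetical and are not taken from the actual EZ Messenger zone.

```python
# Sketch only: flags DNS records whose TTL exceeds the 30-minute limit in item 4.
# The record names below are hypothetical placeholders, not real EZ Messenger hosts.

MAX_TTL_SECONDS = 30 * 60  # 30 minutes


def records_over_ttl(records, max_ttl=MAX_TTL_SECONDS):
    """Return the names of records whose TTL is greater than max_ttl seconds."""
    return [name for name, ttl in records if ttl > max_ttl]


example_records = [
    ("portal.example.com", 900),   # 15 minutes: within the limit
    ("sftp.example.com", 1800),    # exactly 30 minutes: within the limit
    ("api.example.com", 86400),    # 24 hours: too long for a fast failover
]

print(records_over_ttl(example_records))
```

Any record reported by such a check would keep pointing clients at the old IP address for longer than the plan allows after a failover.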
In the event of a disaster, arrangements will be made to ensure the recovery site for operations is sufficiently staffed and prepared for the increased workload. Overtime will be allotted, or staff voluntarily brought in from other locations to assist with the additional workload. If necessary, a staffing agency/temporary labor group will be utilized to assist; however, due to the distributed nature of the workload at EZ Messenger, it should not be necessary. The Oklahoma City, OK location currently has a total of sixteen (16) staff members. The Disaster Recovery Facility has over sixty (60) staff members and is capable of absorbing the Oklahoma City capacity with little to no notice. Additionally, as most work is virtualized, each department’s workload can be picked up quickly by any of the qualified employees across all EZ Messenger offices. EZ Messenger will maintain 100% of production staff as approved for CLIENT work based on required background checks and review by the CLIENT. EZ Messenger’s employee handbook is being updated to accommodate and explain compensation and relocation coverage in a disaster scenario. The updated employee handbook, or the section pertaining to the above, will be provided as an addendum to this document no later than February 1, 2016.
***EZ Messenger requires additional information on what access relative to TXCSES is referenced here to properly address this concern.
EZ Messenger will provide database access to the DR agents. The database will be current as of the moment of the last backup. Agents will confirm the data is intact and the F.A.S.T. application is operating normally.
Routing incoming calls to the Disaster Recovery Facility should not require any additional action, due to the distributed nature of the phone system. Calls not answered in the Primary Facility will automatically roll over via the designated hunt group. In the event of a telephone outage, the various hunt groups default to cell numbers held by EZ Messenger managers.
The Disaster Recovery Facility will maintain internet services at all times and remains on the WAN. Transferring or forwarding internet services will not be necessary.
Questions regarding insurance coverage will be directed to Jacqueline Powers, Controller.
In the event of loss or damage, the Facilities Team will work with the Supplies and Office Recovery Team to accommodate required functions with available resources. The Facilities Team will work with insurance companies, vendors, and staff at the affected location to restore functionality as quickly as possible.
This document contains a list of vendors utilized by EZ Messenger, including office supplies, computer hardware, payroll providers, telephone, and internet providers. The list will consist of all vendors paid in the prior 120 days, as recorded in QuickBooks.
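The 120-day rule for the vendor list can be illustrated with a short filter. This is a sketch only, assuming payment records have already been exported from the accounting system as (vendor name, payment date) pairs. The function name, the sample vendors, and the export format are hypothetical, and no QuickBooks interface is used or implied.

```python
from datetime import date, timedelta

LOOKBACK_DAYS = 120  # window stated in the plan


def recent_vendors(payments, as_of, lookback_days=LOOKBACK_DAYS):
    """Return sorted, unique vendor names paid within lookback_days before as_of."""
    cutoff = as_of - timedelta(days=lookback_days)
    return sorted({vendor for vendor, paid_on in payments if cutoff <= paid_on <= as_of})


# Hypothetical payment export: (vendor name, payment date)
example_payments = [
    ("Office Supply Co", date(2016, 1, 10)),
    ("Internet Provider", date(2015, 11, 20)),
    ("Old Vendor", date(2015, 6, 1)),  # outside the 120-day window
]

print(recent_vendors(example_payments, as_of=date(2016, 2, 1)))
```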
This portion of the plan documents the detailed recovery procedures for each of the computer and network systems to be restored at the DR site. Each procedure documents the list of equipment necessary to restore service, power and cooling, cabling and networking, operating system and data restoration procedures, and procedures for placing the system into final form for general use.
Upon declaration of a ‘Hard’ disaster, the DR Site Director is informed, and arrangements are made to use the facilities and personnel.
Table 6: DR Site Recovery Contact Information
The first order of business is to prepare the DR site virtual hardware to a minimum specification of two servers with at least 128 GB of RAM, 24 CPU cores, and 4 TB of storage each for the Hyper-V hosts, as well as three additional servers with at least 64 GB of RAM, 12 CPU cores, and 250 GB of disk space each for the MySQL cluster.
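The minimum specification above can be expressed as a simple readiness check. The following is a sketch under stated assumptions: the `ServerSpec` class and function names are hypothetical, and 4 TB is treated as 4000 GB for comparison purposes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerSpec:
    ram_gb: int
    cpu_cores: int
    storage_gb: int


# Minimums from the paragraph above (assumption: 4 TB is expressed as 4000 GB).
HYPERV_MIN = ServerSpec(ram_gb=128, cpu_cores=24, storage_gb=4000)
MYSQL_MIN = ServerSpec(ram_gb=64, cpu_cores=12, storage_gb=250)


def meets_minimum(server, minimum):
    """True when the server meets or exceeds every field of the minimum spec."""
    return (
        server.ram_gb >= minimum.ram_gb
        and server.cpu_cores >= minimum.cpu_cores
        and server.storage_gb >= minimum.storage_gb
    )


def site_is_ready(hyperv_hosts, mysql_nodes):
    """True when at least 2 Hyper-V hosts and 3 MySQL nodes meet their minimums."""
    ok_hosts = sum(meets_minimum(s, HYPERV_MIN) for s in hyperv_hosts)
    ok_nodes = sum(meets_minimum(s, MYSQL_MIN) for s in mysql_nodes)
    return ok_hosts >= 2 and ok_nodes >= 3
```

For example, `site_is_ready([ServerSpec(128, 24, 4000)] * 2, [ServerSpec(64, 12, 250)] * 3)` evaluates to `True`, while a site with only one qualifying Hyper-V host would not pass.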
In the event a disaster is declared, the following steps will be initiated:
1. Prepare the DR site hardware
2. Locate and prepare most current backups available
3. Connect all hardware and install base Operating Systems
4. Configure Hyper-V and spin up base operating systems for all VMs
5. Begin restoration of each VM from offsite backup, working through the following groups in order:
a. Mysql, web, hostsvc, mysql-arb, clustercontrol, onr-dc2
b. Testweb, testmysql, efile, reporting
c. Exchange, scan1, scan2
6. Configure the MySQL cluster and begin restoration of databases and configuration from offsite backup
7. While restoration of data is taking place, restore the firewall configuration to the SonicWall device and update the configuration to reflect the new IP schema at the DR location
8. Update DNS records to reflect the new IP schema for the DR location
9. Restore VPN tunnel connectivity to each EZM office
10. Upon completion of offsite data restoration (per server), spin up VMs and verify basic functionality and connectivity of each server and service as it becomes available.
11. Test external access to services as they come up and notify affected internal staff and clients of service restoration.
12. Reconfigure/resume offsite backups from temporary infrastructure.
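The grouped restore order in step 5 can be captured as data so that the sequence is unambiguous during a recovery. This is an illustrative sketch only: the VM names are taken from the list above and lowercased as an assumption, and `restore_plan` is a hypothetical helper rather than part of any existing EZ Messenger tooling.

```python
# Restore groups in the order given in step 5 (names lowercased by assumption).
RESTORE_GROUPS = [
    ["mysql", "web", "hostsvc", "mysql-arb", "clustercontrol", "onr-dc2"],  # group a
    ["testweb", "testmysql", "efile", "reporting"],                         # group b
    ["exchange", "scan1", "scan2"],                                         # group c
]


def restore_plan(groups):
    """Return a flat list of (group_number, vm_name) tuples in restore order."""
    return [(number, vm) for number, group in enumerate(groups, start=1) for vm in group]


for group_number, vm_name in restore_plan(RESTORE_GROUPS):
    print(f"group {group_number}: restore {vm_name}")
```

Keeping the order in one list means that a change to the priority of a server only has to be made in one place.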
Having a disaster recovery plan is critical, but it will rapidly become obsolete if a workable procedure for maintaining the plan is not also developed and implemented. This section provides information about the document itself, standards used in its construction, and maintenance procedures necessary to keep it up to date.
It is inevitable in the changing environment of the computer industry that this disaster recovery plan will become outdated and unusable unless someone keeps it up to date. Changes that will likely affect the plan fall into several categories:
4. Procedural; and
As changes occur in any of the areas mentioned above, EZ Messenger management will determine if changes to the Plan are necessary. This decision will require that the managers are familiar with the Plan in some detail. A document referencing common changes that will require plan maintenance will be made available and updated when required.
Changes that affect the platform recovery portions of the Plan will be made by the staff in the affected area. After the changes have been made, the EZ Messenger appointed contact will be advised that the updated documents are available. They will incorporate the changes into the body of the Plan and distribute as required.
The following lists some of the types of changes that may require revisions to the Disaster Recovery Plan. Any change that can potentially affect whether the Plan can be used to successfully restore the operations of the department's computer and network systems should be reflected in the Plan.
- Additions, deletions, or upgrades to hardware platforms.
- Additions, deletions, or upgrades to system software.
- Changes to system configuration.
- Changes to applications software affected by the Plan.
- Changes that affect the availability/usability of the DR Site location.
- Changes to personnel identified by name in the Plan.
- Changes to organizational structure of the department.
- Changes to personnel levels at either the primary facility or the DR site.
- Changes to off-site backup procedures, locations, etc.
- Changes to application backups.
- Changes to vendor lists maintained for acquisition and support purposes.
The Disaster Recovery Plan will be updated annually by EZ Messenger for specific information such as team member names, references to bond numbers, insurance policy numbers, local addresses, changes in team participation, and changes in local vendors. The Plan will be submitted to CLIENT with the changes highlighted for approval.
EZ Messenger will conduct a thorough evaluation of the following disaster scenarios that are site specific:
1. Location Hazards:
a. Proximity to Airport;
b. Multi-level Building;
c. Nearby Multi-level Building;
d. Congestion Area;
e. Industrial Park;
f. Proximity to Chemical Factory;
g. Proximity to Chemical Laboratories;
h. Proximity to Natural Gas;
i. Proximity to Railway;
j. Power Supply Problems; and
2. Low-lying Areas.
3. Large Bodies of Water.
4. Proximity to Military Exercise Area.
5. Major Construction Equipment.
6. Terrorist Activity.
7. Virus Contamination.
9. Intrusion by unauthorized personnel.
EZ Messenger will also conduct annual reviews of the following scenarios that are specific to personnel:
1) Review staffing levels at the primary facility and DR site to ensure capacity levels can be absorbed in a disaster recovery scenario
2) Evaluate staffing geography at the primary facility to determine likelihood of ingress challenges for staff that could result in loss of workforce
Each year, during the first week of March, EZ Messenger will hold comprehensive training for all members of the disaster recovery teams as well as its regional management staff. Training will include a step-by-step review of the disaster recovery plan and each role within the plan. Key staff will be identified as potential backups to the identified roles in the plan, and those specific roles will be reviewed with the staff. Gerri Gentilquore, Director of Compliance, will oversee the training and review of the disaster recovery plan. Testing of the plan, as outlined below, will commence immediately following review and training.
The following section highlights the steps that will be taken to test the DR Plan and train all involved personnel in the execution of different disaster scenarios. To complete the Plan, a “dry run” situation test will be performed annually, and the results of the test will be documented (See Appendix D – Annual Disaster Recovery Plan).
This test includes:
1) An annual test of the primary production site during the second week of March. The Primary Facility will encounter a mock internet outage, resulting in the need to move production to the Disaster Recovery Facility, to simulate a “Soft” disaster.
2) An annual test of the primary data center during the third week of March. Due to the critical nature of the data center and systems, the test of the Primary Data Center Disaster Recovery to Amazon will be done in parallel, with the Primary Data Center remaining active and operational. Systems, including F.A.S.T., web services, customer portals, mobile apps, email, and other network functions, will be restored to the Disaster Recovery Data Center site and tested by key staff.
If the test does not meet the outcome expected by EZ Messenger or the CLIENT, corrective action will be taken, which will include re-evaluation of the Plan’s adequacy, the action items called out in the Plan, and re-training of team members. EZ Messenger will then retrain staff on the Plan the first week of May, repeat the Primary Facility test the second week of May, and repeat the Primary Data Center test the third week of May. If either the Primary Facility test or the Primary Data Center test was successful, the additional test will not be necessary for that section of the Plan.
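The training and testing calendar described in this section can be derived for any year. The sketch below is illustrative only and rests on one assumption that the Plan does not state: the "first week" of a month is taken to mean days 1 through 7, the "second week" days 8 through 14, and the "third week" days 15 through 21. The function names are hypothetical.

```python
from datetime import date, timedelta


def week_window(year, month, week_number):
    """Return (first_day, last_day) of the Nth 7-day block of a month (assumed definition)."""
    start = date(year, month, 1) + timedelta(days=7 * (week_number - 1))
    return start, start + timedelta(days=6)


def annual_schedule(year, retest=False):
    """Return the training and test windows: March normally, May for a retest."""
    month = 5 if retest else 3
    return {
        "training": week_window(year, month, 1),
        "primary_facility_test": week_window(year, month, 2),
        "primary_data_center_test": week_window(year, month, 3),
    }


for activity, (start, end) in annual_schedule(2016).items():
    print(f"{activity}: {start} to {end}")
```

If the organization instead counts weeks from the first Monday of the month, only `week_window` would need to change.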
- Continuation of Business – Site and Infrastructure Still Usable – Same Day Recovery: Phones will be rerouted. At the time of this document, only one customer service employee works in the Primary Facility. She is part of the client services hunt group, which automatically rolls over to additional client services staff. Backup files of databases and image files are to be replicated offsite. Plans must be in place for retrieval, installation, and reinitiation of full operations within [timeframe] hours following restoration of power or at the time occupancy of the building is allowed.
- Hard Disaster – Site Not Usable – Requires Relocation of Operations: EZ Messenger must have a designated DR Site with hardware and software that emulates operations and is available within the required time frames. EZ Messenger must have a plan for reinitiating operations at this DR Site in the event of a hard disaster that prevents the use of the existing application and infrastructure for more than twenty-four (24) hours.
Off–site operations will need to be initiated by EZ Messenger as follows:
1. Phone lines will be ported to the Disaster Recovery Facility within two (2) hours of the disaster along with a Broadcast Message.
2. EZ Messenger has a dedicated disaster recovery site which will be made available for use within four (4) to eight (8) hours.
3. Operations will resume as soon as the Disaster Recovery Facility is functional and Remote Staff are in place, or within four (4) to eight (8) hours – whichever comes first.
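The time targets in the three items above can be turned into concrete deadlines once a disaster has been declared. This is a minimal sketch, assuming the clock starts at the moment of declaration. The function and key names are hypothetical.

```python
from datetime import datetime, timedelta


def recovery_deadlines(declared_at):
    """Return target times derived from the 2-hour and 4-to-8-hour windows above."""
    return {
        "phones_ported_by": declared_at + timedelta(hours=2),
        "dr_site_available_earliest": declared_at + timedelta(hours=4),
        "dr_site_available_latest": declared_at + timedelta(hours=8),
    }


# Example: a disaster declared at 09:00 on a hypothetical date.
for milestone, when in recovery_deadlines(datetime(2016, 3, 8, 9, 0)).items():
    print(f"{milestone}: {when:%Y-%m-%d %H:%M}")
```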
Data Replication Testing
Move: BY: ________________________________ Date: ___________
Verify Information: BY: ________________________________ Date: ___________
Add Broadcast Message: BY: ________________________________ Date: ___________
Verify Broadcast: BY: ________________________________ Date: ___________