System Backup and Recovery SOP

Purpose

Establish a systematic approach to data backup and recovery to prevent loss, enable rapid restoration and maintain data integrity for all Neosofia IT systems.

Scope

This SOP applies to any IT system that manages Neosofia client or corporate data.

Assets in Scope

Each of the assets below will have an entry in this SOP that outlines the backup and recovery procedures Neosofia employs to protect client and corporate data.

Data/Support Asset | RPO    | RP       | RTO     | OC
Hardware           | N/A    | N/A      | 2 hours | N/A
Operating Systems  | 1 day  | 7 days   | 1 hour  | 1 full + 6 incr.
Virtual Machines   | 1 day  | 28 days  | 1 hour  | 1 full + 27 incr.
Public DNS Records | N/A    | N/A      | 1 hour  | N/A
Source Code        | 1 week | 25 years | 1 hour  | N/A
System Logs        | N/A    | 30 days  | N/A     | N/A
Credentials        | 1 hour | 1 year   | 1 hour  | 1 full
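
As an illustration only, the RPO column of the table above could be captured as data so that a monitoring job can flag a backup that has gone stale. This is a minimal sketch, not part of Neosofia's tooling: the names `RPO` and `rpo_met` and the asset keys are hypothetical, and assets whose RPO is N/A are left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# RPO values copied from the table above; assets with an RPO of N/A are omitted.
# The dictionary keys are hypothetical identifiers chosen for this sketch.
RPO = {
    "operating_systems": timedelta(days=1),
    "virtual_machines": timedelta(days=1),
    "source_code": timedelta(weeks=1),
    "credentials": timedelta(hours=1),
}


def rpo_met(asset: str, last_backup: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if the most recent backup is no older than the asset's RPO."""
    now = now or datetime.now(timezone.utc)
    return now - last_backup <= RPO[asset]
```

A scheduled check could call `rpo_met` for each asset with the timestamp of its latest backup and raise an alert when the result is False.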

Responsibilities

IT System Administrators will be responsible for:

  • L4 Architecture, design, implementation, and execution of the procedures outlined in this document.
  • L3 System monitoring to determine if restoration procedures need to be executed on
  • L2 Documentation of the backup and restoration procedure execution as evidence for auditors
  • L1 Provide feedback on this document

IT Managers will be responsible for:

  • L4 Review of this document no less than once per year
  • L4 Respond to and integrate feedback into this document
  • L3 Review of this document when new IT systems are procured or retired, to determine which system backup and restoration procedures require an update
  • L4 Advise and mentor IT System Administrators in their responsibilities.

Procedures

Hardware Procedures

Neosofia will maintain a 2% hardware inventory reserve to recover from hardware losses or will define procedures below to enable cloud resources to be used as a temporary replacement for system restoration.
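
The 2% reserve can be expressed as simple arithmetic. The sketch below is a hedged example: it assumes the reserve is calculated against the number of deployed devices and rounded up to a whole device, neither of which this SOP specifies, and the function names are hypothetical.

```python
import math

RESERVE_PERCENT = 2  # the 2% hardware inventory reserve stated in this SOP


def reserve_units(deployed: int) -> int:
    """Spare devices needed for a 2% reserve, rounded up (rounding is an assumption)."""
    return math.ceil(deployed * RESERVE_PERCENT / 100)


def needs_procurement(deployed: int, spares: int) -> bool:
    """True when the spare stock has fallen below the 2% reserve."""
    return spares < reserve_units(deployed)
```

Under these assumptions, 120 deployed devices call for 3 spares (2.4 rounded up), so a stock of 2 spares would trigger procurement.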

Operating System Procedures

Backup

When provisioning a new piece of hardware, the IT System Administrator runs the OS setup script, which creates an OS-level backup cron job that runs automatically at 2 AM UTC every day. The automated backup script will:

  1. Create a full OS-level snapshot and the on-device (USB stick) rescue media needed to restore the system in the event of a hardware failure
  2. Reboot the device into the rescue media's automated restoration program
  3. Upon system restoration and reboot, send the restoration logs to the central log server
  4. If the daily OS backup procedure completes without errors, a status report is automatically sent to the central log server. If any errors occur, an email is sent to all IT System Administrators with details of the error to be remediated.
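
The reporting rule in step 4 can be sketched as follows. This is an illustrative example rather than the actual backup script: `report_backup_result` is a hypothetical name, and the two callables stand in for whatever log-shipping and email mechanisms are in use. For reference, a cron schedule of `0 2 * * *` runs a job at 02:00 every day, which corresponds to 2 AM UTC only if the host clock is set to UTC.

```python
from typing import Callable, Sequence


def report_backup_result(
    errors: Sequence[str],
    send_to_log_server: Callable[[str], None],
    email_admins: Callable[[str], None],
) -> str:
    """Route the outcome of a daily OS backup as described in step 4."""
    if errors:
        # Any error: email all IT System Administrators with the details.
        email_admins("OS backup failed:\n" + "\n".join(errors))
        return "emailed"
    # No errors: send a status report to the central log server.
    send_to_log_server("OS backup completed without errors")
    return "logged"
```

Passing the senders in as arguments keeps the routing logic testable without a live log server or mail relay.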

Automated backup and system restoration should complete within 15 minutes in 99% of runs.
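
One way to check the 15-minute target against recorded run times is sketched below. It assumes run durations (in minutes) are available from the central log server; the function name and parameters are hypothetical.

```python
from typing import Sequence


def meets_duration_target(
    durations_min: Sequence[float],
    limit_min: float = 15.0,
    target_percent: int = 99,
) -> bool:
    """True if at least target_percent of runs finished within limit_min minutes."""
    if not durations_min:
        raise ValueError("no runs to evaluate")
    within = sum(1 for d in durations_min if d <= limit_min)
    # Integer comparison avoids floating-point rounding at the 99% boundary.
    return within * 100 >= target_percent * len(durations_min)
```

With 100 recorded runs, a single run over 15 minutes still meets the target, while two such runs do not.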

Recovery

Upon notification of a system failure, the IT System Administrators will:

  1. Identify and replace defective hardware
  2. Boot the machine from the restoration media (USB stick)
  3. The restoration procedure should begin automatically. If the restoration procedure requests input due to hardware changes, contact an L3 or higher IT System Administrator for guidance on appropriate inputs.
  4. If successful, confirm the automated restoration logs were sent to the central log server. If the automated restoration process fails, contact an L4+ System Administrator to troubleshoot the error.
  5. Update the inventory management system and procure replacement hardware if the stock level falls below 2%.

VM Procedures

TBD

Networking Procedures

Backup

All network configurations are automatically backed up on a weekly basis by the networking equipment vendor. When a new piece of networking equipment is acquired, follow the procedures below to ensure the device is being backed up.

  1. Log into the networking management interface and navigate to the system backup setting
  2. Ensure the system backup checkbox is checked and click the back up now button
  3. Observe that no errors are reported upon backup and verify that the device is writing to the central log server

Restore

In the event of a networking equipment failure, follow the steps below.

  1. Replace the failed device with the same model
  2. Log into the networking management interface and navigate to the settings panel of the new device
  3. From the existing device configuration menu, select the failed device profile and apply it to the new device.

Source Code

Backup

Whenever a pull request is merged into a protected branch, an automated script pushes the changes to a secondary SCS vendor.
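
A minimal sketch of such a push is shown below. It is not the actual automation: it assumes the secondary SCS vendor is configured as a git remote named `secondary` in a local clone, and the function names are hypothetical. Only standard `git push` syntax is used.

```python
import subprocess
from typing import List


def mirror_push_command(branch: str, remote: str = "secondary") -> List[str]:
    """Build the git command that pushes one protected branch to the secondary remote."""
    # The remote name "secondary" is an assumption made for this sketch.
    return ["git", "push", remote, f"refs/heads/{branch}:refs/heads/{branch}"]


def push_to_secondary(repo_dir: str, branch: str) -> None:
    """Run the push inside a local clone; raises CalledProcessError if git fails."""
    subprocess.run(mirror_push_command(branch), cwd=repo_dir, check=True)
```

Using an explicit refspec pushes only the merged branch, so unprotected work-in-progress branches are not copied to the secondary vendor.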

Restoration

Should the primary SCS vendor be compromised in such a way that the source files cannot be restored to their original state, the restoration procedures below should be initiated.

  1. Clone the git repository from the secondary SCS vendor
  2. Create a new (blank) repository in the primary SCS vendor
  3. Push the git repository from the secondary vendor into the newly created repo on the primary vendor
  4. Make a test pull request against the primary repository and merge
  5. Observe that the changes to the primary repo are synced to the secondary repo
  6. If the test succeeds, notify all members of the repository that it has been restored and that all changes should be submitted to the primary repository. If the test fails, notify your manager and troubleshoot failures with them until there is resolution.
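
The sync check in step 5 can be done by comparing the branch heads reported by each vendor. The sketch below is one possible approach, not a prescribed tool: it relies on `git ls-remote --heads`, which prints one `<commit hash><TAB><ref name>` line per branch, and the function names and repository URLs passed to them are hypothetical.

```python
import subprocess
from typing import Dict


def parse_ls_remote(output: str) -> Dict[str, str]:
    """Map each ref name to its commit hash from `git ls-remote` output."""
    refs = {}
    for line in output.splitlines():
        if line.strip():
            sha, ref = line.split("\t", 1)
            refs[ref] = sha
    return refs


def remote_heads(url: str) -> Dict[str, str]:
    """List the branch heads on a remote without cloning it."""
    result = subprocess.run(
        ["git", "ls-remote", "--heads", url],
        capture_output=True, text=True, check=True,
    )
    return parse_ls_remote(result.stdout)


def in_sync(primary: Dict[str, str], secondary: Dict[str, str], branch: str) -> bool:
    """True when both remotes point the branch at the same commit."""
    ref = f"refs/heads/{branch}"
    return ref in primary and primary.get(ref) == secondary.get(ref)
```

After merging the test pull request, calling `in_sync(remote_heads(primary_url), remote_heads(secondary_url), branch)` gives a yes/no answer that can be recorded as audit evidence.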

Log File Procedures

All log files are pushed to an immutable central log server and retained for 30 days.
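
As a small illustration of the 30-day retention rule, the check below decides whether a log entry has aged out. It is a sketch only: how the central log server actually enforces retention is not described in this SOP, and the names used are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # retention period stated in this SOP


def is_expired(written_at: datetime, now: datetime) -> bool:
    """True once a log entry is older than the 30-day retention period."""
    return now - written_at > RETENTION
```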

Credential Procedures

TBD