System Backup and Recovery SOP

Purpose

Establish a systematic approach to data backup and recovery to prevent loss, enable rapid restoration and maintain data integrity for all Neosofia IT systems.

Scope

This SOP applies to any IT system that manages Neosofia client or corporate data.

Assets in Scope

Each of the assets below will have an entry in this SOP that outlines the backup and recovery procedures Neosofia employs to protect client and corporate data.

Data/Support Asset	RPO	RP	RTO	OC
Hardware	N/A	N/A	2 hours	N/A
Operating Systems	1 day	7 days	1 hour	1 full + 6 incr.
Virtual Machines	1 day	28 days	1 hour	1 full + 27 incr.
Public DNS Records	N/A	N/A	1 hour	N/A
Source Code	1 week	25 years	1 hour	N/A
System Logs	N/A	30 days	N/A	N/A
Credentials	1 hour	1 year	1 hour	1 full
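As an illustration of how the targets in the table might be checked, the sketch below encodes the RPO column and tests whether the most recent backup of an asset is recent enough. It reads RPO as the maximum acceptable age of the newest backup, which is the usual meaning but is not spelled out in this SOP. The names ASSET_RPO and rpo_met and the example timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# RPO values copied from the table above; assets listed as N/A are omitted.
ASSET_RPO = {
    "Operating Systems": timedelta(days=1),
    "Virtual Machines": timedelta(days=1),
    "Source Code": timedelta(weeks=1),
    "Credentials": timedelta(hours=1),
}


def rpo_met(asset: str, last_backup: datetime, now: datetime) -> bool:
    """Return True if the newest backup is no older than the asset's RPO."""
    return now - last_backup <= ASSET_RPO[asset]


# Example timestamps (made up for illustration).
now = datetime(2024, 1, 2, 2, 0, tzinfo=timezone.utc)
print(rpo_met("Operating Systems", now - timedelta(hours=23), now))  # True
print(rpo_met("Credentials", now - timedelta(hours=2), now))  # False
```

A check like this could run alongside system monitoring so that a missed backup is noticed before the RPO is exceeded.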

Responsibilities

IT System Administrators will be responsible for

L4 Architecture, design, implementation, and execution of the procedures outlined in this document.
L3 System monitoring to determine if restoration procedures need to be executed
L2 Documentation of the backup and restoration procedure execution as evidence for auditors
L1 Provide feedback on this document

IT Managers will be responsible for

L4 Review of this document no less than once per year
L4 Respond to and integrate feedback into this document
L3 Review of this document whenever IT systems are procured or retired, to determine whether the system backup and restoration procedures require an update
L4 Advise and mentor IT System Administrators in their responsibilities.

Procedures

Hardware Procedures

Neosofia will maintain a 2% hardware inventory reserve to recover from hardware losses or will define procedures below to enable cloud resources to be used as a temporary replacement for system restoration.
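The reserve rule can be sketched as a small calculation. This sketch assumes the 2% is measured against the number of deployed units and rounded up to whole units; the SOP does not state either detail, and the function names are hypothetical.

```python
RESERVE_PERCENT = 2  # the 2% reserve stated in this SOP


def required_spares(deployed_units: int) -> int:
    """Spare units needed for a 2% reserve, rounded up to whole units."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return (deployed_units * RESERVE_PERCENT + 99) // 100


def reserve_is_sufficient(deployed_units: int, spare_units: int) -> bool:
    """True if the spare stock meets or exceeds the required reserve."""
    return spare_units >= required_spares(deployed_units)


print(required_spares(120))  # 3
print(reserve_is_sufficient(120, 2))  # False
```

Rounding up means even a small fleet keeps at least one spare on hand.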

Operating System Procedures

Backup

When provisioning a new piece of hardware, the IT System Administrator runs the OS setup script, which creates an OS-level backup cron job that runs automatically at 02:00 UTC every day. The automated backup script will:

Create a full OS-level snapshot and the on-device (USB stick) rescue media needed to restore the system in the event of a hardware failure
Reboot the device into the rescue media's automated restoration program
Upon system restoration and reboot, send the restoration logs to the central log server

If the daily OS backup procedure completes without errors, a status report is automatically sent to the central log server. If any errors occur, an email is sent to all IT System Administrators with details of the error to be remediated.
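The reporting rule above can be sketched as a thin wrapper around the backup job. The three callables are hypothetical stand-ins for the real backup script, central log client and mailer, none of which are specified in this SOP. For scheduling, a standard five-field cron entry of `0 2 * * *` runs a job at 02:00 daily, assuming the host's cron evaluates times in UTC.

```python
from typing import Callable


def run_daily_backup(
    backup: Callable[[], None],
    send_status: Callable[[str], None],
    send_error_email: Callable[[str], None],
) -> bool:
    """Run one backup and report the outcome as this SOP describes.

    All three callables are hypothetical placeholders for illustration.
    """
    try:
        backup()
    except Exception as exc:  # any failure must reach the administrators
        send_error_email(f"OS backup failed: {exc}")
        return False
    send_status("OS backup completed without errors")
    return True


# Example wiring with in-memory stand-ins instead of real services.
statuses: list[str] = []
emails: list[str] = []


def failing_backup() -> None:
    raise RuntimeError("snapshot device not found")


ok = run_daily_backup(lambda: None, statuses.append, emails.append)
failed = run_daily_backup(failing_backup, statuses.append, emails.append)
```

Passing the reporting functions in as arguments keeps the success and failure paths easy to exercise without touching a real log server or mail system.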

Automated backup and system restoration should complete within 15 minutes in 99% of runs.
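One way to verify this target from recorded run times is sketched below. It interprets the target as "at least 99% of recorded runs finish within 15 minutes"; the function name and the sample durations are hypothetical.

```python
MAX_MINUTES = 15
TARGET_PERCENT = 99


def meets_duration_target(durations_minutes: list[float]) -> bool:
    """True if at least 99% of runs finished within 15 minutes."""
    if not durations_minutes:
        return False  # no evidence, so the target cannot be confirmed
    within = sum(1 for d in durations_minutes if d <= MAX_MINUTES)
    # Compare as integers to avoid floating-point rounding at the boundary.
    return within * 100 >= TARGET_PERCENT * len(durations_minutes)


# 99 fast runs and 1 slow run: exactly 99% within the limit.
print(meets_duration_target([12.0] * 99 + [40.0]))  # True
# 98 fast runs and 2 slow runs: only 98% within the limit.
print(meets_duration_target([12.0] * 98 + [40.0, 41.0]))  # False
```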

Recovery

Upon notification of a system failure, the IT System Administrators will

Identify and replace defective hardware
Boot the machine from the restoration media (USB Stick)
The restoration procedure should begin automatically. If the restoration procedure requests input due to hardware changes, contact an L3 or higher IT System Administrator for guidance on appropriate inputs.
If the restoration succeeds, confirm that the automated restoration logs were sent to the central log server. If the automated restoration process fails, contact an L4+ System Administrator to troubleshoot the error.
Update the inventory management system and procure replacement hardware if the stock level falls below 2%.

VM Procedures

TBD

Networking Procedures

Backup

All network configurations are automatically backed up on a weekly basis by the networking equipment vendor. When a new piece of networking equipment is acquired, follow the procedures below to ensure the device is being backed up.

Log into the networking management interface and navigate to the system backup setting
Ensure the system backup checkbox is checked and click the back up now button
Observe that no errors are reported upon backup and verify that the device is writing to the central log server

Restore

In the event of a networking equipment failure, follow the steps below:

Replace the failed device with the same model
Log into the networking management interface and navigate to the settings panel of the new device
From the existing device configuration menu, select the failed device profile and apply it to the new device.

Source Code

Backup

Whenever a pull request is merged into a protected branch, an automated script pushes the changes to a secondary SCS vendor.
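The SOP does not describe the sync script itself, so the following is only a minimal sketch of what such a push might look like. It assumes the secondary vendor is configured as a git remote named "secondary" in the checkout where the script runs; the remote name, branch name and function names are hypothetical.

```python
import subprocess


def mirror_push_command(remote: str, branch: str) -> list[str]:
    """Build the git command that pushes one protected branch to the mirror."""
    return ["git", "push", remote, branch]


def sync_to_secondary(repo_dir: str, remote: str, branch: str) -> None:
    """Run the push inside repo_dir; raises CalledProcessError on failure."""
    subprocess.run(mirror_push_command(remote, branch), cwd=repo_dir, check=True)


print(mirror_push_command("secondary", "main"))
# ['git', 'push', 'secondary', 'main']
```

Using check=True makes a failed push raise an error, so the calling automation can surface the failure rather than silently leaving the secondary copy stale.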

Restoration

Should the primary SCS vendor be compromised in such a way that the source files cannot be restored to their original state, the restoration procedures below should be initiated.

Check out the git repository from the secondary SCS vendor
Create a new (blank) repository with the primary SCS vendor
Push the git repository from the secondary vendor into the newly created repo on the primary vendor
Make a test pull request against the primary repository and merge
Observe that the changes to the primary repo are synced to the secondary repo
If the test succeeds, notify all members of the repository that it has been restored and that all changes should be submitted to the primary repository. If the test fails, notify your manager and troubleshoot the failure with them until it is resolved.
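The copy-and-push steps of the restoration above can be sketched as two git commands. This sketch uses `git clone --mirror` and `git push --mirror` so that all branches and tags are carried over, which is an assumption rather than something this SOP prescribes. The URLs and working directory are placeholders, and creating the blank repository and the test pull request remain manual because they depend on the vendor.

```python
import subprocess


def restore_commands(
    secondary_url: str, primary_url: str, workdir: str
) -> list[list[str]]:
    """Build the commands that copy every ref from secondary to primary."""
    return [
        # Fetch a full mirror (all branches and tags) from the secondary.
        ["git", "clone", "--mirror", secondary_url, workdir],
        # Push every ref into the newly created, empty primary repository.
        ["git", "-C", workdir, "push", "--mirror", primary_url],
    ]


def restore(secondary_url: str, primary_url: str, workdir: str) -> None:
    """Run the commands in order; stops at the first failure."""
    for command in restore_commands(secondary_url, primary_url, workdir):
        subprocess.run(command, check=True)


# Placeholder URLs for illustration only.
commands = restore_commands(
    "https://secondary.example.com/org/repo.git",
    "https://primary.example.com/org/repo.git",
    "repo-mirror.git",
)
print(len(commands))  # 2
```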

Log File Procedures

All log files are pushed to an immutable central log server and retained for 30 days.
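The retention rule can be expressed as a simple age check. This sketch assumes a log entry becomes eligible for removal once it is more than 30 days old; how the central log server actually enforces retention is not described here, and the function name and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=30)  # retention period stated in this SOP


def is_expired(logged_at: datetime, now: datetime) -> bool:
    """True if the entry is older than the 30-day retention period."""
    return now - logged_at > RETENTION


# Example timestamps (made up for illustration).
now = datetime(2024, 3, 1, tzinfo=timezone.utc)
print(is_expired(now - timedelta(days=31), now))  # True
print(is_expired(now - timedelta(days=30), now))  # False
```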

Credential Procedures

TBD