Worse than a flood, scarier than a fire: how to prepare your backups for a ransomware virus visit

For decades, backups have primarily protected us from physical equipment failure and accidental data corruption. A good backup system should survive a fire or a flood and then quickly let the business resume normal operations. But another threat has emerged, one far more likely than a flood, and neither fireproof barriers nor sites physically separated across different cities protect against it.

Ransomware viruses are a nightmare for almost every company. More and more often, attackers encrypt data, causing downtime, significant financial losses, and reputational damage even for large organizations. And as it often turns out, simply having a backup does not protect the business from such threats if the backup system itself is designed incorrectly or without regard for modern dangers.

The purpose of this post is to describe the methods and technologies available in data storage and backup systems that can reduce the damage from ransomware and minimize data loss during an attack. Remember: it is not enough to just make a backup, you need to make the right backup. So, read on!

When talking about ransomware, we primarily mean logical data corruption and the compromise of the domain administrator account.

Every company has four areas that deserve attention: technology, processes, personnel, and compliance with key information security rules. Each area needs its own set of measures aimed at minimizing the consequences of attacks and restoring data quickly. We will cover each area separately.

Technology

Basic hygiene

Before diving into the details, here are the basics that everyone should not only follow but know by heart, like the multiplication table. These methods definitely work:

— Following an extended multi-level backup strategy (3-2-1-1). To store data reliably, this strategy requires:

  • three copies of data;

  • two copies on different types of storage (for example, two different disk arrays, an array and a tape, or an array and a cloud);

  • one copy outside the main site (for example, a "bunker" outside the main data center);

  • one copy on immutable storage, WORM media, or offline.

What is WORM?

WORM (write once, read many) is a type of media that can be written once and read many times.

— Storing backups away from production disk arrays (the devices that serve production data). Backups should be kept on separate devices.

— Creating backups regularly and correctly, without errors and according to the established schedule.

— Creating consistent backups not only at the file system level, but also at the DBMS and application level. It is necessary to copy not only the database, but also the entire IT system landscape.

— Implementing network segmentation. Since the main attack vector usually runs through the Ethernet network, it makes sense to allocate separate network segments (preferably ones that do not share physical equipment):

  • data transmission network;

  • management network;

  • data storage network;

  • backup network.

— Isolated storage of backups (air-gapped backups) in offline environments that are not permanently connected to the main network, as well as the use of removable media (tapes, external hard drives) stored in secure locations.

— Regular backup of the centralized deduplication database and of the backup catalog and/or the management server with its database.

— Setting up built-in protection against ransomware.
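As a rough illustration of the 3-2-1-1 rule from the list above, here is a minimal Python sketch that checks a described set of copies against the four conditions. The `BackupCopy` structure, its field names, and the function name are assumptions made for this example; they do not come from any backup product.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BackupCopy:
    """One copy of a data set (hypothetical model for this sketch)."""
    name: str
    storage: str      # distinct storage system or media, e.g. "array-A", "tape", "cloud"
    offsite: bool     # kept outside the main site
    immutable: bool   # on immutable storage, WORM media, or offline


def check_3_2_1_1(copies: Sequence[BackupCopy]) -> list:
    """Return the violated 3-2-1-1 conditions; an empty list means all four hold."""
    problems = []
    if len(copies) < 3:
        problems.append("fewer than three copies of the data")
    if len({c.storage for c in copies}) < 2:
        problems.append("fewer than two different types of storage")
    if not any(c.offsite for c in copies):
        problems.append("no copy outside the main site")
    if not any(c.immutable for c in copies):
        problems.append("no copy on immutable storage, WORM media, or offline")
    return problems


if __name__ == "__main__":
    copies = [
        BackupCopy("production", "array-A", offsite=False, immutable=False),
        BackupCopy("local backup", "array-B", offsite=False, immutable=False),
        BackupCopy("vaulted tape", "tape", offsite=True, immutable=True),
    ]
    print(check_3_2_1_1(copies) or "3-2-1-1 satisfied")
```

A check like this only validates an inventory that someone keeps up to date; it says nothing about whether the copies themselves are restorable.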

Catalog protection

A very important point is protecting the backup software's metadata: the backup database or backup catalog. Having a backup copy of this metadata lets you quickly bring the backup system back into operation without losing track of which backup is on which media.

All mature backup software can create a standalone catalog backup and deploy it from scratch on a new server, which can, for example, be kept in cold standby (prepared, but powered off and disconnected from the network). In this case, a full catalog backup to the backup storage must be scheduled regularly (at least once a day). As a rule, during this procedure the backup software saves service information: for example, which tape holds the required catalog backup. This information can be sent by email and/or saved as text files. In addition, the catalog backup can be duplicated to tape and moved to offsite storage.

One effective way to protect the catalog is the following scenario:

The catalog backup is written to a separate NFS share served from a server in a separate (isolated) segment that is not joined to the directory service. On schedule, the NFS share is exported and mounted on the master server; once the catalog backup has been saved, the share is unmounted and unexported. This scenario can be implemented with pre- and post-commands in any backup software, combined with the cron scheduler.
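The mount, back up, unmount sequence can be sketched in Python as follows. This is only an outline under stated assumptions: the export address, the mount point, and `catalog_cmd` (a stand-in for your backup software's own catalog backup command) are hypothetical, and exporting and unexporting the share on the NFS server side is not shown.

```python
import subprocess
from contextlib import contextmanager

# Hypothetical values: replace with the ones from your environment.
NFS_EXPORT = "catalog-vault.example.internal:/export/catalog"
MOUNT_POINT = "/mnt/catalog_backup"


def run(cmd):
    """Run a command and raise an exception if it exits with an error."""
    subprocess.run(cmd, check=True)


@contextmanager
def mounted_share(export=NFS_EXPORT, mount_point=MOUNT_POINT, runner=run):
    """Mount the NFS share for the duration of the block, then always unmount it."""
    runner(["mount", "-t", "nfs", export, mount_point])   # pre-command
    try:
        yield mount_point
    finally:
        runner(["umount", mount_point])                   # post-command


def backup_catalog(catalog_cmd, runner=run):
    """Mount the share, run the catalog backup command against it, unmount."""
    with mounted_share(runner=runner) as target:
        # catalog_cmd is a placeholder for the vendor-specific command line;
        # the mount point is appended as its last argument.
        runner(list(catalog_cmd) + [target])
```

The `finally` clause is the important design choice here: the share is unmounted even when the catalog backup fails, so it does not stay attached to the master server between runs.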

Built-in ransomware protection mechanisms in backup software

Almost every backup product already includes controls over backup clients, for example, detecting abnormal changes in backup volumes or blocking changes to system files. We reviewed the most common current backup products in terms of the protection options they offer. Although many neglect these components, we recommend enabling and using them: they provide additional, and quite effective, layers of protection against attackers.

Cyberprotect Cyber Backup

  • The Active Protection module monitors the processes running on the protected server. When a third-party program process tries to encrypt files or mine cryptocurrency, Active Protection creates an alert and takes additional actions if specified in the settings.

    In addition, Active Protection prevents unauthorized changes to its own processes, registry entries, executable and configuration files, and backups located in local folders.

  • The Vulnerability Assessment module scans the computers protected by the backup system for vulnerabilities, checks whether operating systems and installed applications are up to date, and verifies that they work correctly. Vulnerability assessment scanning is supported only for Windows-based computers.

Features are available from version 16. By the way, Acronis CyberBackup has similar modules.

RuBackup (Astra Group)

  • Replication (export-import) of selected backups on a schedule to another independent RuBackup installation. Due to the complete independence of the second domain from the attacked infrastructure, the ransomware will not be able to damage the backups on it. The feature is available from version 2.1.

  • The digital signature of backups allows you to control the integrity and authenticity of the backups.

Commvault B&R
  • A separate Enable Ransomware Protection application from the Commvault store. It allows you to manage and monitor the operation of the File Anomaly Activity alert and Ransomware Protection options on all media agents within CommCell.

    The Ransomware Protection functionality ensures that any processes not related to Commvault are prohibited from modifying local backup files, including those on file shares (this requires detailed access rights configuration on the share).

    It also becomes possible to receive reports on anomalous activity. Besides analyzing typical file activity on clients, you can place a honeypot (canary) file on a client. If the file gets encrypted, CommCell sends a notification.

    Additionally, an alert can be set up for when a backup client becomes unavailable.

  • The system automatically analyzes historical data on its own operation, deduplication database, events, backup job operations, and also looks for anomalies.

General protection recommendations are available on the documentation website in the Ransomware Protection section (relevant for version 11.20, for newer releases expert consultation is needed).
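The canary-file idea mentioned above is not specific to any product. As a generic illustration (not Commvault's implementation), here is a minimal Python sketch that records a file's SHA-256 digest when the canary is planted and later reports whether the file is intact, modified, or missing. The function names are assumptions for this example.

```python
import hashlib
from pathlib import Path


def fingerprint(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def canary_status(path, expected_digest):
    """Compare a canary file against the digest recorded when it was planted."""
    p = Path(path)
    if not p.exists():
        return "missing"      # deleted or renamed, e.g. given a new extension
    if fingerprint(p) != expected_digest:
        return "modified"     # contents changed, possibly encrypted
    return "ok"
```

Any status other than "ok" would be a reason to raise an alert; how the alert is delivered is left to the monitoring system in use.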

Veritas NetBackup
  • The AIR (Auto Image Replication) functionality replicates backup data to another NetBackup installation (domain). This is implemented using special SLP (Storage Lifecycle Policy) policies and an import command in the domain receiving the backups. Because the second domain is completely independent of the attacked infrastructure, the ransomware cannot damage the backups stored in it. We have designed and implemented this technology for a number of our customers.

  • The use of specialized NetBackup Appliance also increases the security of the backup and recovery environment, taking into account the use of the specialized OST (OpenStorage Technology) data transfer protocol. It also provides the ability to store in an immutable format based on the WORM principle.

  • Built-in anomaly activity analytics using ML tools and historical data analysis on backup jobs and other Veritas Anomaly Detection metrics. Works starting from version 9.1.

Veeam Backup & Replication
  • Using Immutable storage, as well as settings on the server with Hardened Repository disks. This is a set of settings that enhance storage security on a Veeam server based on Linux, including restricting access to modify backup data.

  • Integration with the HPE StoreOnce deduplicator creates secure interaction with dual accounts and allows for immutable storage without additional complexities (e.g., setting the inability to change within seven days of placement).

  • Auxiliary tool — Veeam ONE monitoring system (a separate product, part of the Availability Suite, along with B&R). Without it, effective monitoring of Veeam B&R is impossible.

Vinchin Backup & Recovery

In Vinchin, the main tool is protection against external changes to the backup storage (works with built-in disks or block devices with external storage systems) connected to Vinchin servers. The option is called Storage Security.

| Function | CyberBackup | RuBackup | Veritas NetBackup | Veeam | Commvault |
|---|---|---|---|---|---|
| Snapshots on the production array | Yes. There is integration with Huawei Dorado, planned with YADRO. | Yes. Planned with YADRO. | Yes | Yes | Yes |
| Immutable storage | Planned for 2025 | Planned for 2025 | Yes | Yes | Yes |
| Integration via S3 protocol (object lock) | Planned for 2025 | Planned for 2025 | Yes | Yes | Yes |
| Support for GOST encryption | Planned for 2025 | Yes | No | No | No |

Protection at the level of backup storage devices

Option 1. Storage systems with block data access

One protective measure to consider is regularly creating snapshots on production arrays and replicating them to a dedicated storage system for backups.

A snapshot on a production storage system is not a backup in itself, but a convenient way to take a copy quickly (important: the data in the snapshot must be consistent!) and to recover quickly. However, a consistent snapshot replicated to a second storage system intended for backups can be considered a full backup. The backup software is responsible for data consistency when snapshots are created, and for rotating them.

Keep in mind the risk of uncontrolled growth in snapshot volume, which can affect performance (depending on the technology used: Copy-on-Write or Redirect-on-Write) and significantly increase storage costs (for example, if snapshots require a separate license). The main advantage is the ability to recover data in a fairly short time.

Enterprise backup software (Veeam, Commvault) integrates successfully with production disk arrays (for example, Huawei, NetApp) to create snapshots that can later be used for operational or long-term storage (as the first storage tier).

Snapshots can also be created without backup software, using only storage system tools, thanks to the Continuous Data Protection (CDP) technology in disk arrays from leading manufacturers (Huawei, Dell, NetApp, Pure). The technology creates a large number of snapshots at a set interval (down to minutes) and rotates them. However, in this case none of the snapshots will be application-consistent, and recovering from them is similar to powering on a server after an unexpected power outage. Domestic (Russian) manufacturers do not have CDP technology, but some players (Yadro Gen2, Aerodisk Engine, Baum) can create snapshots via the command line or an API.

It is possible to use delayed asynchronous replication, lagging behind the production data, for example, by X hours/days. However, this feature is not available from all disk array manufacturers.

Option 2. Storage systems with file access to data

External file systems with write protection for a specified period (Retention lock) can significantly complicate life for attackers, especially when they try to delete or corrupt backups. Immutable storage technologies protect backup data from changes after it has been written.

In this case, support is required from both the storage system (NetApp SnapLock, Huawei Hyperlock) and the backup software. Otherwise, errors may occur in the operation of the backup software, as the management components (master server, media servers) will not be able to work correctly with write-protected backup files. The simplest option is to use file systems in WORM mode only for performing full backups.

| Function/Manufacturer | YADRO | Aerodisk | Baum | Huawei | NetApp / Lenovo | DellEMC |
|---|---|---|---|---|---|---|
| Snapshots | Yes | Yes | Yes | Yes | Yes | Yes |
| CDP | No. Only through API and CLI. | No. Only through API and CLI. | No | Yes | Yes | Yes |
| Immutable storage | N/A | N/A | N/A | Yes | Yes | Yes |

Option 3. Specialized appliances (deduplicators)

In the context of protection against ransomware, deduplicators are mainly a special case of "storage systems with file access to data".

Immutable storage options depend on the specific vendor and include dedicated hardware, the Purpose-Built Backup Appliance (PBBA): NetBackup Flex Appliance, StoreOnce, DataDomain, Quantum DXi, and Tatlin Backup (functionality in development).

Using proprietary protocols (DDBoost, Catalyst) between the backup software and the PBBA makes it significantly harder to compromise backups. The backups are not visibly present in the operating system, because these protocols use non-standard methods of interaction between the backup software and the storage devices.

For example, in HPE StoreOnce this is implemented through StoreOnce Catalyst Store and an API. StoreOnce Catalyst Store does not use standard operating system commands and instructions to interact with the client or the backup software. Access goes through a set of API commands integrated directly into the backup application's media server, which includes the StoreOnce Catalyst client library (running as part of the plugin). When StoreOnce Catalyst is used, the stored data is not accessible from the OS of the management server (or media server); it is visible only from the backup software console or the deduplicator's web interface.

The main difference of these PBBAs is that they do not use the same authentication methods or, more importantly, the same set of instructions as other file share technologies (CIFS, NFS, SMB) that rely on OS tools. Storage devices connected through PBBA mechanisms are not accessible from the OS without the appropriate APIs.

| Function | YADRO Tatlin Backup | HPE StoreOnce | Netbackup Flex Appliance | DellEMC DataDomain |
|---|---|---|---|---|
| Proprietary protocol | Yes. Implementation of T-Boost over FC is planned. | Yes | Yes | Yes |
| Source deduplication | Yes. Implementation of T-Boost over FC is planned. | Yes | Yes | Yes |
| Global deduplication | Yes | Yes | Yes | Yes |
| Ability to connect to backup software via file protocols | Yes | Yes | Yes | Yes |
| Immutable storage (Retention lock) | Planned for 2025 | Yes | Yes | Yes |

Option 4. Tape libraries with LTO cartridges

Tape media are the best protected against ransomware attacks. Even if the backup system's master server is completely lost, backups on tapes can be imported; as a rule, all backup software can do this. However, it takes a lot of time, since each tape has to be re-read. The time required depends on the volume, type, and speed of the tape. Earlier we recommended making a backup copy of the backup software's catalog: this significantly reduces the time needed to rebuild the backup system if it is completely destroyed during an attack.
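To get a feel for how long re-reading tapes can take, here is a back-of-the-envelope Python sketch. It assumes ideal streaming at a constant drive speed and ignores tape loading, positioning, and catalog processing, so real imports will be slower. The example figures (12 TB native capacity and 360 MB/s native speed, the nominal values for LTO-8) and the function name are assumptions chosen for illustration.

```python
def reread_hours(data_tb, drive_mb_per_s, drives=1):
    """Hours needed to stream data_tb terabytes through `drives` tape drives in parallel."""
    total_mb = data_tb * 1_000_000                    # decimal units: 1 TB = 1,000,000 MB
    seconds = total_mb / (drive_mb_per_s * drives)
    return seconds / 3600


if __name__ == "__main__":
    # One full 12 TB cartridge on one drive at 360 MB/s: about 9.3 hours.
    print(round(reread_hours(12, 360), 1))
    # One hundred such cartridges (1200 TB) on four drives: about 231.5 hours.
    print(round(reread_hours(1200, 360, drives=4), 1))
```

Even under these optimistic assumptions, importing a large tape pool without a catalog takes days, which is why a protected catalog backup matters so much.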

For the most critical systems, it is necessary to make duplicate backups on separate tape media, remove them from tape libraries, and store them outside the data center in a fireproof safe. In this case, even if an attacker gains access to the backup system and tries to delete data from the tapes in the library, it will be more difficult to reach the tapes in the offsite storage. However, this does not rule out the possibility of a targeted long-term attack using an insider (a company employee with access to the backup system and the ability to conduct covert sabotage).

The use of WORM tapes is advisable for backups with a fixed long-term retention period as required by the regulator (3-5-10 years of protection against intentional data modification by administrators), or in cases where access to the tape library is physically restricted. Otherwise, this is an expensive method, as WORM tapes are not rewritable. The ability to write once prevents accidental or intentional data deletion, for example, in the case of ransomware attacks or human error.

WORM cartridges are almost indistinguishable from RW cartridges of the same generation, except that the chip (Linear Tape-Open Cartridge Memory, LTO-CM) identifies it as WORM. There are also minor changes to the servo tracks needed to verify that the data on the tape has not been altered. The bottom part of the cartridge is usually gray, and it may be equipped with tamper-proof screws. Drives that support WORM mode automatically recognize WORM cartridges and include a unique identifier (WORM ID) in each data set written to the tape.

One alternative is to write-protect filled tapes in the library. Ideally, at the end of each working day, the on-duty backup administrator goes to the library, sets the write-protect switch on the Full tapes, and clears it on the tapes whose retention has expired. An information security officer or the service organization checks compliance with this procedure at unpredictable times.

To simplify working with a large number of tapes, you can use cartridges with barcodes. Before starting work, check that the barcodes are not damaged and that the barcode reader is ready. If you have multiple libraries, barcodes must not be duplicated across them.

Option 5. Storage with object access to data (S3), cloud storage

For storing backups, you can use S3 storage, which is available in several options:

  • Public cloud (Mail.ru, Yandex Cloud, and others);

  • On-premise solution (based on open-source Ceph or MinIO, or in the form of proprietary solutions such as Hitachi Content Platform, NetApp StorageGrid, Tatlin Object, and so on).

The backup software should support and manage the main functions (methods) of the S3 protocol. In the standard S3 protocol, this means the Object Lock mechanism with Compliance mode, Governance mode, and Legal Holds. If the backup software does not support integration with the S3 protocol, the lock has to be prepared and configured in advance at the level of the bucket where the backup data will be placed.

  1. Governance (least strict, managed lock). A user with object upload rights can set a lock. A user managing Object Storage can bypass the lock (delete or overwrite the object version), change the lock period, or remove it. These actions need to be explicitly confirmed: for example, when making a request via an Amazon S3-compatible REST API, using the X-Amz-Bypass-Governance-Retention: true header.

  2. Compliance (most strict, strict lock). A user with object upload rights can set a lock. A user managing Object Storage can only extend the lock. The lock cannot be bypassed, shortened, or removed before it expires.

  3. Legal hold (indefinite lock). A user with object upload rights can set and remove the lock. The lock cannot be bypassed.

It is important to remember that with Object Lock in use, the backup software will not be able to delete copies either, including rotating them according to retention periods. Therefore, where to place backup copies that must not change needs to be decided at the design stage.
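For illustration, here is a minimal Python sketch of uploading an object with a retention lock through the S3 API. It assumes the third-party boto3 library, a bucket that was created with Object Lock enabled, and hypothetical helper names (`locked_put_args`, `upload_locked`). The Content-MD5 value is included because S3 expects an integrity header on uploads that carry Object Lock parameters; S3-compatible storage from other vendors may behave differently.

```python
import base64
import hashlib
from datetime import datetime, timedelta, timezone


def locked_put_args(bucket, key, body, retention_days, mode="COMPLIANCE", now=None):
    """Build keyword arguments for an S3 PutObject call with Object Lock retention."""
    if mode not in ("GOVERNANCE", "COMPLIANCE"):
        raise ValueError("mode must be GOVERNANCE or COMPLIANCE")
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,  # bytes
        "ContentMD5": base64.b64encode(hashlib.md5(body).digest()).decode("ascii"),
        "ObjectLockMode": mode,
        "ObjectLockRetainUntilDate": now + timedelta(days=retention_days),
    }


def upload_locked(bucket, key, body, retention_days, mode="COMPLIANCE"):
    """Upload the object with boto3 (imported here so the helper above needs no extras)."""
    import boto3  # third-party dependency, assumed to be installed and configured
    s3 = boto3.client("s3")
    return s3.put_object(**locked_put_args(bucket, key, body, retention_days, mode))
```

With Compliance mode, an object uploaded this way cannot be deleted or overwritten until the retain-until date passes, so the retention period should match the rotation scheme planned for these copies.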

An on-premise solution will require additional investments in its design, implementation, operation, and support, and may not be a very reliable option, as it will be an additional point of failure in terms of potential access to the cluster nodes that implement the S3 protocol itself (remote access, OS, and FS).

From this point of view, the public cloud looks more reliable, as it provides access only to storage devices (buckets) using an access key/secret key pair. That is, it excludes access to the servers on which the S3 storage is organized, using standard tools and protocols. On the other hand, depending on the services provided by the providers, it is important to remember that this will incur additional costs. Additionally, the following may be charged:

  • organization of communication channels from the customer's infrastructure to the provider;

  • volume required for storing backups;

  • number of input-output operations;

  • volume of outgoing traffic.

Processes

All data backup and recovery processes must be described in the backup system documentation and agreed with management and the owners of the information systems. The documented processes should be reviewed and updated every six months, regardless of the company's size. The approved documents must include:

  • A unified data backup and recovery regulation;

  • Recovery plans for each information system (including the backup system itself);

  • Backup system update regulation;

  • Regulation for working with removable copies.

To ensure efficiency, all work must be carried out through change requests (RFC) that name the responsible persons.

To prepare these documents, it is necessary to audit all information systems and define the RTO and RPO requirements for each one. These requirements drive the choice of data protection method (a backup system can protect against logical errors, but cannot always deliver the required recovery speed for large volumes). Retention periods also need to be determined, based on internal regulations or regulator recommendations. To choose the backup scheme and judge whether disaster recovery or application-level clustering is worthwhile, the cost and criticality of each information system's unavailability is usually assessed (including through BIA).

Test Recoveries

To make sure the backup system and DR work correctly, it is useful to run regular drills that restore information systems from backups in an environment isolated as far as possible from production.

This helps to understand several important things:

  1. How long it will take to recover data in an emergency;

  2. Where the bottlenecks in the recovery process are (the next step is to fix them);

  3. How correctly the backup is made (for example, whether the restored database will be in a consistent state);

  4. Whether all the data a specific information system needs is backed up and can be restored (checking backup completeness from the application's point of view);

  5. Whether the backup system's own data (the catalog) can be restored using its backup methods.

Monitoring

A unified monitoring and alerting system is an important tool that allows you to respond promptly to any incidents in the IT infrastructure, as well as proactively monitor the state of the backup system itself and its individual components.

Key employees (ideally responsible groups of people) should be promptly notified of unsuccessful backup attempts and any anomalies by all possible means (email, SMS, Telegram bots, duty engineer calls, etc.).

At a minimum, it is necessary to fully use the monitoring system built into the backup software and configure monitoring of its hardware components (media servers, storage devices).

What can cause suspicion?

  • Abnormally large volumes of incremental/differential copies;

  • Uncharacteristic load on files/LUN (for example, the number of sequential reads is increasing);

  • Manual deletion of backup files;

  • Unauthorized access of personnel to the backup system.
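The first sign in the list, an abnormally large incremental copy, is easy to check automatically. Below is a minimal Python sketch that compares the latest incremental backup with the median size of previous runs. The threshold factor of 3 and the function name are assumptions for this example; built-in anomaly detection in backup products uses its own, more elaborate logic.

```python
from statistics import median


def is_abnormal_increment(history_gb, latest_gb, factor=3.0):
    """Flag the latest incremental if it exceeds `factor` times the median of past runs."""
    if not history_gb:
        return False          # nothing to compare against yet
    return latest_gb > factor * median(history_gb)


if __name__ == "__main__":
    history = [40, 42, 38, 45, 41]                  # sizes of recent incrementals, GB
    print(is_abnormal_increment(history, 44))       # False: within the usual range
    print(is_abnormal_increment(history, 400))      # True: worth investigating
```

The median is used instead of the mean so that a single unusually large past backup does not raise the baseline and hide the next anomaly.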

Personnel

An important component for countering any attacks is the training and awareness of personnel. And this is usually one of the most difficult tasks.

What can help?

  • Regular training of employees on cybersecurity issues. Another important topic is the recognition of phishing attacks and other methods of spreading malware.

  • Approval of a role model for access to the backup software and its components.

  • Appointment of persons responsible for data recovery and/or clearly defined areas of responsibility (for example, for critical and large databases, recovery requires both the backup administrator and the DBA).

  • Regular test data recoveries from backups.

Backup System Information Security

Access to resources

To limit attackers' access to important company resources, you can follow key rules:

  1. The most vulnerable to ransomware are disk storage devices accessible to the OS as file systems, i.e., created on internal server disks; disks connected from external disk arrays as block devices or as network disks connected via CIFS/NFS file protocols. It is preferable to use Fibre Channel rather than Ethernet as a storage area network (SAN) as it is less vulnerable to external intrusions.

  2. To avoid losing access to the entire infrastructure (e.g., if an Active Directory account is compromised), it is recommended to use local accounts with strict password policies and regular password changes in the management, backup, and SAN segments.

  3. It is important to ensure physical security where the backup system's hardware components are located (e.g., through access control systems, video surveillance, etc.).

  4. Install OS patches promptly in both the production infrastructure and the backup infrastructure. This is especially true for updates that close zero-day vulnerabilities.

  5. Regularly update the backup system software and apply all available security patches.

  6. Monitor security updates and respond promptly to vulnerabilities.

  7. Use firewalls to restrict network traffic between segments. Routing the backup traffic itself through firewalls is not recommended, because of the impact on production and the limited throughput of firewalls.

  8. Use the minimum necessary set of ports (preferably non-standard) for the backup system and the interaction of its components.

  9. Ensure backup of configuration files of network devices (FC, Ethernet).

For example, in large SAN networks built on Brocade and/or Cisco MDS FC switches, restoring zoning, which lets media servers and storage devices talk to each other, can be very labor-intensive and will increase recovery time. Since zoning is part of the configuration file, it is recommended to back it up after any change, and at least once a month. Both manufacturers have built-in functionality for saving the configuration. For Brocade, this is the configupload command, which can use the scp, ftp, and sftp protocols. For Cisco MDS, this is the copy command: copy running-config startup-config saves the configuration on the switch, and the same copy command can send it to a remote server over FTP, TFTP, SFTP, or SCP.

  10. Restrict access to the administrative console of the backup software and its components so that it is possible only from a dedicated terminal server with a hardware USB token.

  11. Ensure access control and authentication by restricting access rights to the backup system and using multi-factor authentication (MFA).
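The "at least once a month" recommendation for switch configuration backups can be monitored with a few lines of Python. This sketch assumes the saved configuration files land in one directory, with one regularly refreshed file per switch, and that the file modification time reflects when the backup was taken; the 31-day limit and the function name are assumptions for this example.

```python
import time
from pathlib import Path

MAX_AGE_DAYS = 31   # "at least once a month"


def stale_config_backups(backup_dir, max_age_days=MAX_AGE_DAYS, now=None):
    """Return the names of config backup files older than max_age_days, oldest first."""
    now = time.time() if now is None else now
    limit = max_age_days * 86400                      # seconds in the allowed period
    stale = [
        p for p in Path(backup_dir).iterdir()
        if p.is_file() and now - p.stat().st_mtime > limit
    ]
    return [p.name for p in sorted(stale, key=lambda p: p.stat().st_mtime)]
```

A non-empty result means that some switch configuration has not been saved within the allowed period and should be uploaded again.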

Conclusions

The backup and data recovery system is the company's last line of defense: it must ensure recovery if the main production data is damaged. The most reliable way to recover after a ransomware intrusion is to follow the "3-2-1-1" rule.

Some of the protective measures we discussed are aimed at faster and less painful recovery after intentional data corruption. When choosing specific methods, consider both the budget for implementing them and how well staffed the IT team is.

In this post, we tried to describe the full range of issues and solutions to consider when designing a backup system that truly protects data from encryption and allows recovery after a breach. The more measures you apply, the better protected your backups will be. Some of the proposed methods do require significant investment if you have not used such solutions before, such as switching to tape libraries, separate storage systems, or isolated backup environments. But other measures come down to properly configuring the backup system you already use, establishing backup processes, and, in general, ensuring information security within your perimeter.

Choose what suits you, keep your feet warm, your head cool, and your backups safe!

The post was prepared by the storage and backup systems team of the Infrastructure Solutions Center at "Infosystems Jet"

With the participation of: Andrey Yankin, Director of the Information Security Center "Infosystems Jet"
