Setting the Right Recovery Point Objective: The Art of Balancing Costs and Risks
Payal Wadhwa
Oct 30, 2024
Today, CISOs and founders understand that an employee’s accidentally deleted file, a power outage, or a disaster leading to data loss is no longer just a ‘technical challenge’ but a ‘business problem’ that impacts revenue, weakens compliance posture, and erodes trust. As a result, integrating disaster recovery plans into a cohesive resilience strategy is paramount, and a critical metric in that strategy is the Recovery Point Objective, or RPO.
Recovery Point Objective (RPO) answers the question ‘How much data can we afford to lose?’ It sets a threshold for acceptable data loss, guided by recovery costs, required system performance, and the organization’s risk tolerance.
However, getting the RPO right is far from straightforward. An overly strict RPO (very little tolerated data loss) can be costly, while an overly lenient one can be disastrous.
In this blog, we demystify RPO, providing calculations and examples to help you set the right metric and integrate it into your disaster recovery and business continuity plan.
TL;DR:
- RPOs help determine how much data an organization can lose without impacting operations. For businesses with critical operations, such as payment processing, this tolerance can be near-zero.
- RPOs dictate an organization’s backup and recovery strategies. For example, a zero RPO requires continuous data replication, while non-critical data such as HR records may tolerate an RPO of 13-24 hours and be backed up less frequently.
- RPO is different from RTO (Recovery Time Objective), which answers the question ‘How fast can an organization recover after an incident?’
What is the Recovery Point Objective?
The Recovery Point Objective (RPO) is the maximum acceptable data loss, measured in time, that an organization can afford during an unexpected event such as a system failure or natural disaster. The concept is relevant to disaster recovery and business continuity planning and helps make decisions about the frequency of data backups to minimize data loss.
For example, if the RPO is set to 4 hours, losing up to 4 hours of data is acceptable, and the backup systems must ensure that no more than that is lost in an incident.
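To make the 4-hour example concrete, here is a minimal Python sketch that checks whether a backup schedule can meet an RPO. It assumes periodic point-in-time backups, so the worst case is an incident just before the next backup finishes: the data at risk is roughly the backup interval plus the time a backup takes to complete. The helper names are hypothetical, for illustration only.

```python
from datetime import timedelta

def worst_case_data_loss(interval: timedelta,
                         backup_duration: timedelta = timedelta(0)) -> timedelta:
    """Worst-case data loss for periodic point-in-time backups (hypothetical helper).

    If a backup starts every `interval` and takes `backup_duration` to finish,
    an incident just before the next backup finishes loses everything written
    since the previous backup started.
    """
    return interval + backup_duration

def meets_rpo(rpo: timedelta, interval: timedelta,
              backup_duration: timedelta = timedelta(0)) -> bool:
    """True if this schedule's worst-case data loss stays within the RPO."""
    return worst_case_data_loss(interval, backup_duration) <= rpo

rpo = timedelta(hours=4)
# Instantaneous backups every 4 hours just meet a 4-hour RPO.
print(meets_rpo(rpo, interval=timedelta(hours=4)))
# A 30-minute backup job on the same schedule pushes the worst case past 4 hours.
print(meets_rpo(rpo, interval=timedelta(hours=4), backup_duration=timedelta(minutes=30)))
# Starting backups every 3 hours restores the margin.
print(meets_rpo(rpo, interval=timedelta(hours=3), backup_duration=timedelta(minutes=30)))
```

In other words, the backup interval usually has to be somewhat shorter than the RPO itself, by at least the time a backup takes to complete.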
How does RPO work?
RPO works alongside RTO (Recovery Time Objective). RPO sets the maximum amount of data that can be lost without impacting business operations, while RTO defines how fast an organization must recover after a data loss incident. Together, these targets help decide how frequently data is backed up.
Critical systems have a lower RPO threshold than less critical ones and typically require continuous data replication or real-time backups. These systems usually also need to be available almost immediately, so their recovery time should be near-zero as well. Less critical data may be backed up anywhere from every 13 to every 24 hours.
Overall, RPO helps define the data backup strategies for different systems, prepares the organization for incidents, and ensures minimum data loss.
How do you calculate RPO?
RPO is measured as the time between the most recent usable backup and the moment an incident occurs; the RPO target is the longest such gap the organization is willing to accept. For less critical systems, it is typically expressed in hours or days, whereas for critical systems it can be measured in minutes or seconds.
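As a minimal sketch of that measurement, the Python snippet below computes the data-loss window for a single incident and compares it with a 4-hour target. The timestamps and the `data_loss_window` helper are hypothetical, chosen only for illustration; "usable backup" here means the latest backup that completed and can actually be restored.

```python
from datetime import datetime, timedelta, timezone

def data_loss_window(last_backup: datetime, incident: datetime) -> timedelta:
    """Time between the most recent usable backup and the incident (hypothetical helper)."""
    if incident < last_backup:
        raise ValueError("incident must not precede the last backup")
    return incident - last_backup

# Illustrative timestamps, not real data.
last_backup = datetime(2024, 10, 30, 8, 0, tzinfo=timezone.utc)
incident = datetime(2024, 10, 30, 10, 45, tzinfo=timezone.utc)

window = data_loss_window(last_backup, incident)
target = timedelta(hours=4)

print(window)            # 2:45:00
print(window <= target)  # True: within a 4-hour RPO
```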
In practice, the right RPO depends on the tolerance for data loss, the impact of downtime, and recovery capabilities. Other factors that affect RPO include, but are not limited to:
- Criticality of business operations: In an e-commerce business, for example, every transaction counts, so transaction data typically requires more frequent backups.
- Dynamic vs. static data: Frequently updated data generally requires a shorter RPO than data that rarely changes.
- Cost of downtime: If downtime results in significant revenue loss, as it might when customer records are unavailable, the acceptable time window needs to be shorter.
- Compliance requirements: Critical data, such as cardholder information, has specific protection requirements that affect its RPO.
- Recovery capabilities: The speed at which the organization can back up and restore data directly impacts the achievable RPO.
- Cost of recovery solutions: More frequent backups are more cost-intensive, so the achievable RPO can vary based on available resources.
Ask yourself the following questions to help with the calculations:
- How much data is projected to be lost after an incident?
- How much data can we lose without financial or reputation repercussions?
- What would the business impact be of losing different amounts of data?
- What is the current frequency of backups?
- How long does recovery take after an outage or service disruption?
You will also need to perform some downtime calculations and analysis to set the right RPO.
Here’s an example of a downtime cost calculation based on lost employee productivity:
Average salary = $82000
Work hours in a year = 2080
Number of employees impacted = 80
Average hourly rate = $39.4 (salary/work hours)
The cost of downtime for 1 hour will be $3152 (number of employees x average hourly rate)
Similarly, for 4 hours it will be $12608.
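The same arithmetic can be wrapped in a small, reusable function so you can try different headcounts and outage lengths. This is a sketch of the productivity-only model used above (the `downtime_cost` name is hypothetical); it does not include lost revenue, penalties, or recovery costs. Because it does not round the hourly rate to $39.4 first, its results come out slightly higher than the figures above.

```python
def downtime_cost(avg_salary: float, work_hours_per_year: float,
                  employees: int, hours_down: float) -> float:
    """Lost-productivity cost of an outage: hourly rate x employees x hours (hypothetical helper)."""
    hourly_rate = avg_salary / work_hours_per_year
    return hourly_rate * employees * hours_down

for hours in (1, 4):
    cost = downtime_cost(82_000, 2_080, 80, hours)
    print(f"{hours}h: ${cost:,.2f}")
# 1h: $3,153.85
# 4h: $12,615.38
```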
Once the analysis is complete, you can group data into RPO tiers based on loss tolerance, data criticality, and other factors:
- 0-1 hour: For critical data that the organization cannot afford to lose because of revenue reasons, difficulty in recreating records, costs involved, or other factors. For example, online banking transactions.
- 1-4 hours: This time frame suits semi-critical data, such as customer support tickets or team collaboration files.
- 4-12 hours: It covers less critical data, such as social media engagement metrics or employee performance reviews, that does not require real-time availability.
- 13-24 hours: For data that is not important for immediate operations, such as historical sales data, purchase orders, meeting notes from previous sessions, etc.
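The tiers above can be encoded as a simple lookup. The sketch below is a minimal illustration, assuming each upper bound is inclusive; because the list leaves a gap between 12 and 13 hours, the last tier here simply covers everything above 12 and up to 24 hours. The tier labels and the `rpo_tier` helper are hypothetical.

```python
from datetime import timedelta

# Upper bound (inclusive) and label for each tier in the list above.
# Assumption: the last tier covers everything above 12 and up to 24 hours.
RPO_TIERS = [
    (timedelta(hours=1), "critical"),
    (timedelta(hours=4), "semi-critical"),
    (timedelta(hours=12), "less critical"),
    (timedelta(hours=24), "not needed for immediate operations"),
]

def rpo_tier(rpo: timedelta) -> str:
    """Return the label of the first tier whose upper bound covers the RPO (hypothetical helper)."""
    if rpo < timedelta(0):
        raise ValueError("RPO cannot be negative")
    for upper_bound, label in RPO_TIERS:
        if rpo <= upper_bound:
            return label
    raise ValueError("RPOs above 24 hours fall outside these tiers")

for rpo in (timedelta(minutes=15), timedelta(hours=6), timedelta(hours=18)):
    print(rpo, "->", rpo_tier(rpo))
```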
Examples of RPOs
Let’s understand RPO for different systems based on real-life examples.
Example 1: A bank or payment processor subject to PCI DSS. Such an organization might set the following RPOs:
- Near-zero RPO for payment card transaction systems with debit/credit transactions and settlement data.
- One-minute RPO for cardholder data storage and encryption systems holding data such as card numbers and expiration dates
- One-hour RPO for customer-facing service systems that have account balances
- Four-hour RPO for CRM that manages customer interactions and contains communication history
- 12–24-hour RPO for historical transactions stored in archival systems
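The bank example can also be expressed as data. The sketch below converts each RPO into the minimum number of evenly spaced point-in-time backups per day, which shows why the shortest RPOs are met with continuous replication rather than scheduled backups, and why cost rises as the RPO shrinks. The system labels are shortened, "near-zero" is modelled as a zero RPO, and the 12-24 hour range uses its stricter 12-hour end; these choices and the `backups_per_day` helper are assumptions for illustration, and backup duration is ignored.

```python
import math
from datetime import timedelta

# Hypothetical RPO targets mirroring the bank example above.
BANK_RPO_TARGETS = {
    "payment transactions and settlement": timedelta(0),  # near-zero
    "cardholder data storage and encryption": timedelta(minutes=1),
    "customer-facing account services": timedelta(hours=1),
    "CRM": timedelta(hours=4),
    "transaction archive": timedelta(hours=12),  # stricter end of 12-24 hours
}

DAY = timedelta(days=1)

def backups_per_day(rpo: timedelta) -> float:
    """Minimum evenly spaced backups per day so the gap between backups stays within the RPO.

    A zero RPO cannot be met by periodic backups at all, so it returns
    infinity, meaning continuous replication is needed.
    """
    if rpo <= timedelta(0):
        return math.inf
    return math.ceil(DAY / rpo)

for system, rpo in BANK_RPO_TARGETS.items():
    n = backups_per_day(rpo)
    plan = "continuous replication" if math.isinf(n) else f"at least {n} backups/day"
    print(f"{system:40} RPO {rpo} -> {plan}")
```

A one-minute RPO already implies 1,440 backups a day, which in practice points toward replication as well.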