Disaster Recovery of Workloads on AWS: Recovery in the Cloud
AWS Well-Architected Framework

Copyright Amazon Web Services, Inc. and/or its affiliates. All rights reserved.

Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

Abstract
Introduction
  Disaster recovery and availability
  Are you Well-Architected?
Shared Responsibility Model for Resiliency
  AWS responsibility "Resiliency of the Cloud"
  Customer responsibility "Resiliency in the Cloud"
What is a disaster?
High availability is not disaster recovery
Business Continuity Plan (BCP)
  Business impact analysis and risk assessment
  Recovery objectives (RTO and RPO)
Disaster recovery is different in the cloud
  Single AWS Region
  Multiple AWS Regions
Disaster recovery options in the cloud
  Backup and restore
    AWS services
  Pilot light
    AWS services
    AWS Elastic Disaster Recovery
  Warm standby
    AWS services
  Multi-site active/active
    AWS services
Detection
Testing disaster recovery
Conclusion
Contributors
Further reading
Document history
Notices
AWS glossary

Publication date: February 12, 2021 (see Document history)

Disaster recovery is the process of preparing for and recovering from a disaster. An event that prevents a workload or system from fulfilling its business objectives in its primary deployed location is considered a disaster. This paper outlines the best practices for planning and testing disaster recovery for any workload deployed to AWS, and offers different approaches to mitigate risks and meet the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) for that workload.

Introduction

Your workload must perform its intended function correctly and consistently.
To achieve this, you must architect for resiliency. Resiliency is the ability of a workload to recover from infrastructure, service, or application disruptions, dynamically acquire computing resources to meet demand, and mitigate disruptions, such as misconfigurations or transient network issues.

Disaster recovery (DR) is an important part of your resiliency strategy and concerns how your workload responds when a disaster strikes (a disaster is an event that causes a serious negative impact on your business). This response must be based on your organization's business objectives, which specify your workload's strategy for avoiding loss of data, known as the Recovery Point Objective (RPO), and for reducing downtime where your workload is not available for use, known as the Recovery Time Objective (RTO). You must therefore implement resilience in the design of your workloads in the cloud to meet your recovery objectives (RPO and RTO) for a given one-time disaster event. This approach helps your organization to maintain business continuity as part of Business Continuity Planning (BCP).

This paper focuses on how to plan for, design, and implement architectures on AWS that meet the disaster recovery objectives for your business. The information shared here is intended for those in technology roles, such as chief technology officers (CTOs), architects, developers, operations team members, and those tasked with assessing and mitigating risks.

Disaster recovery and availability

Disaster recovery can be compared to availability, which is another important component of your resiliency strategy. Whereas disaster recovery measures objectives for one-time events, availability objectives measure mean values over a period of time.
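
The contrast between per-event recovery objectives and averaged availability can be illustrated with a minimal Python sketch. It is not taken from the whitepaper: the class and function names are hypothetical, and the availability function assumes the common definition MTBF / (MTBF + MTTR). The first function evaluates a single disaster event against its RTO and RPO, while the second returns a mean value over a long observation period.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    """Business-defined targets for one disaster event (hypothetical type)."""
    rto: timedelta  # Recovery Time Objective: maximum acceptable downtime
    rpo: timedelta  # Recovery Point Objective: maximum acceptable data-loss window


@dataclass(frozen=True)
class DisasterEvent:
    """Measurements from one real or simulated disaster (hypothetical type)."""
    downtime: timedelta          # how long the workload was unavailable
    data_loss_window: timedelta  # time between the last recoverable data and the event


def meets_objectives(event: DisasterEvent, objectives: RecoveryObjectives) -> bool:
    """A one-time event passes only if it satisfies both RTO and RPO."""
    return (event.downtime <= objectives.rto
            and event.data_loss_window <= objectives.rpo)


def mean_availability(mtbf: timedelta, mttr: timedelta) -> float:
    """Mean availability over a period, assuming MTBF / (MTBF + MTTR)."""
    return mtbf / (mtbf + mttr)


if __name__ == "__main__":
    objectives = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=1))
    event = DisasterEvent(downtime=timedelta(hours=3),
                          data_loss_window=timedelta(minutes=90))
    print("Objectives met:", meets_objectives(event, objectives))
    print("Mean availability:",
          mean_availability(timedelta(hours=999), timedelta(hours=1)))
```

In this sketch a workload can report a high mean availability and still fail a specific disaster event, which is why the two kinds of objectives are tracked separately.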
Figure 1 - Resiliency Objectives

Availability is calculated using Mean Time Between Failures (MTBF) and Mean Time to Recover (MTTR):

    Availability = MTBF / (MTBF + MTTR)

This approach is often referred to as "nines", where a 99.9% availability target is referred to as "three nines".

For your workload, it may be easier to count successful and failed requests instead of using a time-based approach. In this case, the following calculation can be used:

    Availability = successful requests / (successful requests + failed requests)

Disaster recovery focuses on disaster events, whereas availability focuses on more common disruptions of smaller scale, such as component failures, network issues, software bugs, and load spikes. The objective of disaster recovery is business continuity, whereas availability concerns maximizing the time that a workload is available to perform its intended business functionality. Both should be part of your resiliency strategy.

Are you Well-Architected?

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud.
The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The concepts covered in this whitepaper expand on the best practices contained in the Reliability Pillar whitepaper, specifically question REL 13, "How do you plan for disaster recovery (DR)?". After implementing the practices in this whitepaper, be sure to review (or re-review) your workload using the AWS Well-Architected Tool.

Shared Responsibility Model for Resiliency

Resiliency is a shared responsibility between AWS and you, the customer.
It is important that you understand how disaster recovery and availability, as part of resiliency, operate under this shared model.

AWS responsibility "Resiliency of the Cloud"

AWS is responsible for resiliency of the infrastructure that runs all of the services offered in the AWS Cloud. This infrastructure comprises the hardware, software, networking, and facilities that run AWS Cloud services. AWS uses commercially reasonable efforts to make these AWS Cloud services available, ensuring service availability meets or exceeds AWS Service Level Agreements (SLAs).

The AWS Global Cloud Infrastructure is designed to enable customers to build highly resilient workload architectures. Each AWS Region is fully isolated and consists of multiple Availability Zones, which are physically isolated partitions of infrastructure. Availability Zones isolate faults that could impact workload resilience, preventing them from impacting other zones in the Region.
But at the same time, all zones in an AWS Region are interconnected with high-bandwidth, low-latency networking, over fully redundant, dedicated metro fiber providing high-throughput, low-latency networking between zones. All traffic between zones is encrypted. The network performance is sufficient to accomplish synchronous replication between zones. When an application is partitioned across Availability Zones (AZs), companies are better isolated and protected from issues such as power outages, lightning strikes, tornadoes, hurricanes, and more.

Customer responsibility "Resiliency in the Cloud"

Your responsibility will be determined by the AWS Cloud services that you select. This determines the amount of configuration work you must perform as part of your resiliency responsibilities. For example, a service such as Amazon Elastic Compute Cloud (Amazon EC2) requires the customer to perform all of the necessary resiliency configuration and management tasks.
Customers that deploy Amazon EC2 instances are responsible for deploying EC2 instances across multiple locations (such as AWS Availability Zones), implementing self-healing using services like AWS Auto Scaling, and using resilient workload architecture best practices for applications installed on the instances. For managed services, such as Amazon S3 and Amazon DynamoDB, AWS operates the infrastructure layer, the operating system, and platforms, and customers access the endpoints to store and retrieve data. You are responsible for managing resiliency of your data, including backup, versioning, and replication strategies.

Deploying your workload across multiple Availability Zones in an AWS Region is part of a high availability strategy designed to protect workloads by isolating issues to one Availability Zone and using the redundancy of the other Availability Zones to continue serving requests. A Multi-AZ architecture is also part of a DR strategy designed to make workloads better isolated and protected from issues such as power outages, lightning strikes, tornadoes, earthquakes, and more.
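
The effect of spreading instances across Availability Zones can be illustrated with a small, self-contained Python sketch. It is a planning aid only and makes several assumptions: it does not call any AWS API, the function names are hypothetical, the zone names in the usage example are illustrative, and placement is plain round-robin. In practice, a service such as AWS Auto Scaling would handle the actual placement.

```python
from itertools import cycle, islice


def spread_across_zones(instance_count: int, zones: list[str]) -> dict[str, int]:
    """Assign instances to zones round-robin (hypothetical helper).

    No zone ends up with more than one instance above any other zone.
    """
    if instance_count < 0:
        raise ValueError("instance_count must not be negative")
    if not zones:
        raise ValueError("at least one zone is required")
    placement = {zone: 0 for zone in zones}
    for zone in islice(cycle(zones), instance_count):
        placement[zone] += 1
    return placement


def surviving_capacity(placement: dict[str, int], failed_zone: str) -> float:
    """Fraction of instances still running if one zone becomes unavailable."""
    total = sum(placement.values())
    if total == 0:
        return 0.0
    return (total - placement.get(failed_zone, 0)) / total


if __name__ == "__main__":
    # Zone names below are examples only.
    zones = ["us-east-1a", "us-east-1b", "us-east-1c"]
    placement = spread_across_zones(6, zones)
    print(placement)
    print(surviving_capacity(placement, "us-east-1a"))
```

With an even spread over three zones, the loss of one zone leaves roughly two thirds of the capacity running, so a workload sized this way needs enough headroom in the remaining zones to keep serving requests.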