Terraform is an open-source infrastructure as code (IaC) tool used for building, changing, and managing infrastructure in a safe and efficient manner. It allows us to define and provision infrastructure resources, such as virtual machines, networks, storage, and more, using a declarative configuration language.
One of the tool's main benefits is its state management. Terraform maintains a state file that records the current state of your infrastructure. This file lets Terraform determine the differences between the desired and actual states and apply only the necessary changes, without recreating every resource from scratch.
In this article, we describe how to use Terraform to replicate the resources of a production environment to a disaster recovery (DR) site. We combine it with a backup strategy to cover the need to restore stateful components. The diagram below shows the design for a multi-region backup and restore DR strategy.
This approach relies on the codification of infrastructure and application configuration. The technical solution implemented here uses Terraform for the infrastructure codification and AMIs generated by Packer and Ansible for the application configuration. This supports the concept of immutable infrastructure for the stateless components: an instance is replaced with one built from a new AMI rather than modified in place.
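As a minimal sketch of how a Packer-built AMI could be consumed from Terraform, the configuration below looks up the most recent image owned by the current account and launches an instance from it. The name pattern `example-app-*`, the resource names, and the instance type are hypothetical placeholders, not values from this design.

```hcl
# Look up the newest AMI produced by the Packer/Ansible build.
# The name pattern is a hypothetical naming convention.
data "aws_ami" "app" {
  most_recent = true
  owners      = ["self"] # images owned by the current AWS account

  filter {
    name   = "name"
    values = ["example-app-*"]
  }
}

# Stateless application server launched from that image.
# When a newer AMI is published, the changed AMI ID causes Terraform
# to plan a replacement of the instance instead of an in-place change.
resource "aws_instance" "app" {
  ami           = data.aws_ami.app.id
  instance_type = "t3.micro" # assumed size, for illustration only
}
```

Because AMIs are regional, this lookup only finds an image in the DR region if the build also publishes or copies the AMI there.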
Terraform is then responsible for managing the infrastructure by deploying the desired state to the primary region and maintaining the latest state by persisting it in an S3 bucket.
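Persisting the state in S3 is typically done with Terraform's `s3` backend. The following is a sketch under assumed values: the bucket name, key, and region are hypothetical and would need to match your own environment.

```hcl
terraform {
  backend "s3" {
    bucket  = "example-terraform-state"         # hypothetical bucket name
    key     = "production/terraform.tfstate"    # hypothetical path of the state object
    region  = "us-east-1"                       # region where the bucket lives (assumed)
    encrypt = true                              # server-side encryption of the state file
  }
}
```

With this block in place, `terraform init` configures the backend and later runs read and write the state in the bucket instead of on local disk. For the DR scenario, the state bucket has to remain reachable when the primary region fails; how that is arranged is outside this sketch.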
Regular database backups are also persisted in an S3 bucket so that they are available to other regions. This is how we handle the replication of stateful components in a "backup and restore" recovery strategy.
If the primary region becomes unavailable, we can use Terraform to deploy the infrastructure to the DR region based on the latest state, and the database data can be recovered from the S3 buckets.
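One way to make the same configuration deployable to either region is to parameterize the provider. The sketch below assumes a hypothetical variable named `aws_region` and example region names; neither comes from the design described here.

```hcl
# Target region for the deployment; the default is the assumed primary region.
variable "aws_region" {
  type        = string
  description = "AWS region to deploy the infrastructure into"
  default     = "us-east-1" # assumed primary region
}

# All AWS resources in this configuration are created in the selected region.
provider "aws" {
  region = var.aws_region
}
```

During a failover, the DR deployment could then be started with `terraform apply -var="aws_region=us-west-2"`, where `us-west-2` stands in for the DR region. This sketch also assumes the DR deployment keeps its own state, for example in a separate workspace created with `terraform workspace new dr`, so that it does not overwrite the state that tracks the primary region's resources.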