Sep 3, 2025
Database Services: A Deep Dive in AWS Resources & Best Practices to Adopt
In a previous blog post in this series we went through Amazon S3 and learned about storing unstructured data in the form of blobs, or binary large objects. When it comes to structured data the primary service on AWS is the Relational Database Service (RDS). But AWS offers additional types of database services, most notably its NoSQL offering known as DynamoDB.
In this blog post we will go through a few different database offerings that AWS provides. We will learn about relational databases, NoSQL databases, and graph databases, and how to manage a few of these resources using infrastructure as code with Terraform.
What types of database services does AWS offer?
There are many different database offerings on AWS. These services can broadly be categorized into the type of database technology they belong to: relational, NoSQL, graph, time series, in-memory, ledger, data warehousing, and search.
In the space of relational database technologies AWS offers the Relational Database Service (RDS). RDS has managed offerings for MySQL, Db2, MariaDB, Microsoft SQL Server, Oracle, and PostgreSQL. There is another service in this space known as Amazon Aurora which is a fully managed relational database engine compatible with both MySQL and PostgreSQL. Aurora is an AWS proprietary technology. Aurora also comes as a serverless version for workloads with sporadic usage patterns.
There are fewer options in the NoSQL space. Depending on how you define NoSQL, a number of services could be included in this category. For this discussion we will consider the NoSQL offering to consist of DynamoDB, DocumentDB, and Keyspaces.
The main NoSQL service on AWS is DynamoDB. This is a global-scale database technology capable of handling enormous amounts of traffic. The simplicity and scale of DynamoDB have made it one of the most popular NoSQL database offerings on the market.
DocumentDB is AWS's document-oriented NoSQL offering. It provides MongoDB compatibility, which makes it a good option if you are migrating a MongoDB application over to AWS.
Amazon Keyspaces is a fully-managed service compatible with Apache Cassandra. This is a great option if you run Cassandra on-premises or in another cloud today and want to migrate your workload over to AWS without changing the application code or tooling you use to interact with the Cassandra database.
A major difference between designing data structures in SQL versus NoSQL databases is that SQL databases allow for ad-hoc queries that you didn't necessarily think of when you designed your tables and the data within them. NoSQL databases often require you to think about what queries you will run on your data, and then you design your tables from these access patterns. With that said, it is often possible to add additional indices after a NoSQL database has been created to allow for new access patterns. At worst, you can also redesign your tables and migrate the data between them to support new access patterns.
Another type of database technology that is common in social networking contexts is a graph database. In this space AWS has the Neptune service.
A graph database contains two major types of objects: vertices and edges. A vertex could be a user, an application, a city, or almost anything else. Edges connect vertices to each other. For instance, two user vertices in a social network might be connected with an edge of type "is connected to". In these databases you run queries to discover such connections; for example, you could find all connections of a given user by following all the "is connected to" edges out from the user in question.
For data warehousing there is a service called Amazon Redshift. This is a fully-managed data warehouse service capable of scaling up to petabytes of data. A data warehouse is not used as the primary database in an application serving live traffic; it is typically used for running analytics jobs on data combined from different sources. Redshift offers both a traditional provisioned model and a serverless model.
In-memory databases, or caches, on AWS include the ElastiCache service. This service is compatible with Valkey (a fork of Redis), Memcached, and Redis.
ElastiCache is often used in combination with a different database technology to allow for caching query responses to increase the performance of your applications. One pattern is to check the cache for a recent result of a given database query, and if it is there you can immediately return the response. If it is not there you would go and run the query in the database, return the response to the user and store the response in the cache. This is similar to how a content distribution network (CDN) operates with web content.
Last but not least there is a ledger database service offered by AWS: Amazon Quantum Ledger Database (QLDB). This is a fully managed service for immutable and cryptographically verifiable transaction logs.
Previously, AWS offered a time series database service known as Amazon Timestream. However, this service has been deprecated and is no longer available for new AWS customers.
If none of the database technologies discussed above suit your needs you can always use EC2 instances and install whatever database technology you need. Of course, this puts a lot of the management burden on you instead of AWS.
Managing databases on AWS using Terraform
All database technologies on AWS have their place and their use cases. However, the two most popular options are RDS and DynamoDB. This section will focus on how to configure these services using Terraform.
Managing Aurora RDS with Terraform
In this walkthrough we will use Amazon Aurora with PostgreSQL as an example. Although the walkthrough focuses on Aurora, the same steps apply to the other RDS flavors available.
An RDS database lives inside one or more subnets of a VPC. This means a prerequisite for configuring an RDS database is that you have set up a VPC. In an earlier part of this blog series we covered the details of how to configure VPCs on AWS; reference that blog post if you need a refresher on the topic. In the rest of this discussion we will assume you have a VPC with three subnets in three different availability zones.
To specify in which subnets your database will be accessible you must create a resource known as a database subnet group. We configure this resource like this:
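A minimal sketch, assuming the VPC from the earlier post exposes its three private subnets as `aws_subnet.private`:

```hcl
resource "aws_db_subnet_group" "main" {
  # The database will only be reachable from these subnets
  name       = "aurora-postgres"
  subnet_ids = [for subnet in aws_subnet.private : subnet.id]
}
```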
We could reuse the same DB subnet group for multiple RDS instances if required.
To create our RDS database we have two options to select between: either we create a standalone database instance, or we create a cluster. The benefit of a cluster is that we can set up replication between databases, standby instances, and read replicas. For a production scenario you should definitely go for a cluster.
You can configure an RDS cluster for PostgreSQL like this:
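A sketch of what this could look like (identifiers, names, and sizing are illustrative):

```hcl
resource "aws_rds_cluster" "main" {
  cluster_identifier   = "aurora-postgres"
  engine               = "aurora-postgresql"
  database_name        = "app"
  master_username      = "dbadmin"
  db_subnet_group_name = aws_db_subnet_group.main.name

  # Write-only attributes: the password never ends up in the state file
  master_password_wo         = var.master_password
  master_password_wo_version = 1 # bump this to trigger a password update

  # For demo purposes only; keep final snapshots in production
  skip_final_snapshot = true
}
```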
Note how we use a write-only attribute on the cluster resource (`master_password_wo`). This is a new feature requiring Terraform 1.11+. The benefit is that the password will not be stored in the Terraform state file. In this case the password is passed into the Terraform configuration in the `master_password` variable. This variable must be configured as an ephemeral value:
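A minimal declaration could look like this:

```hcl
variable "master_password" {
  type      = string
  ephemeral = true # requires Terraform 1.10+; never persisted in state or plan
}
```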
To use the `master_password_wo` write-only attribute we must also set `master_password_wo_version`, which allows Terraform to keep track of when an update to the write-only attribute is required.
Next we need to configure one or more instances in our cluster. A cluster always has a primary instance that handles both read and write requests from clients. You can also add read replicas to the cluster to scale out the handling of read requests.
Adding an instance to the cluster:
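A sketch, reusing the engine settings from the cluster (the instance size is illustrative):

```hcl
resource "aws_rds_cluster_instance" "primary" {
  identifier         = "aurora-postgres-1"
  cluster_identifier = aws_rds_cluster.main.id
  engine             = aws_rds_cluster.main.engine
  instance_class     = "db.r6g.large"
}
```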
There are many configuration options for the engine and engine version. In this example we reuse the engine from the cluster and have not pinned an explicit engine version, which means the default engine version for Aurora PostgreSQL will be used.
We can now provision the Aurora RDS cluster and connect to the database. You have to make sure you have network access to the RDS instances. By default, no public access is configured for new RDS instances. You can explicitly configure public access for dev and test purposes.
Managing DynamoDB with Terraform
Data in DynamoDB is stored in tables. In contrast to RDS, you do not need to first provision an instance on which you later create one or more tables to store your data; in DynamoDB you create the table resource directly. Everything in the DynamoDB service is cloud-native in the sense that we configure it using APIs exposed by AWS.
An important attribute of a DynamoDB table is the primary key that is used to distinguish items in the table from each other. No two items can have the same primary key. This is similar to the primary key in relational databases but it plays a central role for queries to DynamoDB tables.
There are two different types of primary keys:
A partition key.
A partition key and a sort key.
In the first case the partition key and the primary key are one and the same. The value must be unique for each item. In the second case the partition key and sort key make up a composite primary key. In this case the combination of partition key and sort key must be unique.
Apart from selecting the primary key for your table there is no other schema to take into account. Each item can contain a different set of attributes if you want.
Let's imagine we want to provision a DynamoDB table to store data about purchases that our customers do. We can use a composite primary key where the partition key is the customer ID, and the sort key is an order number. This would allow us to easily query all orders from a given customer.
We can configure this using Terraform:
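A sketch of such a table (names are illustrative):

```hcl
resource "aws_dynamodb_table" "orders" {
  name         = "customer-orders"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "customer_id"  # partition key
  range_key    = "order_number" # sort key

  # Only attributes that are part of a key schema are declared up front
  attribute {
    name = "customer_id"
    type = "S"
  }

  attribute {
    name = "order_number"
    type = "S"
  }
}
```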
A point of confusion here is the nomenclature: the partition key and sort key are also known as the hash key and range key, respectively.
DynamoDB supports two different billing modes which determine how the pricing for the service works. In the example above we specified PAY_PER_REQUEST, which means we pay for what we use. If you have a consistent usage pattern for your DynamoDB table you can instead use the PROVISIONED billing mode to potentially save on cost. In that mode you select the read and write capacity that you require, as sketched below.
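A fragment showing the switch to provisioned capacity (the capacity numbers are illustrative, and the rest of the table's configuration is elided):

```hcl
resource "aws_dynamodb_table" "orders" {
  # ... key schema as before ...

  billing_mode   = "PROVISIONED"
  read_capacity  = 10 # read capacity units
  write_capacity = 5  # write capacity units
}
```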
Since DynamoDB is a native AWS service you control access to your data using IAM policies. This is in contrast to RDS, where you use IAM to control access to the database resource but rely on the access management system native to the database engine running on RDS (e.g. roles in your PostgreSQL database). This allows you to manage all aspects of DynamoDB using Terraform.
You can allow roles and other IAM principals to use a DynamoDB table by configuring IAM policies and attaching them to these principals. But DynamoDB also supports resource policies, which are policies you attach to the DynamoDB table itself. An example of a DynamoDB resource policy:
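A sketch granting read access to a hypothetical application role (the account ID and role name are placeholders):

```hcl
resource "aws_dynamodb_resource_policy" "orders" {
  resource_arn = aws_dynamodb_table.orders.arn

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Sid       = "AllowAppReadAccess"
        Effect    = "Allow"
        Principal = { AWS = "arn:aws:iam::111122223333:role/app-role" }
        Action    = ["dynamodb:GetItem", "dynamodb:Query"]
        Resource  = aws_dynamodb_table.orders.arn
      }
    ]
  })
}
```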
You can also create individual items in your table using the `aws_dynamodb_table_item` resource. This could be useful if you want to prepare a demo with some sample data. If you need to create a large number of table items you can store them as JSON in a separate file and use `for_each`:
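A sketch, assuming a hypothetical `items.json` file containing a map of items in DynamoDB's attribute-value format:

```hcl
resource "aws_dynamodb_table_item" "sample" {
  # items.json maps a unique key to each item, e.g.
  # { "item-1": { "customer_id": { "S": "c-1" }, "order_number": { "S": "o-1" } } }
  for_each = jsondecode(file("${path.module}/items.json"))

  table_name = aws_dynamodb_table.orders.name
  hash_key   = aws_dynamodb_table.orders.hash_key
  range_key  = aws_dynamodb_table.orders.range_key
  item       = jsonencode(each.value)
}
```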
You can declaratively create an export of your DynamoDB table through Terraform. This is not meant to be your primary backup strategy for DynamoDB, but it could be useful for automation purposes. You create an export using the `aws_dynamodb_table_export` resource type:
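A sketch, assuming an existing S3 bucket resource for the exports (note that exports require point-in-time recovery to be enabled on the table):

```hcl
resource "aws_dynamodb_table_export" "orders" {
  table_arn     = aws_dynamodb_table.orders.arn
  s3_bucket     = aws_s3_bucket.exports.id # hypothetical bucket resource
  export_format = "DYNAMODB_JSON"
}
```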
You can only export DynamoDB table data to S3.
Best practices for database services on AWS
Much can be said about managing databases in general, and as resources on AWS specifically. The data we store in our databases (along with all data stored elsewhere) is the most important asset we have in our applications and systems.
In the following sections we will look at a few best practices for database services on AWS from a resource management perspective. There are many best practices related to data management, query optimization, and more, but these are out of scope for this blog post.
Build redundant systems
If you use RDS, make sure the RDS cluster spans multiple availability zones (AZs) for increased redundancy. Use read replicas in other AZs than your primary instance and perform a failover to a read replica in case of an issue with the primary. Failover is built into the RDS service, but clever automation can speed the process up significantly.
For DynamoDB there is a global tables feature that automatically replicates your data from your main region to other regions. This allows applications running in those regions to read data from a local DynamoDB table for increased performance.
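As a sketch, a replica in a second region can be added with a `replica` block; global tables require DynamoDB streams with new and old images enabled (the secondary region is illustrative):

```hcl
resource "aws_dynamodb_table" "orders" {
  # ... other configuration ...

  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"

  replica {
    region_name = "eu-west-1"
  }
}
```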
Use built-in security features
As with other services on AWS there is built-in support for encryption at rest using KMS keys; if required you can bring your own key. All services support (or indeed require) TLS connections for encryption in transit.
If you need to connect to your RDS instance securely you can use the RDS query editor, which allows you to run queries against your database instances without connecting to them with external tooling. It is built into the AWS console. Never expose your instances to insecure networks like the public internet!
For DynamoDB you have full support for AWS IAM roles and policies. You should create a role for your applications running on AWS to use when accessing and interacting with DynamoDB. Never give your apps more permissions than they require.
As we saw in an earlier blog post in this series, the RDS service supports automated password rotation using AWS Secrets Manager. Use this feature to rotate the master password for your databases to your requirements. The master user and master password should only be used during initial setup and for exceptional circumstances. Create other lesser privileged users in your databases for your applications to use.
Take DynamoDB table design seriously
Designing tables in DynamoDB is a challenge. You need to be aware of all the access patterns that your applications require. Basically, when you start working on a new application that will use DynamoDB you should create a list of all the queries that this application will make. Then you can go through a design exercise to make sure your DynamoDB table supports all these queries.
An important thing to be aware of is the hot-partition problem. This is when a single partition in your DynamoDB table receives a disproportionate share of all traffic reaching the table. This usually means you should partition your data in some other way to get a more even distribution of load. In our customer-orders example, you could end up with a hot partition if one customer makes the majority of your orders. Hot partitions can negatively affect the performance of your DynamoDB table.
Take regular backups
The CTO of AWS, Werner Vogels, famously said: "Everything fails, all of the time". This is true for database services, no matter how great your architecture is.
To prepare for inevitable failures you must take regular backups of your database services. On AWS there is a managed service for backups known simply as AWS Backup. This service is compatible with many AWS database services, including DynamoDB, RDS, Aurora, DocumentDB, Neptune, Redshift, etc.
With AWS Backup you can set up a backup policy for your database services that dictates how often a backup is performed, for how long it is stored, and how many consecutive backups are kept at any point in time. Together with AWS Organizations service control policies you can make sure no backup is deleted by accident or with ill intent.
Prepare and practice your disaster recovery plan
An often forgotten part of a backup plan is practicing restoring a backup to make sure it works as intended. Make it a habit to test your backup restore procedure in a dev environment regularly and improve the process if needed. Automation is key here; you do not want to rely on memory or skill during a production database restore late on a weekend evening.
Note that restoring a database from a backup is often only a small part of a larger disaster recovery (DR) plan. You need to be aware of how your application and database fit into the larger ecosystem of applications and services that you run. If the data from the past hour in your database were to disappear, how would this affect related systems and applications? In a well-designed microservice architecture there should be as little coupling between applications as possible; a DR plan for one of your applications should ideally not require coordination with any other application or team.
DR preparedness includes defining a recovery point objective (RPO) and a recovery time objective (RTO). These measure how much data could be lost in case of an incident and for how long your application may be unavailable, respectively.
Use an appropriate database technology for your use case
Every database technology has its use case. It is common to have a slight preference for a certain database technology, often due to historic reasons (e.g. you've always used relational databases for as long as you can remember).
When implementing a new application or system, you should analyze what type of data you will be storing and what the access patterns for this data look like. If your application will store simple key/value data that is accessed a few times an hour, then a DynamoDB table with on-demand pricing would be a good fit. If you are building an application where the data is highly connected in different ways and you want to explore these relationships in a simple manner, then a graph database using Amazon Neptune might be a better option.
Selecting the right database technology will simplify the implementation, decrease your costs, and have a positive influence on your application performance.
Protect database resources in your Terraform code
Provisioning database resources using infrastructure as code comes with some caveats. What if you run `terraform destroy` on the infrastructure that includes your database? The short answer is that the database would be gone, along with all the data it contained.
In these cases it is first of all important to have a backup strategy (see the best practice about backups above). You would be able to restore the database to the latest backup.
However, ideally you do not want to end up accidentally destroying your database in the first place. To help achieve this there are a few things you can do.
Most AWS database resources support deletion protection. Enabling this means you will not be able to delete the service before you first disable the deletion protection. This is an extra safety mechanism which could help you. For a DynamoDB table you can enable deletion protection like this:
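A fragment, with the rest of the table's configuration elided:

```hcl
resource "aws_dynamodb_table" "orders" {
  # ... other configuration ...

  deletion_protection_enabled = true
}
```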
There are similar features for RDS database clusters and instances, Neptune clusters, and more. For instance, for an RDS instance:
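Again as a fragment:

```hcl
resource "aws_db_instance" "main" {
  # ... other configuration ...

  deletion_protection = true
}
```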
Deletion protection always defaults to false, so it is important to include this argument if you want to enable the feature.
Another option is native to Terraform: in the `lifecycle` block of a resource, add the `prevent_destroy` argument. An example of this for a DynamoDB table:
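```hcl
resource "aws_dynamodb_table" "orders" {
  # ... other configuration ...

  lifecycle {
    prevent_destroy = true
  }
}
```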
If you perform a Terraform operation that would destroy or replace the DynamoDB table, it will fail with an error.
Terraform and Anyshift for AWS database services
It was pointed out earlier that your data is your most important asset. Managing database resources at scale with Terraform can be challenging.
By connecting your infrastructure as code context (your code repositories, your AWS environments, and your Terraform state files) to Anyshift you can ask simple questions and get insightful answers.
As an example, consider your estate of DynamoDB tables. As we saw above, deletion protection can be configured on a DynamoDB table. To find all DynamoDB tables that are at risk of being accidentally deleted because deletion protection is not enabled, you can simply ask Annie.
This is just the tip of the iceberg of the types of insights you can gain from your full infrastructure as code context.
Visit the documentation for more on how Anyshift can help you understand your context better.
Conclusions
AWS has many database services to choose from. A key best practice is to select the database service that best matches the intended use case for your application.
In summary, AWS has the following offerings:
The Relational Database Service (RDS), with Aurora, for relational databases (e.g. MySQL or PostgreSQL).
NoSQL database technologies through DocumentDB, DynamoDB, and Keyspaces (Cassandra).
Graph databases in Amazon Neptune.
Ledger databases in the Amazon Quantum Ledger Database service.
In-memory databases with ElastiCache (Valkey, Redis, Memcached).
Data warehousing with Redshift.
You can manage any type of database on AWS using Terraform.
Care must be taken not to delete your database instances and your data by accident. There are features in both Terraform and AWS to help you with this. You should combine them with a robust backup strategy.