Jun 6, 2025

S3 Buckets: A Deep Dive in AWS Resources & Best Practices to Adopt

Everything you do in the cloud is built upon a few basic building blocks: networking, compute, and storage. Storage in the cloud comes in many forms. You can classify storage in the cloud into three types:

  • Block storage

  • File storage

  • Blob storage

In this blog post we will discuss the third type of storage: blob storage.

The market leader in massive-scale blob storage is Amazon Simple Storage Service, or S3. S3 is a blob storage solution that excels at storing images, videos, log files, binaries, and so on. Basically, you can store any type of unstructured data in S3.

In the following sections we will learn about what Amazon S3 is, how you can configure S3 using Terraform, and best practices around managing and using S3.

What is Amazon S3?

Amazon S3 is a blob storage service.

A blob is a binary large object. Essentially, a blob can be almost anything. The most common types of blob data on S3 are text files, images, videos, or binary files (e.g. executable files built from source code).

S3 is a storage service for unstructured data. Structured data, on the other hand, is data that follows a specific schema. A typical example of structured data is the data you would store in a relational database. Unstructured data has no schema.

The Amazon S3 storage unit is a bucket. This is an apt name for a storage service where you can dump almost any type of data.

The name of an S3 bucket must be globally unique. This is a restriction you will need to keep in mind when you configure Amazon S3 buckets using Terraform.

Inside of the bucket you store blobs in a flat hierarchy. This is different from how you store files in a typical file system, where you have directories forming a hierarchy of different levels of storage. However, S3 supports the idea of virtual directories. With virtual directories the blobs in your bucket appear to be stored in a normal hierarchy with directories. You create these virtual directories by including / characters in your blobs' names. A simple example:

state/environment/prod/networking/terraform.tfstate

This is a Terraform state file blob stored in the virtual directory hierarchy state/environment/prod/networking/.

In most practical situations you do not need to care about blobs being stored in a flat hierarchy. For data processing applications, however, it can make a big difference to understand and use this to your advantage. With specific naming conventions you can partition the data in your S3 bucket, making it much more efficient to read at scale. We will not go into further depth on this topic here.

S3 buckets are often used as a destination for log files. Many other services on AWS support S3 as a destination for logs. For example, you can store your CloudTrail audit logs on S3. A benefit of using S3 for storing logs is that it is cheap even if you have a lot of log data, as long as you utilize different storage classes (see the discussion later in this section).

Another common use case for an S3 bucket is to host a website. You can do this with or without an Amazon CloudFront CDN in front of the S3 bucket. Hosting a website in an S3 bucket is a cheap hosting solution for simple websites.

You can share access to your S3 buckets across accounts, and even to people outside of an AWS environment. There is even the possibility of making an S3 bucket public. However, be careful of making your bucket public. Many security incidents have started with the wrong people accessing a publicly available S3 bucket. AWS has lately made it much more difficult to accidentally make an S3 bucket publicly accessible.

S3 is not just a service on AWS, it's also an API standard for working with blob storage. Many other blob services offer an S3 API. This is a powerful feature, allowing you to use the same clients to work with many different types of S3-compatible services. One example that supports an S3 API is the MinIO storage service, and another example is the Google Cloud Storage service.

You can store data on S3 using different storage classes to reduce your costs. Data that you actively work with and need immediate access to should be stored in the standard tier. There is a cheaper storage class called infrequent access for data that you occasionally access, but still need immediate access to. There is also the extremely cheap glacier class for archive data that you need to keep (perhaps for compliance reasons) but do not need immediate access to. There are variants of these classes too. If you are unsure what class to pick you can use the intelligent-tiering feature to have the class automatically set based on your usage patterns.

This has been a 1,000-foot overview of S3. There are many nuances in this service that make it attractive for use cases not discussed here.

Managing Amazon S3 using Terraform

There are more than a few resource types under the Amazon S3 domain in the Terraform provider for AWS.

The base resource is the S3 bucket. Remember that S3 bucket names must be globally unique; this necessitates a naming strategy with some complexity to it. Using a simple name like "test" will not work.

An example of a bucket resource:

resource "aws_s3_bucket" "anyshift" {
  bucket = "anyshift-bucket"
}

This bucket is configured with an exact name (i.e. anyshift-bucket) using the bucket argument. To avoid name conflicts you can instead use the bucket_prefix argument:

resource "aws_s3_bucket" "anyshift" {
  bucket_prefix = "anyshift-bucket"
}

This will create a bucket with a name starting with the specified prefix, with an added suffix to make the name unique.

Another common approach for naming buckets is to use the random provider. The random provider allows you to have more control over the suffix that is added to the name of the bucket compared to using the bucket_prefix argument.

An example of using the random provider to create an S3 bucket:

resource "random_string" "suffix" {
  length  = 10
  special = false
  upper   = false
}
resource "aws_s3_bucket" "anyshift" {
  bucket_prefix = "anyshift-bucket-${random_string.suffix.result}"
}

Using the bucket_prefix argument or using the random provider allows you to apply and destroy this Terraform configuration multiple times without name conflicts.

Speaking of destroying an S3 bucket: AWS requires that the bucket is empty before you can delete it. This is a sane default that helps you avoid accidentally deleting a lot of data.

To force the bucket to be emptied when you run terraform destroy, add the force_destroy argument to the bucket configuration:

resource "aws_s3_bucket" "anyshift" {
  bucket_prefix = "anyshift-bucket"
  force_destroy = true
}

Avoid setting this flag to true in production. Once the bucket and its data are destroyed, they are not recoverable. For development environments this flag can come in handy.

In earlier versions of the Terraform provider for AWS there were a lot more configuration options for the bucket resource. In recent versions of the provider these have been extracted to their own resource types. This means that to fully configure an S3 bucket you will need multiple resources in your Terraform configuration. In the following text we will see the most common and important configuration resources for the S3 bucket.

New S3 buckets block public access by default. However, you can explicitly configure the bucket to block public access, or even configure it to allow public access (this is not recommended; see the best practices section later in this blog post).

You configure this using the aws_s3_bucket_public_access_block resource type:

resource "aws_s3_bucket_public_access_block" "default" {
  bucket = aws_s3_bucket.anyshift.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

Even though this is the default configuration, it is good practice to include it in your Terraform code to make it explicit.

For some types of blobs that are updated frequently it can make sense to keep every version. With versioning enabled, each write to a blob creates a new version instead of overwriting the previous one. This is very useful for Terraform state files.

To enable blob versioning for your bucket, use the aws_s3_bucket_versioning resource type:

resource "aws_s3_bucket_versioning" "default" {
  bucket = aws_s3_bucket.anyshift.id
  
  versioning_configuration {
    status = "Enabled"
  }
}

By default, blobs in your bucket are encrypted server-side with Amazon S3 managed keys (SSE-S3). If you want to use your own KMS key instead, you can configure this using the aws_s3_bucket_server_side_encryption_configuration resource type:

resource "aws_kms_key" "anyshift" {
  deletion_window_in_days = 10
}
resource "aws_s3_bucket_server_side_encryption_configuration" "anyshift" {
  bucket = aws_s3_bucket.anyshift.id
  rule {
    apply_server_side_encryption_by_default {
      kms_master_key_id = aws_kms_key.anyshift.arn
      sse_algorithm     = "aws:kms"
    }
  }
}

Later in this blog post I will cover best practices for S3, and one of them is replication to a different AWS region. To set this up you need two buckets (both with versioning enabled, which replication requires) and a fair amount of Terraform code, because you must create an IAM role for the replication and give it the correct permissions.

The full details of how to configure this are left as an exercise for the reader. The important piece of the replication is configured using the aws_s3_bucket_replication_configuration resource type:

resource "aws_s3_bucket_replication_configuration" "replica" {
  # this role must be configured to allow replication
  # see AWS documentation for details
  role   = aws_iam_role.replication.arn
  
  bucket = aws_s3_bucket.anyshift.id
  rule {
    id     = "replicate-terraform-state-files"
    status = "Enabled"
    filter {
      prefix = "state/"
    }
    # delete_marker_replication is required when a rule uses a filter
    delete_marker_replication {
      status = "Disabled"
    }
    destination {
      bucket        = aws_s3_bucket.destination.arn
      storage_class = "STANDARD"
    }
  }
}

You can choose to replicate all data or just data under certain prefixes.

Modern AWS usage involves more than just a single AWS account. It is common for organizations to use hundreds or even thousands of AWS accounts.

For these environments you might need to use resource policies for your buckets to allow principals in other accounts to use them.

To configure this you create a resource policy document (e.g. using the aws_iam_policy_document data source) and an aws_s3_bucket_policy resource:

data "aws_iam_policy_document" "second_account" {
  statement {
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::123456789012:root"]
    }
    actions = [
      "s3:GetObject",
      "s3:ListBucket",
    ]
    resources = [
      aws_s3_bucket.anyshift.arn,
      "${aws_s3_bucket.anyshift.arn}/*",
    ]
  }
}
resource "aws_s3_bucket_policy" "second_account" {
  bucket = aws_s3_bucket.anyshift.id
  policy = data.aws_iam_policy_document.second_account.json
}

Finally, you can also declaratively create S3 blobs inside your bucket with Terraform. This is more common than you might think. One example is if you need to provide a configuration file to another AWS service that can only read it from S3. Another example is if you create Lambda functions and store the source code on S3.

You can create blobs using the aws_s3_object resource type:

resource "aws_s3_object" "source" {
  bucket = aws_s3_bucket.anyshift.id
  key    = "path/to/blob/data.txt"
  source = file("${path.module}/input/data.txt")
}

Combining the aws_s3_object resource with the archive and local providers allows you to create an archive (zip) file locally that you upload as a blob in your S3 bucket.
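
As a minimal sketch of this pattern (assuming a hypothetical src directory with your function code sits next to the configuration), the archive_file data source builds the zip file that an aws_s3_object resource then uploads:

data "archive_file" "lambda" {
  type        = "zip"
  source_dir  = "${path.module}/src"
  output_path = "${path.module}/build/lambda.zip"
}

resource "aws_s3_object" "lambda_package" {
  bucket = aws_s3_bucket.anyshift.id
  key    = "lambda/lambda.zip"
  source = data.archive_file.lambda.output_path

  # re-upload the blob whenever the archive contents change
  etag = data.archive_file.lambda.output_md5
}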

It should be clear by now that there are a lot of S3 resource types in the AWS provider for Terraform. For brevity I will skip the other resource types that are less common.

Best practices for Amazon S3

S3 is a service that has been involved in a few data leaks over the years. This is because it used to be too easy to accidentally make an S3 bucket public. AWS has worked to change this situation, so today it is harder to make this mistake.

There are still many things to think about when it comes to Amazon S3. In the following sections we will go through a few best practices to adopt.

Protect your data

Data is one of your most valuable assets, perhaps even the most valuable.

You should protect your data wherever it is stored on AWS. This is also true for Amazon S3.

As with all other services on AWS you can use IAM policies for your principals (users, groups, roles) to control what they are allowed to do with your S3 buckets and the data stored within them. For cross-account access you also need to use S3 bucket policies in addition to normal IAM policies.

You can scope access to blob name prefixes in your IAM policies and be very granular about what a given policy allows. There are also many IAM condition keys you can use in your policies when working with S3.

Data in an S3 bucket is encrypted at rest. By default S3 uses its own managed keys (SSE-S3), but you can also use an AWS managed or customer managed KMS key. Encryption at rest is a protection against someone walking into an AWS data center and ripping out a hard drive with your data: even if that happened, they would not be able to read your data.

S3 supports HTTPS to encrypt data in transit. The minimum supported TLS version is 1.2, but version 1.3 is recommended.
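
HTTPS is not enforced by default, however. A common best practice is a bucket policy statement that denies any request made without TLS. A minimal sketch follows; it is shown standalone for clarity, but since a bucket can only have one policy you would merge this statement into your existing bucket policy:

data "aws_iam_policy_document" "require_tls" {
  statement {
    sid     = "DenyInsecureTransport"
    effect  = "Deny"
    actions = ["s3:*"]

    principals {
      type        = "AWS"
      identifiers = ["*"]
    }

    resources = [
      aws_s3_bucket.anyshift.arn,
      "${aws_s3_bucket.anyshift.arn}/*",
    ]

    # deny any request that is not made over TLS
    condition {
      test     = "Bool"
      variable = "aws:SecureTransport"
      values   = ["false"]
    }
  }
}

resource "aws_s3_bucket_policy" "require_tls" {
  bucket = aws_s3_bucket.anyshift.id
  policy = data.aws_iam_policy_document.require_tls.json
}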

Avoid making your S3 buckets public

As mentioned in the introduction to the best practices section, it is possible to make an S3 bucket publicly available. However, the reasons for doing so are rare. Whatever your reason is, discuss it with your colleagues to discover other approaches that could achieve the same end result.

One common example is hosting a website in an S3 bucket. You can host the website directly through the S3 service. However, a better approach is to put an Amazon CloudFront distribution in front of the bucket. In this case you add the S3 bucket as an origin in the CloudFront distribution and only allow that specific distribution to access blobs in the bucket. This gives you much greater control over your content, and you do not need to expose your bucket directly to the internet.
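
A minimal sketch of that pattern is shown below. It assumes a CloudFront distribution named aws_cloudfront_distribution.site is defined elsewhere in the configuration (the distribution itself is omitted for brevity), and the bucket policy statement would again be merged into your single bucket policy:

resource "aws_cloudfront_origin_access_control" "site" {
  name                              = "anyshift-site"
  origin_access_control_origin_type = "s3"
  signing_behavior                  = "always"
  signing_protocol                  = "sigv4"
}

# reference the origin access control from the distribution's origin block,
# then only allow that specific distribution to read blobs from the bucket
data "aws_iam_policy_document" "cloudfront_read" {
  statement {
    actions   = ["s3:GetObject"]
    resources = ["${aws_s3_bucket.anyshift.arn}/*"]

    principals {
      type        = "Service"
      identifiers = ["cloudfront.amazonaws.com"]
    }

    condition {
      test     = "StringEquals"
      variable = "AWS:SourceArn"
      values   = [aws_cloudfront_distribution.site.arn]
    }
  }
}

resource "aws_s3_bucket_policy" "cloudfront_read" {
  bucket = aws_s3_bucket.anyshift.id
  policy = data.aws_iam_policy_document.cloudfront_read.json
}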

Another common example is to provide access for an individual to a specific file in an S3 bucket, where the receiver does not have access to AWS. It could be tempting to expose the file publicly. A better solution is to generate a pre-signed URL with a TTL that allows the receiver to download the file. After the TTL has passed the file is no longer accessible.

Use object replication to another region for critical data (e.g. Terraform state files)

If you are storing data in your Amazon S3 bucket that must be available at all times, make sure to set up object replication to a different region. Even if the primary AWS region experiences issues, the data will still be available from the other region.

This recommendation can seem extreme considering S3 promises eleven nines (99.999999999%) of durability and four nines (99.99%) of availability for your data in a given year. That means your data should be unavailable for at most roughly 52 minutes per year. However, these figures are design targets and service commitments from AWS, not a guarantee set in stone.

This best practice is especially important for things like your Terraform state files. If you use Amazon S3 as your Terraform backend for state storage you must make sure that the state files are always available. Otherwise you would not be able to run Terraform when the primary AWS region is unavailable, which could have severe consequences.
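
For reference, a minimal sketch of such a backend configuration (the bucket name, key, and region are illustrative):

terraform {
  backend "s3" {
    bucket = "anyshift-terraform-state"
    key    = "state/team-a/networking/terraform.tfstate"
    region = "eu-west-1"

    # versioning and replication are configured on the bucket itself;
    # a DynamoDB table (dynamodb_table) is commonly added for state locking
  }
}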

Enable object versioning

Object versioning allows you to store all versions of blobs. This is important for things like the Terraform state file. If you need to recover a previous version of the state file, then you must enable object versioning.

Note that versioning is needed even if you use object replication (see the previous section). Since all changes are replicated to the other region, the replica will not help you recover an older version of a blob.

Use lifecycle rules to manage costs

Lifecycle rules allow you to automatically transition blobs between different storage classes based on their age, and to expire blobs or old versions you no longer need. This is a powerful but still simple way to manage storage costs.

One concrete example is to manage CloudTrail audit logs using lifecycle rules. For compliance reasons you may have to store these logs for a long time. It is relatively rare that you have to go back several years and actually look through the data, but an auditor could ask you to provide it, so it must be kept in storage.

It might suffice to store a few days' worth of data in the standard class so that you can query it if needed. The previous 1-3 years' worth of data could be stored in an infrequent access storage class, while all older data (up to the number of years you must keep it) can be moved to archive storage.

If you followed the best practice around versioning, you could also use lifecycle rules to remove older versions of blobs when you only need to keep the last few versions of each blob.
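
A minimal sketch covering both ideas, assuming the audit logs live under a hypothetical cloudtrail/ prefix and versioning is enabled on the bucket:

resource "aws_s3_bucket_lifecycle_configuration" "logs" {
  bucket = aws_s3_bucket.anyshift.id

  rule {
    id     = "archive-audit-logs"
    status = "Enabled"

    filter {
      prefix = "cloudtrail/"
    }

    # keep recent logs queryable, then move them to cheaper classes
    transition {
      days          = 30
      storage_class = "STANDARD_IA"
    }

    transition {
      days          = 365
      storage_class = "GLACIER"
    }

    # keep at most three noncurrent versions, and only for 90 days
    noncurrent_version_expiration {
      newer_noncurrent_versions = 3
      noncurrent_days           = 90
    }
  }
}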

Set up a naming convention for blobs

As mentioned, all blobs in a bucket are stored in a flat hierarchy. However, the use of "/" in blob names creates a virtual directory hierarchy. Unless you are building big data applications you hardly need to be aware of the distinction between the flat hierarchy and the virtual directory hierarchy.

Using a blob naming convention is more or less important depending on your use case. One use case where it is important is storing Terraform state files. You should use a naming convention that allows you to easily write IAM policies that grant access to the data.

For instance, you could name state files starting with state/team-a/ for all state files related to Team A. This allows you to give read and write permissions to state/team-a/* for members of Team A.
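
A minimal sketch of such a policy, which you could attach to Team A's IAM role or group (the team name and prefix are illustrative):

data "aws_iam_policy_document" "team_a_state" {
  # allow listing only the team's own prefix
  statement {
    actions   = ["s3:ListBucket"]
    resources = [aws_s3_bucket.anyshift.arn]

    condition {
      test     = "StringLike"
      variable = "s3:prefix"
      values   = ["state/team-a/*"]
    }
  }

  # allow reading and writing state files under the prefix
  statement {
    actions = [
      "s3:GetObject",
      "s3:PutObject",
    ]
    resources = ["${aws_s3_bucket.anyshift.arn}/state/team-a/*"]
  }
}

resource "aws_iam_policy" "team_a_state" {
  name   = "team-a-terraform-state"
  policy = data.aws_iam_policy_document.team_a_state.json
}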

There are other naming conventions to consider. It all depends on your environment and needs. The important take-away here is that you should come up with a convention, and then stick to it.

Terraform and Anyshift for Amazon S3

Amazon S3 is used throughout your AWS environment. S3 is usually an integral part of many other services and your custom applications. S3 is also a pure data storage solution where you might have all your audit logs from the past ten years, potentially required during your next audit. Generally, the majority of the data you store on AWS is stored in one or more S3 buckets.

One conclusion from this is that if you manage your S3 buckets using Terraform, you must take care to make changes that do not negatively affect your running applications and systems that depend on these buckets.

You must have a robust and secure workflow for introducing changes into your environment. Removing buckets, blobs, or changing bucket configurations could have far-reaching implications.

Anyshift creates a digital twin of your AWS environment, taking the Terraform state files and Terraform configurations in your git repositories into account. This allows Anyshift to provide contextual insights into how a change to a resource in your Terraform configuration could affect the rest of your AWS environment, perhaps in ways you could not even imagine. An AWS knowledge graph like this gives Anyshift deep insight into your current context, so it can judge changes to your Terraform infrastructure accurately and give you an impact analysis.

An SRE AI copilot like Anyshift can inform you, before the fact, that a change in one Terraform configuration could lead to issues in another. This is a powerful feature that could reduce the on-call load on your SREs. The only thing better than resolving issues in your cloud infrastructure quickly is avoiding them in the first place.

Take no chances when managing your most valuable assets: your data.

Visit the documentation for more on how Anyshift can help you understand your context better.

Conclusions

Amazon S3 is a blob storage solution on AWS. Blobs are usually text, images, movies or binary files. S3 allows for massive scale storage.

The storage unit in S3 is called a bucket. You can provision and manage everything in S3 using Terraform. The Terraform provider for AWS splits S3 management into several different resource types to simplify administration. In this blog post we saw how to create an S3 bucket and how to configure the common S3 features you will need, including object replication, versioning, encryption, and more.

Data is the most valuable asset in your organization. Data placed on Amazon S3 is no exception. Managing Amazon S3 using Terraform requires care. Here Anyshift can help you make informed changes in your environment taking your context into account.

Articles by

Mattias Fjellström

Accelerate at Iver Sverige

Cloud Architect | Author | HashiCorp Ambassador | HashiCorp User Group Leader

Mattias is a cloud architect consultant working to help customers improve their cloud environments. He has extensive experience with both the AWS and Microsoft Azure platforms and holds professional-level certifications in both.

He is also a HashiCorp Ambassador and an author of a book covering the Terraform Authoring and Operations Professional certification.

Blog: https://mattias.engineer
Linkedin: https://www.linkedin.com/in/mattiasfjellstrom/
Bluesky: https://bsky.app/profile/mattias.engineer

