DISCOVERY

January 2nd, 2022

Restricting Access to Static Website Amazon S3 Buckets using Terraform

AWS S3

Terraform

AWS

AWS CloudFront

For years, unsecured AWS S3 buckets have been a major source of data leaks in the Software Engineering industry. Enter a Google search for "S3 bucket public data leak" and you will find countless examples of hackers leaking data from Amazon S3.

The biggest reason why S3 data leaks are so common is that objects (files) in S3 buckets are easily misconfigured to be public over HTTP. When an S3 bucket is public, its contents are available for anyone in the world to view. For engineers new to AWS and the S3 service, the mistake of configuring S3 buckets to be public is very common.

Over the years, AWS has provided many ways to secure S3 buckets and provide explicit warnings to users if S3 buckets are not properly protected. AWS accounts can even have account-wide restrictions, preventing the creation of public buckets.

In my AWS account, most of the S3 buckets have restrictions for who can access their contents. However, just a week ago, some buckets in my account were public, such as those used for static websites or those containing assets such as images and fonts for my websites. While there isn't much danger in keeping these buckets public - they don't contain any sensitive data - it is still best practice to keep them private. This article discusses how to restrict access to these S3 buckets, while still keeping their contents publicly available via HTTP. All the infrastructure changes I made to restrict S3 access are written with Terraform.

In the AWS Console, if an S3 bucket is publicly accessible, a warning icon is displayed.

This warning is meant to deter users from leaving S3 buckets without any security. In 2018, AWS added a "Block Public Access" feature to S3, allowing engineers to restrict the creation of public buckets account-wide. If an account's "Block Public Access" settings aren't restrictive towards public S3 buckets, the AWS Console provides additional warning messages.

The public S3 buckets shown in my account are used as static websites. These static websites are distributed and given domain names using the AWS CloudFront CDN. For example, one of my S3 buckets, react16-3.demo.jarombek.com, is distributed using CloudFront and accessible at the https://www.react16-3.demo.jarombek.com domain. Back in February 2020, I wrote an article about how to write the AWS infrastructure for this static website using Terraform.

S3 buckets are also given Amazon domain names. For example, my bucket has the domain http://react16-3.demo.jarombek.com.s3-website-us-east-1.amazonaws.com/. For buckets with public access, anyone with internet access can read the contents of the S3 bucket from this URL.

If you navigate to my S3 bucket's Amazon domain name now, you will receive a HTTP 403 Forbidden error message. This is because my S3 bucket is now private. However, at the time I took the AWS Console screenshots above (back when the react16-3.demo.jarombek.com bucket was still public), navigating to http://react16-3.demo.jarombek.com.s3-website-us-east-1.amazonaws.com/ showed the same static website as the official CloudFront URL, https://www.react16-3.demo.jarombek.com.

While re-evaluating my S3 infrastructure recently, I decided it would be a good idea to make all my S3 buckets, even those for static websites, private. This way, their contents would be inaccessible through Amazon URLs, but still accessible via the CloudFront CDN. I also decided to change the "Block Public Access" settings for my account. This way, no S3 buckets can be made public in my account in the future.

To demonstrate how I made my S3 buckets private, I'll again use the react16-3.demo.jarombek.com bucket as an example. The Terraform infrastructure for this bucket exists on GitHub in my jarombek-com-infrastructure repository.

The first piece of infrastructure to create is the S3 bucket, which is provisioned with the aws_s3_bucket Terraform resource.

resource "aws_s3_bucket" "react16-3-demo-jarombek" { bucket = "react16-3.demo.jarombek.com" acl = "private" tags = { Name = "react16-3.demo.jarombek.com" Environment = "production" Application = "react-16-3-demo" } website { index_document = "index.html" error_document = "index.html" } }

In regards to security, the important argument in aws_s3_bucket is acl. acl configures the Access Control List (ACL) value for the S3 bucket. acl defaults to private, which gives the owner of the S3 bucket (my AWS account) full access to the bucket and provides no access to everyone else. Back when my S3 buckets were publicly accessible, the value of acl was public. The full list of ACL options for S3 buckets are available in the AWS documentation.

The next piece of infrastructure is a Public Access Block configuration for the S3 bucket, which is provisioned with the aws_s3_bucket_public_access_block resource.

resource "aws_s3_bucket_public_access_block" "react16-3-demo-jarombek" { bucket = aws_s3_bucket.react16-3-demo-jarombek.id block_public_acls = true block_public_policy = true restrict_public_buckets = true ignore_public_acls = true }

Public Access Block configurations can be applied account-wide or to a specific S3 bucket. The example above is configured for an individual bucket. Public Access Block configurations contain four settings, represented by the block_public_acls, block_public_policy, restrict_public_buckets, and ignore_public_acls arguments. Giving these arguments a value of true turns them on, while values of false (the default value) keep them off. Since I want these settings applied for additional security, I assign them values of true.

For full details on what these four Public Access Block settings achieve, check out the AWS documentation on the subject. The gist is that with these settings applied, requests to create buckets or objects in buckets with public IAM policies or public ACLs is prohibited.

As I mentioned earlier, Public Access Block settings can be applied to individual S3 buckets or an entire AWS account. If Public Access Block settings are applied to both, the more restrictive of the two settings is enforced. For example, if the account-wide settings are turned on and the settings for a S3 bucket, let's say react16-3.demo.jarombek.com, are turned off, the restrictiveness of the account-wide settings are applied to the bucket1.

Account-wide Public Access Block settings are created using a aws_s3_account_public_access_block resource. The account-wide settings for my AWS account, which exist in my global-aws-infrastructure repository, are shown below.

resource "aws_s3_account_public_access_block" "access" { block_public_acls = true block_public_policy = true restrict_public_buckets = true ignore_public_acls = true }

With account-wide Public Access Block settings turned on, the AWS Console displays the following screen:

After the Public Access Block settings are properly configured, I create a CloudFront distribution for the S3 bucket with the aws_cloudfront_distribution resource. The CloudFront distribution also has a corresponding aws_cloudfront_origin_access_identity resource. Shortened versions of both are shown below, with the full code available on GitHub.

resource "aws_cloudfront_distribution" "react16-3-demo-jarombek-distribution" { origin { domain_name = aws_s3_bucket.react16-3-demo-jarombek.bucket_regional_domain_name ... s3_origin_config { origin_access_identity = aws_cloudfront_origin_access_identity.origin-access-identity.cloudfront_access_identity_path } } ... } resource "aws_cloudfront_origin_access_identity" "origin-access-identity" { comment = "react16-3.demo.jarombek.com origin access identity" }

As far as security is concerned, the aws_cloudfront_origin_access_identity resource is important for granting CloudFront access to a private S3 bucket. It is assigned to the CloudFront distribution with the origin.s3_origin_config.origin_access_identity Terraform attribute. The CloudFront Origin Access Identity (OAI) is a user that can be assigned to CloudFront distributions. The key is to provide read access on a private S3 bucket to an OAI, thus giving CloudFront access to the bucket's contents.

OAIs are given access to S3 buckets using S3 bucket policies, created with the aws_s3_bucket_policy resource. The following Terraform configuration gives the OAI access to my S3 bucket.

resource "aws_s3_bucket_policy" "react16-3-demo-jarombek" { bucket = aws_s3_bucket.react16-3-demo-jarombek.id policy = data.aws_iam_policy_document.react16-3-demo-jarombek.json } data "aws_iam_policy_document" "react16-3-demo-jarombek" { statement { sid = "CloudfrontOAI" principals { identifiers = [aws_cloudfront_origin_access_identity.origin-access-identity.iam_arn] type = "AWS" } actions = ["s3:GetObject", "s3:ListBucket"] resources = [ aws_s3_bucket.react16-3-demo-jarombek.arn, "${aws_s3_bucket.react16-3-demo-jarombek.arn}/*" ] } }

The aws_iam_policy_document data source creates the S3 bucket policy document. It provides s3:GetObject and s3:ListObject access to the OAI (aws_cloudfront_origin_access_identity.origin-access-identity.iam_arn) for the react16-3.demo.jarombek.com bucket.

After all this infrastructure is created, I have a private S3 bucket, whose contents are still accessible to the public through a CloudFront distribution. With all my public s3 buckets made private, the AWS console lists them as follows:

All the infrastructure shown in this article is available on GitHub.

Keeping S3 buckets private, even when they hold publicly accessible assets such as static website files, is critical for cloud infrastructure security. The approach of using a CloudFront OAI to access a private S3 bucket is very easy to configure, and AWS recommends implementing it.