RETROSPECTIVE

September 5th, 2019

AWS Lambda Function for MySQL RDS Backups Part II: Building the Function

AWS Lambda

AWS RDS

Terraform

MySQL

AWS S3

AWS Secrets Manager

AWS VPC Endpoint

AWS IAM

AWS

HCL

Python

Bash

Infrastructure as Code

In my previous article I discussed my existing AWS infrastructure and the additional resources needed to automate backups of an RDS MySQL instance using a lambda function. In this article I'll create those additional resources and further explain my design decisions. Finally, I'll test the lambda function and show the resulting backup file in my S3 bucket.

The lambda function for RDS backups is currently implemented for my SaintsXCTF application. Before I created the lambda function, all my application infrastructure was already in place. This includes a VPC with four subnets - two public and two private. It also includes web servers behind a load balancer and a highly available RDS MySQL database. Here is a diagram of my existing infrastructure:

As I discussed in my last article, four additional resources are needed to implement the lambda function. The first is the lambda function itself, which is placed in the VPC alongside the RDS instances. The second is an S3 bucket to store the database backup files. The third is a set of database credentials stored in Secrets Manager. The fourth and final resource is a VPC endpoint allowing the lambda function to access S3 and Secrets Manager - in practice this requires two separate VPC endpoints, one for each service. Here is the infrastructure diagram with the new resources:

Now let’s add these additional pieces to my infrastructure using Terraform.

The lambda function for creating backups of an RDS MySQL instance is written in Python. I chose Python for its concise syntax and because I like the boto3 AWS SDK. The Python function also invokes a Bash script which executes in the AWS Lambda runtime environment. AWS Lambda functions run on an Amazon Linux server, so executing Bash is no problem [1].

The lambda function is scheduled to run every morning at 7:00am UTC. It's located in the same private subnet as the RDS instance, which is necessary for the lambda function to connect to the database. With the help of IAM roles, the lambda function is granted access to RDS, S3, and Secrets Manager resources.

At the core of the Terraform infrastructure is the lambda function configuration.

locals {
  env = var.prod ? "prod" : "dev"
}

#-------------------
# Existing Resources
#-------------------

data "aws_vpc" "saints-xctf-com-vpc" {
  tags = {
    Name = "saints-xctf-com-vpc"
  }
}

data "aws_subnet" "saints-xctf-com-vpc-public-subnet-0" {
  tags = {
    Name = "saints-xctf-com-lisag-public-subnet"
  }
}

data "aws_subnet" "saints-xctf-com-vpc-public-subnet-1" {
  tags = {
    Name = "saints-xctf-com-megank-public-subnet"
  }
}

data "aws_db_instance" "saints-xctf-mysql-database" {
  db_instance_identifier = "saints-xctf-mysql-database-${local.env}"
}

data "archive_file" "lambda" {
  source_dir  = "${path.module}/func"
  output_path = "${path.module}/dist/lambda-${local.env}.zip"
  type        = "zip"
}

#--------------------------------------------------
# SaintsXCTF MySQL Backup Lambda Function Resources
#--------------------------------------------------

resource "aws_lambda_function" "rds-backup-lambda-function" {
  function_name = "SaintsXCTFMySQLBackup${upper(local.env)}"
  filename      = "${path.module}/dist/lambda-${local.env}.zip"
  handler       = "lambda.create_backup"
  role          = aws_iam_role.lambda-role.arn
  runtime       = "python3.7"
  timeout       = 15

  environment {
    variables = {
      ENV     = local.env
      DB_HOST = data.aws_db_instance.saints-xctf-mysql-database.address
    }
  }

  vpc_config {
    security_group_ids = [module.lambda-rds-backup-security-group.security_group_id[0]]
    subnet_ids = [
      data.aws_subnet.saints-xctf-com-vpc-public-subnet-0.id,
      data.aws_subnet.saints-xctf-com-vpc-public-subnet-1.id
    ]
  }

  tags = {
    Name        = "saints-xctf-rds-${local.env}-backup"
    Environment = upper(local.env)
    Application = "saints-xctf"
  }
}

The first thing you will notice is that my IaC is applied to either my production or my development environment depending on the var.prod variable. The production and development databases live in the same VPC and subnets, so I grab the VPC and subnet information with data blocks. I also grab the database information for the specified environment.

When the Terraform module first executes, the archive_file data block runs. archive_file zips the function's code, because AWS Lambda expects source code to be uploaded as a zip file. This zip file includes a lambda.py file containing the entry point to the lambda function, a backup.sh file which creates a SQL backup file from the RDS instance, and a mysqldump binary which connects to MySQL and dumps the database contents into a SQL file. I will walk through the Python and Bash files momentarily.
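For reference, the func directory that archive_file zips up would look roughly like this, based on the three files described above:

func/
  lambda.py     - Python entry point (the create_backup handler)
  backup.sh     - Bash script that runs mysqldump against the RDS instance
  mysqldump     - MySQL client binary bundled with the function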

The resource block in the Terraform code defines the AWS lambda function. In my production environment the lambda function is named SaintsXCTFMySQLBackupPROD and uses the Python 3 runtime. I pass the RDS instance domain name as an environment variable to the function's runtime environment. This is used to connect to the database. The domain name could also be obtained programmatically inside the lambda function, however that would require a NAT Gateway because the lambda function lives in my application's VPC and has no internet access [2]. NAT Gateways are expensive, so I avoided that approach. RDS does not offer VPC endpoints at this time (as of September 2019) [3].
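For context, the programmatic lookup I avoided would be a boto3 call along these lines. This is just a sketch, and it only works with a route to the public RDS API - hence the NAT Gateway requirement:

import boto3

# This call reaches the public RDS API, so from a private subnet it needs a NAT Gateway (or internet access).
rds = boto3.client('rds', region_name='us-east-1')
response = rds.describe_db_instances(DBInstanceIdentifier='saints-xctf-mysql-database-prod')

# The endpoint address is the same value passed to the function via the DB_HOST environment variable.
host = response['DBInstances'][0]['Endpoint']['Address']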

The next chunk of Terraform IaC configures the IAM policy for the lambda function.

resource "aws_iam_role" "lambda-role" { name = "saints-xctf-rds-backup-lambda-role" assume_role_policy = file("${path.module}/role.json") tags = { Name = "saints-xctf-rds-backup-lambda-role" Environment = "all" Application = "saints-xctf" } } resource "aws_iam_policy" "rds-backup-lambda-policy" { name = "rds-backup-lambda-policy" path = "/saintsxctf/" policy = file("${path.module}/rds-backup-lambda-policy.json") } resource "aws_iam_role_policy_attachment" "lambda-role-policy-attachment" { policy_arn = aws_iam_policy.rds-backup-lambda-policy.arn role = aws_iam_role.lambda-role.name }

The rds-backup-lambda-policy IAM policy is attached to the saints-xctf-rds-backup-lambda-role IAM role, which in turn is bound to the lambda function. The IAM policy grants the lambda function access to Secrets Manager, S3, RDS, and the network interfaces in the VPC. Network interface access is required for the lambda function to connect to my VPC [4].

{ "Version": "2012-10-17", "Statement": { "Effect": "Allow", "Action": [ "secretsmanager:Describe*", "secretsmanager:Get*", "secretsmanager:List*", "ec2:CreateNetworkInterface", "ec2:DescribeNetworkInterfaces", "ec2:DetachNetworkInterface", "ec2:DeleteNetworkInterface", "rds:*", "s3:*" ], "Resource": "*" } }

Further improvement can be made to this policy by restricting RDS and S3 access to certain operations.
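For example, a tighter S3 statement might look something like the following sketch. The bucket ARN follows my naming scheme, but the exact set of actions the function needs would have to be verified:

{
  "Effect": "Allow",
  "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
  "Resource": [
    "arn:aws:s3:::saints-xctf-db-backups-prod",
    "arn:aws:s3:::saints-xctf-db-backups-prod/*"
  ]
}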

The final piece of lambda function infrastructure is a CloudWatch trigger [5]. The following IaC configures a CloudWatch event that invokes the lambda function every morning at 7:00am UTC.

resource "aws_cloudwatch_event_rule" "lambda-function-schedule-rule" { name = "saints-xctf-rds-${local.env}-backup-lambda-rule" description = "Execute the Lambda Function Daily" schedule_expression = "cron(0 7 * * ? *)" is_enabled = true tags = { Name = "saints-xctf-rds-${local.env}-backup-lambda-rule" Environment = upper(local.env) Application = "saints-xctf" } } resource "aws_cloudwatch_event_target" "lambda-function-schedule-target" { arn = aws_lambda_function.rds-backup-lambda-function.arn rule = aws_cloudwatch_event_rule.lambda-function-schedule-rule.name } resource "aws_lambda_permission" "lambda-function-schedule-permission" { statement_id = "AllowExecutionFromCloudWatch" action = "lambda:InvokeFunction" function_name = aws_lambda_function.rds-backup-lambda-function.function_name principal = "events.amazonaws.com" source_arn = aws_cloudwatch_event_rule.lambda-function-schedule-rule.arn }

In the previous section I discussed all the infrastructure required to configure the AWS Lambda function. Now let’s explore the function source code!

The lambda function is written in Python and utilizes the boto3 AWS SDK. It grabs the database credentials from Secrets Manager, runs a Bash script which calls the mysqldump command line utility, and uploads the resulting SQL file to an S3 bucket.

import os
import boto3
import botocore.config
import json
import subprocess


def create_backup(event, context):
    """
    Create a backup of an RDS MySQL database and store it on S3
    :param event: provides information about the triggering of the function
    :param context: provides information about the execution environment
    :return: True when successful
    """
    # Set the path to the executable scripts in the AWS Lambda environment.
    # Source: https://aws.amazon.com/blogs/compute/running-executables-in-aws-lambda/
    os.environ['PATH'] = os.environ['PATH'] + ':' + os.environ['LAMBDA_TASK_ROOT']

    try:
        env = os.environ['ENV']
    except KeyError:
        env = "prod"

    try:
        host = os.environ['DB_HOST']
    except KeyError:
        host = ""

    secretsmanager = boto3.client('secretsmanager')
    response = secretsmanager.get_secret_value(SecretId=f'saints-xctf-rds-{env}-secret')
    secret_string = response.get("SecretString")
    secret_dict = json.loads(secret_string)

    username = secret_dict.get("username")
    password = secret_dict.get("password")

    # To execute the bash script on AWS Lambda, change its permissions and move it into the /tmp/ directory.
    # Source: https://stackoverflow.com/a/48196444
    subprocess.check_call(["cp ./backup.sh /tmp/backup.sh && chmod 755 /tmp/backup.sh"], shell=True)
    subprocess.check_call(["/tmp/backup.sh", env, host, username, password])

    # By default, S3 resolves buckets using the internet.  To use the VPC endpoint instead, use the 'path'
    # addressing style config.  Source: https://stackoverflow.com/a/44478894
    s3 = boto3.resource('s3', 'us-east-1', config=botocore.config.Config(s3={'addressing_style': 'path'}))
    s3.meta.client.upload_file('/tmp/backup.sql', f'saints-xctf-db-backups-{env}', 'backup.sql')

    return True

subprocess.check_call() invokes the Bash script which creates the database backup SQL file. The Bash script only backs up the saintsxctf database within my MySQL RDS instance:

# backup.sh

# Input Variables
ENV=$1
HOST=$2
USERNAME=$3
PASSWORD=$4

cp ./mysqldump /tmp/mysqldump
chmod 755 /tmp/mysqldump

# Use an environment variable for the MySQL password so that mysqldump doesn't have to prompt for one.
export MYSQL_PWD="${PASSWORD}"

# Dump the saintsxctf database into a sql file
/tmp/mysqldump -v --host ${HOST} --user ${USERNAME} --max_allowed_packet=1G --single-transaction --quick \
    --lock-tables=false --routines saintsxctf > /tmp/backup.sql

The function takes less than 15 seconds to complete in my tests. By the time it finishes, the S3 bucket is updated with a new SQL file.

I configured an S3 bucket in my production and development environments to hold database backups. The following S3 infrastructure and bucket policy create the buckets and give other AWS resources read and write access to bucket objects (files).

locals {
  env = var.prod ? "prod" : "dev"
}

/* The S3 bucket holding database backups keeps old files versioned for 60 days.  After that they are deleted. */
resource "aws_s3_bucket" "saints-xctf-db-backups" {
  bucket = "saints-xctf-db-backups-${local.env}"

  # Bucket owner gets full control, nobody else has access
  acl = "private"

  # Policy allows for resources in this AWS account to create and read objects
  policy = file("${path.module}/policies/policy-${local.env}.json")

  versioning {
    enabled = true
  }

  lifecycle_rule {
    enabled = true

    noncurrent_version_expiration {
      days = 60
    }
  }

  tags = {
    Name        = "SaintsXCTF Database Backups Bucket"
    Application = "saints-xctf"
  }
}

Here is the bucket policy for the production environment:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Permissions",
      "Effect": "Allow",
      "Principal": "*",
      "Action": ["s3:PutObject", "s3:GetObject"],
      "Resource": ["arn:aws:s3:::saints-xctf-db-backups-prod/*"]
    }
  ]
}

Protecting sensitive data is paramount when developing software. To avoid hard-coding credentials anywhere in my source code, I utilize AWS Secrets Manager to store the database credentials. As I previously showed, the Python lambda function grabs the RDS database credentials from Secrets Manager via the AWS SDK. The following IaC stores the RDS credentials in Secrets Manager, with the values passed in through a command-line variable named rds_secrets.

locals {
  env = var.prod ? "prod" : "dev"
}

resource "aws_secretsmanager_secret" "saints-xctf-rds-secret" {
  name        = "saints-xctf-rds-${local.env}-secret"
  description = "SaintsXCTF MySQL RDS Login Credentials for the ${upper(local.env)} Environment"

  tags = {
    Name        = "saints-xctf-rds-${local.env}-secret"
    Environment = upper(local.env)
    Application = "saints-xctf"
  }
}

resource "aws_secretsmanager_secret_version" "saints-xctf-rds-secret-version" {
  secret_id     = aws_secretsmanager_secret.saints-xctf-rds-secret.id
  secret_string = jsonencode(var.rds_secrets)
}

My Secrets Manager module on GitHub walks through how to pass the database credentials into Terraform using the CLI. It’s as simple as running the following command from your terminal:

terraform apply -auto-approve -var 'rds_secrets={ username = "saintsxctfprod", password = "XXX" }'
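For that command to work, the module also needs a matching variable declaration. A minimal sketch of what it could look like:

variable "rds_secrets" {
  description = "MySQL username and password for the SaintsXCTF RDS instance"
  type        = map(string)
}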

A consequence of my AWS Lambda function living inside my application's VPC (so it can connect to my RDS instance) is that it has no access to the internet. This is a problem when using the AWS SDK, which normally needs an internet connection to reach other AWS services. Fortunately, Amazon provides a solution to this problem called VPC endpoints. VPC endpoints provide access to other AWS services without the need for an internet connection. I use them to connect to Secrets Manager and S3 from my lambda function.

The following IaC creates my two VPC endpoints:

locals {
  public_cidr = "0.0.0.0/0"
}

#-------------------
# Existing Resources
#-------------------

data "aws_vpc" "saints-xctf-com-vpc" {
  tags = {
    Name = "saints-xctf-com-vpc"
  }
}

data "aws_subnet" "saints-xctf-com-vpc-public-subnet-0" {
  tags = {
    Name = "saints-xctf-com-lisag-public-subnet"
  }
}

data "aws_subnet" "saints-xctf-com-vpc-public-subnet-1" {
  tags = {
    Name = "saints-xctf-com-megank-public-subnet"
  }
}

data "aws_route_table" "saints-xctf-com-route-table-public" {
  tags = {
    Name = "saints-xctf-com-vpc-public-subnet-rt"
  }
}

#----------------------------------
# SaintsXCTF VPC Endpoint Resources
#----------------------------------

resource "aws_vpc_endpoint" "saints-xctf-secrets-manager-vpc-endpoint" {
  vpc_id            = data.aws_vpc.saints-xctf-com-vpc.id
  service_name      = "com.amazonaws.us-east-1.secretsmanager"
  vpc_endpoint_type = "Interface"

  subnet_ids = [
    data.aws_subnet.saints-xctf-com-vpc-public-subnet-0.id,
    data.aws_subnet.saints-xctf-com-vpc-public-subnet-1.id
  ]

  security_group_ids  = [module.vpc-endpoint-security-group.security_group_id[0]]
  private_dns_enabled = true
}

resource "aws_vpc_endpoint" "saints-xctf-s3-vpc-endpoint" {
  vpc_id            = data.aws_vpc.saints-xctf-com-vpc.id
  service_name      = "com.amazonaws.us-east-1.s3"
  vpc_endpoint_type = "Gateway"

  route_table_ids = [
    data.aws_route_table.saints-xctf-com-route-table-public.id
  ]
}

After building my new AWS Lambda, S3, Secrets Manager, and VPC Endpoint resources with Terraform, I’m ready to test out the lambda function. If you want to implement your own version of this lambda function, you will have to tweak the code on GitHub to match your application needs.

When navigating to the AWS Lambda service page in the AWS Console, I can view the code uploaded in my new lambda function.

There are multiple options for invoking the AWS Lambda function. One option is to test the function directly in the AWS Console. Another is to invoke it programmatically using the AWS CLI or SDKs. I could enhance the lambda function by placing it behind API Gateway. Or I can just wait until 7:00am UTC for the function to be invoked via the CloudWatch event.
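For example, a manual invocation of the production function with the AWS CLI would look something like this (the function name comes from the Terraform configuration above):

# Invoke the backup function and write its response to a local file.
aws lambda invoke \
    --function-name SaintsXCTFMySQLBackupPROD \
    --region us-east-1 \
    response.json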

After testing the function a few times manually, I checked my S3 bucket and saw the backup.sql file as expected:

The next morning, I checked the version history of the backup.sql file and saw it was updated at 3:00am EDT (7:00am UTC) as expected:

From here I can download the backup file, test it on a local MySQL instance, or run it on my RDS instance to restore the database contents.
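As a rough sketch of that process (the local connection details and password are placeholders):

# Download the latest backup from the S3 bucket.
aws s3 cp s3://saints-xctf-db-backups-prod/backup.sql backup.sql

# Restore the dump into a MySQL instance.  The saintsxctf database must already exist,
# since the dump was created without the --databases flag.
export MYSQL_PWD="XXX"
mysql --host localhost --user saintsxctfprod saintsxctf < backup.sql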

Creating an AWS Lambda function that handles RDS MySQL backups involved a lot of moving parts and was more complex than I initially anticipated. The biggest hurdle was getting the function to connect to an RDS instance living in a private subnet while simultaneously granting it access to S3 and Secrets Manager. You can view the code from this discovery post along with the rest of my application infrastructure on GitHub.

[1] "AWS Lambda Runtimes", https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html

[2] "AWS Lambda: Internet and Service Access for VPC-Connected Functions", https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html#vpc-internet

[3] "VPC Endpoints", https://docs.aws.amazon.com/vpc/latest/userguide/vpc-endpoints.html

[4] "AWS Lambda: Execution Role and User Permissions", https://docs.aws.amazon.com/lambda/latest/dg/configuration-vpc.html#vpc-permissions

[5] "Use terraform to set up a lambda function triggered by a scheduled event source", https://stackoverflow.com/a/35895316