Introduction
AWS RDS Data API is a service that lets you use relational databases without setting up a direct SQL connection. It seems like a useful option, especially for serverless applications, where connections would otherwise be created inside short-lived Lambda containers.
With RDS Data API, we don't need to open and close connections ourselves. AWS manages this process (AppSync uses the same mechanism inside its RDS resolvers).
Goal
I plan to build a simple HTTP API endpoint with a Lambda function as its integration. The application code will be written in TypeScript. For the database, I will use an Aurora Serverless v2 cluster.
The Lambda function will call the Aurora database using RDS Data API.
Project
The code is available in the GitHub repo.
Infrastructure
Let's create the AWS resources. I am using AWS CDK to define the infrastructure.
```typescript
import * as cdk from "aws-cdk-lib";
import { Construct } from "constructs";
import * as secretsmanager from "aws-cdk-lib/aws-secretsmanager";
import * as rds from "aws-cdk-lib/aws-rds";
import * as nodeLambda from "aws-cdk-lib/aws-lambda-nodejs";
import * as apigatewayv2 from "aws-cdk-lib/aws-apigatewayv2";
import { HttpLambdaIntegration } from "aws-cdk-lib/aws-apigatewayv2-integrations";

export class RdsDataNodejsStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    // VPC for the Aurora cluster; the Lambda function itself is not placed in it
    const vpc = new cdk.aws_ec2.Vpc(this, "Vpc", {
      maxAzs: 2,
    });

    // Database credentials: the username is fixed, the password is generated
    const dbSecret = new secretsmanager.Secret(this, "Secret", {
      generateSecretString: {
        secretStringTemplate: JSON.stringify({ username: "master" }),
        generateStringKey: "password",
        excludePunctuation: true,
        includeSpace: false,
      },
    });

    // Aurora PostgreSQL Serverless v2 cluster with the Data API enabled
    const cluster = new rds.DatabaseCluster(this, "Database", {
      engine: rds.DatabaseClusterEngine.auroraPostgres({
        version: rds.AuroraPostgresEngineVersion.VER_16_4,
      }),
      writer: rds.ClusterInstance.serverlessV2("writerInstance"),
      vpc,
      credentials: rds.Credentials.fromSecret(dbSecret),
      enableDataApi: true,
      serverlessV2MaxCapacity: 6,
      serverlessV2MinCapacity: 0.5,
      defaultDatabaseName: "postgres",
    });

    const rdsAPIFunction = new nodeLambda.NodejsFunction(
      this,
      "RdsAPIFunction",
      {
        runtime: cdk.aws_lambda.Runtime.NODEJS_20_X,
        entry: "lambda/handlers/getItem.ts", // Path to the Lambda function code
        handler: "handler", // Exported handler function name
        tracing: cdk.aws_lambda.Tracing.ACTIVE, // Enable X-Ray tracing
        environment: {
          DB_SECRET_ARN: dbSecret.secretArn,
          DB_CLUSTER_ARN: cluster.clusterArn,
          DB_NAME: "postgres",
          POWERTOOLS_SERVICE_NAME: "getItemService",
        },
        bundling: {
          minify: true,
          sourceMap: true,
          keepNames: true,
          format: nodeLambda.OutputFormat.ESM,
          sourcesContent: true,
          mainFields: ["module", "main"],
          externalModules: [], // we bundle all the dependencies
          esbuildArgs: {
            "--tree-shaking": "true",
          },
          // We include this polyfill to support `require` in ESM due to AWS X-Ray SDK for Node.js not being ESM compatible
          banner:
            'import { createRequire } from "module";const require = createRequire(import.meta.url);',
        },
      }
    );

    // Allow the function to call the Data API and read the database secret
    cluster.grantDataApiAccess(rdsAPIFunction);
    dbSecret.grantRead(rdsAPIFunction);

    const itemsIntegration = new HttpLambdaIntegration(
      "ItemsIntegration",
      rdsAPIFunction
    );
    const httpApi = new apigatewayv2.HttpApi(this, "ItemsApi");
    httpApi.addRoutes({
      path: "/items/{id}",
      methods: [apigatewayv2.HttpMethod.GET],
      integration: itemsIntegration,
    });
  }
}
```
Application code
Since we are not opening a SQL connection directly, the only thing we need to initialize outside the handler is the SDK client for rds-data.
I use Lambda Powertools to create the tracer.
```typescript
import {
  RDSDataClient,
  ExecuteStatementCommand,
  ExecuteStatementCommandInput,
} from "@aws-sdk/client-rds-data";
import { APIGatewayProxyEventV2, APIGatewayProxyResultV2 } from "aws-lambda";
import { Tracer } from "@aws-lambda-powertools/tracer";
import { captureLambdaHandler } from "@aws-lambda-powertools/tracer/middleware";
import middy from "@middy/core";

// Values injected by the CDK stack
const dbClusterArn = process.env.DB_CLUSTER_ARN;
const secretArn = process.env.DB_SECRET_ARN;
const databaseName = process.env.DB_NAME;
const TABLE = "items";

type Item = {
  id: number;
  name: string;
  description: string;
  price: number;
  image: string;
};

const tracer = new Tracer({ serviceName: "getItemFunction" });
// Created once per container and reused across invocations
const rdsClient = tracer.captureAWSv3Client(new RDSDataClient());
//...
```
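The values read from process.env are typed string | undefined, so a missing variable only shows up later as a failed Data API call. A minimal sketch of a fail-fast check at init time; the requireEnv helper is hypothetical (not part of Powertools or the AWS SDK) and takes the environment as a parameter so it can run outside Lambda:

```typescript
// Hypothetical helper (not from the original project): fail fast at cold start
// if a required environment variable is missing or empty.
export function requireEnv(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name];
  if (value === undefined || value === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example with a stand-in environment object instead of process.env
const fakeEnv = { DB_NAME: "postgres", DB_SECRET_ARN: "" };
console.log(requireEnv(fakeEnv, "DB_NAME")); // prints "postgres"
```

In the function module this would look like const dbClusterArn = requireEnv(process.env, "DB_CLUSTER_ARN");, so a misconfigured deployment fails during init instead of on the first request.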
In the handler, I get the id from the path parameter and use it as a parameter for the SQL SELECT query.
RDS Data API has some limitations compared to a traditional SQL connection. However, it fits well in a scenario with simple queries, a limited size of returned data, and the AWS Lambda context.
I don't read the DB credentials from Secrets Manager in the function code. Instead, I pass the secret's ARN with the rds-data call.
```typescript
// ...
export const lambdaHandler = async (
  request: APIGatewayProxyEventV2
): Promise<APIGatewayProxyResultV2> => {
  try {
    const id = request.pathParameters?.id;
    console.log(`id: ${id}`);

    // Reject missing or non-integer ids before calling the database:
    // Number("abc") is NaN, which is not a valid longValue
    const itemId = Number(id);
    if (!id || !Number.isSafeInteger(itemId)) {
      return {
        statusCode: 400,
        body: JSON.stringify({ error: "Missing or invalid 'id' parameter" }),
      };
    }

    // :id is bound through `parameters`, not interpolated into the SQL string
    const sql = `SELECT * FROM ${TABLE} WHERE id = :id`;
    const parameters = [{ name: "id", value: { longValue: itemId } }];
    const params: ExecuteStatementCommandInput = {
      secretArn: secretArn,
      resourceArn: dbClusterArn,
      sql: sql,
      database: databaseName,
      parameters: parameters,
    };
    const command = new ExecuteStatementCommand(params);
    const response = await rdsClient.send(command);

    // Records are positional: indexes follow the column order of the items table
    const items: Item[] = (response.records || []).map((record) => ({
      id: record[0].longValue as number,
      name: record[1].stringValue as string,
      description: record[2].stringValue as string,
      price: record[3].doubleValue as number,
      image: record[4].stringValue as string,
    }));
    return {
      statusCode: 200,
      body: JSON.stringify(items),
    };
  } catch (error) {
    console.error("Error executing query:", error);
    return {
      statusCode: 500,
      body: JSON.stringify({ error: "Internal server error" }),
    };
  }
};
```
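The handler maps columns by position, which depends on the column order returned by SELECT *; adding or reordering a column in the items table would silently shift the fields. A sketch of an alternative, assuming the command is sent with includeResultMetadata: true so the response also carries columnMetadata with column names. The Field and ColumnMetadata types here are simplified local stand-ins for the SDK types, and recordsToRows and rowToItem are hypothetical helper names:

```typescript
// Simplified stand-ins for the SDK's Field union and ColumnMetadata shape
type Field = {
  isNull?: boolean;
  longValue?: number;
  doubleValue?: number;
  stringValue?: string;
  booleanValue?: boolean;
};
type ColumnMetadata = { name?: string };
type Row = Record<string, string | number | boolean | null>;

type Item = {
  id: number;
  name: string;
  description: string;
  price: number;
  image: string;
};

// Unwraps whichever scalar member is set; SQL NULL becomes null
function fieldValue(field: Field): string | number | boolean | null {
  if (field.isNull) return null;
  return (
    field.longValue ??
    field.doubleValue ??
    field.stringValue ??
    field.booleanValue ??
    null
  );
}

// Turns positional records into objects keyed by column name
export function recordsToRows(records: Field[][], columns: ColumnMetadata[]): Row[] {
  return records.map((record) => {
    const row: Row = {};
    record.forEach((field, i) => {
      row[columns[i]?.name ?? `col${i}`] = fieldValue(field);
    });
    return row;
  });
}

// Builds an Item from named columns; nullable text columns default to ""
export function rowToItem(row: Row): Item {
  return {
    id: Number(row.id),
    name: String(row.name),
    description: String(row.description ?? ""),
    price: Number(row.price),
    image: String(row.image ?? ""),
  };
}
```

In the handler, this would replace the index-based mapping with recordsToRows(response.records ?? [], response.columnMetadata ?? []).map(rowToItem). Selecting explicit columns (SELECT id, name, description, price, image ...) is an even simpler way to pin the order.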
Finally, I use middy to wrap the handler with the tracer:
```typescript
//...
// Wrap the handler with middy
export const handler = middy(lambdaHandler)
  // Use the middleware by passing the Tracer instance as a parameter
  .use(captureLambdaHandler(tracer));
```
Testing
For testing purposes, I seeded some data into the DB (just a few rows in the items table).
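The seeding step itself isn't shown in the post. One way to do it without a SQL client is the Data API's BatchExecuteStatement, which runs a single parameterized statement once per parameter set. A sketch that builds those parameter sets, assuming an items table whose columns match the Item type and a DOUBLE PRECISION price column (so it comes back as doubleValue, as the handler expects); the SqlParameter type is a simplified local stand-in for the SDK type and the sample rows are made up:

```typescript
// Simplified stand-in for the SDK's SqlParameter shape
type SqlParameter = {
  name: string;
  value: { longValue?: number; doubleValue?: number; stringValue?: string };
};

type Item = { id: number; name: string; description: string; price: number; image: string };

// Assumed statement; the real table definition is not shown in the post
export const INSERT_SQL =
  "INSERT INTO items (id, name, description, price, image) " +
  "VALUES (:id, :name, :description, :price, :image)";

// One inner array per row: BatchExecuteStatement executes INSERT_SQL once for each
export function toParameterSets(items: Item[]): SqlParameter[][] {
  return items.map((item) => [
    { name: "id", value: { longValue: item.id } },
    { name: "name", value: { stringValue: item.name } },
    { name: "description", value: { stringValue: item.description } },
    { name: "price", value: { doubleValue: item.price } },
    { name: "image", value: { stringValue: item.image } },
  ]);
}

// Hypothetical sample data, not the rows used in the post
const sample: Item[] = [
  { id: 1, name: "Mug", description: "Ceramic mug", price: 12.5, image: "mug.png" },
  { id: 2, name: "Lamp", description: "Desk lamp", price: 39.9, image: "lamp.png" },
];
console.log(toParameterSets(sample).length); // prints 2
```

The result would be passed as parameterSets to BatchExecuteStatementCommand from @aws-sdk/client-rds-data, together with sql: INSERT_SQL and the same resourceArn, secretArn, and database values the handler uses.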
After deployment, the endpoint works for a single request.
But what is really interesting is how this solution would handle traffic spikes. The expectation is that all pieces (DB cluster, Lambda functions, and SQL connections via RDS Data API) would be able to scale up under some pressure.
I've created a simple scenario with k6:
```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';
import { randomIntBetween } from 'https://jslib.k6.io/k6-utils/1.2.0/index.js';

export const options = {
  stages: [
    { duration: '1m', target: 100 }, // ramp up to 100 VUs
    { duration: '1m', target: 500 }, // ramp up to 500 VUs
    { duration: '2m', target: 500 }, // hold at 500 VUs
    { duration: '1m', target: 0 },   // ramp down to 0 VUs
  ],
};

export default function () {
  // Randomly generate an ID between 1 and 4 for each request
  const id = randomIntBetween(1, 4);
  // Make a GET request to the endpoint with the random ID
  const res = http.get(`https://dcqnlv17sa.execute-api.us-east-1.amazonaws.com/items/${id}`);
  // Check if the response status is 200
  check(res, { 'status was 200': (r) => r.status === 200 });
  // Optional: sleep for a short period between requests
  sleep(1);
}
```
I will gradually increase the traffic from 0 to 100 users over the first minute, and then to 500 users over the next minute.
For the following 2 minutes, the traffic will stay at that level, and it will finally decrease to 0 users during the last minute.
Each virtual user sends one request and then sleeps for 1 second before sending the next one.
It is quite a basic scenario but should be enough to put our endpoint under some pressure.
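Because each virtual user loops over "one request, then sleep(1)", the total number of requests can be estimated from the area under the VU-over-time curve divided by the time of one iteration. A rough sketch of that arithmetic, assuming k6 ramps linearly between stage targets and treating the test as starting from 0 VUs (k6 may start from 1, which makes no practical difference here); vuSeconds and estimateRequests are hypothetical names:

```typescript
// A stage ramps linearly from the previous target to its own target
type Stage = { durationSec: number; target: number };

// Area under the VU-over-time curve in VU-seconds (one trapezoid per stage)
export function vuSeconds(stages: Stage[], startVUs = 0): number {
  let previous = startVUs;
  let total = 0;
  for (const { durationSec, target } of stages) {
    total += ((previous + target) / 2) * durationSec;
    previous = target;
  }
  return total;
}

// Each iteration is one request plus sleep(1), so requests ≈ VU-seconds / iteration time
export function estimateRequests(stages: Stage[], iterationSec: number): number {
  return Math.round(vuSeconds(stages) / iterationSec);
}

// The stages from the k6 script above, in seconds
const stages: Stage[] = [
  { durationSec: 60, target: 100 },
  { durationSec: 60, target: 500 },
  { durationSec: 120, target: 500 },
  { durationSec: 60, target: 0 },
];
console.log(vuSeconds(stages)); // prints 96000
```

With the 1 s sleep plus roughly 0.2 to 0.3 s per request, an iteration takes about 1.2 to 1.3 s, which gives an estimate of roughly 74k to 80k requests. Read the other way, the ~78.5k requests reported in the Results section correspond to an average iteration of about 96,000 / 78,500 ≈ 1.22 s.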
Results
During the test, around 78.5k requests were sent, and all of them were successful.
The p95 request duration was ~310 ms, which is mostly the time needed for networking (the endpoint is deployed in us-east-1 and I call it from Europe).
The maximum time is above 2 seconds, which is OK-ish, at least for the Node.js runtime. (If I need fast cold starts, I go for Rust or Go.)
From the Lambda perspective, I can confirm that there were no errors. At the peak, there were ~280 concurrent function executions.
As expected, the average function duration was below 100 ms most of the time.
Let's check how our DB cluster reacted to the traffic spike.
It scaled up to 6 ACU, which is the maximum defined in our stack.
RDS Data API opened 123 connections. Interestingly, this is significantly fewer than the maximum number of concurrently running Lambda functions (280).
Summary
RDS Data API looks like a valid option to consider when building a serverless application that reads data from Amazon RDS. It helps avoid the issues of managing open connections and simplifies connecting AWS Lambda to the database.
The function doesn't need to run inside the VPC, and it never handles the username and password stored in Secrets Manager directly.
It is worth remembering that RDS Data API has a cost associated with it. It also adds some overhead to getting data from the database and might not be suitable for applications that require real-time data.