TL;DR: This article covers the usage of DynamoDB BatchWrite and BatchGet operations, and how implementing them can help you improve efficiency by reducing the number of requests needed in your workload.
Introduction
Have you ever developed any type of workload that interacts with DynamoDB?
If so, you have probably encountered the requirement of retrieving or inserting multiple specific records, be it from a single DynamoDB table or from several of them.
This article aims to help with exactly that by providing all the resources and knowledge required to implement DynamoDB batch operations and, as a bonus, increase the efficiency of your current workloads.
What are Batch Operations?
Introduction
When talking about batch operations or batch processing, we refer to the action of aggregating a set of instructions into a single request so that they are executed all at once. In terms of interacting with DynamoDB, we can see it as sending a single request that allows us to retrieve or insert multiple records at once.
Common Bad practices
Continuing with the sample situation mentioned in the introduction, you may face the requirement of having to retrieve or store multiple records at once.
For that scenario, most junior developers might rely on looping over a set of keys and sending the GetItem requests in sequence, while a mid-level developer might propose parallelizing all those requests using, for example, a Promise.all; both approaches are flawed and won't scale well.
On one side, the for-loop will even be flagged by some linters (with rules like no-await-in-loop), as this implementation makes the execution time grow linearly with the number of records.
On the other side, the Promise.all approach will be a tad more efficient by parallelizing the requests, but under heavy workloads developers will end up facing issues like the maximum connection limit reached error.
Recommended Implementation
Now that we have gone over some bad practices and you have probably thought of a few projects that could be improved, we'll dive into how we can take the most advantage of batch operations.
DynamoDB offers two different batch operations, BatchGetItem and BatchWriteItem, which we will take a look at as part of this article.
There is also BatchExecuteStatement for those using PartiQL, but we will leave that one for a future article covering PartiQL in detail.
BatchGetItem
This operation type allows us to aggregate up to the equivalent of 100 GetItem requests in a single request, meaning that we can retrieve up to 100 records or 16 MB of data from one or multiple tables at once.
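As a reference, a minimal BatchGetItem sketch with the AWS SDK for JavaScript v3 DocumentClient could look like the following; the table names and the productId key are assumptions for illustration:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, BatchGetCommand } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Retrieve the same set of keys from two tables with a single request.
async function batchGetProducts(productIds: string[]) {
  const { Responses = {}, UnprocessedKeys } = await docClient.send(
    new BatchGetCommand({
      RequestItems: {
        ProductsTable: { Keys: productIds.map((productId) => ({ productId })) },
        StockTable: { Keys: productIds.map((productId) => ({ productId })) },
      },
    })
  );

  // Responses is keyed by table name; UnprocessedKeys must be retried if present.
  return {
    products: Responses.ProductsTable ?? [],
    stock: Responses.StockTable ?? [],
    UnprocessedKeys,
  };
}
```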
BatchWriteItem
💡PutRequests will overwrite any existing records with the provided keys.
This operation, even though only write appears in its name, allows us to aggregate up to 25 PutItem and DeleteItem operations in a single request.
Similar to the previous option, we'll still be limited by the 16 MB maximum, but we can theoretically replace 25 sequential or parallel requests with a single one.
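A minimal BatchWriteItem sketch mixing PutRequest and DeleteRequest entries might look like this, again with hypothetical table and attribute names:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, BatchWriteCommand } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Combine PutRequest and DeleteRequest entries (up to 25 in total) in one call.
async function batchWriteProducts() {
  const { UnprocessedItems } = await docClient.send(
    new BatchWriteCommand({
      RequestItems: {
        ProductsTable: [
          { PutRequest: { Item: { productId: "p-1", name: "Keyboard" } } },
          { PutRequest: { Item: { productId: "p-2", name: "Mouse" } } },
          { DeleteRequest: { Key: { productId: "p-3" } } },
        ],
      },
    })
  );

  // Anything DynamoDB could not process comes back in UnprocessedItems.
  return UnprocessedItems;
}
```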
Pagination for Batch operations
Pagination only applies to the 16 MB limit; if a request doesn't respect the 100-record read or 25-record write limit, DynamoDB will throw a ValidationException instead.
Similar to the Scan and Query operations, using any of the above Batch*Item operations can result in a scenario where the 16 MB maximum is reached and some form of pagination is required.
For Batch* operations this comes in the form of the UnprocessedKeys attribute (UnprocessedItems for BatchWriteItem) that can be part of the response.
Developers are expected to check for this attribute in the response and, if desired, implement a recursive function to process the remaining keys automatically.
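As an illustration, a recursive BatchGetItem helper could look like the following sketch; in a production setting you would likely add exponential backoff between retries:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import {
  DynamoDBDocumentClient,
  BatchGetCommand,
  BatchGetCommandInput,
} from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Recursively retries whatever comes back in UnprocessedKeys until nothing is left.
async function batchGetAll(
  requestItems: BatchGetCommandInput["RequestItems"],
  accumulated: Record<string, Record<string, unknown>[]> = {}
): Promise<Record<string, Record<string, unknown>[]>> {
  const { Responses = {}, UnprocessedKeys } = await docClient.send(
    new BatchGetCommand({ RequestItems: requestItems })
  );

  // Merge this round's results into the accumulator, keyed by table name.
  for (const [table, items] of Object.entries(Responses)) {
    accumulated[table] = [...(accumulated[table] ?? []), ...items];
  }

  // If DynamoDB returned unprocessed keys, call ourselves again with just those.
  if (UnprocessedKeys && Object.keys(UnprocessedKeys).length > 0) {
    return batchGetAll(UnprocessedKeys, accumulated);
  }
  return accumulated;
}
```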
Full examples for retrieving, inserting, and deleting records using batch operations, with a recursive implementation to automatically handle the UnprocessedKeys, can be found here.
Real-world Use Cases
Now that we are aware of all the options and limitations regarding how we can process records in batch in DynamoDB, let's look at some scenarios that showcase real-life improvements.
Scenario 1: Retrieving Data from Multi-table Design Architecture
For this first scenario, let's imagine we are looking to improve the performance of a REST API that, given an array of productId values, returns the list of desired product details with their respective stock and exact warehouse location. The data is stored in multiple tables, one for each data model (products, stock tracking, and warehouse product location).
Before
The initial implementation was developed with a for-loop that goes over all the provided productIds and sequentially retrieves all the required data from the different tables.
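A sketch of what such an implementation typically looks like is shown below; the table names, the productId key, and the merge into a single object are assumptions for illustration:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// "Before": one loop iteration per product, three sequential GetItem requests each.
async function getProductDetailsSequentially(productIds: string[]) {
  const results = [];
  for (const productId of productIds) {
    const product = await docClient.send(
      new GetCommand({ TableName: "ProductsTable", Key: { productId } })
    );
    const stock = await docClient.send(
      new GetCommand({ TableName: "StockTable", Key: { productId } })
    );
    const location = await docClient.send(
      new GetCommand({ TableName: "WarehouseLocationTable", Key: { productId } })
    );
    results.push({ ...product.Item, ...stock.Item, ...location.Item });
  }
  return results;
}
```

With this approach, 10 products already mean 30 sequential round trips to DynamoDB.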
After
From that initial implementation, you should be able to detect two distinct flaws:
1. no-await-in-loop - There is a loop with asynchronous operations inside, which is usually a bad practice, as all the operations for a given iteration need to be completed before the next one can start.
2. Sequential await getItem requests - This is also a bad practice, as the three operations are independent from each other and we'd ideally not want them to be blocked by each other.
A better approach would look something like this (a sketch follows the list):
1. Input Validation - Set a limit on the maximum number of items that can be requested, to avoid requiring parallel BatchGetItem requests. For example, a maximum of 100 items per BatchGetItem request with every product requiring 3 GetItem requests means that a single BatchGetItem request can retrieve up to 33 product details.
2. Build Payloads - a helper function will be needed to programmatically build the required payload for the BatchGetItem operation, taking into consideration the different tables that need to be accessed for each product ID.
3. Recursive BatchGetItem - a helper function that recursively calls itself to ensure that all UnprocessedKeys are retried.
4. Response Parsing - a helper function that transforms the BatchGetItem response into the schema that the consumers of this API expect.
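A sketch of how those four steps could fit together is below; the table names, the productId key, and the flat merge used for response parsing are assumptions, and a production version would add backoff when retrying UnprocessedKeys:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import {
  DynamoDBDocumentClient,
  BatchGetCommand,
  BatchGetCommandInput,
} from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLES = ["ProductsTable", "StockTable", "WarehouseLocationTable"];
const MAX_PRODUCTS_PER_BATCH = 33; // 3 GetItem equivalents per product, 100 keys max

// Steps 1 and 2: validate the input size and build the multi-table payload.
function buildRequestItems(productIds: string[]): BatchGetCommandInput["RequestItems"] {
  if (productIds.length > MAX_PRODUCTS_PER_BATCH) {
    throw new Error(`A maximum of ${MAX_PRODUCTS_PER_BATCH} products can be requested at once`);
  }
  return Object.fromEntries(
    TABLES.map((table) => [table, { Keys: productIds.map((productId) => ({ productId })) }])
  );
}

// Step 3: recursive BatchGetItem that retries UnprocessedKeys (same pattern as above).
async function batchGetAll(
  requestItems: BatchGetCommandInput["RequestItems"],
  accumulated: Record<string, Record<string, unknown>[]> = {}
): Promise<Record<string, Record<string, unknown>[]>> {
  const { Responses = {}, UnprocessedKeys } = await docClient.send(
    new BatchGetCommand({ RequestItems: requestItems })
  );
  for (const [table, items] of Object.entries(Responses)) {
    accumulated[table] = [...(accumulated[table] ?? []), ...items];
  }
  return UnprocessedKeys && Object.keys(UnprocessedKeys).length > 0
    ? batchGetAll(UnprocessedKeys, accumulated)
    : accumulated;
}

// Step 4: parse the per-table responses into the shape the API consumers expect.
export async function getProductDetails(productIds: string[]) {
  const byTable = await batchGetAll(buildRequestItems(productIds));
  return productIds.map((productId) => {
    const matches = TABLES.map((table) =>
      (byTable[table] ?? []).find((item) => item.productId === productId)
    );
    return Object.assign({ productId }, ...matches);
  });
}
```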
Applying all these changes should significantly increase the efficiency and performance of the API.
Scenario 2: Inserting Data in a Single-table Design Architecture
The second scenario implies a DynamoDB single-table design architecture where we have a single table storing all the information needed for a dashboard that analyzes racehorses' historical data. Records such as basic horse information, performance statistics, and race results are stored in the same table.
Before
Similar to the first scenario, we can see that the initial implementation is based on a set of sequential PutItem requests.
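The shape of that implementation is roughly the following; the RacehorsesTable name and record shape are hypothetical:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// "Before": one awaited PutItem request per record, executed in sequence.
async function saveHorseRecordsSequentially(records: Record<string, string | number>[]) {
  for (const record of records) {
    await docClient.send(
      new PutCommand({ TableName: "RacehorsesTable", Item: record })
    );
  }
}
```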
After
From that initial implementation, you should be able to detect two distinct flaws:
1. no-await-in-loop - There is a loop with asynchronous operations inside, which is usually a bad practice, as all the operations for a given iteration need to be completed before the next one can start.
2. Sequential await putItem requests - This is also a bad practice, as the operations are independent from each other and we'd ideally not want them to be blocked by each other.
A better approach would look something like this (a sketch follows the list):
1. Build Payloads - a helper function will be needed to programmatically build the required payload for the BatchWriteItem operations, taking into consideration the 25-record limit and the different record types that need to be stored for each horse.
2. Recursive BatchWriteItem - a helper function that recursively calls itself to ensure that all UnprocessedItems are retried.
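A sketch of how those two helpers could be combined is below; the RacehorsesTable name and record shape are assumptions, and a production version would add exponential backoff between UnprocessedItems retries:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import {
  DynamoDBDocumentClient,
  BatchWriteCommand,
  BatchWriteCommandInput,
} from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const TABLE_NAME = "RacehorsesTable"; // hypothetical single-table name
const MAX_BATCH_WRITE_ITEMS = 25;

// Build Payloads: split the records into chunks of 25 PutRequests each.
function buildWriteRequests(records: Record<string, string | number>[]) {
  const chunks: BatchWriteCommandInput["RequestItems"][] = [];
  for (let i = 0; i < records.length; i += MAX_BATCH_WRITE_ITEMS) {
    chunks.push({
      [TABLE_NAME]: records
        .slice(i, i + MAX_BATCH_WRITE_ITEMS)
        .map((record) => ({ PutRequest: { Item: record } })),
    });
  }
  return chunks;
}

// Recursive BatchWriteItem: retry whatever comes back in UnprocessedItems.
async function batchWriteAll(requestItems: BatchWriteCommandInput["RequestItems"]): Promise<void> {
  const { UnprocessedItems } = await docClient.send(
    new BatchWriteCommand({ RequestItems: requestItems })
  );
  if (UnprocessedItems && Object.keys(UnprocessedItems).length > 0) {
    // A production version should add exponential backoff here.
    await batchWriteAll(UnprocessedItems);
  }
}

export async function saveHorseRecords(records: Record<string, string | number>[]) {
  // Chunks are sent one after the other to respect the 25-record limit per request.
  for (const chunk of buildWriteRequests(records)) {
    await batchWriteAll(chunk);
  }
}
```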
Applying all these changes should significantly reduce the required time to upload all information.
Conclusion
Utilizing batch operations in DynamoDB is a powerful strategy to optimize your database interactions. By aggregating multiple requests into a single operation, you can improve performance, reduce latency, and manage resources more effectively. Whether you're dealing with multi-table architectures or single-table designs, batch operations offer a scalable solution to handle large volumes of data efficiently. As you continue to work with DynamoDB, consider integrating batch operations into your workflows to maximize the potential of your applications.
Recap of key points
BatchGetItem can retrieve up to 100 records or 16 MB of data in a single request.
BatchWriteItem can be used to insert or delete up to 25 records or 16 MB of data in a single request.
Using Batch* operations can help you reduce the execution time considerably by aggregating requests that were previously being made in sequence or in parallel.