Of all the amazing announcements from re:Invent 2023, I was probably most excited about the new scaling capabilities announced for Amazon Redshift. As a long-time Redshift user, perhaps even the debatable customer #1, I'm always eager to learn about new and exciting ways we can use this service.
Serverless has long been a direction for AWS service offerings, and it was brought to Redshift last year with its first serverless release. Redshift Serverless V1 was definitely an amazing step forward for the platform. I've implemented it at many organizations, either to supplement existing RA3 infrastructure or alone as the primary warehouse engine. For workloads that are "spiky", or that have a low duty cycle overall, we've been able to drive amazing price performance.
In terms of price performance, one of the most important parameters for V1 Redshift Serverless is the base RPU (Redshift Processing Unit) allocation. This parameter controls the initial "warm" Redshift capacity that is ready for work. As you can guess, you want this number to be as low as possible to maintain a low baseline cost profile, but that is in tension with keeping latency low while the service autoscales.
💸 IMPORTANT TIP 💸 The default base RPU is 128. When experimenting, be sure to turn base RPU way down or you might find a surprise in your AWS bill.
The main thing to consider here is that the autoscaling is primarily for query concurrency. When it comes to large workloads like ELT jobs, the throughput is limited by the base RPU setting. So if you throw a large workload at an undersized serverless cluster, you will not get the performance you expect. If you maintain too high a base RPU, you will be wastefully reserving capacity.
The great thing is that since Redshift infrastructure can be manipulated through the API, we can do all sorts of creative things, like temporarily increasing the RPU setting before our jobs or even spinning up an ephemeral cluster. Luckily, the new scaling options make things much easier now.
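The "temporarily increase the RPU setting" trick mentioned above could look roughly like the sketch below. It is not taken from any production setup: the helper names, the workgroup name, and the 8 RPU idle value are made up for illustration. It assumes a boto3 `redshift-serverless` client, whose `update_workgroup` and `get_workgroup` calls are used here, and it assumes the workgroup needs to report `AVAILABLE` again before it is safe to run the job or change the setting back.

```python
import time
from contextlib import contextmanager


def wait_until_available(client, workgroup_name, poll_seconds=15, timeout_seconds=900):
    """Poll the workgroup until its status is AVAILABLE, or raise on timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_workgroup(workgroupName=workgroup_name)
        status = response["workgroup"]["status"]
        if status == "AVAILABLE":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{workgroup_name} is still {status}")
        time.sleep(poll_seconds)


@contextmanager
def temporary_base_rpu(client, workgroup_name, job_rpu, idle_rpu, poll_seconds=15):
    """Raise base capacity for the duration of a job, then restore it."""
    client.update_workgroup(workgroupName=workgroup_name, baseCapacity=job_rpu)
    wait_until_available(client, workgroup_name, poll_seconds)
    try:
        yield
    finally:
        # Restore the low baseline even if the job raised an exception.
        client.update_workgroup(workgroupName=workgroup_name, baseCapacity=idle_rpu)
        wait_until_available(client, workgroup_name, poll_seconds)


# Hypothetical usage (workgroup name and RPU values are placeholders):
#
#   import boto3
#   client = boto3.client("redshift-serverless")
#   with temporary_base_rpu(client, "my-workgroup", job_rpu=128, idle_rpu=8):
#       run_elt_job()  # your own job runner
```

Wrapping the change in a context manager keeps the "turn it back down" step from being forgotten when a job fails, which is exactly the situation where an oversized base RPU would otherwise linger.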
AI Scaling
With the latest release, Redshift Serverless now uses ML to estimate the cluster size required to process a submitted workload. Instead of a base RPU setting, you control a price performance ratio.
I decided to test out this service using a larger dataset of clickstream data and a troublesome ELT step that I recently refactored. The incremental data is about 25 million rows, and the ELT step is essentially session-izing and creating user/date level aggregates. We'd found that a 128 RPU V1 cluster was providing adequate price performance, with an average run time of just over 4 minutes.
So let's test drive a preview cluster!
I created a new preview cluster, accepting the default 50% price performance ratio, and ran an incremental workload. Starting from zero, the cluster hit a max RPU of 128, completed the workload in 3.5 minutes, and then settled back to zero.
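To put the two run times in cost terms, here is a rough back-of-the-envelope sketch. Several things in it are assumptions rather than measurements: the price per RPU-hour is a placeholder (check the pricing page for your region), the 60-second minimum charge is how I understand serverless billing to work, "just over 4 minutes" is rounded to 240 seconds, and the preview run is treated as if it held 128 RPU for the full 3.5 minutes, which makes that figure an upper bound since 128 was only the peak.

```python
PRICE_PER_RPU_HOUR = 0.375  # placeholder USD price, not an official quote


def rpu_hours(rpu, seconds, minimum_seconds=60):
    """RPU-hours consumed by a run, applying an assumed minimum charge."""
    billed_seconds = max(seconds, minimum_seconds)
    return rpu * billed_seconds / 3600


def run_cost(rpu, seconds, price_per_rpu_hour=PRICE_PER_RPU_HOUR):
    """Approximate cost of one run in the currency of the given price."""
    return rpu_hours(rpu, seconds) * price_per_rpu_hour


# V1 baseline: 128 RPU for roughly 4 minutes.
v1_cost = run_cost(128, 4 * 60)

# Preview run: upper bound, assuming the 128 RPU peak for all 3.5 minutes.
preview_cost_upper_bound = run_cost(128, 3.5 * 60)

print(f"V1 run:      ~${v1_cost:.2f}")
print(f"Preview run: <= ${preview_cost_upper_bound:.2f}")
```

Under these assumptions the per-run cost simply tracks the run time, so the more interesting number will be the actual RPU-seconds the preview cluster reports once it has ramped up and down.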
Conclusion
This initial test is really promising, and in the coming month I plan to do additional testing with larger workloads, as well as layering on some smaller concurrent queries.
Overall, this is a great step forward for Redshift. It will enable some really interesting topologies that mix serverless and provisioned RA3 clusters to optimize for just about any workload, especially when leveraging the new multi-cluster writes (preview).
Cover photo by Ivan Kovac