Pixel Federation Powers Mobile Analytics Platform with WarpStream, Saves 83% over MSK

Shawn Gordon - Jun 11 - - Dev Community

by Caleb Grillo

Case Study

Pixel Federation is the developer of nearly a dozen highly popular mobile games with players from all over the world. They have millions of monthly active users, and those millions of users generate lots of events. In fact, Pixel Federation uses an event-driven architecture for almost everything: logging, events, billing, tracking game state, etc.

TrainStation2 by Pixel Federation

Like many other companies, Pixel Federation initially chose Apache Kafka as the message bus to power all of its real-time data streaming infrastructure. Instead of running open-source Kafka themselves, they started with AWS’s managed Kafka offering: MSK.

Initially, things worked great: developers found that instrumenting their applications to emit new events to Kafka was easy, and once other teams at the company realized how easy it was to tap into the flow of real-time data, they started consuming the data as well.

Before they knew it, Pixel Federation’s Kafka cluster had thousands of topics and more than forty consumer applications, and was being accessed by Kafka client libraries in four different languages. It’s no exaggeration to say that Kafka was the beating heart of Pixel Federation’s data infrastructure.

Unfortunately, this is also when they started to run into problems with their MSK setup. The first problem was that their bill was growing much faster than their actual data volumes, because they had so many topics. MSK requires that Kafka brokers be upgraded to larger and larger VMs as the number of topic-partitions increases, even if data volumes remain flat.
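To make the effect concrete, here is a minimal sketch of how partition count alone can force a larger broker size. The per-broker partition ceilings, the size names, and the topic counts are illustrative assumptions, not figures from Pixel Federation or a quote of AWS limits; check current AWS guidance before sizing a real cluster.

```python
import math

# Hypothetical per-broker partition ceilings by instance size, smallest first.
# Illustrative values only; consult current AWS guidance for real limits.
ASSUMED_PARTITION_LIMITS = [
    ("large", 1000),
    ("2xlarge", 2000),
    ("4xlarge", 4000),
]


def smallest_broker_size(topics, partitions_per_topic, replication_factor, brokers):
    """Return (partition replicas per broker, smallest size whose assumed limit fits)."""
    replicas = topics * partitions_per_topic * replication_factor
    per_broker = math.ceil(replicas / brokers)
    for size, limit in ASSUMED_PARTITION_LIMITS:
        if per_broker <= limit:
            return per_broker, size
    return per_broker, None  # no listed size fits; more brokers would be needed


# Data volume is not an input at all: only the topic count differs.
few_topics = smallest_broker_size(
    topics=200, partitions_per_topic=3, replication_factor=3, brokers=3
)
many_topics = smallest_broker_size(
    topics=1000, partitions_per_topic=3, replication_factor=3, brokers=3
)
print(few_topics, many_topics)
```

Under these assumed limits, going from 200 to 1,000 topics moves a three-broker cluster from the smallest size to the largest one listed, with no change in throughput.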

The second issue was networking. Like many organizations, Pixel Federation has a complex production environment with different VPCs and AWS accounts. This works great for isolating teams, enforcing security boundaries, and minimizing blast radius, but sometimes data in Kafka needs to be shared across network boundaries. For example, Pixel Federation’s game servers run in a completely different AWS account / VPC than their Flink consumers.

This meant that they had to peer their VPCs so that the MSK cluster in VPC1 could be connected to VPC2. If you’ve ever had to set up VPC peering before, you know just how difficult and burdensome it can be. MSK does offer an alternative using their multi-VPC private connectivity feature, but it adds an extra $0.006 / GiB of data transferred. In addition, Pixel Federation had to pay for inter-zone networking for all the traffic between their producers and the MSK brokers, as well as for the traffic between MSK and their consumers. Their average read amplification was 4x, so this resulted in a lot of inter-zone networking fees.
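As a rough illustration of how read amplification multiplies these fees, the sketch below estimates monthly client-to-broker networking cost. Only the $0.006 / GiB multi-VPC fee and the 4x read amplification come from the text above. The inter-zone rate, the share of traffic that crosses zones, the monthly volume, and the choice to treat all consumer reads as cross-VPC are assumptions made for the example.

```python
MULTI_VPC_USD_PER_GIB = 0.006    # from the text above
INTER_ZONE_USD_PER_GIB = 0.02    # assumed: $0.01 charged on each side of a cross-zone transfer
CROSS_ZONE_FRACTION = 2 / 3      # assumed: three zones, clients spread evenly across brokers


def monthly_network_cost(produced_gib, read_amplification, cross_vpc_gib=0.0):
    """Estimate monthly networking cost in USD between clients and brokers.

    produced_gib:       data written by producers per month
    read_amplification: how many times each written byte is read by consumers
    cross_vpc_gib:      data that also passes through multi-VPC private connectivity
    """
    transferred = produced_gib * (1 + read_amplification)  # writes plus all reads
    inter_zone = transferred * CROSS_ZONE_FRACTION * INTER_ZONE_USD_PER_GIB
    multi_vpc = cross_vpc_gib * MULTI_VPC_USD_PER_GIB
    return inter_zone + multi_vpc


produced = 10_000  # hypothetical monthly volume in GiB, not a Pixel Federation figure
base = monthly_network_cost(produced, read_amplification=4)
# Assume every consumer read crosses the VPC boundary (4 x 10,000 GiB).
with_multi_vpc = monthly_network_cost(produced, read_amplification=4, cross_vpc_gib=4 * produced)
print(f"inter-zone only: ${base:,.2f}  with multi-VPC: ${with_multi_vpc:,.2f}")
```

The point of the sketch is the `1 + read_amplification` factor: with 4x read amplification, every GiB produced is billed as five GiB of client traffic.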

When they migrated to WarpStream, Pixel Federation took advantage of WarpStream’s Agent Groups functionality to deploy a much more cost-effective architecture.

They run a group of Agents in the AWS account / VPC that contains their game servers (the data producers) and those Agents write data directly to an object storage bucket that is shared across both of their AWS accounts. In the second AWS account / VPC, they run a second group of Agents that can consume the data written in the other account via the shared object store. In effect, they use a shared object storage bucket as both the storage layer and the networking layer to flex a single logical “Kafka” cluster across two different AWS accounts / VPCs.
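From a client's point of view, this layout means each application only needs to reach the Agents in its own VPC. The sketch below builds standard Kafka client properties (`bootstrap.servers`, `group.id`) for each side. The VPC labels, hostnames, port, and consumer group name are hypothetical placeholders, and nothing here is WarpStream-specific configuration.

```python
# Hypothetical bootstrap addresses: one per Agent group, each reachable only
# inside its own VPC. None of these names come from the case study.
AGENT_GROUP_BOOTSTRAP = {
    "game-servers-vpc": "agents.game-servers.internal:9092",
    "analytics-vpc": "agents.analytics.internal:9092",
}


def kafka_client_config(vpc, consumer_group=None):
    """Build Kafka client properties that target the Agent group local to `vpc`."""
    config = {"bootstrap.servers": AGENT_GROUP_BOOTSTRAP[vpc]}
    if consumer_group is not None:
        config["group.id"] = consumer_group  # only consumers need a group id
    return config


# Producers (game servers) and consumers (Flink jobs) use different local
# endpoints but address the same logical cluster and topics.
producer_config = kafka_client_config("game-servers-vpc")
consumer_config = kafka_client_config("analytics-vpc", consumer_group="flink-events")
print(producer_config)
print(consumer_config)
```

Because neither side connects to an endpoint in the other VPC, no peering or private-link setup is needed between the two accounts; the shared bucket is the only resource both sides touch.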

This architecture is significantly more cost-effective than their previous MSK solution because they don’t have to pay for any EBS volumes or networking fees. In fact, before adopting WarpStream, Pixel Federation was spending more than $60,000/year on AWS MSK. By comparison, their total cost of ownership with WarpStream is < $10,000/year, a 6x (roughly 83%) reduction. That comes on top of all the additional benefits they got with the migration, like the ability to use WarpStream Agents to flex their cluster across multiple VPCs, seamless auto-scaling, and no more manual partition rebalancing to keep their brokers evenly loaded.
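The headline savings figure follows directly from these two numbers. A quick check, treating the quoted bounds ("more than $60,000" and "< $10,000") as exact values for the sake of the arithmetic:

```python
msk_annual_usd = 60_000         # quoted as "more than $60,000/year"
warpstream_annual_usd = 10_000  # quoted as "< $10,000/year"

savings_fraction = (msk_annual_usd - warpstream_annual_usd) / msk_annual_usd
cost_ratio = msk_annual_usd / warpstream_annual_usd

print(f"savings: {savings_fraction:.0%}, ratio: {cost_ratio:.0f}x")
# prints "savings: 83%, ratio: 6x"
```

Since the MSK figure is a lower bound and the WarpStream figure an upper bound, the actual savings would be at least this large.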

Adam Hamsik is the CEO and co-founder of Labyrinth Labs, an AWS partner that has been working with Pixel Federation for years, helping them manage their cloud infrastructure. He had this to say:

“We have been using Kafka in our application infrastructure for years, and I really liked its scalability and versatility, but in cloud environments, the cost of managed Kafka clusters can be quite significant. As good engineers, we are always looking for the newest innovation that can save us AWS costs. Working with WarpStream Labs was an absolute pleasure. They went above and beyond anyone else we have ever worked with and tuned their application to our needs.” — Adam Hamsik, CEO of Labyrinth Labs

Get Started

If you’re ready to save money and reduce your operational burden, you can sign up for WarpStream and get started in just a few minutes. New signups get $400 in free credit with no expiration, and no credit card is required to get started.
