Here, There, and Everywhere: Why DataStax Built a Serverless, Multi-Region DBaaS

Craig Kitterman - May 17 '22 - Dev Community

Replicating data in multiple regions around the globe is a critical part of a data strategy. It reduces latency by enabling users to access data locally and it creates redundancy to support disaster recovery plans. But maintaining a database in multiple regions has been a complicated, costly task. It usually requires the maintenance of fixed hardware and clusters in multiple regions, with intensive work required to ensure peering and networking connections are set up properly.

DataStax today announced that our Astra DB serverless, multi-cloud database-as-a-service (DBaaS) is now available as a multi-region offering (You can read more about the news here). We sat down with DataStax Area Technical Lead Jake Luciani to dig a little deeper into why a multi-region database is such a game changer.

Q: Why is a multi-region database important to businesses?

For some users, it’s important to have a point of presence in every region, for latency reasons. An organization might want to serve multiple geographies, and if you have data workloads and users in several geographies, you want to have that data as close to the end user as possible while still having it replicated in many regions. Our customers’ users move around, and it’s key that we enable them with a global footprint.

You want the peace of mind of knowing your data is replicated in multiple regions, so if something goes wrong, you can execute a disaster recovery plan and point your users to another region. It increases your availability and gives you a high degree of business continuity.

Finally, there is no primary/replica relationship with Apache Cassandra®. Relational databases, on the other hand, require a manual process to fail over between primary and secondary regions in the event of a disaster. It’s easier than it was in years past, but it’s still a cumbersome process. Cassandra, however, is an “AP” (available and partition-tolerant) database. If one of the clusters goes into a partitioned state — whether the network connection is cut or the data center goes down — Cassandra self-heals once it comes back up. When the partition is resolved and the clusters can communicate again, they just catch each other up and do all the reconciliation. There is zero work for you to do.
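
To make the replication model concrete, here’s a minimal sketch using the open-source Cassandra Python driver against a self-managed cluster. The datacenter names (`us_east`, `us_west`), contact point, and table are assumptions for illustration; with Astra DB the equivalent setup happens behind the scenes when you add a region.

```python
# Illustrative only: multi-datacenter replication in open-source Cassandra.
# Assumes a self-managed cluster with datacenters named "us_east" and "us_west";
# Astra DB configures this for you when you add a region.
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

cluster = Cluster(["10.0.0.1"])   # any contact point in the local region (assumed address)
session = cluster.connect()

# Every write is replicated to both datacenters -- there is no primary/replica.
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS app
    WITH replication = {'class': 'NetworkTopologyStrategy',
                        'us_east': 3, 'us_west': 3}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS app.orders (id text PRIMARY KEY, total decimal)
""")

# LOCAL_QUORUM keeps reads and writes inside the nearest datacenter for low latency,
# while replication to the other region happens in the background.
stmt = SimpleStatement(
    "SELECT * FROM app.orders WHERE id = %s",
    consistency_level=ConsistencyLevel.LOCAL_QUORUM,
)
session.execute(stmt, ("order-123",))
```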

Q: Why is Astra DB a game changer for multi-region database deployments?

For one thing, there is no fixed infrastructure for the operators — our users. If you want to replicate data from one region to another (US West to US East, for instance), it’s just done. Traditionally with Cassandra, you have to install clusters in each region, set up the peering, set up the networking. Now, you don’t do any of that. You go into the UI, or use the API, and boom, you’re done.

Now everything you do is going to get replicated bi-directionally across all regions, and you’ll now have a point of presence in each region to serve your users or maintain business continuity when a disaster hits.
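
As a rough sketch of what the API path might look like, here’s a Python call to a region-management endpoint. The endpoint path and payload fields below are hypothetical placeholders, not the documented Astra DevOps API, so check the current DataStax docs for the real request shape.

```python
# Rough sketch only: adding a second region to a database through a management API.
# The endpoint path and payload fields are hypothetical placeholders, not the
# documented Astra DevOps API -- consult the current DataStax docs for the real call.
import os
import requests

ASTRA_TOKEN = os.environ["ASTRA_TOKEN"]          # application token with DB admin rights
DATABASE_ID = os.environ["ASTRA_DATABASE_ID"]

resp = requests.post(
    f"https://api.astra.datastax.com/v2/databases/{DATABASE_ID}/datacenters",  # hypothetical path
    headers={"Authorization": f"Bearer {ASTRA_TOKEN}"},
    json=[{"cloudProvider": "GCP", "region": "us-east1", "tier": "serverless"}],  # hypothetical fields
    timeout=30,
)
resp.raise_for_status()
print("Region requested; replication to the new region is handled for you.")
```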

Q: So simplicity is an important part of today’s announcement, but so is cost. Why is maintaining a presence in multiple regions expensive, and how does multi-region Astra DB help?

You have to have that fixed hardware in each region, regardless of how much it is being used. If most of your users are in, say, US East, you still have to have a database of equivalent size in US West, regardless of usage there. You have to scale to the peak that you might see in the biggest region — in both regions.

With serverless, there is no scaling for you to do. We take care of all the scalability, so you get all the TCO benefits we built with serverless, but now the savings are even better, because you’re basically not paying for business continuity anymore. Everybody gets business continuity out of the box. It’s no longer a privileged system — not a rich person’s game anymore. Anyone can do this. For each additional region, we charge an additional write. That’s all you pay, along with whatever the network transfer cost is for moving data around.

Now you can be a small app and have the business continuity that a larger app would have. But you also get the ability to serve any geographic region worldwide that might be attractive to you and your business. Before, if you wanted a global app that exists in four regions, that’s $5,000 to $6,000 a month. Now, with serverless and multi-region, the cost is simply 4x the cost of whatever your writes are. If a million writes costs you $1.25, replicating those writes to four regions costs about $5. Exclusive of network costs, which depend on which region you’re going in or out of, you’re basically able to have a worldwide presence for just dollars — not tens or even hundreds of thousands of dollars.
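
Here’s that arithmetic as a small calculation, using the roughly $1.25-per-million-writes figure from the interview and ignoring network egress, which is billed separately and varies by region.

```python
# Back-of-the-envelope cost of multi-region writes, using the figure quoted in the
# interview (~$1.25 per million writes). Network egress is charged separately and
# depends on the regions involved, so it is left out here.
PRICE_PER_MILLION_WRITES = 1.25  # USD, illustrative

def monthly_write_cost(writes_per_month: int, regions: int) -> float:
    """Each write is replicated to every region, so you pay for writes * regions."""
    total_writes = writes_per_month * regions
    return total_writes / 1_000_000 * PRICE_PER_MILLION_WRITES

print(monthly_write_cost(1_000_000, 4))  # ~5.0 USD for a million writes in four regions
```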

Q: What else makes Astra DB multi-region unique?

Astra DB is the world’s only serverless multi-region database that isn’t delivered as a cloud provider solution. So there are offerings like Dynamo, Spanner, Google CloudSQL, and Cosmos DB, but those are all tied to a single cloud provider. There are also other DBaaS offerings, including MongoDB and Cockroach, that are multi-region, but they aren’t serverless.

We’re not locking you into anything. You can pick your cloud provider and you can deploy your data as you want. And if you want to get off of a particular cloud provider, it’s just Cassandra. If you want to go to another cloud, or go on prem, you can take the data and go from one place to another. If you want to migrate from one cloud to another, it’s like migrating from Cassandra to Cassandra. You’re no longer tied to a specific cloud provider for your persistence.

Q: Let’s dive a little deeper into disaster recovery and business continuity for organizations. Why is serverless, multi-region Cassandra important?

It really enables you to pick anywhere in the world to have a disaster recovery (DR) solution without having a fully operational database there. Then the only thing that’s left up to you are the components surrounding the database — the actual services themselves. But you can essentially go into a “cold” disaster recovery solution without expending a lot of money. Or you can maintain “warm” DR, without having to spend a lot to have a large cluster up and running. You won’t be charged for idle running hardware — it’s just ready to go.

It will have all the data that was replicated before the source went down. It’ll give you that peace of mind. DRs normally require you to pay a pretty significant sum to make sure that you’re covered. Now you just double the cost of your writes — it’s democratizing access to disaster recovery solutions that are usually only available to the largest enterprises out there.

Even for a big enterprise, you can have a dedicated DR that is not just sitting there burning electricity. A warm DR entails having a few services that are ready to go if there’s a failover, so you can fail over and scale rapidly. But there is no easy way to scale up a DB quickly: the DB on the other side of your DR has to be the same as what you’re running somewhere else, so the DB is always cooking, even in a cold DR scenario. In this case, web servers and queues can be completely shut down, and within minutes you can scale your underlying serving infrastructure to handle the new traffic.

In a warm DR scenario, you need to be back up in five minutes, so basically you have a bare-bones deployment ready; then, as soon as anything bad happens, everything else scales up, either manually or automatically. A hot DR is for when being down for even a minute means losing millions of dollars, so you have to be ready to go instantly. You’re willing to pay more for that.

In all these scenarios, the database has been the fixed cost. With serverless, the database is no longer a fixed cost for your DR. It’s proportional to your business and the load you have on the DB, not a fixed cost. It’s not an estimated peak that you think you’ll need.

Q: Tell me about the challenges the engineering team faced with multi-region Astra DB.

First of all, we had to make several significant improvements to make Astra DB serverless. For example, we split the underlying Cassandra componentry into separate services to make it more cloud-native and scalable in a way that’s conducive to the Kubernetes environment. We then needed to make analogous changes to make those things multi-region.

One such thing is safe schema replication. We’re no longer relying on Cassandra’s inherent schema management. Instead, we’re using the distributed key-value store that comes with Kubernetes, called etcd. It’s strongly consistent, which gets away from one of the biggest challenges when it comes to schema changes. One of the problems with Cassandra is that when you push a schema change, it will eventually become consistent across the cluster, but if that schema hasn’t yet replicated to a node and someone tries to write data using the new schema to that node, the write will fail. Also, schema changes that happen concurrently can put the cluster into a state that requires manual intervention.

With Astra DB now, that isn’t a major concern. Schema changes are immediately consistent throughout all the nodes in a cluster. We needed a way to make that work in multi-region fashion as well.
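
As an illustration of the general approach (not DataStax’s actual implementation), here’s a small sketch using the python-etcd3 client: a schema change is applied with a compare-and-swap transaction, so concurrent changes can’t silently overwrite each other. The key layout and schema document shape are made up for the example.

```python
# Illustrative sketch only -- not DataStax's implementation. It shows how a strongly
# consistent store like etcd can serialize schema changes: a change is applied only
# if the schema hasn't moved underneath us, so concurrent changes can't leave the
# cluster in a mixed state.
import json
import etcd3

client = etcd3.client(host="127.0.0.1", port=2379)   # assumes a local etcd endpoint
SCHEMA_KEY = "/demo/keyspaces/app/schema"             # hypothetical key layout

# Seed an initial schema document if this is a fresh etcd (illustrative).
if client.get(SCHEMA_KEY)[0] is None:
    client.put(SCHEMA_KEY, json.dumps({"tables": {}}).encode())

def propose_schema_change(expected: bytes, new_schema: dict) -> bool:
    """Compare-and-swap the schema document; fails if someone else changed it first."""
    succeeded, _ = client.transaction(
        compare=[client.transactions.value(SCHEMA_KEY) == expected],
        success=[client.transactions.put(SCHEMA_KEY, json.dumps(new_schema).encode())],
        failure=[],
    )
    return succeeded

current, _meta = client.get(SCHEMA_KEY)
ok = propose_schema_change(current, {"tables": {"orders": ["id", "total"]}})
print("applied" if ok else "conflict: retry against the latest schema")
```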

We also needed a way to set up the networking so that every Astra DB region is connected to every other one in a safe and reliable way. So we built a global database mesh that enables data to be replicated anywhere on the planet.

Q: What’s next?

We’re working on an “inter-cloud” Astra DB for enterprises with multi-region deployments that use more than one cloud provider; right now you can only replicate data within a particular cloud. We don’t support replication between clouds — yet. Our engineers are hard at work solving the problem of how to make those network links between every database, so, eventually we’re going to be essentially running a huge data backplane across the internet. This is a hard thing to do, but it will be a major win for enterprises who need to replicate data across clouds.

Learn more about Astra DB
