I'm still in search of a datastore for my microblogging SaaS product. I read the DynamoDB book and had the impression that the database isn't quite optimal.
Last week I found out about Upstash, a managed database service with a Redis compatible API. It also comes with on-demand pricing, so I thought, let's take a look!
What is Upstash?
It's a managed database service that is API compatible with Redis, the key-value store you all seem to love. And it comes with a serverless pricing model, starting with a free tier for small databases.
It can be deployed into the cloud, and region, of your choice, so the latency of most requests is usually way below hundred milliseconds.
Upstash, like DynamoDB, is a NoSQL database, but Upstash goes more in the direction of simplicity, which requires you to put more of the data modeling into your application code.
Oh and, it comes with a GraphQL API because that's a thing now, right? When they get on-par with Redis features like pub/sub, this will map very nicely to GraphQL subscriptions.
Microblogging with Redis
I plan to build a company-internal microblogging service. I got the idea when I was scrolling through Twitter one morning, thinking, "I get all news from my industry by scrolling on my phone. Wouldn't it be cool if people could get their work-related news in a similar way?"
Anyway, I mostly a frontend developer. I did a few APIs back in the days with PHP and Node.js, but this wasn't my core competency. The backend work usually didn't require me to choose database technology. The "real" backend developer already did, and I just had to use it—file system storage, MySQL, MongoDB, RethinkDB, PostgreSQL, and whatnot.
Finding a good database for my use case has proven to be quite a chore, but a welcome one, because I read many interesting things about databases in the last weeks.
After I found out about Upstash, I looked into their offering and Redis, and until now, I like what I see.
Redis seems to be very simple; for example, it doesn't allow for nested data structures. On the other hand, it has very low latency, and many commands are O(1) in complexity, making building on top of this intriguing.
One of my goals for this project was to keep it as serverless as possible, so Upstash's on-demand pricing and free tier come in rather handy.
Data modeling with Redis
Upstash doesn't support all of Redis's features yet, so I would have to get by with the basics. But I think this isn't an issue since simplicity seems to be the spirit of Redis anyways.
My system will let people write small blogposts, like Twitter. These usually belong to a company, a user, and one or more teams. They will also have hashtags, because why not?
The requirement that every post belongs to a company can be solved with one database per company; this follows the siloed multi-tenancy model, which leads to decent isolation.
The requirement that every post belongs to one user is simple too, in the production environment, I would simply use a hash to store a post and add a field for the user ID to it.
But what about things like teams and hashtags? After all, every post has multiple hashtags and teams, and every team and hashtag can have multiple posts. Many-to-many relationships are usually the prime domain of relational databases, but can they be done with Upstash?
It turns out they can!
I tried this out with hashtags. A string that contains text, which, in turn, is sprinkled with hashtags, is my data model for the posts.
post:id -> 3
posts -> [post:1, post:2, post:3]
post:1 -> "A blog post with a #cool hashtag."
post:2 -> "Another post with a #cool hashtag!"
post:3 -> "And the third post, with #another hashtag."
...
The post:id
item is just an integer that gets incremented when a new post is created, so the next post can use it to generate its ID. Seemingly this is how it's done in Redis, but I will investigate further. For this experiment, this should suffice.
The posts
item holds a set with all the posts created. Sets, sorted sets, and lists can hold up to four billion entries, so I think they should be future proof for quite some time.
4,000,000,000 posts / 10,000 users / 10 years / 52 weeks
is roughly 750 posts/week
For the hashtags, I'm using a set too. A sorted set or a list is probably the better solution in the real system since the posts need to be sorted by their creation date.
hashtag:cool -> [post:1, post:2]
hashtag:another -> [post:3]
Connecting to Upstash
Let's try this out with some example code!
I created a project on GitHub. I used the CDK to create an API Gateway backed by a Lambda function that connects to Upstash.
In lib/upstash-microblogging-stack.ts
you will find the environment variables used by the Lambda function for the Upstash connection.
environment: {
REDIS_ENDPOINT: "<DATABASE_ENDPOINT>",
REDIS_PORT: "<DATABASE_PORT>",
REDIS_PASSWORD: "<DATABASE_PASSWORD>",
},
You find the values for <DATABASE_ENDPOINT>
, <DATABASE_PORT>
, <DATABASE_PASSWORD>
in the Upstash console after you created a database.
The actual database connection happens inside the Lambda function code, which is located at lib/backend/index.js
.
I created the connection outside of the function body, so it's only created on a cold start. All subsequent requests handled by that Lambda function are handled with the same connection.
const Redis = require("ioredis");
const redisClient = new Redis({
host: process.env.REDIS_ENDPOINT,
port: process.env.REDIS_PORT,
password: process.env.REDIS_PASSWORD,
});
CRUD with Upstash
Okay, no update for this experiment; let's keep things simple. Just create, read and delete. Also, no read for one post; only read for all posts or filtered by hashtag.
Create
Let's start with the creation of a post.
async function createPost(text) {
const id = await redisClient.incr("post:id");
const postKey = `post:${id}`;
const transaction = redisClient.multi();
transaction.set(postKey, text);
transaction.sadd("posts", postKey);
extractHashtags(text).forEach((hashtag) =>
transaction.sadd(`hashtag:${hashtag}`, postKey)
);
await transaction.exec();
return createResponse(201, { post: { id: postKey, text } });
}
First, I get the next id
for the postKey
from the post:id
item; then, I create a transaction
for the post creation.
I don't have enough Redis knowledge to know if this one of these commands could fail if another client did something in-between, so I used the transaction. Otherwise, a pipeline would probably enough, which wouldn't lock the whole database until all commands are done.
Anyhow, I create a new post, add its key to the posts
set that keeps track of all posts and add it to the sets of every hashtag in that post.
I use a utility function to extract the hashtags from the string and remove the hash character.
const extractHashtags = (text) =>
text.match(/#\w*/gm).map((hashtag) => hashtag.substr(1));
Redis and, in turn, Upstash are pretty chill about adding and creating sets, so if you add something to a non-existing set, it will be created.
When I call exec
on the transaction
the whole batch of commands will be sent to Upstash, locking the database until every command was executed.
Read
The next step is to read the posts we created.
async function listPosts(hashtag) {
const setKey = hashtag ? `hashtag:${hashtag}` : "posts";
const postKeys = await redisClient.smembers(setKey);
let posts = await redisClient.mget(postKeys);
posts = posts.map((text, i) => ({ id: postKeys[i], text }));
return createResponse(200, { posts });
}
The listPosts
function would get the content of the hashtag
query parameter if it were supplied in the request.
Then it either fetches the post keys from the corresponding hashtag item or the posts
.
Some small transformation to create a JSON object for the API client, and we can respond!
Delete
To delete a post, we have to update all the sets we created too!
async function removePost(postKey) {
const text = await redisClient.get(postKey);
const transaction = redisClient.multi();
transaction.srem("posts", postKey);
extractHashtags(text).forEach((hashtag) =>
transaction.srem(`hashtag:${hashtag}`, postKey)
);
transaction.del(postKey);
await transaction.exec();
return createResponse(200, { post: { id: postKey, text } });
}
Again, I use a transaction and add all the commands. Remove the postKey
from the posts
set, remove it from the hashtag sets, and then delete the post item itself.
Conclusion
Upstash is a fresh take on managed Redis deployments. With the free tier and on-demand pricing, it's pretty cheap to start with.
Since it's just a key-value store, it requires you to do more data modeling in your own code, and for this, I should read more about Redis in general before considering it for my product. I will probably end up wrapping the whole thing in a data layer that keeps track of all the relationships. But since the latency is so very low, it should make sense to use Upstash as a primitive for building a data model.
Some Redis features are still missing, and if you need a full-text search (which I'd love to have for my product) or geospatial queries, you have to wait, but overall it seems like a solid offering.
It would also be cool to integrate with infrastructure as code tools like the CDK or Pulumi, but I think this isn't a big problem because they already offer an API to manage the databases.