Microblogging with Serverless Redis

K - May 12 '21 - Dev Community

I'm still in search of a datastore for my microblogging SaaS product. I read the DynamoDB book and had the impression that the database isn't quite optimal.

Last week I found out about Upstash, a managed database service with a Redis compatible API. It also comes with on-demand pricing, so I thought, let's take a look!

What is Upstash?

It's a managed database service that is API compatible with Redis, the key-value store you all seem to love. And it comes with a serverless pricing model, starting with a free tier for small databases.

It can be deployed to the cloud and region of your choice, so the latency of most requests is usually well below a hundred milliseconds.

Upstash, like DynamoDB, is a NoSQL database, but Upstash goes more in the direction of simplicity, which requires you to put more of the data modeling into your application code.

Oh, and it comes with a GraphQL API because that's a thing now, right? Once they get on par with Redis features like pub/sub, this will map very nicely to GraphQL subscriptions.

Microblogging with Redis

I plan to build a company-internal microblogging service. I got the idea when I was scrolling through Twitter one morning, thinking, "I get all news from my industry by scrolling on my phone. Wouldn't it be cool if people could get their work-related news in a similar way?"

Anyway, I'm mostly a frontend developer. I built a few APIs back in the day with PHP and Node.js, but this wasn't my core competency. The backend work usually didn't require me to choose a database technology. The "real" backend developers had already done that, and I just had to use it: file system storage, MySQL, MongoDB, RethinkDB, PostgreSQL, and whatnot.

Finding a good database for my use case has proven to be quite a chore, but a welcome one, because I have read many interesting things about databases in the last few weeks.

After I found out about Upstash, I looked into their offering and into Redis, and so far, I like what I see.

Redis seems to be very simple; for example, it doesn't allow for nested data structures. On the other hand, it has very low latency, and many commands are O(1) in complexity, which makes building on top of it intriguing.

One of my goals for this project was to keep it as serverless as possible, so Upstash's on-demand pricing and free tier come in rather handy.

Data modeling with Redis

Upstash doesn't support all of Redis's features yet, so I would have to get by with the basics. But I think this isn't an issue since simplicity seems to be the spirit of Redis anyways.

My system will let people write small blog posts, like on Twitter. These usually belong to a company, a user, and one or more teams. They will also have hashtags, because why not?

The requirement that every post belongs to a company can be solved with one database per company; this follows the siloed multi-tenancy model, which leads to decent isolation.

The requirement that every post belongs to one user is simple too: in the production environment, I would simply use a hash to store a post and add a field for the user ID to it.
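As a rough sketch of that hash layout, assuming an ioredis-style client is passed in. The field names text and userId and both helper names are hypothetical, not taken from the project.

```javascript
// Sketch only: store a post as a Redis hash instead of a plain string.
// `client` is assumed to be an ioredis-style client; the field names
// `text` and `userId` are hypothetical.
async function savePostAsHash(client, postKey, text, userId) {
  // HSET with several field/value pairs in one call.
  await client.hset(postKey, "text", text, "userId", userId);
}

async function loadPostFromHash(client, postKey) {
  // HGETALL resolves to an object of field -> string value in ioredis;
  // a missing key yields an empty object.
  const fields = await client.hgetall(postKey);
  return Object.keys(fields).length > 0 ? { id: postKey, ...fields } : null;
}
```

A hash keeps the post as a flat map of fields, so it still fits Redis's no-nesting rule.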

But what about things like teams and hashtags? After all, every post has multiple hashtags and teams, and every team and hashtag can have multiple posts. Many-to-many relationships are usually the prime domain of relational databases, but can they be done with Upstash?

It turns out they can!

I tried this out with hashtags. My data model for a post is just a string of text sprinkled with hashtags.

post:id  ->  3
posts    ->  [post:1, post:2, post:3]
post:1   ->  "A blog post with a #cool hashtag."
post:2   ->  "Another post with a #cool hashtag!" 
post:3   ->  "And the third post, with #another hashtag."
...

The post:id item is just an integer that gets incremented when a new post is created, so the next post can use it to generate its ID. Apparently, this is how it's done in Redis, but I will investigate further. For this experiment, it should suffice.

The posts item holds a set with all the posts created. Sets, sorted sets, and lists can hold around four billion entries (2^32 - 1 for sets and lists), so I think they should be future-proof for quite some time.

4,000,000,000 posts / 10,000 users / 10 years / 52 weeks
is roughly 770 posts per user per week
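The same back-of-the-envelope estimate as a small sketch, so the user count and time span can be varied. The function name is made up for illustration; the default limit is the documented 2^32 - 1 members for Redis sets and lists.

```javascript
// Redis sets and lists hold at most 2^32 - 1 members.
const MAX_MEMBERS = 2 ** 32 - 1;

// Posts each user could write per week before a single set fills up.
function postsPerUserPerWeek(users, years, maxMembers = MAX_MEMBERS) {
  return maxMembers / users / years / 52;
}

// With the rounded four billion from the estimate above:
console.log(Math.round(postsPerUserPerWeek(10000, 10, 4e9))); // 769

// With the exact limit:
console.log(Math.round(postsPerUserPerWeek(10000, 10))); // 826
```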

For the hashtags, I'm using a set too. A sorted set or a list is probably the better solution in the real system since the posts need to be sorted by their creation date.

hashtag:cool     ->  [post:1, post:2]
hashtag:another  ->  [post:3] 
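The sorted-set alternative mentioned above could look roughly like this. It is a sketch, not what the experiment uses: it assumes an ioredis-style client is passed in, scores each post by its creation time in milliseconds, and both function names are hypothetical.

```javascript
// Sketch only: index posts per hashtag in a sorted set, scored by
// creation time, so they can be read newest-first.
// `client` is assumed to be an ioredis-style client.
async function indexPostByHashtag(client, hashtag, postKey, createdAt = Date.now()) {
  // ZADD key score member: the score is the creation timestamp in ms.
  await client.zadd(`hashtag:${hashtag}`, createdAt, postKey);
}

async function newestPostKeys(client, hashtag, limit = 20) {
  // ZREVRANGE returns members ordered from highest to lowest score.
  return client.zrevrange(`hashtag:${hashtag}`, 0, limit - 1);
}
```

Because the score carries the timestamp, no extra sorting is needed in application code when reading a hashtag's timeline.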

Connecting to Upstash

Let's try this out with some example code!

I created a project on GitHub. I used the CDK to create an API Gateway backed by a Lambda function that connects to Upstash.

In lib/upstash-microblogging-stack.ts you will find the environment variables used by the Lambda function for the Upstash connection.

environment: {
  REDIS_ENDPOINT: "<DATABASE_ENDPOINT>",
  REDIS_PORT: "<DATABASE_PORT>",
  REDIS_PASSWORD: "<DATABASE_PASSWORD>",
},

You'll find the values for <DATABASE_ENDPOINT>, <DATABASE_PORT>, and <DATABASE_PASSWORD> in the Upstash console after you have created a database.

(Screenshot: Upstash database credentials)

The actual database connection happens inside the Lambda function code, which is located at lib/backend/index.js.

I created the connection outside of the function body, so it's only created on a cold start. All subsequent requests handled by that Lambda function are handled with the same connection.

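const Redis = require("ioredis");

// Created at module scope, so the connection is reused across warm invocations.
const redisClient = new Redis({
  host: process.env.REDIS_ENDPOINT,
  // Environment variables are always strings; convert the port to a number.
  port: Number(process.env.REDIS_PORT),
  password: process.env.REDIS_PASSWORD,
});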
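To illustrate why the placement matters, here is a toy sketch that runs without a database. createClient is a hypothetical stand-in for new Redis(...), and handler stands in for the Lambda handler; neither is taken from the project.

```javascript
// Sketch only: code at module scope runs once per cold start,
// while the handler body runs on every invocation.
let clientsCreated = 0;

// Hypothetical stand-in for `new Redis(...)`.
function createClient() {
  clientsCreated += 1;
  return { id: clientsCreated };
}

// Module scope: executed once, when the function instance starts.
const client = createClient();

async function handler() {
  // Every warm invocation sees the same client object.
  return { clientId: client.id, clientsCreated };
}
```

Calling handler several times in the same process always reports a single created client, which is the behavior the Lambda runtime gives you between cold starts.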

CRUD with Upstash

Okay, no update for this experiment; let's keep things simple. Just create, read and delete. Also, no read for one post; only read for all posts or filtered by hashtag.

Create

Let's start with the creation of a post.

async function createPost(text) {
  const id = await redisClient.incr("post:id");
  const postKey = `post:${id}`;

  const transaction = redisClient.multi();

  transaction.set(postKey, text);
  transaction.sadd("posts", postKey);

  extractHashtags(text).forEach((hashtag) =>
    transaction.sadd(`hashtag:${hashtag}`, postKey)
  );

  await transaction.exec();

  return createResponse(201, { post: { id: postKey, text } });
}

First, I get the next id for the postKey from the post:id item; then, I create a transaction for the post creation.

I don't have enough Redis knowledge to know if one of these commands could fail if another client did something in between, so I used a transaction. Otherwise, a pipeline would probably be enough, which wouldn't lock the whole database until all commands are done.
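For comparison, a pipeline version of the same writes might look like the sketch below. It assumes an ioredis-style client is passed in and that the hashtags were already extracted; the function name is hypothetical, and this is not what the experiment uses.

```javascript
// Sketch only: the same writes batched with a pipeline instead of MULTI.
// `client` is assumed to be an ioredis-style client; `hashtags` is an
// array of hashtag names without the leading "#".
async function writePostWithPipeline(client, postKey, text, hashtags) {
  const pipeline = client.pipeline();

  pipeline.set(postKey, text);
  pipeline.sadd("posts", postKey);
  hashtags.forEach((hashtag) => pipeline.sadd(`hashtag:${hashtag}`, postKey));

  // ioredis resolves exec() to one [error, result] pair per queued command.
  const results = await pipeline.exec();

  // Each command reports its own error, so surface the first one.
  const failed = results.find(([error]) => error);
  if (failed) throw failed[0];

  return results.length;
}
```

The pipeline still saves round trips by sending everything in one batch, but commands from other clients may run in between the queued ones.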

Anyhow, I create a new post, add its key to the posts set that keeps track of all posts and add it to the sets of every hashtag in that post.

I use a utility function to extract the hashtags from the string and remove the hash character.

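// Find every "#word" in the text and strip the leading "#".
// match() returns null when nothing matches, so fall back to an empty array.
const extractHashtags = (text) =>
  (text.match(/#\w+/g) || []).map((hashtag) => hashtag.slice(1));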

Redis and, in turn, Upstash are pretty chill about adding and creating sets, so if you add something to a non-existing set, it will be created.

When I call exec on the transaction, the whole batch of commands is sent to Upstash, locking the database until every command has been executed.

Read

The next step is to read the posts we created.

async function listPosts(hashtag) {
  const setKey = hashtag ? `hashtag:${hashtag}` : "posts";

  const postKeys = await redisClient.smembers(setKey);

  // MGET needs at least one key, so skip the call when the set is empty.
  const texts = postKeys.length > 0 ? await redisClient.mget(postKeys) : [];

  const posts = texts.map((text, i) => ({ id: postKeys[i], text }));

  return createResponse(200, { posts });
}

The listPosts function receives the value of the hashtag query parameter if one was supplied in the request.

Then it fetches the post keys either from the corresponding hashtag set or from the posts set, and loads the post texts for those keys.

Some small transformation to create a JSON object for the API client, and we can respond!
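The createResponse helper isn't shown in this post. A minimal sketch of what it could look like, assuming the Lambda function sits behind API Gateway's Lambda proxy integration, which expects a statusCode and a string body:

```javascript
// Sketch only: a possible createResponse helper for a Lambda function
// behind API Gateway's Lambda proxy integration.
function createResponse(statusCode, body) {
  return {
    statusCode,
    headers: { "Content-Type": "application/json" },
    // The proxy integration expects the body as a string.
    body: JSON.stringify(body),
  };
}
```

The Content-Type header is an assumption on my part; the actual project may set different headers.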

Delete

To delete a post, we have to update all the sets we created too!

async function removePost(postKey) {
  const text = await redisClient.get(postKey);

  // GET resolves to null for a missing key; there is nothing to clean up then.
  if (text === null) {
    return createResponse(404, { error: "Post not found" });
  }

  const transaction = redisClient.multi();

  transaction.srem("posts", postKey);

  extractHashtags(text).forEach((hashtag) =>
    transaction.srem(`hashtag:${hashtag}`, postKey)
  );

  transaction.del(postKey);

  await transaction.exec();

  return createResponse(200, { post: { id: postKey, text } });
}

Again, I use a transaction and add all the commands: remove the postKey from the posts set, remove it from the hashtag sets, and then delete the post item itself.

Conclusion

Upstash is a fresh take on managed Redis deployments. With the free tier and on-demand pricing, it's pretty cheap to start with.

Since it's just a key-value store, it requires you to do more data modeling in your own code, and for this, I should read more about Redis in general before considering it for my product. I will probably end up wrapping the whole thing in a data layer that keeps track of all the relationships. But since the latency is so very low, it should make sense to use Upstash as a primitive for building a data model.

Some Redis features are still missing, and if you need a full-text search (which I'd love to have for my product) or geospatial queries, you have to wait, but overall it seems like a solid offering.

It would also be cool to integrate with infrastructure as code tools like the CDK or Pulumi, but I think this isn't a big problem because they already offer an API to manage the databases.
