Mastodon on YugabyteDB

Franck Pachot - Nov 4 '22 - - Dev Community

Twitter is changing and lot of users are moving to Mastodon. With this increase of load, had to scale up their database:

<br>
@Gargron<br>
From 8 cores to 36 cores processing power, let's see how the new database server holds up. Hopefully everything will be much faster from now on, but it should also give me a bit more room to scale horizontally.<br>

Wouldn't it be easier to scale out? Mastodon uses PostgreSQL, so let's try it on YugabyteDB, which is the Open Source Postgres-compatible Distributed SQL database.

My test is available here, ready to run on GitPod: https://github.com/FranckPachot/gitpod-mastodon-yb

Changes for YugabyteDB

I forked this and changed the base image from gitpod/workspace-postgresql to gitpod/workspace-yugabytedb.

As the default port for YSQL, the YugabyteDB PostgreSQL-compatible endpoint is 5433 I changed the DB parameters in .env.production.example to:



DB_HOST=127.0.0.1
DB_PORT=5433
DB_NAME=mastodon
DB_USER=yugabyte
DB_PASS=


Enter fullscreen mode Exit fullscreen mode

The YSQL endpoint takes a few seconds to be available, then I wait that it is available and create a mastodon database:



      # wait until YSQL is available, create the database and utility function
      until ysqlsh <<<'
       create database mastodon;\c mastodon
       ' ; do sleep 1 ; done 2>/dev/null
      # create the database
      RAILS_ENV=production rails --trace db:setup


Enter fullscreen mode Exit fullscreen mode

The creation of the database failed on two index creation:



CREATE UNIQUE INDEX index_ip_blocks_on_ip ON public.ip_blocks USING btree (ip);

NOTICE:  index method "btree" was replaced with "lsm" in YugabyteDB
ERROR:  INDEX on column of type 'INET' not yet supported


Enter fullscreen mode Exit fullscreen mode

I'll not create this index. If it is required for performance reasons, there are some alternatives (use text instead of inet datatype), and a git issue can be opened to support it in Yugabyte.



CREATE UNIQUE INDEX index_unique_conversations ON public.account_conversations USING btree (account_id, conversation_id, participant_account_ids);

NOTICE:  index method "btree" was replaced with "lsm" in YugabyteDB
ERROR:  INDEX on column of type 'user_defined_type' not yet supported


Enter fullscreen mode Exit fullscreen mode

The participant_account_ids is bigint[]. I'm not sure how this makes sense in a unique index because arrays with the same items in different order will be considered different.
Then, I'll create a non-volatile function that sorts the items and returns them as text:



       create or replace function array_signature(a bigint[])
       returns text as $$
       select array_agg(unnest order by unnest)::text
       from unnest(a);
       $$ immutable language sql;


Enter fullscreen mode Exit fullscreen mode

and create the index as:



 CREATE UNIQUE INDEX index_unique_conversations ON public.account_conversations 
       (account_id, conversation_id, (array_signature(participant_account_ids)));


Enter fullscreen mode Exit fullscreen mode

Those are quick workaround. If you want to run Mastodon with YugabyteDB in production, please contact the Yugabyte community: https://www.yugabyte.com/community/

To be able to create the database, I've removed the indexes from the schema definition:



      # remove indexes that are not supported
      sed -e '/"index_unique_conversations"/d' -i db/schema.rb
      sed -e '/"index_ip_blocks_on_ip"/d' -i db/schema.rb


Enter fullscreen mode Exit fullscreen mode

then run the creation:



      # create the database
      RAILS_ENV=production rails --trace db:setup


Enter fullscreen mode Exit fullscreen mode

This works:
<br>
You are now connected to database ** Invoke db:setup (first_time)
** Invoke db:create (first_time)
** Invoke db:load_config (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute db:load_config
** Execute db:create
Database 'mastodon' already exists
** Invoke environment
** Invoke db:schema:load (first_time)
** Invoke db:load_config
** Invoke db:check_protected_environments (first_time)
** Invoke db:load_config
** Execute db:check_protected_environments
** Execute db:schema:load"/>
If you are concerned by the performance of DDL, there are ways to improve it (like CREATE INDEX NONCONCURRENTLY).

Then, once created, I create the unique index with the function on the array:


  <span class="o">#</span> <span class="k">add</span> <span class="n">indexes</span> <span class="k">with</span> <span class="n">DDL</span> <span class="p">(</span><span class="n">this</span> <span class="n">may</span> <span class="n">be</span> <span class="n">done</span> <span class="k">from</span> <span class="k">schema</span><span class="p">.</span><span class="n">rb</span><span class="p">)</span>
  <span class="n">ysqlsh</span> <span class="o">-</span><span class="n">e</span> <span class="o">-</span><span class="k">c</span> <span class="nv">"
   create or replace function array_signature(a bigint[])
   returns text as '
   select array_agg(unnest order by unnest)::text
   from unnest(a);
   ' immutable language sql;
  "</span> <span class="o">-</span><span class="k">c</span> <span class="s1">'
   CREATE UNIQUE INDEX index_unique_conversations ON public.account_conversations 
   (account_id, conversation_id, (array_signature(participant_account_ids)));
   '</span> <span class="n">mastodon</span>
  <span class="o">#</span> <span class="k">end</span> <span class="n">modifications</span> <span class="k">for</span> <span class="n">YugabyteDB</span>
Enter fullscreen mode Exit fullscreen mode
Enter fullscreen mode Exit fullscreen mode




Testing

The logs shows the users and passwords created:
Image description

I create some conversations:
Image description

And look at the account_conversations; table to check that all is ok:
Image description

In conclusion

I love: Gitpod, Mastodon, PostgreSQL and YugabyteDB. If you have an application on PostgreSQL, you can get High Availability (resilience to failures and rolling upgrades) and Elasticity (scale out and down without application downtime) with YugabyteDB. If something is not supported, please open a git issue or ask for a workaround, there are always many possibilities. When moving from monolithic to distributed database, it is also important to check the indexes (hash and range sharding) and test performance.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .