Twitter is changing and lot of users are moving to Mastodon. With this increase of load, had to scale up their database:
Wouldn't it be easier to scale out? Mastodon uses PostgreSQL, so let's try it on YugabyteDB, which is the Open Source Postgres-compatible Distributed SQL database.
My test is available here, ready to run on GitPod: https://github.com/FranckPachot/gitpod-mastodon-yb
Changes for YugabyteDB
I forked this and changed the base image from gitpod/workspace-postgresql
to gitpod/workspace-yugabytedb
.
As the default port for YSQL, the YugabyteDB PostgreSQL-compatible endpoint is 5433
I changed the DB parameters in .env.production.example
to:
DB_HOST=127.0.0.1
DB_PORT=5433
DB_NAME=mastodon
DB_USER=yugabyte
DB_PASS=
The YSQL endpoint takes a few seconds to be available, then I wait that it is available and create a mastodon
database:
# wait until YSQL is available, create the database and utility function
until ysqlsh <<<'
create database mastodon;\c mastodon
' ; do sleep 1 ; done 2>/dev/null
# create the database
RAILS_ENV=production rails --trace db:setup
The creation of the database failed on two index creation:
CREATE UNIQUE INDEX index_ip_blocks_on_ip ON public.ip_blocks USING btree (ip);
NOTICE: index method "btree" was replaced with "lsm" in YugabyteDB
ERROR: INDEX on column of type 'INET' not yet supported
I'll not create this index. If it is required for performance reasons, there are some alternatives (use text
instead of inet
datatype), and a git issue can be opened to support it in Yugabyte.
CREATE UNIQUE INDEX index_unique_conversations ON public.account_conversations USING btree (account_id, conversation_id, participant_account_ids);
NOTICE: index method "btree" was replaced with "lsm" in YugabyteDB
ERROR: INDEX on column of type 'user_defined_type' not yet supported
The participant_account_ids
is bigint[]
. I'm not sure how this makes sense in a unique index because arrays with the same items in different order will be considered different.
Then, I'll create a non-volatile function that sorts the items and returns them as text
:
create or replace function array_signature(a bigint[])
returns text as $$
select array_agg(unnest order by unnest)::text
from unnest(a);
$$ immutable language sql;
and create the index as:
CREATE UNIQUE INDEX index_unique_conversations ON public.account_conversations
(account_id, conversation_id, (array_signature(participant_account_ids)));
Those are quick workaround. If you want to run Mastodon with YugabyteDB in production, please contact the Yugabyte community: https://www.yugabyte.com/community/
To be able to create the database, I've removed the indexes from the schema definition:
# remove indexes that are not supported
sed -e '/"index_unique_conversations"/d' -i db/schema.rb
sed -e '/"index_ip_blocks_on_ip"/d' -i db/schema.rb
then run the creation:
# create the database
RAILS_ENV=production rails --trace db:setup
This works:
** Invoke db:setup (first_time)
** Invoke db:create (first_time)
** Invoke db:load_config (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute db:load_config
** Execute db:create
Database 'mastodon' already exists
** Invoke environment
** Invoke db:schema:load (first_time)
** Invoke db:load_config
** Invoke db:check_protected_environments (first_time)
** Invoke db:load_config
** Execute db:check_protected_environments
** Execute db:schema:load"/>
If you are concerned by the performance of DDL, there are ways to improve it (like CREATE INDEX NONCONCURRENTLY).
Then, once created, I create the unique index with the function on the array:
<span class="o">#</span> <span class="k">add</span> <span class="n">indexes</span> <span class="k">with</span> <span class="n">DDL</span> <span class="p">(</span><span class="n">this</span> <span class="n">may</span> <span class="n">be</span> <span class="n">done</span> <span class="k">from</span> <span class="k">schema</span><span class="p">.</span><span class="n">rb</span><span class="p">)</span>
<span class="n">ysqlsh</span> <span class="o">-</span><span class="n">e</span> <span class="o">-</span><span class="k">c</span> <span class="nv">"
create or replace function array_signature(a bigint[])
returns text as '
select array_agg(unnest order by unnest)::text
from unnest(a);
' immutable language sql;
"</span> <span class="o">-</span><span class="k">c</span> <span class="s1">'
CREATE UNIQUE INDEX index_unique_conversations ON public.account_conversations
(account_id, conversation_id, (array_signature(participant_account_ids)));
'</span> <span class="n">mastodon</span>
<span class="o">#</span> <span class="k">end</span> <span class="n">modifications</span> <span class="k">for</span> <span class="n">YugabyteDB</span>
Testing
The logs shows the users and passwords created:
And look at the account_conversations;
table to check that all is ok:
In conclusion
I love: Gitpod, Mastodon, PostgreSQL and YugabyteDB. If you have an application on PostgreSQL, you can get High Availability (resilience to failures and rolling upgrades) and Elasticity (scale out and down without application downtime) with YugabyteDB. If something is not supported, please open a git issue or ask for a workaround, there are always many possibilities. When moving from monolithic to distributed database, it is also important to check the indexes (hash and range sharding) and test performance.