🚀 YugabyteDB on fly.io - I - yugabyted

Franck Pachot - May 8 '22 - Dev Community

To run this, you need an account on https://fly.io/. I'll show how to deploy YugabyteDB, the Open-Source, PostgreSQL-compatible, Distributed SQL Database. Two things are important to understand about YugabyteDB:

  • a cluster is an unlimited set of yb-tserver nodes that distribute data, SQL processing and connections, plus a set of 3 yb-master nodes that control the cluster and store its metadata. Each yb-master must know the others. Each yb-tserver must know the masters, and then has all the information it needs to contact its peers.
  • yugabyted is a process that creates the yb-master and yb-tserver automatically. The first node starts one of each. The next two nodes do the same when joining the cluster (by providing the address of the first node), updating the list of masters. Further nodes start only a yb-tserver when joining the cluster. yugabyted is an easy way to start, but production deployments should start their yb-master and yb-tserver processes with more control.

Install fly.io client

Installation is easy. Check the install.sh script before running this. Don't forget: it is a best practice not to run something from the internet without checking.


curl -sL https://fly.io/install.sh | sh
export FLYCTL_INSTALL="$HOME/.fly"
export PATH="$FLYCTL_INSTALL/bin:$PATH"


My preferred way to log in is flyctl auth login, which provides a URL to authenticate in a browser:

flyctl auth login

Opening https://fly.io/app/auth/cli/e4517be5976769ad84a1f2a7b4c6b610 ...

I can check the deployed applications with flyctl list apps

$ flyctl list apps

  NAME                                | STATUS  | ORG      | DEPLOYED
--------------------------------------*---------*----------*-----------
  fly-builder-ancient-wildflower-1592 | pending | personal |


This one is the builder, which is free.

Free resources

The other free resources are documented at https://fly.io/docs/about/pricing/#free-allowances

I'll try to stick to free resources for this lab:

  • 3 VMs with 256MB RAM
  • 3 volumes with 1GB
  • unlimited IPv6
  • one IPv4

I want to be sure that you are not charged if you test this (because you need to provide a credit card to access free resources). But, obviously, a working database requires more. You will get memory errors soon when working with this setup.

Anyway, in case of doubt, all resources created in this blog post can be removed with: fly destroy yb-demo --yes

Create the application

For simplicity, I'm using a fly.toml file with all startup commands as a one-liner. Let's explain first what I do in it.

The following lists all private IPs from the private network with dig, removes those of the current host with grep, takes the first one with head, and formats it as a --join option with sed:

dig +short aaaa ${FLY_APP_NAME}.internal @fdaa::3 |
  grep -Ev "$(hostname -i | tr " " "|")" |
  head -1 |
  sed -e "s/^/--join /"

I use this to set the join_ip variable when starting nodes after the first one.
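To see what this pipeline produces without deploying anything, here is a small sketch that replaces the dig and hostname -i calls with sample values. The peer addresses are the ones that appear in the yb_servers() output of this post. The 172.19.0.2 address is a hypothetical second local address, only there to show why tr builds a regular expression alternation.

```shell
# Stand-in for: dig +short aaaa ${FLY_APP_NAME}.internal @fdaa::3
peers='fdaa:0:60f7:a7b:23c4:0:eb10:2
fdaa:0:60f7:a7b:23c5:0:eb12:2
fdaa:0:60f7:a7b:5bd4:0:eb11:2'

# Stand-in for: hostname -i (it can return several space-separated addresses)
# 172.19.0.2 is a made-up address for illustration
local_ips='fdaa:0:60f7:a7b:5bd4:0:eb11:2 172.19.0.2'

# Same filter as above: drop my own addresses, keep the first peer, prefix with --join
join_ip=$(printf '%s\n' "$peers" |
  grep -Ev "$(printf '%s' "$local_ips" | tr " " "|")" |
  head -1 |
  sed -e "s/^/--join /")
echo "join_ip=[$join_ip]"

# On the first node the only address returned is my own: nothing is left, so no --join
first_node=$(printf '%s\n' 'fdaa:0:60f7:a7b:5bd4:0:eb11:2' |
  grep -Ev "$(printf '%s' "$local_ips" | tr " " "|")" |
  head -1 |
  sed -e "s/^/--join /")
echo "first_node=[$first_node]"
```

The second case is what makes the same command line usable for the first node: an empty join_ip simply adds nothing to the yugabyted start command.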

The following defines the placement flags so that the nodes are aware of the region and YugabyteDB can guarantee a distribution resilient to one region failure: flags="placement_cloud=fly,placement_region=${FLY_REGION},placement_zone=zone,use_private_ip=cloud"

The following increases the Open Files limit: ulimit -n 1048576

Finally this starts YugabyteDB, listening on the private IP, adding the placement flags for the yb-master and yb-tserver, enabling YSQL passwords (default password for "yugabyte" is "yugabyte") and joining the other nodes if it is not the first one: yugabyted start --listen="$(hostname -i)" --master_flags="$flags" --tserver_flags="ysql_enable_auth=true,$flags" --daemon=false $join_ip

It stays in the foreground, with --daemon=false, to keep the container up.
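Before putting all of this into a one-liner, it can help to print the final command instead of running it. The following is only a dry-run sketch: the region and the two addresses are sample values (in the VM they come from FLY_REGION, hostname -i and the dig pipeline), and the listen variable is a name I introduce here for illustration.

```shell
# Sample values; in the fly.io VM these come from the environment and the dig pipeline
FLY_REGION=fra
listen='fdaa:0:60f7:a7b:5bd4:0:eb11:2'             # stand-in for $(hostname -i)
join_ip='--join fdaa:0:60f7:a7b:23c4:0:eb10:2'     # empty on the first node

flags="placement_cloud=fly,placement_region=${FLY_REGION},placement_zone=zone,use_private_ip=cloud"

# Print the command, one argument per line, instead of executing it.
# $join_ip is left unquoted on purpose (sh/bash word splitting): it must split
# into "--join" and the address, and expand to nothing when it is empty.
cmd=$(printf '%s\n' yugabyted start --listen="$listen" \
  --master_flags="$flags" \
  --tserver_flags="ysql_enable_auth=true,$flags" \
  --daemon=false $join_ip)
echo "$cmd"
```

Printing one argument per line makes quoting mistakes visible: each flag list must stay a single argument, while --join and its address must be two.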

So here is the full fly.toml which adds the mount point (/root/var/data is the default):


cat > fly.toml <<'FLY'
[experimental]
  cmd=['/usr/bin/bash','-c',' join_ip=$( dig +short aaaa ${FLY_APP_NAME}.internal @fdaa::3 | grep -Ev "$(hostname -i | tr " " "|")" | head -1 | sed -e "s/^/--join /" ) && flags="placement_cloud=fly,placement_region=${FLY_REGION},placement_zone=zone,use_private_ip=cloud" && ulimit -n 1048576 && yugabyted start --listen="$(hostname -i)" --master_flags="$flags" --tserver_flags="ysql_enable_auth=true,$flags" --daemon=false $join_ip ']
[deploy]
  strategy = "rolling"
[mounts]
source="yb_data"
destination="/root/var/data"
FLY

flyctl launch --region fra --image "yugabytedb/yugabyte:latest" \
 --name yb-demo --no-deploy --copy-config --now


This post is about a quick and easy install: using yugabyted for multi-node clusters is still in beta, and I rely on the fly.io experimental cmd so that I don't need to build a Docker image. The next posts in this series will show other deployment methods, more appropriate for production, and yugabyted will be improved soon.

flyctl launch has created the application and updated (--copy-config) the fly.toml but I don't want to deploy yet (--no-deploy)

Create the volumes

A database is stateful and needs a persistent volume. The advantage of a distributed SQL database like YugabyteDB is that you don't need shared storage: each VM will have its volume. In fly.io the volume determines the geo-location and this example will show a multi-region deployment between fra (Frankfurt - Germany), cdg (Paris - France) and ams (Amsterdam - Netherlands).


flyctl volumes create yb_data --region fra --size 1
flyctl volumes create yb_data --region cdg --size 1
flyctl volumes create yb_data --region ams --size 1
flyctl volumes list


[Screenshot: flyctl volumes]

The region is visible in the VM with the FLY_REGION environment variable. I use it to set the placement info when starting YugabyteDB. This is important because the database takes care to balance the leaders and followers across regions, to be resilient to one region failure. For single-region deployment, the ZONE could be interesting (--require-unique-zone is true, creating volumes in separate hardware zone from existing volumes) but I've not found how to get it easily from the VM.

I have created the volumes with 1GB because we can have 3GB free. Obviously, you will create larger ones for production databases, in the terabyte range, and you will add more nodes when the database grows.

Deploy

Now that the volumes declared in the local fly.toml are created, I can deploy. This runs the yugabyted command declared in the fly.toml. I can also connect to the container with flyctl ssh console and show the YugabyteDB status with yugabyted status:

flyctl deploy
sleep 1
flyctl ssh console -C "yugabyted status"
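A one-second sleep may not be enough for the VM and yugabyted to be up. A more patient variant is to retry the status command until it succeeds. This is a generic sketch: the retry function is my own helper, not part of flyctl, and the 30 attempts and 5 seconds in the usage comment are arbitrary values.

```shell
# Retry a command until it succeeds, up to a maximum number of attempts.
# usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
  max=$1; delay=$2; shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
}

# Intended use after "flyctl deploy" (not executed here):
#   retry 30 5 flyctl ssh console -C "yugabyted status"
```

The function returns 0 as soon as the command succeeds and 1 once the attempts are exhausted, so it can be chained with && in a deployment script.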

[Screenshot: yugabyted status]

In this post, I didn't open anything to the public network. There are two ways to access the database: from a container (there are many possibilities, well documented, like connecting to one container in the closest region) or through a proxy.

From a container (you get to one with flyctl ssh console, or can choose one with flyctl ssh console -s) you can connect to any node in the region with ysqlsh postgres://yugabyte:yugabyte@top3.nearest.of.${FLY_APP_NAME}.internal:5433/yugabyte (the default password is yugabyte, and this is a good time to change it).


TCP Proxy

Without exposing the service to the public network I can start a local proxy for the console port 7000 and YSQL (the PostgreSQL API) 5433:

flyctl proxy 27000:7000 &
flyctl proxy 25433:5433 &

With this I can see the YugabyteDB Web console on http://localhost:27000 and connect with any PostgreSQL driver on postgres://yugabyte:yugabyte@localhost:25433/yugabyte

Proxying local port 25433 to remote [yb-demo.internal]:5433
Proxying local port 27000 to remote [yb-demo.internal]:7000

The yb_servers() function lists the YugabyteDB nodes:


PGPASSWORD=yugabyte psql -h localhost -p 25433 -U yugabyte \
 -c "select * from yb_servers()"


I have only one node for the moment. Note that the public_ip is the private network address here, as this is what I've provided:

             host              | port | num_connections | node_type | cloud  | region |  zone   |           public_ip
-------------------------------+------+-----------------+-----------+--------+--------+---------+-------------------------------
 fdaa:0:60f7:a7b:5bd4:0:eb11:2 | 5433 |               0 | primary   | fly    | cdg    | zone    | fdaa:0:60f7:a7b:5bd4:0:eb11:2
(1 row)

Scale out

The beauty of this is that it can scale out. You scale the containers and the database adapts itself (new nodes joining the others, replication factor increased, connections, data and load balanced across all nodes).

To reach replication factor 3, for High Availability, I need 3 nodes. And because I have created 3 volumes in 3 regions, the 3 nodes are geo-distributed:

flyctl scale count 3
flyctl volumes list

[Screenshot: volumes list]

You can get more info about the VMs in which the containers are running with flyctl vm status or the online dashboard:
[Screenshot: Dashboard]

But the most important point is that you have nothing to do on the database side. The scale-out was detected:


PGPASSWORD=yugabyte psql -h localhost -p 25433 -U yugabyte \
 -c "select * from yb_servers()"

             host              | port | num_connections | node_type | cloud  | region |  zone   |           public_ip
-------------------------------+------+-----------------+-----------+--------+--------+---------+-------------------------------
 fdaa:0:60f7:a7b:23c4:0:eb10:2 | 5433 |               0 | primary   | fly    | fra    | zone    | fdaa:0:60f7:a7b:23c4:0:eb10:2
 fdaa:0:60f7:a7b:23c5:0:eb12:2 | 5433 |               0 | primary   | fly    | ams    | zone    | fdaa:0:60f7:a7b:23c5:0:eb12:2
 fdaa:0:60f7:a7b:5bd4:0:eb11:2 | 5433 |               0 | primary   | fly    | cdg    | zone    | fdaa:0:60f7:a7b:5bd4:0:eb11:2
(3 rows)

From the console (Utilities -> TServer Clocks) I check the latency between nodes:
[Screenshot: TServer Clocks]

10 milliseconds is OK for a YugabyteDB deployment. There are additional features to control the distribution of tablet leaders, and of followers that can be used for reads. When you think about it, 10ms to wait for a write quorum is similar to the latency of the local disks we used some years ago. This multi-region deployment ensures immediate (RTO=0) and no-data-loss (RPO=0) recovery if a region is down.
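The write quorum mentioned here is a simple majority of the replicas, since YugabyteDB replicates each tablet with Raft. A small worked example of the arithmetic for the replication factor used in this post (the variable names are mine):

```shell
rf=3                              # replication factor used in this post
quorum=$(( rf / 2 + 1 ))          # replicas that must acknowledge a write
tolerated=$(( rf - quorum ))      # replicas (here: regions) that can be lost
echo "RF=$rf quorum=$quorum tolerated_failures=$tolerated"
# prints: RF=3 quorum=2 tolerated_failures=1
```

With a quorum of 2 out of 3, the tablet leader only has to wait for the faster of its two followers, which is why the latency to the nearest other region is the one that matters most for writes.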

Scale out tablet servers

A minimum of 3 nodes is required for replication factor RF=3 so that the database continues if one region is down, but we can scale with more tablet servers in each region, just by creating the volumes and scaling:

flyctl volumes create yb_data --region fra --size 1
flyctl volumes create yb_data --region cdg --size 1
flyctl volumes create yb_data --region ams --size 1
flyctl scale count 6

Within 2 minutes I have 6 nodes:
[Screenshot: scale out]

(For this screenshot I had defined different names: fly.io for the cloud and unknown for the zone. With the script above you should have fly and zone. It is not a good idea to have a dot in the names, so I changed it in the script.)
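The same volume and scale steps can be generalized to any number of nodes per region. This is a sketch under my own assumptions: the FLYCTL variable is a dry-run switch I introduce here (by default it only prints the commands), and per_region and existing_per_region are illustrative names and values matching the 3 to 6 nodes example above.

```shell
# FLYCTL defaults to a dry run that only prints the commands.
# Set FLYCTL=flyctl to run them for real.
FLYCTL=${FLYCTL:-"echo flyctl"}

regions="fra cdg ams"
per_region=2            # target number of nodes in each region
existing_per_region=1   # volumes already created in each region

total=0
for region in $regions; do
  n=$existing_per_region
  # one new volume for each node to add in this region
  while [ "$n" -lt "$per_region" ]; do
    $FLYCTL volumes create yb_data --region "$region" --size 1
    n=$((n + 1))
  done
  total=$((total + per_region))
done

# the scale count is the total number of nodes across all regions
$FLYCTL scale count "$total"
```

Creating the volumes first matters because each VM needs a yb_data volume to mount, and the region of the volume decides where the VM is started.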

Scale out connections

I expected that connecting with ${FLY_APP_NAME}.internal would round-robin the connections over all nodes:

for i in {1..10}
do
 ysqlsh -c "select inet_server_addr()" \
 postgres://yugabyte:yugabyte@${FLY_APP_NAME}.internal:5433
done | sort | uniq -c


Apparently this is not how it works.

I don't know the reason (comments welcome) but, anyway, I don't really need it. The yb_servers() function, which lists all nodes with their region, is used by the YugabyteDB cluster-aware drivers to load balance connections across all nodes with load_balance=true, or to restrict them to the local region with load_balance=true&topology_keys=fly.${FLY_REGION}.zone
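As an illustration, here is how the two connection URLs could be built from the fly.io environment variables. I assume the YugabyteDB JDBC smart driver and its jdbc:yugabytedb:// prefix. The default values for FLY_APP_NAME and FLY_REGION are only there so that the sketch also runs outside of a fly.io VM.

```shell
# Set by fly.io inside the VM; the defaults are sample values for running elsewhere
FLY_APP_NAME=${FLY_APP_NAME:-yb-demo}
FLY_REGION=${FLY_REGION:-fra}

base="jdbc:yugabytedb://${FLY_APP_NAME}.internal:5433/yugabyte"

# balance connections across all nodes
url_all="${base}?load_balance=true"

# restrict connections to nodes of the local region (cloud.region.zone, as in the placement flags)
url_local="${base}?load_balance=true&topology_keys=fly.${FLY_REGION}.zone"

echo "$url_all"
echo "$url_local"
```

The topology key has to match the placement_cloud, placement_region and placement_zone values given when starting the nodes, which is why the same FLY_REGION variable is used in both places.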

YugabyteDB is PostgreSQL compatible because it re-uses the PostgreSQL query layer, so you can use it in place of PostgreSQL. Instead of 1 writer + 2 standby readers, you will have 3 active nodes, with application continuity over rolling upgrades or node/region failure. Here the first 3 nodes hold the yb-masters (the control plane) and yb-tservers (the data plane), and the additional nodes have only yb-tservers. In the next posts, I'll show a more complex setting, with more control over yb-masters and yb-tservers.
