geo-distribution with 🚀 YugabyteDB tablespaces

Franck Pachot - Jun 30 '22 - Dev Community

In the first post in this series, I created an RF=3 cluster with 5 nodes and defined an RF=5 placement for the tablets. Here, I'll start the same cluster but define the placement with tablespaces, for finer-grained control. Tables, indexes, and partitions can be created in a tablespace to declare their placement requirements.

Set up the YugabyteDB cluster

I'm starting 5 nodes:

docker network create -d bridge yb

docker run -d --network yb --name yb-eu-1 -p5001:5433 -p7001:7000 -p9001:9000 \
yugabytedb/yugabyte:2.15.0.0-b11 \
yugabyted start --daemon=false --listen yb-eu-1 \
 --master_flags="placement_zone=1,placement_region=eu,placement_cloud=cloud" \
--tserver_flags="placement_zone=1,placement_region=eu,placement_cloud=cloud"

docker run -d --network yb --name yb-eu-2 -p5002:5433 -p7002:7000 -p9002:9000 \
yugabytedb/yugabyte:2.15.0.0-b11 \
yugabyted start --daemon=false --listen yb-eu-2 --join yb-eu-1 \
 --master_flags="placement_zone=2,placement_region=eu,placement_cloud=cloud" \
--tserver_flags="placement_zone=2,placement_region=eu,placement_cloud=cloud"

docker run -d --network yb --name yb-us-1 -p5003:5433 -p7003:7000 -p9003:9000 \
yugabytedb/yugabyte:2.15.0.0-b11 \
yugabyted start --daemon=false --listen yb-us-1 --join yb-eu-1 \
 --master_flags="placement_zone=1,placement_region=us,placement_cloud=cloud" \
--tserver_flags="placement_zone=1,placement_region=us,placement_cloud=cloud"

docker run -d --network yb --name yb-ap-1 -p5004:5433 -p7004:7000 -p9004:9000 \
yugabytedb/yugabyte:2.15.0.0-b11 \
yugabyted start --daemon=false --listen yb-ap-1 --join yb-eu-1 \
 --master_flags="placement_zone=1,placement_region=ap,placement_cloud=cloud" \
--tserver_flags="placement_zone=1,placement_region=ap,placement_cloud=cloud"

docker run -d --network yb --name yb-au-1 -p5005:5433 -p7005:7000 -p9005:9000 \
yugabytedb/yugabyte:2.15.0.0-b11 \
yugabyted start --daemon=false --listen yb-au-1 --join yb-eu-1 \
 --master_flags="placement_zone=1,placement_region=au,placement_cloud=cloud" \
--tserver_flags="placement_zone=1,placement_region=au,placement_cloud=cloud"
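The five commands above differ only in the container name, the port suffix, the region and zone, and whether the node joins yb-eu-1. As a minimal sketch, assuming the same image and naming as above, they could be generated from a small table. yb_node_cmd is a hypothetical helper of my own, not part of yugabyted, and it only prints the commands rather than running them:

```shell
#!/bin/sh
# Print (not run) the docker command for one node.
# Arguments: container name, port suffix, region, zone, optional node to join.
# yb_node_cmd is a hypothetical helper for illustration only.
yb_node_cmd() {
  yn_name=$1; yn_port=$2; yn_region=$3; yn_zone=$4; yn_join=${5:-}
  yn_flags="placement_zone=${yn_zone},placement_region=${yn_region},placement_cloud=cloud"
  printf 'docker run -d --network yb --name %s -p500%s:5433 -p700%s:7000 -p900%s:9000 ' \
    "$yn_name" "$yn_port" "$yn_port" "$yn_port"
  printf 'yugabytedb/yugabyte:2.15.0.0-b11 yugabyted start --daemon=false --listen %s ' \
    "$yn_name"
  # Only the first node starts without --join.
  if [ -n "$yn_join" ]; then
    printf '%s %s ' '--join' "$yn_join"
  fi
  printf '%s="%s" %s="%s"\n' '--master_flags' "$yn_flags" '--tserver_flags' "$yn_flags"
}

# One line per node: name, port suffix, region, zone, join target ("-" for none).
while read -r node port region zone join; do
  if [ "$join" = "-" ]; then join=""; fi
  yb_node_cmd "$node" "$port" "$region" "$zone" "$join"
done <<'EOF'
yb-eu-1 1 eu 1 -
yb-eu-2 2 eu 2 yb-eu-1
yb-us-1 3 us 1 yb-eu-1
yb-ap-1 4 ap 1 yb-eu-1
yb-au-1 5 au 1 yb-eu-1
EOF
```

Reviewing the printed commands before running them (for example by piping the output to sh) keeps the placement flags consistent between the master and the tserver on each node.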


Because yugabyted defines cloud1.datacenter1.rack1 by default but I use other names, I set the default placement explicitly, keeping RF=3 but requiring at least one replica in each of the two eu zones:

docker exec -i yb-eu-1 yb-admin -master_addresses yb-eu-1:7100,yb-eu-2:7100,yb-us-1:7100 \
modify_placement_info \
cloud.eu.1:1,cloud.eu.2:1,cloud.us.1:0,cloud.ap.1:0,cloud.au.1:0 \
3

docker exec -i yb-eu-1 yb-admin -master_addresses yb-eu-1:7100,yb-eu-2:7100,yb-us-1:7100 \
set_preferred_zones cloud.eu.1 cloud.eu.2
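The arithmetic behind this placement can be sketched in shell. placement_slack is a hypothetical helper of my own (not a yb-admin command): it sums the per-zone minimums from the placement string and subtracts them from the replication factor, which gives the number of replicas that are not pinned to a zone:

```shell
#!/bin/sh
# Hypothetical helper: replicas left unpinned by a placement string.
# Arguments: placement string (cloud.region.zone:min,...), replication factor.
placement_slack() {
  ps_rf=$2
  ps_sum=0
  ps_old_ifs=$IFS
  IFS=,
  # Split on commas and add up the number after the last ":" of each block.
  for ps_block in $1; do
    ps_sum=$((ps_sum + ${ps_block##*:}))
  done
  IFS=$ps_old_ifs
  echo $((ps_rf - ps_sum))
}

# The RF=3 default defined above: two replicas pinned to eu, one left free.
placement_slack cloud.eu.1:1,cloud.eu.2:1,cloud.us.1:0,cloud.ap.1:0,cloud.au.1:0 3
```

With the values above it prints 1: the third replica can go to us, ap, or au. With the RF=5 placement from the first post (a minimum of 1 in each of the five zones) the same calculation gives 0, so every zone holds a replica.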


Tablespace

In the first post of this series, I defined the number of replicas and the placement blocks with yb-admin modify_placement_info cloud.eu.1:1,cloud.eu.2:1,cloud.us.1:1,cloud.ap.1:1,cloud.au.1:1 5, and the leader preference with yb-admin set_preferred_zones cloud.eu.1 cloud.eu.2. This was at the cluster level. Here I'll leave the cluster default (RF=3, as configured above) and create an eu_preferred tablespace with the same RF=5 placement as in that first post:

psql -p 5001 -c '
create tablespace eu_preferred with (
  replica_placement=$placement$
{
    "num_replicas": 5,
    "placement_blocks": [{
            "cloud": "cloud",
            "region": "eu",
            "zone": "1",
            "min_num_replicas": 1,
            "leader_preference": 1
        },
        {
            "cloud": "cloud",
            "region": "eu",
            "zone": "2",
            "min_num_replicas": 1,
            "leader_preference": 1
        },
        {
            "cloud": "cloud",
            "region": "us",
            "zone": "1",
            "min_num_replicas": 1
        },
        {
            "cloud": "cloud",
            "region": "ap",
            "zone": "1",
            "min_num_replicas": 1
        },
        {
            "cloud": "cloud",
            "region": "au",
            "zone": "1",
            "min_num_replicas": 1
        }
    ]
}$placement$)
'
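The JSON above repeats the same block for each zone, so it can also be generated. This is only a sketch under my own assumptions: placement_json is a hypothetical helper, the cloud name is fixed to "cloud" as in this cluster, and each zone is described as region:zone:min with an optional fourth field for leader_preference:

```shell
#!/bin/sh
# Hypothetical helper: build a replica_placement JSON document.
# Usage: placement_json NUM_REPLICAS region:zone:min[:leader_preference] ...
placement_json() {
  pj_total=$1; shift
  printf '{"num_replicas":%s,"placement_blocks":[' "$pj_total"
  pj_sep=""
  for pj_spec in "$@"; do
    # Split the spec on ":" (the leader preference may be absent).
    IFS=: read -r pj_region pj_zone pj_min pj_pref <<EOF
$pj_spec
EOF
    printf '%s{"cloud":"cloud","region":"%s","zone":"%s","min_num_replicas":%s' \
      "$pj_sep" "$pj_region" "$pj_zone" "$pj_min"
    if [ -n "$pj_pref" ]; then
      printf ',"leader_preference":%s' "$pj_pref"
    fi
    printf '}'
    pj_sep=","
  done
  printf ']}\n'
}

# Same placement as eu_preferred: RF=5, one replica per zone, leaders preferred in eu.
placement_json 5 eu:1:1:1 eu:2:1:1 us:1:1 ap:1:1 au:1:1
```

The printed JSON can then be pasted between the $placement$ delimiters of a create tablespace statement like the one above.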

Leader and Follower placement

I run the same as in the previous posts but with the table creation in the new tablespace:

psql -p 5005 -e <<SQL
drop table if exists demo;
create table demo tablespace eu_preferred 
 as select generate_series(1,1000) n;
update demo set n=n+1;
\watch 0.01
SQL


The leaders are in eu, and reads and writes happen there.

I have followers in all regions.

The tables in this tablespace have the same distribution as in the first blog post of this series.

Default distribution

Tablespaces give control over specific tables, but the cluster configuration remains the default for the others. When you don't specify a tablespace, the distribution follows the default tablet placement for the cluster:

psql -p 5005 -e <<SQL
drop table if exists simple ;
create table simple 
 as select generate_series(1,1000) n;
update simple set n=n+1;
\watch 0.01
SQL


This uses the default for my cluster, with two replicas in eu, the preferred leaders there, and one follower in one of the other regions.

The updates go to the leaders.

Now, connecting to au and enabling follower reads:

psql -p 5005 -e <<SQL
set yb_read_from_followers=on;
set default_transaction_read_only = on;
explain analyze select * from simple;
\watch 0.01
SQL

Because this simple table uses the default cluster distribution (5 nodes with RF=3), not every node holds a copy of all the data, so the reads are distributed across the nodes.

This is another advantage of follower reads: the reads are distributed even when the leaders are not.

Thanks to the default placement and tablespaces, I can decide which tables are replicated to all regions and which ones are distributed, with specific consideration for tablet leaders in both cases. By tables, I also mean indexes and partitions. Here I considered eu as the main region to place the leaders, but with declarative partitioning this can also depend on a business value in a table column, such as the user's country.
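To illustrate the partitioning idea, here is a minimal sketch of what such a table could look like. The users table, its columns, and the country codes are hypothetical names of my own; only the eu_preferred tablespace comes from this post. The partition_ddl function just prints the DDL, so nothing is executed until the output is piped to psql:

```shell
#!/bin/sh
# Hypothetical example: list partitioning on a country column, with the
# European partition placed in the eu_preferred tablespace created earlier.
# The function only prints the DDL; pipe it to psql to apply it.
partition_ddl() {
cat <<'SQL'
create table users (
  id bigint,
  country text,
  primary key (id, country)
) partition by list (country);
-- rows for these countries follow the eu_preferred placement
create table users_eu partition of users
  for values in ('FR','DE') tablespace eu_preferred;
-- every other country follows the cluster default placement
create table users_other partition of users default;
SQL
}

partition_ddl
```

The partition key is part of the primary key because a primary key on a partitioned table has to include the partitioning column. Other regions would need their own tablespaces, defined the same way as eu_preferred with a different leader preference.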
