Watching a YugabyteDB table: replication factor

Frits Hoogland - Jan 17 '23 - - Dev Community

One of the most frequently recurring questions that we get at YugabyteDB is how to interpret the information that YugabyteDB provides when people are testing (high) availability. This is a summary of that.

Replication factor

The ability of a YugabyteDB cluster to sustain failures depends essentially on the replication factor (RF). The default and most common replication factor is 3. The replication factor determines the number of participants in a RAFT group. Within each group, the RAFT mechanism determines the LEADER, which acts as the operational participant: it performs the tasks the RAFT group protects and distributes the changes to the FOLLOWERS.

With a replication factor of 3, the RAFT group is able to sustain the failure of one group member, because the RAFT group requires a majority of its members (2 out of 3) to be ALIVE in order to function.
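The majority rule above can be worked out for any replication factor. The following is a minimal sketch of that arithmetic only; the function names are made up for illustration and are not part of any YugabyteDB tool.

```python
def raft_majority(replication_factor: int) -> int:
    """Smallest number of ALIVE members that still forms a majority."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Number of members that can fail while a majority stays ALIVE."""
    return replication_factor - raft_majority(replication_factor)


for rf in (1, 3, 5):
    print(f"RF{rf}: majority={raft_majority(rf)}, "
          f"tolerated failures={tolerated_failures(rf)}")
```

For RF3 this gives a majority of 2 and one tolerated failure; for RF5 a majority of 3 and two tolerated failures.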

A minimal cluster contains RF number of nodes. In such a case every RAFT group of the different protected resources has a member on every node (RF3 = 3 nodes, RF5 = 5 nodes).

However, if you increase the number of nodes beyond RF, such as 6 nodes for an RF3 cluster, the cluster manager spreads the RAFT group members for the tablets as evenly as possible over the nodes. Without further configuration, which nodes end up holding the members of any given RAFT group is random.

Another very common pattern in a cloud is to use RF number of regions and spread the copies over them, with the intention of being able to sustain the failure of a region. This is one of the obvious implementations of a database that can sustain partial failure.

This can only work in all cases if a placement policy is created that defines which groups of nodes belong to which region. Without a placement policy, the RAFT group members are spread randomly, so if all nodes in a region fail, there is no guarantee that at most one member of any RAFT group was on the failed nodes. If more than one member of an RF3 RAFT group is on the failed nodes, that RAFT group becomes unavailable, because the surviving members are a minority.
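To make the region failure scenario concrete, here is a minimal sketch that checks whether an RF3 RAFT group keeps its majority when one region fails. The node names, region names and the 6-node layout are hypothetical, chosen only for illustration.

```python
# Hypothetical layout: 6 nodes in 3 regions, two nodes per region.
NODE_REGION = {
    "node1": "region1", "node2": "region1",
    "node3": "region2", "node4": "region2",
    "node5": "region3", "node6": "region3",
}


def survives_region_failure(replica_nodes, failed_region, node_region=NODE_REGION):
    """True if a majority of the RAFT group lives outside the failed region."""
    alive = [node for node in replica_nodes if node_region[node] != failed_region]
    return len(alive) > len(replica_nodes) // 2


# One member per region: what a placement policy is meant to enforce.
spread = ["node1", "node3", "node5"]
# Two members in region1: possible when members are spread randomly.
clustered = ["node1", "node2", "node5"]

regions = sorted(set(NODE_REGION.values()))
print(all(survives_region_failure(spread, region) for region in regions))
print(survives_region_failure(clustered, "region1"))
```

The group with one member per region survives the loss of any single region. The group with two members in region1 loses its majority when region1 fails, which is exactly the case a placement policy rules out.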

Caveat: when testing with RF number of nodes, a placement policy is not necessary, because every node holds exactly one member of every RAFT group. Warning: in most real-life cases this will not be true, because you probably want more than RF number of nodes.

If you intend to test with more than RF nodes, a placement policy is mandatory. That leads to the question:

How to test for and set a placement policy?

A placement policy can be checked/validated in a number of ways:

  1. Via yb-admin: yb-admin -master_addresses 192.168.66.80:7100,192.168.66.81:7100,192.168.66.82:7100 get_universe_config (replace the master addresses with your own; these are my RFC1918 addresses).
  2. Via http://<master address>:<master http port>/cluster-config.
  3. Via yb_stats --print-cluster-config.

By default, no placement policy exists:

➜ yb_stats --print-cluster-config
{
  "hostname_port": "192.168.66.80:7000",
  "timestamp": "2023-01-17T16:23:39.945575+01:00",
  "version": 2,
  "replication_info": null,
  "server_blacklist": null,
  "cluster_uuid": "144176dd-9d18-463c-868e-d80980b26718",
  "encryption_info": null,
  "consumer_registry": null,
  "leader_blacklist": null
}

The placement policy is shown in the replication_info field. Here it is null, which means that no placement policy is set.
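This check is easy to script. The sketch below assumes the cluster configuration is available as JSON text in the shape shown above, for example captured from the yb_stats output; the helper name is hypothetical, and the sample is a trimmed copy of that output.

```python
import json


def has_placement_policy(cluster_config_json: str) -> bool:
    """True if replication_info is set, i.e. not JSON null."""
    config = json.loads(cluster_config_json)
    return config.get("replication_info") is not None


# Trimmed copy of the output above; only the fields needed for the check.
sample = """
{
  "hostname_port": "192.168.66.80:7000",
  "version": 2,
  "replication_info": null,
  "cluster_uuid": "144176dd-9d18-463c-868e-d80980b26718"
}
"""

print(has_placement_policy(sample))
```

For the default configuration shown above this reports that no placement policy is set.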

My test cluster has 3 nodes, and I want to place each of them in its own region. Again: for RF3 and 3 nodes this is not strictly necessary, but as soon as you have more nodes than the RF, it becomes important. I want to make sure each defined region gets a single copy out of the total of 3 copies:

yb-admin -master_addresses 192.168.66.80:7100,192.168.66.81:7100,192.168.66.82:7100 \
modify_placement_info local.local1,local.local2,local.local3 \
3

Now if I query the cluster configuration again, the placement policy is visible:

➜ yb_stats --print-cluster-config
{
  "hostname_port": "192.168.66.80:7000",
  "timestamp": "2023-01-17T16:27:56.248234+01:00",
  "version": 3,
  "replication_info": {
    "live_replicas": {
      "num_replicas": 3,
      "placement_blocks": [
        {
          "cloud_info": {
            "placement_cloud": "local",
            "placement_region": "local3",
            "placement_zone": null
          },
          "min_num_replicas": 1
        },
        {
          "cloud_info": {
            "placement_cloud": "local",
            "placement_region": "local2",
            "placement_zone": null
          },
          "min_num_replicas": 1
        },
        {
          "cloud_info": {
            "placement_cloud": "local",
            "placement_region": "local1",
            "placement_zone": null
          },
          "min_num_replicas": 1
        }
      ],
      "placement_uuid": null
    },
    "read_replicas": null,
    "affinitized_leaders": null,
    "multi_affinitized_leaders": null
  },
  "server_blacklist": null,
  "cluster_uuid": "144176dd-9d18-463c-868e-d80980b26718",
  "encryption_info": null,
  "consumer_registry": null,
  "leader_blacklist": null
}
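The placement blocks can also be summarised programmatically. This is a minimal sketch that assumes the JSON shape shown above; the function names are hypothetical, and the consistency rule (the minimum replicas per location must not add up to more than num_replicas) is a sanity check assumed here, not something taken from the output.

```python
import json


def summarize_placement(cluster_config_json):
    """Return (num_replicas, {location: min_num_replicas}) for live replicas."""
    config = json.loads(cluster_config_json)
    live = (config.get("replication_info") or {}).get("live_replicas") or {}
    per_location = {}
    for block in live.get("placement_blocks") or []:
        info = block["cloud_info"]
        parts = (info.get("placement_cloud"),
                 info.get("placement_region"),
                 info.get("placement_zone"))
        # Skip null parts, so a missing zone gives "cloud.region".
        location = ".".join(part for part in parts if part)
        per_location[location] = block["min_num_replicas"]
    return live.get("num_replicas"), per_location


def placement_is_consistent(num_replicas, per_location):
    """Assumed sanity check: the minimums must fit within num_replicas."""
    return num_replicas is not None and sum(per_location.values()) <= num_replicas


# Trimmed copy of the output above.
sample = json.dumps({"replication_info": {"live_replicas": {
    "num_replicas": 3,
    "placement_blocks": [
        {"cloud_info": {"placement_cloud": "local", "placement_region": region,
                        "placement_zone": None},
         "min_num_replicas": 1}
        for region in ("local3", "local2", "local1")
    ],
}}})

num_replicas, per_location = summarize_placement(sample)
print(num_replicas, per_location)
print(placement_is_consistent(num_replicas, per_location))
```

With the configuration above, each of local.local1, local.local2 and local.local3 is listed with a minimum of one replica, which together account for all 3 copies.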

This gives us the tools and knowledge to understand the replication factor, how it influences availability and replica placement, what replica placement policies are, and how to validate them.

The next part will look into a table and the tablets.
