YugabyteDB is Distributed SQL: resilient and consistent

Franck Pachot - Aug 15 '23 - - Dev Community

In my previous article, I conducted a pgbench test on Citus to underscore its limitations in two key areas:

  • PostgreSQL compatible API: In Citus, creating tables demands the invocation of additional functions alongside the standard SQL CREATE TABLE Data Definition Language.
  • PostgreSQL ACID properties: Citus exhibits eventual consistency in its read operations, and the global database attains a consistent state only when the application is offline.

Distributed SQL sets itself apart by offering comprehensive SQL features on a global scale. It operates as a unified logical database, with data storage and transactions dispersed across multiple nodes. Notably, YugabyteDB extends its value by ensuring PostgreSQL compatibility, enabling application DDL and DML to function seamlessly without necessitating any code modifications.

Startup a YugabyteDB cluster

In this context, all nodes possess equal capabilities to accept PostgreSQL connections, handle SQL processing, manage transactions, distribute table rows and index entries, and store their part in their local storage.

I create a network and start the first node, which I also use to expose the UI on http://n1.yb:15433/

docker network create yb
docker run -d --network yb --hostname n1 -p15433:15433 yugabytedb/yugabyte \
 yugabyted start --background=false --ui=true

Enter fullscreen mode Exit fullscreen mode

I check the status (yugabyted status)
Image description

Then I add as many nodes as I want, with a minimum of 3 to replicate for fault tolerance:

for i in {2..6}
do
docker run -d --network yb --hostname n$i yugabytedb/yugabyte \
 yugabyted start --background=false --join n1.yb
done

Enter fullscreen mode Exit fullscreen mode

It has switched automatically to Replication Factor 3 for resilience to node failure:
Image description

The UI shows the summary with 6 nodes:
Image description

Create pgbench tables

Given YugabyteDB's PostgreSQL compatibility, I run the pgbench utility from the PostgreSQL image, connecting to any of the nodes I've launched, as they are all equal, to create the tables:

docker run -it --network yb \
 -e PGHOST=n4.yb -e PGUSER=yugabyte -e PGPORT=5433  \
 postgres pgbench -iIdtGp yugabyte

Enter fullscreen mode Exit fullscreen mode

I've omitted the VACUUM step (-Iv) from the initialization phases, as YugabyteDB doesn't necessitate VACUUM operations. You are welcome to retain it, but be aware that it will result in a WARNING: VACUUM will be ignored. In YugabyteDB, the storage mechanism utilizes the cluster Hybrid Logical Clock for sequencing transactions and operations on table rows, and index entries. While this might appear intricate, it simplifies various aspects: there's no requirement to fine-tune autovacuum, no occurrence of bloat, and no concerns regarding xid wraparound.

There's a similar warning about the FILLFACTOR, which is not needed either:
Image description

Indeed, Heap Only Tuple (HOT) updates are unnecessary in this context. The tables are stored within their Primary Key LSM-Tree, which allows for solely writing the new column values without duplicating the entire row. This design reduces bloat and obviates the need for HOT updates.

There is nothing else to do. The tables and indexes are already efficiently distributed to "Tablets" and replicated using the Raft consensus algorithm, ensuring data consistency across "Leaders" and their "Peers."

Image description

Run pgbench transactions

I execute transactions over a 10-second interval (-T 10), similar to the previous article. The sole divergence is the omission of the -n parameter, skipping the unnecessary VACUUM step:

docker run -it --network yb \
 -e PGHOST=n4.yb -e PGUSER=yugabyte -e PGPORT=5433  \
 postgres pgbench -nT 10 yugabyte

Enter fullscreen mode Exit fullscreen mode

While this is running, I check the UI, which shows that reads and writes are distributed on all nodes:

Image description

All transactions were processed with no errors:

Image description

Run OLTP workload: strong consistency

Let's run pgbench in the background and check how the tables are consistent:

docker run -it --network yb \
 -e PGHOST=n4.yb -e PGUSER=yugabyte -e PGPORT=5433  \
 postgres pgbench -nT 60 yugabyte &

docker run -i  --network yb \
 -e PGHOST=n4.yb -e PGUSER=yugabyte -e PGPORT=5433 \
 postgres psql yugabyte <<'SQL'
select a.sum, b.sum, t.sum , a.sum = b.sum and b.sum = t.sum
from
  ( select sum(abalance) from pgbench_accounts ) as a
 ,( select sum(bbalance) from pgbench_branches ) as b
 ,( select sum(tbalance) from pgbench_tellers  ) as t
;
\watch 1
SQL
Enter fullscreen mode Exit fullscreen mode

As anticipated, the behavior observed reaffirms YugabyteDB's status as a Distributed SQL database. Both reads and writes exhibit consistent behavior, in line with the expectations.
Image description

The execution proceeded seamlessly without encountering any failures. Remarkably, the latency and throughput remained comparable to the results obtained during the run on Citus, as detailed in the preceding post.
Image description

It's important to underscore that YugabyteDB maintains a high level of performance while offering superior high availability. Unlike the Citus/PostgreSQL/Patroni configuration, YugabyteDB stands out by eliminating downtime associated with unplanned failures and maintenance processes such as OS patching and database upgrades.

Consistency and Resilience

Given the presence of a YugabyteDB laboratory environment, it is feasible to assess the system's resilience. This can be accomplished by isolating one node from the network using the command docker network disconnect -f and validating that the application remains operational despite the network disruption.


docker ps -q

docker run -it --network yb \
 -e PGHOST=n4.yb -e PGUSER=yugabyte -e PGPORT=5433  \
 postgres pgbench -nT 60 --max-tries 10 yugabyte &

sleep 15
docker exec -it 5b63fe9e48e5 hostname
docker network disconnect -f yb 5b63fe9e48e5 

Enter fullscreen mode Exit fullscreen mode

Image description
The observed increase in latency in this short-duration run can be attributed to the TCP Timeout, which is set at 15 seconds. This concerns the first read or write operation for each tablet with its Raft Leader located on the temporarily unreachable node. Importantly, this delay does not result in application errors or timeouts. Subsequently, the reads and writes are directed towards the newly elected Leaders.

You can check the UI to understand what happened. The node was down for more than 3 seconds, and then the tablets have elected new Leaders to take over the ongoing reads and writes:

Image description

Should the network be re-established within a 15-minute window (which can be tested using docker network connect), the tablet peers will synchronize to regain alignment, leading to the election of new Leaders for load rebalancing. However, suppose the network remains disconnected beyond this timeframe. In that case, new peers will initiate bootstrapping on the available nodes to reach three replicas per tablet and no peers in the node that is down for more than 15 minutes:
Image description

This illustrates YugabyteDB's capacity for automated recovery and fault tolerance.

To Summarize...

I've written these two posts to provide concrete facts, which can be easily tested using a conventional SQL application like pgbench within a straightforward Docker-based laboratory setup. Beyond the traditional marketing assertions encompassing Scaling, Sharding, SQL, ACID, and PostgreSQL compatibility, a clear distinction emerges between:

  • Citus or any database sharding solution: In this context, each shard constitutes an individual PostgreSQL database. However, it's crucial to recognize that the cluster as a whole behaves differently from a standard PostgreSQL environment. It exclusively offers eventually consistent without certain critical SQL and ACID attributes.

  • YugabyteDB Distributed SQL: In contrast, YugabyteDB not only distributes connections, transactions, and storage but also presents itself as a unified logical database that spans multiple zones. It encompasses the entire spectrum of SQL features, with robust ACID properties and a commitment to strong consistency and can be used in place of PostgreSQL without changing the application code.

Additionally, we've gained insights into the architectural simplicity of YugabyteDB. All nodes are active, whether pods or virtual machines, and can seamlessly accept connections, DDL, DML, reads, and writes. This is achieved without the need for coordinator nodes or deliberation over synchronous or asynchronous standbys, which would not qualify for being Cloud Native. It is elastic: you can add more nodes online to scale the number of connections, the SQL executions, the reads and writes, and the volume.

Take a Deep Dive into YugabyteDB's Two Layer Architecture

Explore how the YugabyteDB two-layer architecture works, compare to other databases, and examine the benefits it brings to applications.

favicon yugabyte.com
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .