Partitioning vs. Sharding - What about SQL Features?

Franck Pachot - Aug 14 '23 - - Dev Community

I've gone through numerous publications discussing "Partitioning vs. Sharding" recently, particularly in the context of PostgreSQL, largely due to the recent PGSQL Phriday #011 and I was surprised by the low coverage of the limitations with the most basic SQL database features:

  • what about foreign keys?
  • what about unique indexes?
  • what about transaction consistency?
  • what about backups, failover, fault tolerance?

These are the pillars of SQL databases, the features behind the ACID properties, but their development has been rooted in monolithic database architectures. Numerous discussions surrounding sharding or partitioning solutions tend to downplay or overlook the extent to which some or all of these fundamental features are simply not possible over shards or partitions.

Beyond the interchangeable use of terms like "Sharding" or "Partitioning" in the literature, the primary focus lies in determining the specific level at which this division occurs: storage, transactions, SQL processing, proxy pooler, Application?

Disk Striping and Mirroring (Example: Oracle ASM)

Indeed, this is the foundational starting point. A database essentially comprises a collection of files, and prior to delving into the expansion of instances that interact with it, the initial step in breaking down an exceptionally large database involves the utilization of multiple disks for distribution.

This concept has existed for years. Datafile blocks are distributed and replicated across multiple disks. When contrasted with monolithic storage, this approach eliminates constraints related to maximum disk size, IO throughput, and bandwidth. Moreover, with redundancy, it provides resilience against disk failures. The seamless addition or removal of disks doesn't disrupt the application and is achieved without downtime.

This is a proven technology that has been adapted to various implementations. In the context of PostgreSQL, the task is handed over to the operating system, with options like LVM, RAID, SAN, and Cloud Block Devices. Oracle introduces its own implementation with specific optimizations tailored for database files, known as ASM (Automatic Storage Management). Notably, cloud solutions like Amazon Aurora, Google AlloyDB, and Neon employ their unique implementations, which involve capturing writes from the Write Ahead Logging (WAL) mechanism to apply it on storage servers.

In regard to SQL features, since this form of horizontal scalability operates at a level beneath the database, all features remain accessible. Nevertheless, scalability remains constrained by a monolithic compute instance on top of it.

Cluster on shared storage (Example: Oracle RAC)

When the storage is accessible from a cluster of interconnected servers, multiple instances gain the ability to both read from and write to these storage units. While only a handful of implementations exist, the most prominent among them is Oracle RAC. Despite the availability of all SQL features, certain infrastructure limitations come into play. Notably, a low-latency network interconnect is essential due to the presence of a sole memory buffer housing the current state for a set of rows, necessitating constant transmission between instances.
This setup is typically deployed on specialized hardware, such as Oracle Exadata, which isn't accessible on any cloud platform.

Distributed SQL (Example: YugabyteDB)

When the restrictions imposed by the shared storage infrastructure are lifted, the instances can operate on standard commodity hardware interconnected by any network, effectively transforming the solution into a cloud-native architecture. This cannot use the traditional approach where SQL processing reads and write to the database through a shared buffer pool in memory. NoSQL databases introduced a key-value API to split the database into multiple datastores that are both distributed and replicated. Distributed SQL databases add SQL features on top of it, which means that foreign keys, unique constraints and ACID transactions are fully supported at global level.

When discussing this particular level situated beneath SQL processing, the terms "partitioning" and "sharding" can often be used interchangeably. In this context, "partitioning" refers to the division of rows based on their primary key, while "sharding" involves dispersing these rows across multiple key-value data stores. Both concepts are integral components of the same methodology for achieving horizontal scalability. Unfortunately, the terms "partitioning" and "sharding" are used at other level in SQL databases. Hence, it is of utmost importance to be precise and meticulous when specifying the exact level at which this division is occurring.

In the case of YugabyteDB, which employs PostgreSQL for its SQL layer, the term "partitioning" is reserved for the declarative partitioning feature within PostgreSQL. To avoid confusion, for the automatic distribution of table rows and index entries, we say that they are "split" into "tablets". This becomes apparent in the CREATE TABLE syntax when explicitly pre-splitting a table:



create table demo (
 id bigint, col1 int, col2 int, primary key (id hash)
 ) split into 12 tablets;


Enter fullscreen mode Exit fullscreen mode

While navigating YugabyteDB code, you might encounter references to "partition" and "sharding", particularly within the context of DocDB, which serves as the storage and transaction layer for YugabyteDB. However, it's important to be mindful of potential confusion when YSQL, the PostgreSQL API is used on top of it.

In terms of SQL features, this represents the apex of horizontal scalability, encompassing the entirety of SQL capabilities. You have the freedom to connect to any node, where your SQL statements undergo parsing and execution. Despite this distributed setup, you perceive a single logical SQL database, endowed with global ACID properties, transactions, and integrity constraints. This holds true even when you're reading from and writing to other nodes within the network.

At this level, distribution occurs transparently, accompanied by automatic re-balancing of rows when new nodes are introduced. Moreover, this architecture exhibits resilience, effectively managing scenarios where certain nodes experience downtime. An apt comparison would be to envision it as a parallel to disk striping and mirroring, albeit applied to intricate SQL transactions.


Distributed SQL represents a novel iteration of RDBMS, with its origins in Google Spanner, and includes CockroachDB, TiDB, and YugabyteDB.

In contrast, the partitioning or sharding methodologies described below are older technologies with limitations in SQL features when involving multiple partitions or shards.

Declarative Partitioning (Example: PostgreSQL)

I position SQL partitioning here because it divides tables, thereby placing it at a higher level than the previously discussed row distribution but at a lower level than database sharding. SQL partitioning proves beneficial in managing smaller tables, yet for enhanced scalability in SQL processing, it necessitates integration with either Distributed SQL or database sharding methodologies.

In Oracle Database, Declarative Partitioning is compatible with most SQL features, with global indexes to enforce unique constraints.

PostgreSQL is far from that because all indexes are local. Typically, you cannot range partition on a date, for lifecycle management purposes, and have a primary key that doesn't include the date. You cannot have multiple keys, like a generated sequence for internal referential integrity and a UUID or natural key exposed to the user.

YugabyteDB encounters a similar constraint with SQL partitioning, inherited from PostgreSQL. However, in practice, Distributed SQL splitting into tablets is commonly employed to address the scalability needs. SQL partitioning within YugabyteDB is more frequently applied to scenarios such as geo-partitioning or isolating tenants in a multi-tenancy setup, where the partition key forms a component of the primary key.

Database sharding (Example: Citus)

In the absence of Distributed SQL capabilities beneath the SQL layer, the only way to scale out is to split the database into multiple ones. Citus is an extension for PostgreSQL that allows creating a table on multiple databases and connects to a coordinator that sends the query to all databases. Each shard is a PostgreSQL database with all features.

However, it's important to note that certain constraints arise within the coordinating component for queries that involve multiple shards. Similar to declarative partitioning, global indexes are lacking, thus precluding the enforcement of uniqueness across columns. Moreover, foreign key relationships to tables in distinct shards are not supported. Worse than that, there's no ACID read consistency for transactions involving multiple shard. While global transactions are feasible, they solely rely on local transaction tables and two-phase commits (2PC) to synchronize the writes. Beyond the performance implications of the 2PC protocol, the consistency of reads is compromised due to the absence of consensus regarding transaction statuses.

The synchronization between the databases may involve other techniques, still breaking the ACID properties, like multi-master replication with a clever reconciliation algorithm. Notable examples of this approach encompass EDB BDR (now referred to as PGD) and pgEdge.

These techniques find application in scenarios such as data warehouses or multi-tenant environments, where the demand for global transactions might be less pronounced. However, their applicability in scaling out OLTP use cases is relatively infrequent. It's important to note that they can't be accurately classified as Distributed SQL since the distribution of transaction tables is absent. Moreover, these approaches lack the same level of elasticity (re-sharding often involves intricate procedures) and resilience (each shard corresponds to a distinct PostgreSQL database to protect) as found in true Distributed SQL architectures.

Pooler (Example: PgCAt) or Application Sharding

Indeed, it is feasible to manually replicate the functionalities that a database sharding approach provides. This involves creating tables across multiple databases and subsequently dispatching individual queries to them. While certain tools like pgCat may offer assistance, it's important to acknowledge that achieving this level of distribution cannot remain transparent to the application. Numerous SQL features necessitate implementation within the backend of the application, including tasks like sharding, re-sharding, managing failover, and ensuring consistency. The responsibility for handling these intricacies inevitably falls on the application's backend code.


Summary

The capabilities of horizontally scaled databases hinge on the decision of whether to position SQL features above or below the partitioning or sharding threshold. The distinction between using the terms "partitioning" or "sharding" becomes secondary. The critical factor lies in whether you opt to shard at a level above SQL databases, consequently constraining global capabilities, or conversely, partition the tables, indexes, and transaction tables at a lower level, thereby maintaining the façade of a unified logical database for the application. The latter innovation is called Distributed SQL.

Next time you hear about sharding, you should ask about the support of SQL features over shards. Removing SQL features to scale out had its time, with NoSQL databases, but today you can scale horizontally and keep all SQL features.

Image description

Partitioning or Sharding at table or database level is easier but breaks the basic SQL features. Partitioning or Sharding at row level provide all SQL and ACID properties with elasticity and resilence.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .