Automatic ID generation for database records is a fundamental part of application development. In this article, I’ll demonstrate four ways to handle automatic ID generation in Sequelize for PostgreSQL and YugabyteDB, the open source, cloud native, distributed SQL database built on PostgreSQL.
There are many ways to handle ID generation in PostgreSQL, but I've chosen to investigate these approaches:
- Auto-incrementing (SERIAL data type)
- Sequence-caching
- Sequence-incrementing with client-side ID management
- UUID-generation
Depending on your application and your underlying database tables, you might choose to employ one or more of these options. Below I'll explain how each can be achieved in Node.js using the Sequelize ORM.
1. Auto-Incrementing
Most developers choose the most straightforward option before exploring potential optimizations. I'm no different! Here's how you can create an auto-incrementing ID field in your Sequelize model definitions.
// Sequelize
const { DataTypes } = require('sequelize');
const Product = sequelize.define(
"product",
{
id: {
type: DataTypes.INTEGER,
autoIncrement: true,
primaryKey: true,
},
title: {
type: DataTypes.STRING,
}
}
);
If you’re familiar with Sequelize, you’ll be no stranger to this syntax, but others might wonder what's actually happening under the hood.
The autoIncrement
flag tells PostgreSQL to create an id
column with a SERIAL data type. This data type implicitly creates a SEQUENCE which is owned by the products
table's id
column.
// PostgreSQL equivalent
CREATE SEQUENCE products_id_seq;
CREATE TABLE products
(
id INT NOT NULL DEFAULT NEXTVAL('products_id_seq'),
title VARCHAR(255)
);
When inserting a product into our table, we don't need to supply a value for id
, as it's automatically-generated from the underlying sequence.
We can simply run the following to insert a product.
// Sequelize
await Product.create({title: "iPad Pro"});
//PostgreSQL equivalent
INSERT INTO products (title) VALUES ('iPad Pro');
Dropping our table will also drop the automatically-created sequence, products_id_seq
.
// Sequelize
await Product.drop();
// PostgreSQL equivalent
DROP TABLE products CASCADE;
Although this approach is extremely easy to implement, our PostgreSQL server needs to access the sequence to get its next value on every write, which comes at a latency cost. This is particularly bad in distributed deployments. YugabyteDB sets a default sequence cache size of 100. I’ll outline why this is so important below.
Now that we have the basics out of the way, let's try to speed things up. As we all know, "cache is king."
2. Sequence-Caching
Although the autoIncrement
flag in Sequelize model definitions totally eliminates the need to interact with sequences directly, there are scenarios where you might consider doing so. For instance, what if you wanted to speed up writes by caching sequence values? Fear not, with a little extra effort, we can make this happen.
Sequelize doesn't have API support to make this happen, as noted on Github (https://github.com/sequelize/sequelize/issues/3555#issuecomment-1132630072), but there's a simple workaround. By utilizing the built-in literal
function, we are able to access a predefined sequence in our model.
const { literal, DataTypes } = require('sequelize');
const Product = sequelize.define("product", {
id: {
type: DataTypes.INTEGER,
primaryKey: true,
defaultValue: literal("nextval('custom_sequence')"),
},
});
sequelize.beforeSync(() => {
await sequelize.query('CREATE SEQUENCE IF NOT EXISTS custom_sequence CACHE 50');
});
await sequelize.sync();
That's not too bad. So, this is what changed:
We've created our own sequence, named custom_sequence
, which is used to set the default value for our product ID.
This sequence is created in the beforeSync hook, so it will be created before the products table and its CACHE
value has been set to 50.
The defaultValue
is set to the next value in our custom sequence.
Well, what about the cache? Sequences in PostgreSQL can optionally be supplied a CACHE
value upon creation, which allots a certain number of values to be stored in memory per session. With our cache set at 50, here's how that works.
//Database Session A
> SELECT nextval('custom_sequence');
1
> SELECT nextval('custom_sequence');
2
//Database Session B
> SELECT nextval('custom_sequence');
51
>
52
For an application with multiple database connections, such as one running microservices or multiple servers behind a load balancer, each connection will receive a set of cached values. No session will contain duplicate values in its cache, ensuring there are no collisions when inserting records. In fact, depending on how your database is configured, you might even find gaps in your sequenced id
column if a database connection fails and is restarted without using all of the values alloted in its cache. However, this generally isn't a problem, as we're only concerned with uniqueness.
So, what's the point? Speed. Speed is the point!
By caching values on our PostgreSQL backend and storing them in memory, we're able to retrieve the next value very quickly. In fact, YugabyteDB caches 100 sequence values by default, as opposed to the PostgreSQL default of 1. This allows the database to scale, without needing to repeatedly obtain the next sequence value from the master node on writes. Of course, caching comes with the drawback of an increased memory constraint on the PostgreSQL server.
Depending on your infrastructure, this could be a worthy optimization!
3. Client-Side Sequencing
Sequence-caching improves performance by caching values on our PostgreSQL backend. How could we use a sequence to cache values on our client instead?
Sequences in PostgreSQL have an additional parameter called INCREMENT BY
that can be used to achieve this.
// DB Initialization
const { literal, DataTypes } = require('sequelize');
const Product = sequelize.define("product", {
id: {
type: DataTypes.INTEGER,
primaryKey: true
},
});
sequelize.beforeSync(() => {
await sequelize.query('CREATE SEQUENCE IF NOT EXISTS custom_sequence INCREMENT BY 50');
});
await sequelize.sync();
// Caller
let startVal = await sequelize.query("SELECT nextval('custom_sequence')");
let limit = startVal + 50;
if (startVal >= limit) {
startVal = await sequelize.query("SELECT nextval('custom_sequence')");
limit = startVal + 50;
}
await Product.create({id: startVal, title: "iPad Pro"})
startVal += 1;
Here, we're utilizing our custom sequence in a slightly different way. No default value is supplied to our model definition. Instead, we're using this sequence to set unique values client-side, by looping through the values in the increment range. When we've exhausted all of the values in this range, we make another call to our database to get the next value in our sequence to "refresh" our range.
Here's an example:
// Database Session A
> SELECT nextval('custom_sequence');
1
*
inserts 50 records
// id 1
// id 2
...
// id 50
*
> SELECT nextval('custom_sequence');
151
// Database Session B
> SELECT nextval('custom_sequence');
51
* inserts 50 records before Session A has used all numbers in its range *
> SELECT nextval('custom_sequence');
101
Database Session A connects and receives the first value in the sequence. Database Session B connects and receives the value of 51 because we've set our INCREMENT BY
value to 50
. Like our auto-incrementing solutions, we can ensure that there are no ID collisions by referencing our PostgreSQL sequence to determine the start value for our range.
What problems might arise from this solution? Well, it's possible that a database administrator could choose to increase or decrease the INCREMENT BY
value for a particular sequence, without application developers being notified of this change. This would break application logic.
How can we benefit from client-side sequencing? If you have a lot of available memory on your application server nodes, this could be a potential performance benefit over sequence-caching on database nodes.
In fact, you might be wondering if it’s possible to utilize a cache on the client and server in the same implementation. The short answer is YES. By creating a sequence with CACHE
and INCREMENT BY
values, we benefit from a server-side cache of our sequence values and a client-side cache for the next value in our range. This performance optimization provides the best of both worlds if memory constraints are not of primary concern.
Enough with the sequences already! Let's move on to unique identifiers.
4. UUID-Generation
We've covered three ways to generate sequential, integer-based IDs. Another data type, the Universally Unique Identifier (UUID), removes the need for sequences entirely.
A UUID is a 128-bit identifier, which comes with the guarantee of uniqueness due to the incredibly small probability that the same ID would be generated twice.
PostgreSQL comes with an extension called pgcrypto (also supported by YugabyteDB), which can be installed to generate UUIDs with the gen_random_uuid function. This function generates a UUID value for a database column, much the same that nextval
is used with sequences.
Additionally, Node.js has several packages which generate UUIDs, such as, you guessed it, uuid.
// Sequelize
const { literal, DataTypes } = require('sequelize');
const Product = sequelize.define(
"product",
{
id: {
type: DataTypes.UUID,
defaultValue: literal('gen_random_uuid()')
primaryKey: true,
},
title: {
type: DataTypes.STRING,
}
}
);
sequelize.beforeSync(() => {
await sequelize.query('CREATE EXTENSION IF NOT EXISTS "pgcrypto"');
});
// PostreSQL Equivalent
CREATE TABLE products
(
id UUID NOT NULL DEFAULT gen_random_uuid(),
title VARCHAR(255)
);
This allows us to generate a UUID client-side, with a server-side default, if required.
A UUID-based approach brings unique benefits with the random nature of the data type being helpful with certain data migrations. This is also helpful for API security, as the unique identifier is in no way tied to the information being stored.
Additionally, the ability to generate an ID client side without managing state is helpful in a distributed deployment, where network latencies play a big role in application performance.
For example, in a geo-partioned YugabyteDB cluster, connections are made to the nearest database node to serve low-latency reads. However, on writes, this node must forward the request to the primary node in the cluster (which could reside in another region of the world) to determine the next sequence value. The use of UUIDs eliminates this traffic, providing a performance boost.
So, what's the downside? Well, the topic of UUIDs is somewhat polarizing. One obvious downside would be the storage size of a UUID relative to an integer, 16 bytes as opposed to 4 bytes for an INTEGER
and 8 for a BIGINT
. UUIDs also take some time to generate, which is a performance consideration.
Some of the concerns regarding using UUIDs as primary keys are illustrated in this post and are discussed further with regards to YugabyteDB in this thread.
You can read more about the tradeoffs between integer and UUID based IDs here.
Get Building
Ultimately, there are many factors to consider when choosing how to generate your database IDs. You might choose to use auto-incrementing IDs for a table with infrequent writes, or one that doesn't require low-latency writes. Another table, spread across multiple geographies in a multi-node deployment, might benefit from using UUIDs. There's only one way to find out. Get out there and write some code!
If you're interested in using an always-free, PostgreSQL-compatible database node, give YugabyteDB Managed a try.