πŸͺ„ DuckDB sql hack : get things SORTED w/ constraint CHECK

adriens - Apr 4 - - Dev Community

πŸͺ Intro

Ever wanted to get that kind on constraint on a table :

CREATE TABLE sensor_data(
   ts TIMESTAMP,
   measurement INTEGER,
   ...,
   SORTED(ts)
);
Enter fullscreen mode Exit fullscreen mode

Well, it turns out this topic has a dedicated discussion:

Image description

For now it's not implemented...yet. But we have a lot of use cases out there.

Let's see how to achieve this in pure sql... and why it would be so useful.

🍿 Demo

πŸ€” (A bit of) Context

We are currently using duckdb on GitHub Actions (see opt-nc/setup-duckdb-action) as it's a very convenient and efficient way to check data quality with sql as part of our CI... with very very few efforts, see below:

☝️ Recently, we also felt the need to keep a csv file sorted according to a given column to keep it as clean as possible and letting people make Pull Requests... and, most important : delegating the CI the role to explain the end user why the data he wants to put cannot be merged.

We did not want any human in the loop to moderate contributors and explain how the data should be provided to get the PR merged. In particular in the context of we use GH auto-merge to

"[...] increase development velocity by enabling auto-merge for a pull request so that the pull request will merge automatically when all merge requirements are met."

πŸ‘‰ This productivity hack is all about achieving that in pure sql... and take profit of:

ℹ️ Notice that this hack can be applied on very large volumes of data.

πŸͺ„ sql tricks

-----------------------------------------------------------
--
-- Check if a table column is sorted w/ integrity check
--
-----------------------------------------------------------

-- Create a table with a column that is not sorted
-- Feed some random stuff
-- The real target table
create or replace table demo_sort (text varchar);

insert into demo_sort values ('DuckDb');
insert into demo_sort values ('duckdb');
insert into demo_sort values ('Duckdb');
insert into demo_sort values ('DUCKDB');
insert into demo_sort values ('duckDB');
insert into demo_sort values ('DUCKdb');
insert into demo_sort values ('DuckDB');

-- Check the resulting table
from demo_sort;


-- Prepare test environment
CREATE SEQUENCE seq_original START 1;
CREATE SEQUENCE seq_sorted START 1;

create or replace temp table orig_table as
    select nextval('seq_original') as index,
text from demo_sort;

create or replace temp table sorted_table as
    select nextval('seq_sorted') as index,
    text
    from (select text from demo_sort order by text);

-- Check the resulting tables
from orig_table;
from sorted_table;

-- Create the table that compares the sorted and original tables columns
create or replace temp table test_table(orig_text varchar,
                                    orig_index integer,
                                    sorted_index integer
                                    -- the magic part XD
                                    check(orig_index = sorted_index)
                                    );
-- Populate the comparison table
insert into test_table
    select 
        orig_table.text as orig_text,
        orig_table.index as orig_index,
        sorted_table.index as sorted_index,
    from
        orig_table,
        sorted_table
    where
        orig_table.text = sorted_table.text
    order by orig_table.index;

-- Enjoy the resulting "Constraint Error: CHECK constraint failed: test_table"
Enter fullscreen mode Exit fullscreen mode

πŸ”– Resources

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .