Finding Skipped Reference Numbers in DB

Harrison E. Tanojo - Feb 15 - - Dev Community

Intro

For each purchase transaction, a reference number is created.
For example: the reference number is PT-STUR-2025-1.
Where as the structure:

  • PT: is the transaction type
  • STUR: is the code based on the business domain
  • 2025: the year when the transaction is made
  • 1: the unique generated number, will be auto incremented for each transaction created.

So ideally, will be like this: PT-STUR-2025-1, PT-STUR-2025-2, PT-STUR-2025-3 and so on...

Problem

When the server already generated the unique number, and there is an error in the creation, the reference number will be skipped.

I was tasked to tell the users what are the missing reference number.

Solution

Suppose this is the purchase_transaction table
Image description

By eyeballing and ordered the reference number one by one, it is possible to find, but it will take time.

Here is the steps to get the solution:

Step 1. Find the possible patterns

SELECT REGEXP_SUBSTR(reference_no, '^(.*?)-(.*?)-(.*?)-(.*?)') as 'pattern', count(*)
FROM purchase_transaction pt 
GROUP BY pattern
ORDER BY id asc;
Enter fullscreen mode Exit fullscreen mode

Image description
Assuming that the reference number is always have 3 dash '-', now can try to find the occurences of the pattern

Step 2. Get the next unique generated number

SELECT reference_no, 
REGEXP_SUBSTR(reference_no, '([0-9]+)$') as 'current_ref_no', 
LEAD(REGEXP_SUBSTR(reference_no, '([0-9]+)$')) OVER (order by id asc) as 'next_ref_no'
FROM purchase_transaction ai 
WHERE reference_no like 'PT-STUR-2025-%'
order by id asc;
Enter fullscreen mode Exit fullscreen mode

"REGEXP_SUBSTR(reference_no, '([0-9]+)$'))" is needed since just need to find the number at the most postfix of the reference number, don't want to get the year.

The key to get the next reference number is at the use of LEAD...OVER:

  • LEAD(REGEXP_SUBSTR(reference_no, '([0-9]+)$')) OVER (order by id asc) as 'next_ref_no' It tells the SQL to see the next row's data. Inside the OVER(), need to order by the id as to find the sequential row.

Hence, this the output:

Image description

Step 3. Find the starting reference number that is skipping

If there is no skipped number, that means the difference between the current number with the next one is 1. But if there is skipped, the difference will be more than one. Hence, this is the formula to find:
next_ref_no - current_ref_no > 1

The sql:

WITH FIND_THE_NEXT as (
    SELECT id, reference_no, 
    REGEXP_SUBSTR(reference_no, '([0-9]+)$') as 'current_ref_no', 
    LEAD(REGEXP_SUBSTR(reference_no, '([0-9]+)$')) OVER (order by id asc) as 'next_ref_no'
    FROM purchase_transaction ai 
    WHERE reference_no like 'PT-STUR-2025-%'
    order by id asc
)select reference_no, next_ref_no, 
case when next_ref_no - current_ref_no > 1 THEN 
        concat('missing ', (next_ref_no - current_ref_no - 1), ' until ', (next_ref_no - 1))  
    ELSE 
        'ok' 
END as 'is_missing'
from find_the_next
order by id asc;
Enter fullscreen mode Exit fullscreen mode

This is the output:
Image description

From the above, it is known what are the known missing reference numbers.

.