SQL, extract unique values of JSON format field from each group #eg42

WHAT TO KNOW - Sep 14 - - Dev Community

Extracting Unique JSON Values from SQL Groups


Introduction


In modern database systems, storing data in JSON format is becoming increasingly common. This allows for flexible and dynamic data structures, making it easier to represent complex entities and relationships. However, when you need to extract specific data from these JSON fields, especially unique values within groups, it can become a challenging task.

This article dives deep into the techniques and tools you can use to extract unique values from JSON format fields grouped by other columns in your SQL database.


Understanding the Challenge


Imagine you have a database table storing information about customers and their orders. Each order has a field called items which is a JSON array representing the items ordered. You want to find all the unique items ordered by each customer.
CREATE TABLE orders (
    customer_id INT,
    order_id INT,
    items JSON
);

INSERT INTO orders (customer_id, order_id, items) VALUES
(1, 101, '[{"item_id": 10, "name": "Laptop"}, {"item_id": 20, "name": "Mouse"}]'),
(1, 102, '[{"item_id": 30, "name": "Keyboard"}, {"item_id": 20, "name": "Mouse"}]'),
(2, 201, '[{"item_id": 40, "name": "Monitor"}, {"item_id": 20, "name": "Mouse"}]'),
(2, 202, '[{"item_id": 50, "name": "Webcam"}]');

In this example, we need to extract unique name values from the items JSON array for each customer_id.
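
To make the target concrete, here is a small Python sketch that computes the expected answer outside the database. It is only a reference for what the SQL queries in the following sections should return. The rows list mirrors the sample INSERT, and the helper name unique_names_per_customer is hypothetical.

```python
import json
from collections import defaultdict

# Sample rows mirroring the INSERT statement: (customer_id, order_id, items as JSON text).
rows = [
    (1, 101, '[{"item_id": 10, "name": "Laptop"}, {"item_id": 20, "name": "Mouse"}]'),
    (1, 102, '[{"item_id": 30, "name": "Keyboard"}, {"item_id": 20, "name": "Mouse"}]'),
    (2, 201, '[{"item_id": 40, "name": "Monitor"}, {"item_id": 20, "name": "Mouse"}]'),
    (2, 202, '[{"item_id": 50, "name": "Webcam"}]'),
]


def unique_names_per_customer(rows):
    """Return {customer_id: sorted list of distinct item names} (hypothetical helper)."""
    names = defaultdict(set)
    for customer_id, _order_id, items in rows:
        # Expand the JSON array, then collect each object's name per customer.
        for item in json.loads(items):
            names[customer_id].add(item["name"])
    return {customer_id: sorted(found) for customer_id, found in names.items()}


result = unique_names_per_customer(rows)
print(result)
```

Each customer ends up with every item name exactly once, even though "Mouse" appears in several orders.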


Approaches for Extracting Unique JSON Values


Several approaches can be used to address this challenge, each with its strengths and weaknesses:

  1. Using JSON Functions (PostgreSQL, MySQL 8.0+)

Many modern databases provide built-in functions for working with JSON data. PostgreSQL and MySQL 8.0+ offer powerful JSON functions that can be leveraged to extract and manipulate JSON values.

PostgreSQL Example:

SELECT DISTINCT o.customer_id,
       item.item_data ->> 'name' AS item_name
FROM orders AS o,
     json_array_elements(o.items) AS item(item_data);

Explanation:

  • json_array_elements(o.items) expands the items array into one row per JSON object. A function call in the FROM clause can reference tables listed before it, so it is evaluated once per order.
  • item.item_data ->> 'name' reads the name field of each object as text.
  • DISTINCT removes duplicate (customer_id, item_name) pairs, so no GROUP BY is needed.

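MySQL 8.0+ Example (JSON_TABLE requires 8.0.4 or later):

SELECT DISTINCT o.customer_id, t.item_name
FROM orders AS o
CROSS JOIN JSON_TABLE(
    o.items,
    '$[*]' COLUMNS (
        item_name VARCHAR(255) PATH '$.name'
    )
) AS t;

Explanation:

  • JSON_EXTRACT(items, '$[*].name') on its own returns a single JSON array of names per order (for example ["Laptop", "Mouse"]), so DISTINCT would compare whole arrays instead of individual names.
  • JSON_TABLE() expands the items array into one row per object, and PATH '$.name' exposes each name as the item_name column.
  • DISTINCT then removes duplicate (customer_id, item_name) pairs.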
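
The difference between one array of names per order and one row per item is easy to miss. The Python sketch below emulates both shapes on the sample data. It does not call MySQL, and per_order and flattened are illustrative names only.

```python
import json

# (customer_id, items as JSON text) for the four sample orders.
orders = [
    (1, '[{"item_id": 10, "name": "Laptop"}, {"item_id": 20, "name": "Mouse"}]'),
    (1, '[{"item_id": 30, "name": "Keyboard"}, {"item_id": 20, "name": "Mouse"}]'),
    (2, '[{"item_id": 40, "name": "Monitor"}, {"item_id": 20, "name": "Mouse"}]'),
    (2, '[{"item_id": 50, "name": "Webcam"}]'),
]

# Shape 1: a wildcard path such as '$[*].name' yields one array per order row.
# Deduplicating these rows compares whole arrays, so "Mouse" survives inside
# three different arrays.
per_order = {
    (customer_id, tuple(obj["name"] for obj in json.loads(items)))
    for customer_id, items in orders
}

# Shape 2: expand each array into one row per item first, then deduplicate.
flattened = {
    (customer_id, obj["name"])
    for customer_id, items in orders
    for obj in json.loads(items)
}

print(sorted(per_order))
print(sorted(flattened))
```

Only the second shape gives one (customer, item name) pair per unique item, which is why the array has to be expanded before DISTINCT is applied.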

Advantages:

  • Concise and efficient queries.
  • Utilizes built-in functionality for JSON manipulation.

Disadvantages:

  • Dependent on database version and support for JSON functions.

  2. Using Lateral Joins (PostgreSQL)

PostgreSQL's LATERAL join lets a subquery or function in the FROM clause refer to columns of the tables listed before it, so each order's JSON array can be expanded row by row.

Example:

SELECT DISTINCT o.customer_id, t.name AS item_name
FROM orders AS o
CROSS JOIN LATERAL json_array_elements(o.items) AS item(item_data)
CROSS JOIN LATERAL json_to_record(item.item_data) AS t(item_id INT, name TEXT);

Explanation:

  • json_array_elements(o.items) expands the items array into one row per JSON object, exposed as item.item_data.
  • json_to_record(item.item_data) converts each object into a typed row with item_id and name columns. (json_to_recordset expects a JSON array, so it would be applied to o.items directly rather than to a single object.)
  • DISTINCT removes duplicate (customer_id, item_name) pairs, which makes a GROUP BY unnecessary.

Advantages:

  • Flexible approach for complex JSON data.
  • Allows access to individual JSON fields.

Disadvantages:

  • More complex than using built-in JSON functions.
  • Requires a more advanced understanding of PostgreSQL syntax.

  3. Using Subqueries and Unnesting

This approach expands the JSON array into individual rows in a subquery or derived table, then applies DISTINCT to the expanded rows.

Example (PostgreSQL):

SELECT DISTINCT customer_id, item_name
FROM (
    SELECT o.customer_id,
           jsonb_array_elements(o.items::jsonb) ->> 'name' AS item_name
    FROM orders AS o
) AS expanded;

Explanation:

  • In the subquery, jsonb_array_elements(o.items::jsonb) produces one row per object in the items array, and ->> 'name' reads each object's name field as text.
  • The outer query applies DISTINCT to the expanded rows, so each (customer_id, item_name) pair appears once.

Example (MySQL):

SELECT DISTINCT o.customer_id, t.name AS item_name
FROM orders AS o
CROSS JOIN JSON_TABLE(
    o.items,
    '$[*]' COLUMNS (
        item_id INT PATH '$.item_id',
        name VARCHAR(255) PATH '$.name'
    )
) AS t;

Explanation:

  • JSON_TABLE() converts the JSON array into a derived table with one row per JSON object.
  • Each PATH clause specifies which field of the JSON object is extracted into the corresponding column.
  • DISTINCT removes duplicate (customer_id, item_name) pairs, so no GROUP BY is needed.

Advantages:

  • JSON_TABLE is defined by the SQL standard, so similar syntax is available in other databases.
  • Provides control over how data is extracted.

Disadvantages:

  • Can be less efficient for large datasets.
  • Requires careful handling of data types and conversions.

Choosing the Right Approach

The most suitable approach depends on the following factors:

  • Database version and JSON support: Consider the JSON functions available in your database.

  • Data complexity: If you have simple JSON structures, built-in functions might be sufficient. For more complex structures, consider using lateral joins or subqueries.

  • Performance requirements: Consider the size of your dataset and potential performance implications of each approach.

Illustrative Example

Let's use the example from the "Understanding the Challenge" section to demonstrate how to extract the unique items ordered by each customer using the PostgreSQL lateral join approach:

-- Create the sample table
CREATE TABLE orders (
    customer_id INT,
    order_id INT,
    items JSON
);

-- Insert sample data
INSERT INTO orders (customer_id, order_id, items) VALUES
(1, 101, '[{"item_id": 10, "name": "Laptop"}, {"item_id": 20, "name": "Mouse"}]'),
(1, 102, '[{"item_id": 30, "name": "Keyboard"}, {"item_id": 20, "name": "Mouse"}]'),
(2, 201, '[{"item_id": 40, "name": "Monitor"}, {"item_id": 20, "name": "Mouse"}]'),
(2, 202, '[{"item_id": 50, "name": "Webcam"}]');

-- Extract unique items ordered by each customer
SELECT DISTINCT o.customer_id, t.name AS item_name
FROM orders AS o
CROSS JOIN LATERAL json_array_elements(o.items) AS item(item_data)
CROSS JOIN LATERAL json_to_record(item.item_data) AS t(item_id INT, name TEXT)
ORDER BY o.customer_id, item_name;

This query will return the following result:

customer_id item_name
1 Keyboard
1 Laptop
1 Mouse
2 Monitor
2 Mouse
2 Webcam

This output shows the unique items ordered by each customer.
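
If no PostgreSQL or MySQL instance is at hand, the expected result can be sanity-checked with Python's built-in sqlite3 module. Treat this as a stand-in under stated assumptions: SQLite is not one of the databases covered in this article, its json_each and json_extract functions are not the PostgreSQL or MySQL functions shown above, and they need a SQLite build with JSON support (the default in recent versions).

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INT, order_id INT, items TEXT)")
conn.executemany(
    "INSERT INTO orders (customer_id, order_id, items) VALUES (?, ?, ?)",
    [
        (1, 101, '[{"item_id": 10, "name": "Laptop"}, {"item_id": 20, "name": "Mouse"}]'),
        (1, 102, '[{"item_id": 30, "name": "Keyboard"}, {"item_id": 20, "name": "Mouse"}]'),
        (2, 201, '[{"item_id": 40, "name": "Monitor"}, {"item_id": 20, "name": "Mouse"}]'),
        (2, 202, '[{"item_id": 50, "name": "Webcam"}]'),
    ],
)

# json_each expands the array into one row per element (SQLite-specific);
# json_extract then reads the name field of each element.
query = """
SELECT DISTINCT o.customer_id,
       json_extract(elem.value, '$.name') AS item_name
FROM orders AS o, json_each(o.items) AS elem
ORDER BY o.customer_id, item_name
"""
rows = conn.execute(query).fetchall()
conn.close()

for customer_id, item_name in rows:
    print(customer_id, item_name)
```

The structure is the same as in the PostgreSQL and MySQL queries: expand the array into rows, project the name field, then deduplicate.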


Conclusion


Extracting unique values from JSON fields grouped by other columns is a common task when working with JSON data in SQL databases. Understanding the various approaches, including JSON functions, lateral joins, and subqueries, allows you to choose the most efficient and appropriate method for your specific needs.

Remember to consider the factors mentioned in the "Choosing the Right Approach" section to make an informed decision. By implementing these techniques, you can effectively extract unique values from JSON fields within SQL groups, enabling you to gain valuable insights from your data.