Advanced Grouping Query in GBase 8s: Introduction to GROUP BY ROLLUP

Cong Li - Aug 28 - - Dev Community

In data analysis and business intelligence, it's crucial to aggregate data quickly and accurately. GBase provides advanced grouping query capabilities, including the GROUP BY ROLLUP feature, to support multi-dimensional data aggregation and analysis. This article introduces the basic concepts, syntax structure, application examples, and practical use cases of GROUP BY ROLLUP to help you understand and utilize this feature effectively.

1. Basic Functionality of GROUP BY ROLLUP

GROUP BY ROLLUP is an advanced grouping query feature in GBase 8s that allows for grouping and aggregation based on specified columns, supporting different levels of summary calculations.

Basic Syntax:

SELECT column1, column2, ...
FROM table_name
GROUP BY ROLLUP(column1, column2, ...);
Enter fullscreen mode Exit fullscreen mode

When used, the system first groups by the specified columns and then performs a UNION ALL on the multiple result sets.

  • Number of Groups: Number of columns n + 1.
  • Grouping Method: Start by grouping by all specified columns, then reduce one column from right to left sequentially, until no columns are used for grouping. Columns not involved in grouping will return NULL in the result set, and the results of these groups will be returned.

For example, if the ROLLUP grouping columns are (A, B, C), it will first group by (A, B, C), then by (A, B), then by (A), and finally, by no columns at all. The query results are the UNION ALL of each group’s result set.

Summary Calculation:

At each grouping level, aggregate functions such as SUM, AVG, MAX, etc., can be used to calculate summary values. For instance, the SUM function can calculate the total for each group.

Result Set:

The result set returned by GROUP BY ROLLUP includes the union of all grouped data without removing duplicates. This means that statistics for each group are listed separately.

Auxiliary Functions:

  • GROUPING() Function: Used to determine whether a specific group is a summary row, returning 1 for summary rows and 0 otherwise.
  • GROUPING_ID() Function: Identifies the grouping level, taking one or more columns as parameters and returning a binary value representing the grouping level of those columns.

2. Example Analysis

Suppose there is a sales data table sales with fields year (year), region (region), and sales (sales amount).

Note: Example database version: GBase8sV8.8_3.5.1

Creating the sales table:

CREATE TABLE sales (
    id INT PRIMARY KEY,
    year INT,
    region VARCHAR(50),
    sales DECIMAL(10, 2)
);
Enter fullscreen mode Exit fullscreen mode

Inserting data:

INSERT INTO sales (id, year, region, sales) VALUES (1, 2020, 'North China', 10000.00);
INSERT INTO sales (id, year, region, sales) VALUES (2, 2020, 'South China', 15000.00);
INSERT INTO sales (id, year, region, sales) VALUES (3, 2021, 'North China', 12000.00);
INSERT INTO sales (id, year, region, sales) VALUES (4, 2021, 'South China', 18000.00);
INSERT INTO sales (id, year, region, sales) VALUES (5, 2021, 'East China', 20000.00);
Enter fullscreen mode Exit fullscreen mode

Example Query Using GROUP BY ROLLUP:

SELECT year, region, SUM(sales) AS total_sales
FROM sales
GROUP BY ROLLUP(year, region);
Enter fullscreen mode Exit fullscreen mode

Example Results:

YEAR         2021
REGION       South China
TOTAL_SALES  18000.00

YEAR         2020
REGION       South China
TOTAL_SALES  15000.00

YEAR         2021
REGION       North China
TOTAL_SALES  12000.00

YEAR         2021
REGION       East China
TOTAL_SALES  20000.00

YEAR         2020
REGION       North China
TOTAL_SALES  10000.00

YEAR         2020
REGION       
TOTAL_SALES  25000.00

YEAR         2021
REGION       
TOTAL_SALES  50000.00

YEAR         
REGION       
TOTAL_SALES  75000.00

8 row(s) retrieved.
Enter fullscreen mode Exit fullscreen mode

Query Results:

  • Sales by year and region.
  • Total sales by year (with NULL for the region).
  • Total sales for all years and regions (NULL for both year and region).

Using the GROUPING_ID() Function:

SELECT
    year,
    region,
    SUM(sales) AS total_sales,
    DECODE(GROUPING_ID(year, region), 1, 'Yearly Total Sales', 0, 'Yearly Regional Subtotal', 3, 'Global Total Sales')
FROM
    sales
GROUP BY
    ROLLUP(year, region);
Enter fullscreen mode Exit fullscreen mode

Example Results:

YEAR          2021
REGION        South China
TOTAL_SALES   18000.00
(EXPRESSION)  Yearly Regional Subtotal

YEAR          2020
REGION        South China
TOTAL_SALES   15000.00
(EXPRESSION)  Yearly Regional Subtotal

YEAR          2021
REGION        North China
TOTAL_SALES   12000.00
(EXPRESSION)  Yearly Regional Subtotal

YEAR          2021
REGION        East China
TOTAL_SALES   20000.00
(EXPRESSION)  Yearly Regional Subtotal

YEAR          2020
REGION        North China
TOTAL_SALES   10000.00
(EXPRESSION)  Yearly Regional Subtotal

YEAR          2020
REGION        
TOTAL_SALES   25000.00
(EXPRESSION)  Yearly Total Sales

YEAR          2021
REGION        
TOTAL_SALES   50000.00
(EXPRESSION)  Yearly Total Sales

YEAR          
REGION        
TOTAL_SALES   75000.00
(EXPRESSION)  Global Total Sales

8 row(s) retrieved.
Enter fullscreen mode Exit fullscreen mode

3. Use Cases

GROUP BY ROLLUP is suitable for various data analysis scenarios, including but not limited to:

  • Sales Analysis: Generate aggregated sales reports based on different sales regions, product categories, and time dimensions, helping companies understand sales performance and formulate strategies.
  • Financial Reporting: Generate aggregated financial reports based on different accounts, departments, and time dimensions, assisting companies in financial analysis and decision-making.
  • User Statistics: Create aggregated user reports based on different geographic locations, age groups, and genders, helping businesses understand user characteristics and behaviors.

As an advanced feature in GBase 8s, GROUP BY ROLLUP enhances the flexibility and convenience of data aggregation. This detailed introduction and example demonstration aim to help you better grasp this feature, enabling you to achieve more in your data analysis tasks. Thank you for reading!

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .