Paginate SQL results

Tom Nijhof - Jan 19 - - Dev Community

Cover image by Pexels from Pixabay

A common use case within SQL is showing your data in small chunks. This is done, for example, by showing everything in multiple pages. If you try to download all of the millions of products on Amazon at once, you are in for a bad time.
If our products are stored in SQL the best thing is to limit the results from our SQL query. Otherwise, we will pass on more data through our app than needed.

In this blog, we will learn to use the keywords ORDER BY, LIMIT, and OFFSET. I already assume you know about SELECT and FROM.
Bones keyword at the end, EXPLAIN.

An example of caffeine critics

First page

We can add a limit to our query so we only have a certain amount of rows. Adding a LIMIT 5 to select will return only the first 5 elements. If we also add ORDER BY [column name] we can control which 5 elements are returned. Below is an example of a query that selects the first 5 drinks, these 5 drinks are the first alphabetically.

    SELECT name
    FROM drinks
    ORDER BY name
    LIMIT 5s
Enter fullscreen mode Exit fullscreen mode

result:

    "Afternoon Melange"
    "Allegro cafeïnevrij"
    "Ambootia Groen"
    "Ambootia Herfstpluk"
    "Ambootia Regenpluk"
Enter fullscreen mode Exit fullscreen mode

Next pages

While having a limit will reduce the amount of data we download, we might also want to see more than the first 5 drinks. For this we can add OFFSET. It will let us skip a number of drinks before collecting. Below I set the offset to 1, meaning we skip 1 drink and then collect the next 5. In the results we can see Afternoon Melange is skipped and Ambootia Zomerpluk is now added.
With this and a little bit of math we can get the drinks of exactly the page we are looking at.

    SELECT name
    FROM drinks
    ORDER BY name
    OFFSET 1
    LIMIT 5
Enter fullscreen mode Exit fullscreen mode

result:

    "Allegro cafeïnevrij"
    "Ambootia Groen"
    "Ambootia Herfstpluk"
    "Ambootia Regenpluk"
    "Ambootia Zomerpluk"
Enter fullscreen mode Exit fullscreen mode

Optimize with an index

Indexing the name column in the drinks table can significantly improve the performance of our queries.

Why does an index help?

Here’s why indexing the name column is beneficial for this query:

  1. Sorting Optimization: Without an index, when you execute the query, the database will have to scan the entire drinks table, sort the rows by the name column, and then apply the LIMIT to retrieve the top 5 rows. This can be slow and resource-intensive, especially if the table is large.

  2. Reduced Disk and CPU Usage: Sorting a large table can be resource-intensive, as it requires reading and writing data to and from disk, and it consumes CPU cycles. An index allows the database to minimize the amount of data it needs to read and sort, reducing both disk and CPU usage.

  3. Faster Query Execution: With an index on the name column, the query execution becomes much faster, especially when retrieving a small number of rows (in this case, the top 5). The database can quickly access and return the results, resulting in a better query performance.

  4. Consistent Performance: The performance improvement is consistent, regardless of the size of the drinks table. With an index, the query’s execution time remains relatively constant*, whereas without an index, the execution time increases with the size of the table.

  • the time will still increase if the table gets bigger, but way less so. To make the query twice as slow you need to increase a table of 1,000 drinks to a table of 1,000,000.

However, it’s important to note that while indexing improves the performance of retrieval and sorting operations, it can slightly slow down data modification operations (e.g., INSERT, UPDATE, DELETE) because the index needs to be maintained. Therefore, it’s important to consider the trade-off between read and write performance when deciding to create an index. In this case, if the table is primarily used for read-heavy operations like the query you provided, indexing the name column is a good choice.

How to create an index

To create an index on a column in an SQL table (in this case, a table named “drinks”), you can use the CREATE INDEX statement. The specific syntax may vary slightly depending on the SQL database system you are using (e.g., PostgreSQL, MySQL, SQL Server, Oracle). Here’s a general example of how to create an index on a column in SQL:

    CREATE INDEX index_name ON table_name (column_name);
Enter fullscreen mode Exit fullscreen mode

Let’s break down this statement:

  • index_name: This is a user-defined name for the index. You should choose a meaningful and unique name for the index within the schema. This name is used to refer to the index in your queries and for management purposes.

  • table_name: Replace this with the name of your “drinks” table, where you want to create the index.

  • column_name: Replace this with the name of the column in the “drinks” table for which you want to create an index.

Measure the difference

For a bonus keyword, let me introduce EXPLAIN. It will take a full query and give you a measurement of its cost. We try to get the cost as low as possible. Below I will have the same query explained twice, once ordered by an indexed column (name), and once by a not-indexed column (description).

The cost is arbitrary, it does not relate directly to time, CPU usage, number of operations, or anything. The cost is also given in a range (0.27 to 0.69 for the index query) because some queries can vary more than others in cost. We can see we go from a cost of 20 to a cost below 0 by just indexing!
So make sure to use an index right.

Query with an order on an indexed column

    EXPLAIN SELECT name
    FROM drinks
    ORDER BY name
    LIMIT 5s

    Limit (cost=0.27..0.69 rows=5 width=24)
Enter fullscreen mode Exit fullscreen mode

Query with an order on a NOT indexed column

    EXPLAIN SELECT name
    FROM drinks
    ORDER BY description
    LIMIT 5

    Limit (cost=20.16..20.17 rows=5 width=136)
Enter fullscreen mode Exit fullscreen mode

Conclusion

In this blog, we learned how to manage and display data in smaller chunks using SQL. We explored the use of LIMIT and OFFSET with ORDER BY for efficient pagination.

We also discussed the significance of indexing, which can greatly improve query performance, especially when combined with ORDER BY and LIMIT. However, indexing might affect data modification operations.

Finally, we introduced the EXPLAIN keyword to measure query cost. By using these techniques, you can better handle large datasets and enhance your database-driven applications.

. . . . . . . . . . . . . . . . . . . . . . . .