Database optimizing with Hugh datasets with MySQL.

vimuth - Sep 5 - - Dev Community

Efficient database performance is crucial for any application, and optimizing MySQL queries plays a key role in achieving that. As data grows, poorly written queries can cause significant slowdowns, leading to longer response times and higher resource consumption. By understanding how MySQL executes queries and applying optimization techniques like indexing, query restructuring, and proper data types, you can dramatically improve performance. Whether you’re handling small datasets or scaling to millions of records, optimized queries ensure smoother and faster interactions with your database, ultimately improving the user experience and reducing server load.

My database contains an employees table with approximately 7,852,118 rows.

I executed the following query:

SELECT * 
FROM employees
WHERE email = 'sbrakus@example.org'
Enter fullscreen mode Exit fullscreen mode

The query took roughly 17 seconds to execute, though this duration may vary depending on server conditions.

Image description

Optimizing Queries by Selecting Only Relevant Fields

Selecting only the necessary fields, rather than retrieving all columns, can significantly reduce query execution time. Consider the following example:

SELECT email 
FROM employees
WHERE email = 'sbrakus@example.org' 
Enter fullscreen mode Exit fullscreen mode

Image description

Improving Performance with Indexing

One of the most effective ways to speed up queries is by using indexing. Let's start by creating an index on the email column:

CREATE INDEX idx_employees_email ON employees(email);
Enter fullscreen mode Exit fullscreen mode

Now, when we run the same query:

SELECT email 
FROM employees
WHERE email = 'sbrakus@example.org';
Enter fullscreen mode Exit fullscreen mode

The query should execute much faster, as the index allows MySQL to quickly locate the relevant records.

Image description

You’ll notice that the query now executes in just 0.020 seconds.

Addressing Indexing Limitations and Query Optimization

While indexing significantly improved the performance of queries on the email field, it doesn't impact queries that filter on non-indexed fields. For example:

SELECT email 
FROM employees
WHERE last_name = 'Halvorson';
Enter fullscreen mode Exit fullscreen mode

Image description

This query will still be slow because last_name is not indexed. Similarly, fetching all records from the employees table:

SELECT * 
FROM employees;
Enter fullscreen mode Exit fullscreen mode

Can be very time-consuming, especially with a large dataset.

To manage large result sets, consider using limits to restrict the number of rows returned:

SELECT * 
FROM employees 
LIMIT 1000 OFFSET 0;
Enter fullscreen mode Exit fullscreen mode

This approach can help to retrieve data in smaller, more manageable chunks, reducing the load on your server and improving performance.

Image description

Understanding the Limitations of Query Limiting

Using the LIMIT clause to restrict the number of records returned can be effective in many scenarios, but it may not always yield the desired results. For instance:

SELECT * 
FROM employees 
WHERE last_name = 'Halvorson'
LIMIT 1000 OFFSET 0;
Enter fullscreen mode Exit fullscreen mode

In this case, the query took 0.348 seconds to complete. The reduced execution time was due to the fact that 'Halvorson' is a relatively common last name. The query was able to stop early, having found 1,000 matching records before scanning the entire table. However, this approach may not be as effective in cases where the filtered criteria are less common or where data distribution is uneven.

Image description

Consider this query:

SELECT * 
FROM employees 
WHERE last_name = 'david'
LIMIT 1000 OFFSET 0;
Enter fullscreen mode Exit fullscreen mode

Since 'david' does not exist in the employees table, the query had to scan all 7,852,118 records, resulting in a significantly longer execution time.

Image description

Adding an Index

Let's create an index on the last_name field to improve query performance:

CREATE INDEX idx_last_name ON employees(last_name);
Enter fullscreen mode Exit fullscreen mode

With the new index in place, executing the following query:

SELECT * 
FROM employees 
WHERE last_name = 'david'
LIMIT 1000 OFFSET 0;
Enter fullscreen mode Exit fullscreen mode

Runs significantly faster. The index allows MySQL to quickly locate relevant records, reducing the time needed to scan the entire table.

Image description

However, this B-tree index has its limitations. While it performs well with queries where the search string begins with a specific prefix, such as:

SELECT * FROM employees
WHERE last_name LIKE 'david%';
Enter fullscreen mode Exit fullscreen mode

In this case, MySQL can efficiently use the index to find last_name values starting with "david."

Image description

But for LIKE queries with different wildcards, such as:

SELECT * FROM employees
WHERE last_name LIKE '%david';

SELECT * FROM employees
WHERE last_name LIKE '%david%';
Enter fullscreen mode Exit fullscreen mode

Image description

The index may not be used effectively, resulting in slower query performance.

In these cases, MySQL indexes may not be sufficient. I will provide a tutorial on PostgreSQL indexing techniques to address these limitations.

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .