SQL Subqueries in action

Chidiebere Ogujeiofor - Dec 13 '19 - - Dev Community

I have recently been helping out my brother on a very exciting project that requires managing student data and we recently implemented some features using subqueries and CTEs.

Thus, I decided to write this post about subqueries and another about CTEs.

You can generate the data I used in this blogpost from this GitHub Gist.

SQL challenges are all about answering questions. To this effect, each example in the blog post asks a question and then answers it using an SQL statement

Subqueries

A subquery is essentially a SELECT statement that is within another SELECT statement. It may be placed in SELECT, WHERE or FROM clause of their parent SELECT statement.

Subqueries in SELECT clause

A subquery in the SELECT clause is useful when we want to add a column to our data that cannot be easily gotten from the ResultSet. These subqueries must return a single value.

For example, consider the following ResultSet for the student with an id of 9.

SELECT course, score 
    FROM results 
WHERE student_id=9

ORDER BY course;

Enter fullscreen mode Exit fullscreen mode
course score
Arts 80
Biology 84
Chemistry 81
Commerce 82
Computer Science 93
Economics 75
English Language 78
History 84
Literature 79
Mathematics 80
Physical Education 81
Physics 85
Statistics 85

To get the student's average score per course:

SELECT  ROUND(AVG(score),2) AS average                                                                                    FROM results                                                                                                                     WHERE student_id=9;

Enter fullscreen mode Exit fullscreen mode

We get an average of 82.08.

Now say we want to answer the question:

How well did the student with ID of 9 do in each course compared to his/her overall average?

The following query gives us an answer:

SELECT course, score, 
    (
        SELECT ROUND(AVG(score),2) FROM results
        WHERE student_id =9

    )AS student_avg



FROM results

WHERE student_id=9
ORDER BY course
;

Enter fullscreen mode Exit fullscreen mode
course score student_avg
Arts 80 82.08
Biology 84 82.08
Chemistry 81 82.08
Commerce 82 82.08
Computer Science 93 82.08
Economics 75 82.08
English Language 78 82.08
History 84 82.08
Literature 79 82.08
Mathematics 80 82.08
Physical Education 81 82.08
Physics 85 82.08
Statistics 85 82.08

We could also decide to perform a mathematical operation such that we see the difference between the student's score in each course and the student's overall average as follows:

SELECT course, score, 
    ROUND(score - (
        SELECT AVG(score) FROM results
        WHERE student_id =9

    ),2) AS difference_between_score_and_student_average



FROM results

WHERE student_id=9

ORDER BY difference_between_score_and_student_average DESC;

Enter fullscreen mode Exit fullscreen mode
course score difference_between_score_and_student_average
Computer Science 93 10.92
Statistics 85 2.92
Physics 85 2.92
Biology 84 1.92
History 84 1.92
Commerce 82 -0.08
Chemistry 81 -1.08
Physical Education 81 -1.08
Arts 80 -2.08
Mathematics 80 -2.08
Literature 79 -3.08
English Language 78 -4.08
Economics 75 -7.08
Some ground rules

When writing subqueries in the SELECT clause it is important to note the following:

  • The subquery must return a single value. If it returns a table there would be no way of putting in one column thus an error is thrown
  • The WHERE clause of the outer query does not affect the subquery in any way. For instance, if we had omitted the student_id filter in our subquery, that is
SELECT course, score, 
    (
        SELECT ROUND(AVG(score),2) 
                    FROM results

    )AS student_avg



FROM results

WHERE student_id=9

ORDER BY course;

Enter fullscreen mode Exit fullscreen mode

We would have gotten totally different results:

course score student_avg
Arts 80 65.46
Biology 84 65.46
Chemistry 81 65.46
Commerce 82 65.46
Computer Science 93 65.46
Economics 75 65.46
English Language 78 65.46
History 84 65.46
Literature 79 65.46
Mathematics 80 65.46
Physical Education 81 65.46
Physics 85 65.46
Statistics 85 65.46

So the WHERE clause in the outer query does not affect the subquery.

Subqueries in WHERE clause

Adding a subquery in the WHERE clause is useful when we want to filter a ResultSet by some value from another query.

For example, to answer the following question:

In which courses did the student with an ID of 9 scores above his overall average? What was his score in those courses?

If we know that the student average is 82.08 and we can answer that question with the following query:

SELECT course, score 
    FROM results
WHERE student_id=9 AND score > 82.08

ORDER BY course;

Enter fullscreen mode Exit fullscreen mode

And get the following:

course score
Biology 84
Computer Science 93
History 84
Physics 85
Statistics 85

But if we don't know the average beforehand, we can use a subquery to calculate the average and use the value in the where clause immediately:

SELECT course, score 
    FROM results
WHERE student_id=9 AND score > (
  SELECT AVG(score) FROM results
  WHERE student_id=9
)

ORDER BY course;
Enter fullscreen mode Exit fullscreen mode

with the same results but a more reusable query.

When using subqueries in the WHERE clause, we could also return a list of values and filter using the IN operator. For example, we could retrieve courses taken in the first quarter of 2019:

SELECT course, score

FROM results

WHERE student_id=9 AND course IN (
    SELECT course
    Where course_end_date >= '2019-01-01' AND course_end_date <= '2019-03-31'
)
ORDER BY course
;
Enter fullscreen mode Exit fullscreen mode
course score
Biology 84
Economics 75
Physical Education 81
NB

Note that the above example is used to demonstrate using IN clause with a subquery. A more efficient way would have been to just do
WHERE student_id=9 AND course_end_date >= '2019-01-01' AND course_end_date <= '2019-03-31'

Some ground rules

When using subqueries in WHERE clause:

  • the subqueries would have to return a single column
  • if you want to compare a single value ensure that the subquery returns a single row/value.
  • if you want to compare a list of values, ensure that the subquery returns a single column.

Subqueries in FROM/JOIN clause

Let us say we want to answer the following question:

How did each student perform in Literature compared to the student's average on all the courses

One way we can answer this is by using a subquery to get each student's average in each course, that is:

SELECT student_id, ROUND(AVG(score),2) as student_avg
        FROM results 
        GROUP BY results.student_id
Enter fullscreen mode Exit fullscreen mode

and then join that to our results table like so:

SELECT 
    main_result.student_id, 
    main_result.course, 
    score, 
    student_av.student_avg

    FROM results AS main_result
    JOIN (
        SELECT student_id, ROUND(AVG(score),2) as student_avg
        FROM results 
        GROUP BY results.student_id

    ) as student_av
        ON student_av.student_id = main_result.student_id
WHERE course = 'Literature'

ORDER BY score;


Enter fullscreen mode Exit fullscreen mode
student_id course score student_avg
1 Literature 0 42.00
2 Literature 73 69.62
4 Literature 77 65.62
5 Literature 62 69.62
6 Literature 61 64.31
7 Literature 77 75.77
8 Literature 66 66.92
9 Literature 79 82.08
10 Literature 63 70.38
11 Literature 68 60.92
12 Literature 61 50.69
13 Literature 60 68.62
14 Literature 78 75.85
15 Literature 60 60.77
16 Literature 61 68.69
17 Literature 68 72.00
18 Literature 79 66.23
19 Literature 65 71.77
20 Literature 60 52.69
21 Literature 62 44.08
22 Literature 66 61.31
23 Literature 53 57.92
24 Literature 88 86.00

Now we can take it one step further by getting the difference between the student Overall average and his Literature score

SELECT 
    course,
    main_result.student_id, 
    score, 
    student_avg_sub.student_avg,

    score - student_avg_sub.student_avg AS score_diff
    FROM results AS main_result
    JOIN (
        SELECT student_id, ROUND(AVG(score),2) as student_avg
        FROM results 
        GROUP BY results.student_id

    ) as student_avg_sub
        ON student_avg_sub.student_id = main_result.student_id
WHERE course = 'Literature'

ORDER BY score_diff DESC;


Enter fullscreen mode Exit fullscreen mode
course student_id score student_avg score_diff
Literature 21 62 44.08 17.92
Literature 18 79 66.23 12.77
Literature 4 77 65.62 11.38
Literature 12 61 50.69 10.31
Literature 20 60 52.69 7.31
Literature 11 68 60.92 7.08
Literature 22 66 61.31 4.69
Literature 2 73 69.62 3.38
Literature 14 78 75.85 2.15
Literature 24 88 86.00 2.00
Literature 7 77 75.77 1.23
Literature 15 60 60.77 -0.77
Literature 8 66 66.92 -0.92
Literature 9 79 82.08 -3.08
Literature 6 61 64.31 -3.31
Literature 17 68 72.00 -4.00
Literature 23 53 57.92 -4.92
Literature 19 65 71.77 -6.77
Literature 10 63 70.38 -7.38
Literature 5 62 69.62 -7.62
Literature 16 61 68.69 -7.69
Literature 13 60 68.62 -8.62
Literature 1 0 42.00 -42.00

With this, we can see that student with the id of 21 exceeded his overall average the most. Thus we can conclude that this is one of his best courses.

Subqueries in all parts of the main query

Subqueries can also be used in more than one part of the main query. That is, they can be used in the WHERE or FROM or SELECT clauses or a combination of 2 or all.

For example, consider the question:

What was the average score of the student with an ID of 9 and in which courses did he score above this average?

We could run the following:

SELECT course, score, (
    SELECT ROUND(AVG(score),2) FROM results
    WHERE student_id=9
) AS average_score

FROM results

WHERE student_id = 9 AND score > (
    SELECT AVG(score) FROM results
    WHERE student_id=9
)


ORDER BY score;
Enter fullscreen mode Exit fullscreen mode
course score average_score
History 84 82.08
Biology 84 82.08
Physics 85 82.08
Statistics 85 82.08
Computer Science 93 82.08

Looking at the above ResultSet we can see the student's average score and all the courses where he performed above his average.

Other notes about subqueries

Here are some extra features of subqueries:

  • you can have a subquery within another subquery. This is known as a nested subquery.
  • subqueries can access data from the main query. This is known as a correlated subquery.

Disadvantages of Subqueries

All tools/techniques have their downside and subqueries are not excluded from this. Here are some downsides of using subqueries

  • subqueries are executed as separate SELECT statements thus it increases the amount of time our query takes to run. For this reason, it is best to use it in a few places as possible
  • subqueries are quite unreadable and can be difficult to maintain(well we could add a comment that says "don't touch the query below" 😀).
  • if we have two subqueries that do the same thing, they would be executed twice. For example, the query from the previous section actually computes the average of the student twice. This can be very inefficient in large tables. One way around this is to use CTEs

Conclusions

In this blog post, we have seen that:

  • subqueries are SELECT statements within another statement.
  • subqueries are useful for reshaping and filtering our data
  • we can use subqueries in the SELECT, FROM or WHERE clauses of our outer query or a combination of 2 or more.
  • subqueries are executed as separate SELECT statements. For this reason, multiple subqueries can re-compute the same value.
  • it is best to use subqueries sparingly as they can be quite unreadable and difficult to maintain

Subqueries are more like having a bottle in a bottle
Bottle in a Bottle

. . . . . . . . . . . . . . .