IN
and EXISTS
often produce the same result, but when negated and dealing with NULL
values, they behave differently.
Let's see an example, assume we have these two tables:
orders
order_id | customer_id | amount |
---|---|---|
1 | 100 | 50.00 |
2 | 101 | 75.00 |
3 | 102 | 30.00 |
4 | NULL | 20.00 |
customers
customer_id | name |
---|---|
100 | Bob |
101 | Alice |
102 | Martha |
NULL | John |
Ignore the fact that a well formed customers table would need to have the id always specified and ideally as a constraint. This is just an example.
Now, let’s say we want to find orders where the customer_id is not present in the customers table.
Using NOT EXISTS
we would write something like:
SELECT *
FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.customer_id = o.customer_id);
The output, would probably be what we expect:
order_id | customer_id | amount |
---|---|---|
4 | NULL | 20.00 |
Using NOT IN
we would write something like:
SELECT *
FROM orders
WHERE customer_id NOT IN (SELECT customer_id FROM customers);
The output here will be empty!
When we use NOT IN
, SQL checks each value in the orders table against the list of values returned by the subquery. If any value in the subquery result is NULL
, the entire NOT IN
comparison will result in NULL
for each row in the orders table. This is because any comparison with NULL
yields NULL
and NOT IN
needs all comparisons to be TRUE
for a row to be included in the result.
This is not a problem with NOT EXISTS
, because the NOT EXISTS
clause checks for the non-existence of rows that meet the criteria specified in the subquery. It does not perform direct comparisons with NULL
in the same way NOT IN
does. Instead, it simply checks if there are any rows that match the condition. If no such rows exist, the condition is true.