IN and EXISTS often produce the same result, but when negated and dealing with NULL values, they behave differently.

Let's see an example, assume we have these two tables:

orders

order_id	customer_id	amount
1	100	50.00
2	101	75.00
3	102	30.00
4	NULL	20.00

customers

customer_id	name
100	Bob
101	Alice
102	Martha
NULL	John

Ignore the fact that a well formed customers table would need to have the id always specified and ideally as a constraint. This is just an example.

Now, let’s say we want to find orders where the customer_id is not present in the customers table.

Using NOT EXISTS we would write something like:

SELECT *
FROM orders o
WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.customer_id = o.customer_id);

The output, would probably be what we expect:

order_id	customer_id	amount
4	NULL	20.00

Using NOT IN we would write something like:

SELECT *
FROM orders
WHERE customer_id NOT IN (SELECT customer_id FROM customers);

The output here will be empty!

When we use NOT IN, SQL checks each value in the orders table against the list of values returned by the subquery. If any value in the subquery result is NULL, the entire NOT IN comparison will result in NULL for each row in the orders table. This is because any comparison with NULL yields NULL and NOT IN needs all comparisons to be TRUE for a row to be included in the result.

This is not a problem with NOT EXISTS, because the NOT EXISTS clause checks for the non-existence of rows that meet the criteria specified in the subquery. It does not perform direct comparisons with NULL in the same way NOT IN does. Instead, it simply checks if there are any rows that match the condition. If no such rows exist, the condition is true.

NOT IN and NOT EXISTS don't always produce the same result