Navicat Blog

Some Top SQL Query Mistakes - Part 1 Apr 11, 2022 by Robert Gravelle

NOT IN Versus NOT EXISTS

There's a term the is commonly thrown around in programming circles called "anti-patterns". It refers to a response to a recurring problem that is not only ineffective, but also risks being highly counterproductive. The term was originally coined in 1995 by computer programmer Andrew Koenig, in his book Design Patterns, as the antithesis of design patterns that are considered to be both reliable and effective.

Although SQL is not really a programmatic language, it turns out that it's equally susceptible to anti-patterns, especially when the query in question is fairly complex. Sometimes mistakes are hard to spot, and do not reveal themselves until the query is thrust into the pressure cooker of the production environment.

With the goal of catching SQL mistakes earlier, the next several blogs will be devoted to highlighting some of the most common culprits. We'll be using MySQL to execute today's examples, but the concepts are equally valid in any flavor of SQL.

NOT IN Versus NOT EXISTS

One common type of SELECT query retrieves data that is not included in a list of values. To illustrate, here are two very simple tables created in Navicat for MySQL 16. The first table contains colors:

colors (24K)

The second table contains products:

products (15K)

What we would like to do is select all of the colors that have not yet been associated to any products. In other words, we need to construct a query that returns only those colors for which there is no product with that color. One might be tempted to employ the NOT IN predicate to fetch the records in question.

We would expect the following query to return two rows (for "black" and "green") when, in fact, an empty result set is returned:

not_in (27K)

The problem? The present of a NULL value in the color column on the products table, which are translated by the NOT IN predicate to:

color NOT IN (Red, Blue, NULL)

OR

NOT(color=Red OR color=Blue OR color=NULL)

The expression "color=NULL" evaluates to UNKNOWN and, what many database developers overlook, is that NOT UNKNOWN also evaluates to UNKNOWN! As a result, all rows are filtered out and the query returns zero rows.

This issue can also surface if requirements change, and a non-nullable column is updated to allow NULLs. Hence, even if a column disallows NULLs in the initial design, you should make sure your queries will continue to work correctly with NULLs, should things ever change.

The simplest solution is to use the EXISTS predicate instead of IN:

not_exists (39K)

Problem solved!

So why does this work? Whereas the IN keyword will compare all values in the corresponding subquery column, EXISTS evaluates true or false. Consequently, using the IN operator, the SQL engine will scan all records fetched from the inner query. On the other hand, if we are using EXISTS, the SQL engine will stop the scanning process as soon as it found a match.

Conclusion

In this first installment on Top SQL Query Mistakes, we learned about how anti-patterns can occur in SELECT queries, starting with the erroneous use of the NOT IN predicate.

If you'd like to give Navicat 16 for MySQL a test drive, you can download a 14 day trial here.

Navicat Blogs
Feed Entries
Blog Archives
Share